qai_hub.submit_compile_and_link_jobs
- submit_compile_and_link_jobs(models, device, name=None, input_specs=None, graph_names=None, compile_options='', link_options='', retry=True)
Compiles and links multiple models, or model(s) with multiple input_specs
variants, into a single weight-shared (multi-graph) QNN context binary. To
specify multiple input_specs variants, the model must be ONNX with dynamic
shapes or TorchScript (.pt).
- Parameters:
  - models (Union[Model, TopLevelTracedModule, ScriptModule, ExportedProgram, ModelProto, bytes, str, Path, None, list[Union[Model, TopLevelTracedModule, ScriptModule, ExportedProgram, ModelProto, bytes, str, Path, None]]]) – A list of models. To represent multiple variants of a model, repeat its entry in the list.
  - device (Device | list[Device]) – Device or list of devices. Results are per-device.
  - input_specs (Union[None, Mapping[str, tuple[int, ...] | tuple[tuple[int, ...], str]], list[Mapping[str, tuple[int, ...] | tuple[tuple[int, ...], str]] | None]]) – None | InputSpecs | list[InputSpecs | None]. Each InputSpecs in the list corresponds to the model at the same index in models. Mandatory for TorchScript models. See the example below for usage.
  - graph_names (Optional[list[str]]) – list[str] | None. Graph names are used as keys to access model variants in the generated QNN context binary. If a list of models is provided, graph_names is mandatory. All graph names must be unique. Each graph name corresponds to the model at the same index in models.
  - name (Optional[str]) – Optional name for both the compile and link jobs. Job names need not be unique.
  - compile_options (str | list[str]) – CLI-like flag options for the compile jobs. See Compile Options. --target_runtime qnn_dlc is appended automatically (the only supported target_runtime for this API). Can be a single string (broadcast to all input_specs) or a list[str] whose entries correspond to the models at the same indices in models. Do not specify a graph name in compile_options; use the graph_names argument instead.
  - link_options (str) – CLI-like flags for the link job. See Link Options. A single string, broadcast to all devices.
  - retry (bool) – If job creation fails due to rate limiting, keep retrying periodically until creation succeeds.
- Return type:
  tuple[list[CompileJob], LinkJob | None] | list[tuple[list[CompileJob], LinkJob | None]]
- Returns:
If a single device – (list[CompileJob], LinkJob | None)
If multiple devices – list[tuple[list[CompileJob], LinkJob | None]]
LinkJob is None if any compile job failed.
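Because the return shape depends on whether one or several devices were passed, downstream code often needs to handle both forms. The sketch below is a minimal, hypothetical helper (not part of qai_hub) that flattens both shapes into a list of (compile_jobs, link_job) pairs; the placeholder strings stand in for real CompileJob and LinkJob objects.

```python
def normalize_result(result):
    """Flatten both return shapes to a list of (compile_jobs, link_job) tuples."""
    if isinstance(result, tuple):
        # Single-device call: one (list[CompileJob], LinkJob | None) tuple.
        return [result]
    # Multi-device call: already a list of such tuples.
    return result

# Placeholder values standing in for CompileJob / LinkJob objects:
single = (["compile_job_a"], "link_job")
multi = [(["compile_job_a"], "link_job"), (["compile_job_b"], None)]
```

With this in place, a caller can always write `for compile_jobs, link_job in normalize_result(jobs): ...` and check `link_job is None` to detect a failed compile on that device.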
Constraints / Validation
If multiple variants are provided for a model, that model needs to be ONNX with dynamic shapes or TorchScript (.pt).
InputSpecs must be provided for TorchScript models.
Number of models, input_specs variants, compile_options variants, and graph_names must match.
All graph names must be unique.
Do not specify a graph name in compile_options; use the graph_names argument instead.
The --target_runtime flag in compile_options is set automatically to --target_runtime qnn_dlc.
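The rules above can be checked client-side before submitting anything. This is a hypothetical pre-flight check (not part of qai_hub, and name and error messages are assumptions) that mirrors the length and uniqueness constraints so mismatched arguments fail fast:

```python
def validate_multi_graph_args(models, input_specs, compile_options, graph_names):
    """Mirror the API's validation rules for list-of-models submissions."""
    n = len(models)
    # graph_names is mandatory for a list of models, one name per entry.
    if graph_names is None or len(graph_names) != n:
        raise ValueError("graph_names is mandatory and must match the number of models")
    # All graph names must be unique.
    if len(set(graph_names)) != n:
        raise ValueError("all graph names must be unique")
    # input_specs, when given as a list, must be parallel to models.
    if input_specs is not None and len(input_specs) != n:
        raise ValueError("input_specs must match the number of models")
    # compile_options, when given as a list, must be parallel to models.
    if isinstance(compile_options, list) and len(compile_options) != n:
        raise ValueError("compile_options list must match the number of models")
```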
Examples
Submit two models with multiple I/O spec variants for compilation and linking:
```python
import torch

import qai_hub as hub

client = hub.Client()

pt_model1 = torch.jit.load("encoder.pt")
pt_model2 = torch.jit.load("decoder.pt")

input_specs1 = [
    {"x": ((1, 3, 224, 224), "float32")},
    {"x": ((1, 3, 192, 192), "float32")},
]
# Compile options are repeated to match the number of model input_specs variants.
# Each input_spec can have its own compile options.
compile_options1 = ["--force_channel_last_input x --quantize_io"] * 2

input_specs2 = [
    {"x": ((1, 3, 224, 224), "float32")},
    {"x": ((1, 3, 192, 192), "float32")},
    {"x": ((1, 3, 160, 160), "float32")},
]
compile_options2 = ["--qnn_options default_graph_htp_precision=FLOAT16"] * 3

# Model entries in the list are repeated to match their respective number of
# input_specs variants.
models = [pt_model1, pt_model1, pt_model2, pt_model2, pt_model2]

jobs = client.submit_compile_and_link_jobs(
    models,
    device=hub.Device("Samsung Galaxy S23"),
    name="encoder + decoder",
    input_specs=[*input_specs1, *input_specs2],
    graph_names=[
        "encoder_224",
        "encoder_192",
        "decoder_224",
        "decoder_192",
        "decoder_160",
    ],
    compile_options=[*compile_options1, *compile_options2],
    link_options="--qnn_options default_graph_htp_optimizations=O=3",
)
```
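Writing the repeated, parallel lists by hand is error-prone when the variant counts differ per model. The following is a hypothetical convenience helper (not part of qai_hub; the name `expand_variants` and its tuple layout are assumptions) that expands a per-model description into the parallel `models`, `input_specs`, `graph_names`, and `compile_options` lists the API expects:

```python
def expand_variants(entries):
    """entries: list of (model, base_name, variants, options), where variants
    is a list of (suffix, input_spec) pairs. Returns four parallel lists in
    which each model is repeated once per variant."""
    models, input_specs, graph_names, compile_options = [], [], [], []
    for model, base_name, variants, options in entries:
        for suffix, spec in variants:
            models.append(model)
            input_specs.append(spec)
            graph_names.append(f"{base_name}_{suffix}")  # unique per variant
            compile_options.append(options)
    return models, input_specs, graph_names, compile_options
```

The four returned lists can then be passed directly as the `models`, `input_specs`, `graph_names`, and `compile_options` arguments of `submit_compile_and_link_jobs`.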