Running Inference
Running any model on mobile and edge devices with dedicated hardware can differ from running it in its reference environment. For example, while your PyTorch implementation runs inference in float32 precision, the target hardware may compute in float16 or even int8. This can lead to numerical differences, as well as possible underflow and overflow. Whether this adversely affects the results depends on your model and data distribution.
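As a minimal illustration of these precision effects (a NumPy-only sketch, independent of any particular target hardware):

```python
import numpy as np

# float16 has roughly 3 decimal digits of precision near 1.0,
# so small differences present in float32 simply vanish
assert np.float16(1.0001) == np.float16(1.0)

# Values beyond float16's maximum (about 65504) overflow to infinity
assert np.isinf(np.float16(70000.0))

# The same value is representable without overflow in float32
assert np.isfinite(np.float32(70000.0))
```

Whether such rounding and overflow matter in practice depends on the dynamic range of your model's activations and weights.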
Inference jobs provide you with a way to upload input data, run inference on
real hardware, and download the output results. By comparing these results
directly to your reference implementation, you can determine whether
the optimized model works as expected. Inference is only supported for optimized
models: models in source formats such as PyTorch and ONNX must first be compiled
with submit_compile_job() or similar.
Inference jobs use the --qairt_version option to select a specific Qualcomm® AI Runtime version. If not specified, a version is chosen automatically based on version selection.
Running Inference with a TensorFlow Lite Model
This example runs inference with the TensorFlow Lite model SqueezeNet10.tflite.
import numpy as np
import qai_hub as hub
client = hub.Client()
sample = np.random.random((1, 224, 224, 3)).astype(np.float32)
inference_job = client.submit_inference_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    inputs=dict(x=[sample]),
)
assert isinstance(inference_job, hub.InferenceJob)
inference_job.download_output_data()
The input to inference must be a dictionary whose keys are the names of the features and whose values are the tensors. The tensors may be a list of numpy arrays, or a single numpy array in the case of a single data point.
inference_job is an instance of InferenceJob.
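To illustrate the two accepted input formats, the following sketch (NumPy only; the feature name x matches the example above) builds equivalent dictionaries for a single data point, and the list form for multiple data points:

```python
import numpy as np

sample = np.random.random((1, 224, 224, 3)).astype(np.float32)

# A list of arrays: one entry per data point (here, a single point)
inputs_as_list = dict(x=[sample])

# A bare numpy array is shorthand for a single data point
inputs_as_array = dict(x=sample)

# Multiple data points require the list form
inputs_batch = dict(x=[sample, sample])
assert len(inputs_batch["x"]) == 2
```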
Multiple inference jobs can be launched at the same time by providing a list of
Device objects to the
submit_inference_job() API.
Running Inference with QNN DLC and Context Binaries
This example compiles a TorchScript model (mobilenet_v2.pt) to QNN DLC or QNN context binary format. Then inference is run on device with the compiled target model.
import numpy as np
import qai_hub as hub
client = hub.Client()
sample = np.random.random((1, 3, 224, 224)).astype(np.float32)
compile_job = client.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_dlc",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)
inference_job = client.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    inputs=dict(image=[sample]),
)
assert isinstance(inference_job, hub.InferenceJob)
import numpy as np
import qai_hub as hub
client = hub.Client()
input_shape = (1, 3, 224, 224)
sample = np.random.random(input_shape).astype(np.float32)
compile_job = client.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_context_binary",
    input_specs=dict(image=input_shape),
)
assert isinstance(compile_job, hub.CompileJob)
inference_job = client.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    inputs=dict(image=[sample]),
)
assert isinstance(inference_job, hub.InferenceJob)
Validating Model Accuracy On-Device with Inference Jobs
This example demonstrates how to validate the numerics of a QNN DLC model on-device.
Reusing the profiling example (mobilenet_v2.pt):
import torch
import qai_hub as hub
client = hub.Client()
device_s23 = hub.Device(name="Samsung Galaxy S23 (Family)")
compile_job = client.submit_compile_job(
    model="mobilenet_v2.pt",
    device=device_s23,
    input_specs={"x": (1, 3, 224, 224)},
    options="--target_runtime qnn_dlc",
)
assert isinstance(compile_job, hub.CompileJob)
on_device_model = compile_job.get_target_model()
We can take this optimized DLC model and run inference with input data on a specific device. The input image used in this example can be downloaded - input_image1.jpg.
import numpy as np
from PIL import Image
# Convert the image to numpy array of shape [1, 3, 224, 224]
image = Image.open("input_image1.jpg").resize((224, 224))
img_array = np.array(image, dtype=np.float32)
# Ensure correct layout (NCHW) and re-scale
input_array = np.expand_dims(np.transpose(img_array / 255.0, (2, 0, 1)), axis=0)
# Run inference using the on-device model on the input image
inference_job = client.submit_inference_job(
    model=on_device_model,
    device=device_s23,
    inputs=dict(x=[input_array]),
)
assert isinstance(inference_job, hub.InferenceJob)
We can use this raw on-device output to generate class predictions and compare them to the reference implementation. You will need the imagenet classes - imagenet_classes.txt.
# Get the on-device output
on_device_output: dict[str, list[np.ndarray]] = inference_job.download_output_data() # type: ignore
# Load the torch model and perform inference
torch_model = torch.jit.load("mobilenet_v2.pt")
torch_model.eval()
# Calculate probabilities for torch model
torch_input = torch.from_numpy(input_array)
torch_output = torch_model(torch_input)
torch_probabilities = torch.nn.functional.softmax(torch_output[0], dim=0)
# Calculate probabilities for the on-device output
output_name = list(on_device_output.keys())[0]
out = on_device_output[output_name][0]
on_device_probabilities = np.exp(out) / np.sum(np.exp(out), axis=1)
# Read the class labels for imagenet
with open("imagenet_classes.txt") as f:
    categories = [s.strip() for s in f.readlines()]
# Print top five predictions for the on-device model
print("Top-5 On-Device predictions:")
top5_classes = np.argsort(on_device_probabilities[0], axis=0)[-5:]
for c in reversed(top5_classes):
    print(f"{c} {categories[c]:20s} {on_device_probabilities[0][c]:>6.1%}")
# Print top five prediction for torch model
print("Top-5 PyTorch predictions:")
top5_prob, top5_catid = torch.topk(torch_probabilities, 5)
for i in range(top5_prob.size(0)):
    print(
        f"{top5_catid[i]:4d} {categories[top5_catid[i]]:20s} {top5_prob[i].item():>6.1%}"
    )
The code above produces the following results:
Top-5 On-Device predictions:
968 cup 71.3%
504 coffee mug 16.4%
967 espresso 7.8%
809 soup bowl 1.3%
659 mixing bowl 1.2%
Top-5 PyTorch predictions:
968 cup 71.4%
504 coffee mug 16.1%
967 espresso 8.0%
809 soup bowl 1.4%
659 mixing bowl 1.2%
The on-device results are nearly identical to the reference implementation. This tells us the model did not suffer a loss of correctness, and gives us confidence that it will behave as expected once deployed.
To strengthen this confidence, consider using multiple images and a quantitative summary, such as measuring KL divergence or comparing accuracy (if labels are known). This also makes validation on the target device easier.
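As a sketch of such a quantitative summary (NumPy only; the probability values below are hypothetical stand-ins for the reference and on-device softmax outputs), the KL divergence between the two distributions can be computed as:

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> float:
    """KL(p || q) between two probability distributions, with eps for stability."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))

# Hypothetical class probabilities: reference (PyTorch) vs. on-device
reference = np.array([0.7, 0.2, 0.1])
on_device = np.array([0.69, 0.21, 0.1])

# A value near zero indicates the two distributions closely agree
assert kl_divergence(reference, on_device) < 1e-3
```

Aggregating this metric over many input images gives a single number to track, which is easier to compare across devices and runtime versions than eyeballing top-5 lists.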
Running Inference with a Previously Uploaded Dataset and Model
Similar to models, Qualcomm® AI Hub Workbench provides an API that lets users upload data that can be reused.
import numpy as np
import qai_hub as hub
client = hub.Client()
data = dict(
    x=[
        np.random.random((1, 224, 224, 3)).astype(np.float32),
        np.random.random((1, 224, 224, 3)).astype(np.float32),
    ]
)
hub_dataset = client.upload_dataset(data)
You can now run an inference job using the uploaded dataset. This example uses SqueezeNet10.tflite.
# Submit job
job = client.submit_inference_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    inputs=hub_dataset,
)