Running Inference
Running any model on mobile and edge devices with dedicated hardware can differ from running it in its reference environment. For example, while your PyTorch implementation runs inference in float32 precision, the target hardware may compute in float16 or even int8. This can lead to numerical differences, as well as possible underflow and overflow. Whether this adversely affects the results depends on your model and data distribution.
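As a minimal illustration of these precision effects (a NumPy-only sketch, independent of any particular target hardware):

```python
import numpy as np

# float16 has roughly 3 decimal digits of precision near 1.0,
# so small differences present in float32 simply vanish
assert np.float16(1.0001) == np.float16(1.0)

# Values beyond float16's maximum (about 65504) overflow to infinity
assert np.isinf(np.float16(70000.0))

# The same value is representable without overflow in float32
assert np.isfinite(np.float32(70000.0))
```

Whether such rounding and overflow matter in practice depends on the dynamic range of your model's activations and weights.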
Inference jobs provide you with a way to upload input data, run inference on
real hardware, and download the output results. By comparing these results
directly to your reference implementation, you can determine whether
the optimized model works as expected. Inference is only supported for optimized
models: models in source formats such as PyTorch and ONNX must first be compiled
with submit_compile_job() or similar.
Inference jobs use the --qairt_version option to select a specific Qualcomm® AI Runtime version. If not specified, a version is chosen automatically based on version selection.
Running Inference with a TensorFlow Lite Model
This example runs inference with the TensorFlow Lite model SqueezeNet10.tflite.
import numpy as np
import qai_hub as hub
client = hub.Client()
sample = np.random.random((1, 224, 224, 3)).astype(np.float32)
inference_job = client.submit_inference_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    inputs=dict(x=[sample]),
)
assert isinstance(inference_job, hub.InferenceJob)
inference_job.download_output_data()
The input to inference must be a dictionary whose keys are the names of the features and whose values are the tensors. The tensors may be a list of numpy arrays, or a single numpy array in the case of a single data point.
inference_job is an instance of InferenceJob.
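To illustrate the two accepted input formats, the following sketch (NumPy only; the feature name x matches the example above) builds equivalent dictionaries for a single data point, and the list form for multiple data points:

```python
import numpy as np

sample = np.random.random((1, 224, 224, 3)).astype(np.float32)

# A list of arrays: one entry per data point (here, a single point)
inputs_as_list = dict(x=[sample])

# A bare numpy array is shorthand for a single data point
inputs_as_array = dict(x=sample)

# Multiple data points require the list form
inputs_batch = dict(x=[sample, sample])
assert len(inputs_batch["x"]) == 2
```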
Multiple inference jobs can be launched at the same time by providing a list of
Device objects to the
submit_inference_job() API.
Running Inference with QNN DLC and Context Binaries
This example compiles a TorchScript model (mobilenet_v2.pt) to QNN DLC or QNN context binary format. Then inference is run on device with the compiled target model.
import numpy as np
import qai_hub as hub
client = hub.Client()
sample = np.random.random((1, 3, 224, 224)).astype(np.float32)
compile_job = client.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    options="--target_runtime qnn_dlc",
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)
inference_job = client.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    inputs=dict(image=[sample]),
)
assert isinstance(inference_job, hub.InferenceJob)
import numpy as np
import qai_hub as hub
client = hub.Client()
input_shape = (1, 3, 224, 224)
sample = np.random.random(input_shape).astype(np.float32)
compile_job = client.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    options="--target_runtime qnn_context_binary",
    input_specs=dict(image=input_shape),
)
assert isinstance(compile_job, hub.CompileJob)
inference_job = client.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    inputs=dict(image=[sample]),
)
assert isinstance(inference_job, hub.InferenceJob)
Validating Model Accuracy On-Device with Inference Jobs
This example demonstrates how to validate the numerics of a QNN DLC model on-device.
Reusing the profiling example (mobilenet_v2.pt):
import torch
import qai_hub as hub
client = hub.Client()
device_s23 = hub.Device(name="Samsung Galaxy S23 (Family)")
compile_job = client.submit_compile_job(
    model="mobilenet_v2.pt",
    device=device_s23,
    input_specs={"x": (1, 3, 224, 224)},
    options="--target_runtime qnn_dlc",
)
assert isinstance(compile_job, hub.CompileJob)
on_device_model = compile_job.get_target_model()
We can take this optimized DLC model and run inference with input data on a specific device. The input image used in this example can be downloaded - input_image1.jpg.
import numpy as np
from PIL import Image
# Convert the image to numpy array of shape [1, 3, 224, 224]
image = Image.open("input_image1.jpg").resize((224, 224))
img_array = np.array(image, dtype=np.float32)
# Ensure correct layout (NCHW) and re-scale
input_array = np.expand_dims(np.transpose(img_array / 255.0, (2, 0, 1)), axis=0)
# Run inference using the on-device model on the input image
inference_job = client.submit_inference_job(
    model=on_device_model,
    device=device_s23,
    inputs=dict(x=[input_array]),
)
assert isinstance(inference_job, hub.InferenceJob)
We can use this raw on-device output to generate class predictions and compare them to the reference implementation. You will need the imagenet classes - imagenet_classes.txt.
# Get the on-device output
on_device_output: dict[str, list[np.ndarray]] = inference_job.download_output_data() # type: ignore
# Load the torch model and perform inference
torch_model = torch.jit.load("mobilenet_v2.pt")
torch_model.eval()
# Calculate probabilities for torch model
torch_input = torch.from_numpy(input_array)
torch_output = torch_model(torch_input)
torch_probabilities = torch.nn.functional.softmax(torch_output[0], dim=0)
# Calculate probabilities for the on-device output
output_name = list(on_device_output.keys())[0]
out = on_device_output[output_name][0]
on_device_probabilities = np.exp(out) / np.sum(np.exp(out), axis=1)
# Read the class labels for imagenet
with open("imagenet_classes.txt") as f:
    categories = [s.strip() for s in f.readlines()]
# Print top five predictions for the on-device model
print("Top-5 On-Device predictions:")
top5_classes = np.argsort(on_device_probabilities[0], axis=0)[-5:]
for c in reversed(top5_classes):
    print(f"{c} {categories[c]:20s} {on_device_probabilities[0][c]:>6.1%}")
# Print top five prediction for torch model
print("Top-5 PyTorch predictions:")
top5_prob, top5_catid = torch.topk(torch_probabilities, 5)
for i in range(top5_prob.size(0)):
    print(
        f"{top5_catid[i]:4d} {categories[top5_catid[i]]:20s} {top5_prob[i].item():>6.1%}"
    )
The code above produces the following results:
Top-5 On-Device predictions:
968 cup 71.3%
504 coffee mug 16.4%
967 espresso 7.8%
809 soup bowl 1.3%
659 mixing bowl 1.2%
Top-5 PyTorch predictions:
968 cup 71.4%
504 coffee mug 16.1%
967 espresso 8.0%
809 soup bowl 1.4%
659 mixing bowl 1.2%
The on-device results are nearly identical to the reference implementation. This tells us the model did not suffer a loss of correctness, and gives us confidence that it will behave as expected once deployed.
To strengthen this confidence, consider using multiple images and a quantitative summary, such as measuring KL divergence or comparing accuracy (if labels are known). This also makes validation on the target device easier.
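As a sketch of such a quantitative summary (NumPy only; the probability values below are hypothetical stand-ins for the reference and on-device softmax outputs), the KL divergence between the two distributions can be computed as:

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> float:
    """KL(p || q) between two probability distributions, with eps for stability."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))

# Hypothetical class probabilities: reference (PyTorch) vs. on-device
reference = np.array([0.7, 0.2, 0.1])
on_device = np.array([0.69, 0.21, 0.1])

# A value near zero indicates the two distributions closely agree
assert kl_divergence(reference, on_device) < 1e-3
```

Aggregating this metric over many input images gives a single number to track, which is easier to compare across devices and runtime versions than eyeballing top-5 lists.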
Running Inference with a Previously Uploaded Dataset and Model
Similar to models, Qualcomm® AI Hub Workbench provides an API that lets users upload data that can be reused.
import numpy as np
import qai_hub as hub
client = hub.Client()
data = dict(
    x=[
        np.random.random((1, 224, 224, 3)).astype(np.float32),
        np.random.random((1, 224, 224, 3)).astype(np.float32),
    ]
)
hub_dataset = client.upload_dataset(data)
You can now run an inference job using the uploaded dataset. This example uses SqueezeNet10.tflite.
# Submit job
job = client.submit_inference_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    inputs=hub_dataset,
)