Profiling Models

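Many important questions arise when deploying a neural network model on a device:

What is the inference latency across the target hardware?
Does the model fit within a given memory budget?
Can the model take advantage of the neural processing unit (NPU)?

A profile job answers these questions by running your model on physical devices in the cloud and analyzing its performance.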
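
As a rough sketch of how these three questions map onto the results of a finished job, the snippet below condenses a profile dictionary into latency, memory fit, and NPU usage. The dictionary layout (an `execution_summary` with `estimated_inference_time` in microseconds and `estimated_inference_peak_memory` in bytes, plus an `execution_detail` list whose entries carry a `compute_unit`) is an assumption about what a profile result looks like. The `summarize_profile` helper and the sample numbers are hypothetical, so check the keys against a real result before relying on it.

```python
from collections import Counter


def summarize_profile(profile: dict, memory_budget_bytes: int) -> dict:
    """Condense a profile result into latency, memory fit, and NPU usage.

    Assumes (not confirmed by this page) that `profile` has the keys used
    below; adjust them to match the dictionary your job actually returns.
    """
    summary = profile["execution_summary"]
    layers = profile["execution_detail"]

    # Count how many layers ran on each compute unit (e.g. CPU, GPU, NPU).
    units = Counter(layer["compute_unit"] for layer in layers)

    return {
        # Assumed to be reported in microseconds; convert to milliseconds.
        "latency_ms": summary["estimated_inference_time"] / 1000.0,
        "fits_memory_budget": (
            summary["estimated_inference_peak_memory"] <= memory_budget_bytes
        ),
        "npu_layer_fraction": units["NPU"] / len(layers) if layers else 0.0,
    }


# Made-up sample data, for illustration only.
sample_profile = {
    "execution_summary": {
        "estimated_inference_time": 2500,
        "estimated_inference_peak_memory": 40_000_000,
    },
    "execution_detail": [
        {"name": "conv1", "compute_unit": "NPU"},
        {"name": "conv2", "compute_unit": "NPU"},
        {"name": "relu", "compute_unit": "NPU"},
        {"name": "softmax", "compute_unit": "CPU"},
    ],
}

report = summarize_profile(sample_profile, memory_budget_bytes=64 * 1024 * 1024)
print(report)
```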

Profile jobs can also use the --qairt_version flag to select a specific Qualcomm® AI Runtime version. If it is not specified, the version is chosen as described in Version Selection.
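
A minimal sketch of passing the flag, assuming job options are given as a single space-separated string through the `options` argument (the same argument used for `--onnx_execution_providers=qnn` later on this page). The `build_options` helper and the version value `2.31` are hypothetical placeholders, not a recommendation of a particular release.

```python
from typing import Optional


def build_options(qairt_version: Optional[str] = None, *extra: str) -> str:
    """Join profile-job flags into one options string."""
    parts = []
    if qairt_version is not None:
        # "2.31" below is a placeholder; use a version your account supports.
        parts.append(f"--qairt_version {qairt_version}")
    parts.extend(extra)
    return " ".join(parts)


options = build_options("2.31", "--onnx_execution_providers=qnn")
print(options)

# The resulting string would then be passed when submitting the job, e.g.
# client.submit_profile_job(model=my_model, device=my_device, options=options)
```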

Profiling a previously compiled model

Qualcomm® AI Hub Workbench supports profiling a previously compiled model. In this example, we profile a model that was previously compiled with submit_compile_job(). Note how the compiled model is obtained from compile_job using get_target_model().

import qai_hub as hub

client = hub.Client()

# Profile the previously compiled model
profile_job = client.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

The return value is an instance of ProfileJob. To see a list of all your jobs, go to /jobs/.

Profiling a PyTorch model

This example requires PyTorch, which can be installed as follows.

pip3 install "qai-hub[torch]"

In this example, we optimize and profile a PyTorch model using Qualcomm® AI Hub Workbench.

import torch

import qai_hub as hub

client = hub.Client()


class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 2)

    def forward(self, x):
        return self.linear(x)


input_shapes: list[tuple[int, ...]] = [(3, 5)]
torch_model = SimpleNet()

# Trace the model using random inputs
torch_inputs = tuple(torch.randn(shape) for shape in input_shapes)
pt_model = torch.jit.trace(torch_model, torch_inputs)

# Submit compile job
compile_job = client.submit_compile_job(
    model=pt_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(x=input_shapes[0]),
)
assert isinstance(compile_job, hub.CompileJob)

# Submit profile job using results from compile job
profile_job = client.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

For more information on options when uploading, compiling, and submitting a job, see upload_model(), submit_compile_job(), and submit_profile_job().

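Profiling a TorchScript model

If you have a previously saved traced or scripted torch model (saved with torch.jit.save), you can submit it directly. We will use mobilenet_v2.pt as an example. As in the previous example, a TorchScript model can be profiled only after it has been compiled to a suitable target.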

import qai_hub as hub

client = hub.Client()

# Compile previously saved torchscript model
compile_job = client.submit_compile_job(
    model="mobilenet_v2.pt",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
    input_specs=dict(image=(1, 3, 224, 224)),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = client.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling an ONNX model

Qualcomm® AI Hub Workbench also supports ONNX models. An ONNX model can be profiled either by compiling it to a target such as TensorFlow Lite or by running it directly with the ONNX Runtime. We will use mobilenet_v2.onnx as an example of both methods. This example compiles to a TensorFlow Lite target model.

import qai_hub as hub

client = hub.Client()

compile_job = client.submit_compile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(compile_job, hub.CompileJob)

profile_job = client.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

This example profiles the ONNX model directly using the ONNX Runtime.

import qai_hub as hub

client = hub.Client()

profile_job = client.submit_profile_job(
    model="mobilenet_v2.onnx",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Precompiled QNN ONNX models that contain a QNN context binary can also be profiled directly. This example continues the compile example from Compiling to a Precompiled QNN ONNX and profiles the resulting model.

import qai_hub as hub

# Profile the previously compiled model
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Snapdragon 8 Elite QRD"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling an ONNX model on a device that does not support floating point

Some devices have HTPs that do not support floating point (FP32 or FP16) models. When running on those devices using the ONNX Runtime, Qualcomm® AI Hub Workbench does not automatically enable the QNN Execution Provider. This behavior allows all models to succeed by running them on the CPU. If you have a fully quantized model, you may get better performance on the NPU by enabling the QNN Execution Provider with the option --onnx_execution_providers=qnn. In this example, we first run a fully quantized model on the CPU using default options, and then run the same model on the NPU by enabling the QNN Execution Provider. We use resnet50_w8a8.onnx for both runs.

import qai_hub as hub

client = hub.Client()

# Profile a quantized model with default options - the model will run on the CPU
profile_job = client.submit_profile_job(
    "resnet50_w8a8.onnx",
    device=hub.Device("Dragonwing RB3 Gen 2 Vision Kit"),
)

assert isinstance(profile_job, hub.ProfileJob)

# Profile a quantized model with the QNN Execution Provider
profile_job = client.submit_profile_job(
    "resnet50_w8a8.onnx",
    device=hub.Device("Dragonwing RB3 Gen 2 Vision Kit"),
    options="--onnx_execution_providers=qnn",
)

assert isinstance(profile_job, hub.ProfileJob)

Profiling a QNN DLC

Qualcomm® AI Hub Workbench supports the QNN DLC format for profiling. In this example, we continue the example from Compiling a PyTorch model to a QNN DLC and profile the model:

import qai_hub as hub

client = hub.Client()

# Profile the previously compiled model
profile_job = client.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling a QNN context binary

Qualcomm® AI Hub Workbench supports the QNN context binary format for profiling. In this example, we continue the example from Compiling a PyTorch model to a QNN context binary and profile the model:

import qai_hub as hub

client = hub.Client()

# Profile the previously compiled model
profile_job = client.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
assert isinstance(profile_job, hub.ProfileJob)

Profiling a TensorFlow Lite model

Qualcomm® AI Hub Workbench supports profiling a model in the .tflite format as well. We will use the SqueezeNet10 model.

import qai_hub as hub

client = hub.Client()

# Profile TensorFlow Lite model (from file)
profile_job = client.submit_profile_job(
    model="SqueezeNet10.tflite",
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)

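Profiling a model on multiple devices

It is often important to measure performance on more than one device. In this example, we profile on recent Snapdragon® 8 Gen 2 and Snapdragon® 8 Gen 3 devices for good test coverage. We reuse the SqueezeNet model from the TensorFlow Lite example, but this time profile it on two devices.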

import qai_hub as hub

client = hub.Client()

devices = [
    hub.Device("Samsung Galaxy S23 (Family)"),  # Snapdragon 8 Gen 2
    hub.Device("Samsung Galaxy S24 (Family)"),  # Snapdragon 8 Gen 3
]

jobs = client.submit_profile_job(model="SqueezeNet10.tflite", device=devices)

A separate profile job is created for each device.
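
Because one job is created per device, the results can be compared side by side once each job finishes. The sketch below ranks devices by latency. It assumes each profile is a dictionary with an `execution_summary` containing `estimated_inference_time` in microseconds, which is an assumption about the result layout. `rank_by_latency` is a hypothetical helper and the numbers are made up.

```python
def rank_by_latency(profiles: dict) -> list:
    """Return (device name, latency in ms) pairs, fastest first."""
    latencies = [
        (name, profile["execution_summary"]["estimated_inference_time"] / 1000.0)
        for name, profile in profiles.items()
    ]
    return sorted(latencies, key=lambda pair: pair[1])


# Made-up numbers keyed by device name, for illustration only.
profiles = {
    "Samsung Galaxy S23 (Family)": {
        "execution_summary": {"estimated_inference_time": 900}
    },
    "Samsung Galaxy S24 (Family)": {
        "execution_summary": {"estimated_inference_time": 700}
    },
}

ranking = rank_by_latency(profiles)
for name, latency_ms in ranking:
    print(f"{name}: {latency_ms:.2f} ms")
```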

Uploading a model for profiling

You can upload a model (e.g., SqueezeNet10.tflite) without submitting a profile job.

import qai_hub as hub

client = hub.Client()
hub_model = client.upload_model("SqueezeNet10.tflite")
print(hub_model)

You can now use the uploaded model's model_id to run a profile job:

import qai_hub as hub

client = hub.Client()

# Retrieve model using ID
hub_model = client.get_model("mabc123")

# Submit job
profile_job = client.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S23 (Family)"),
)

Profiling a previously uploaded model

You can reuse the model from a previous job to launch a new profile job (e.g., on a different device). This avoids uploading the same model multiple times:

import qai_hub as hub

client = hub.Client()

# Get the model from the profile job
profile_job = client.get_job("jabc123")
hub_model = profile_job.model

# Run the model from the job
new_profile_job = client.submit_profile_job(
    model=hub_model,
    device=hub.Device("Samsung Galaxy S22 Ultra 5G"),
)