Qualcomm® AI Hub

Qualcomm® AI Hub Workbench helps you optimize, validate, and deploy machine learning models on-device for vision, audio, speech, and multi-modal use cases.

With Qualcomm® AI Hub Workbench, you can:

  • Convert trained models from frameworks like PyTorch and ONNX for optimized on-device performance on Qualcomm® devices.

  • Profile models on-device to obtain detailed metrics including runtime, load time, and compute unit utilization.

  • Verify numerical correctness by performing on-device inference.

  • Easily deploy models using Qualcomm® AI Engine Direct, TensorFlow Lite, or ONNX Runtime.

Qualcomm® AI Hub Models is our collection of pre-optimized models, which we use to understand the performance characteristics of a wide range of models running on Qualcomm® devices. We compile, profile, and run inference on Qualcomm® devices using Qualcomm® AI Hub Workbench every couple of weeks. If you’re looking for a model to start with, check them out!
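Many of these models can be loaded directly from the qai-hub-models Python package. Below is a minimal sketch, assuming the package is installed (pip install qai-hub-models) along with its PyTorch dependency; MobileNetV2 is just one example from the collection.

    # Load a pre-optimized model from Qualcomm AI Hub Models.
    import torch
    from qai_hub_models.models.mobilenet_v2 import Model

    # Instantiate the pre-trained PyTorch model from the collection.
    model = Model.from_pretrained()
    model.eval()

    # Quick local sanity check with a random input.
    with torch.no_grad():
        output = model(torch.rand(1, 3, 224, 224))
    print(output.shape)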

Qualcomm® AI Hub Apps is our collection of sample applications that help bring Qualcomm® AI Hub Models on-device. Reference Qualcomm® AI Hub Apps to set up your desired runtime, match the reported performance metrics, and deploy model assets obtained from Qualcomm® AI Hub Workbench.

How does it work?

Qualcomm® AI Hub Workbench automatically translates a model from its source framework to the device runtime, applies hardware-aware optimizations, and performs physical performance and numerical validation. The system automatically provisions devices in the cloud for on-device profiling and inference. The following image shows the steps taken to analyze a model using Qualcomm® AI Hub Workbench.

[Image: How-It-Works.png]

To use Qualcomm® AI Hub Workbench, you need:

  • A trained model in PyTorch, TorchScript, ONNX, or TensorFlow Lite format.

  • Working knowledge of the deployment target. This can be a specific device (e.g., Samsung Galaxy S23 Ultra) or a range of devices (see the device lookup sketch after this list).
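If you are unsure which device to target, the qai-hub Python client can enumerate the cloud-hosted devices. A minimal sketch, assuming the client is installed (pip install qai-hub) and configured with your API token:

    # List the cloud-hosted devices available for on-device jobs.
    import qai_hub as hub

    # Each device entry carries a name, OS version, and attributes
    # (e.g., chipset) that can guide your choice of target.
    for device in hub.get_devices():
        print(device.name, device.os, device.attributes)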

The following three steps deploy a trained model to Qualcomm® devices:

Step 1: Optimize for on-device execution

Qualcomm® AI Hub Workbench contains a collection of hosted compiler tools that optimize a trained model for the chosen target platform. Hardware-aware optimizations are then performed to ensure the target hardware is best utilized. Models can be optimized for deployment with Qualcomm® AI Engine Direct, TensorFlow Lite, or ONNX Runtime. All format conversions are handled automatically.
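For example, a compile job can be submitted from the Python client. A minimal sketch, assuming a configured qai-hub client; the torchvision MobileNetV2 and the Samsung Galaxy S23 Ultra target are stand-ins for your own model and device.

    # Optimize a traced PyTorch model for a specific target device.
    import qai_hub as hub
    import torch
    import torchvision

    # Any trained PyTorch model works; MobileNetV2 is a stand-in.
    model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
    traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))

    # Submit a compile job; format conversion and hardware-aware
    # optimization run on the hosted toolchain.
    compile_job = hub.submit_compile_job(
        model=traced,
        device=hub.Device("Samsung Galaxy S23 Ultra"),
        input_specs=dict(image=(1, 3, 224, 224)),
    )

    # Blocks until compilation finishes, then returns the optimized model.
    target_model = compile_job.get_target_model()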

Step 2: Perform on-device inference

The system can run the compiled model on a physical device to gather metrics such as the mapping of model layers to compute units, inference latency, and peak memory usage. The tools can also run the model on your input data to validate numerical correctness. All analyses are performed on real hardware automatically provisioned in the cloud.
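Continuing the Step 1 sketch, profiling and inference jobs take the compiled target_model and the same device. The input name image matches the input_specs used at compile time, and the random sample stands in for your own validation data.

    # Profile the compiled model and validate its outputs on a real,
    # cloud-hosted device (continues the Step 1 sketch).
    import numpy as np
    import qai_hub as hub

    device = hub.Device("Samsung Galaxy S23 Ultra")

    # Gather latency, load time, memory, and compute-unit metrics.
    profile_job = hub.submit_profile_job(model=target_model, device=device)

    # Run on-device inference with your own data to check numerical
    # correctness against the source model's outputs.
    sample = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inference_job = hub.submit_inference_job(
        model=target_model,
        device=device,
        inputs=dict(image=[sample]),
    )
    on_device_output = inference_job.download_output_data()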

Step 3: Deploy

Model results are displayed on Qualcomm® AI Hub Workbench, providing insights into model performance and opportunities for further improvement. The optimized model is available for deployment to a variety of platforms. Check out Qualcomm® AI Hub Apps to walk through this process for a specific use case.
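When you are ready to deploy, the optimized asset can be downloaded for packaging into your application. A minimal sketch continuing from Step 1; the file name is an arbitrary choice.

    # Download the optimized model asset produced by the compile job.
    target_model = compile_job.get_target_model()
    target_model.download("mobilenet_v2.tflite")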
