The workflow from model training, through conversion, to deployment on the development board is as follows:

Model Training: the process of obtaining a usable model with public deep learning frameworks such as TensorFlow, PyTorch, and Caffe. The trained model serves as the input to the model conversion stage. The toolchain does not provide training-related libraries or tools. For details of the supported deep learning frameworks, please refer to the instructions in Floating-point Model Preparation.
Model Conversion: taking the floating-point model obtained from model training as input, this stage transforms the floating-point model into a hybrid heterogeneous model that can run efficiently on the Horizon computing platform, through important steps such as model structure optimization and calibration-based quantization. To verify the usability of the heterogeneous model, the toolchain also provides performance analysis, accuracy analysis, and a rich set of exception-debugging tools and recommendations. For more information, please refer to the Model Quantization and Compilation section.
Embedded Application Development: the toolchain supports application development in both an X86 emulation environment and the real embedded environment. If using the development board is inconvenient, you can debug the program and verify the computation results in the emulation environment. To reduce the cost of simulation verification, the toolchain provides a simulation library interface identical to the embedded interface; only the compilation configuration differs. For more information, please refer to the Embedded Application Development section.
Model conversion is the process of converting the original floating-point model to a Horizon hybrid heterogeneous model.
The original floating-point model (also referred to as a floating-point model in sections of the document) is an available model trained by a DL framework such as TensorFlow/PyTorch, with computation precision of float32; the hybrid heterogeneous model is a model format suitable for running on the Horizon computing platform.
These two terms are used repeatedly in this section; to avoid ambiguity, please make sure you understand them before reading on.
The complete model development process with the Horizon toolchain involves five important stages: Floating-point Model Preparation, Model Checking, Model Conversion, Performance Evaluation, and Accuracy Evaluation, as shown in the figure below.
The floating-point model, as the output of the Floating-point Model Preparation stage, serves as the input of the model conversion tool. The floating-point model is usually trained with an open-source deep learning framework. Note that the model must be exported to a format supported by Horizon Robotics. For more information, please refer to Floating-point Model Preparation.
The Model Checking stage ensures that the model complies with the computing platform. Horizon Robotics provides dedicated tools for model validation; for non-compliant models, these tools explicitly report the offending operators, so that you can easily adjust the model according to the operator constraint descriptions. For more information, please refer to Model Checking.
The Model Conversion stage converts the floating-point model to the hybrid heterogeneous model supported by Horizon Robotics. To run models efficiently on the Horizon computing platform, critical steps such as model optimization, quantization, and compilation are completed by Horizon's model conversion tools. Horizon's model quantization method has undergone long-term technological and production validation, and can guarantee an accuracy loss of less than 1% on most typical deep learning models. For more details about model conversion please refer to the Prepare Calibration Data and Model Quantization and Compilation.
The Performance Evaluation stage provides a series of tools to evaluate model performance. Before deploying your application, you can use these tools to verify that the model performance meets the application requirements. In cases where the performance is not as good as expected, you can optimize the model based on Horizon's model optimization advice. For more information, please refer to Model Performance Analysis.
The Accuracy Evaluation stage provides a series of tools to evaluate model accuracy. In most cases, Horizon's converted models can maintain almost the same accuracy as the original floating-point model. Before application deployment, you can use these tools to verify that the model accuracy meets expectations. In cases where the accuracy is not as good as expected, you can optimize the model based on Horizon's model optimization advice. For more information about evaluation, please refer to Model Accuracy Analysis.
So how do you convert floating-point models trained with open-source ML frameworks (such as Caffe, TensorFlow, PyTorch, etc.) into fixed-point models supported by Horizon hardware?
In most cases, the threshold values and weights of floating-point models, whether obtained from open-source ML frameworks or trained by yourself, are floating-point numbers (float32), and each number occupies 4 bytes.
By converting the floating-point numbers to fixed-point numbers (int8), each number occupies only 1 byte, so the computation in the embedded runtime can be reduced dramatically.
Therefore, converting floating-point models to fixed-point models brings a significant performance boost with no or very small accuracy loss.
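As a rough illustration of the arithmetic above, the following sketch quantizes a float32 weight array to int8 using a simple symmetric linear scheme. This is only an assumption for illustration; Horizon's actual quantization algorithm is internal to the toolchain.

```python
import numpy as np

# Hypothetical float32 weights; in a real model these come from training.
weights = (np.random.randn(1000) * 0.1).astype(np.float32)

# Symmetric linear quantization to int8: scale by the max absolute value.
threshold = float(np.abs(weights).max())
scale = threshold / 127.0
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Each float32 number occupies 4 bytes, each int8 number only 1 byte.
print(weights.nbytes)    # 4000
print(q_weights.nbytes)  # 1000

# Dequantize to estimate the quantization error (at most half a step).
deq = q_weights.astype(np.float32) * scale
print(float(np.abs(weights - deq).max()) <= scale / 2 + 1e-6)  # True
```

The 4x reduction in storage, plus cheaper int8 arithmetic on the accelerator, is where the performance gain comes from.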
Typically, model conversion can be divided into the following steps:
Check whether the model to be converted contains unsupported OPs.
Prepare 20~100 images for calibration use at the conversion stage.
Convert the floating-point model to a fixed-point model using the conversion tool.
Evaluate the performance and accuracy of the converted model to ensure there is no large difference in model accuracy before and after conversion.
Run the model in the simulator/dev board to validate model performance and accuracy.
To walk through this process in the sample folder, first execute the 00_init.sh script in that folder to obtain the corresponding original model and dataset.
Before converting a floating-point model into a fixed-point model, use the hb_compile tool to check whether the floating-point model contains OPs unsupported by Horizon hardware.
If it does, the tool reports the unsupported OP(s). For how to use hb_compile to check a model, see the Model Checking section.
For more information about Horizon hardware supported OPs, refer to the Toolchain Operator Support Constraint List-ONNX Operator Support List.
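The real check is performed by hb_compile against the operator support list; the following pure-Python sketch, with a hypothetical supported-operator set, only illustrates the underlying idea of scanning a model's operator list for unsupported entries.

```python
# Illustrative sketch of the checking idea: compare a model's operator list
# against a supported-operator set. The operator names below are hypothetical
# examples, not the actual Horizon support list.
SUPPORTED_OPS = {"Conv", "Relu", "MaxPool", "Add", "GlobalAveragePool", "Gemm", "Softmax"}

def find_unsupported(model_ops, supported=frozenset(SUPPORTED_OPS)):
    """Return the sorted list of operators in model_ops missing from the supported set."""
    return sorted(set(model_ops) - supported)

model_ops = ["Conv", "Relu", "Conv", "CustomDeformConv", "Softmax"]
print(find_unsupported(model_ops))  # ['CustomDeformConv']
```

In practice you would consult the ONNX Operator Support List referenced above rather than a hand-written set.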
When converting a floating-point model, you need to prepare 20~100 images for use at the calibration stage.
Input image formats may vary by input type and layout. Because both original images (JPG, etc.) and preprocessed images are valid at this stage, you can either feed the calibration images used in model training or feed images you have preprocessed yourself.
The calibration dataset should be in npy format.
We recommend preprocessing the calibration images yourself: complete operations such as image channel order conversion (BGR/RGB), data layout conversion (NHWC/NCHW), and image resizing/padding. The tool then loads the images in npy format and feeds them to the calibration stage.
Taking MobileNet as an example, the required transformer operations are as follows:
When you have confirmed that the floating-point model can be converted successfully, you can convert it to a fixed-point model supported by Horizon hardware using the hb_compile tool.
This process requires a configuration file (*.yaml) containing the conversion requirements. For the specific configuration file settings and the instructions for each parameter, refer to the descriptions in the Specific Parameter Information and Configuration File Template sections.
When the model conversion finishes, the log also reports the similarity between the floating-point model and the fixed-point model, so you can judge the before/after similarity from the Quantized Cosine field.
If the value of Quantized Cosine is very close to 1, the accuracy of the fixed-point model should be very close to that of the floating-point model before conversion.
Note, however, that the CosineSimilarity value in the log is computed only on the very first calibration image, so it cannot fully represent the model accuracy before and after conversion.
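The cosine similarity itself is a standard measure and can be computed as follows; the two output vectors below are hypothetical stand-ins for the floating-point and fixed-point model outputs on one image.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened output tensors."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical outputs of the floating-point and fixed-point models on one image.
float_out = np.array([0.10, 0.55, 0.30, 0.05], dtype=np.float32)
quant_out = np.array([0.11, 0.54, 0.29, 0.06], dtype=np.float32)

sim = cosine_similarity(float_out, quant_out)
print(round(sim, 4))  # close to 1: the two models' outputs are close
```

A value near 1 only says the outputs are directionally similar on that image; it is not a substitute for the full accuracy evaluation described below.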
After a successful model conversion, the model outputs and information files for each phase will be generated in the generated folder (model_output by default), where the model output files will be used in subsequent phases.
You can run the 03_classification/01_mobilenetv1/03_build.sh script to experience how the hb_compile tool performs quantized model compilation. The accuracy of the fixed-point model generated by the conversion must then be evaluated.
You should have a good understanding of the model's input/output structure. You should also be able to accurately preprocess the model's input images, post-process the model outputs, and write the model execution scripts on your own.
You can refer to the sample code in 04_inference.sh of the sample in Horizon model convert sample package.
The code logic is as follows:
Once the script is ready, you can verify its accuracy by feeding it a single image.
For example, the input to this script is a picture of a zebra; the script preprocesses the image data from RGB to the data type configured by input_type_rt (for information about intermediate types, refer to the Model Conversion Interpretation).
Then the script runs inference by passing the preprocessed image data to HB_HBIRRuntime, post-processes the inference results, and finally prints the five most likely classes.
The output of the script is shown below, with the most likely class being label: 340:
The labels use the ImageNet classes, where class 340 corresponds to zebra, so the inference result is correct.
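The top-5 post-processing step can be sketched with numpy as follows; the logits here are randomly generated stand-ins, with index 340 artificially boosted to mimic the zebra example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def top_k(probs, k=5):
    """Return the k most likely (label, probability) pairs, best first."""
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

# Hypothetical 1000-class logits; index 340 (zebra in the ImageNet labels)
# is given the largest score so it comes out on top.
rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
logits[340] = 10.0

probs = softmax(logits)
for label, p in top_k(probs):
    print(f"label: {label}, prob: {p:.4f}")
```

Real scripts differ only in that the logits come from the model's output tensor instead of a random generator.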
A single-image inference is insufficient to determine model accuracy, so you still need scripts to evaluate the accuracy of the converted model.
This requires some coding work to loop inference over the images and compare the inference results with the ground truth to obtain the model accuracy.
In model accuracy evaluation, images must be pre-processed and the model output post-processed, so we provide a Python script as a sample.
The logic of this script is the same as that of single-image inference, except that it runs over the entire dataset.
The script evaluates the model outputs and generates the evaluation results.
Because the script takes a long time to run, you can set the number of parallel processes used for the evaluation via the PARALLEL_PROCESS_NUM environment variable.
After the execution of this script, you can get the accuracy of the converted fixed-point model from the output of the script.
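A minimal sketch of such a parallel evaluation loop might look like this. The infer_one function is a hypothetical placeholder standing in for real preprocessing, inference, and post-processing; the PARALLEL_PROCESS_NUM environment variable is read exactly as described above.

```python
import os
from multiprocessing import Pool

def infer_one(sample):
    """Placeholder for single-image preprocessing + inference + postprocessing.

    Returns (predicted_label, true_label). The 'model' here is simulated
    (every tenth image is misclassified) so the sketch stays self-contained.
    """
    image_id, true_label = sample
    predicted = true_label if image_id % 10 != 0 else (true_label + 1) % 1000
    return predicted, true_label

def evaluate(dataset):
    """Run inference over the whole dataset in parallel and return top-1 accuracy."""
    workers = int(os.environ.get("PARALLEL_PROCESS_NUM", 1))
    with Pool(workers) as pool:
        results = pool.map(infer_one, dataset)
    correct = sum(1 for pred, truth in results if pred == truth)
    return correct / len(results)

if __name__ == "__main__":
    # Hypothetical (image_id, ground-truth-label) pairs standing in for a dataset.
    dataset = [(i, i % 1000) for i in range(200)]
    print(f"top-1 accuracy: {evaluate(dataset):.3f}")  # 0.900
```

The accuracy comes out as 180/200 here only because of the simulated misclassification rate; a real run compares model predictions against dataset labels.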
We currently support the following calibration methods:
Default is a strategy that automatically searches the calibration quantization parameters to obtain a relatively good combination.
Mix is a search strategy that integrates multiple calibration methods: it automatically identifies quantization-sensitive nodes, selects the best method from a group of calibration methods at node granularity, and finally builds a hybrid calibration method that absorbs the advantages of each.
KL borrows the solution proposed by TensorRT: it uses KL divergence to traverse the data distribution of each quantized layer and determines the threshold by searching for the lowest KL divergence value.
Because this method causes more data saturation and a smaller quantization granularity, it is more suitable than max for neural network models with a more concentrated data distribution.
Max is a calibration method that automatically selects the maximum value in the quantized layer as the threshold.
This method can cause an oversized quantization granularity, but it also produces fewer saturated points than the KL method, which makes it suitable for neural network models with a more scattered data distribution.
If model performance is your only concern and there are no precision requirements, you can try the skip calibration method, which uses max together with internally generated random calibration data, so you do not need to prepare any calibration data; it is most suitable for first attempts at validating the model structure.
Because the skip method calibrates with max and internally generated random data, the resulting model cannot be used for accuracy verification.
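To make the max/KL contrast concrete, here is a simplified sketch (not Horizon's actual implementation) that selects a threshold both ways on a synthetic activation distribution containing one far outlier.

```python
import numpy as np

def max_threshold(acts):
    """'max' calibration: the threshold is simply the largest absolute activation."""
    return float(np.abs(acts).max())

def kl_threshold(acts, n_candidates=64, bins=512):
    """Simplified KL-style calibration (illustrative; not Horizon's exact algorithm).

    Try a range of candidate thresholds and keep the one whose
    clip-quantize-dequantize histogram has the lowest KL divergence
    from the original data distribution.
    """
    acts = np.abs(np.asarray(acts, dtype=np.float64).ravel())
    top = acts.max()
    p, _ = np.histogram(acts, bins=bins, range=(0.0, top))
    p = p / p.sum() + 1e-10
    best_t, best_kl = top, np.inf
    for t in np.linspace(top / n_candidates, top, n_candidates):
        # Simulate int8 quantization at threshold t: clip, quantize, dequantize.
        deq = np.round(np.clip(acts, 0.0, t) / t * 127.0) / 127.0 * t
        q, _ = np.histogram(deq, bins=bins, range=(0.0, top))
        q = q / q.sum() + 1e-10
        kl = float(np.sum(p * np.log(p / q)))
        if kl < best_kl:
            best_t, best_kl = float(t), kl
    return best_t

# A concentrated distribution plus one far outlier: 'max' follows the outlier,
# while the KL search settles on a much smaller, tighter threshold.
rng = np.random.default_rng(0)
acts = np.concatenate([rng.normal(0.0, 1.0, 10000), [50.0]])

print(max_threshold(acts))  # 50.0
print(kl_threshold(acts))   # far below 50: the outlier gets clipped
```

This mirrors the trade-off described above: max preserves the outlier at the cost of coarse quantization steps, while KL clips it to keep fine granularity where the data actually concentrates.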
For more information about the operators and corresponding constraints currently supported by Horizon Algorithm Toolchain, please refer to Toolchain Operator Support Constraint List.