The conversion of the floating-point model to a Horizon hybrid heterogeneous model is completed in the Convert Model phase, after which you obtain a model that can run on the Horizon computing platform. Before performing the conversion, make sure the model has passed the verification described in the Check the Model section.
During the conversion, important steps such as model optimization and calibration/quantization require data prepared in line with the model's pre-processing requirements. Refer to the Prepare Calibration Data section to prepare the calibration data in advance.
The model conversion is performed with the hb_compile tool; refer to the Model Quantized Compilation section for its usage and the related configuration parameters.
Model conversion turns a floating-point model into a hybrid heterogeneous model supported by Horizon's computing platform. To make this heterogeneous model run quickly and efficiently on the embedded side, model conversion focuses on two aspects: input data processing, and model optimization and compilation. This section discusses them in turn.
For input data processing, Horizon's edge computing platform provides hardware-level solutions for specific types of input channels, but the output of these solutions may not match the input requirements of your models.
For example, the video processing sub-systems for video channels can crop and scale images and optimize image quality. The output of these sub-systems is mostly in the YUV420 format, whereas algorithm models are often trained on commonly used image formats such as BGR/RGB.
To solve this problem, Horizon provides two input descriptions for each converted model: one for the original floating-point model's input (input_type_train and input_layout_train), and one for the input data (input_type_rt) of the edge platform you are going to use.
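These two input descriptions are supplied to hb_compile through its YAML configuration. The sketch below is illustrative only: the parameter names input_type_train, input_layout_train, and input_type_rt come from this section, while the section name input_parameters and the mean/scale key names are assumptions, so check the Model Quantized Compilation section for the exact schema.

```yaml
# Sketch of the input-related configuration; key names other than
# input_type_train / input_layout_train / input_type_rt are assumptions.
input_parameters:
  # How the original floating-point model was trained.
  input_type_train: 'rgb'
  input_layout_train: 'NCHW'
  # What the edge platform actually delivers at runtime.
  input_type_rt: 'nv12'
  # Placeholder keys for the integrated mean/scale pre-processing;
  # the values shown are illustrative ImageNet-style statistics.
  mean_value: '123.675 116.28 103.53'
  scale_value: '0.0171 0.0175 0.0174'
```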
For frequently used image pre-processing operations such as mean subtraction and scaling, edge-platform data formats such as YUV420 are not suitable, so these common pre-processing steps are integrated into the model itself. After the above two processes, the input part of the converted heterogeneous model is shown as follows.
Only two data layouts appear in the above diagram: NCHW and NHWC, where N denotes the batch size, C the channels, H the height, and W the width.
The two different layouts reflect different memory access characteristics.
The NHWC layout is more often used by TensorFlow models, while the NCHW layout is used by Caffe models.
Although Horizon's edge platform does not restrict the data layout, there is still one requirement: input_layout_train must be consistent with the data layout of the original floating-point model, since specifying the correct layout is the basis for parsing the data correctly.
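The difference between the two layouts is just an axis permutation, which can be illustrated with NumPy (this is a generic sketch, not part of the toolchain):

```python
import numpy as np

# A single 224x224 RGB image batch in NHWC (TensorFlow-style) layout.
nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)

# Convert to NCHW (Caffe-style) by permuting the axes:
# N stays first, C moves ahead of H and W.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

print(nhwc.shape)  # (1, 224, 224, 3)
print(nchw.shape)  # (1, 3, 224, 224)
```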
Model Optimization and Compilation: this covers several important steps, namely model parsing, model optimization, model calibration and quantization, and model compilation. Its internal workflow is shown in the figure below.
Model Parse Stage: converts the Caffe floating-point model into an ONNX floating-point model. This stage assigns a unique name to each unnamed node/tensor and produces an original_float_model.onnx; the computing precision of this ONNX model is float32.
Model Optimization Stage: applies operator optimization strategies suited to the Horizon platform, such as fusing BN into Conv. The output of this stage is an optimized_float_model.onnx; its computing precision is still float32, and the optimization does not affect the model's computational results.
Model Calibration Stage: uses the calibration data you provide to compute the necessary quantization parameters; the parameters computed for each node are saved in calibration nodes. The output of this stage is a calibrated_model.onnx. A pre-quantization pass is then performed, producing ptq_model.onnx.
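To give an intuition for what calibration computes, the sketch below derives a per-node threshold with a generic max-abs rule; this is an illustration of the idea only, not the toolchain's actual calibration algorithm:

```python
import numpy as np

def max_abs_threshold(batches):
    """Generic max-abs calibration: the threshold for a node is the
    largest absolute activation seen over all calibration batches."""
    return max(float(np.abs(b).max()) for b in batches)

# Pretend these are one node's activations over three calibration batches.
batches = [np.array([-1.5, 0.2, 3.0]),
           np.array([0.5, -4.0]),
           np.array([2.0])]
threshold = max_abs_threshold(batches)
scale = threshold / 127.0  # map the threshold onto the int8 range
print(threshold)  # 4.0
```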
Model Quantization Stage: Horizon's model compiler takes the pre-quantized model (ptq_model.onnx) and performs model quantization according to your pre-processing configuration (including the color conversion from input_type_rt to input_type_train and the handling of mean/scale). The output of this stage is a quantized_model.bc; the accuracy loss caused by quantization can be evaluated with this model.
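To see why quantization introduces an accuracy loss at all, the following generic sketch quantizes a tensor to int8 and back; it is illustrative only and does not reproduce the compiler's actual quantization scheme:

```python
import numpy as np

def quant_dequant_int8(x, scale):
    """Symmetric int8 quantize-then-dequantize, to make the rounding
    error introduced by quantization visible."""
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.25, 0.5], dtype=np.float32)
scale = 0.5 / 127  # a calibration threshold of 0.5 mapped onto int8
x_hat = quant_dequant_int8(x, scale)
err = float(np.abs(x - x_hat).max())  # bounded by the quantization step
print(err)
```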
Please note that if input_type_rt is nv12, the input layout of quantized_model.bc is NHWC.
This section introduces, in turn, how to interpret a successful model conversion and how to analyze an unsuccessful one.
To confirm that the model conversion succeeded, check three things: the compile status information, the similarity information, and the working_dir output. For the compile status, after a successful conversion the model's dependencies and parameters are printed on the console.
Similarity information will be printed in the console output after compile, which takes the following form:
As shown above:
Node and NodeType represent the node's name and type.
ON represents the device the node executes on: BPU, CPU, or JIT (the CPU first generates BPU instructions, then the BPU performs the computation).
Threshold is the calibration threshold of each layer; it is used as feedback for Horizon technical support in abnormal situations and is not a concern under normal conditions.
Calibrated Cosine is the cosine similarity, at the node indicated by Node, between the outputs of the optimized model (optimized_float_model.onnx) and the calibrated model (calibrated_model.onnx).
Quantized Cosine is the cosine similarity, at the node indicated by Node, between the outputs of the optimized model (optimized_float_model.onnx) and the quantized model (quantized_model.bc) produced by model quantization.
Output Data type is the node's output data type, one of ['si8', 'si16', 'si32', 'si64', 'ui8', 'ui16', 'ui32', 'ui64', 'f32'].
Note that the cosine similarity fields only serve as a reference for the stability of the quantized data; they do not directly measure the model's accuracy loss. In general, a similarity below 0.8 at the output nodes indicates a significant loss of accuracy. However, since similarity does not correlate absolutely with accuracy, a fully reliable accuracy assessment should follow the Model Accuracy Analysis section.
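The cosine similarity reported here can be reproduced for any pair of node outputs with a few lines of NumPy; this is a generic illustration of the metric, not the tool's internal code:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened output tensors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

float_out = np.array([0.2, 1.5, -0.7])     # e.g. an optimized model output
quant_out = np.array([0.19, 1.52, -0.71])  # e.g. a quantized model output
print(cosine_similarity(float_out, quant_out))  # close to 1.0
```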
The conversion output is stored in the path specified by the conversion configuration parameter working_dir.
You will find the following files in this directory (the * part is the prefix you specify via the output_model_file_prefix conversion parameter):
*_original_float_model.onnx
*_optimized_float_model.onnx
*_calibrated_model.onnx
*_ptq_model.onnx
*_quantized_model.bc
*.hbm
*_advice.json
*_quant_info.json
*_node_info.csv
*.html
*.json
The Interpret Conversion Output section explains the function of each model output. However, before running on the board, we strongly recommend following the procedures described in the Model Performance Analysis and Model Accuracy Analysis sections, to avoid carrying model conversion problems into the subsequent embedded development.
If any of the three outputs above used to verify a successful conversion is missing, something went wrong during the conversion.
In such cases, the compile tool outputs error messages to the console. For example, if the prototxt and caffe_model parameters are not configured during a Caffe model conversion, the tool gives the following message:
A successful conversion produces the outputs listed above; this section explains the use of each output.
The output process of *_original_float_model.onnx can be found in Model Conversion Interpretation. The computing accuracy of this model is the same as the original floating-point model. In general, you don't need to use this model. In case of errors in the conversion results, it would be helpful to provide this model to Horizon's technical support to help you solve the problem quickly.
The output process of *_optimized_float_model.onnx can be found in Model Conversion Interpretation. This model has undergone operator-level optimizations, commonly known as operator fusion. You can visually compare it with the original_float model and clearly see some structural changes to the operators; these do not affect the model's computational accuracy. In general, you do not need to use this model. In case of errors in the conversion results, it would be helpful to provide this model to Horizon's technical support to help you solve the problem quickly.
The output process of *_calibrated_model.onnx can be found in Model Conversion Interpretation. This model is an intermediate product: the toolchain takes the structurally optimized floating-point model, computes the quantization parameters for each node from the calibration data, and saves them in calibration nodes.
The output process of *_ptq_model.onnx can be found in Model Conversion Interpretation. This model is the product of pre-quantization of the model obtained from calibration by the model conversion toolchain.
The output process of the *_quantized_model.bc can be found in Model Conversion Interpretation. This model has completed the calibration and quantization process, and the accuracy loss from quantization can be evaluated with it. It is a mandatory model in the accuracy verification process; please refer to the Model Accuracy Analysis section.
The *.hbm file is the model that can be loaded and run on the Horizon computing platform. After reading Embedded Application Development, you can deploy the model to the computing platform quickly. However, to make sure the model's performance and accuracy meet your expectations, we strongly recommend completing Model Performance Analysis and Model Accuracy Analysis before moving on to application development.
The *_advice.json file contains the results printed by the Horizon Model Compiler op checker.
The *_quant_info.json file contains the calibrated quantization information of the operators.
The *_node_info.csv file contains the cosine similarity results and other per-operator information after a successful conversion; it is the same as the similarity information printed to the console after hb_compile runs successfully.
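If you want to scan *_node_info.csv programmatically, for example to flag nodes whose quantized cosine similarity drops below 0.8, something like the sketch below works. The sample rows and column names here are assumptions based on the console table described above; check them against the actual file header before use:

```python
import csv
import io

# Illustrative sample only: column names are assumed to match the
# console similarity table (Node, ON, Threshold, Quantized Cosine, ...).
sample = """Node,ON,Threshold,Quantized Cosine
conv1,BPU,4.0,0.9997
softmax,CPU,,0.75
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Flag nodes whose quantized cosine similarity falls below 0.8.
low = [r["Node"] for r in rows
       if r["Quantized Cosine"] and float(r["Quantized Cosine"]) < 0.8]
print(low)  # ['softmax']
```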
The *.json is the model static performance evaluation file.
The *.html is the model static performance evaluation file (better readability).