Model Quantized Compilation
During model quantized compilation, the tool generates the intermediate-stage ONNX models as well as the runtime model used to simulate on-board behavior, according to the configuration file.
Usage
hb_compile provides two modes for model quantized compilation: fast performance evaluation mode (with fast-perf turned on) and traditional model conversion compilation mode (without fast-perf turned on).
When turned on, the fast performance evaluation mode generates, during the conversion process, an hbm model tuned for the highest performance when running on the board side; internally, the tool performs the following operations:
- Run BPU-executable operators on the BPU whenever possible (you can additionally specify the operators to run on the BPU via the node_info parameter in the yaml file).
- Remove CPU operators that are removable at the beginning and end of the model, including: Quantize/Dequantize, Transpose, Cast, Reshape, etc.
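The boundary-operator removal above can be pictured as iteratively peeling removable node types off both ends of a chain. A simplified sketch for intuition only (`strip_boundary_nodes` is illustrative, not part of hb_compile; real graph handling is more involved):

```python
def strip_boundary_nodes(node_types, removable):
    """Repeatedly drop nodes from the head/tail while they match a removable type."""
    nodes = list(node_types)
    while nodes and nodes[0] in removable:
        nodes.pop(0)
    while nodes and nodes[-1] in removable:
        nodes.pop()
    return nodes

# e.g. a Quantize at the head and Dequantize/Cast at the tail get peeled off:
print(strip_boundary_nodes(
    ["Quantize", "Conv", "Relu", "Dequantize", "Cast"],
    {"Quantize", "Dequantize", "Cast"}))  # → ['Conv', 'Relu']
```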
If you want to use the fast performance evaluation mode (i.e., turn on fast-perf), the reference command is as follows:
hb_compile --fast-perf --model ${caffe_model/onnx_model} \
--proto ${caffe_proto} \
--march ${march} \
--input-shape ${input_node_name} ${input_shape}
Attention
- Please note that if you need to enable fast performance evaluation mode, do not configure the --config parameter, as the tool uses its built-in high-performance configuration in this mode.
- When using hb_compile for model quantized compilation, the --input-shape parameter configuration only takes effect in fast performance evaluation mode (i.e. when fast-perf is turned on).
If you want to use the traditional model conversion compilation mode (without fast-perf enabled), the reference command is as follows:
hb_compile --config ${config_file}
Parameters Introduction
| PARAMETER | DESCRIPTION |
-h, --help | Show help information and exit. |
-c, --config | Configuration file for the model compilation, in YAML format. |
--fast-perf | Turn on fast-perf, which generates, during the conversion process, an hbm model tuned for the highest performance when running on the board side, so that you can easily use it for model performance evaluation later. If you turn on fast-perf, you also need to configure the following:
-m, --model, Floating-point model file of Caffe/ONNX.
--proto, Specify the Prototxt file of the Caffe model.
--march, The BPU microarchitecture: for the J6E processor, specify nash-e; for the J6M processor, specify nash-m.
-i, --input-shape, Optional parameter that specifies the shape of the model's input node. When using hb_compile for model quantized compilation, this configuration currently only takes effect when fast-perf is turned on. It is used in the following way:
- Specify the shape information of a single input node, example of how to use:
--input-shape input_1 1x3x224x224. - Specify the shapes of multiple input nodes, example of how to use:
--input-shape input_1 1x3x224x224 --input-shape input_2 1x3x224x224.
Attention: When the model has a single input node, --input-shape can be left unconfigured and the tool will automatically read the size information from the model file. However, for dynamic input nodes, if --input-shape is not specified, the tool only supports models where the first dimension of the dynamic input node is [-1, 0, ?]; by default, the first dimension of the dynamic input node will be set to 1. |
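The shape strings accepted by --input-shape are simply x-separated dimension lists. A small illustrative parser (`parse_shape` is not a toolchain API, just a sketch of the format):

```python
def parse_shape(shape_str: str) -> list:
    """Parse an 'x'-separated shape string such as '1x3x224x224' into dimensions."""
    dims = [int(d) for d in shape_str.split("x")]
    if any(d < -1 for d in dims):  # -1 is allowed for a dynamic dimension
        raise ValueError("invalid dimension in %r" % shape_str)
    return dims

print(parse_shape("1x3x224x224"))  # → [1, 3, 224, 224]
```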
The log file generated by the compilation will be stored in the directory where the command is executed under the default name hb_compile.log.
Configuration File Template
A complete configuration file template is shown as below:
Note
The configuration file below is for display only; in an actual configuration file of a model, the caffe_model and onnx_model parameters cannot coexist.
The model should be either a Caffe or an ONNX model. That is, choose either caffe_model + prototxt or onnx_model when configuring.
# model parameters
model_parameters:
  # The descriptive file of the original Caffe floating-point model
  prototxt: '***.prototxt'
  # The original Caffe model file
  caffe_model: '****.caffemodel'
  # The original ONNX model file
  onnx_model: '****.onnx'
  # The target processor architecture of conversion
  march: 'nash-e'
  # The prefix of the converted model file which will run on the dev board
  output_model_file_prefix: 'mobilenetv1'
  # The directory where the conversion results will be saved
  working_dir: './model_output_dir'
  # To specify whether the converted hybrid heterogeneous model retains the ability to output the intermediate layer results for each layer after conversion
  layer_out_dump: False
  # Specify the output node of the model
  output_nodes: "OP_name"
  # Batch delete nodes of a certain type
  remove_node_type: "Dequantize"
  # Delete the node with the specified name
  remove_node_name: "OP_name"

# input information parameters
input_parameters:
  # The input node name of the floating-point model
  input_name: "data"
  # The input data format of the original floating-point model (quantity/sequence consistent with the input_name)
  input_type_train: 'bgr'
  # The input data layout of the original floating-point model (quantity/sequence consistent with the input_name)
  input_layout_train: 'NCHW'
  # The input data size of the original floating-point model
  input_shape: '1x3x224x224'
  # The data batch_size input to the neural network when the network is actually executed
  input_batch: 1
  # The data pre-processing method to be added into the model
  norm_type: 'data_mean_and_scale'
  # The mean value subtracted by the preprocessing method; if it is a channel mean, the values must be separated by a space
  mean_value: '103.94 116.78 123.68'
  # The image scaling of the preprocessing method; if it is a channel scaling, the values must be separated by a space
  scale_value: '0.017'
  # The input data format which the converted heterogeneous model needs to match
  # (quantity/sequence consistent with the input_name)
  input_type_rt: 'yuv444'
  # Special input data format
  input_space_and_range: 'regular'

# Calibration parameters
calibration_parameters:
  # The directory where the calibration samples will be saved
  cal_data_dir: './calibration_data'
  # Type of algorithms used for calibration
  calibration_type: 'kl'
  # max calibration parameter
  max_percentile: 1.0
  # Specify whether to calibrate for each channel
  per_channel: False

# compilation parameters
compiler_parameters:
  # Select compilation strategy
  compile_mode: 'latency'
  # Number of cores to run the model
  core_num: 1
  # Select the priority of model compilation
  optimize_level: 'O2'
  # Specify the input data source with the name data
  input_source: {"data": "pyramid"}
  # Specify the maximum continuous execution time for each function call of the model
  max_time_per_fc: 1000
  # Specify the number of processes when compiling the model
  jobs: 8
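Since the caffe_model/prototxt and onnx_model entries above are mutually exclusive, a quick sanity check on the parsed model_parameters group can catch misconfiguration early. This is a sketch over a plain dict; `validate_model_parameters` is a hypothetical helper, not part of hb_compile:

```python
def validate_model_parameters(model_params: dict) -> None:
    """Reject model_parameters groups that mix Caffe and ONNX inputs."""
    has_caffe = "caffe_model" in model_params or "prototxt" in model_params
    has_onnx = "onnx_model" in model_params
    if has_caffe and has_onnx:
        raise ValueError("caffe_model/prototxt and onnx_model cannot coexist")
    if not (has_caffe or has_onnx):
        raise ValueError("one of caffe_model (+ prototxt) or onnx_model is required")

# An ONNX-only configuration passes the check:
validate_model_parameters({"onnx_model": "mobilenetv1.onnx", "march": "nash-e"})
```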
The configuration file mainly contains model parameters, input information parameters, calibration parameters, and compilation parameters. All parameter groups must be present in your configuration file. Parameters are divided into mandatory and optional ones; optional parameters can be left unconfigured.
The specific parameter information follows. There are quite a few parameters, so they are introduced in the parameter-group order above. Required/Optional indicates whether the parameter must be specified in the YAML file.
Specific Parameter Information
Model Parameters
| PARAMETER | DESCRIPTION | Required/Optional |
prototxt | PURPOSE: This parameter specifies the prototxt filename of the floating-point Caffe model.
PARAMETER TYPE: String.
RANGE: Model path.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter must be specified when the model is a Caffe model. | required for Caffe models |
caffe_model | PURPOSE: This parameter specifies the caffemodel filename of the floating-point Caffe model.
PARAMETER TYPE: String.
RANGE: Model path.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter must be specified when the model is a Caffe model. | required for Caffe models |
onnx_model | PURPOSE: This parameter specifies the onnx filename of the floating-point ONNX model.
PARAMETER TYPE: String.
RANGE: Model path.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter must be specified when the model is an ONNX model. | required for ONNX models |
march | PURPOSE: This parameter specifies the platform architecture to run the converted heterogeneous model.
PARAMETER TYPE: String.
RANGE: 'nash-e' or 'nash-m'.
DEFAULT VALUE: None.
DESCRIPTIONS: The two optional configuration values correspond to J6E and J6M processors in that order. Depending on the platform you are using, you can choose between the two options. | required |
output_model_file_prefix | PURPOSE: This parameter specifies the prefix of the converted heterogeneous model filename.
PARAMETER TYPE: String.
RANGE: None.
DEFAULT VALUE: 'model'.
DESCRIPTIONS: This parameter specifies the prefix of the converted heterogeneous model filename. | optional |
working_dir | PURPOSE: This parameter specifies the directory to save the conversion results.
PARAMETER TYPE: String.
RANGE: None.
DEFAULT VALUE: 'model_output'.
DESCRIPTIONS: The tool will create a new directory automatically if it doesn't exist. | optional |
layer_out_dump | PURPOSE: This parameter specifies whether the heterogeneous model retains the ability to output intermediate layer values.
PARAMETER TYPE: Bool.
RANGE: True,False.
DEFAULT VALUE: False.
DESCRIPTIONS: Dumping the intermediate layer results is a debugging method, please do not enable it unless it is necessary.
Attention: It is not supported to configure input_source to be resizer when layer_out_dump is True. | optional |
output_nodes | PURPOSE: This parameter specifies model output node(s).
PARAMETER TYPE: String.
RANGE: Specific node name of the model.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter is used to support you to specify the node as the model output, the value should be the specific node name of the model. When there are multiple values, please refer to param_value Configuration. | optional |
remove_node_type | PURPOSE: This parameter sets the type of the deleted node.
PARAMETER TYPE: String.
RANGE: "Quantize", "Transpose", "Dequantize", "Cast", "Reshape", "Softmax". Different types should be separated by ';'.
DEFAULT VALUE: None.
DESCRIPTIONS: No settings or set to null doesn't affect the model conversion process. This parameter is used to support you in setting the type information of the node to be deleted.
The deleted node must be at the beginning or end of the model, connected to the input or output of the model.
Attention: After setting this parameter, we will match the deletable nodes of the model according to your configuration. If the node type you configured to be deleted meets the deletion conditions, it will be deleted and this process will be repeated until the deletable nodes can not be matched with the configured node type. | optional |
remove_node_name | PURPOSE: This parameter sets the name of the deleted node.
PARAMETER TYPE: String.
RANGE: The names of the nodes in the model to be deleted. Different names should be separated by ';'.
DEFAULT VALUE: None.
DESCRIPTIONS: No settings or set to null doesn't affect the model conversion process. This parameter is used to support you in setting the name of the node to be deleted. The deleted node must be at the beginning or the end of the model, connected to the input or output of the model.
Attention: After setting this parameter, we will match the deletable nodes of the model according to your configuration. If the node name you configured to be deleted meets the deletion conditions, it will be deleted and this process will be repeated until the deletable nodes can not be matched with the configured node name. | optional |
debug_mode | PURPOSE: Save calibration data for accuracy debug analysis.
PARAMETER TYPE: String.
RANGE: "dump_calibration_data"
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter serves to save the calibration data for the accuracy debug analysis and the data format is .npy. This data can be fed directly into the model for inference via np.load(). If you don't set this parameter, you can also save the data yourself and use the accuracy debug tool for accuracy analysis. | optional |
node_info | PURPOSE: Support setting the input and output data type of the specified op to be int16 and forcing the specified operator to run on the CPU or BPU.
PARAMETER TYPE: String.
RANGE: The range of operators supporting the configuration of int16 you can refer to Toolchain Operator Support Constraint List-ONNX Operator Support List. The operators that can be specified to run on the CPU or BPU need to be the ones included in the model.
DEFAULT VALUE: None.
DESCRIPTIONS:
node_info parameter usage: - Specify only that the OP runs on BPU/CPU (BPU is used as an example below, CPU method is the same):
node_info: {
"node_name": {
'ON': 'BPU',
}
}
- Configure only the node data type:
node_info: 'node_name1:int16;node_name2:int16'
For the configuration of multiple values, please refer to param_value Configuration.
- Specifies that the OP runs on the BPU and also configures the input and output data types of the OP:
node_info: {
"node_name": {
'ON': 'BPU',
'InputType': 'int16',
'OutputType': 'int16'
}
}
'InputType': 'int16' means that all input data types for the specified operator are int16. To specify the InputType of an operator-specific input, configure it by specifying a number after InputType. E.g.:
'InputType0': 'int16' means that the first input data type of the specified operator is int16.
'InputType1': 'int16' means that the second input data type of the specified operator is int16, and so on.
Attention: 'OutputType' does not support specifying an operator-specific output OutputType, which takes effect on all outputs of the operator when configured; configuring 'OutputType0', 'OutputType1', etc. is not supported. | optional |
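The semicolon-separated string form of node_info shown above can be parsed into a name-to-type mapping as follows (`parse_node_info` is an illustrative helper, not a toolchain API):

```python
def parse_node_info(spec: str) -> dict:
    """Parse 'node_name1:int16;node_name2:int16' into {'node_name1': 'int16', ...}."""
    result = {}
    for item in filter(None, spec.split(";")):
        name, _, dtype = item.partition(":")
        result[name.strip()] = dtype.strip()
    return result

print(parse_node_info("node_name1:int16;node_name2:int16"))
# → {'node_name1': 'int16', 'node_name2': 'int16'}
```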
Input Information Parameters
| PARAMETER | DESCRIPTION | Required/Optional |
input_name | PURPOSE: This parameter specifies the input node names of the original floating-point model.
PARAMETER TYPE: String.
RANGE: Single input: "" or the input node name, Multiple inputs: "input_name1; input_name2; input_name3..."
DEFAULT VALUE: None.
DESCRIPTIONS: No configuration is required if there is only one input node. If there is more than one node, it must be configured to guarantee the correct types and input order of the subsequent calibration data. For configuration methods of multiple values, please refer to param_value Configuration. | Single input: optional Multiple inputs: required |
input_type_train | PURPOSE: This parameter specifies the input data type of the original floating-point model.
PARAMETER TYPE: String.
RANGE: 'rgb', 'bgr','yuv444', 'yuv444_128', 'gray' and 'featuremap'.
DEFAULT VALUE: 'featuremap'.
DESCRIPTIONS: Each input node needs to be configured with a defined input data type. If there are multiple input nodes, the order of the nodes must be strictly consistent with the order in input_name. For configuration methods of multiple values, please refer to param_value Configuration. For the selection of data types, please refer to the Model Conversion Interpretation section. | required |
input_type_rt | PURPOSE: This parameter specifies the input data format that the converted heterogeneous model must match.
PARAMETER TYPE: String.
RANGE: 'rgb', 'bgr', 'yuv444', 'yuv444_128', 'nv12', 'gray' and 'featuremap'.
DEFAULT VALUE: 'featuremap'.
DESCRIPTIONS: Here is an indication of the data format you need to use. It doesn't have to be the same as the data format of the original model, but note that this is the format that will actually be fed into your model when running on the computing platform. Each input node needs to be configured with a defined input data layout. If there are multiple input nodes, the sequence of the configured nodes must be strictly consistent with the input_name sequence. For configuration methods of multiple values, please refer to param_value Configuration. For the selection of data types, please refer to the Model Conversion Interpretation section. | required |
input_layout_train | PURPOSE: This parameter specifies the input data layout of the original floating-point model.
PARAMETER TYPE: String.
RANGE: 'NHWC', 'NCHW'.
DEFAULT VALUE: None.
DESCRIPTIONS: Each input node needs to be configured with a defined input data layout that shall be the same as the layout of the original floating-point model. If there are multiple input nodes, the order of the nodes must be strictly consistent with the input_name sequence. For configuration methods of multiple values, please refer to param_value Configuration. For more about data layout, please refer to the Model Conversion Interpretation section. | required |
input_space_and_range | PURPOSE: This parameter specifies special data formats.
PARAMETER TYPE: String.
RANGE: 'regular' and 'bt601_video'.
DEFAULT VALUE: 'regular'.
DESCRIPTIONS: The purpose of this parameter is to deal with the YUV420 formats produced by different ISPs. It only becomes valid when input_type_rt is specified as nv12; if the format is not nv12, an error will be reported and the process will exit.
regular is the common YUV420 format with values ranged in [0,255]. bt601_video is another YUV420 video format with values ranged in [16,235]. For more information about bt601, please feel free to Google it.
Attention: You don't need to configure this parameter without explicit requirements. | optional |
input_shape | PURPOSE: This parameter specifies the input data size of the original floating-point model.
PARAMETER TYPE: String.
RANGE: None.
DEFAULT VALUE: None.
DESCRIPTIONS: Dimensions of the shape should be separated by x, e.g. '1x3x224x224'. You don't need to configure this parameter unless the model has multiple input nodes, because the tool can read the size information from the model file automatically. When there are multiple input nodes, the sequence of configured nodes must be strictly consistent with the input_name sequence. For configuration methods of multiple values, please refer to param_value Configuration. | optional |
input_batch | PURPOSE: This parameter specifies the input data batch size that the converted heterogeneous model must match.
PARAMETER TYPE: Int.
RANGE: 1-4096.
DEFAULT VALUE: 1.
DESCRIPTIONS: This parameter specifies the input data batch size that the converted heterogeneous model must match, but does not affect the input data batch size of the converted bc model. If you don't configure this parameter, the default value is 1.
Attention: - This parameter can only be used when the first dimension of input_shape is 1 for a single-input model. - This parameter is only effective when the original ONNX model itself supports multi-batch inference. However, due to the complexity of the operators, if during the model conversion process you encounter a conversion failure log indicating that the model does not support the configuration of the input_batch parameter, please try to directly export a multi-batch ONNX model and correctly configure the size of the calibration data to re-convert it (in this case, you no longer need to configure this parameter).
| optional |
separate_batch | PURPOSE: This parameter specifies whether to enable the separated batch mode.
PARAMETER TYPE: Bool.
RANGE: True, False.
DEFAULT VALUE: False.
DESCRIPTIONS: If you don't configure this parameter, the default value is False, i.e. the separated batch mode is not enabled. When the separated batch mode is not enabled, the input needs to be allocated in a contiguous memory area. For example, if the model input is 1x3x224x224 and input_batch is set to N, you will need to prepare an on-board model input of Nx3x224x224. When the separated batch mode is enabled, the input nodes with this mode enabled are separated; the number of separated inputs is the value you specified with input_batch. You can prepare the input for each batch individually, so the input no longer needs to be allocated in a contiguous memory area in this mode. For example, if the model input is 1x3x224x224 and input_batch is set to N, you need to prepare N 1x3x224x224 on-board model inputs. | optional |
norm_type | PURPOSE: This parameter specifies the pre-processing method to deal with the model input data.
PARAMETER TYPE: String.
RANGE: 'data_mean_and_scale', 'data_mean', 'data_scale' and 'no_preprocess'.
DEFAULT VALUE: 'no_preprocess'.
DESCRIPTIONS: no_preprocess means that no pre-processing method will be used.
data_mean means subtracting mean value.
data_scale means providing multiply scale factor preprocessing.
data_mean_and_scale means first subtracting the mean value and then multiplying by the scale factor. When there are multiple input nodes, the sequence of configured nodes must be strictly consistent with the input_name sequence. For configuration methods of multiple values, please refer to param_value Configuration. For the influence of this parameter, please refer to the Model Conversion Interpretation section.
Attention: When input_type_rt is configured as a featuremap non-four-dimensional input, norm_type can only be configured as no_preprocess. | optional |
mean_value | PURPOSE: This parameter specifies the mean value to be subtracted by the pre-processing method.
PARAMETER TYPE: Float.
RANGE: None.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter will be valid when the norm_type is specified as data_mean_and_scale or data_mean. Each input node has 2 configuration methods. If only one value is specified, then all channels will subtract the same mean value. Otherwise, you need to specify the mean values for each channel, and the number of values (separated by spaces) must be consistent with the number of channels. The number of configured input nodes must be consistent with the node number specified by norm_type. If there is a node that doesn't require mean processing, it should be specified as 'None'. For configuration methods of multiple values, please refer to param_value Configuration. | optional |
scale_value | PURPOSE: This parameter specifies the scale factor of the pre-processing method .
PARAMETER TYPE: Float.
RANGE: None.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter will be valid when the norm_type is specified as data_mean_and_scale or data_scale. Each input node has 2 configuration methods. You can either specify only 1 value for all channels or specify the values (separated by space) for each channel. The number of values must be consistent with number of channels. The number of configured input nodes must be consistent with the node number specified by norm_type. If there is a node that doesn't require scale processing, it should be specified as 'None'. For configuration methods of multiple values, please refer to param_value Configuration. | optional |
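The data_mean_and_scale pre-processing described above amounts to (x - mean) * scale per channel. A minimal sketch using the mean/scale values from the configuration template (`normalize_pixel` is illustrative only, not a toolchain API):

```python
def normalize_pixel(bgr, mean=(103.94, 116.78, 123.68), scale=0.017):
    """Apply data_mean_and_scale: subtract the per-channel mean, then multiply by scale."""
    return [(v - m) * scale for v, m in zip(bgr, mean)]

# A pixel exactly equal to the channel means normalizes to zero:
print(normalize_pixel([103.94, 116.78, 123.68]))  # → [0.0, 0.0, 0.0]
```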
input_type_rt/input_type_train additional description
To boost the ASIC performance, 2 assumptions have been made in the design of ASIC micro architecture:
- All inputs are quantized int8 data.
- All camera captured data are NV12.
Therefore, if you use the RGB (NCHW) format in model training and expect the model to process NV12 data efficiently, then you need to configure as follows:
input_parameters:
input_type_rt: 'nv12'
input_type_train: 'rgb'
input_layout_train: 'NCHW'
Tip
When the Gray format is used in model training while in practice the model input is NV12,
you can set both input_type_rt and input_type_train to gray during model conversion, and use only the Y channel address of the NV12 data in embedded application development.
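The trick above works because NV12 stores the full-resolution Y (luma) plane first, followed by the half-size interleaved UV plane, so the Y channel can be addressed directly. A sketch of the layout (`y_plane` is illustrative only):

```python
def y_plane(nv12: bytes, width: int, height: int) -> bytes:
    """Return the luma (Y) plane of an NV12 buffer: the first width*height bytes."""
    assert len(nv12) == width * height * 3 // 2, "size does not match NV12 layout"
    return nv12[: width * height]

# A hypothetical 2x2 frame: 4 Y bytes followed by one interleaved U/V pair.
buf = bytes([10, 20, 30, 40]) + bytes([128, 128])
print(list(y_plane(buf, 2, 2)))  # → [10, 20, 30, 40]
```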
In addition to converting the input data to NV12, you can also use different RGB orders in training and runtime inference.
The tool will automatically add data conversion nodes according to the data formats specified by the input_type_rt and input_type_train.
Not every type combination is supported; to avoid misuse, only the fixed type combinations in the following table are open
(Y marks supported image-type combinations, N unsupported ones. The first row of the table lists the data types supported by input_type_rt and the first column lists the data types supported by input_type_train):
| input_type_train \ input_type_rt | nv12 | yuv444 | rgb | bgr | gray | featuremap |
|---|---|---|---|---|---|---|
| yuv444 | Y | Y | N | N | N | N |
| rgb | Y | Y | Y | Y | N | N |
| bgr | Y | Y | Y | Y | N | N |
| gray | N | N | N | N | Y | N |
| featuremap | N | N | N | N | N | Y |
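The support matrix above can be encoded directly for programmatic checking (an illustrative encoding of the table; `is_supported` is not a toolchain API):

```python
# input_type_train → set of supported input_type_rt values, per the table above.
SUPPORTED = {
    "yuv444":     {"nv12", "yuv444"},
    "rgb":        {"nv12", "yuv444", "rgb", "bgr"},
    "bgr":        {"nv12", "yuv444", "rgb", "bgr"},
    "gray":       {"gray"},
    "featuremap": {"featuremap"},
}

def is_supported(input_type_train: str, input_type_rt: str) -> bool:
    return input_type_rt in SUPPORTED.get(input_type_train, set())

print(is_supported("rgb", "nv12"))   # → True
print(is_supported("gray", "nv12"))  # → False
```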
Note
To meet the requirements of Horizon ASICs on input data type (int8) and reduce the inference costs,
when input_type_rt is of the RGB(NHWC/NCHW)/BGR(NHWC/NCHW) type, the input data type of the models converted by using the conversion tool will all be int8.
That is, for regular image formats, 128 should be subtracted from pixel values; this has already been done inside the API, so you do not need to do it again.
In the final hbm model obtained from the conversion, the conversion from input_type_rt to input_type_train is an internal process.
You only need to focus on the data format of input_type_rt.
It is of vital importance to understand the requirement of input_type_rt when preparing inference data for embedded applications; please refer to the following explanations of each input_type_rt format.
- rgb, bgr, and gray are commonly used image formats. Note that each value is represented using UINT8.
- yuv444 is a popular image format. Note that each value is represented using UINT8.
- NV12 is a popular YUV420 image format. Note that each value is represented using UINT8.
- One special case of NV12 is specifying bt601_video for input_space_and_range.
Compared with the typical NV12 format, its value range changes from [0,255] to [16,235];
each value is still represented as UINT8. Note that bt601_video can be configured via input_space_and_range only when input_type_train is bgr or rgb.
- featuremap is suitable for cases where the formats listed above fail to meet your needs; this type uses float32 for each value.
For example, this format is commonly used for model processing such as radar and speech.
Tip
The processing of input_type_rt and input_type_train above is integrated into the toolchain procedure.
If you are certain that no format conversion is required, set the two input_type values to be the same;
identical types are passed through directly and do not affect the actual execution performance of the model.
Similarly, data pre-processing is also integrated into the process.
If you don't need any pre-processing, you can disable it through the norm_type configuration, which likewise will not affect the actual execution performance of the model.
Calibration Parameters
| PARAMETER | DESCRIPTION | Required/Optional |
cal_data_dir | PURPOSE: This parameter specifies the directory to save the calibration samples.
PARAMETER TYPE: String.
RANGE: None.
DEFAULT VALUE: None.
DESCRIPTIONS: The calibration data in the directory must comply with the requirements of input configurations, please refer to the Prepare Calibration Data section. When there are multiple input nodes, the sequence of configured nodes must be strictly consistent with the input_name sequence. For configuration methods of multiple values, please refer to param_value Configuration. When calibration_type is skip, cal_data_dir doesn't need to be set. | optional |
calibration_type | PURPOSE: This parameter specifies the types of algorithms used in the calibration.
PARAMETER TYPE: String.
RANGE: 'default', 'mix', 'kl', 'max' and 'skip'.
DEFAULT VALUE: 'default'.
DESCRIPTIONS: Both the kl and max are public quantization calibration algorithms. Users can learn more from the Internet.
default is an automatic search strategy to try to get a better calibration combination from a series of quantization calibration parameters.
mix is a search strategy that integrates multiple calibration methods: it automatically identifies quantization-sensitive nodes and selects the best method from different calibration methods at node granularity, ultimately constructing a combined calibration approach that incorporates the advantages of multiple methods. It is recommended to first try default, and if the results fail to meet expectations, configure different calibration parameters according to the PTQ Model Accuracy Optimization section. If you only want to verify model performance and do not require accuracy, you can try the skip calibration method, which calibrates with max plus internally generated random calibration data and does not require you to prepare calibration data, so it is suitable for first-time attempts to validate the model structure. Attention: when using skip, the resulting model cannot be used for accuracy verification because it is calibrated with max plus internally generated random calibration data. | optional |
max_percentile | PURPOSE: This is the parameter of the max calibration method and it is used to adjust the intercept point of the max calibration.
PARAMETER TYPE: Float.
RANGE: 0.5 - 1.0.
DEFAULT VALUE: 1.0.
DESCRIPTIONS: This parameter is valid only when calibration_type is specified as max. Typical options: 0.99999/0.99995/0.99990/0.99950/0.99900. It is recommended to first specify calibration_type as default, and if the results fail to meet expectations, configure different calibration parameters according to the PTQ Model Accuracy Optimization section. | optional |
per_channel | PURPOSE: This parameter determines whether to calibrate each channel of featuremap.
PARAMETER TYPE: Bool.
RANGE: True or False.
DEFAULT VALUE: False.
DESCRIPTIONS: This parameter is valid only when calibration_type is specified as a value other than default or mix. It is recommended to first try default, and if the results still fail to meet expectations, configure different calibration parameters according to the PTQ Model Accuracy Optimization section. | optional |
optimization | PURPOSE: This parameter provides you with several configurable options for tuning, through which you can configure different modes to tune the accuracy/performance.
PARAMETER TYPE: String.
RANGE: set_model_output_int8, set_model_output_int16, set_{NodeKind}_input_int16, set_{NodeKind}_output_int16,set_Softmax_input_int8, set_Softmax_output_int8, asymmetric, bias_correction and lstm_batch_last. Note: Nodekind here needs to be written in standard ONNX operator type, such as Conv, Mul, Sigmoid, etc. (case sensitive). For details, please refer to the official ONNX op documentation or the Toolchain Operator Support Constraint List-ONNX Operator Support List
DEFAULT VALUE: None.
DESCRIPTIONS: Note: for operators' int16 support, refer to the Toolchain Operator Support Constraint List-ONNX Operator Support List. - When the value is specified as set_model_output_int8, the model is set to output in int8 format (lower accuracy).
- When the value is specified as set_model_output_int16, the model is set to output in int16 format (lower accuracy).
- When the value is specified as set_{NodeKind}_input_int16 , specify that a certain type of operator input to the model is quantized to int16, and if a situation arises where int16 is not supported due to the node context, the int8 computation will be rolled back and the log will be printed.
- When the value is specified as set_{NodeKind}_output_int16 , specify that a certain type of operator output to the model is quantized to int16, and if a situation arises where int16 is not supported due to the node context, the int8 computation will be rolled back and the log will be printed.
- When the value is specified as set_Softmax_input_int8/set_Softmax_output_int8: the Softmax operator is currently computed in float as a non-quantized node by default; either of these values quantizes the Softmax operator to int8 so that it is computed on the BPU. There is no difference in usage between the two.
- When the value is specified as asymmetric, the tool attempts to enable asymmetric quantization, which can improve the quantization accuracy of some models. When calibration_type is configured as default, this option is selected automatically by the algorithm and cannot be configured explicitly.
- When the value is specified as bias_correction, the BiasCorrection quantization method is used, which can improve the quantization accuracy of some models.
- When the value is specified as lstm_batch_last: on the J6 BPU, when an LSTM has a large batch input size, the batch dimension can be converted to the W dimension for computation (equivalence is guaranteed), which better matches the hardware computation logic and can accelerate J6 BPU inference in some scenarios.
Since deployed inference performance depends on multiple levels of optimization, there is no guarantee that this method will achieve performance acceleration.
| optional |
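As an illustration, the optimization parameter might appear in a conversion YAML like the sketch below; the enclosing section name (calibration_parameters) and the chosen values are assumptions, not taken from this document:

```yaml
# Hypothetical YAML excerpt; "calibration_parameters" is an assumed section
# name -- consult your config template for the actual layout.
calibration_parameters:
  # Multiple optimization values are separated by ';':
  optimization: 'set_Softmax_input_int8; bias_correction'
```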
quant_config | PURPOSE: The J6 platform supports configuring multiple computation types for a single operator. This parameter lets you configure the computation accuracy of operators in the model at different levels, so as to generate the desired mixed-accuracy model.
PARAMETER TYPE: String.
RANGE: Path of json file.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter supports configuring computation accuracy at multiple levels (model_config, op_config, node_config) and supports multiple computation accuracy data types (int8/int16/float16); please refer to the quant_config additional description below for details. | optional |
quant_config additional description
- The quant_config supports configuring computation accuracy at multiple levels:
It supports configuring the computation accuracy at the model_config, op_config, and node_config levels.
There is a priority relationship between the three levels: the finer the configuration granularity, the higher the priority, i.e. model_config < op_config < node_config.
When a node is configured at more than one level at the same time, the level with the highest priority takes effect.
For example, if you configure all Add type nodes for int16 computation in op_config, and configure Add_2 for int8 computation in node_config,
then ultimately the Add_2 node is computed in int8, and the remaining Add operators are computed in int16.
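The priority example above can be sketched as a quant_config json fragment, following the structure this section defines (the node name Add_2 is taken from the example):

```json
{
  "op_config":   { "Add":   { "type": "int16" } },
  "node_config": { "Add_2": { "type": "int8" } }
}
```

With this file, Add_2 is computed in int8 (node_config has the higher priority) and every other Add node in int16.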
- The quant_config supports multiple computation accuracy data types:
It supports the int8/int16/float16 computation accuracy data types, which are described as follows:
- int8: The default quantization type for most operators; it generally does not need to be configured explicitly.
- int16: Refer to the int16 Configuration section.
- float16: When an operator is configured as float16, the tool configures only that operator for float16 computation accuracy; the float16 setting is not propagated to the operator's context (neighboring) operators.
- Description of the parameters at each level of the quant_config json file:
| Primary Parameter | Secondary Parameter | Tertiary Parameter | Required or Not | Description |
| model_config | all_node_type | None | Optional | Set the inputs of all nodes in the model to the specified type at once; int16/float16 can be configured. |
| model_config | model_output_type | None | Optional | Set the output tensor of the model to the specified type; int8/int16 can be configured. |
| op_config | NodeKind | type | Optional | Configure the input data type of nodes of a given type; int8/int16/float16 can be configured. |
| node_config | NodeName | type | Optional | Configure the input data type of a node with a specified name; int8/int16/float16 can be configured. |
- Configuration example of the quant_config json template:
The following json template shows all the configurable options of the quant_config; you can refer to this template for configuration.
{
// Configure model-level parameters
"model_config":{
// Configure input data types for all nodes at once
"all_node_type":"int16/float16",
// Configure the data type of the model output
"model_output_type":"int8/int16"
},
// Configure the parameters for a node type; replace op_name with the node type name, e.g. "Conv", "Add", "Softmax"...
"op_config":{
// Configure the type of input data for a certain type of node
"op_name1":{"type":"int8/int16/float16"},
"op_name2":{"type":"int8/int16/float16"}
},
// Configure the parameters for a single node; replace node_name with the node's name, e.g. "Conv_0", "Add_1"...
"node_config":{
// Configure the input data type of a node
"node_name1":{"type":"int8/int16/float16"},
"node_name2":{"type":"int8/int16/float16"}
}
}
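For reference, a minimal filled-in instance of the template above might look as follows; the node name Conv_0 and the chosen types are hypothetical:

```json
{
  "model_config": { "model_output_type": "int16" },
  "op_config":    { "Softmax": { "type": "float16" } },
  "node_config":  { "Conv_0":  { "type": "int16" } }
}
```

Note that standard json parsers do not accept comments, so an actual configuration file is safest written without the explanatory comments shown in the template.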
Compilation Parameters
| PARAMETER | DESCRIPTION | Required/Optional |
compile_mode | PURPOSE: This parameter specifies compilation strategies.
PARAMETER TYPE: String.
RANGE: 'latency', 'bandwidth' and 'balance'.
DEFAULT VALUE: 'latency'.
DESCRIPTIONS: latency optimizes the inference latency. bandwidth optimizes the DDR access bandwidth. balance balances latency and bandwidth optimization; to use it, you must also specify balance_factor. The latency strategy is recommended as long as your models do not severely exceed the expected bandwidth. | optional |
balance_factor | PURPOSE: This parameter specifies the balance ratio when the compile_mode is specified as balance.
PARAMETER TYPE: Int.
RANGE: 0-100.
DEFAULT VALUE: None.
DESCRIPTIONS: This parameter is only used when the compile_mode is specified as balance, otherwise the configuration will not take effect.
- A value of 0 means bandwidth-optimal, equivalent to setting compile_mode to bandwidth.
- A value of 100 means performance-optimal, equivalent to setting compile_mode to latency.
| optional |
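As an illustration, a balanced compilation configuration might be sketched in the YAML as below; the enclosing section name (compiler_parameters) is an assumption, not taken from this document:

```yaml
# Hypothetical YAML excerpt; "compiler_parameters" is an assumed section name.
compiler_parameters:
  compile_mode: 'balance'
  balance_factor: 50   # 0 = bandwidth-optimal, 100 = latency-optimal
```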
core_num | PURPOSE: This parameter specifies the number of cores to run model.
PARAMETER TYPE: Int.
RANGE: 1.
DEFAULT VALUE: 1.
DESCRIPTIONS: Used to configure the number of cores for the model to run on the Horizon platform. | optional |
optimize_level | PURPOSE: This parameter specifies the model optimization levels.
PARAMETER TYPE: String.
RANGE: 'O0' , 'O1','O2'.
DEFAULT VALUE: 'O0'.
DESCRIPTIONS: Optimization level ranges between O0 - O2.
O0: No optimization, fastest compilation speed and lowest optimization level.
O1 to O2: As the optimization level increases, the compiled model is expected to execute faster, but the compilation time is also expected to be longer. | optional |
input_source | PURPOSE: This parameter specifies the input source of dev board hbm models.
PARAMETER TYPE: String.
RANGE: ddr, pyramid and resizer.
DEFAULT VALUE: None, it will be automatically selected from an optional range based on the value of input_type_rt by default:
- When input_type_rt is specified as nv12 or gray, input_source is automatically selected as pyramid by default.
- When input_type_rt is specified as any other value, input_source is automatically selected as ddr by default.
- When this parameter is specified as resizer, input_type_rt only supports specifying as nv12 or gray.
DESCRIPTIONS: This is an option for adapting to the engineering environment; you are recommended to configure it after all model validations are complete. ddr indicates that the data comes from memory; pyramid and resizer indicate fixed hardware modules of the processor. Configuring the resizer data source in an engineering environment requires calling a dedicated interface; for the related constraints and descriptions, please refer to the hbDNNRoiInferV2 interface introduction. The format of this parameter is slightly special: e.g., if the model input name is data and the data source is memory (ddr), this parameter should be configured as {"data": "ddr"}. | optional |
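The per-input mapping format described above might appear in the YAML as sketched below; the section name (compiler_parameters) and the input names data and roi are hypothetical:

```yaml
# Hypothetical YAML excerpt; the section name and input names are
# illustrative only -- use your model's actual input names.
compiler_parameters:
  input_source: {"data": "pyramid", "roi": "ddr"}
```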
max_time_per_fc | PURPOSE: This parameter specifies the maximum continuous execution time (by μs) of model's each function call.
PARAMETER TYPE: Int.
RANGE: 0 or 1000-4294967295.
DEFAULT VALUE: 0.
DESCRIPTIONS: The inference of a compiled model on the BPU is performed as one or more function calls (the function call is the atomic unit of BPU execution). This parameter specifies the maximum execution time of each function call; a value of 0 means no restriction. The model can only be preempted when the execution of a single function call has finished. Please refer to the Model Preemption Control section.
Attention:
- Note that this parameter is only used to implement the model preemption function and can be ignored otherwise.
- The model preemption function is only supported on the board, not in the simulator.
| optional |
jobs | PURPOSE: This parameter sets the number of processes when compiling the hbm model.
PARAMETER TYPE: Int.
RANGE: Within the maximum number of cores supported by the machine.
DEFAULT VALUE: 16.
DESCRIPTIONS: When you compile the hbm model, it is used to set the number of processes. It can improve the compilation speed to some extent. | optional |
advice | PURPOSE: This parameter sets a threshold, in microseconds, on the predicted increase in an operator's elapsed time after model compilation; deviations beyond it are logged.
PARAMETER TYPE: Int.
RANGE: Natural number.
DEFAULT VALUE: 0.
DESCRIPTIONS: During the model compilation process, the toolchain will perform a time consumption analysis internally. In the actual process, the time consumption will be increased when doing operations such as data alignment of operators. After setting this parameter, when the deviation between the actual computation time and the theoretical computation time of a certain OP is larger than the value you specify, the relevant log will be printed, including information about the change in time, the shape and padding ratio before and after data alignment, etc. | optional |
param_value Configuration
You can specify parameters like this: param_name: 'param_value'; multiple values can be separated by ';': param_name: 'param_value1; param_value2; param_value3'.
Tip
To avoid parameter ordering problems, you are strongly advised to specify parameters (such as input_shape, etc.) explicitly for multi-input models.
Attention
Please note that if input_type_rt is set to nv12, no odd number may appear in the model's input sizes.
int16 Configuration
During model conversion, most operators in the model are quantized to int8 for computation. By configuring the node_info or quant_config parameter,
you can specify in detail that the input or output of an op is computed in int16 (for the range of operators that support the int16 configuration, refer to the Toolchain Operator Support Constraint List-ONNX Operator Support List). The basic principle is as follows:
After you configure an op's input/output data type as int16, the model conversion internally updates and checks the int16 configuration of the op's input/output context.
For example, configuring the input/output data type of op_1 as int16 implicitly requires that the previous/next op of op_1 also supports int16 computation. For unsupported scenarios,
the model conversion tool will print a log indicating that the int16 configuration combination is temporarily unsupported, and fall back to int8 computation.
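For instance, requesting int16 inputs for every Conv node could be expressed with a quant_config file like the sketch below (using the op_config format described earlier); whether it takes effect depends on each node's context, with unsupported cases falling back to int8 as described:

```json
{ "op_config": { "Conv": { "type": "int16" } } }
```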