PTQ Model Accuracy Optimization

Accuracy Optimization Advice

Based on the preceding accuracy analysis, the accuracy loss of a quantized model can be divided into the following two types:

1. Significant accuracy loss (over 4%).

This is mostly caused by inappropriate yaml configurations, unbalanced calibration datasets, or similar issues. Please troubleshoot according to the advice below.

2. Small accuracy loss (1.5%~3%).

If a small accuracy loss remains after the above causes have been excluded, it is usually due to model sensitivity and can be reduced using our accuracy optimization tool.

3. If, after trying 1 and 2, the accuracy still does not meet expectations, make further attempts with the accuracy debug tool we provide.

The workflow for resolving accuracy loss is shown below:

[Figure: accuracy loss troubleshooting workflow]

Significant Loss of Accuracy (Over 4%)

Significant accuracy loss is usually caused by some kind of improper configuration; we therefore suggest that you double-check the pipeline, the model conversion configuration, and data processing consistency.

Double-check the Pipeline

The pipeline refers to the entire process of data preparation, inference, post-processing, and accuracy evaluation metric computation. Based on past customer problem follow-ups, the most common case we find is that modifications made during the floating-point model training stage are not propagated in time to the accuracy validation process in the model conversion stage.

Double-check the Model Conversion Configuration

Following the recommended PTQ accuracy evaluation and consistency verification process, if you locate the accuracy problem in original_float.onnx, we recommend focusing on the yaml configuration file and the pre-processing and post-processing code. Common misunderstandings of the yaml configuration file include:

  • Since the input_type_rt and input_type_train parameters distinguish the data formats of the converted heterogeneous model and the original floating-point model, carefully double-check that they meet expectations, especially the order of BGR and RGB channels.

  • Double-check that the norm_type, mean_values and scale_values parameters are specified correctly. Nodes for the mean and scale operations can be inserted directly into the model via the conversion configuration, so confirm that the mean or scale operations are not applied a second time to the validation/evaluation images. Repeated pre-processing is another frequent mistake.
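As an illustration, these parameters might appear in the conversion yaml roughly as follows. The key names follow this document, but the section layout and concrete values are example assumptions that should be verified against the toolchain's Model Parameters documentation and adapted to your model:

```yaml
# Illustrative only -- verify key names and values against the Model Parameters documentation.
input_parameters:
  input_type_rt: 'nv12'       # data format fed to the converted model at runtime
  input_type_train: 'rgb'     # data format the floating-point model was trained with
  norm_type: 'data_mean_and_scale'
  mean_values: '123.675 116.28 103.53'    # applied once here -- do not repeat in pre-processing code
  scale_values: '0.01712 0.0175 0.01743'
```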

Double-check Data Processing Consistency

  • The read_mode is not specified correctly: read_mode can be set via the --read_mode parameter of the 02_preprocess.sh script in each example folder of the package, and supports opencv, skimage and PIL. In addition, the preprocess.py script sets the reading mode through the imread_mode parameter, which needs to be changed in sync. skimage, opencv and PIL are all popular image-reading libraries, but their output ranges and formats differ:

    With skimage, you get RGB channel order, values in the range 0~1, and float data type.

    With opencv, you get BGR channel order, values in the range 0~255, and uint8 data type.

    With PIL, you get RGB channel order, values in the range 0~255, and uint8 data type.

  • Check the data processing library and the transformer implementation: We recommend using the same data processing libraries in the Horizon toolchain validation phase that the original floating-point model relied on during training. For less robust models, differences between libraries' implementations of typical functions such as resize, crop, etc. may affect model accuracy.

    Even when the same data processing library is used, the implementation of some pre-processing operations may still differ. For example, ResizeTransformer uses opencv's default interpolation method (linear); if another interpolation method is required, you can modify the transformer.py source code directly (located at samples/ai_toolchain/horizon_model_convert_sample/01_common/python/data/transformer.py) to keep it consistent with the pre-processing code used during training.

  • Validate that the dataset is reasonably distributed: The validation dataset should contain around 100 images and cover all scenarios. For example, for multi-task and multi-class classification, the validation dataset should cover all prediction branches or all classes. Meanwhile, avoid exceptional images (e.g. over-exposed ones).

  • Use the *_original_float_model.onnx model to re-validate model accuracy. Normally, its accuracy should match the original floating-point model to 3~5 decimal places. If your model fails to reach this accuracy, please check the data processing carefully.
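The library differences listed under read_mode above can be bridged explicitly. The following minimal sketch (numpy only; the helper name is ours, not part of the toolchain) converts a skimage-style float RGB image into the opencv-style uint8 BGR layout:

```python
import numpy as np

def skimage_to_opencv(img_rgb_float):
    """Convert a skimage-style image (RGB order, float, 0~1)
    to an opencv-style image (BGR order, uint8, 0~255)."""
    img_uint8 = np.clip(img_rgb_float * 255.0, 0, 255).round().astype(np.uint8)
    return img_uint8[..., ::-1]  # RGB -> BGR by reversing the channel axis

# A tiny 1x2 RGB image: one pure-red pixel, one mid-gray pixel.
rgb = np.array([[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]], dtype=np.float64)
bgr = skimage_to_opencv(rgb)
print(bgr.dtype, bgr[0, 0].tolist())  # uint8 [0, 0, 255]
```

Applying (or forgetting) such a conversion in only one of the training and validation pipelines is exactly the kind of inconsistency this section asks you to rule out.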

Smaller Loss of Accuracy (1.5%~3%)

In general, to reduce the difficulty of accuracy optimization, we recommend using the automatic parameter search function in the conversion configuration (set calibration_type to default). The default mode is an automatic search: based on the cosine similarity of the output node for the first batch of calibration data, it selects the best among the max, max-percentile 0.99995 and kl calibration methods. The finally selected method is reported in the conversion log by a message similar to Select kl method.. The search also considers whether to turn on per-channel quantization, asymmetric quantization and other options.

  • If per-channel quantization is enabled, the log also prints: Perchannel quantization is enabled.

  • If asymmetric quantization is enabled, the log also prints: Asymmetric quantization is enabled.
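To enable this automatic search, the calibration section of the yaml is configured roughly as follows. This is illustrative; the exact layout and the data-directory key should be checked against the Model Parameters documentation:

```yaml
calibration_parameters:
  calibration_type: 'default'          # automatic search over max / max-percentile 0.99995 / kl
  cal_data_dir: './calibration_data'   # example path to the calibration dataset
```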

If the accuracy of the automatic search still falls short of expectations, and the loss relative to the original floating-point model is in the 1.5% to 3% range, you can try to improve the accuracy using the following suggestions one by one.

Adjust Calibration Method

  • Try manually specifying calibration_type: select mix first; if the final accuracy still does not meet expectations, then try either kl or max.

  • When calibration_type is set to max, configure max_percentile with different percentiles (ranging from 0.5 to 1). We recommend trying 0.99999, 0.99995, 0.9999, 0.9995 and 0.999, and observing the trend of model accuracy across these configurations to find the optimal percentile.

  • Based on the attempts above, choose the option with the highest cosine similarity and try enabling per_channel in the configuration file.

  • The optimization parameter in yaml also provides asymmetric and bias_correction options for accuracy debugging; these two parameters have been found to improve quantization accuracy in some scenarios and are worth trying as well.
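A manual calibration configuration along the lines described above might look like the following sketch. The key names follow this document, while the values are example assumptions for one step of the percentile sweep:

```yaml
# Illustrative only -- sweep max_percentile over the recommended values
# (0.99999, 0.99995, 0.9999, 0.9995, 0.999) and compare accuracy.
calibration_parameters:
  calibration_type: 'max'
  max_percentile: 0.99995
  per_channel: True
  # The `optimization` parameter additionally offers asymmetric and
  # bias_correction options that may help in some scenarios.
```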

Adjust Calibration Dataset

  • Try to increase/decrease the amount of data in the calibration dataset appropriately (usually detection scenarios require less calibration data than classification scenarios).

  • Observe missed detections in the model output and appropriately add calibration data for the corresponding scenes.

  • Don't use anomalous data such as pure-black or pure-white images; minimize the use of untargeted background images as calibration data; cover typical task scenarios as comprehensively as possible; and keep the distribution of the calibration dataset close to that of the training set.
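As a sketch of keeping the calibration set balanced across scenes, the hypothetical helper below (plain Python; the function and labels are our own, not part of the toolchain) samples up to a fixed number of images per scene so that no single scenario dominates:

```python
import random
from collections import defaultdict

def sample_calibration_set(image_paths, scene_labels, per_scene=10, seed=0):
    """Pick up to `per_scene` images from each scene so the calibration
    set covers all typical scenarios instead of one dominant background."""
    by_scene = defaultdict(list)
    for path, scene in zip(image_paths, scene_labels):
        by_scene[scene].append(path)
    rng = random.Random(seed)
    picked = []
    for scene, paths in sorted(by_scene.items()):
        rng.shuffle(paths)             # reproducible shuffle per fixed seed
        picked.extend(paths[:per_scene])
    return picked

# Hypothetical dataset: 80 day images, 15 night, 5 rain.
paths = [f"img_{i}.jpg" for i in range(100)]
labels = ["day"] * 80 + ["night"] * 15 + ["rain"] * 5
calib = sample_calibration_set(paths, labels, per_scene=10)
print(len(calib))  # 10 day + 10 night + 5 rain = 25
```

In practice you would also inspect the selected images and discard over-exposed or otherwise anomalous ones before calibration.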

Roll Back Some Tail OPs to High-accuracy CPU Computation

  • Generally we only roll back 1 or 2 operators at the tail (output layer) of the model to the CPU, because too many CPU operators will hurt the final performance of the model. You can use the cosine similarity of the model as the basis for this judgment. If you set run_on_cpu for some intermediate nodes and find that the accuracy does not improve, this is normal: the repeated quantize/dequantize around such nodes may itself introduce additional accuracy loss, which is why we usually only recommend rolling back tail nodes to the CPU.

  • You can specify that an operator runs on the CPU by configuring the node_info parameter in the yaml file.
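For example, rolling a single tail operator back to the CPU might be written as below. The node name is made up, and the exact node_info syntax should be verified against the node_info description in the Model Parameters documentation for your toolchain version:

```yaml
# Illustrative only -- "Softmax_190" is a hypothetical node name from your model.
model_parameters:
  node_info: {"Softmax_190": {'ON': 'CPU'}}
```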

Accuracy Debug Tool

If your accuracy still does not meet expectations after trying the methods above, try our accuracy debug tool. In PTQ model quantization, there are two main sources of accuracy loss: quantization of sensitive nodes, and accumulation of node quantization errors. To help you locate both independently, the accuracy debug tool analyzes quantization error at node granularity in the calibration model and quickly locates nodes with accuracy anomalies. For a detailed description of the tool and how to use it, refer to the section Accuracy Debug Tool.

Based on past production experience, the above strategies have been able to handle a wide variety of practical problems.

Tip

You can also try accuracy optimization by configuring some ops to be calculated in int16:

During model conversion, most ops are computed in int8 by default, and in some scenarios computing certain ops in int8 leads to significant accuracy loss. The model conversion toolchain provides the ability to specify that particular ops be computed in int16. For details, please refer to the description of the node_info parameter in Model Parameters. Configuring the quantization-sensitive ops (which can be identified using cosine similarity as a reference) to be computed in int16 can solve the accuracy loss problem in some scenarios.
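As an illustration (the node name is hypothetical, and the InputType/OutputType keys shown are an assumption to be verified against the node_info description in Model Parameters), forcing one sensitive op to int16 might look like:

```yaml
# Illustrative only -- "Conv_125" is a made-up node name; check your toolchain
# version's node_info syntax before use.
model_parameters:
  node_info: {"Conv_125": {'InputType': 'int16', 'OutputType': 'int16'}}
```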

Further Improve Model Accuracy Using the QAT Solution

If the above analysis does not reveal any configuration problems but the accuracy still cannot meet requirements, it may be a limitation of PTQ itself. In that case, you can use the QAT solution to quantize the model.

The Horizon Plugin Pytorch (hereinafter referred to as the Plugin) follows the official PyTorch quantization interfaces and design ideas. The Plugin adopts the Quantization Aware Training (QAT) solution, so users are recommended to first read the official PyTorch documentation related to QAT.

For a more detailed introduction to Horizon Plugin Pytorch, refer to the Quantization Aware Training (QAT) section.