Modules that do not need to be quantized, such as pre/post-processing or the loss function, have a non-None qconfig set.
Correct practice: set qconfig only for modules that need to be quantized.
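A minimal sketch of the idea using stock PyTorch eager-mode quantization (the Horizon plugin propagates qconfig analogously); the `Net` class and its module names are invented for illustration:

```python
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig, prepare

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
        self.post = nn.Softmax(dim=1)  # post-processing: should stay float

    def forward(self, x):
        return self.post(self.backbone(x))

model = Net().eval()
model.qconfig = get_default_qconfig("fbgemm")  # quantize by default ...
model.post.qconfig = None  # ... but explicitly exclude float-only modules
prepared = prepare(model)
```

Setting a submodule's qconfig to None overrides the qconfig inherited from the parent, so only the backbone is instrumented for quantization.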
The march is not set correctly, which may result in model compilation failures or inconsistent deployment accuracy.
Correct practice: select the BPU architecture matching the processor you will deploy to; for example, J6 requires the Nash march.
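A sketch of that setting, assuming the `horizon_plugin_pytorch` package (the import path and enum members may differ across plugin versions; verify against your version's documentation):

```python
# Assumed horizon_plugin_pytorch API; check your plugin version's docs.
from horizon_plugin_pytorch.march import March, set_march

# Select the BPU architecture of the target processor before
# prepare/calibration/QAT, e.g. Nash for J6:
set_march(March.NASH)
```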
The model output node is not set to high-accuracy output, so quantization accuracy falls short of expectations.
Correct practice: to improve model accuracy, configure the model's output nodes for high-accuracy output.
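A hedged sketch of the idea; the helper names (`get_default_qat_qconfig`, `get_default_qat_out_qconfig`) and the `head.out_conv` path are assumptions here, so check your `horizon_plugin_pytorch` version's documentation for the exact qconfig helpers:

```python
# Assumed horizon_plugin_pytorch helpers; names vary across plugin versions.
from horizon_plugin_pytorch import quantization as horizon_quant

model.qconfig = horizon_quant.get_default_qat_qconfig()
# Give the output layer a high-accuracy output qconfig so the final
# results are not clamped to a low-precision int8 representation:
model.head.out_conv.qconfig = horizon_quant.get_default_qat_out_qconfig()
```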
Calibration is run on multiple GPUs.
Correct practice: due to underlying limitations, Calibration currently does not support multi-GPU execution; run Calibration on a single GPU.
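A minimal single-device calibration loop sketch (the model, sizes, and the random-input stand-in for the calibration dataloader are all illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU()).eval()
device = torch.device("cpu")  # or a single "cuda:0"; no DataParallel/DDP
model.to(device)
with torch.no_grad():
    for _ in range(4):  # stand-in for the real calibration dataloader
        out = model(torch.randn(1, 3, 8, 8, device=device))
```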
The model input is image data in a format other than centered YUV444 (e.g. RGB), which may cause inconsistent accuracy after deployment.
Correct practice: since the image format supported by Horizon hardware is centered YUV444, it is recommended to use YUV444 directly as the network input from the beginning of model training.
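For reference, a minimal per-pixel sketch of converting full-range RGB to centered YUV444. The BT.601 full-range coefficients below are an assumption; match the colorimetry of your actual camera pipeline:

```python
def rgb_to_centered_yuv444(r, g, b):
    """Convert one full-range RGB pixel (0-255) to centered YUV444.

    "Centered" means every channel is shifted from [0, 255] into
    [-128, 127], which is the value range the hardware expects.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128.0
    return round(y) - 128, round(u) - 128, round(v) - 128
```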
The qat model is used for accuracy evaluation and monitoring during quantization-aware training, so accuracy anomalies that only appear at deployment are not caught in time.
Correct practice: the error between QAT and Quantized exists because the QAT stage cannot fully simulate the pure fixed-point computation logic of the Quantized stage; use the quantized model for accuracy evaluation and monitoring.
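Sketched below with stock PyTorch eager-mode QAT APIs (the Horizon plugin exposes an analogous prepare/convert pair); the tiny model and the elided training loop are illustrative:

```python
import torch.nn as nn
from torch.ao.quantization import convert, get_default_qat_qconfig, prepare_qat

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.train()
model.qconfig = get_default_qat_qconfig("fbgemm")
qat_model = prepare_qat(model)
# ... run the QAT training loop on qat_model here ...
quantized_model = convert(qat_model.eval())
# Evaluate and monitor accuracy on quantized_model, not on qat_model:
# only the converted model executes the pure fixed-point logic.
```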
The same member defined with FloatFunctional() is called multiple times.
Correct practice: never call the same FloatFunctional() member more than once in forward; define a separate member for each call site.
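A sketch of the correct pattern using PyTorch's `torch.ao.nn.quantized.FloatFunctional` (the Horizon plugin provides an equivalent class); the `TwoAdds` module is invented for illustration:

```python
import torch
from torch.ao.nn.quantized import FloatFunctional

class TwoAdds(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # One FloatFunctional member per call site: sharing one member
        # across both adds would pool two different value distributions
        # into a single observer and corrupt its quantization statistics.
        self.add1 = FloatFunctional()
        self.add2 = FloatFunctional()

    def forward(self, x, y, z):
        out = self.add1.add(x, y)     # dedicated member for this add
        return self.add2.add(out, z)  # its own member, never add1 again

m = TwoAdds()
res = m(torch.ones(2), torch.ones(2), torch.ones(2))
```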
Some operators in the Quantized model have not gone through calibration or QAT, e.g. a post-processing operator intended to be accelerated on the BPU that skipped the quantization stages; this causes quantized inference to fail or deployment accuracy to be abnormal.
Correct practice: a few operators can be added directly in the Quantized phase, such as color space conversion operators; see the documentation for how to add them. Not all operators can be added this way, however: an operator such as cat must obtain its real quantization parameters from the statistics collected during calibration or QAT, or the final accuracy will suffer. If you need to adjust the network structure in a similar way, consult the framework developers.
The floating-point model is overfitting.
Common signs of model overfitting: the training loss keeps decreasing while the validation loss or accuracy stalls or degrades, leaving a growing train/validation gap.
Correct practice: resolve the floating-point model's overfitting problem on your own.
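The determination above can be sketched as a hedged heuristic (the threshold and the history-based rule are assumptions, not part of the toolchain):

```python
def looks_overfit(train_acc, val_acc, gap_threshold=0.1):
    """Heuristic overfitting check on per-epoch accuracy histories.

    Flags the run when the final train/validation gap exceeds
    `gap_threshold` and validation accuracy has fallen off its peak.
    Illustrative only; tune the rule for your own task.
    """
    gap = train_acc[-1] - val_acc[-1]
    val_past_peak = val_acc[-1] < max(val_acc)
    return gap > gap_threshold and val_past_peak
```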