In QAT, an important stage is calculating the quantization parameter scale. A good scale can significantly improve the accuracy of the trained model and speed up its convergence. Calibration runs a few batches of data through the floating-point model (forward pass only, no backward pass), builds a distribution histogram, and obtains min_value and max_value, which are then used to compute the scale. When QAT accuracy is low, calibrating the quantization parameters in this way before QAT can provide better initial quantization parameters.
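As an illustration of the min/max-to-scale step, here is a minimal sketch assuming symmetric int8 quantization; the function name `calibration_scale` is hypothetical and HAT's actual computation may differ (e.g., asymmetric schemes or histogram-based clipping):

```python
def calibration_scale(min_value: float, max_value: float, num_bits: int = 8) -> float:
    """Derive a symmetric quantization scale from observed min/max values."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    abs_max = max(abs(min_value), abs(max_value))
    return abs_max / qmax

# Example: activations observed in [-3.2, 6.1] during calibration.
scale = calibration_scale(-3.2, 6.1)
```

With this scale, a float value x is mapped to the integer round(x / scale), clipped to the int8 range.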
No need to change existing models by default
Similar to setting the QAT QConfig when defining a quantized model, a Calibration QConfig must also be set when defining a calibration model. The latter is simpler, however, because HAT already provides default Calibration QConfig settings that can be used directly, with no modification to the model.
Define submodule Calibration QConfig
By default, a Calibration QConfig is set for every module (anything inheriting from nn.Module) in the model, so calibration collects the feature distribution of all modules. If you have special needs, you can customize the implementation of the set_calibration_qconfig method in the model:
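A possible shape of such an override is sketched below. The method name `set_calibration_qconfig` follows the text, but the body is an assumption: here we use the common PyTorch convention that setting a submodule's `qconfig` attribute to `None` excludes it from statistics collection, which may not match HAT's actual mechanism:

```python
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3)
        self.head = nn.Conv2d(16, 8, 1)

    def set_calibration_qconfig(self):
        # Hypothetical: keep the default Calibration QConfig everywhere
        # except the head, whose feature distribution we skip counting.
        self.head.qconfig = None
```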
HAT comes with a built-in calibration function whose command is similar to that of normal training. Simply run the following command:
See the calibration_trainer settings in the config file:
1. Setting Dataset:
The calibration dataset must not be the test dataset (the training dataset or any other dataset is fine). There is no definite conclusion yet about which data-augmentation transforms to use: you can try transforms consistent with those of normal training or validation, or customized transforms.
2. Number of images to iterate over during calibration (for reference):
The number of images is not fixed either; the suggestions above are only experience summarized from existing experiments and can be adjusted according to the actual situation.
Setting of averaging_constant:
In QAT, the update rule for the scale parameter is scale = (1 - averaging_constant) * scale + averaging_constant * current_scale.
Some existing experiments have found that fixing the scale of the activations after calibration, i.e., setting averaging_constant=0 for activations and averaging_constant=1 for weights, may lead to higher accuracy.
This setting is not suitable for all tasks and can be adjusted according to the actual situation; for example, fixing the scale in a LiDAR task may result in low accuracy.
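The effect of averaging_constant on the update rule quoted above can be seen in this generic illustration (not HAT internals):

```python
def update_scale(scale: float, current_scale: float, averaging_constant: float) -> float:
    """Exponential-moving-average update of the quantization scale."""
    return (1 - averaging_constant) * scale + averaging_constant * current_scale

# averaging_constant = 0: the scale is frozen at its calibrated value.
frozen = update_scale(0.05, 0.08, averaging_constant=0.0)    # stays 0.05
# averaging_constant = 1: the scale always tracks the current batch.
tracking = update_scale(0.05, 0.08, averaging_constant=1.0)  # becomes 0.08
```

So averaging_constant=0 for activations keeps the calibrated activation scale fixed throughout QAT, while averaging_constant=1 for weights makes the weight scale follow the weights as they are updated.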
Next, simply start training by executing the normal QAT command: