Calibration

In QAT, an important stage is calculating the quantization parameter scale. A good scale can significantly improve the accuracy of the trained model and speed up its convergence. Calibration runs a few batches of data through the floating-point model (forward pass only, no backward pass), collects a distribution histogram, and obtains min_value and max_value, which are then used to compute the scale. When QAT accuracy is low, calibrating the quantization parameters in this way before QAT provides better initial quantization parameters.
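As a concrete illustration, deriving a scale from min_value and max_value can be sketched as below. This is a minimal example assuming symmetric int8 quantization; HAT's actual observers may use a different scheme.

```python
def compute_scale(min_value: float, max_value: float, num_bits: int = 8) -> float:
    # Largest representable positive quantized value, e.g. 127 for int8.
    qmax = 2 ** (num_bits - 1) - 1
    # Largest observed magnitude from the calibration histogram.
    amax = max(abs(min_value), abs(max_value))
    # Map the observed range onto the quantized range.
    return amax / qmax if amax > 0 else 1.0
```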

How to Define Calibration Model

  • No need to change existing models by default

    Similar to setting the QAT QConfig when defining a quantized model, a Calibration QConfig must be set when defining a calibration model. However, the latter is simpler: HAT already ships default Calibration QConfig settings that can be used directly, without any modification to the model.

  • Define submodule Calibration QConfig

    By default, a Calibration QConfig is set for every module of the model (anything inheriting from nn.Module), so the calibration collects the feature distribution of all modules. If you have special needs, you can customize this by implementing the set_calibration_qconfig method in the model:

    class Classifier(nn.Module):
        def __init__(self):
            ...

        def forward(self, x):
            ...

        # Customize which modules are calibrated
        def set_calibration_qconfig(self):
            # E.g., set the qconfig of the loss to None to skip
            # calibration on the loss. This avoids unnecessary
            # statistics, reduces GPU memory usage and speeds up
            # the calibration.
            if self.loss is not None:
                self.loss.qconfig = None

Run Calibration on Floating-point Models

HAT comes with a built-in calibration function; its command is similar to that of normal training. Simply run the following command:

python3 tools/train.py --stage calibration ...

See the calibration_trainer settings in the config file:

# Note: The transforms of the dataset during calibration can be
# consistent with those during training or validation, or customized.
# By default, `val_batch_processor` is used.
calibration_data_loader = copy.deepcopy(data_loader)
calibration_data_loader.pop('sampler')  # Calibration does not support DDP or DP
calibration_batch_processor = copy.deepcopy(val_batch_processor)
calibration_trainer = dict(
    type="Calibrator",
    model=model,
    # 1. Set data_loader and batch_processor
    data_loader=calibration_data_loader,
    batch_processor=calibration_batch_processor,
    # 2. Set the number of batches for the calibration to iterate
    num_stages=30,
    ...
)

1. Setting Dataset:

The calibration dataset must not be the test dataset (it can be the training dataset or another one). There is no definite conclusion yet about which data-augmentation transforms to use: you can try either the transforms used in normal training or validation, or customized ones.

2. Number of images to iterate over during calibration (for reference):

  • classification: 500~1500 images
  • segmentation && detection: 100~300 images
Note

The number of images is not fixed either; the suggestions above are only experience summarized from existing experiments and can be adjusted according to the actual situation.
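Given a batch size, the image budgets above translate into a batch count for the trainer. The helper below is illustrative, not part of HAT:

```python
import math

def num_calibration_batches(num_images: int, batch_size: int) -> int:
    # Round up so the full image budget is covered.
    return math.ceil(num_images / batch_size)

# E.g., a classification run with batch size 32 and a 1000-image
# budget needs 32 calibration batches.
```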

QAT with Calibration Model

qat_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        # (Optional) Set the scale update factor during QAT training
        qconfig_params=dict(
            activation_qkwargs=dict(
                averaging_constant=0,
            ),
            weight_qkwargs=dict(
                averaging_constant=1,
            ),
        ),
        converters=[
            dict(type="Float2QAT"),
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "calibration-checkpoint-best.pth.tar"
                ),
            ),
        ],
    ),
)

Setting of averaging_constant:

In QAT, the scale parameter is updated by the rule scale = (1 - averaging_constant) * scale + averaging_constant * current_scale .
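The effect of the two averaging_constant settings can be checked directly against this rule. The snippet below is just a plain restatement of the formula above:

```python
def update_scale(scale: float, current_scale: float, averaging_constant: float) -> float:
    # Exponential-moving-average update of the quantization scale during QAT.
    return (1 - averaging_constant) * scale + averaging_constant * current_scale

# averaging_constant=0 keeps the calibrated scale fixed (used for activations),
# averaging_constant=1 always adopts the current batch's scale (used for weights).
```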

Some existing experiments have found that fixing the activation scale after calibration, i.e., setting averaging_constant=0 for activations and averaging_constant=1 for weights, may lead to higher accuracy.

Note

This setting is not suitable for all tasks and can be adjusted according to the actual situation. For example, fixing the scale in a LiDAR task may result in low accuracy.

Next, you only need to start the training by executing the normal QAT command:

python3 tools/train.py --stage qat ...