Quantized Training

This document describes only the operations needed to perform quantized training in HAT. For the basic principles of quantization and its implementation in the training framework, refer to the horizon_plugin_pytorch documentation.

In quantized training, the conversion process from a floating-point model to a fixed-point model is as follows:

[Figure: conversion pipeline from floating-point model to quantized (QAT) model]

Most of these steps are already integrated into the HAT training pipeline. When adding a custom model, the user only needs to implement the fuse_model method to complete model fusion and the set_qconfig method to configure the quantization scheme. Note the following points when writing models:

  • HAT only calls the fuse_model method of the outermost module, so that implementation is responsible for fusing all submodules.

  • Prefer the base modules provided in hat.models.base_modules, which already implement the fuse_model method, reducing effort and development difficulty.

  • Register the model: all modules in HAT use the registration mechanism, and only after a model is registered in the corresponding registry can it be used in the config file as dict(type={$class_name}, ...).

  • The set_qconfig method must be implemented in the outermost module. If a submodule contains a special layer that needs its own QConfig, implement set_qconfig in that submodule as well; see the Writing Specifications of set_qconfig and Customization of qconfig sections for details.

In addition, for the model to be convertible to a quantized model, some conditions must be met, as described in the horizon_plugin_pytorch documentation.

Introduction to Quantized Training Process

Adding Custom Models

```python
import torch
from torch import nn

from hat.registry import OBJECT_REGISTRY


# Register the model using a decorator
@OBJECT_REGISTRY.register_module
class ExampleNet(nn.Module):
    def __init__(self):
        ...

    def forward(self, x):
        ...

    def fuse_model(self):
        # You need to call the fuse_model method of all submodules.
        # Descriptions of the fuse interface can be found in the
        # horizon_plugin_pytorch documentation.
        if hasattr(self.submodule, "fuse_model"):
            self.submodule.fuse_model()
        ...

    def set_qconfig(self):
        # Descriptions of the model quantization configuration interface
        # can be found in the horizon_plugin_pytorch documentation.
        from hat.utils import qconfig_manager

        # Obtain the default QConfig.
        self.qconfig = qconfig_manager.get_default_qat_qconfig()
        # For submodules requiring special handling, call the submodule's
        # set_qconfig, which only needs to implement the QConfig settings
        # for the special layers.
        if hasattr(self.submodule, "set_qconfig"):
            self.submodule.set_qconfig()
        # If a special node needs no QConfig, such as a loss, set its
        # QConfig to None.
        if self.loss is not None:
            self.loss.qconfig = None
        ...
```

Add the Config File

```python
ckpt_dir = ...

model = dict(type="ExampleNet")

float_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    data_loader=...,
    optimizer=...,
    batch_processor=...,
    num_epochs=...,
    device=None,
    callbacks=...,
    ...,
)

qat_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "float-checkpoint-best.pth.tar"
                ),
            ),
            dict(type="Float2QAT"),
        ],
    ),
    data_loader=...,
    optimizer=...,
    batch_processor=...,
    num_epochs=...,
    device=None,
    callbacks=...,
    ...,
)

val_callback = dict(
    type="Validation",
    data_loader=...,
    batch_processor=...,
    callbacks=[val_metric_updater, ...],
)

trace_callback = dict(
    type="SaveTraced",
    save_dir=ckpt_dir,
    trace_inputs=deploy_inputs,
)

ckpt_callback = dict(
    type="Checkpoint",
    save_dir=ckpt_dir,
    name_prefix=training_step + "-",
    strict_match=True,
    mode="max",
)

int_trainer = dict(
    type="Trainer",
    model=deploy_model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        converters=[
            dict(type="Float2QAT"),
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "qat-checkpoint-best.pth.tar"
                ),
            ),
            dict(type="QAT2Quantize"),
        ],
    ),
    # int_trainer contains no training process
    data_loader=None,
    optimizer=None,
    batch_processor=None,
    num_epochs=0,
    device=None,
    callbacks=[
        ckpt_callback,
        trace_callback,
    ],
)
```

Training

When using the tools/train.py script, you only need to specify the training stages in order, and the corresponding solver is called automatically for each stage to execute the training process.

```shell
python3 tools/train.py --stage float ...
python3 tools/train.py --stage qat ...
python3 tools/train.py --stage int_infer ...
```
  • float: normal floating-point training.
  • qat: Quantization-Aware Training (QAT). This stage first initializes a floating-point model, loads the trained floating-point weights, and then converts the model into a QAT model for training.
  • int_infer: fixed-point conversion and inference. This stage first initializes a floating-point model, converts it into a QAT model and loads the trained QAT weights, and then converts the QAT model into a fixed-point model. The fixed-point model cannot be trained; it can only run validation to obtain the final fixed-point accuracy.

Resume Training

Training that was unexpectedly interrupted can be resumed by configuring the resume_optimizer and resume_epoch_or_step fields in the {stage}_trainer of the config; alternatively, only the optimizer state can be resumed for fine-tuning. For example:

```python
float_trainer = dict(
    ...,
    model_convert_pipeline=dict(
        ...,
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path="your_checkpoint_path",  # checkpoint path
            ),
        ],
    ),
    resume_optimizer=True,      # resume optimizer
    resume_epoch_or_step=True,  # resume epoch and step
    ...,
)
```

There are three training recovery scenarios:

  1. Full Recovery: resume training that was unexpectedly interrupted, restoring all state from the previous checkpoint, including the optimizer, LR, epoch, step, and so on. In this scenario, you only need to configure the resume_optimizer field.

  2. Resume Optimizer for Fine-tuning: only the optimizer and LR state are restored, with epoch and step reset to 0, for fine-tuning on certain tasks. In this scenario, configure both resume_optimizer and resume_epoch_or_step=False.

  3. Load Model Parameters Only: only model parameters are loaded; no other state (optimizer, epoch, step, or LR) is restored. In this scenario, configure LoadCheckpoint in model_convert_pipeline, resume_optimizer=False, and resume_epoch_or_step=False.
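The three scenarios can be sketched in config form as follows (same placeholder style as elsewhere in this document; only the resume-related fields are shown, everything else is elided):

```python
# 1. Full recovery: resume an interrupted run; optimizer, LR, epoch
#    and step are all restored from the checkpoint.
float_trainer = dict(
    # ... other fields as usual ...
    resume_optimizer=True,
    resume_epoch_or_step=True,
)

# 2. Resume optimizer for fine-tuning: keep optimizer and LR state,
#    restart epoch/step from 0.
float_trainer = dict(
    # ... other fields as usual ...
    resume_optimizer=True,
    resume_epoch_or_step=False,
)

# 3. Load model parameters only: LoadCheckpoint in model_convert_pipeline
#    loads the weights; nothing else is restored.
float_trainer = dict(
    # ... other fields as usual ...
    model_convert_pipeline=dict(
        # ... other fields as usual ...
        converters=[
            dict(type="LoadCheckpoint", checkpoint_path="your_checkpoint_path"),
        ],
    ),
    resume_optimizer=False,
    resume_epoch_or_step=False,
)
```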

QAT Mode

Effects

qat_mode controls whether quantization training in the QAT phase is performed with BN. Together with the FuseBN interface provided by HAT, it can also control whether training keeps BN throughout the whole process or gradually absorbs BN partway through.

Optional Definitions

The following three settings are available for qat_mode:

```python
class QATMode(object):
    FuseBN = "fuse_bn"
    WithBN = "with_bn"
    WithBNReverseFold = "with_bn_reverse_fold"
```

Principles

Fuse BN

QAT Phase without BN (default quantization training method of HAT)

By setting qat_mode to fuse_bn, during the op-fusion step of the floating-point model, the weight and bias of BN are absorbed into those of Conv, so the original Conv + BN combination is left with only the Conv. This absorption is theoretically error-free.
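The absorption can be sketched numerically. The following toy example (plain numpy, using a matrix multiply to stand in for a 1x1 Conv; not HAT or horizon_plugin_pytorch code) folds BN's per-channel affine transform into the Conv weight and bias and checks that the forward result is unchanged up to floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "conv": y = W x + b, standing in for a 1x1 convolution.
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)

# BN parameters (per output channel), eval mode.
gamma = rng.uniform(0.5, 1.5, 4)
beta = rng.standard_normal(4)
mean = rng.standard_normal(4)
var = rng.uniform(0.5, 2.0, 4)
eps = 1e-5

k = gamma / np.sqrt(var + eps)
# Folded parameters: BN absorbed into the conv weight and bias.
W_fused = W * k[:, None]
b_fused = (b - mean) * k + beta

x = rng.standard_normal(3)
y_conv_bn = (W @ x + b - mean) * k + beta  # Conv followed by BN
y_fused = W_fused @ x + b_fused            # fused Conv only

# Identical up to floating-point rounding.
assert np.allclose(y_conv_bn, y_fused)
```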

With BN

QAT Phase with BN

By setting qat_mode to with_bn, BN is not absorbed into Conv when the floating-point model is converted to a QAT model; instead, it remains in the QAT model as part of a fused quantized op of the form Conv + BN + output quantization node. Finally, at the end of quantization training, when the model is converted to a quantized model (also called int infer), the weight and bias of BN are automatically absorbed into the quantization parameters of Conv, and the resulting quantized op remains consistent with the computation of the original QAT op.

In this mode, the user may also absorb BN into Conv in the middle of QAT. However, the forward results of the QAT model before and after such manual absorption are inconsistent: once the BN weight is folded into the Conv weight, the conv_weight_scale learned in the preceding quantization training no longer fits the new conv_weight, causing large quantization errors in conv_weight and requiring additional quantization training to re-adjust the quantization parameters.
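The scale mismatch can be illustrated with a toy per-tensor symmetric int8 scheme (an assumption for illustration; the scheme actually used by horizon_plugin_pytorch may differ): a scale learned before absorption clips the channels whose BN gain is large, while a scale recomputed on the folded weight does not:

```python
import numpy as np

def weight_scale(w):
    # per-tensor symmetric int8 scale (toy scheme)
    return np.abs(w).max() / 127.0

def fake_quant(w, s):
    return np.clip(np.round(w / s), -127, 127) * s

# 8 output channels x 16 weights each (deterministic toy values).
W = np.linspace(-1.0, 1.0, 128).reshape(8, 16)
s_before = weight_scale(W)  # scale learned before BN absorption

# Absorb a BN whose per-channel gain gamma / sqrt(var + eps) varies widely.
gain = np.linspace(0.5, 4.0, 8)
W_fused = W * gain[:, None]

err_stale = np.abs(fake_quant(W_fused, s_before) - W_fused).max()
err_fresh = np.abs(fake_quant(W_fused, weight_scale(W_fused)) - W_fused).max()

# The stale scale clips channels with large BN gain, so its error is
# far larger than the error of a scale recomputed on the folded weight.
assert err_stale > 10 * err_fresh
```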

With BN Reverse Fold

QAT Phase with BN

The difference from with_bn is that, in this mode, the BN weight is already taken into account when computing conv_weight_scale during quantization training, before BN is absorbed (the calculation is not detailed here), so that after the BN weight is absorbed, conv_weight_scale still fits the new conv_weight.

This mode provides a lossless way of absorbing BN step by step: when BN is absorbed in the middle of quantization training, the forward result of the model is theoretically identical before and after absorption, so the user can gradually absorb all BNs before the end of quantization training while ensuring the loss does not fluctuate too much after each absorption.
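A sketch of the idea with the same toy per-tensor int8 scheme (the plugin's exact calculation is not detailed in this document, so this is only an illustration): if the weight scale is computed on the BN-folded weight before absorption, it still fits the weight after absorption, leaving only ordinary rounding error and no clipping:

```python
import numpy as np

def weight_scale(w):
    # per-tensor symmetric int8 scale (toy scheme)
    return np.abs(w).max() / 127.0

def fake_quant(w, s):
    return np.clip(np.round(w / s), -127, 127) * s

W = np.linspace(-1.0, 1.0, 128).reshape(8, 16)  # toy conv weight
gain = np.linspace(0.5, 4.0, 8)  # BN gamma / sqrt(var + eps), per channel

# Reverse-fold idea: compute the weight scale on the BN-folded weight
# *before* BN is actually absorbed.
s_reverse_fold = weight_scale(W * gain[:, None])

# After absorption the conv weight becomes W * gain, so the previously
# learned scale still fits: no clipping, only rounding error <= scale / 2.
W_fused = W * gain[:, None]
err = np.abs(fake_quant(W_fused, s_reverse_fold) - W_fused).max()
assert err <= s_reverse_fold / 2 + 1e-9
```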

In this mode, if there are BNs not absorbed at the end of the quantization training, they will be automatically absorbed when the model is converted from QAT to quantized. In theory, such absorption is lossless.

Usage

Set qat_mode

The user only needs to set qat_mode in model_convert_pipeline.

For example:

```python
model_convert_pipeline=dict(
    type="ModelConvertPipeline",
    qat_mode="with_bn",
    converters=[
        dict(type="Float2QAT"),
        dict(
            type="LoadCheckpoint",
            checkpoint_path=os.path.join(
                ckpt_dir, "qat-checkpoint-best.pth.tar"
            ),
        ),
    ],
)
```

View Current qat_mode

```python
from horizon_plugin_pytorch.qat_mode import get_qat_mode

qat_mode = get_qat_mode()
```

Set Progressive Absorption BN

In both the with_bn and with_bn_reverse_fold modes, you can register FuseBN as a callback to absorb BN in the specified modules at the specified epoch or step.

FuseBN definition:

```python
class FuseBN(OnlineModelTrick):
    """
    Args:
        modules: sub-module names in which to fuse BN.
        step_or_epoch: when to fuse BN, same length as modules.
        update_by: by "step" or by "epoch".
        inplace: whether to fuse BN in place.
    """

    def __init__(
        self,
        modules: List[List[str]],
        step_or_epoch: List[int],
        update_by: str,
        inplace: bool = False,
    ):
        ...
```

Use the FuseBN example in the config file:

```python
from hat.callbacks import FuseBN

# Define the callback:
# BN in the backbone module is absorbed at the 1000th step,
# BN in the neck module at the 1500th step.
fuse_bn_callback = FuseBN(
    modules=[["backbone"], ["neck"]],
    step_or_epoch=[1000, 1500],
    update_by="step",
)

# Add the callback to the trainer.
qat_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="with_bn",
        converters=[
            dict(type="Float2QAT"),
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "qat-checkpoint-best.pth.tar"
                ),
            ),
        ],
    ),
    data_loader=...,
    optimizer=...,
    batch_processor=...,
    num_epochs=...,
    device=None,
    callbacks=[
        callbacks0,
        ...,
        fuse_bn_callback,
        callbacks99,
    ],
    ...,
)
```

qat_mode Summary

| QAT Mode | When BN Is Absorbed | How BN Is Absorbed | Forward Result Changes After Absorption (Theoretically)? |
| --- | --- | --- | --- |
| fuse_bn | During floating-point model op fusion | Absorbed when fuse_modules is executed | No |
| with_bn | In the middle of quantized training | A callback absorbs BN at the specified epoch or step | Yes |
| with_bn | When converting the model from QAT to quantized | Completed automatically during model conversion | No |
| with_bn_reverse_fold | In the middle of quantized training | A callback absorbs BN at the specified epoch or step | No |
| with_bn_reverse_fold | When converting the model from QAT to quantized | Completed automatically during model conversion | No |

In general, a training run starts with floating-point training and, once the desired accuracy is reached, moves on to quantization training, in which case only fuse_bn is used. Only when floating-point training is skipped, i.e., training starts directly with quantization training, is a with-BN quantization training mode needed to ensure the model converges.

Note

We say "theoretically lossless" or "no change" in this document because, in actual computation, there is a small probability that the two floating-point results before and after absorption differ in the late decimal places. Combined with the quantization operation, this small variation can cause some values of the fused Conv's output to differ by one output-scale step from the output of Conv + BN before absorption.
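A toy illustration of this effect (hypothetical scale value, plain Python, not framework code): two results that differ only at late decimal places can land on opposite sides of a rounding boundary, so their quantized values differ by one full step:

```python
scale = 0.0625  # hypothetical Conv output scale (a power of two, so
                # multiplying and dividing by it is exact in floating point)

def quantize(x, s):
    # round to the nearest multiple of the scale
    return round(x / s) * s

y_fused = 2.5000001 * scale    # output of the fused Conv
y_conv_bn = 2.4999999 * scale  # output of Conv + BN before fusion

# The float difference is ~1e-8, but the two values straddle the rounding
# boundary at 2.5 * scale, so the quantized outputs differ by one step.
assert abs(y_fused - y_conv_bn) < 1e-6
assert abs(quantize(y_fused, scale) - quantize(y_conv_bn, scale)) == scale
```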