This tutorial uses Deformable DETR as an example of how to train a fixed-point detection model with the HAT algorithm package. Before starting Quantization-Aware Training (QAT), also known as fixed-point model training, you first need to train a high-accuracy pure floating-point model; the fixed-point model can then be trained quickly by finetuning from it. So we start by training a pure floating-point Deformable DETR model.
As with other tasks, HAT unifies all training and evaluation tasks in the form of tools + config.
After preparing the raw dataset, the following process conveniently completes the entire training workflow.
Before training the model, the first step is to prepare the dataset. Here we download MSCOCO's train2017.zip and val2017.zip as the training and validation sets, along with the corresponding annotation file annotations_trainval2017.zip. The structure of the data directory after unzipping is shown below:
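A typical layout after unzipping, following the standard MSCOCO release (a sketch; adjust names to your actual download):

```
data
|-- annotations
|   |-- instances_train2017.json
|   `-- instances_val2017.json
|-- train2017
`-- val2017
```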
Meanwhile, to speed up training, we pack the original jpg-format dataset and convert it to an lmdb-format dataset. Simply run the following script to perform the conversion:
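A sketch of what the two conversion commands can look like. The packer script path and flags below are assumptions, not guaranteed HAT interfaces; check the tools/ directory of your HAT installation for the exact entry point:

```shell
# Hypothetical packer invocations; script path and flags are assumptions.
python3 tools/datasets/mscoco_packer.py --src-data-dir ./data --pack-type lmdb --split-name train
python3 tools/datasets/mscoco_packer.py --src-data-dir ./data --pack-type lmdb --split-name val
```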
The above two commands convert the training dataset and the validation dataset, respectively. After packing completes, the file structure in the data directory should be as follows:
train_lmdb and val_lmdb are the packed training and validation datasets, and are what the network ultimately reads.
Once the dataset is ready, it is time to start training the floating-point Deformable DETR detection network. Before training starts, you can first check the network's computation cost and parameter count with the following command:
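A hedged sketch of such a check, assuming HAT exposes an ops-counting script; the script name, flags, and config path are assumptions:

```shell
# Hypothetical; replace with the actual ops-counting tool and config path in your HAT checkout.
python3 tools/calops.py --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py
```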
If you simply want to start such a training task, just run the following command:
The above commands train the floating-point model and the fixed-point model, respectively; fixed-point training must start from a trained floating-point model. For details, please refer to the Quantization-Aware Training (QAT) section.
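The two stages are typically selected via a --stage flag on the training entry point. The script path and config name below are assumptions modeled on that pattern, not verbatim HAT commands:

```shell
# Float training first, then fixed-point (QAT) training on top of it.
# Script path and config name are assumptions; check your HAT checkout.
python3 tools/train.py --stage float --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py
python3 tools/train.py --stage qat --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py
```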
After quantization training is complete, the fixed-point model can be exported with the following command:
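A hedged sketch of the export step; the script name, flags, and config path are assumptions:

```shell
# Hypothetical export invocation producing the fixed-point (hbir) model.
python3 tools/export_hbir.py --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py
```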
After completing the training, you obtain trained floating-point, calibrated, and fixed-point models.
Similar to training, the same method can be used to validate the trained models, obtaining Float, Calibration, and Quantized metrics,
which are the floating-point, calibrated, and fully fixed-point accuracies, respectively.
Similar to training, passing "float" or "calibration" to --stage validates the trained floating-point model or calibration model, respectively.
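One hedged sketch of such a validation run; the prediction script name and config path are assumptions, while the --stage flag mirrors the usage described above:

```shell
# Hypothetical validation invocations; script path and config name are assumptions.
python3 tools/predict.py --stage float --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py
python3 tools/predict.py --stage calibration --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py
```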
HAT provides the infer_hbir.py script for a visual presentation of the fixed-point model's inference results:
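The script name infer_hbir.py comes from the text above, but its arguments below are assumptions; consult the script's --help output for the real interface:

```shell
# Hypothetical flags for visualizing fixed-point inference on a single image.
python3 tools/infer_hbir.py --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py --model ./export/model.hbir --image ./demo.jpg
```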
In addition to the model validation described above, we also provide an accuracy validation method that exactly matches on-board (BPU) execution, which can be performed as follows:
The quantization training toolchain integrated into HAT primarily targets Horizon's computing platforms, so checking and compiling the quantized model is a necessary step.
HAT provides a model-checking interface that lets the user define a quantized model and then check whether it can run properly on the BPU:
After the model has been trained, the compile_perf_hbir script can compile the quantized model into an hbm file that runs on the board,
and the tool can also predict the model's performance on the BPU:
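The compile_perf_hbir script name comes from the text above; the flags and config path below are assumptions:

```shell
# Hypothetical flags; compiles the quantized model to an hbm file and reports predicted BPU performance.
python3 tools/compile_perf_hbir.py --config configs/detection/deformable_detr/deformable_detr_resnet50_mscoco.py --out-dir ./compile
```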
The above is the whole process from data preparation to generation of quantized deployable models.
The network structure of Deformable DETR can be found in the paper and is not described in detail here.
We can easily define and modify the model by defining a dict variable named model in the config file.
The type key under model gives the name of the model being defined, and the remaining keys specify the model's other components.
The advantage of defining the model this way is that we can easily swap in the structure we want.
For example, to train a model whose backbone is EfficientNet, just replace backbone under model and set neck accordingly.
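A trimmed sketch of what such a model dict can look like. The type strings and fields here are illustrative assumptions modeled on the pattern described above, not HAT's exact registry names; see the shipped config for the real definition:

```python
# Illustrative config fragment; names and fields are assumptions, not HAT's schema.
model = dict(
    type="DeformableDETR",
    backbone=dict(type="ResNet50", pretrained=True),
    neck=dict(type="ChannelMapper", in_channels=[512, 1024, 2048], out_channels=256),
    head=dict(type="DeformableDETRHead", num_classes=80, num_queries=300),
)

# Swapping the backbone only requires replacing its sub-dict
# (and adjusting the neck's in_channels to match the new feature maps):
model["backbone"] = dict(type="EfficientNet", model_type="b0", pretrained=True)
```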
As with the definition of model, data loading and augmentation are realized by defining the dicts data_loader and val_data_loader in the config file,
which correspond to the processing pipelines of the training set and validation set, respectively. Take data_loader as an example:
Here type directly uses PyTorch's own torch.utils.data.DataLoader interface, which batches batch_size images together.
The main thing to pay attention to is the dataset variable, CocoFromLMDB, which indicates that images are read from the lmdb dataset.
Its path is the one we created in the dataset-preparation step above.
A number of data augmentations are included under transforms; val_data_loader applies no augmentations.
You can also add your desired augmentation by inserting new dicts into transforms.
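A trimmed sketch of such a data_loader dict. The transform names are assumptions that mirror the description above, not guaranteed HAT identifiers, and the real config references the torch.utils.data.DataLoader class object directly rather than a string:

```python
# Illustrative config fragment; transform names are assumptions.
data_loader = dict(
    type="torch.utils.data.DataLoader",  # the real config uses the class object itself
    dataset=dict(
        type="CocoFromLMDB",
        data_path="./data/train_lmdb",   # the packed lmdb from the preparation step
        transforms=[
            dict(type="RandomFlip", px=0.5),            # horizontal flip
            dict(type="Resize", img_scale=(800, 1333)),  # shorter/longer side limits
            dict(type="ToTensor"),
        ],
    ),
    batch_size=2,
    shuffle=True,
    num_workers=4,
)
```

Inserting a new dict into the transforms list is all it takes to add another augmentation.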
In order to train a model with high accuracy, a good training strategy is essential.
For each training task, the corresponding training strategy is also defined in the config file; see the float_trainer variable.
float_trainer defines our overall training approach, including the use of multi-GPU distributed training (distributed_data_parallel_trainer), the number of epochs for which the model will be trained, and the choice of optimizer.
The optimizer is AdamW wrapped by custom_param_optimizer, which allows finer control of the optimization parameters for the backbone and norm layers of the model.
The callbacks hold the smaller strategies and user-defined actions applied during training, including the learning-rate schedule (StepDecayLrUpdater),
in-training validation (Validation), and model checkpointing (Checkpoint).
Of course, if you have your own operations to run during training, you can add them as dicts in the same way.
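A trimmed sketch of such a float_trainer dict, mirroring the components named above. The field names and values are assumptions, not HAT's exact schema:

```python
# Illustrative config fragment; field names and values are assumptions.
float_trainer = dict(
    type="distributed_data_parallel_trainer",  # multi-GPU distributed training
    num_epochs=50,
    optimizer=dict(
        type="custom_param_optimizer",
        optim_cls="AdamW",
        lr=2e-4,
        # finer-grained settings for backbone / norm layers:
        custom_param_mapper=dict(
            backbone=dict(lr=2e-5),
            norm_types=dict(weight_decay=0.0),
        ),
    ),
    callbacks=[
        dict(type="StepDecayLrUpdater"),           # learning-rate schedule
        dict(type="Validation", val_interval=1),   # validate during training
        dict(type="Checkpoint", save_dir="./ckpts"),  # save model weights
    ],
)
```

Adding your own training-time operation would amount to appending another dict to callbacks.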
If you need to reproduce the reported accuracy, it is best to leave the training strategy in the config unchanged; otherwise unexpected training behavior may occur.
With the above introduction, you should have a clearer understanding of the function of the config file. Then, with the training script mentioned earlier, a highly accurate pure floating-point detection model can be trained. Of course, training a good detection model is not our ultimate goal; it just serves as a pretrained model for training the fixed-point model later.
Once we have a pure floating-point model, we can start training the corresponding fixed-point model. As with floating-point training, we can train the fixed-point model simply by running the following script.
As you can see, the configuration file is unchanged; only the stage differs.
For Deformable DETR, it is recommended to add a calibration step to the quantization training process.
Calibration provides better initialization parameters for QAT.
At this point we use the training strategy from the calibration_trainer and qat_trainer in the config file.
We first obtain good initialization parameters for QAT through calibration: the calibration step loads the trained floating-point model and then searches for quantization parameters using MSE calibration, where Float2Calibration converts the model from a floating-point model to a calibration model.
Next, we perform QAT quantization training based on the calibration model.
The initial learning rate is set to one-tenth of the floating-point training, and the number of training epochs is greatly reduced.
Notice that Float2QAT in the converter first converts the model from a floating-point model to a QAT model, and the calibration model weights are then loaded.
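The two stages can be sketched as config fragments like the following. The field names are assumptions mirroring the steps described above (load float weights, Float2Calibration, MSE calibration; then Float2QAT, load calibration weights, reduced learning rate and epochs), not HAT's exact schema:

```python
# Illustrative config fragments; field names and paths are assumptions.
calibration_trainer = dict(
    type="Calibrator",
    model_convert_pipeline=[
        dict(type="LoadCheckpoint", checkpoint_path="./ckpts/float-best.pth"),  # trained float model
        dict(type="Float2Calibration"),    # float -> calibration model
    ],
    calibration_method="mse",              # search quantization params by MSE
)

qat_trainer = dict(
    type="distributed_data_parallel_trainer",
    num_epochs=5,                          # far fewer epochs than float training
    model_convert_pipeline=[
        dict(type="Float2QAT"),            # float -> QAT model first...
        dict(type="LoadCheckpoint", checkpoint_path="./ckpts/calibration-best.pth"),  # ...then load calibration weights
    ],
    optimizer=dict(type="AdamW", lr=2e-5),  # about 1/10 of the float learning rate
)
```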
After quantization training, the accuracy of the Deformable DETR quantization model can reach more than 99% of the accuracy of the floating-point model.