FCOS Detection Model Training

This tutorial shows how to train a fixed-point detection model with the HAT algorithm toolkit, using FCOS-efficientnet as an example. For FCOS we recommend adding a calibration stage to quantization-aware training: calibration provides better initialization parameters for the quantization-aware training that follows.

Before starting quantization-aware training, i.e., fixed-point model training, you first need to train a pure floating-point model with high accuracy. Finetuning that floating-point model then lets you train the fixed-point model quickly.

Let's start from training a pure floating-point FCOS-efficientnet model.

Training Process

If you just want to quickly train the FCOS model, you can read this section first.

Similar to other tasks, HAT performs all training tasks and evaluation tasks in the form of tools + config.

After preparing the original dataset, follow the steps below to complete the whole training process.

Dataset Preparation

Before training the model, the first step is to prepare the dataset. Download MSCOCO's train2017.zip and val2017.zip as the training and validation sets for the network, together with the corresponding annotation data annotations_trainval2017.zip.

After unpacking, the data directory structure is shown as below:

tmp_data
|-- mscoco
    |-- annotations_trainval2017.zip
    |-- train2017.zip
    |-- val2017.zip
    |-- annotations
    |-- train2017
    |-- val2017

Also, to improve the training speed, we packaged the original JPG format dataset and converted it to the LMDB format. You can perform the conversion by simply running the following script:

python3 tools/datasets/mscoco_packer.py --src-data-dir ./tmp_data/mscoco/ --target-data-dir ./tmp_data/mscoco --split-name train --pack-type lmdb
python3 tools/datasets/mscoco_packer.py --src-data-dir ./tmp_data/mscoco/ --target-data-dir ./tmp_data/mscoco --split-name val --pack-type lmdb

These two commands are for training dataset conversion and validation dataset conversion respectively. When the packing is done, the file structure of data should look as follows:

tmp_data
|-- mscoco
    |-- annotations
    |-- train2017
    |-- train_lmdb
    |-- val2017
    |-- val_lmdb

The above train_lmdb and val_lmdb are the packed training dataset and validation dataset, which are also the final datasets read by the network.
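The benefit of packing is that training reads one large file sequentially instead of opening thousands of small JPEGs. The idea can be sketched with the standard library; this is a conceptual stand-in for what `mscoco_packer.py` does, with a plain pickle file in place of LMDB, and `pack_images` is a hypothetical helper, not HAT's API:

```python
import os
import pickle

def pack_images(src_dir: str, target_file: str) -> int:
    """Pack every file under src_dir into a single pickle file,
    keyed by integer index, so the training loop performs one
    sequential read instead of many small-file reads."""
    samples = {}
    for idx, name in enumerate(sorted(os.listdir(src_dir))):
        with open(os.path.join(src_dir, name), "rb") as f:
            samples[idx] = f.read()
    with open(target_file, "wb") as f:
        pickle.dump(samples, f)
    return len(samples)
```

The real packer additionally serializes the matching annotations alongside each image, so a sample can be decoded without touching the JSON files at training time.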

Floating-point Model Training

Once datasets are ready, you can start training the floating-point FCOS-efficientnet detection network. Before the network training starts, you can first test the number of operations and parameters of the network using the following commands:

python3 tools/calops.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py --input-shape "1,3,1024,1024"

If you simply want to start such a training task, just run the following command:

python3 tools/train.py --stage float --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

Since the HAT algorithm toolkit uses a registration mechanism, every training task can be started as train.py plus a config file.

train.py is a uniform training script that is independent of the task. The task to train, the dataset to use, and the training hyperparameters are all specified in the config file.

The config file provides the key dict for model building, data reading, etc.
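The registration mechanism behind this can be illustrated with a minimal sketch (the names `OBJECT_REGISTRY`, `register`, and `build_from_cfg` are illustrative, not HAT's actual API): a class registers itself under a string name, and a builder turns a config dict whose `type` key names that class into an instance.

```python
OBJECT_REGISTRY = {}

def register(cls):
    """Class decorator: make the class buildable from a config dict."""
    OBJECT_REGISTRY[cls.__name__] = cls
    return cls

def build_from_cfg(cfg: dict):
    """Instantiate the class named by cfg['type'], passing the
    remaining keys as constructor arguments."""
    cfg = dict(cfg)  # do not mutate the caller's config
    cls = OBJECT_REGISTRY[cfg.pop("type")]
    return cls(**cfg)

@register
class FCOSHead:
    def __init__(self, num_classes, feat_channels=64):
        self.num_classes = num_classes
        self.feat_channels = feat_channels

# A config dict like those in this tutorial becomes a live object:
head = build_from_cfg(dict(type="FCOSHead", num_classes=80))
```

This is why swapping a component in the config is enough to change the model: the builder only looks at the `type` string and forwards the rest of the dict as keyword arguments.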

Export Fixed-point Model

Once quantization training is complete, you can export the fixed-point model with the following command:

python3 tools/export_hbir.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

Model Verification

After completing the training, we have trained floating-point, calibrated, or fixed-point models. Using the same approach as training, we can run metrics validation on a trained model and obtain the Float, Calibration, and Quantized metrics, i.e., the floating-point, calibration, and fully fixed-point accuracy, respectively.

python3 tools/predict.py --stage "float" --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py
python3 tools/predict.py --stage "calibration" --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

Similar to model training, --stage followed by "float" or "calibration" validates the trained floating-point model or the calibrated model, respectively.

The following command can be used to verify the accuracy of a fixed-point model, but it should be noted that hbir must be exported first:

python3 tools/predict.py --stage "int_infer" --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

Model Inference

HAT provides the infer_hbir.py script to visualize the inference results for the fixed-point model:

python3 tools/infer_hbir.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py --model-inputs img:${img-path} --save-path ${save_path}

Simulated On-board Accuracy Verification

In addition to the above model validation, we provide an accuracy validation method identical to the on-board environment, which can be accomplished by:

python3 tools/validation_hbir.py --stage "align_bpu" --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

Fixed-point Model Checking and Compilation

Since the quantization training toolchain integrated in HAT mainly targets Horizon's processors, the quantized models must be checked and compiled.

We provide an interface for model checking in HAT, which allows the user to define a quantized model and then check whether it can run properly on the BPU first.

python3 tools/model_checker.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

After the model is trained, you can use the compile_perf_hbir script to compile the quantized model into an HBM file that supports on-board running. The tool can also predict the performance on the BPU.

python3 tools/compile_perf_hbir.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

The above is the whole process from data preparation to the generation of a quantized, deployable model.

Training Details

This section explains the considerations for model training, mainly the config-related settings.

Model Building

The network structure of FCOS can be found in the paper, so we skip the details here.

We can easily define and modify the model by defining a dict type variable like model in the config file.

model = dict(
    type="FCOS",
    backbone=dict(
        type="efficientnet",
        bn_kwargs=bn_kwargs,
        model_type="b0",
        num_classes=1000,
        include_top=False,
        activation="relu",
        use_se_block=False,
    ),
    neck=dict(
        type="BiFPN",
        in_strides=[2, 4, 8, 16, 32],
        out_strides=[8, 16, 32, 64, 128],
        stride2channels=dict({2: 16, 4: 24, 8: 40, 16: 112, 32: 320}),
        out_channels=64,
        num_outs=5,
        stack=3,
        start_level=2,
        end_level=-1,
        fpn_name="bifpn_sum",
    ),
    head=dict(
        type="FCOSHead",
        num_classes=num_classes,
        in_strides=[8, 16, 32, 64, 128],
        out_strides=[8, 16, 32, 64, 128],
        stride2channels=dict({8: 64, 16: 64, 32: 64, 64: 64, 128: 64}),
        upscale_bbox_pred=True,
        feat_channels=64,
        stacked_convs=4,
        int8_output=False,
        int16_output=True,
        dequant_output=True,
    ),
    targets=dict(
        type="DynamicFcosTarget",
        strides=[8, 16, 32, 64, 128],
        cls_out_channels=80,
        background_label=80,
        topK=10,
        loss_cls=dict(
            type="FocalLoss",
            loss_name="cls",
            num_classes=80 + 1,
            alpha=0.25,
            gamma=2.0,
            loss_weight=1.0,
            reduction="none",
        ),
        loss_reg=dict(
            type="GIoULoss",
            loss_name="reg",
            loss_weight=2.0,
            reduction="none",
        ),
    ),
    post_process=dict(
        type="FCOSDecoder",
        num_classes=80,
        strides=[8, 16, 32, 64, 128],
        nms_use_centerness=True,
        nms_sqrt=True,
        rescale=True,
        test_cfg=dict(
            score_thr=0.05,
            nms_pre=1000,
            nms=dict(name="nms", iou_threshold=0.6, max_per_img=100),
        ),
    ),
    loss_cls=dict(
        type="FocalLoss",
        loss_name="cls",
        num_classes=80 + 1,
        alpha=0.25,
        gamma=2.0,
        loss_weight=1.0,
    ),
    loss_centerness=dict(
        type="CrossEntropyLossV2",
        loss_name="centerness",
        use_sigmoid=True,
    ),
    loss_reg=dict(
        type="GIoULoss",
        loss_name="reg",
        loss_weight=1.0,
    ),
)

The type under model is the name of the defined model, and the remaining variables stand for the other components of the model.

By defining the model in this way, we can easily replace the structure we want. For example, if we want to train a model with a backbone of resnet50, we just need to replace backbone under model.
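For instance, swapping the backbone is one dict assignment in the config. The `resnet50` type name and its arguments below are illustrative only; check the actual backbone registry in HAT for the expected keys:

```python
# Sketch: swap the backbone entry of the model dict in the config.
# The "resnet50" type name and its arguments are illustrative only.
model = dict(
    type="FCOS",
    backbone=dict(type="efficientnet", model_type="b0", include_top=False),
    # neck=..., head=..., targets=..., unchanged
)

# Replace the backbone; neck/head channel settings may also need
# adjusting to match the new backbone's output channels.
model["backbone"] = dict(
    type="resnet50",
    include_top=False,
)
```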

Data Augmentation

Like the definition of model, the data augmentation process is implemented by defining two dicts, data_loader and val_data_loader, in the config file, corresponding to the processing pipelines of the training set and validation set, respectively.

Take data_loader as an example:

data_loader = dict(
    type=torch.utils.data.DataLoader,
    dataset=dict(
        type="Coco",
        data_path="./tmp_data/coco/train_lmdb/",
        transforms=[
            dict(
                type="Resize",
                img_scale=(512, 512),
                ratio_range=(0.5, 2.0),
                keep_ratio=True,
            ),
            dict(type="RandomCrop", size=(512, 512)),
            dict(type="Pad", divisor=512),
            dict(type="RandomFlip", px=0.5, py=0),
            dict(type="AugmentHSV", hgain=0.015, sgain=0.7, vgain=0.4),
            dict(type="ToTensor", to_yuv=True),
            dict(type="Normalize", mean=128.0, std=128.0),
        ],
    ),
    sampler=dict(type=torch.utils.data.DistributedSampler),
    batch_size=batch_size_per_gpu,
    shuffle=False,
    num_workers=4,
    pin_memory=True,
    collate_fn=dict(type="Collate2D"),
)

The outer type directly uses PyTorch's own interface torch.utils.data.DataLoader, which combines batch_size images into one batch. The main thing to note is the dataset variable: the Coco dataset type reads images from the LMDB dataset, using the path produced in the Dataset Preparation section. transforms contains a series of data augmentations. The data transforms in val_data_loader are the same as in data_loader except for image flipping (RandomFlip). You can also implement your own data augmentations by inserting a new dict into transforms.
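Conceptually, the transforms list is applied to each sample in order. A minimal sketch of such a pipeline follows; the transform classes here are simplified stand-ins operating on a list of pixel rows, not HAT's implementations:

```python
import random

class RandomFlip:
    """Horizontally flip a sample (a list of pixel rows) with probability px."""
    def __init__(self, px=0.5):
        self.px = px

    def __call__(self, sample):
        if random.random() < self.px:
            sample = [row[::-1] for row in sample]
        return sample

class Normalize:
    """Shift and scale pixel values: (v - mean) / std."""
    def __init__(self, mean=128.0, std=128.0):
        self.mean, self.std = mean, std

    def __call__(self, sample):
        return [[(v - self.mean) / self.std for v in row] for row in sample]

def apply_transforms(sample, transforms):
    """Run the sample through each transform in order."""
    for t in transforms:
        sample = t(sample)
    return sample

# px=0.0 disables flipping, making this run deterministic.
pipeline = [RandomFlip(px=0.0), Normalize(mean=128.0, std=128.0)]
out = apply_transforms([[0, 128, 255]], pipeline)
```

Each transform takes a sample and returns a sample, so inserting your own augmentation is just a matter of adding another callable (or, in the config, another dict) to the list.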

Training Strategy

A good training strategy is essential for training a model with high accuracy.

For each training task, the corresponding training strategy is also defined in the config file, as can be seen from the float_trainer variable.

float_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    data_loader=data_loader,
    optimizer=dict(
        type=torch.optim.SGD,
        params={"weight": dict(weight_decay=4e-5)},
        lr=0.14,
        momentum=0.937,
        nesterov=True,
    ),
    batch_processor=batch_processor,
    num_epochs=300,
    device=None,
    callbacks=[
        stat_callback,
        loss_show_update,
        dict(type="ExponentialMovingAverage"),
        dict(
            type="CosLrUpdater",
            warmup_len=2,
            warmup_by="epoch",
            stage_log_interval=1,
        ),
        val_callback,
        ckpt_callback,
    ],
    train_metrics=dict(
        type="LossShow",
    ),
    sync_bn=True,
    val_metrics=dict(
        type="COCODetectionMetric",
        ann_file="./tmp_data/coco/annotations/instances_val2017.json",
    ),
)

float_trainer defines the training approach in general, including the use of multi-card distributed training (distributed_data_parallel_trainer), the number of epochs for model training, and the choice of optimizer.

The callbacks reflect the smaller strategies used during training and the operations the user wants to perform, including the learning-rate schedule (CosLrUpdater), validating the model's metrics during training (Validation), and saving checkpoints (Checkpoint).

Of course, if you have your own operations that you want the model to implement during the training, you can also add them in this way using dict.
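The callback pattern is straightforward to sketch: the trainer calls a hook on every callback at fixed points in the loop. Below, a simplified CosLrUpdater-style callback warms the learning rate up linearly for warmup_len epochs and then follows a cosine decay; this is a conceptual sketch with illustrative hook names, not HAT's implementation:

```python
import math

class CosLrCallback:
    """Linear warmup for warmup_len epochs, cosine decay afterwards."""
    def __init__(self, base_lr, warmup_len, num_epochs):
        self.base_lr = base_lr
        self.warmup_len = warmup_len
        self.num_epochs = num_epochs

    def on_epoch_begin(self, epoch, state):
        if epoch < self.warmup_len:
            # Linear ramp: 1/warmup_len, 2/warmup_len, ... of base_lr.
            lr = self.base_lr * (epoch + 1) / self.warmup_len
        else:
            # Cosine decay from base_lr down to ~0 over the remaining epochs.
            progress = (epoch - self.warmup_len) / max(1, self.num_epochs - self.warmup_len)
            lr = 0.5 * self.base_lr * (1 + math.cos(math.pi * progress))
        state["lr"] = lr

def train(num_epochs, callbacks):
    """Minimal loop: invoke every callback at the start of each epoch."""
    state = {"lr": 0.0}
    history = []
    for epoch in range(num_epochs):
        for cb in callbacks:
            cb.on_epoch_begin(epoch, state)
        history.append(state["lr"])
    return history

# Mirrors the float_trainer settings: lr=0.14, warmup_len=2, num_epochs=300.
lrs = train(300, [CosLrCallback(base_lr=0.14, warmup_len=2, num_epochs=300)])
```

A user-defined operation is just another object with the same hooks appended to the callbacks list; the trainer never needs to know what each callback does.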

The float_trainer links the entire training logic together, and the model it produces also serves as the pretrained model for the later stages.

Note

If you need to reproduce the reported accuracy, it is better not to change the training strategy in the config, to avoid unexpected training behavior.

With the above information, you should have a clearer understanding of the functions of the config file.

The training script mentioned earlier can help you train a pure floating-point detection model with high accuracy. However, a well-trained floating-point detection model is not the end goal; it is just the pretrained starting point for training a fixed-point model.

Quantized Model Training

With a pure floating-point model in place, we can start training the corresponding fixed-point model. Similar to the floating-point training, we can train the fixed-point model simply by running the following script.

As noted earlier, it is recommended to add a calibration stage before quantization-aware training; calibration provides better initialization parameters for QAT.

python3 tools/train.py --stage calibration --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py
python3 tools/train.py --stage qat --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

As you can see, the configuration file does not change; only the stage does. At this point, the training strategy comes from the qat_trainer in the config file.

qat_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    data_loader=data_loader,
    optimizer=dict(
        type=torch.optim.SGD,
        params={"weight": dict(weight_decay=4e-5)},
        lr=0.001,
        momentum=0.9,
    ),
    batch_processor=batch_processor,
    num_epochs=10,
    device=None,
    callbacks=[
        stat_callback,
        loss_show_update,
        dict(
            type="StepDecayLrUpdater",
            lr_decay_id=[4],
            stage_log_interval=500,
        ),
        val_callback,
        ckpt_callback,
    ],
    train_metrics=dict(
        type="LossShow",
    ),
    val_metrics=dict(
        type="COCODetectionMetric",
        ann_file="./tmp_data/coco/annotations/instances_val2017.json",
    ),
)

With Different Quantization Parameters

By setting quantize=True when training, the corresponding floating-point model is converted into a quantized model, as below:

model.fuse_model()
model.set_qconfig()
horizon.quantization.prepare_qat(model, inplace=True)

For the key stages of quantized training, e.g., preparing the floating-point model, operator substitution, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion, please read the Quantization-Aware Training (QAT) section.
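The core of the inserted quantization nodes is "fake quantization": during the forward pass, values are rounded to an int8 grid so the network learns to tolerate the rounding error, while training still runs in floating point. A minimal numeric sketch, not the toolchain's actual operator:

```python
def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Quantize x to the int8 grid defined by scale, then dequantize.
    The output is a float that carries the int8 rounding/clipping error."""
    q = round(x / scale)
    q = max(qmin, min(qmax, q))  # saturate to the int8 range
    return q * scale

# With scale = 1/128, representable values step by 0.0078125.
scale = 1.0 / 128
y = fake_quantize(0.5, scale)    # exactly representable: 64 * scale
z = fake_quantize(10.0, scale)   # out of range: clipped to 127 * scale
```

QAT works because the forward pass sees exactly these grid-snapped values, so the loss already accounts for the error the fixed-point model will incur on the BPU.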

With Different Training Strategies

As previously mentioned, quantized training is finetuning based on the pure floating-point training. Therefore, in quantized training we lower the initial learning rate substantially (0.001 here, versus 0.14 for floating-point training), and the number of training epochs drops greatly as well (10 versus 300). More importantly, when defining the model, we need to set pretrained to the path of the trained pure floating-point model.
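These adjustments amount to a few overrides on top of the float settings. A sketch of the deltas follows; the `pretrained` key name and the checkpoint path are illustrative, so check the actual config for the exact field:

```python
# Sketch of how the QAT stage adjusts the float training setup.
# The "pretrained" key and checkpoint path are illustrative only.
float_cfg = dict(lr=0.14, num_epochs=300, pretrained=None)

qat_overrides = dict(
    lr=0.001,       # much lower than the float stage's 0.14
    num_epochs=10,  # versus 300 float epochs
    pretrained="path/to/float_checkpoint.pth",  # hypothetical path
)

# Everything else (model, data_loader, callbacks) is reused as-is.
qat_cfg = {**float_cfg, **qat_overrides}
```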

After these simple adjustments, we can start training the quantized model.

Model Validation

After the model is trained, we can also validate the performance of the trained model. Since we provide two stages of training, float and QAT, we can validate the performance of the trained model in these two stages.

Run the following two commands:

python3 tools/predict.py --stage float --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py --ckpt ${float-checkpoint-path}
python3 tools/predict.py --stage qat --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py --ckpt ${qat-checkpoint-path}

Also, we provide a performance test for the quantized model; run the following command:

python3 tools/predict.py --stage int_infer --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py --ckpt ${int-infer-checkpoint-path}

The displayed accuracy is the real accuracy of the final int8 model, which should be very close to the accuracy from the QAT validation stage.

Simulation of On-board Accuracy Validation

In addition to the above model validation, we also provide the exact same accuracy validation method simulating the on-board conditions, as below:

python3 tools/align_bpu_validation.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py

Result Visualization

If you want to see the results of the trained model detecting a single image, we also provide scripts for single image prediction and visualization under our tools folder. Run the following script:

python3 tools/infer.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py --model-inputs imgs:${img-path} --save-path ${save_path}

Model Checking and Compilation

After training, you can use the compile tool to compile the quantized model into a board-ready HBM file. The compile tool can also predict the model's running performance on the computing platform. Run the following script:

python3 tools/compile_perf.py --config configs/detection/fcos/fcos_efficientnetb0_mscoco.py --out-dir ./ --opt 3