BEV Multi-task Model Training

The BEV reference algorithm is developed on top of Horizon Torch Samples (Horizon's own deep learning framework); see the Horizon Torch Samples usage documentation for an introduction. The training configs for the BEV reference algorithm are located under the HAT/configs/bev/ path. The following sections take HAT/configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py as an example to describe how to configure and train the BEV reference algorithm.

Training Process

If you simply want to train the BEV model, read this section first.

As with other tasks, HAT runs all training and evaluation tasks in the form of tools + config.

After preparing the original dataset, follow the steps below to complete the whole training process.

Dataset Preparation

Here we take the nuScenes dataset as an example; it can be downloaded from https://www.nuscenes.org/nuscenes . In addition, to improve training speed, we pack the original JPG-format dataset into LMDB format. Running the following commands completes the conversion.

python3 tools/datasets/nuscenes_packer.py --src-data-dir WORKSPACE/datasets/nuscenes/ --pack-type lmdb --target-data-dir . --version v1.0-trainval --split-name val
python3 tools/datasets/nuscenes_packer.py --src-data-dir WORKSPACE/datasets/nuscenes/ --pack-type lmdb --target-data-dir . --version v1.0-trainval --split-name train

The two commands above pack the validation set and the training set, respectively. After packing is complete, the file structure in the data directory should look as follows.

tmp_data
|-- nuscenes
|   |-- metas
|   |-- v1.0-trainval
|   |-- train_lmdb
|   |-- val_lmdb

train_lmdb and val_lmdb are the packed training and validation datasets; these are what the network eventually reads. metas holds the map information needed by the segmentation task.
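As a quick sanity check after packing, the expected layout can be verified programmatically. A minimal sketch (the root and directory names follow the tree shown above; this helper is not part of HAT):

```python
from pathlib import Path

# Directory names expected under <root>/nuscenes after packing,
# matching the tree shown above.
EXPECTED = ("metas", "v1.0-trainval", "train_lmdb", "val_lmdb")

def missing_nuscenes_dirs(root):
    """Return the expected entries that are missing under <root>/nuscenes."""
    base = Path(root) / "nuscenes"
    return [name for name in EXPECTED if not (base / name).exists()]
```

An empty return value means the packed dataset is laid out the way the config expects.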

Model Training

Before starting training, you can compute the model's operation count and number of parameters with the following command:

python3 tools/calops.py --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

The next step is to start training, which is also done with a script. Before training, make sure the dataset path specified in the config has been changed to the path of the packed dataset.

python3 tools/train.py --stage "float" --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py
python3 tools/train.py --stage "calibration" --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

Since the HAT algorithm package uses a registration mechanism, every training task can be started as train.py plus a config file. train.py is a uniform, task-independent training script; the task to train, the dataset to use, and the training-related hyperparameters are all specified in the config file.
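The registration mechanism can be sketched as follows. This is not HAT's actual implementation, only a minimal illustration of how a type string in a config dict is resolved to a registered class and instantiated:

```python
REGISTRY = {}

def register(cls):
    """Register a class under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls

def build(cfg):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    cfg = dict(cfg)  # copy so the original config dict is not mutated
    return REGISTRY[cfg.pop("type")](**cfg)

@register
class NuscenesDataset:
    def __init__(self, data_path):
        self.data_path = data_path

# A config dict is all that is needed to build the object.
dataset = build(dict(type="NuscenesDataset", data_path="tmp_data/nuscenes/train_lmdb"))
```

This is why swapping a component only requires editing a dict in the config file, never the training script itself.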

The parameter after --stage in the above commands can be "float" or "calibration", indicating floating-point training and calibration of the quantized model, respectively. Calibration depends on the floating-point checkpoint produced by the preceding floating-point training.

Export Fixed-point Model

Once quantization training is complete, you can export the fixed-point model with the following command:

python3 tools/export_hbir.py --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

Model Verification

After training we obtain the trained floating-point, calibrated, and fixed-point models. Using the same tool-plus-config approach as for training, we can run metric validation on each of them and obtain the Float, Calibration, and Quantized metrics, i.e. the floating-point, calibrated, and fully fixed-point accuracy, respectively.

python3 tools/predict.py --stage "float" --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py
python3 tools/predict.py --stage "calibration" --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

As with training, --stage followed by "float" or "calibration" validates the trained floating-point model or the calibrated model, respectively.

The following command can be used to verify the accuracy of a fixed-point model, but it should be noted that hbir must be exported first:

python3 tools/predict.py --stage "int_infer" --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

Model Inference

HAT provides the infer_hbir.py script to visualize the inference results for the fixed-point model:

python3 tools/infer_hbir.py --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py --model-inputs img:${img-path} --save-path ${save_path}

Simulation Board Accuracy Verification

In addition to the above model validation, we provide an accuracy validation method identical to the on-board environment, which can be accomplished by:

python3 tools/validation_hbir.py --stage "align_bpu" --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

Fixed-point Model Checking and Compilation

Since the quantization training toolchain integrated in HAT mainly targets Horizon's processors, the quantized model must be checked and compiled.

HAT provides a model-checking interface that lets the user first verify whether a defined quantized model can run properly on the BPU.

python3 tools/model_checker.py --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

After the model is trained, you can use the compile_perf_hbir script to compile the quantized model into an HBM file that can run on the board. The tool also predicts the model's performance on the BPU.

python3 tools/compile_perf_hbir.py --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

The above is the whole process from data preparation to the generation of a quantized, deployable model.

Training Details

In this section, we explain some things to consider during model training, mainly the config-related settings.

Model Construction

model = dict(
    type="BevStructure",
    bev_feat_index=-1,
    backbone=dict(
        type="efficientnet",
        bn_kwargs=bn_kwargs,
        model_type="b0",
        num_classes=1000,
        include_top=False,
        activation="relu",
        use_se_block=False,
    ),
    neck=dict(
        type="FastSCNNNeck",
        in_channels=[40, 320],
        feat_channels=[64, 64],
        indexes=[-3, -1],
        bn_kwargs=bn_kwargs,
    ),
    view_transformer=dict(
        type="WrappingTransformer",
        bev_upscale=2,
        bev_size=bev_size,
        num_views=6,
        drop_prob=0.1,
        grid_quant_scale=grid_quant_scale,
    ),
    bev_transforms=[
        dict(
            type="BevRotate",
            bev_size=bev_size,
            rot=(-0.3925, 0.3925),
        ),
        dict(type="BevFlip", prob_x=0.5, prob_y=0.5, bev_size=bev_size),
    ],
    bev_encoder=dict(
        type="BevEncoder",
        backbone=dict(
            type="efficientnet",
            bn_kwargs=bn_kwargs,
            model_type="b0",
            num_classes=1000,
            include_top=False,
            activation="relu",
            use_se_block=False,
            in_channels=64,
        ),
        neck=dict(
            type="BiFPN",
            in_strides=[2, 4, 8, 16, 32],
            out_strides=[2, 4, 8, 16, 32],
            stride2channels=dict({2: 16, 4: 24, 8: 40, 16: 112, 32: 320}),
            out_channels=48,
            num_outs=5,
            stack=3,
            start_level=0,
            end_level=-1,
            fpn_name="bifpn_sum",
        ),
    ),
    bev_decoders=[
        dict(
            type="BevSegDecoder",
            name="bev_seg",
            use_bce=use_bce,
            task_weight=10.0,
            bev_size=bev_size,
            task_size=task_map_size,
            head=dict(
                type="DepthwiseSeparableFCNHead",
                input_index=0,
                in_channels=48,
                feat_channels=48,
                num_classes=seg_classes,
                dropout_ratio=0.1,
                num_convs=2,
                bn_kwargs=bn_kwargs,
            ),
            target=dict(
                type="FCNTarget",
            ),
            loss=dict(
                type="CrossEntropyLossV2",
                loss_name="decode",
                reduction="mean",
                ignore_index=-1,
                use_sigmoid=use_bce,
                class_weight=2.0 if use_bce else [1.0, 5.0, 5.0, 5.0],
            ),
            decoder=dict(
                type="FCNDecoder",
                upsample_output_scale=1,
                use_bce=use_bce,
                bg_cls=-1,
            ),
        ),
        dict(
            type="BevDetDecoder",
            name="bev_det",
            task_weight=1.0,
            head=dict(
                type="CenterPoint3dHead",
                in_channels=48,
                tasks=tasks,
                share_conv_channels=48,
                share_conv_num=1,
                common_heads=dict(
                    reg=(2, 2),
                    height=(1, 2),
                    dim=(3, 2),
                    rot=(2, 2),
                    vel=(2, 2),
                ),
                head_conv_channels=48,
                num_heatmap_convs=2,
                final_kernel=3,
            ),
            target=dict(
                type="CenterPoint3dTarget",
                class_names=NuscenesDataset.CLASSES,
                tasks=tasks,
                gaussian_overlap=0.1,
                min_radius=2,
                out_size_factor=1,
                norm_bbox=True,
                max_num=500,
                bbox_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
            ),
            loss_cls=dict(type="GaussianFocalLoss", loss_weight=1.0),
            loss_reg=dict(
                type="L1Loss",
                loss_weight=0.25,
            ),
            decoder=dict(
                type="CenterPoint3dDecoder",
                class_names=NuscenesDataset.CLASSES,
                tasks=tasks,
                bev_size=bev_size,
                out_size_factor=1,
                use_max_pool=True,
                max_pool_kernel=3,
                score_threshold=0.1,
                nms_type=[
                    "rotate",
                    "rotate",
                    "rotate",
                    "circle",
                    "rotate",
                    "rotate",
                ],
                min_radius=[4, 12, 10, 1, 0.85, 0.175],
                nms_threshold=[0.2, 0.2, 0.2, 0.2, 0.2, 0.5],
                decode_to_ego=True,
            ),
        ),
    ],
)

Here, type under model gives the name of the defined model structure, and the remaining keys define the model's other components. The advantage of defining the model this way is that components can easily be replaced. For example, to train a model with a ResNet-50 backbone, we only need to replace backbone under model.
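For illustration, swapping the backbone only requires replacing one entry of the dict. The keys of the resnet50 entry below are assumptions (the real keys a HAT resnet50 component expects may differ); the point is only the mechanism:

```python
# Heavily trimmed copy of the model config above.
model = dict(
    type="BevStructure",
    backbone=dict(type="efficientnet", model_type="b0", include_top=False),
)

# Replace the backbone component; the exact keys a "resnet50"
# entry expects are hypothetical here.
model["backbone"] = dict(type="resnet50", include_top=False)
```

No other part of the config or the training script needs to change.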

Data Augmentation

Like the model definition, the data augmentation pipeline is implemented by defining two dicts, data_loader and val_data_loader, in the config file, corresponding to the processing flows of the training set and the validation set. Take data_loader as an example.

data_loader = dict(
    type=torch.utils.data.DataLoader,
    dataset=dict(
        type="NuscenesDataset",
        data_path=os.path.join(data_rootdir, "train_lmdb"),
        transforms=[
            dict(type="BevImgResize", scales=(0.6, 0.8)),
            dict(type="BevImgCrop", size=(512, 960), random=True),
            dict(type="BevImgFlip", prob=0.5),
            dict(type="BevImgRotate", rot=(-5.4, 5.4)),
            dict(
                type="BevImgTransformWrapper",
                transforms=[
                    dict(type="PILToTensor"),
                    dict(type="BgrToYuv444", rgb_input=True),
                    dict(type="Normalize", mean=128.0, std=128.0),
                ],
            ),
        ],
        bev_size=bev_size,
        map_size=map_size,
        map_path=meta_rootdir,
    ),
    sampler=dict(type=torch.utils.data.DistributedSampler),
    batch_size=batch_size_per_gpu,
    shuffle=True,
    num_workers=dataloader_workers,
    pin_memory=True,
    collate_fn=hat.data.collates.collate_nuscenes,
)

val_data_loader = dict(
    type=torch.utils.data.DataLoader,
    dataset=dict(
        type="NuscenesDataset",
        data_path=os.path.join(data_rootdir, "val_lmdb"),
        transforms=[
            dict(type="BevImgResize", size=(540, 960)),
            dict(type="BevImgCrop", size=(512, 960)),
            dict(
                type="BevImgTransformWrapper",
                transforms=[
                    dict(type="PILToTensor"),
                    dict(type="BgrToYuv444", rgb_input=True),
                    dict(type="Normalize", mean=128.0, std=128.0),
                ],
            ),
        ],
        bev_size=bev_size,
        map_size=map_size,
        map_path=meta_rootdir,
    ),
    sampler=dict(type=torch.utils.data.DistributedSampler),
    batch_size=batch_size_per_gpu,
    shuffle=False,
    num_workers=dataloader_workers,
    pin_memory=True,
    collate_fn=hat.data.collates.collate_nuscenes,
)

Here type directly uses PyTorch's torch.utils.data.DataLoader interface, which combines batch_size images into a batch. The variable to pay attention to is dataset: NuscenesDataset reads images from the LMDB dataset, and data_path is the packed-dataset path from the dataset-preparation step above. transforms contains a series of data augmentations. val_data_loader uses the same preprocessing as data_loader, except that the random augmentations (random crop, BevImgFlip, BevImgRotate) are removed and the resize and crop are deterministic. You can add the augmentation you want by inserting a new dict into transforms.
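Because transforms is an ordinary Python list of dicts, inserting a new augmentation is a one-line change. BevImgColorJitter below is a hypothetical transform name, used only to show where the dict goes:

```python
# Trimmed copy of the training transforms above.
transforms = [
    dict(type="BevImgResize", scales=(0.6, 0.8)),
    dict(type="BevImgCrop", size=(512, 960), random=True),
    dict(type="BevImgFlip", prob=0.5),
]

# Insert a hypothetical augmentation right after the resize step.
transforms.insert(1, dict(type="BevImgColorJitter", brightness=0.2))
```

The registry then builds each transform from its dict when the data loader is constructed.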

Training Strategies

To train a high-accuracy model, a good training strategy is essential. For each training task, the corresponding training strategy is also defined in the config file, as can be seen from the variable float_trainer.

float_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path=(
                    "./tmp_pretrained_models/efficientnet_imagenet/float-checkpoint-best.pth.tar"  # noqa: E501
                ),
                allow_miss=True,
                ignore_extra=True,
            ),
        ],
    ),
    data_loader=data_loader,
    optimizer=dict(
        type=torch.optim.AdamW,
        params={"weight": dict(weight_decay=weight_decay)},
        lr=start_lr,
    ),
    batch_processor=batch_processor,
    device=None,
    num_epochs=train_epochs,
    callbacks=[
        stat_callback,
        loss_show_update,
        dict(
            type="CyclicLrUpdater",
            target_ratio=(10, 1e-4),
            cyclic_times=1,
            step_ratio_up=0.4,
            step_log_interval=500,
        ),
        val_callback,
        ckpt_callback,
    ],
    sync_bn=True,
    train_metrics=dict(
        type="LossShow",
    ),
)

float_trainer defines the overall training approach, including the use of distributed_data_parallel_trainer, the number of training epochs, and the choice of optimizer. The callbacks reflect the smaller strategies applied during training and the operations the user wants to run, including the learning-rate schedule (CyclicLrUpdater), in-training metric validation (Validation), and model saving (Checkpoint). If there are other operations you want performed during training, you can add them in the same dict form.
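One common interpretation of a cyclic schedule with target_ratio=(10, 1e-4) and step_ratio_up=0.4 is a linear ramp from the base learning rate up to 10x over the first 40% of the cycle, followed by a linear decay down to 1e-4x. A sketch under that assumption (not CyclicLrUpdater's actual code):

```python
def cyclic_lr(step, total_steps, base_lr, target_ratio=(10, 1e-4), step_ratio_up=0.4):
    """Piecewise-linear cyclic schedule: base_lr -> base_lr*10 -> base_lr*1e-4."""
    up_steps = int(total_steps * step_ratio_up)
    if step < up_steps:
        # Ramp-up phase: base_lr rises linearly toward base_lr * target_ratio[0].
        t = step / up_steps
        return base_lr * (1.0 + t * (target_ratio[0] - 1.0))
    # Decay phase: falls linearly from the peak down to base_lr * target_ratio[1].
    t = (step - up_steps) / (total_steps - up_steps)
    return base_lr * (target_ratio[0] + t * (target_ratio[1] - target_ratio[0]))
```

With cyclic_times=1, the whole training run is a single such cycle.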

Note

If you need to reproduce the reported accuracy, it is best not to modify the training strategy in the config; otherwise unexpected training behavior may occur.

With the above introduction, you should have a clearer understanding of what the config file does, and you can train a high-accuracy pure floating-point model with the training script mentioned earlier. Of course, a good floating-point model is not the ultimate goal; it only serves as the pretrained model for the fixed-point training that follows.

Quantized Model Training

Once we have a pure floating-point model, we can start producing the corresponding fixed-point model. As with floating-point training, we only need to run the following script to obtain a pseudo-quantized model; for this model, calibration alone is enough to reach the accuracy goal.

python3 tools/train.py --stage calibration --config configs/bev/bev_ipm_efficientnetb0_multitask_nuscenes.py

As you can see, the config file is unchanged; only the stage has changed. The training strategy used at this point comes from calibration_trainer in the config file.

calibration_trainer = dict(
    type="Calibrator",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "float-checkpoint-best.pth.tar"
                ),
                allow_miss=True,
                verbose=True,
            ),
            dict(type="Float2Calibration", convert_mode=convert_mode),
        ],
    ),
    data_loader=calibration_data_loader,
    batch_processor=calibration_batch_processor,
    num_steps=calibration_step,
    device=None,
    callbacks=[
        val_callback,
        ckpt_callback,
    ],
    log_interval=calibration_step / 10,
)
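Conceptually, calibration feeds num_steps batches through the model, observes the activation ranges, and derives quantization scales from them instead of updating any weights. A minimal per-tensor max-abs sketch of that idea (not HAT's Calibrator, which uses its own observers):

```python
def max_abs_scale(batches, num_bits=8):
    """Symmetric per-tensor scale derived from the largest observed activation."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for batch in batches for v in batch)
    return max_abs / qmax

# Pretend these are activations observed over a few calibration batches.
scale = max_abs_scale([[-3.0, 1.0], [2.5, -0.5]])
```

Because no gradients are needed, calibration runs in far fewer steps than floating-point training.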

The Value of the quantize Parameter Differs

When training the quantized model, we need to set quantize=True; the corresponding floating-point model is then converted into a quantized model. The relevant code is as follows.

model.fuse_model()
model.set_qconfig()
horizon.quantization.prepare_qat(model, inplace=True)

For the key steps in quantization training, such as preparing the floating-point model, operator substitution, inserting quantization and dequantization nodes, setting quantization parameters, and fusing operators, please read the Quantization-Aware Training (QAT) section.
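The inserted quantization/dequantization ("fake quantize") nodes round each value to the nearest representable int8 step and immediately convert it back to float, so the network trains against the quantization error it will see on the BPU. A minimal scalar sketch of one such node:

```python
def fake_quantize(x, scale, num_bits=8):
    """Quantize x onto an integer grid with the given scale, then dequantize."""
    qmin = -(2 ** (num_bits - 1))      # -128 for int8
    qmax = 2 ** (num_bits - 1) - 1     #  127 for int8
    q = min(qmax, max(qmin, round(x / scale)))  # quantize with clamping
    return q * scale                   # dequantize back to float
```

Values outside the representable range saturate at the grid's endpoints, which is why a good calibration scale matters.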

Different Training Strategies

As mentioned before, quantization training is in fact a finetune on top of pure floating-point training. Therefore, during quantization training the initial learning rate is set to one tenth of that of floating-point training, and the number of training epochs is greatly reduced. Most importantly, when defining the model, the pretrained checkpoint must be set to the path of the already-trained pure floating-point model.
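Under hypothetical numbers (the real values live in the config file), those adjustments amount to something like:

```python
import os

ckpt_dir = "./tmp_models"            # hypothetical checkpoint directory
float_start_lr = 2e-4                # hypothetical floating-point starting lr

# One tenth of the floating-point learning rate for the quantization finetune.
quant_start_lr = float_start_lr / 10

# The quantization trainer loads the trained floating-point model as its
# pretrained checkpoint (cf. the LoadCheckpoint converter in calibration_trainer).
pretrained = os.path.join(ckpt_dir, "float-checkpoint-best.pth.tar")
```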

After making these simple adjustments, we can start training our quantized model.