lidarMultiTask Model Training

This tutorial shows how to use HAT to train a lidarMultiTask model from scratch on the nuScenes lidar point-cloud dataset, covering the floating-point, QAT, and quantized models.

Dataset Preparation

Before training the model, the first step is to prepare the dataset: download the full dataset (v1.0) and nuScenes-lidarseg from the nuScenes website.

After downloading, unzip and organize the folders into the structure below (the nuScenes tutorials can be used as a reference):

├── data
│   ├── nuscenes
│   │   ├── lidarseg
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-mini
│   │   ├── v1.0-trainval

To improve training speed, we pack the nuScenes lidar files and their data information files into LMDB-format datasets. The conversion is done by simply running the following scripts:

python3 tools/datasets/nuscenes_packer.py --src-data-dir data/nuscenes/ --pack-type lmdb --target-data-dir tmp_data/nuscenes/lidar_seg/v1.0-trainval --version v1.0-trainval --split-name val --only-lidar

python3 tools/datasets/nuscenes_packer.py --src-data-dir data/nuscenes/ --pack-type lmdb --target-data-dir tmp_data/nuscenes/lidar_seg/v1.0-trainval --version v1.0-trainval --split-name train --only-lidar

The two commands above pack the validation and training datasets, respectively. After packing completes, the file structure in the data directory should look as follows:

├── tmp_data
│   ├── nuscenes
│   │   ├── lidar_seg
│   │   │   ├── v1.0-trainval
│   │   │   │   ├── train_lmdb  # Newly generated lmdb
│   │   │   │   ├── val_lmdb    # Newly generated lmdb
│   │   ├── meta
│   │   │   ├── maps
│   │   │   ├── v1.0-mini
│   │   │   ├── v1.0-trainval

train_lmdb and val_lmdb are the packed training and validation datasets, which are the final datasets read by the network. meta contains the information needed for metric initialization, copied from nuscenes.

Also, for nuScenes point-cloud training, a database entry must be generated for each individual training target in the dataset and stored as a .bin file under tmp_nuscenes/lidar/nuscenes_gt_database (the path can be modified as needed). A .pkl file indexing this database is generated at the same time. In addition, we need to collect the category information of all samples in the training dataset and resample the whole dataset, so this information is also generated and saved in .pkl format to accelerate training. Create these files by running the following command:

python3 tools/create_data.py --dataset nuscenes --root-dir ./tmp_data/nuscenes/lidar_seg/v1.0-trainval --extra-tag nuscenes --out-dir tmp_nuscenes/lidar

After executing the above command, the following file directory is generated:

├── tmp_data
│   ├── nuscenes
│   │   ├── lidar_seg
│   │   │   ├── v1.0-trainval
│   │   │   │   ├── train_lmdb  # Newly generated lmdb
│   │   │   │   ├── val_lmdb    # Newly generated lmdb
│   │   ├── meta
│   │   │   ├── maps
│   │   │   ├── v1.0-mini
│   │   │   ├── v1.0-trainval
├── tmp_nuscenes
│   ├── lidar
│   │   ├── nuscenes_gt_database  # Newly generated
│   │   │   ├── xxxxx.bin
│   │   ├── nuscenes_dbinfos_train.pkl  # Newly generated
│   │   ├── nuscenes_infos_train.pkl    # Newly generated

nuscenes_gt_database and nuscenes_dbinfos_train.pkl hold the samples used for ground-truth sampling during training, and nuscenes_infos_train.pkl accelerates training-dataset initialization.
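The ground-truth sampling these files support (copy-pasting stored objects into training frames as augmentation) can be sketched as follows. This is a toy illustration, not HAT's db_sampler: real entries reference .bin point files, while here they are plain dicts, and the function names are invented for the sketch.

```python
import random


def sample_gt_objects(dbinfos, class_sample_nums, seed=0):
    """Pick extra ground-truth objects per class from the database,
    as a GT-sampling augmentation would.

    dbinfos: maps class name -> list of object records (in HAT these
    reference .bin point files; here they are plain dicts).
    class_sample_nums: maps class name -> how many objects to paste.
    """
    rng = random.Random(seed)
    sampled = []
    for cls, num in class_sample_nums.items():
        pool = dbinfos.get(cls, [])
        if pool:
            # sample without replacement, capped at the pool size
            sampled.extend(rng.sample(pool, min(num, len(pool))))
    return sampled


dbinfos = {"car": [{"id": i} for i in range(5)], "bus": [{"id": 9}]}
extra = sample_gt_objects(dbinfos, {"car": 2, "bus": 3})
print(len(extra))  # 2 cars + 1 bus (bus pool has only one entry)
```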

Floating-point Model Training

Once the dataset is ready, you can start training the floating-point lidarMultiTask network. Before training starts, you can check the computational cost and the parameter count of the network with the following command:

python3 tools/calops.py --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

If you simply want to start such a training task, just run the following command:

python3 tools/train.py --stage float --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

Since the HAT algorithm package uses a registration mechanism, every training task can be started with train.py plus a config file. train.py is a unified training script that is independent of the task: which task to train, which dataset to use, and the training hyperparameters are all specified in the config file, which provides key dicts such as model building and data loading.

Model Building

The network structure of lidarMultiTask is based on the CenterPoint model, with a modified neck and an additional segmentation head; details can be found in the config file. We can easily define and modify the model through a dict-type variable such as model in the config file.

model = dict(
    type="LidarMultiTask",
    feature_map_shape=get_feature_map_size(point_cloud_range, voxel_size),
    pre_process=dict(
        type="CenterPointPreProcess",
        pc_range=point_cloud_range,
        voxel_size=voxel_size,
        max_voxels_num=max_voxels,
        max_points_in_voxel=max_num_points,
        norm_range=[-51.2, -51.2, -5.0, 0.0, 51.2, 51.2, 3.0, 255.0],
        norm_dims=[0, 1, 2, 3],
    ),
    reader=dict(
        type="PillarFeatureNet",
        num_input_features=5,
        num_filters=(64,),
        with_distance=False,
        pool_size=(max_num_points, 1),
        voxel_size=voxel_size,
        pc_range=point_cloud_range,
        bn_kwargs=norm_cfg,
        quantize=True,
        use_4dim=True,
        use_conv=True,
        hw_reverse=True,
    ),
    scatter=dict(
        type="PointPillarScatter",
        num_input_features=64,
        use_horizon_pillar_scatter=True,
        quantize=True,
    ),
    backbone=dict(
        type="MixVarGENet",
        net_config=net_config,
        disable_quanti_input=True,
        input_channels=64,
        input_sequence_length=1,
        num_classes=1000,
        bn_kwargs=bn_kwargs,
        include_top=False,
        bias=True,
        output_list=[0, 1, 2, 3, 4],
    ),
    neck=dict(
        type="Unet",
        in_strides=(2, 4, 8, 16, 32),
        out_strides=(4,),
        stride2channels=dict({2: 64, 4: 64, 8: 64, 16: 96, 32: 160}),
        out_stride2channels=dict({2: 128, 4: 128, 8: 128, 16: 128, 32: 160}),
        factor=2,
        group_base=8,
        bn_kwargs=bn_kwargs,
    ),
    lidar_decoders=[
        dict(
            type="LidarSegDecoder",
            name="seg",
            task_weight=80.0,
            task_feat_index=0,
            head=dict(
                type="DepthwiseSeparableFCNHead",
                input_index=0,
                in_channels=128,
                feat_channels=64,
                num_classes=2,
                dropout_ratio=0.1,
                num_convs=2,
                bn_kwargs=bn_kwargs,
                int8_output=False,
            ),
            target=dict(type="FCNTarget"),
            loss=dict(
                type="CrossEntropyLoss",
                loss_name="seg",
                reduction="mean",
                ignore_index=-1,
                use_sigmoid=False,
                class_weight=[1.0, 10.0],
            ),
            decoder=dict(
                type="FCNDecoder",
                upsample_output_scale=4,
                use_bce=False,
                bg_cls=-1,
            ),
        ),
        dict(
            type="LidarDetDecoder",
            name="det",
            task_weight=1.0,
            task_feat_index=0,
            head=dict(
                type="DepthwiseSeparableCenterPointHead",
                in_channels=128,
                tasks=tasks,
                share_conv_channels=64,
                share_conv_num=1,
                common_heads=common_heads,
                head_conv_channels=64,
                init_bias=-2.19,
                final_kernel=3,
            ),
            target=dict(
                type="CenterPointLidarTarget",
                grid_size=[512, 512, 1],
                voxel_size=voxel_size,
                point_cloud_range=point_cloud_range,
                tasks=tasks,
                dense_reg=1,
                max_objs=500,
                gaussian_overlap=0.1,
                min_radius=2,
                out_size_factor=4,
                norm_bbox=True,
                with_velocity=with_velocity,
            ),
            loss=dict(
                type="CenterPointLoss",
                loss_cls=dict(type="GaussianFocalLoss", loss_weight=1.0),
                loss_bbox=dict(
                    type="L1Loss",
                    reduction="mean",
                    loss_weight=0.25,
                ),
                with_velocity=with_velocity,
                code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
            ),
            decoder=dict(
                type="CenterPointPostProcess",
                tasks=tasks,
                norm_bbox=True,
                bbox_coder=dict(
                    type="CenterPointBBoxCoder",
                    pc_range=point_cloud_range[:2],
                    post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
                    max_num=100,
                    score_threshold=0.1,
                    out_size_factor=4,
                    voxel_size=voxel_size[:2],
                ),
                # test_cfg
                max_pool_nms=False,
                score_threshold=0.1,
                post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
                min_radius=[4, 12, 10, 1, 0.85, 0.175],
                out_size_factor=4,
                nms_type="rotate",
                pre_max_size=1000,
                post_max_size=83,
                nms_thr=0.2,
                box_size=9,
            ),
        ),
    ],
)

Here, type under model is the name of the registered model, and the remaining keys define the model's other components. The advantage of defining the model this way is that we can easily swap in the structure we want. After startup, the training script calls the build_model interface to convert this dict-type description into a torch.nn.Module model:

def build_model(cfg, default_args=None):
    if cfg is None:
        return None
    assert "type" in cfg, "type is need in model"
    return build_from_cfg(cfg, MODELS)
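The registry idea behind build_from_cfg can be sketched in a few lines. This is a minimal toy registry for illustration, not HAT's actual implementation; the decorator and class names here are invented:

```python
# Minimal sketch of a config-driven registry: a dict with a "type"
# key is turned into a live object by looking the type name up in a
# registry and passing the remaining keys as constructor kwargs.

MODELS = {}  # registry: maps a type name to a class


def register(cls):
    """Class decorator that records the class under its own name."""
    MODELS[cls.__name__] = cls
    return cls


def build_from_cfg(cfg, registry):
    """Instantiate registry[cfg['type']] with the remaining keys as kwargs."""
    cfg = dict(cfg)  # copy so the caller's config dict is untouched
    obj_type = cfg.pop("type")
    return registry[obj_type](**cfg)


@register
class ToyModel:
    def __init__(self, in_channels, num_classes):
        self.in_channels = in_channels
        self.num_classes = num_classes


model = build_from_cfg(dict(type="ToyModel", in_channels=64, num_classes=2), MODELS)
print(type(model).__name__, model.in_channels, model.num_classes)
```

This is why swapping a component only requires editing the config dict: any registered class with a matching constructor can be dropped in under a different type name.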

Data Augmentation

Like the model definition, data loading and augmentation are implemented by defining two dicts, data_loader and val_data_loader, in the config file, corresponding to the training and validation sets, respectively. Here we take data_loader as an example:

train_dataset = dict(
    type="NuscenesLidarWithSegDataset",
    num_sweeps=9,
    data_path=os.path.join(data_rootdir, "train_lmdb"),
    info_path=os.path.join(gt_data_root, "nuscenes_infos_train.pkl"),
    load_dim=5,
    use_dim=[0, 1, 2, 3, 4],
    pad_empty_sweeps=True,
    remove_close=True,
    use_valid_flag=True,
    classes=det_class_names,
    transforms=[
        dict(
            type="LidarMultiPreprocess",
            class_names=det_class_names,
            global_rot_noise=[-0.3925, 0.3925],
            global_scale_noise=[0.95, 1.05],
            db_sampler=db_sampler,
        ),
        dict(
            type="ObjectRangeFilter",
            point_cloud_range=point_cloud_range,
        ),
        dict(
            type="AssignSegLabel",
            bev_size=[512, 512],
            num_classes=2,
            class_names=[0, 1],
            point_cloud_range=point_cloud_range,
            voxel_size=voxel_size[:2],
        ),
        dict(type="LidarReformat", with_gt=True),
    ],
)

data_loader = dict(
    type=torch.utils.data.DataLoader,
    dataset=dict(type="CBGSDataset", dataset=train_dataset),
    sampler=dict(type=torch.utils.data.DistributedSampler),
    batch_size=batch_size_per_gpu,
    shuffle=False,
    num_workers=4,
    pin_memory=False,
    collate_fn=hat.data.collates.collate_lidar3d,
)

Here, type directly uses PyTorch's own torch.utils.data.DataLoader interface, which assembles batch_size samples into a batch. You mainly need to pay attention to the dataset variable: its data_path is the path we produced in the dataset-preparation step, and transforms contains a series of data augmentations. In val_data_loader, only point-cloud reading, segmentation-label generation, and data reformatting are used. You can also implement your own data augmentation operations by inserting a new dict into transforms.
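The CBGSDataset wrapper around train_dataset performs class-balanced grouping and sampling, re-drawing sample indices so rare classes contribute a fair share of each epoch. The following is a simplified sketch of that resampling idea, assuming toy inputs; HAT's actual CBGSDataset operates on the dataset's annotation infos and uses a different duplication rule:

```python
import collections
import random


def cbgs_resample_indices(sample_classes, seed=0):
    """Class-balanced resampling sketch: duplicate sample indices so
    that every class contributes a roughly equal share of the epoch.

    sample_classes: list mapping sample index -> set of class names
    present in that sample (one lidar frame can contain many classes).
    """
    rng = random.Random(seed)
    # group sample indices by the classes they contain
    class_to_samples = collections.defaultdict(list)
    for idx, classes in enumerate(sample_classes):
        for c in classes:
            class_to_samples[c].append(idx)
    # draw an equal share of the epoch from every class's pool
    share = max(1, len(sample_classes) // len(class_to_samples))
    epoch = []
    for pool in class_to_samples.values():
        epoch.extend(rng.choices(pool, k=share))  # with replacement
    rng.shuffle(epoch)
    return epoch


frames = [{"car"}, {"car"}, {"car", "pedestrian"}, {"car"}]
print(cbgs_resample_indices(frames))  # pedestrian-only frame 2 is oversampled
```

Rare classes (here, the single pedestrian frame) get repeated, which is why CBGS helps detection heads on long-tailed datasets such as nuScenes.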

Training Strategy

To train a model with high accuracy, a good training strategy is essential. For each training task, the corresponding strategy is also defined in the config file, in the variable float_trainer:

float_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    data_loader=data_loader,
    optimizer=dict(
        type=torch.optim.AdamW,
        betas=(0.95, 0.99),
        lr=2e-4,
        weight_decay=0.01,
    ),
    batch_processor=batch_processor,
    num_epochs=20,
    device=None,
    callbacks=[
        stat_callback,
        loss_show_update,
        dict(
            type="CyclicLrUpdater",
            target_ratio=(10, 1e-4),
            cyclic_times=1,
            step_ratio_up=0.4,
            step_log_interval=200,
        ),
        grad_callback,
        val_callback,
        ckpt_callback,
    ],
    sync_bn=True,
    train_metrics=dict(
        type="LossShow",
    ),
)

float_trainer defines the training setup at a high level: the use of distributed_data_parallel_trainer, the number of training epochs, and the choice of optimizer. Meanwhile, callbacks holds the finer-grained strategies applied during training and any operations the user wants to run, including the learning-rate schedule (CyclicLrUpdater), metric evaluation (Validation), and model saving (Checkpoint). If there are other operations you want executed during training, you can add them in the same dict form. float_trainer stitches the whole training logic together and is also responsible for producing the pretrained model used later.
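The CyclicLrUpdater parameters above describe a one-cycle schedule: with target_ratio=(10, 1e-4) and step_ratio_up=0.4, the learning rate ramps from the base lr up to 10x over the first 40% of steps, then anneals down to base_lr * 1e-4. A sketch of that shape using linear interpolation (the actual updater may anneal with a different curve):

```python
def cyclic_lr(step, total_steps, base_lr=2e-4,
              target_ratio=(10.0, 1e-4), step_ratio_up=0.4):
    """One-cycle learning-rate sketch: ramp up to base_lr*target_ratio[0]
    during the first step_ratio_up fraction of training, then anneal
    down to base_lr*target_ratio[1]. Linear interpolation for simplicity.
    """
    up_steps = int(total_steps * step_ratio_up)
    peak = base_lr * target_ratio[0]
    floor = base_lr * target_ratio[1]
    if step < up_steps:
        t = step / max(1, up_steps)          # warm-up phase
        return base_lr + (peak - base_lr) * t
    t = (step - up_steps) / max(1, total_steps - up_steps)  # decay phase
    return peak + (floor - peak) * t


for s in (0, 40, 100):
    print(s, cyclic_lr(s, 100))
```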

Note

If you need to reproduce the accuracy, it is best not to modify the training strategy in the config. Otherwise, unexpected training situations may arise.

Through the above introduction, you should have a clear understanding of the role of the config file. With the training script mentioned above, a high-accuracy pure floating-point model can then be trained. Of course, a good floating-point model is not the ultimate goal; it serves as pretraining for the quantized models that follow.

Quantized Model Training

Once we have a floating-point model, we can start training the corresponding QAT model. In the same way as floating-point training, we can train a QAT model by running the following scripts. Note that it is recommended to add a calibration stage before quantization-aware training, as calibration provides better initialization parameters for QAT:

python3 tools/train.py --stage calibration --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

python3 tools/train.py --stage qat --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

As you can see, the configuration file has not changed; only the stage has. The training strategy now comes from qat_trainer in the config file:

qat_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        qconfig_params=dict(
            activation_qat_qkwargs=dict(
                averaging_constant=0,
            ),
            weight_qat_qkwargs=dict(
                averaging_constant=1,
            ),
        ),
        converters=[
            dict(type="Float2QAT", convert_mode=convert_mode),
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "calibration-checkpoint-last.pth.tar"
                ),
            ),
        ],
    ),
    data_loader=data_loader,
    optimizer=dict(
        type=torch.optim.SGD,
        weight_decay=0.0,
        lr=1e-4,
        momentum=0.9,
    ),
    batch_processor=batch_processor,
    num_epochs=10,
    device=None,
    callbacks=[
        stat_callback,
        loss_show_update,
        dict(
            type="CyclicLrUpdater",
            target_ratio=(10, 1e-4),
            cyclic_times=1,
            step_ratio_up=0.4,
            step_log_interval=200,
        ),
        grad_callback,
        val_callback,
        ckpt_callback,
    ],
    train_metrics=dict(
        type="LossShow",
    ),
)
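The averaging_constant values in qconfig_params control how the quantization observers update their ranges during QAT: 0 for activations keeps the ranges obtained from calibration frozen, while 1 for weights makes the range always follow the current batch. A minimal sketch of such a moving-average min/max update (illustrative only; the real observer logic lives in the quantization toolkit):

```python
def update_range(old_min, old_max, batch_min, batch_max, averaging_constant):
    """Moving-average min/max observer update (sketch).

    averaging_constant=0 -> keep the calibrated range frozen.
    averaging_constant=1 -> always adopt the current batch's range.
    Values in between blend the old range toward the new one.
    """
    c = averaging_constant
    new_min = old_min + c * (batch_min - old_min)
    new_max = old_max + c * (batch_max - old_max)
    return new_min, new_max


print(update_range(-1.0, 1.0, -2.0, 3.0, 0))    # frozen: (-1.0, 1.0)
print(update_range(-1.0, 1.0, -2.0, 3.0, 1))    # tracking: (-2.0, 3.0)
print(update_range(-1.0, 1.0, -2.0, 3.0, 0.5))  # blended: (-1.5, 2.0)
```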

With Different model_convert_pipeline Parameters

By setting model_convert_pipeline when training quantized models, the corresponding floating-point model definition can be converted into a quantized model, as below:

model_convert_pipeline=dict(
    type="ModelConvertPipeline",
    qat_mode="fuse_bn",
    converters=[
        dict(type="Float2QAT", convert_mode=convert_mode),
        dict(
            type="LoadCheckpoint",
            checkpoint_path=os.path.join(
                ckpt_dir, "qat-checkpoint-best.pth.tar"
            ),
        ),
        dict(type="QAT2Quantize", convert_mode=convert_mode),
    ],
)

For the key steps of quantized training, such as preparing the floating-point model, operator replacement, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion, please read the Quantization-Aware Training (QAT) section.
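The quantization/dequantization nodes mentioned above implement what is commonly called fake quantization: values are rounded onto the integer grid and mapped straight back to float, so the training-time forward pass sees the same rounding error the int8 model will. A scalar sketch of the idea (illustrative only; real implementations work on tensors and handle gradients with a straight-through estimator):

```python
def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Quantize-dequantize a scalar: round x onto the int8 grid defined
    by `scale`, clamp to the representable range, then map back to float.
    """
    q = max(qmin, min(qmax, round(x / scale)))  # integer code, clamped
    return q * scale                            # back to float


print(fake_quantize(0.34, 0.1))   # snapped to the nearest multiple of 0.1
print(fake_quantize(100.0, 0.1))  # saturates at 127 * 0.1
```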

With Different Training Strategies

As noted before, quantized training is essentially finetuning on top of pure floating-point training. Therefore, for quantized training the initial learning rate is set to half of the floating-point value, the number of training epochs is reduced, and, most importantly, when the model is defined, the pretrained checkpoint must point to the trained pure floating-point model or the calibration model.

After these simple adjustments, we can start training the quantized model.

Export FixedPoint Model

Once quantization-aware training is complete, you can export the fixed-point model with the following command:

python3 tools/export_hbir.py --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

Model Validation

After the models are trained, we can also validate their performance. Since the training process has two stages, float and qat, models from both stages can be validated. Run the following two commands:

python3 tools/predict.py --stage float --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

python3 tools/predict.py --stage qat --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

We also provide an accuracy evaluation of the quantized model; just run the following command. Note that the hbir model must be exported first:

python3 tools/predict.py --stage "int_infer" --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

The displayed accuracy is the real accuracy of the final int8 model; it should be very close to the accuracy measured in the qat validation stage.

Simulation of On-board Accuracy Validation

In addition to the model validation described above, we offer an accuracy-validation method that matches the on-board results exactly; you can refer to the following:

python3 tools/validation_hbir.py --stage "align_bpu" --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py

Model Inference and Results Visualization

If you want to see the detection results of the trained model on a lidar point-cloud file, we also provide point-cloud prediction and visualization scripts in the tools folder; just run the following script:

python3 tools/infer_hbir.py --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py --model-inputs input_points:${lidar-pointcloud-path} --save-path ${save_path}

Model Checking and Compilation

After training, the quantized model can be compiled into an hbm file runnable on the board using the compile_perf_hbir tool, which can also estimate the model's runtime performance on the BPU. The following script can be used:

python3 tools/compile_perf_hbir.py --config configs/lidar_multi_task/centerpoint_mixvargnet_multitask_nuscenes.py