PointPillars Detection Model Training (No config)

This tutorial focuses on how to use HAT to train a PointPillars model from scratch on the KITTI-3DObject LiDAR point cloud dataset, covering the floating-point, quantized, and fixed-point models.

Dataset Preparation

Before starting to train the model, the first step is to prepare the dataset: download the 3DObject dataset. The following 4 files are included:

  1. Left color images of object data set
  2. Velodyne point clouds
  3. Camera calibration matrices of object data set
  4. Training labels of object data set

After downloading the above 4 files, unzip and organize the folder structure as follows:

├── tmp_data
│   ├── kitti3d
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── velodyne
│   │   ├── training
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── label_2
│   │   │   ├── velodyne

In order to create the KITTI point cloud data, the original point cloud data needs to be loaded, and an associated annotation file containing the target labels and annotation boxes needs to be generated. It is also necessary to extract the point cloud of each individual training target and store it as a .bin file in tmp_data/kitti3d/kitti3d_gt_database. In addition, .pkl files containing the data information need to be generated for both the training and validation data. Then, create the KITTI data by running the following commands:

mkdir ./tmp_data/kitti3d/ImageSets
# Download dataset split files from the community
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/trainval.txt
python3 tools/create_data.py --dataset "kitti3d" --root-dir "./tmp_data/kitti3d"

After executing the above command, the following file directory is generated:

├── tmp_data
│   ├── kitti3d
│   │   ├── ImageSets
│   │   │   ├── test.txt
│   │   │   ├── train.txt
│   │   │   ├── trainval.txt
│   │   │   ├── val.txt
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── velodyne
│   │   │   ├── velodyne_reduced             # Newly generated
│   │   ├── training
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── label_2
│   │   │   ├── velodyne
│   │   │   ├── velodyne_reduced             # Newly generated
│   │   ├── kitti3d_gt_database              # Newly generated
│   │   │   ├── xxxxx.bin
│   │   ├── kitti3d_infos_train.pkl          # Newly generated
│   │   ├── kitti3d_infos_val.pkl            # Newly generated
│   │   ├── kitti3d_dbinfos_train.pkl        # Newly generated
│   │   ├── kitti3d_infos_test.pkl           # Newly generated
│   │   ├── kitti3d_infos_trainval.pkl       # Newly generated
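The xxxxx.bin files under kitti3d_gt_database, like the raw velodyne scans, are plain float32 point dumps. A minimal sketch of the read pattern, using a dummy file rather than real KITTI data:

```python
import numpy as np

# LiDAR points are stored as raw float32 (x, y, z, intensity) records.
# Write a small dummy file just to demonstrate the read pattern.
dummy = np.random.rand(32, 4).astype(np.float32)
dummy.tofile("example_points.bin")

# Reading follows the same pattern for velodyne scans and gt_database files.
points = np.fromfile("example_points.bin", dtype=np.float32).reshape(-1, 4)
print(points.shape)  # (32, 4)
```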

Also, to improve training speed, we pack the data information files into lmdb-format datasets. The conversion can be achieved simply by running the following scripts:

python3 tools/datasets/kitti3d_packer.py --src-data-dir ./tmp_data/kitti3d/ --target-data-dir ./tmp_data/kitti3d --split-name train --pack-type lmdb
python3 tools/datasets/kitti3d_packer.py --src-data-dir ./tmp_data/kitti3d/ --target-data-dir ./tmp_data/kitti3d --split-name val --pack-type lmdb

The above two commands correspond to transforming the training dataset and the validation dataset, respectively. After the packing is completed, the file structure in the data directory should look as follows:

├── tmp_data
│   ├── kitti3d
│   │   ├── pack_data                # Newly generated lmdb
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── ImageSets
│   │   │   ├── test.txt
│   │   │   ├── train.txt
│   │   │   ├── trainval.txt
│   │   │   ├── val.txt
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── velodyne
│   │   │   ├── velodyne_reduced
│   │   ├── training
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   ├── label_2
│   │   │   ├── velodyne
│   │   │   ├── velodyne_reduced
│   │   ├── kitti3d_gt_database
│   │   │   ├── xxxxx.bin
│   │   ├── kitti3d_infos_train.pkl
│   │   ├── kitti3d_infos_val.pkl
│   │   ├── kitti3d_dbinfos_train.pkl
│   │   ├── kitti3d_infos_test.pkl
│   │   ├── kitti3d_infos_trainval.pkl

The train and val lmdb directories under pack_data are the packed training and validation datasets, which are the final datasets read by the network. kitti3d_gt_database and kitti3d_dbinfos_train.pkl hold the per-object ground-truth samples used for sampling augmentation during training.

Floating-point Model Training

Once the dataset is ready, you can start training the PointPillars detection network.

Model Building

For the network structure of PointPillars, refer to the paper; it is not described in detail here.

The entire process from model training to compilation is roughly as follows:

model_pipeline

As the figure above shows, three model stages are mainly used, namely the Float Model, the QAT Model, and the Quantized Model, where:

  • Float Model: the general floating-point model.
  • QAT Model: the model used for quantization-aware training.
  • Quantized Model: the quantized model, whose parameters are of INT8 type.

In addition, models with different structures (or states) are used in training, compilation, and other processes:

  • model: the complete model structure, including pre-processing, the network structure, and post-processing. It is mainly used for training and evaluation.
  • deploy_model: contains only the network structure (which can be compiled into an hbm), excluding pre-processing and post-processing. It is mainly used for compilation.
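The model / deploy_model split can be illustrated with a toy nn.Module (a hypothetical stand-in, not HAT's actual class): the deploy variant shares the network weights but skips post-processing, returning the raw outputs that get compiled:

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Toy illustration of the model / deploy_model split."""

    def __init__(self, is_deploy=False):
        super().__init__()
        self.backbone = nn.Conv2d(1, 4, 3, padding=1)  # the network structure
        self.is_deploy = is_deploy

    def forward(self, x):
        feat = self.backbone(x)
        if self.is_deploy:
            return feat            # raw network output: what gets compiled
        return feat.sigmoid()      # stand-in for post-processing

x = torch.randn(1, 1, 8, 8)
full = TinyDetector(is_deploy=False)
deploy = TinyDetector(is_deploy=True)
deploy.load_state_dict(full.state_dict())  # same weights, different outputs
```

The same idea appears below in `_build_pp_model`, where `is_deploy=True` simply sets the post-processing submodules to None.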

We define a PointPillarsModel class that holds everything related to the model structure, covering the three stages above (Float Model, QAT Model, and Quantized Model) as well as the two states (model and deploy_model):

class PointPillarsModel:
    task_name = "pointpillars_kitti_car"

    @classmethod
    def model(cls):
        """Model structure."""
        model = cls._build_pp_model(cls, is_deploy=False)
        return model

    @classmethod
    def deploy_model(cls):
        """Deploy model, for compile."""
        deploy_model = cls._build_pp_model(cls, is_deploy=True)
        return deploy_model

    @classmethod
    def deploy_inputs(cls):
        """Deploy inputs, for compile."""
        deploy_inputs = dict(  # noqa C408
            points=[
                torch.randn(150000, 4),
            ],
        )
        return deploy_inputs

    @classmethod
    def float_model(cls, pretrain_ckpt=None):
        """Float model."""
        model = cls.model()
        if pretrain_ckpt:
            ckpt_loader = LoadCheckpoint(
                pretrain_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            model = ckpt_loader(model)
        return model

    @classmethod
    def qat_model(cls, pre_step_ckpt=None, pretrain_ckpt=None):
        """QAT model."""
        float_model = cls.float_model(pre_step_ckpt)
        qat_model = Float2QAT()(float_model)
        if pretrain_ckpt:
            ckpt_loader = LoadCheckpoint(
                pretrain_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            qat_model = ckpt_loader(qat_model)
        return qat_model

    @classmethod
    def int_infer_model(cls, pre_step_ckpt=None, pretrain_ckpt=None, use_deploy=False):
        if use_deploy:  # for compilation
            model = cls.deploy_model()  # float model
        else:  # for prediction
            model = cls.model()
        model = Float2QAT()(model)  # qat model
        if pre_step_ckpt:  # load QAT checkpoint
            qat_ckpt_loader = LoadCheckpoint(
                pre_step_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            model = qat_ckpt_loader(model)  # qat model with state_dict
        int_model = QAT2Quantize()(model)
        if pretrain_ckpt:
            int_ckpt_loader = LoadCheckpoint(
                pretrain_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            int_model = int_ckpt_loader(int_model)
        return int_model

    def _build_pp_model(self, is_deploy=False):
        """Model structure."""
        # Voxelization cfg
        pc_range = [0, -39.68, -3, 69.12, 39.68, 1]
        voxel_size = [0.16, 0.16, 4.0]
        max_points_in_voxel = 100
        max_voxels_num = 12000
        class_names = ["Car"]

        def get_feature_map_size(point_cloud_range, voxel_size):
            point_cloud_range = np.array(point_cloud_range, dtype=np.float32)
            voxel_size = np.array(voxel_size, dtype=np.float32)
            grid_size = (
                point_cloud_range[3:] - point_cloud_range[:3]
            ) / voxel_size
            grid_size = np.round(grid_size).astype(np.int64)
            return grid_size

        model = PointPillarsDetector(
            feature_map_shape=get_feature_map_size(pc_range, voxel_size),
            is_deploy=is_deploy,
            pre_process=PointPillarsPreProcess(
                pc_range=pc_range,
                voxel_size=voxel_size,
                max_voxels_num=max_voxels_num,
                max_points_in_voxel=max_points_in_voxel,
            ),
            reader=PillarFeatureNet(
                num_input_features=4,
                num_filters=(64,),
                with_distance=False,
                pool_size=(1, max_points_in_voxel),
                voxel_size=voxel_size,
                pc_range=pc_range,
                bn_kwargs=None,
                quantize=True,
                use_4dim=True,
                use_conv=True,
            ),
            backbone=PointPillarScatter(
                num_input_features=64,
                use_horizon_pillar_scatter=True,
                quantize=True,
            ),
            neck=SECONDNeck(
                in_feature_channel=64,
                down_layer_nums=[3, 5, 5],
                down_layer_strides=[2, 2, 2],
                down_layer_channels=[64, 128, 256],
                up_layer_strides=[1, 2, 4],
                up_layer_channels=[128, 128, 128],
                bn_kwargs=None,
                quantize=True,
            ),
            head=PointPillarsHead(
                num_classes=len(class_names),
                in_channels=sum([128, 128, 128]),
                use_direction_classifier=True,
            ),
            anchor_generator=Anchor3DGeneratorStride(
                anchor_sizes=[[1.6, 3.9, 1.56]],  # noqa B006
                anchor_strides=[[0.32, 0.32, 0.0]],  # noqa B006
                anchor_offsets=[[0.16, -39.52, -1.78]],  # noqa B006
                rotations=[[0, 1.57]],  # noqa B006
                class_names=class_names,
                match_thresholds=[0.6],
                unmatch_thresholds=[0.45],
            ),
            targets=LidarTargetAssigner(
                box_coder=GroundBox3dCoder(n_dim=7),
                class_names=class_names,
                positive_fraction=-1,
            ),
            loss=PointPillarsLoss(
                num_classes=len(class_names),
                loss_cls=FocalLossV2(
                    alpha=0.25,
                    gamma=2.0,
                    from_logits=False,
                    reduction="none",
                    loss_weight=1.0,
                ),
                loss_bbox=SmoothL1Loss(
                    beta=1 / 9.0,
                    reduction="none",
                    loss_weight=2.0,
                ),
                loss_dir=CrossEntropyLoss(
                    use_sigmoid=False,
                    reduction="none",
                    loss_weight=0.2,
                ),
            ),
            postprocess=PointPillarsPostProcess(
                num_classes=len(class_names),
                box_coder=GroundBox3dCoder(n_dim=7),
                use_direction_classifier=True,
                num_direction_bins=2,
                # test_cfg
                use_rotate_nms=False,
                nms_pre_max_size=1000,
                nms_post_max_size=300,
                nms_iou_threshold=0.5,
                score_threshold=0.4,
                post_center_limit_range=[0, -39.68, -5, 69.12, 39.68, 5],
                max_per_img=100,
            ),
        )

        if is_deploy:
            model.anchor_generator = None
            model.targets = None
            model.loss = None
            model.postprocess = None
        return model
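The get_feature_map_size helper above determines the resolution of the BEV pseudo-image that the pillar features are scattered into. With the ranges configured in the code, it works out as follows:

```python
import numpy as np

pc_range = np.array([0, -39.68, -3, 69.12, 39.68, 1], dtype=np.float32)
voxel_size = np.array([0.16, 0.16, 4.0], dtype=np.float32)

# (range extent) / (voxel size), rounded to integers
grid_size = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(np.int64)
print(grid_size)  # [432 496 1] -> a 432 x 496 pseudo-image with a single z slab
```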

Everything related to the model is defined in PointPillarsModel above; when using it, you can easily obtain the corresponding model structure through PointPillarsModel.xxx().

After completing the definition of the network structure, we can use the following command to check the FLOPs and parameter count of the network:

python3 examples/pointpillars.py --calops
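Conceptually, the parameter count reported by --calops is just a sum over model.parameters(); a toy sketch with hypothetical layer sizes (echoing the 4-to-64 channel lift of the pillar feature reader, not the full network):

```python
import torch.nn as nn

# Illustrative block only: a 1x1 conv lifting 4 input features to 64 channels.
block = nn.Sequential(nn.Conv2d(4, 64, 1), nn.BatchNorm2d(64), nn.ReLU())

n_params = sum(p.numel() for p in block.parameters())
print(n_params)  # Conv2d: 4*64 + 64 = 320; BatchNorm2d: 64 + 64 = 128 -> 448
```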

Data Augmentation

Similar to the model building section, we define a DataHelper class to implement everything data-related, including the transforms and data loaders:

class DataHelper:
    data_dir = "./tmp_data/kitti3d"  # root path of dataset
    train_batch_size = 2  # batch_size for training
    val_batch_size = 1  # batch_size for evaluation

    @classmethod
    def train_data_loader(cls):
        """Train dataloader."""
        return cls.build_dataloader(cls, is_training=True)

    @classmethod
    def val_data_loader(cls):
        """Val dataloader."""
        return cls.build_dataloader(cls, is_training=False)

    def build_dataloader(self, is_training=True):
        """Dataloader."""
        transforms = self.build_transforms(self, self.data_dir, is_training)
        split_dir = "train_lmdb" if is_training else "val_lmdb"
        dataset = Kitti3D(
            data_path=os.path.join(self.data_dir, split_dir),
            transforms=transforms,
        )
        if is_training:
            sampler = torch.utils.data.DistributedSampler(dataset)
        else:
            sampler = None
        dataloader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.train_batch_size if is_training else self.val_batch_size,
            sampler=sampler,
            shuffle=False,
            num_workers=1,
            pin_memory=True,
            collate_fn=hat.data.collates.collate_kitti3d,
        )
        return dataloader

    def build_transforms(self, data_dir, is_training=True):
        """Transforms."""
        class_names = ["Car"]
        pc_range = [0, -39.68, -3, 69.12, 39.68, 1]
        if is_training:
            transforms = torchvision.transforms.Compose(
                [
                    ObjectSample(
                        class_names=class_names,
                        remove_points_after_sample=False,
                        db_sampler=DataBaseSampler(
                            enable=True,
                            root_path=data_dir,
                            db_info_path=os.path.join(
                                data_dir, "kitti3d_dbinfos_train.pkl"
                            ),  # noqa E501
                            sample_groups=[dict(Car=15)],  # noqa C408
                            db_prep_steps=[  # noqa C408
                                dict(  # noqa C408
                                    type="DBFilterByDifficulty",
                                    filter_by_difficulty=[-1],
                                ),
                                dict(  # noqa C408
                                    type="DBFilterByMinNumPoint",
                                    filter_by_min_num_points=dict(  # noqa C408
                                        Car=5,
                                    ),
                                ),
                            ],
                            global_random_rotation_range_per_object=[0, 0],
                            rate=1.0,
                        ),
                    ),
                    ObjectNoise(
                        gt_rotation_noise=[-0.15707963267, 0.15707963267],
                        gt_loc_noise_std=[0.25, 0.25, 0.25],
                        global_random_rot_range=[0, 0],
                        num_try=100,
                        class_names=class_names,
                    ),
                    PointRandomFlip(probability=0.5),
                    PointGlobalRotation(rotation=[-0.78539816, 0.78539816]),
                    PointGlobalScaling(min_scale=0.95, max_scale=1.05),
                    ShufflePoints(True),
                    ObjectRangeFilter(point_cloud_range=pc_range),
                    LidarReformat(),
                ]
            )
        else:
            transforms = torchvision.transforms.Compose([Reformat()])
        return transforms
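As an example of what these transforms do, a PointGlobalRotation-style augmentation is just a z-axis rotation of the whole cloud. A minimal numpy sketch of the idea (not HAT's implementation):

```python
import numpy as np

def rotate_points_z(points, angle):
    """Rotate (N, 4) LiDAR points (x, y, z, intensity) around the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]], dtype=points.dtype)
    out = points.copy()
    out[:, :2] = points[:, :2] @ rot.T  # only x and y change
    return out

pts = np.array([[1.0, 0.0, 0.5, 0.9]], dtype=np.float32)
rotated = rotate_points_z(pts, np.pi / 2)  # 90-degree counter-clockwise
print(rotated[0])  # (x, y) -> (0, 1); z and intensity untouched
```

During training the angle would be drawn uniformly from the configured range, here [-0.785, 0.785] radians; the annotation boxes are rotated by the same angle so labels stay consistent.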

Training Strategy

To train a model with high accuracy, a good training strategy is essential. For each training task, the strategies for the different model stages (float, QAT) may vary slightly, so we also define the training-strategy content (such as the optimizer and lr_schedule) in PointPillarsModel:

class PointPillarsModel:
    ...

    @classmethod
    def optimizer(cls, model, stage):
        """Optimizer setting for training."""
        if stage == "float":
            optimizer = torch.optim.AdamW(
                params=model.parameters(),
                betas=(0.95, 0.99),
                lr=2e-4,
                weight_decay=0.01,
            )
        elif stage == "qat":
            optimizer = torch.optim.SGD(
                params=model.parameters(),
                lr=2e-4,
                momentum=0.9,
                weight_decay=0.0,
            )
        else:
            optimizer = None
        return optimizer

    @classmethod
    def lr_schedule(cls, stage):
        """Lr schedule setting for training."""
        if stage == "float":
            lr_updater = CyclicLrUpdater(
                target_ratio=(10, 1e-4),
                cyclic_times=1,
                step_ratio_up=0.4,
                step_log_interval=50,
            )
        elif stage == "qat":
            lr_updater = CyclicLrUpdater(
                target_ratio=(10, 1e-4),
                cyclic_times=1,
                step_ratio_up=0.4,
                step_log_interval=50,
            )
        else:
            lr_updater = None
        return lr_updater

    @classmethod
    def train_metrics(cls):
        """Logs to show during training; only prints loss info."""
        return LossShow()

    @classmethod
    def val_metrics(cls):
        """Metric for evaluation."""
        class_names = ["Car"]
        val_metrics = Kitti3DMetricDet(
            compute_aos=True,
            current_classes=class_names,
            difficultys=[0, 1, 2],
        )
        return val_metrics
Note

If you need to reproduce the accuracy, it is best not to modify the training strategy in the sample code. Otherwise, unexpected training situations may arise.
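The shape of CyclicLrUpdater's schedule, warming up over the first 40% of steps to 10x the base learning rate and then annealing, can be approximated with PyTorch's built-in CyclicLR. This is an analogous sketch, not HAT's updater: note in particular that the real updater anneals further down (target_ratio[1] = 1e-4 of base), which CyclicLR's return-to-base behavior does not reproduce.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([param], lr=2e-4, betas=(0.95, 0.99), weight_decay=0.01)

total_steps = 100
up_steps = int(total_steps * 0.4)  # step_ratio_up = 0.4
sched = torch.optim.lr_scheduler.CyclicLR(
    opt,
    base_lr=2e-4,
    max_lr=2e-3,                      # target_ratio[0] = 10 -> 10x base lr
    step_size_up=up_steps,
    step_size_down=total_steps - up_steps,
    cycle_momentum=False,             # AdamW has no momentum buffer to cycle
)

lrs = []
for _ in range(total_steps):
    opt.step()
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])
# lr rises linearly for the first 40 steps, then anneals for the remaining 60
```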

Through the above, we have defined all the modules related to model training. Of course, training a good floating-point detection model is not the ultimate goal; it serves as pre-training for the fixed-point model. To start such a training task, just run the following command:

python3 examples/pointpillars.py --stage "float" --device-ids "0,1,2,3" --train

Quantized Model Training

Once we have a floating-point model, we can start training the corresponding fixed-point model. As with floating-point training, a fixed-point model can be trained simply by running the following script:

python3 examples/pointpillars.py --stage "qat" --device-ids "0,1,2,3" --train

The Value of the Quantize Parameter is Different

When training the quantized model, we need to set quantize=True, at which point the corresponding floating-point model is converted into a quantized model. The code is as follows:

model.fuse_model()
model.set_qconfig()
horizon.quantization.prepare_qat(model, inplace=True)

For the key steps in quantization training, such as preparing the floating-point model, operator replacement, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion, please read the Quantization Aware Training (QAT) section.
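HAT's Float2QAT conversion hides these steps; the same flow in vanilla PyTorch eager-mode quantization looks roughly like the sketch below. This is an analogous illustration with a stand-in network, not HAT's API:

```python
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat

# A stand-in floating-point network (illustrative only).
float_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
float_model.train()  # QAT preparation requires training mode

# Attach a QAT qconfig, then swap modules for quantization-aware
# counterparts that carry fake-quantize observers on weights/activations.
float_model.qconfig = get_default_qat_qconfig("fbgemm")
qat_model = prepare_qat(float_model, inplace=False)
```

After QAT training converges, a convert step (QAT2Quantize in HAT, torch.ao.quantization.convert in vanilla PyTorch) folds the learned quantization parameters into a true INT8 model.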

Different Training Strategies

As mentioned before, quantized training is essentially fine-tuning on top of pure floating-point training. Therefore, during quantized training the initial learning rate is set to one-tenth of that used in floating-point training, the number of training epochs is greatly reduced, and, most importantly, when the model is defined, the pretrained checkpoint must be set to the path of the trained floating-point model.

After making these simple adjustments, we can start training our quantized model.

Model Validation

After the model is trained, we can verify its performance. Since the training process has two stages, float and qat, the model from either stage can be evaluated by running the corresponding command:

python3 examples/pointpillars.py --predict --stage "float" --device-ids "0" --ckpt ${float-checkpoint-file}
python3 examples/pointpillars.py --predict --stage "qat" --device-ids "0" --ckpt ${qat-checkpoint-file}

We also provide a performance test for the quantized model; just run the following command:

python3 examples/pointpillars.py --predict --stage "int_infer" --device-ids "0" --ckpt ${int_infer-checkpoint-file}

The displayed accuracy is the real accuracy of the final INT8 model, and it should be very close to the accuracy from the qat verification stage.

Align BPU Validation

In addition to the model validation described above, we provide an accuracy-validation method whose results are identical to those on the board side:

python3 examples/pointpillars.py --device-ids "0" --align-bpu-validation

Results Visualization

If you want to see the detection results of the trained model on a single frame of LiDAR point cloud, we also provide point cloud prediction and visualization scripts in the tools folder; just run the following:

python3 examples/pointpillars.py --device-ids "0" --visualize --input-points ${lidar-pointcloud-path} --is-plot

Model Checking and Compilation

After training, the quantized model can be compiled with the compile tool into an hbm file that can run on the board. The tool can also estimate the model's runtime performance on the BPU. Use the following script:

python3 examples/pointpillars.py --compile --opt 3