PointPillars Detection Model Training
This tutorial shows how to use HAT to train a PointPillars model from scratch on KITTI-3DObject, a LiDAR point cloud dataset, covering the floating-point, quantized, and fixed-point models.
Dataset Preparation
Before training the model, the first step is to prepare the dataset: download the 3DObject dataset, which consists of the following 4 files:
Left color images of object dataset
Velodyne point clouds
Camera calibration matrices of object dataset
Training labels of object dataset
After downloading the above 4 files, unzip and organize the folder structure as follows:
├── tmp_data
│ ├── kitti3d
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
To create the KITTI point cloud data, you need to load the raw point clouds and generate the associated annotation files containing the object labels and annotation boxes.
You also need to extract the point cloud of each individual training object and store it as a .bin file in ./tmp_data/kitti3d/kitti3d_gt_database.
In addition, you need to generate .pkl files containing the data information of the training and validation sets.
Then, create the KITTI data by running the following commands:
mkdir -p ./tmp_data/kitti3d/ImageSets
# Download dataset segmentation files from the community
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/trainval.txt
python3 tools/create_data.py --dataset "kitti3d" --root-dir "./tmp_data/kitti3d"
The above commands generate the following file directory:
├── tmp_data
│ ├──── kitti3d
│ │ ├── ImageSets
│ │ │ ├── test.txt
│ │ │ ├── train.txt
│ │ │ ├── trainval.txt
│ │ │ ├── val.txt
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced # Newly generated velodyne_reduced
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced # Newly generated velodyne_reduced
│ │ ├── kitti3d_gt_database # Newly generated kitti_gt_database
│ │ │ ├── xxxxx.bin
│ │ ├── kitti3d_infos_train.pkl # Newly generated kitti_infos_train.pkl
│ │ ├── kitti3d_infos_val.pkl # Newly generated kitti_infos_val.pkl
│ │ ├── kitti3d_dbinfos_train.pkl # Newly generated kitti_dbinfos_train.pkl
│ │ ├── kitti3d_infos_test.pkl # Newly generated kitti_infos_test.pkl
│ │ ├── kitti3d_infos_trainval.pkl # Newly generated kitti_infos_trainval.pkl
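Both the raw velodyne frames and the per-object xxxxx.bin files in kitti3d_gt_database are plain binary float32 buffers of (x, y, z, intensity) values. A minimal sketch of reading one (the helper name is ours, not a HAT API):

```python
import os
import tempfile

import numpy as np

def load_velodyne_bin(path):
    """Read a KITTI-style .bin point cloud: a flat float32 buffer of
    (x, y, z, intensity) values, reshaped to (N, 4)."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Round-trip demo with synthetic points; a real path would be e.g.
# ./tmp_data/kitti3d/kitti3d_gt_database/xxxxx.bin
points = np.random.rand(100, 4).astype(np.float32)
path = os.path.join(tempfile.gettempdir(), "demo_points.bin")
points.tofile(path)
loaded = load_velodyne_bin(path)
assert loaded.shape == (100, 4) and np.allclose(loaded, points)
```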
Also, to speed up training, we pack and convert the data files into LMDB-format datasets.
Simply run the following commands to complete the conversion:
python3 tools/datasets/kitti3d_packer.py --src-data-dir ./tmp_data/kitti3d/ --target-data-dir ./tmp_data/kitti3d --split-name train --pack-type lmdb
python3 tools/datasets/kitti3d_packer.py --src-data-dir ./tmp_data/kitti3d/ --target-data-dir ./tmp_data/kitti3d --split-name val --pack-type lmdb
The first command packs the training set; the second packs the validation set.
When the packaging completes, the file structure in the data directory should look as follows:
├── tmp_data
│ ├──── kitti3d
│ │ ├── pack_data # Newly generated lmdb
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── ImageSets
│ │ │ ├── test.txt
│ │ │ ├── train.txt
│ │ │ ├── trainval.txt
│ │ │ ├── val.txt
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── kitti3d_gt_database
│ │ │ ├── xxxxx.bin
│ │ ├── kitti3d_infos_train.pkl
│ │ ├── kitti3d_infos_val.pkl
│ │ ├── kitti3d_dbinfos_train.pkl
│ │ ├── kitti3d_infos_test.pkl
│ │ ├── kitti3d_infos_trainval.pkl
The packed train and val LMDB datasets are the final datasets read by the network during training and validation.
kitti3d_gt_database and kitti3d_dbinfos_train.pkl hold the per-object samples used for ground-truth sampling during training.
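The idea behind this ground-truth sampling can be sketched as follows (purely illustrative; the real ObjectSample transform also handles bounding boxes and collision checks):

```python
import numpy as np

def sample_ground_truth(scene_points, gt_database, num_samples, rng):
    """Paste point clouds of stored objects into the current scene,
    increasing the number of (often rare) positive examples per frame."""
    picks = rng.choice(len(gt_database), size=num_samples, replace=False)
    return np.concatenate([scene_points] + [gt_database[i] for i in picks])

rng = np.random.default_rng(0)
scene = rng.random((1000, 4), dtype=np.float32)                 # stand-in scene
gt_database = [rng.random((30, 4), dtype=np.float32) for _ in range(50)]
augmented = sample_ground_truth(scene, gt_database, num_samples=15, rng=rng)
assert augmented.shape == (1000 + 15 * 30, 4)
```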
Floating-point Model Training
Once the dataset is ready, you can start training the floating-point PointPillars detection network.
Before training starts, you can check the operation count and parameter count of the network with the following command:
python3 tools/calops.py --config configs/detection/pointpillars/pointpillars_kitti_car.py
If you simply want to start such a training task, just run the following command:
python3 tools/train.py --stage float --config configs/detection/pointpillars/pointpillars_kitti_car.py
The HAT algorithm package uses a registration mechanism that lets every training task be launched as train.py plus a config file.
train.py is a unified training script, independent of the task.
The task to train, the dataset to use, and the training hyperparameters are all specified in the config file.
The config file provides the key dicts, such as those for model building and data loading.
Model Building
The network structure of PointPillars can be found in the paper, so we skip the details here.
We can easily define and modify the model by defining a dict-type variable such as model in the config file.
model = dict(
type="PointPillarsDetector",
feature_map_shape=get_feature_map_size(pc_range, voxel_size),
pre_process=dict(
type="PointPillarsPreProcess",
pc_range=pc_range,
voxel_size=voxel_size,
max_voxels_num=max_voxels_num,
max_points_in_voxel=max_points_in_voxel,
),
reader=dict(
type="PillarFeatureNet",
num_input_features=4,
num_filters=(64,),
with_distance=False,
pool_size=(1, max_points_in_voxel),
voxel_size=voxel_size,
pc_range=pc_range,
bn_kwargs=norm_cfg,
quantize=True,
use_4dim=True,
),
backbone=dict(
type="PointPillarScatter",
num_input_features=64,
use_horizon_pillar_scatter=True,
quantize=True,
),
neck=dict(
type="SequentialBottleNeck",
layer_nums=[3, 5, 5],
ds_layer_strides=[2, 2, 2],
ds_num_filters=[64, 128, 256],
us_layer_strides=[1, 2, 4],
us_num_filters=[128, 128, 128],
num_input_features=64,
bn_kwargs=norm_cfg,
use_tconv=True,
use_secnet=True,
quantize=True,
),
head=dict(
type="PointPillarHead",
num_classes=len(class_names),
in_channels=sum([128, 128, 128]),
use_direction_classifier=True,
),
anchor_generator=dict(
type="Anchor3DGeneratorStride",
anchor_sizes=[[1.6, 3.9, 1.56]],
anchor_strides=[[0.32, 0.32, 0.0]],
anchor_offsets=[[0.16, -39.52, -1.78]],
rotations=[[0, 1.57]],
class_names=class_names,
match_thresholds=[0.6],
unmatch_thresholds=[0.45],
),
targets=dict(
type="LidarTargetAssigner",
box_coder=dict(
type="GroundBox3dCoderTorch",
n_dim=7,
),
class_names=class_names,
positive_fraction=-1,
region_similarity_calculator=dict(type="NearestIouSimilarity"),
),
loss=dict(
type="PointPillarsLoss",
num_classes=len(class_names),
loss_cls=dict(
type="SigmoidFocalLoss",
alpha=0.25,
gamma=2.0,
loss_weight=1.0,
),
loss_bbox=dict(
type="WeightedSmoothL1Loss",
sigma=3.0,
code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
codewise=True,
loss_weight=2.0,
),
loss_dir=dict(
type="WeightedSoftmaxClassificationLoss",
name="direction_classifier",
loss_weight=0.2,
),
),
postprocess=dict(
type="PointPillarsPostProcess",
num_classes=len(class_names),
box_coder=dict(
type="GroundBox3dCoderTorch",
n_dim=7,
),
use_direction_classifier=True,
num_direction_bins=2,
# test_cfg
use_rotate_nms=False,
nms_pre_max_size=1000,
nms_post_max_size=300,
nms_iou_threshold=0.5,
score_threshold=0.4,
post_center_limit_range=[0, -39.68, -5, 69.12, 39.68, 5],
max_per_img=100,
),
)
Here, the type under model is the name of the defined model, and the remaining keys define the model's other components.
The advantage of defining the model in this way is that the structure can be easily replaced.
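For instance, the pre_process component rasterizes the point cloud onto a 2D pillar grid, and get_feature_map_size derives the grid shape from pc_range and voxel_size. A simplified numpy sketch, assuming the typical KITTI car values for these two variables (the real ones live in the config file):

```python
import numpy as np

# Typical KITTI car values -- the actual ones come from the config file.
pc_range = np.array([0.0, -39.68, -3.0, 69.12, 39.68, 1.0])  # x/y/z min, x/y/z max
voxel_size = np.array([0.16, 0.16, 4.0])

def get_feature_map_size(pc_range, voxel_size):
    """Number of pillars along x and y: detection range / pillar footprint."""
    grid = (pc_range[3:5] - pc_range[0:2]) / voxel_size[0:2]
    return np.round(grid).astype(np.int64)

def pillar_indices(points, pc_range, voxel_size):
    """Map each point's (x, y) to its pillar cell on that grid."""
    return np.floor((points[:, :2] - pc_range[:2]) / voxel_size[:2]).astype(np.int64)

print(get_feature_map_size(pc_range, voxel_size))  # [432 496]
```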
After startup, the training script calls the build_model interface to convert this dict-type model into a torch.nn.Module model.
def build_model(cfg, default_args=None):
    if cfg is None:
        return None
    assert "type" in cfg, "type is need in model"
    return build_from_cfg(cfg, MODELS)
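The registration mechanism behind build_from_cfg can be illustrated with a minimal sketch (the Registry class here is ours; HAT's actual implementation is richer):

```python
class Registry:
    """Minimal registry: maps a string name to a class (illustrative only)."""
    def __init__(self):
        self._modules = {}

    def register(self, cls):
        self._modules[cls.__name__] = cls
        return cls

    def get(self, name):
        return self._modules[name]

MODELS = Registry()

def build_from_cfg(cfg, registry):
    """Pop 'type' from the dict, look it up, and instantiate with the rest."""
    cfg = dict(cfg)
    obj_type = cfg.pop("type")
    cls = registry.get(obj_type) if isinstance(obj_type, str) else obj_type
    return cls(**cfg)

@MODELS.register
class PointPillarsDetector:
    def __init__(self, **components):
        self.components = components

model = build_from_cfg(dict(type="PointPillarsDetector", head=None), MODELS)
assert isinstance(model, PointPillarsDetector)
```

This is why swapping a component only requires editing its dict in the config: the string name resolves to a registered class at build time.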
Data Augmentation
Like the model definition, data augmentation is implemented by defining two dicts (data_loader and val_data_loader) in the config file, corresponding to the processing pipelines of the training set and validation set, respectively.
Here we take data_loader as an example:
dataset = dict(
type="Kitti3D",
data_path="./tmp_data/kitti3d/train_lmdb",
transforms=[
dict(
type="ObjectSample",
class_names=class_names,
remove_points_after_sample=False,
db_sampler=db_sampler,
),
dict(
type="ObjectNoise",
gt_rotation_noise=[-0.15707963267, 0.15707963267],
gt_loc_noise_std=[0.25, 0.25, 0.25],
global_random_rot_range=[0, 0],
num_try=100,
class_names=class_names,
),
dict(
type="PointRandomFlip",
probability=0.5,
),
dict(
type="PointGlobalRotation",
rotation=[-0.78539816, 0.78539816],
),
dict(
type="PointGlobalScaling",
min_scale=0.95,
max_scale=1.05,
),
dict(
type="ShufflePoints",
),
dict(
type="ObjectRangeFilter",
point_cloud_range=pc_range,
),
dict(type="Reformat"),
],
)
data_loader = dict(
type=torch.utils.data.DataLoader,
dataset=dataset,
sampler=dict(type=torch.utils.data.DistributedSampler),
batch_size=batch_size_per_gpu,
shuffle=False,
num_workers=1,
pin_memory=True,
collate_fn=hat.data.collates.collate_kitti3d,
)
Here, type directly uses PyTorch's built-in torch.utils.data.DataLoader interface, which batches batch_size samples together.
You mainly need to pay attention to the dataset variable: data_path is the path mentioned in the dataset-preparation section above.
The transforms list contains a series of data augmentations, while val_data_loader contains only point cloud pillarization (voxelization) and Reformat.
You can also implement your own data augmentation operations by inserting a new dict into transforms.
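As a sketch of what such a custom transform might look like (illustrative only; real HAT transforms consume and return a richer sample dict with boxes, labels, etc.):

```python
import numpy as np

class PointRandomFlipY:
    """Illustrative custom transform: mirror the point cloud across the
    x-axis (negate y) with the given probability."""
    def __init__(self, probability=0.5):
        self.probability = probability

    def __call__(self, data):
        if np.random.rand() < self.probability:
            data["points"][:, 1] *= -1.0
        return data

# Once registered with HAT, it could be referenced from the config as
# dict(type="PointRandomFlipY", probability=0.5).
flip = PointRandomFlipY(probability=1.0)
sample = {"points": np.array([[1.0, 2.0, 0.0, 0.5]])}
out = flip(sample)
print(out["points"][0, 1])  # -2.0
```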
Training Strategy
To train a model with high accuracy, a good training strategy is essential.
For each training task, the corresponding training strategy is also defined in the config file, which can be seen from the variable float_trainer.
float_trainer = dict(
type="distributed_data_parallel_trainer",
model=model,
data_loader=data_loader,
optimizer=dict(
type=torch.optim.AdamW,
betas=(0.95, 0.99),
lr=5e-4,
weight_decay=0.1,
),
batch_processor=batch_processor,
num_epochs=160,
device=None,
callbacks=[
stat_callback,
loss_show_update,
dict(
type="CyclicLrUpdater",
target_ratio=(10, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
step_log_interval=50,
),
dict(
type="CyclicOptimParamUpdater",
param_name="betas",
target_ratio=(0.85 / 0.95, 1),
cyclic_times=1,
step_ratio_up=0.4,
step_log_interval=50,
),
dict(
type="GradClip",
max_norm=35,
norm_type=2,
),
val_callback,
ckpt_callback,
],
sync_bn=True,
train_metrics=dict(
type="LossShow",
),
val_metrics=dict(
type="Kitti3DMetricDet",
compute_aos=True,
current_classes=class_names,
difficultys=[0, 1, 2],
),
)
The float_trainer defines the overall training approach, including the use of distributed_data_parallel_trainer, the number of training epochs, and the choice of optimizer.
The callbacks reflect the strategies applied during training and the operations the user wants to hook in, including the learning-rate schedule (CyclicLrUpdater), validation (Validation), and model checkpointing (Checkpoint).
If you want the model to perform additional operations during training, you can add them in the same way (as a dict).
The float_trainer thus stitches together the entire training logic and is also responsible for producing the pretrained model.
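The CyclicLrUpdater configuration above (target_ratio=(10, 1e-4), step_ratio_up=0.4, one cycle) can be sketched as follows, assuming cosine annealing within each phase (the exact interpolation is an implementation detail of HAT):

```python
import math

def cyclic_lr(step, total_steps, base_lr=5e-4,
              target_ratio=(10, 1e-4), step_ratio_up=0.4):
    """One cycle: ramp lr from base_lr to base_lr*10 over the first 40% of
    steps, then anneal down to base_lr*1e-4 over the remainder."""
    up_steps = int(total_steps * step_ratio_up)
    if step < up_steps:
        t, start, end = step / max(up_steps, 1), base_lr, base_lr * target_ratio[0]
    else:
        t = (step - up_steps) / max(total_steps - up_steps, 1)
        start, end = base_lr * target_ratio[0], base_lr * target_ratio[1]
    # Cosine interpolation from start to end as t goes 0 -> 1.
    return end + (start - end) * (1 + math.cos(math.pi * t)) / 2

print(cyclic_lr(0, 1000))    # 0.0005 (base lr)
print(cyclic_lr(400, 1000))  # 0.005  (peak, 10x base)
```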
Note
If you need to reproduce the accuracy, it is best not to modify the training strategy in the config file; otherwise, unexpected training behavior may occur.
Through the above introductions, you should have a clear understanding of the role of the config file.
Then, using the training script mentioned above, a high-accuracy pure floating-point detection model can be trained.
Of course, training a good floating-point detector is not the ultimate goal; it serves as pretraining for the fixed-point model.
Quantized Model Training
Once we have a floating-point model, we can start training the corresponding fixed-point model.
In the same way as floating-point training, we can train a fixed-point model just by running the following scripts:
python3 tools/train.py --stage calibration --config configs/detection/pointpillars/pointpillars_kitti_car.py
python3 tools/train.py --stage qat --config configs/detection/pointpillars/pointpillars_kitti_car.py
As you can see, the configuration file is unchanged; only the stage differs.
At this point, the training strategies come from calibration_trainer and qat_trainer in the config file.
calibration_trainer = dict(
type="Calibrator",
model=model,
model_convert_pipeline=dict(
type="ModelConvertPipeline",
qat_mode="fuse_bn",
converters=[
dict(
type="LoadCheckpoint",
checkpoint_path=os.path.join(
ckpt_dir, "float-checkpoint-best.pth.tar"
),
),
dict(type="Float2Calibration", convert_mode=convert_mode),
],
),
data_loader=calibration_data_loader,
batch_processor=calibration_batch_processor,
num_steps=calibration_step,
device=None,
callbacks=[
stat_callback,
val_callback,
ckpt_callback,
],
val_metrics=dict(
type="Kitti3DMetricDet",
compute_aos=True,
current_classes=class_names,
difficultys=[0, 1, 2],
),
log_interval=calibration_step / 10,
)
qat_trainer = dict(
type="distributed_data_parallel_trainer",
model=model,
model_convert_pipeline=dict(
type="ModelConvertPipeline",
qat_mode="fuse_bn",
qconfig_params=dict(
activation_qat_qkwargs=dict(
averaging_constant=0,
),
weight_qat_qkwargs=dict(
averaging_constant=1,
),
),
converters=[
dict(type="Float2QAT", convert_mode=convert_mode),
dict(
type="LoadCheckpoint",
checkpoint_path=os.path.join(
ckpt_dir, "calibration-checkpoint-best.pth.tar"
),
),
],
),
data_loader=data_loader,
optimizer=dict(
type=torch.optim.SGD,
params={"weight": dict(weight_decay=0.0)},
lr=2e-4,
momentum=0.9,
),
batch_processor=batch_processor,
num_epochs=50,
device=None,
callbacks=[
stat_callback,
loss_show_update,
dict(
type="CyclicLrUpdater",
target_ratio=(10, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
step_log_interval=50,
),
val_callback,
ckpt_callback,
],
train_metrics=dict(
type="LossShow",
),
val_metrics=dict(
type="Kitti3DMetricDet",
compute_aos=True,
current_classes=class_names,
difficultys=[0, 1, 2],
),
)
The Quantize Parameter Takes a Different Value
When training the quantized model, we need to set quantize=True.
At this point, the corresponding floating-point model is converted into a quantized model. The code is as follows:
model.fuse_model()
model.set_qconfig()
horizon.quantization.prepare_qat(model, inplace=True)
For the key steps in quantized training, such as preparing the floating-point model, operator replacement, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion,
please read the Quantization-Aware Training (QAT) section.
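Conceptually, those quantization/dequantization nodes simulate int8 rounding during training. An illustrative numpy sketch of such a "fake quantize" step (a simple symmetric per-tensor scheme, not Horizon's actual quantization configuration):

```python
import numpy as np

def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Round to the int8 grid, clip, and map back to float: the error this
    introduces is what QAT teaches the network to tolerate."""
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

x = np.array([0.10, -0.37, 1.25], dtype=np.float32)
scale = float(np.abs(x).max()) / 127        # naive per-tensor scale
xq = fake_quantize(x, scale)
# In-range values land within half a quantization step of the original.
assert np.abs(x - xq).max() <= scale / 2 + 1e-7
```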
Different Training Strategies
As mentioned before, quantized training is essentially finetuning on top of pure floating-point training.
Therefore, in quantized training the initial learning rate is set to one-tenth of the floating-point value, the number of training epochs is greatly reduced,
and, most importantly, when the model is defined, the pretrained parameter needs to point to the trained pure floating-point checkpoint.
After these simple adjustments, we can start training the quantized model.
Export Fixed-Point Model
Once quantization training is complete, you can export the fixed-point model with the following command:
python3 tools/export_hbir.py --config configs/detection/pointpillars/pointpillars_kitti_car.py
Model Verification
After the model is trained, we can also verify its performance.
Since the training process covers three stages, float, calibration, and QAT, we can verify the model produced by each of them.
Simply run the following commands:
python3 tools/predict.py --stage float --config configs/detection/pointpillars/pointpillars_kitti_car.py
python3 tools/predict.py --stage calibration --config configs/detection/pointpillars/pointpillars_kitti_car.py
python3 tools/predict.py --stage qat --config configs/detection/pointpillars/pointpillars_kitti_car.py
Additionally, the following command verifies the accuracy of the fixed-point model; note that the hbir must be exported first:
python3 tools/predict.py --stage "int_infer" --config configs/detection/pointpillars/pointpillars_kitti_car.py
The displayed accuracy is the real accuracy of the final int8 model, and it should be very close to the accuracy of the QAT verification stage.
Simulation Board Accuracy Verification
In addition to the model validation described above, we provide an accuracy-validation method identical to the board side; you can refer to the following:
python3 tools/validation_hbir.py --stage "align_bpu" --config configs/detection/pointpillars/pointpillars_kitti_car.py
Model Inference and Result Visualization
HAT provides infer_hbir.py, which visualizes the inference results of the trained models at each stage:
python3 tools/infer_hbir.py --config configs/detection/pointpillars/pointpillars_kitti_car.py --model-inputs input_points:${lidar-pointcloud-path} --save-path ${save_path}
Model Checking and Compilation
After training, the quantized model can be compiled with the compile_perf_hbir tool into an HBM file that can run on the board.
The tool can also estimate the running performance on the BPU. The following script can be used:
python3 tools/compile_perf_hbir.py --config configs/detection/pointpillars/pointpillars_kitti_car.py