PointPillars Detection Model Training (No config and HAT Trainer)
This tutorial describes in detail how to use HAT as a model library, combined with a user-defined training framework, to train the PointPillars model on the KITTI-3DObject LiDAR point-cloud dataset.
We show how to integrate a HAT model into your own training framework, covering key steps such as data preprocessing and model building, so that you can successfully train the floating-point, QAT, and quantized versions of PointPillars.
Dataset Preparation
Before starting to train the model, the first step is to prepare the dataset. Download the KITTI 3DObject dataset, which consists of the following four files:
left color images of object data set
velodyne point clouds
camera calibration matrices of object data set
training labels of object data set
After downloading the above 4 files, unzip and organize the folder structure as follows:
├── tmp_data
│ ├── kitti3d
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
To create the KITTI point cloud data, the original point clouds must be loaded and the associated annotation files, containing the object labels and bounding boxes, must be generated.
Point cloud data for each individual training object must also be extracted from the KITTI dataset and stored as .bin files in data/kitti/gt_database.
In addition, a .pkl data-information file must be generated for both the training data and the validation data.
Then, create the KITTI data by running the following commands:
mkdir -p ./tmp_data/kitti3d/ImageSets
# Download dataset split files from the community
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./tmp_data/kitti3d/ImageSets/trainval.txt
python3 tools/create_data.py --dataset "kitti3d" --root-dir "./tmp_data/kitti3d"
After executing the above command, the following file directory is generated:
├── tmp_data
│ ├── kitti3d
│ │ ├── ImageSets
│ │ │ ├── test.txt
│ │ │ ├── train.txt
│ │ │ ├── trainval.txt
│ │ │ ├── val.txt
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced # Newly generated velodyne_reduced
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced # Newly generated velodyne_reduced
│ │ ├── kitti3d_gt_database # Newly generated
│ │ │ ├── xxxxx.bin
│ │ ├── kitti3d_infos_train.pkl # Newly generated
│ │ ├── kitti3d_infos_val.pkl # Newly generated
│ │ ├── kitti3d_dbinfos_train.pkl # Newly generated
│ │ ├── kitti3d_infos_test.pkl # Newly generated
│ │ ├── kitti3d_infos_trainval.pkl # Newly generated
In addition, to speed up training, we pack the data-information files into an lmdb-format dataset.
The conversion is done by simply running the following scripts:
python3 tools/datasets/kitti3d_packer.py --src-data-dir ./tmp_data/kitti3d/ --target-data-dir ./tmp_data/kitti3d --split-name train --pack-type lmdb
python3 tools/datasets/kitti3d_packer.py --src-data-dir ./tmp_data/kitti3d/ --target-data-dir ./tmp_data/kitti3d --split-name val --pack-type lmdb
The two commands pack the training dataset and the validation dataset, respectively.
After the packing is completed, the file structure in the data directory should look as follows:
├── tmp_data
│ ├── kitti3d
│ │ ├── pack_data # Newly generated lmdb
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── ImageSets
│ │ │ ├── test.txt
│ │ │ ├── train.txt
│ │ │ ├── trainval.txt
│ │ │ ├── val.txt
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── kitti3d_gt_database
│ │ │ ├── xxxxx.bin
│ │ ├── kitti3d_infos_train.pkl
│ │ ├── kitti3d_infos_val.pkl
│ │ ├── kitti3d_dbinfos_train.pkl
│ │ ├── kitti3d_infos_test.pkl
│ │ ├── kitti3d_infos_trainval.pkl
train_lmdb and val_lmdb are the packed training and validation datasets; they are what the network ultimately reads.
kitti3d_gt_database and kitti3d_dbinfos_train.pkl hold the per-object point-cloud samples used for ground-truth sampling during training.
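The ground-truth database drives the ObjectSample augmentation used later in the training transforms: object point clouds cropped from other frames are pasted into the current scene so that each frame contains more training targets. A minimal sketch of the idea, using randomly generated stand-in data (the real implementation is HAT's ObjectSample/DataBaseSampler; `sample_gt_objects` is a name made up for this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: a scene of N x 4 points (x, y, z, intensity) and a small
# "gt database" of per-object point clouds cropped from other frames.
scene_points = rng.standard_normal((1000, 4)).astype(np.float32)
gt_database = [rng.standard_normal((50, 4)).astype(np.float32) for _ in range(20)]

def sample_gt_objects(points, database, num_samples=15, rng=rng):
    """Paste `num_samples` stored object point clouds into the scene."""
    picks = rng.choice(len(database), size=num_samples, replace=False)
    sampled = [database[i] for i in picks]
    return np.concatenate([points] + sampled, axis=0)

augmented = sample_gt_objects(scene_points, gt_database)
print(augmented.shape)  # (1750, 4): 1000 scene points + 15 objects x 50 points
```

The real sampler additionally filters candidates by difficulty and minimum point count, and checks for box collisions before pasting.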
Floating-point Model Training
Once the dataset is ready, you can start training the PointPillars detection network.
Model Building
For the network structure of PointPillars, refer to the paper; it is not described in detail here.
The entire process from model training to compilation is roughly as follows:

As the figure above shows, three model stages are mainly used: the Float Model, the QAT Model, and the Quantized Model.
Float Model: an ordinary floating-point model.
QAT Model: the model used for quantization-aware training.
Quantized Model: the quantized model, whose parameters are of INT8 type.
In addition, models with different structures (or states) are used during training, compilation, and other processes:
model: the complete model structure, including pre-processing, the network, and post-processing. It is mainly used for training and evaluation.
deploy_model: contains only the network structure (which can be compiled into an hbm), without pre-processing and post-processing. It is mainly used for compilation.
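The model/deploy_model split boils down to building the same network with the training-only components attached or detached. A simplified, HAT-independent illustration of the pattern (TinyDetector and its submodule names are made up for this sketch, not HAT classes):

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    def __init__(self, is_deploy=False):
        super().__init__()
        self.backbone = nn.Conv2d(4, 8, 3, padding=1)  # compilable network body
        # Training/eval-only part; dropped in deploy mode so that only
        # the compilable graph remains.
        self.postprocess = None if is_deploy else nn.Identity()

    def forward(self, x):
        feat = self.backbone(x)
        if self.postprocess is None:  # deploy: raw network outputs
            return feat
        return self.postprocess(feat)  # training/eval: decoded predictions

x = torch.randn(1, 4, 16, 16)
full = TinyDetector(is_deploy=False)
deploy = TinyDetector(is_deploy=True)
print(full(x).shape, deploy(x).shape)
```

In the real PointPillarsModel below, the same effect is achieved by setting anchor_generator, targets, loss, and postprocess to None when is_deploy is True.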
We define a PointPillarsModel class that gathers everything related to the model structure, covering the three stages above (Float Model, QAT Model, and Quantized Model) as well as the two states, model and deploy_model:
class PointPillarsModel:
    task_name = "pointpillars_kitti_car"

    @classmethod
    def model(cls):
        """model structure"""
        model = cls._build_pp_model(cls, is_deploy=False)
        return model

    @classmethod
    def deploy_model(cls):
        """deploy model, for compile"""
        deploy_model = cls._build_pp_model(cls, is_deploy=True)
        return deploy_model

    @classmethod
    def deploy_inputs(cls):
        """deploy inputs, for compile"""
        deploy_inputs = dict(  # noqa C408
            points=[
                torch.randn(150000, 4),
            ],
        )
        return deploy_inputs

    @classmethod
    def float_model(cls, pretrain_ckpt=None):
        """float model"""
        model = cls.model()
        if pretrain_ckpt:
            ckpt_loader = LoadCheckpoint(
                pretrain_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            model = ckpt_loader(model)
        return model

    @classmethod
    def qat_model(cls, pre_step_ckpt=None, pretrain_ckpt=None):
        """QAT model"""
        float_model = cls.float_model(pre_step_ckpt)
        qat_model = Float2QAT()(float_model)
        if pretrain_ckpt:
            ckpt_loader = LoadCheckpoint(
                pretrain_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            qat_model = ckpt_loader(qat_model)
        return qat_model

    @classmethod
    def int_infer_model(cls, pre_step_ckpt=None, pretrain_ckpt=None, use_deploy=False):
        """quantized model"""
        if use_deploy:  # for compilation
            model = cls.deploy_model()  # float model
        else:  # for prediction
            model = cls.model()
        model = Float2QAT()(model)  # qat model
        if pre_step_ckpt:  # load QAT checkpoint
            qat_ckpt_loader = LoadCheckpoint(
                pre_step_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            model = qat_ckpt_loader(model)  # qat model with state_dict
        int_model = QAT2Quantize()(model)
        if pretrain_ckpt:
            int_ckpt_loader = LoadCheckpoint(
                pretrain_ckpt,
                allow_miss=False,
                ignore_extra=False,
                verbose=True,
            )
            int_model = int_ckpt_loader(int_model)
        return int_model
    def _build_pp_model(self, is_deploy=False):
        """model structure"""
        # Voxelization cfg
        pc_range = [0, -39.68, -3, 69.12, 39.68, 1]
        voxel_size = [0.16, 0.16, 4.0]
        max_points_in_voxel = 100
        max_voxels_num = 12000
        class_names = ["Car"]

        def get_feature_map_size(point_cloud_range, voxel_size):
            point_cloud_range = np.array(point_cloud_range, dtype=np.float32)
            voxel_size = np.array(voxel_size, dtype=np.float32)
            grid_size = (
                point_cloud_range[3:] - point_cloud_range[:3]
            ) / voxel_size
            grid_size = np.round(grid_size).astype(np.int64)
            return grid_size

        model = PointPillarsDetector(
            feature_map_shape=get_feature_map_size(pc_range, voxel_size),
            is_deploy=is_deploy,
            pre_process=PointPillarsPreProcess(
                pc_range=pc_range,
                voxel_size=voxel_size,
                max_voxels_num=max_voxels_num,
                max_points_in_voxel=max_points_in_voxel,
            ),
            reader=PillarFeatureNet(
                num_input_features=4,
                num_filters=(64,),
                with_distance=False,
                pool_size=(1, max_points_in_voxel),
                voxel_size=voxel_size,
                pc_range=pc_range,
                bn_kwargs=None,
                quantize=True,
                use_4dim=True,
                use_conv=True,
            ),
            backbone=PointPillarScatter(
                num_input_features=64,
                use_horizon_pillar_scatter=True,
                quantize=True,
            ),
            neck=SECONDNeck(
                in_feature_channel=64,
                down_layer_nums=[3, 5, 5],
                down_layer_strides=[2, 2, 2],
                down_layer_channels=[64, 128, 256],
                up_layer_strides=[1, 2, 4],
                up_layer_channels=[128, 128, 128],
                bn_kwargs=None,
                quantize=True,
            ),
            head=PointPillarsHead(
                num_classes=len(class_names),
                in_channels=sum([128, 128, 128]),
                use_direction_classifier=True,
            ),
            anchor_generator=Anchor3DGeneratorStride(
                anchor_sizes=[[1.6, 3.9, 1.56]],  # noqa B006
                anchor_strides=[[0.32, 0.32, 0.0]],  # noqa B006
                anchor_offsets=[[0.16, -39.52, -1.78]],  # noqa B006
                rotations=[[0, 1.57]],  # noqa B006
                class_names=class_names,
                match_thresholds=[0.6],
                unmatch_thresholds=[0.45],
            ),
            targets=LidarTargetAssigner(
                box_coder=GroundBox3dCoder(n_dim=7),
                class_names=class_names,
                positive_fraction=-1,
            ),
            loss=PointPillarsLoss(
                num_classes=len(class_names),
                loss_cls=FocalLossV2(
                    alpha=0.25,
                    gamma=2.0,
                    from_logits=False,
                    reduction="none",
                    loss_weight=1.0,
                ),
                loss_bbox=SmoothL1Loss(
                    beta=1 / 9.0,
                    reduction="none",
                    loss_weight=2.0,
                ),
                loss_dir=CrossEntropyLoss(
                    use_sigmoid=False,
                    reduction="none",
                    loss_weight=0.2,
                ),
            ),
            postprocess=PointPillarsPostProcess(
                num_classes=len(class_names),
                box_coder=GroundBox3dCoder(n_dim=7),
                use_direction_classifier=True,
                num_direction_bins=2,
                # test_cfg
                use_rotate_nms=False,
                nms_pre_max_size=1000,
                nms_post_max_size=300,
                nms_iou_threshold=0.5,
                score_threshold=0.4,
                post_center_limit_range=[0, -39.68, -5, 69.12, 39.68, 5],
                max_per_img=100,
            ),
        )
        if is_deploy:
            model.anchor_generator = None
            model.targets = None
            model.loss = None
            model.postprocess = None
        return model
All content related to the model has been defined in the PointPillarsModel in the above code.
When using it, you can easily obtain the corresponding model structure through PointPillarsModel.xxx().
After completing the definition of the network structure, we can check the FLOPs and parameter count of the network with the following command:
python3 examples/pointpillars_no_hat_trainer.py --calops
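As a sanity check on the configuration above, the BEV grid produced by voxelization can be computed directly from pc_range and voxel_size; this mirrors the get_feature_map_size helper inside _build_pp_model:

```python
import numpy as np

pc_range = np.array([0, -39.68, -3, 69.12, 39.68, 1], dtype=np.float32)
voxel_size = np.array([0.16, 0.16, 4.0], dtype=np.float32)

# (max_xyz - min_xyz) / voxel_size, rounded to integer grid cells
grid_size = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(np.int64)
print(grid_size.tolist())  # [432, 496, 1]
```

With a single 4 m voxel covering the full height range, the scatter step produces a 432 x 496 BEV pseudo-image, which is the input resolution of the 2D backbone.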
Data Augmentation
Similar to the model-building section, we define a DataHelper class for everything data related, including the transforms and data loaders:
class DataHelper:
    data_dir = "./tmp_data/kitti3d"  # root path of dataset
    train_batch_size = 2  # batch_size for training
    val_batch_size = 1  # batch_size for evaluation

    @classmethod
    def train_data_loader(cls):
        """train dataloader"""
        return cls.build_dataloader(cls, is_training=True)

    @classmethod
    def val_data_loader(cls):
        """val dataloader"""
        return cls.build_dataloader(cls, is_training=False)

    def build_dataloader(self, is_training=True):
        """dataloader"""
        transforms = self.build_transforms(self, self.data_dir, is_training)
        split_dir = "train_lmdb" if is_training else "val_lmdb"
        dataset = Kitti3D(
            data_path=os.path.join(self.data_dir, split_dir),
            transforms=transforms,
        )
        if is_training:
            sampler = torch.utils.data.DistributedSampler(dataset)
        else:
            sampler = None
        dataloader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.train_batch_size if is_training else self.val_batch_size,
            sampler=sampler,
            shuffle=False,
            num_workers=1,
            pin_memory=True,
            collate_fn=hat.data.collates.collate_kitti3d,
        )
        return dataloader
    def build_transforms(self, data_dir, is_training=True):
        """transforms"""
        class_names = ["Car"]
        pc_range = [0, -39.68, -3, 69.12, 39.68, 1]
        if is_training:
            transforms = torchvision.transforms.Compose(
                [
                    ObjectSample(
                        class_names=class_names,
                        remove_points_after_sample=False,
                        db_sampler=DataBaseSampler(
                            enable=True,
                            root_path=data_dir,
                            db_info_path=os.path.join(
                                data_dir, "kitti3d_dbinfos_train.pkl"
                            ),  # noqa E501
                            sample_groups=[dict(Car=15)],  # noqa C408
                            db_prep_steps=[  # noqa C408
                                dict(  # noqa C408
                                    type="DBFilterByDifficulty",
                                    filter_by_difficulty=[-1],
                                ),
                                dict(  # noqa C408
                                    type="DBFilterByMinNumPoint",
                                    filter_by_min_num_points=dict(  # noqa C408
                                        Car=5,
                                    ),
                                ),
                            ],
                            global_random_rotation_range_per_object=[0, 0],
                            rate=1.0,
                        ),
                    ),
                    ObjectNoise(
                        gt_rotation_noise=[-0.15707963267, 0.15707963267],
                        gt_loc_noise_std=[0.25, 0.25, 0.25],
                        global_random_rot_range=[0, 0],
                        num_try=100,
                        class_names=class_names,
                    ),
                    PointRandomFlip(probability=0.5),
                    PointGlobalRotation(rotation=[-0.78539816, 0.78539816]),
                    PointGlobalScaling(min_scale=0.95, max_scale=1.05),
                    ShufflePoints(True),
                    ObjectRangeFilter(point_cloud_range=pc_range),
                    LidarReformat(),
                ]
            )
        else:
            transforms = torchvision.transforms.Compose([Reformat()])
        return transforms
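One detail worth noting in the dataloader above is the collate function: each frame contains a different number of LiDAR points, so the per-frame point arrays cannot be stacked into one tensor; collate_kitti3d keeps them as a list instead. A minimal sketch of that behavior (simplified; `collate_point_batch` is a made-up name, and the real collate also merges labels and metadata):

```python
import numpy as np

def collate_point_batch(batch):
    """Collate variable-length point clouds by keeping them as a list."""
    return {"points": [sample["points"] for sample in batch]}

batch = [
    {"points": np.zeros((1200, 4), dtype=np.float32)},
    {"points": np.zeros((800, 4), dtype=np.float32)},
]
out = collate_point_batch(batch)
print([p.shape for p in out["points"]])  # [(1200, 4), (800, 4)]
```

This is also why deploy_inputs earlier wraps a single (150000, 4) tensor in a list: the model consumes a list of per-frame point arrays.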
Training Strategy
To train a model with high accuracy, a good training strategy is essential.
For each training task, the training strategies for the different stages (float, QAT) may vary slightly, so we also define the training-strategy components (such as the optimizer and lr schedule) in PointPillarsModel:
class PointPillarsModel:
    ...

    @classmethod
    def optimizer(cls, model, stage):
        """optimizer setting for training"""
        if stage == "float":
            optimizer = torch.optim.AdamW(
                params=model.parameters(),
                betas=(0.95, 0.99),
                lr=2e-4,
                weight_decay=0.01,
            )
        elif stage == "qat":
            optimizer = torch.optim.SGD(
                params=model.parameters(),
                lr=2e-4,
                momentum=0.9,
                weight_decay=0.0,
            )
        else:
            optimizer = None
        return optimizer

    @classmethod
    def lr_schedule(cls, stage):
        """lr schedule setting for training"""
        if stage == "float":
            lr_updater = CyclicLrUpdater(
                target_ratio=(10, 1e-4),
                cyclic_times=1,
                step_ratio_up=0.4,
                step_log_interval=50,
            )
        elif stage == "qat":
            lr_updater = CyclicLrUpdater(
                target_ratio=(10, 1e-4),
                cyclic_times=1,
                step_ratio_up=0.4,
                step_log_interval=50,
            )
        else:
            lr_updater = None
        return lr_updater

    @classmethod
    def val_metrics(cls):
        """Metric for evaluation"""
        class_names = ["Car"]
        val_metrics = Kitti3DMetricDet(
            compute_aos=True,
            current_classes=class_names,
            difficultys=[0, 1, 2],
        )
        return val_metrics
Note
If you need to reproduce the reported accuracy, it is best not to modify the training strategy in the sample code; otherwise, unexpected training behavior may occur.
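For intuition, the cyclic schedule configured above (target_ratio=(10, 1e-4), step_ratio_up=0.4) ramps the learning rate up to 10x the base value over the first 40% of training and then anneals it down toward 1e-4x the base value. A simplified linear sketch of that shape (HAT's CyclicLrUpdater uses its own interpolation, so this is illustrative only and not its implementation):

```python
def cyclic_lr(step, total_steps, base_lr=2e-4, target_ratio=(10.0, 1e-4), step_ratio_up=0.4):
    """Linear up-then-down cyclic learning rate (illustrative sketch)."""
    up_steps = int(total_steps * step_ratio_up)
    peak, floor = base_lr * target_ratio[0], base_lr * target_ratio[1]
    if step < up_steps:
        # warm-up phase: base_lr -> peak
        return base_lr + (peak - base_lr) * step / up_steps
    # annealing phase: peak -> floor
    return peak + (floor - peak) * (step - up_steps) / (total_steps - up_steps)

total = 100
print(cyclic_lr(0, total))    # base lr (2e-4)
print(cyclic_lr(40, total))   # peak, 10x base (2e-3)
print(cyclic_lr(100, total))  # annealed floor (~2e-8)
```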
With the above, we have defined all the modules needed for model training.
Before starting, however, we still need a training framework; we could use HAT's built-in trainer, but any other framework works just as well.
The following example code, based on the PyTorch official tutorial and PyTorch open-source code with a few modifications, builds a minimal "training framework" that supports single-machine multi-GPU DDP:
def eval(model, val_dataloader, val_metrics, device=None):
    model.eval()
    for batch_data in tqdm.tqdm(val_dataloader):
        batch_data = to_cuda(batch_data, device=device)
        pred_outs = model(batch_data)
        val_metrics.update(pred_outs, batch_data)
    metric_names, metric_values = val_metrics.get()
    log_info = "\n"
    for name, value in zip(metric_names, metric_values):
        if isinstance(value, (int, float)):
            log_info += "%s[%.2f] " % (name, value)
        else:
            log_info += "%s[%s] " % (str(name), str(value))
    log_info += "\n"
    logger.info(log_info)
    return metric_values[0]
class Trainer:
    def __init__(
        self,
        model: torch.nn.Module,
        stage: str,
        train_data: DataLoader,
        optimizer: torch.optim.Optimizer,
        lr_schedule: Any,
        gpu_id: int,
        save_every: int = 1,
        log_freq: int = 100,
        output_dir: str = "./tmp_models",
    ) -> None:
        self.gpu_id = gpu_id
        self.model = model.to(gpu_id)
        self.stage = stage
        self.train_data = train_data
        self.optimizer = optimizer
        self.save_every = save_every
        self.model = DDP(self.model, device_ids=[gpu_id])
        self.lr_schedule = lr_schedule
        self.output_dir = output_dir
        self.log_freq = log_freq

    def _run_epoch(self, epoch):
        self.model.train()
        self.train_data.sampler.set_epoch(epoch)
        for batch_id, batch_data in enumerate(self.train_data):
            batch_data = to_cuda(batch_data, device=self.gpu_id)
            # one batch
            self.optimizer.zero_grad()
            self.lr_schedule.on_step_begin(self.global_step_id)
            output = self.model(batch_data)
            loss = sum([v for v in output.values()])
            loss.backward()
            self.optimizer.step()
            # log
            if (batch_id + 1) % self.log_freq == 0:
                loss_str = ""
                for k, v in output.items():
                    loss_str += f" {k} [{round(v.item(), 4)}]"
                loss_log = "Epoch[%d] Step[%d] GlobalStep[%d]: %s" % (
                    epoch,
                    batch_id,
                    self.global_step_id,
                    loss_str,
                )
                # only log on gpu 0
                if self.gpu_id == 0:
                    logger.info(loss_log)
            self.global_step_id += 1

    def _save_checkpoint(self, epoch):
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)
        state = {
            "epoch": epoch,
            "state_dict": self.model.module.state_dict(),
        }
        ckpt_file = os.path.join(
            self.output_dir, f"{self.stage}-checkpoint-best.pth.tar"
        )
        torch.save(state, ckpt_file)
        logger.info(f"Epoch {epoch} | Training checkpoint saved at {ckpt_file}")

    def train(self, max_epochs: int):
        self.global_step_id = 0
        self.lr_schedule.on_loop_begin(self.optimizer, self.train_data, max_epochs)
        for epoch in range(max_epochs):
            self._run_epoch(epoch)
            # validation and save checkpoint
            if (epoch + 1) % self.save_every == 0 or (epoch + 1) == max_epochs:
                if self.gpu_id == 0:
                    # validation
                    eval(
                        self.model,
                        DataHelper.val_data_loader(),
                        PointPillarsModel.val_metrics(),
                        device=self.gpu_id,
                    )
                    # save checkpoint
                    self._save_checkpoint(epoch)
def train_entrance(
    rank: int,
    world_size: int,
    model,
    stage,
    march,
    output_dir,
    total_epochs,
):
    # set distribute
    dist.init_process_group(
        backend="NCCL",
        init_method="tcp://localhost:%s" % "12345",
        world_size=world_size,
        rank=rank,
    )
    horizon.march.set_march(march)
    train_data = DataHelper.train_data_loader()
    optimizer = PointPillarsModel.optimizer(model, stage)
    lr_schedule = PointPillarsModel.lr_schedule(stage)
    trainer = Trainer(
        model,
        stage,
        train_data,
        optimizer,
        lr_schedule,
        rank,
        output_dir=output_dir,
    )
    trainer.train(total_epochs)
    dist.destroy_process_group()


def train(
    model,
    stage,
    march,
    device_ids,
    output_dir,
    total_epochs,
):
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    device_ids = device_ids.split(",")
    world_size = len(device_ids)
    mp.spawn(
        train_entrance,
        args=(
            world_size,
            model,
            stage,
            march,
            output_dir,
            total_epochs,
        ),
        nprocs=world_size,
    )


if __name__ == "__main__":
    # model, stage, march, device_ids, output_dir and total_epochs are
    # parsed from the command line in the full example script.
    train(
        model,
        stage,
        march,
        device_ids,
        output_dir,
        total_epochs,
    )
At this point, everything needed for model training and evaluation is in place; the complete code can be found in the example.
Next, you can train a high-accuracy, purely floating-point detection model.
Of course, a good floating-point detector is not the end goal; it serves as pre-training for the fixed-point model that follows.
python3 examples/pointpillars_no_hat_trainer.py --stage "float" --device-ids "0,1,2,3" --train
Quantized Model Training
When we have a floating-point model, we can start training the corresponding fixed-point model.
In the same way as floating-point training, we can train a fixed-point model just by running the following script:
python3 examples/pointpillars_no_hat_trainer.py --stage "qat" --device-ids "0,1,2,3" --train
When building the model structure section above, we have learned that there are some differences between the Float Model and the QAT Model, mainly reflected in the following aspects:
The Value of the Quantize Parameter is Different
When we train the quantized model, we need to set quantize=True.
At this time, the corresponding floating-point model will be converted into a quantized model. The code is as follows:
model.fuse_model()
model.set_qconfig()
horizon.quantization.prepare_qat(model, inplace=True)
For the key steps in quantization training, such as preparing the floating-point model, operator replacement, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion,
please read the Quantization Aware Training (QAT) section.
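The horizon.quantization.prepare_qat call above plays the same role as eager-mode QAT preparation in stock PyTorch: fuse conv/bn/relu, attach a qconfig, then swap in fake-quantized modules. For readers unfamiliar with the flow, here is the analogous stock-PyTorch sketch (this uses torch.ao.quantization, not the Horizon toolchain, and SmallNet is a toy model for illustration):

```python
import torch
from torch import nn
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = SmallNet().train()
# 1. fuse conv + bn + relu into a single module
torch.ao.quantization.fuse_modules_qat(model, [["conv", "bn", "relu"]], inplace=True)
# 2. set the quantization config
model.qconfig = get_default_qat_qconfig("fbgemm")
# 3. insert fake-quantize observers for QAT
prepare_qat(model, inplace=True)
print(hasattr(model.conv, "weight_fake_quant"))  # True
```

After preparation, training proceeds as usual while the fake-quantize nodes simulate INT8 rounding in the forward pass.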
Different Training Strategies
As mentioned earlier, quantized training is essentially fine-tuning on top of the purely floating-point model.
Therefore, for quantized training the initial learning rate is set to one-tenth of the floating-point value, the number of training epochs is greatly reduced, and, most importantly,
when the model is defined, the pretrain checkpoint must point to the trained floating-point model.
After making these simple adjustments, we can start training our quantized model.
Model Validation
After the model is trained, we can also verify the performance of the trained model.
Since the training process covers two stages, float and qat, the models trained in either stage can be validated by running the corresponding command:
python3 examples/pointpillars_no_hat_trainer.py --predict --stage "float" --device-ids "0" --ckpt ${float-checkpoint-file}
python3 examples/pointpillars_no_hat_trainer.py --predict --stage "qat" --device-ids "0" --ckpt ${qat-checkpoint-file}
We also provide accuracy evaluation of the quantized model; just run the following command:
python3 examples/pointpillars_no_hat_trainer.py --predict --stage "int_infer" --device-ids "0" --ckpt ${int_infer-checkpoint-file}
The reported accuracy is the true accuracy of the final int8 model, and it should be very close to that of the qat validation stage.
Align BPU Validation
In addition to the model validation described above, we provide an accuracy-validation mode whose results are consistent with on-board (BPU) execution; you can use it as follows:
python3 examples/pointpillars_no_hat_trainer.py --device-ids "0" --align-bpu-validation
Results Visualization
If you want to see the detection results of the trained model on a single frame of LiDAR point cloud, we also provide prediction and visualization scripts in the tools folder; just run the following:
python3 examples/pointpillars_no_hat_trainer.py --device-ids "0" --visualize --input-points ${lidar-pointcloud-path} --is-plot
Model Checking and Compilation
After training, the quantized model can be compiled with the compile tool into an hbm file that runs on the board.
The tool can also estimate the model's runtime performance on the BPU.
Use the following script:
python3 examples/pointpillars_no_hat_trainer.py --compile --opt 3