FCOS3D Detection Model Training
This tutorial shows how to train a fixed-point 3D detection model with the HAT algorithm toolkit, using FCOS3D-efficientnetb0 as an example.
Before starting quantization-aware training (that is, fixed-point model training), you first need to train a pure floating-point model with high accuracy; by finetuning this floating-point model, you can then train the fixed-point model quickly.
Let's start by training a pure floating-point FCOS3D-efficientnetb0 model.
Dataset Preparation
Before starting to train the model, the first step is to prepare the dataset.
Here we use nuScenes to train FCOS3D. After unpacking, the data directory structure is shown below:
tmp_data
|-- nuscenes
|   |-- v1.0-mini.tar
|   |-- v1.0-test_blobs.tar
|   |-- v1.0-test_meta.tar
|   |-- v1.0-trainval01_blobs.tar
|   |-- ...
|   |-- v1.0-trainval10_blobs.tar
|   |-- v1.0-trainval_meta.tar
|   |-- can_bus
|   |-- maps
|   |-- meta
|   |-- samples
|   |-- sweeps
|   |-- v1.0-mini
|   |-- v1.0-test
|   |-- v1.0-trainval
Also, to improve the training speed, we pack the original jpg-format dataset into the lmdb format.
The conversion can be achieved by simply running the following scripts:
python3 tools/datasets/nuscenes_packer.py --src-data-dir ./tmp_data/nuscenes --pack-type lmdb --split-name train --version v1.0-trainval
python3 tools/datasets/nuscenes_packer.py --src-data-dir ./tmp_data/nuscenes --pack-type lmdb --split-name val --version v1.0-trainval
These two commands pack the training set and the validation set respectively. When the packing is done, the file structure of the data should be as below:
tmp_data
|-- nuscenes
|   |-- train_lmdb
|   |-- val_lmdb
|   |-- meta
The above train_lmdb and val_lmdb are the packed training and validation datasets, which are also the final datasets read by the network.
Floating-point Model Training
Once datasets are ready, you can start training the floating-point FCOS3D-efficientnetb0 detection network.
If you simply want to start such a training task, just run the following command:
python3 tools/train.py --stage float --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
Since the HAT algorithm toolkit uses a flexible registration mechanism, each training task can be started in the form of train.py plus a config file.
train.py is a uniform training script and independent of the task.
The task we need to train, the dataset we need to use, and the hyperparameters we need to set for the training are all specified in the config file.
The config file provides the key dicts for model building, data loading, etc.
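The idea behind such a registration mechanism can be illustrated with a minimal sketch. Note that the names REGISTRY, register, and build below are illustrative assumptions, not HAT's actual API: classes register themselves under a string name, and a config dict whose type key names a registered class is turned into an instance.

```python
REGISTRY = {}

def register(cls):
    """Class decorator: record the class so it can be built from a config dict."""
    REGISTRY[cls.__name__] = cls
    return cls

def build(cfg):
    """Instantiate the registered class named by cfg['type'],
    passing the remaining keys as keyword arguments."""
    cfg = dict(cfg)  # copy so the original config dict is untouched
    cls = REGISTRY[cfg.pop("type")]
    return cls(**cfg)

@register
class FCOS3DHead:
    def __init__(self, num_classes, in_channels):
        self.num_classes = num_classes
        self.in_channels = in_channels

# A config dict like the ones in this tutorial becomes a model component.
head = build(dict(type="FCOS3DHead", num_classes=10, in_channels=64))
print(head.num_classes)  # 10
```

This is why every component in the config is a dict with a type key: the trainer looks the name up in a registry and constructs the object, so no task-specific code is needed in train.py itself.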
Model Building
The network structure of FCOS3D can be found in the paper, and here we will skip the details.
We can easily define and modify the model by defining a dict type variable like model in the config file.
model = dict(
    type="FCOS3D",
    backbone=dict(
        type="efficientnet",
        bn_kwargs=bn_kwargs,
        model_type="b0",
        num_classes=1000,
        include_top=False,
        activation="relu",
        use_se_block=False,
    ),
    neck=dict(
        type="BiFPN",
        in_strides=[2, 4, 8, 16, 32],
        out_strides=[8, 16, 32, 64, 128],
        stride2channels=dict({2: 16, 4: 24, 8: 40, 16: 112, 32: 320}),
        out_channels=64,
        num_outs=5,
        stack=3,
        start_level=2,
        end_level=-1,
        fpn_name="bifpn_sum",
    ),
    head=dict(
        type="FCOS3DHead",
        num_classes=10,
        in_channels=64,
        feat_channels=256,
        stacked_convs=2,
        strides=[8, 16, 32, 64, 128],
        group_reg_dims=(2, 1, 3, 1, 2),  # offset, depth, size, rot, velo
        use_direction_classifier=True,
        pred_attrs=True,
        num_attrs=9,
        cls_branch=(256,),
        reg_branch=(
            (256,),  # offset
            (256,),  # depth
            (256,),  # size
            (256,),  # rot
            (),  # velo
        ),
        dir_branch=(256,),
        attr_branch=(256,),
        centerness_branch=(64,),
        centerness_on_reg=True,
        return_for_compiler=False,
        output_int32=True,
    ),
    targets=dict(
        type="FCOS3DTarget",
        num_classes=10,
        background_label=None,
        bbox_code_size=9,
        regress_ranges=((-1, 48), (48, 96), (96, 192), (192, 384), (384, INF)),
        strides=[8, 16, 32, 64, 128],
        pred_attrs=True,
        num_attrs=9,
        center_sampling=True,
        center_sample_radius=1.5,
        centerness_alpha=2.5,
        norm_on_bbox=True,
    ),
    post_process=dict(
        type="FOCS3DPostProcess",
        num_classes=10,
        use_direction_classifier=True,
        strides=[8, 16, 32, 64, 128],
        group_reg_dims=(2, 1, 3, 1, 2),
        pred_attrs=True,
        num_attrs=9,
        attr_background_label=9,
        bbox_coder=dict(type="FCOS3DBBoxCoder", code_size=9),
        bbox_code_size=9,
        dir_offset=0.7854,
        test_cfg=dict(
            use_rotate_nms=True,
            nms_across_levels=False,
            nms_pre=1000,
            nms_thr=0.8,
            score_thr=0.05,
            min_bbox_size=0,
            max_per_img=100,
        ),
    ),
    loss=dict(
        type="FCOS3DLoss",
        num_classes=10,
        pred_attrs=True,
        group_reg_dims=(2, 1, 3, 1, 2),
        num_attrs=9,
        pred_velo=True,
        use_direction_classifier=True,
        dir_offset=0.7854,
        dir_limit_offset=0,
        diff_rad_by_sin=True,
        loss_cls=dict(
            type="FocalLoss",
            num_classes=11,
            gamma=2.0,
            alpha=0.25,
        ),
        loss_bbox=dict(type="SmoothL1Loss", beta=1.0 / 9.0, loss_weight=1.0),
        loss_dir=dict(
            type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0
        ),
        loss_attr=dict(
            type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0
        ),
        loss_centerness=dict(
            type="CrossEntropyLoss",
            use_sigmoid=True,
            loss_weight=1.0,
        ),
        train_cfg=dict(
            allowed_border=0,
            code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.05, 0.05],
            pos_weight=-1,
            debug=False,
        ),
    ),
)
The type under model is the name of the registered model class, and the remaining keys stand for the other components of the model.
By defining the model in this way, we can easily swap in the structure we want.
For example, if we want to train a model with a resnet50 backbone, we just need to replace the backbone dict under model.
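A minimal sketch of such a backbone swap, using a trimmed-down stand-in for the model dict above (the resnet50 keys shown are illustrative assumptions, not the toolkit's exact parameter names):

```python
import copy

# Trimmed-down stand-in for the full model dict shown above.
model = dict(
    type="FCOS3D",
    backbone=dict(type="efficientnet", model_type="b0", include_top=False),
    head=dict(type="FCOS3DHead", num_classes=10),
)

# Swap the backbone without touching the rest of the model definition.
model_resnet = copy.deepcopy(model)
model_resnet["backbone"] = dict(type="resnet50", include_top=False)

print(model_resnet["backbone"]["type"])   # resnet50
print(model_resnet["head"]["num_classes"])  # 10
```

Because every component is just a dict, the rest of the config (neck, head, losses) carries over unchanged; only the replaced dict must match the new class's constructor arguments.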
Data Augmentation
Like the model definition, the data augmentation pipeline is implemented by defining two dicts, data_loader and val_data_loader, in the config file, corresponding to the processing of the training and validation sets respectively. Take data_loader as an example:
data_loader = dict(
    type=torch.utils.data.DataLoader,
    dataset=dict(
        type="NuscenesMonoDataset",
        data_path="./tmp_data/nuscenes/train_lmdb/",
        transforms=[
            dict(
                type="Pad",
                divisor=128,
            ),
            dict(
                type="ToTensor",
                to_yuv=True,
            ),
            dict(
                type="Normalize",
                mean=128.0,
                std=128.0,
            ),
        ],
    ),
    sampler=dict(type=torch.utils.data.DistributedSampler),
    batch_size=batch_size_per_gpu,
    shuffle=True,
    num_workers=8,
    pin_memory=True,
    collate_fn=hat.data.collates.collate_2d,
)
Training Strategy
A good training strategy is essential for training a model with high accuracy.
For each training task, the corresponding training strategy is also defined in the config file, as can be seen from the float_trainer variable.
float_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path=(
                    "./tmp_pretrained_models/efficientnet_imagenet/float-checkpoint-best.pth.tar"  # noqa: E501
                ),
                allow_miss=True,
                ignore_extra=True,
            ),
        ],
    ),
    data_loader=data_loader,
    optimizer=dict(
        type=torch.optim.SGD,
        params={"weight": dict(weight_decay=5e-5)},
        lr=0.001,
        momentum=0.9,
    ),
    batch_processor=batch_processor,
    num_epochs=12,
    device=None,
    callbacks=[
        stat_callback,
        loss_show_update,
        dict(
            type="StepDecayLrUpdater",
            warmup_len=0.3,
            lr_decay_id=[8, 11],
            step_log_interval=10,
        ),
        ckpt_callback,
    ],
    train_metrics=dict(
        type="LossShow",
    ),
    sync_bn=True,
)
float_trainer defines the overall training approach, including the use of multi-GPU distributed training (distributed_data_parallel_trainer), the number of training epochs, and the choice of optimizer.
The callbacks list reflects the strategies applied during training and the operations the user wants to perform, including the learning-rate schedule (StepDecayLrUpdater), the metrics used to validate the model during training (Validation), and the operation that saves the model (Checkpoint).
Of course, if you have your own operations that you want to run during training, you can also add them in the same way using a dict.
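The step-decay schedule configured above drops the learning rate at epochs 8 and 11. A minimal sketch of that behavior, assuming a decay factor of 0.1 and ignoring the warmup phase (both assumptions for illustration, not StepDecayLrUpdater's exact implementation):

```python
def step_decay_lr(epoch, base_lr=0.001, decay_epochs=(8, 11), gamma=0.1):
    """Illustrative step decay: multiply the learning rate by `gamma`
    (an assumed factor) each time a decay epoch is reached."""
    lr = base_lr
    for decay_epoch in decay_epochs:
        if epoch >= decay_epoch:
            lr *= gamma
    return lr

# Epochs 0-7 train at the base rate; the rate drops at epochs 8 and 11.
for epoch in (0, 7, 8, 11):
    print(epoch, step_decay_lr(epoch))
```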
The above information should give you a clearer understanding of the functions of the config file.
The training script mentioned earlier can help you train a pure floating-point detection model with high accuracy.
However, a well-trained floating-point detection model is not our final goal; it merely serves as the pretrained model for the fixed-point training that follows.
Quantized Model Training
With a pure floating-point model in place, we can start training the corresponding fixed-point model.
Similar to the floating-point training, we can train the fixed-point model simply by running the following scripts:
python3 tools/train.py --stage calibration --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
python3 tools/train.py --stage qat --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
As you can see, nothing in the configuration file changes except the stage.
At this point, the training strategies used come from the calibration_trainer and qat_trainer in the config file.
calibration_trainer = dict(
    type="Calibrator",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        qconfig_params=dict(
            activation_calibration_observer="min_max",
        ),
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "float-checkpoint-best.pth.tar"
                ),
            ),
            dict(type="Float2Calibration", convert_mode=convert_mode),
        ],
    ),
    data_loader=calibration_data_loader,
    batch_processor=calibration_batch_processor,
    num_steps=calibration_step,
    device=None,
    callbacks=[
        stat_callback,
        val_callback,
        ckpt_callback,
    ],
    val_metrics=dict(
        type="NuscenesMonoMetric",
        data_root=meta_rootdir,
        version="v1.0-trainval",
    ),
    log_interval=calibration_step / 10,
)
qat_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        qconfig_params=dict(
            activation_qat_qkwargs=dict(
                averaging_constant=0,
            ),
            weight_qat_qkwargs=dict(
                averaging_constant=1,
            ),
        ),
        converters=[
            dict(type="Float2QAT", convert_mode=convert_mode),
            dict(
                type="LoadCheckpoint",
                checkpoint_path=os.path.join(
                    ckpt_dir, "calibration-checkpoint-best.pth.tar"
                ),
            ),
        ],
    ),
    data_loader=data_loader,
    optimizer=dict(
        type=torch.optim.AdamW,
        params={"weight": dict(weight_decay=0.01)},
        lr=1e-6,
    ),
    batch_processor=batch_processor,
    num_epochs=10,
    device=None,
    callbacks=[
        stat_callback,
        loss_show_update,
        val_callback,
        ckpt_callback,
    ],
    sync_bn=True,
    train_metrics=dict(
        type="LossShow",
    ),
    val_metrics=dict(
        type="NuscenesMonoMetric",
        data_root=meta_rootdir,
        version="v1.0-trainval",
    ),
)
With Different model_convert_pipeline Parameters
By setting model_convert_pipeline when training quantized models, the corresponding floating-point model can be converted into a quantized model, as below:
model_convert_pipeline=dict(
    type="ModelConvertPipeline",
    qat_mode="fuse_bn",
    converters=[
        dict(type="Float2QAT", convert_mode=convert_mode),
        dict(
            type="LoadCheckpoint",
            checkpoint_path=os.path.join(
                ckpt_dir, "qat-checkpoint-best.pth.tar"
            ),
        ),
        dict(type="QAT2Quantize", convert_mode=convert_mode),
    ],
)
For the key steps in quantization training, e.g., preparing the floating-point model, operator substitution, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion,
please read the Quantization-Aware Training (QAT) section.
With Different Training Strategies
As previously mentioned, quantization training is essentially finetuning on top of the pure floating-point training.
Therefore, in quantization training the initial learning rate is set to a small fraction of the floating-point one, and the number of training epochs decreases greatly as well.
Most importantly, the trained pure floating-point checkpoint must be loaded as the starting point, which is done by the LoadCheckpoint converter in model_convert_pipeline.
After these simple adjustments, we can start training our quantized model.
Export Fixed-point Model
Once you have completed the quantization training, you can export the fixed-point model with the following command:
python3 tools/export_hbir.py --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
Model Validation
After the model is trained, we can also validate its performance.
Since we provide three training stages, float, calibration, and qat, we can validate the performance of the model trained at each of these stages.
python3 tools/predict.py --stage float --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
python3 tools/predict.py --stage calibration --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
python3 tools/predict.py --stage qat --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
Also, we provide a performance test for the quantized model by running the following command; note that the hbir model must be exported first:
python3 tools/predict.py --stage "int_infer" --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
The displayed accuracy is the real accuracy of the final int8 model, which should be very close to the accuracy seen in the qat validation stage.
Simulation of On-board Accuracy Validation
In addition to the above model validation, we also provide an accuracy validation method that exactly simulates on-board conditions, as below:
python3 tools/validation_hbir.py --stage "align_bpu" --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py
Result Visualization
If you want to see the results of the trained model detecting a single image, we also provide scripts for single image prediction and visualization under our tools folder.
Run the following script:
python3 tools/infer_hbir.py -c configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py --model-inputs input_image:${img-path},input_ann:${ann-path},score_th:${score_th} --save-path ${save_path}
Model Checking and Compilation
After the training, you can use the compile_perf_hbir tool to compile the quantized model into an hbm file that can be deployed on board.
The compile tool can also estimate the model's runtime performance on the target computing platform.
Run the following script:
python3 tools/compile_perf_hbir.py --config configs/detection/fcos3d/fcos3d_efficientnetb0_nuscenes.py