Car Keypoint Detection Model Training

This tutorial focuses on how to train a keypoint detection model from scratch using HAT on the CarFusion car keypoint dataset. It covers the floating-point, quantized, and fixed-point models.

CarFusion is a car keypoint dataset that includes annotations for 12 keypoints on cars: the centers of the front, rear, left, and right wheels, as well as the positions of the car lights and the corners of the car roof. Before starting model training, the first step is to prepare the dataset. To do this, apply for the dataset on the official CarFusion website. After scrolling down the page, you will find the dataset application form; once you submit it, you will receive an email with the dataset download link. After downloading and extracting the dataset, the directory structure will be as follows:

```
CarFusion
|-- train
|   |-- car_butler1
|   |-- car_butler2
|   |-- car_craig1
|   |-- car_craig2
|   |-- car_fifth1
|   |-- car_fifth2
|   |-- car_morewood1
|   |-- car_morewood2
|-- test
|   |-- car_penn1
|   |-- car_penn2
```

Here, we only consider a simpler scenario, which is to detect keypoints from pre-detected car images. Therefore, we need to first crop the cars from the images based on the annotated bounding boxes.

To convert the data into the cropped format, you can simply run the following command:

```shell
python3 tools/dataset_converters/gen_carfusion_data.py --src-data-path ${data-dir} --out-dir ${cropped-data-dir} --num-workers 10
```

The directory structure of the processed cropped data will be as follows:

```
cropped-data-dir
|-- train
|   |-- car_butler1
|   |-- car_butler2
|   |-- car_craig1
|   |-- car_craig2
|   |-- car_fifth1
|   |-- car_fifth2
|   |-- car_morewood1
|   |-- car_morewood2
|-- test
|   |-- car_penn1
|   |-- car_penn2
|-- simple_anno
|   |-- keypoints_train.json
|   |-- keypoints_test.json
```

The file "keypoints_test.json" contains annotations for the test dataset, while "keypoints_train.json" contains annotations for the training dataset. Each sample is stored in JSON format as a dictionary entry of the form "img_path": keypoints_list. The keypoints_list has shape (12, 3), where each row represents one keypoint: the first two elements are the x and y coordinates, and the third indicates the keypoint's validity. If the keypoint is outside the image or otherwise invalid, the third element is set to 0.
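As a quick sanity check on the annotation layout described above, the file can be parsed with a few lines of Python. The helper below (`load_valid_keypoints` is a hypothetical name, not part of HAT) keeps only the valid keypoints per image:

```python
import json

def load_valid_keypoints(anno_path):
    # Parse an annotation file laid out as {"img_path": keypoints_list},
    # where keypoints_list has shape (12, 3): [x, y, valid].
    # Rows with valid == 0 (out of image or invalid) are dropped.
    with open(anno_path) as f:
        annos = json.load(f)
    return {
        img_path: [(x, y) for x, y, v in kpts if v != 0]
        for img_path, kpts in annos.items()
    }
```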

Next, you can package the data into an LMDB-format dataset by running the following script:

```shell
python3 tools/datasets/carfusion_packer.py --src-data-dir ${cropped-data-dir} --target-data-dir ${pack-data-dir} --split-name train --pack-type lmdb --num-workers 10
python3 tools/datasets/carfusion_packer.py --src-data-dir ${cropped-data-dir} --target-data-dir ${pack-data-dir} --split-name test --pack-type lmdb --num-workers 10
```

After the packaging is completed, the file structure in the directory should look as follows:

```
tmp_data
|-- CarFusion
|   |-- test_lmdb
|   |-- train_lmdb
```

train_lmdb and test_lmdb are the packaged training and test datasets, respectively. Now you can proceed to train the model using these datasets.

Model Training

configs/keypoint/keypoint_efficientnetb0_carfusion.py includes all the settings related to model training in this tutorial.

Before starting the network training, you can use the following command to calculate the computational complexity and the number of parameters in the network:

```shell
python3 tools/calops.py --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

The next step is to start the training, which can be accomplished with the following script. Before training, confirm that the dataset paths in the configuration point to the packaged dataset.

```shell
python3 tools/train.py --stage "float" --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
python3 tools/train.py --stage "calibration" --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

Since the HAT algorithm package uses a registration mechanism, every training task follows the same pattern: launch train.py with a config file. Here, train.py is a unified, task-agnostic training script. The type of task to train, the dataset to use, and the training hyperparameters are all specified in the designated config file. In the command above, the --stage parameter can be set to float or calibration, corresponding to training a floating-point model and calibrating a quantized model, respectively. Calibrating the quantized model relies on the floating-point model produced in the previous step.

Exporting Fixed-point Model

After completing quantization training, you can proceed to export the fixed-point model. You can use the following command to perform the export:

```shell
python3 tools/export_hbir.py --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

Model Validation

After completing the training, you will obtain the trained floating-point and quantized models. As with training, you can use the same method to evaluate metrics on the trained models. The evaluation reports metrics labeled Float and QAT, corresponding to the floating-point and quantized models, respectively.

```shell
python3 tools/predict.py --stage "float" --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
python3 tools/predict.py --stage "calibration" --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

Similar to the training process, setting --stage to float or calibration makes tools/predict.py validate the trained floating-point model or quantized model, respectively.

For fixed-point model validation, use the following command; note that the HBIR model must be exported first:

```shell
python3 tools/predict.py --stage "int_infer" --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

Model Inference

HAT provides the infer_hbir.py script, which allows visualization of the inference results of the integer model.

```shell
python3 tools/infer_hbir.py --config configs/keypoint/keypoint_efficientnetb0_carfusion.py --model-inputs img:${img1-path} --save-path ${save_path}
```

Simulation On-board Accuracy Validation

In addition to the above model validation, we provide an accuracy validation method identical to the on-board environment, which can be accomplished by:

```shell
python3 tools/validation_hbir.py --stage "align_bpu" --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

Fixed-point Model Check and Compilation

As the quantization training toolchain integrated in HAT mainly targets Horizon's processors, checking and compiling the quantized model are mandatory steps.

HAT provides a model-checking script that lets you verify whether the quantized model can run properly on the BPU.

```shell
python3 tools/model_checker.py --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

After the model is trained, you can use the compile_perf_hbir script to compile the quantized model into an HBM file that can run on-board. This tool can also estimate the model's performance on the BPU.

```shell
python3 tools/compile_perf_hbir.py --config configs/keypoint/keypoint_efficientnetb0_carfusion.py
```

This is the whole process, from data preparation to the generation of a quantized, deployable model.

Training Details

In this explanation, we will outline some considerations to be aware of during model training, primarily focusing on relevant settings in the config file.

Network Structure

For the lightweight car keypoint detection task, the network model HeatmapKeypointModel utilizes efficientnet-b0 as the backbone. It adds three transpose convolutional layers to generate heatmaps, from which the keypoint coordinates are decoded. By defining a dictionary variable, such as model in the config file, we can easily define and modify the model.

```python
model = dict(
    type="HeatmapKeypointModel",
    backbone=dict(
        type="efficientnet",
        model_type="b0",
        num_classes=1,
        bn_kwargs={},
        activation="relu",
        use_se_block=False,
        include_top=False,
    ),
    decode_head=dict(
        type="DeconvDecoder",
        in_channels=320,
        out_channels=NUM_LDMK,
        input_index=4,
        num_conv_layers=3,
        num_deconv_filters=[128, 128, 128],
        num_deconv_kernels=[4, 4, 4],
        final_conv_kernel=3,
    ),
    loss=dict(type="MSELoss", reduction="mean"),
    post_process=dict(
        type="HeatmapDecoder",
        scale=4,
        mode="averaged",
    ),
)
```

The model consists of the backbone, a decode_head composed of transpose convolutions, the loss, and the post_process module. In the HeatmapKeypointModel, the input is cropped car images. The backbone extracts image features, while the decoder upsamples them and generates heatmaps. The loss module uses MSELoss weighted by heatmap position as the training loss. The post_process module uses the HeatmapDecoder to convert the heatmap output into predicted keypoint locations.
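The core idea of heatmap decoding can be illustrated with a minimal sketch: take the peak of each heatmap channel and map it back to input-image coordinates via the feature stride. HAT's HeatmapDecoder with mode="averaged" additionally refines the peak with a local weighted average; the function below (a hypothetical name, not HAT's implementation) keeps only the argmax step:

```python
import numpy as np

def decode_heatmaps(heatmaps, scale=4):
    # heatmaps: (num_ldmk, H, W). For each channel, find the argmax
    # location and multiply by the feature stride `scale` to recover
    # an approximate keypoint position in the input image.
    num_ldmk, h, w = heatmaps.shape
    coords = np.zeros((num_ldmk, 2), dtype=np.float32)
    for i in range(num_ldmk):
        idx = int(np.argmax(heatmaps[i]))
        y, x = divmod(idx, w)  # flatten index -> (row, col)
        coords[i] = (x * scale, y * scale)
    return coords
```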

Data Augmentation

As with the definition of model, the data augmentation pipeline is specified by defining two dicts in the config file, data_loader and val_data_loader, which correspond to the processing pipelines for the training and test sets.

Taking data_loader as an example, the augmentation techniques used include RandomFlip, Resize, RandomPadLdmkData, and AddGaussianNoise. For the keypoint detection task, it is also necessary to use GenerateHeatmapTarget to generate heatmap targets from the keypoint annotations.

```python
data_loader = dict(
    type=torch.utils.data.DataLoader,
    dataset=dict(
        type="CarfusionPackData",
        data_path=f"{data_root}/train_lmdb",
        transforms=[
            dict(type="RandomFlip", px=0.5),
            dict(
                type="Resize",
                img_scale=image_size,
                keep_ratio=True,
            ),
            dict(
                type="RandomPadLdmkData",
                size=image_size,
            ),
            dict(
                type="AddGaussianNoise",
                prob=0.2,
                mean=0,
                sigma=2,
            ),
            dict(
                type="GenerateHeatmapTarget",
                num_ldmk=NUM_LDMK,
                feat_stride=4,
                heatmap_shape=(32, 32),
                sigma=1.0,
            ),
            dict(
                type="ToTensor",
                to_yuv=True,
                use_yuv_v2=False,
            ),
            dict(
                type="Normalize",
                mean=128.0,
                std=128.0,
            ),
        ],
    ),
    sampler=dict(type=torch.utils.data.DistributedSampler),
    batch_size=batch_size_per_gpu,
    shuffle=True,
    num_workers=8,
    pin_memory=True,
    collate_fn=hat.data.collates.collate_2d,
)
```
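The GenerateHeatmapTarget step can be pictured as rendering each valid keypoint as a small 2D Gaussian on the feature map. The sketch below (a hypothetical helper, not HAT's exact implementation) renders one keypoint at feature-map coordinates (x, y) with the sigma used in the config:

```python
import numpy as np

def gaussian_heatmap(x, y, shape=(32, 32), sigma=1.0):
    # Render a single keypoint as a 2D Gaussian peaking at 1.0 at (x, y)
    # on a feature map of the given shape.
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
```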

Because the model deployed on the BPU ultimately takes YUV444 images as input, while regular training images are typically in RGB format, HAT provides the to_yuv=True option in the ToTensor transform to convert RGB images to YUV444 format.
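For intuition, an RGB-to-YUV444 conversion can be sketched with the full-range BT.601 formulas below. This is only an illustration of the color-space change; HAT's ToTensor(to_yuv=True) handles the conversion internally and may use a different convention:

```python
import numpy as np

def rgb_to_yuv444(img):
    # Full-range BT.601 RGB -> YUV444; chroma channels are offset by 128
    # so all three channels stay in [0, 255].
    img = img.astype(np.float32)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b + 128.0
    v = 0.615 * r - 0.51499 * g - 0.10001 * b + 128.0
    return np.stack([y, u, v], axis=-1)
```

A gray pixel maps to neutral chroma (U = V = 128), which is a handy spot check on any YUV conversion.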

HAT also offers the batch_processor interface for batch-level processing of the data, though no additional augmentation is added in this case. The loss_collector is a function that retrieves the loss for the current batch. Since the model returns a tuple (pred, loss), the loss value is obtained by indexing the tuple's element at index 1.

```python
batch_processor = dict(
    type="MultiBatchProcessor",
    need_grad_update=True,
    loss_collector=collect_loss_by_index(1),
)
```
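The idea behind collect_loss_by_index can be sketched as a simple closure; this is an illustration of the pattern, not HAT's actual implementation:

```python
def collect_loss_by_index(idx):
    # Build a collector that picks the loss out of the model's output
    # tuple. With outputs = (pred, loss), index 1 selects the loss.
    def collector(outputs):
        return outputs[idx]
    return collector
```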

The data transformation for the validation set is relatively simpler, as shown below:

```python
val_data_loader = dict(
    type=torch.utils.data.DataLoader,
    dataset=dict(
        type="CarfusionPackData",
        data_path=f"{data_root}/test_lmdb",
        transforms=[
            dict(
                type="Resize",
                img_scale=image_size,
                keep_ratio=True,
            ),
            dict(
                type="RandomPadLdmkData",
                size=image_size,
                random=False,
            ),
            dict(
                type="ToTensor",
                to_yuv=True,
                use_yuv_v2=False,
            ),
            dict(
                type="Normalize",
                mean=128.0,
                std=128.0,
            ),
        ],
    ),
    batch_size=batch_size_per_gpu,
    sampler=dict(type=torch.utils.data.DistributedSampler),
    shuffle=False,
    num_workers=8,
    pin_memory=True,
    collate_fn=hat.data.collates.collate_2d,
)

val_batch_processor = dict(
    type="MultiBatchProcessor",
    need_grad_update=False,
)
```

Training Strategy

In the configs/keypoint/keypoint_efficientnetb0_carfusion.py file, float_trainer and calibration_trainer correspond to the training strategies for the floating-point and quantized models, respectively. Below is an example of the float_trainer training strategy:

```python
float_trainer = dict(
    type="distributed_data_parallel_trainer",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path=pretrain_model_path,
                allow_miss=True,
                ignore_extra=True,
            ),
        ],
    ),
    data_loader=data_loader,
    optimizer=dict(type=torch.optim.AdamW, lr=0.001, weight_decay=5e-2),
    batch_processor=batch_processor,
    num_epochs=30,
    device=None,
    callbacks=[
        stat_callback,
        loss_show_update,
        dict(
            type="CosLrUpdater",
            warmup_len=0,
            warmup_by="epoch",
            step_log_interval=100,
        ),
        val_callback,
        ckpt_callback,
    ],
    train_metrics=[
        dict(
            type="LossShow",
        ),
    ],
    sync_bn=True,
    val_metrics=[
        dict(type="PCKMetric", alpha=0.1, feat_stride=4, img_shape=image_size),
        dict(
            type="MeanKeypointDist",
            feat_stride=4,
        ),
    ],
)
```

The float_trainer defines our overall training approach, including the use of multi-GPU distributed training (distributed_data_parallel_trainer), the number of epochs for model training, and the choice of optimizer.

The model_convert_pipeline defines the transformation operations applied before training starts. In this case, the model first loads the EfficientNet-b0 weights pre-trained on ImageNet.

The callbacks define the strategies and operations used during the training process, including the learning rate variation CosLrUpdater , validation of model metrics (Validation), and saving the model (Checkpoint). If you have any specific operations you want the model to perform during training, you can add them in this dictionary format.

The train_metrics and val_metrics define the metrics to be monitored during model training and validation, respectively.
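The PCKMetric used in val_metrics follows the standard Percentage of Correct Keypoints idea: a prediction counts as correct if it lies within a threshold proportional to a reference size (alpha times that size). The sketch below illustrates the metric under that assumption; HAT's PCKMetric may normalize differently (e.g. using the feature stride and image shape from the config):

```python
import numpy as np

def pck(pred, gt, alpha=0.1, norm=128.0):
    # pred, gt: (N, 2) keypoint arrays. A keypoint is "correct" if its
    # Euclidean distance to the ground truth is below alpha * norm.
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float((dist < alpha * norm).mean())
```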

In summary, the float_trainer is responsible for connecting the entire logic of floating-point model training.

Quantization Model Calibration

Once we have the pure floating-point model, we can proceed to quantization. We first feed some data samples through the model and, via the Calibration operation, compute the quantization scale of each layer so that the model can be quantized to int8. The relevant config is:

```python
calibration_data_loader = copy.deepcopy(data_loader)
calibration_data_loader.pop("sampler")  # Calibration does not support DDP or DP
calibration_data_loader["batch_size"] = batch_size_per_gpu * 4
calibration_data_loader["dataset"]["transforms"] = val_data_loader["dataset"][
    "transforms"
]
calibration_batch_processor = copy.deepcopy(val_batch_processor)
calibration_step = 100

calibration_trainer = dict(
    type="Calibrator",
    model=model,
    model_convert_pipeline=dict(
        type="ModelConvertPipeline",
        qat_mode="fuse_bn",
        qconfig_params=dict(
            activation_calibration_observer="percentile",
            activation_calibration_qkwargs=dict(
                percentile=99.975,
            ),
        ),
        converters=[
            dict(
                type="LoadCheckpoint",
                checkpoint_path=(
                    os.path.join(ckpt_dir, "float-checkpoint-best.pth.tar")
                ),
            ),
            dict(type="Float2Calibration", convert_mode=convert_mode),
        ],
    ),
    data_loader=calibration_data_loader,
    batch_processor=calibration_batch_processor,
    num_steps=calibration_step,
    device=None,
    callbacks=[
        stat_callback,
        val_callback,
        ckpt_callback,
    ],
    val_metrics=[
        dict(type="PCKMetric", alpha=0.1, feat_stride=4, img_shape=image_size),
        dict(
            type="MeanKeypointDist",
            feat_stride=4,
        ),
    ],
    log_interval=calibration_step / 10,
)
```
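The percentile observer configured above can be understood with a small sketch: clip the observed activation range at the given percentile of the absolute values, then derive a symmetric int8 scale from that threshold. This illustrates the idea only; the observer's actual implementation lives inside the quantization toolchain:

```python
import numpy as np

def percentile_scale(activations, percentile=99.975, num_bits=8):
    # Clip the activation range at the given percentile of |x|, then
    # derive a symmetric scale so the threshold maps to the largest
    # representable signed integer.
    threshold = np.percentile(np.abs(activations), percentile)
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    return float(threshold / qmax)
```

Using a percentile slightly below 100 discards rare outlier activations, which would otherwise inflate the scale and waste quantization resolution on values that almost never occur.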

The converters entry defines the conversion steps before model calibration. The model is first loaded as a floating-point model; then, through the Float2Calibration operation, pseudo-quantization nodes are inserted into each layer, transforming it into a calibration model.
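A pseudo-quantization ("fake quant") node can be sketched as a round-trip through the integer grid: values are quantized to int8 and immediately dequantized, so downstream layers see the quantization error while the tensor stays in floating point. The helper below is an illustration, not HAT's actual node:

```python
import numpy as np

def fake_quantize(x, scale, num_bits=8):
    # Quantize to the signed integer grid, clamp to the representable
    # range, then dequantize back to float.
    qmin = -(2 ** (num_bits - 1))       # -128 for int8
    qmax = 2 ** (num_bits - 1) - 1      #  127 for int8
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale
```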

After the calibration process, the calibrated model retains over 99% of the floating-point model's accuracy. Therefore, quantization-aware training is unnecessary, and we can convert the model directly to a fixed-point model for inference.

Model Checking & Compilation & Simulation of On-board Accuracy Validation

For HAT, the significance of quantized models lies in their ability to run directly on the BPU (Brain Processing Unit). Therefore, model checking and compilation are essential steps for quantized models.

As described above, the model_checker.py and compile_perf_hbir.py scripts let you verify that the quantized model runs properly on the BPU and compile it for deployment.

Users can obtain the on-board accuracy of the model by using the validation_hbir script. The usage is the same as described in the previous section.