This tutorial shows how to use HAT to train a Motr model from scratch on the multi-object tracking dataset MOT17, covering the floating-point, quantized, and fixed-point models.
MOT17 is one of the most commonly used datasets for multi-object tracking, and many state-of-the-art multi-object tracking studies validate on this dataset first.
Before training the model, we need to prepare the dataset. Here we download the official dataset and the corresponding annotation data MOT17DATASET.
The structure of the data directory after unzipping is as follows:
where MOT17-02-DPM is the name of a video sequence; det.txt holds the detector results (for MOT17-02-DPM, the output of the DPM detector); gt.txt under gt holds the ground-truth labels; and all frame images are under img1.
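For readers who do not have the download at hand, the public MOT17 release is typically laid out as below. This is a sketch of the standard MOT17 layout (sequence names and detector suffixes vary), not anything HAT-specific:

```text
MOT17/
├── train/
│   ├── MOT17-02-DPM/
│   │   ├── det/
│   │   │   └── det.txt      # public detections (here: DPM)
│   │   ├── gt/
│   │   │   └── gt.txt       # ground-truth tracks
│   │   ├── img1/
│   │   │   ├── 000001.jpg
│   │   │   └── ...
│   │   └── seqinfo.ini
│   └── ...
└── test/
    └── ...
```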
If you just want to quickly train the Motr model, you can read this section first.
Similar to other tasks, HAT performs all training tasks and evaluation tasks in the form of tools + config.
After preparing the original dataset, follow the steps below to complete the whole training flow.
Since the official test set has no GT, we split the training set in half: the first half of each video's frames forms the training set and the second half the validation set.
HAT provides a script to split the training set, just run the script below:
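The half/half split described above can be sketched as below. This is a minimal illustration, not HAT's actual split script; the paths and the assumption that gt.txt rows start with `frame,track_id,...` follow the standard MOT17 format:

```python
# Sketch of the half/half split: frames in the first half of each
# sequence go to the training set, the rest to the validation set.
# File layout assumptions follow the standard MOT17 format.
from pathlib import Path


def split_gt(gt_lines, num_frames):
    """Split gt.txt rows by frame index at the sequence midpoint."""
    half = num_frames // 2
    train, val = [], []
    for line in gt_lines:
        frame = int(line.split(",")[0])  # first column is the frame index
        (train if frame <= half else val).append(line)
    return train, val


def split_sequence(seq_dir: Path, out_train: Path, out_val: Path):
    gt_lines = (seq_dir / "gt" / "gt.txt").read_text().splitlines()
    num_frames = len(list((seq_dir / "img1").glob("*.jpg")))
    train, val = split_gt(gt_lines, num_frames)
    out_train.write_text("\n".join(train))
    out_val.write_text("\n".join(val))
```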
After running the above script, a folder similar to the following structure will be generated:
To improve training speed, we pack the original dataset into LMDB format. Just run the scripts below:
The above two commands pack the training dataset and the validation dataset, respectively.
To measure accuracy, the validation step needs the label data, so we create a soft link as follows:
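A hedged example of such a soft link is below. The directory names are placeholders chosen for illustration (adjust them to your actual split-annotation and packed-data locations); only the `test_gt` name comes from the text:

```shell
# Placeholder paths -- adjust to your own layout.
data_dir=./mot17_split        # assumed location of the split annotations
target_data_dir=./mot17_lmdb  # directory holding the packed datasets
mkdir -p "${data_dir}/val" "${target_data_dir}"
# Soft-link the validation labels so evaluation can find them under test_gt.
ln -sfn "$(pwd)/${data_dir}/val" "${target_data_dir}/test_gt"
```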
After packing and creating the soft link, the file structure in the ${target-data-dir} directory should be as follows:
train_lmdb and test_lmdb are the packed training and validation datasets, and test_gt holds the label data of the validation set. You can then start training the model.
Since the input of the qim module in the Motr model relies on some post-processing, we split the whole Motr model into two configs.
Generating the fixed-point model, compiling, running the model checker, and computing the model's computation all need the qim config; everything else uses the motr config. Detailed usage is described in the sections below.
In the following description, the first model refers to the base module of Motr and the second to the qim module; unless otherwise specified, "the model" means the two modules concatenated.
Before training starts, you can use the following command to calculate the computation (FLOPs) and the number of parameters of the network:
The next step is training, which can also be started with the following script. Before training, confirm that the dataset paths in the config have been switched to the packed dataset paths.
Since the HAT algorithm package uses a registration mechanism, every training task can be started as train.py plus a config file.
train.py is a unified, task-independent training script; the task to train, the dataset to use, and the training-related hyperparameters are all specified in the given config file.
The parameter after --stage in the above command can be "float", "calibration", or "qat", which correspond to training the floating-point model, calibrating it, and training the quantized model, respectively. Training the quantized model depends on the floating-point model produced by the preceding floating-point training.
For this model, converting the quantized model to the fixed-point model requires the config of the qim module.
Once quantized training is complete, you can export the fixed-point model with the following command:
After training, we obtain the trained floating-point, quantized, or fixed-point model.
Similar to training, we can use the same method to run metrics validation on the trained model and obtain the Float, Calibration, QAT, and Quantized metrics, i.e. the floating-point, calibration, quantization-aware, and fully fixed-point accuracies, respectively.
As with training, --stage followed by "float", "calibration", or "qat" validates the corresponding trained floating-point, calibration, or quantized model.
The following command verifies the accuracy of the fixed-point model; note that the hbir model must be exported first:
HAT provides the infer_hbir.py script to visualize the inference results for the fixed-point model:
In addition to the above model validation, we provide an accuracy validation method identical to the on-board environment, which can be accomplished by:
As the quantization training toolchain integrated in HAT mainly targets Horizon's processors, the quantized models must be checked and compiled.
HAT provides a model-checking interface that lets the user define a quantized model and first check whether it can run properly on the BPU.
After the model is trained, you can use the compile_perf_hbir script to compile the quantized model into an HBM file that can run on board.
The tool can also predict the performance on the BPU.
The above is the whole process from data preparation to producing a quantized, deployable model.
In this section, we explain some points to consider during model training, mainly config-related settings.
The network structure of Motr can be found in the paper and is not described in detail here.
We can easily define and modify the model by defining a dict type variable like model in the config file.
In addition to backbone , the model also has head , criterion , post_process and track_embed modules.
In Motr, backbone mainly extracts image features, and head predicts the category, location, and feature embedding from those features.
criterion computes the loss at training time, post_process handles post-processing, and track_embed (i.e. the qim module) updates the query of each tracked target.
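The module layout above can be illustrated with a config-style `model` dict. This is illustrative only: the `type` values are hypothetical stand-ins, not HAT's real registry names:

```python
# Illustrative config dict with the modules named above.
# All "type" class names are hypothetical placeholders.
model = dict(
    type="Motr",
    backbone=dict(type="EfficientNetB3"),      # extracts image features
    head=dict(type="MotrHead"),                # predicts class / box / feature
    criterion=dict(type="MotrCriterion"),      # loss at training time
    post_process=dict(type="MotrPostProcess"), # post-processing of predictions
    track_embed=dict(type="QIM"),              # updates tracked target queries
)
```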
Like the model definition, the data augmentation pipelines are implemented by defining data_loader and val_data_loader in the config file, which handle the training and validation sets, respectively.
Taking data_loader as an example, the augmentations SeqRandomFlip, RandomSelectOne, SeqResize, SeqRandomSizeCrop, SeqToTensor, and SeqNormalize increase the diversity of the training data and improve the generalization ability of the model.
Since the final model running on the BPU takes YUV444 image input while training images are generally in RGB format, HAT provides the SeqBgrToYuv444 transform to convert the images to YUV444.
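A pipeline built from the transforms named above might look like the sketch below. Only the transform names come from the text; the arguments, branch structure, and ordering are assumptions for illustration:

```python
# Sketch of a transform pipeline; argument names and ordering are
# hypothetical, only the transform names come from the text above.
transforms = [
    dict(type="SeqRandomFlip", px=0.5),
    # RandomSelectOne picks one of the two branches per sample:
    dict(type="RandomSelectOne", transforms=[
        [dict(type="SeqResize")],
        [dict(type="SeqRandomSizeCrop"), dict(type="SeqResize")],
    ]),
    dict(type="SeqBgrToYuv444"),  # BPU deployment expects YUV444 input
    dict(type="SeqToTensor"),
    dict(type="SeqNormalize"),
]
```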
Here, loss_collector is a function that collects the losses of the current batch.
The data transformation of the validation set is relatively simpler, as follows:
We train the floating-point model on the MOT17 dataset with the StepDecay learning-rate schedule and apply an L2 penalty to the weight parameters.
The float_trainer, calibration_trainer, qat_trainer, and int_trainer in the configs/track_pred/motr_efficientnetb3_mot17.py file correspond to the training strategies for the floating-point, calibration, quantized, and fixed-point models, respectively.
The following is an example of float_trainer training strategy::
For the key steps of quantized training, such as preparing the floating-point model, operator substitution, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion, please read the Quantization-Aware Training (QAT) section. Here we focus on how to define and use the quantized models for multi-object tracking in HAT.
Once the model is ready and the relevant modules are set to be quantized, HAT uses the following code in the training script to uniformly convert the floating-point model to its quantized counterpart.
The overall strategy of quantized training can directly follow that of floating-point training, but the learning rate and training length need to be adjusted appropriately.
Thanks to the floating-point pre-trained model, the learning rate for quantized training can be very small: generally start from 0.001 or 0.0001, and one or two lr adjustments of scale=0.1 with StepLrUpdater are enough, without prolonging the training time.
In addition, weight decay will also have some effect on the training results.
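As a concrete illustration of the schedule described above, a step decay multiplies the learning rate by `scale` at each milestone. The base lr and milestone epochs below are example values, not taken from the HAT config:

```python
# Step-decay illustration: lr is multiplied by `scale` once the epoch
# reaches each milestone. Base lr and milestones are example values.
def step_lr(base_lr, epoch, milestones, scale=0.1):
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= scale
    return lr

# e.g. start from 1e-4 and decay at (assumed) epochs 3 and 5,
# giving 1e-4 -> 1e-5 -> 1e-6 over training.
```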
The quantized training strategy for the Motr example model can be found in the configs/track_pred/motr_efficientnetb3_mot17.py file.