Taking the segmentation task as an example, this tutorial demonstrates how to use HAT to train state-of-the-art floating-point and fixed-point models on the Cityscapes dataset.
Cityscapes is an image dataset of urban driving scenes. It contains 5000 images with pixel-level annotations, with objects labeled in 19 categories.
The segmentation task is relatively complex and demands strong model capability; achieving good metrics with small models is not easy.
This tutorial elaborates on how to train a state-of-the-art segmentation model on the Cityscapes dataset with HAT, and then run quantization training on top of the floating-point model to obtain a fixed-point model from scratch.
To download the Cityscapes dataset, you first need to register an account on the official website.
After that, you can download the dataset files from the Download page.
Here we only need two files: gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip.
Additionally, the Cityscapes team provides official scripts for downloading and processing the data; see GitHub.
First install the official tools using the following command:
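The official tools are published on PyPI as cityscapesscripts, so a typical installation is:

```shell
pip install cityscapesscripts
```
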
Then use the official tool to log into the account registered above and download the required dataset files.
Finally, unpack the downloaded files (optional):
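With cityscapesscripts installed, the download and unpack steps above can be sketched as follows; csDownload will prompt for the credentials registered on the official website, and the target directory is just an example:

```shell
# download the two required archives (prompts for your Cityscapes login)
csDownload gtFine_trainvaltest.zip
csDownload leftImg8bit_trainvaltest.zip

# optional: unpack both archives into a common data directory
unzip gtFine_trainvaltest.zip -d ./data/cityscapes
unzip leftImg8bit_trainvaltest.zip -d ./data/cityscapes
```
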
To improve data reading efficiency, we recommend pre-packing the dataset into the LMDB format.
HAT provides the cityscapes_packer.py script, which converts the dataset from its original public format into numpy.ndarray or torch.Tensor, serializes the data with msgpack, and finally packs it into LMDB files.
The command to pack the dataset is as follows:
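As a sketch, the packing command might look like the following; the script location and flag names are assumptions and should be checked against your HAT release:

```shell
# pack the train split; run again with --split-name val to produce val_lmdb
python3 tools/datasets/cityscapes_packer.py \
  --src-data-dir ${data-dir} \
  --target-data-dir ${data-dir} \
  --split-name train \
  --pack-type lmdb
```
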
The generated LMDB files are saved under ${data-dir}/train_lmdb and ${data-dir}/val_lmdb.
After packing the dataset into LMDB files, you can start training the model.
HAT provides the training script train.py to facilitate model training together with config files.
Before starting the training, please make sure that the dataset path (data_rootdir) in the configuration file unet_mobilenetv1_cityscapes.py is set to the LMDB file path.
The commands for model training are as follows:
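For reference, the two commands might look like this; the config file name is taken from this tutorial, while the tool path and the exact --stage values should be verified against your HAT release:

```shell
# stage 1: floating-point training
python3 tools/train.py --stage float \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py

# stage 2: quantization training, initialized from the trained float model
python3 tools/train.py --stage calibration \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py
```
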
The above two commands train the floating-point model and the fixed-point model, respectively. Fixed-point training must start from a trained floating-point model; for details, please read the Quantization-Aware Training (QAT) section.
After training, the script will automatically validate the metrics of the trained model.
In addition, HAT provides a script to validate the metrics of the trained floating-point and fixed-point models, which is also configured by using the config file.
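A sketch of the validation call, assuming the validation entry point is tools/predict.py (the actual script name may differ in your release):

```shell
# validate the trained floating-point model;
# use --stage calibration for the calibration model
python3 tools/predict.py --stage float \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py
```
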
In addition to the above model validation, HAT also provides an accuracy validation method that exactly simulates on-board conditions, as below:
HAT provides the infer.py script to visualize the inference results of the trained models at each phase:
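For example (the --input-path and --save-path flag names below are illustrative assumptions, not guaranteed by this tutorial):

```shell
# visualize float-model predictions on a single image
python3 tools/infer.py --stage float \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py \
  --input-path ./demo.png \
  --save-path ./vis_results
```
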
The quantization algorithms used in HAT's integrated quantization training framework are designed specifically for Horizon processors, so fixed-point models trained with HAT can be compiled with the tools provided by HBDK into models that run on the target computing platform.
HAT provides the compile_perf.py script to facilitate the compilation process.
With this, we have obtained, from scratch, a fixed-point segmentation model that can run on Horizon processors.
Once quantization training is complete, you can export the fixed-point model with the following command:
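A sketch of the export step; the script name tools/export_hbir.py is an assumption, so check the actual entry point in your HAT release:

```shell
# export the quantized model to the hbir format
python3 tools/export_hbir.py \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py
```
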
After completing the training, we have the trained floating-point, calibration, or quantized model. Metrics validation works the same way as training and reports the Float, Calibration, and Quantized metrics,
i.e. the floating-point, calibration, and fully fixed-point accuracy, respectively.
As with training, pass --stage with "float" or "calibration" to validate the trained floating-point or calibration model, respectively.
The following command verifies the accuracy of the fixed-point model; note that the hbir model must be exported first:
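A sketch, assuming a predict_hbir.py entry point (a hypothetical name; verify against your HAT release):

```shell
# validate the exported hbir (fixed-point) model
python3 tools/predict_hbir.py \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py
```
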
HAT provides the infer_hbir.py script to visualize the inference results for the fixed-point model:
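Its invocation is likely analogous to infer.py; the flags below are illustrative assumptions:

```shell
# visualize fixed-point (hbir) model predictions on a single image
python3 tools/infer_hbir.py \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py \
  --input-path ./demo.png \
  --save-path ./vis_results
```
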
In addition to the above model validation, we provide an accuracy validation method identical to the on-board environment, which can be accomplished by:
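As a sketch, assuming an align_bpu_validation.py entry point (the script name is an assumption):

```shell
# accuracy validation that matches the on-board numerical behavior
python3 tools/align_bpu_validation.py \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py
```
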
As the quantization training toolchain integrated in HAT is built mainly for Horizon processors, checking and compiling the quantized model is mandatory.
HAT provides a model-checking interface that lets you first verify whether a user-defined quantized model can run properly on the BPU.
After the model is trained, you can use the compile_perf_hbir script to compile the quantized model into an HBM file that can run on board; the tool also predicts the model's performance on the BPU.
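A sketch of the compile step; the script path and the --out-dir flag are assumptions:

```shell
# compile the quantized model into an on-board HBM file
# and predict its performance on the BPU
python3 tools/compile_perf_hbir.py \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py \
  --out-dir ./compile_output
```
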
The above is the whole process from data preparation to a deployable quantized model.
The segmentation model mainly consists of backbone, neck, and head.
Here we use MobileNetV1_0.25 as the backbone, which is a lightweight and efficient network structure.
The neck uses a UNet structure, which fuses the feature maps at each scale and preserves fine spatial information.
The head is a convolutional layer responsible for the output of the final segmentation results.
We use FocalLoss as the loss function. FocalLoss can be regarded as a cross-entropy loss with dynamic weights, which better mitigates the training difficulty caused by class imbalance.
The hierarchical structure of UNet matches the idea of FPN and can be trained the same way: an output is attached at each scale of UNet, and a loss is computed between that output and the ground truth resized to the corresponding resolution.
This supervises every scale of the network, provides richer reference signal for training, reduces training difficulty, and improves both training speed and final accuracy.
Meanwhile, since the result we ultimately need is the largest output (scale=4), we weight the per-scale losses to prevent overly large gradients from the other scales from hurting its accuracy: the coarser the output scale, the smaller its weight.
Usually, after defining a model, especially a public one, we want to check its FLOPs.
HAT provides calops.py to calculate the model's operations, used as below:
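A sketch of the invocation; flags other than --config are not guaranteed by this tutorial:

```shell
# count the operations (FLOPs) of the model defined in the config
python3 tools/calops.py \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py
```
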
The ops-counting tool supports both floating-point and fixed-point models.
First, we use LabelRemap to re-map the data labels to the interval [0, 18].
For the training set, SegRandomAffine applies a random affine transformation to the image for data augmentation; here we configure only random scaling, without any rotation.
Since the training uses an FPN-like approach, we need to scale the labels to different sizes to train the models with different scales.
The final model running on the BPU takes YUV444 images as input, while regular training uses the RGB format.
HAT therefore provides the ImageBgrToYuv444 transform to convert the image data to the YUV444 format.
Finally, normalization is necessary for deep learning model training.
Note that this task uses MultiBatchProcessor for training. This processor supports batched data pre-processing on the GPU.
Because the segmentation task's data pre-processing is relatively heavy, doing it on the CPU becomes a bottleneck.
Using the GPU instead increases memory usage and therefore lowers the maximum batch size, but the overall training speed is considerably improved.
Compared with the training set, the data preprocessing of the validation set does not require random affine transformation and multi-scale scaling. The other stages are the same and will not be repeated.
The segmentation task can be regarded as a pixel-level classification task, so the training strategy is highly similar to that of the classification task.
First, training can be sped up by using as large a learning rate as possible while still ensuring convergence.
When accuracy stops improving at a given learning rate, reduce it appropriately; the model will continue to converge and accuracy can improve further.
After training, compare test-set accuracy against training-set accuracy; if training accuracy is much higher, the model is overfitting.
In this case, increasing weight decay strengthens the model's generalization ability, reduces overfitting, and yields higher test accuracy.
The purpose of quantization training is to simulate fixed-point computation by applying simulated quantization on top of the trained floating-point model, minimizing the accuracy loss of the float-to-fixed-point conversion.
Since the floating-point model has already converged well after sufficient training, quantization training usually only needs light fine-tuning.
The learning rate should not be set too large; you can start from the 1e-4 order of magnitude, leaving the other parameters the same as in floating-point training.
Likewise, fine-tuning requires far less training; a few dozen epochs are usually enough.
Because the model is already accurate before quantization training, there is little headroom for improvement, and simulated quantization makes the training curves fluctuate noticeably. At this point, patiently observe the trend through the fluctuations and adjust the parameters accordingly to get the best result.
For HAT, the significance of the quantized model is that it can run directly on the BPU, so checking and compiling it is necessary.
The compile_perf.py script provided by HAT first checks the model to ensure it can run properly on the BPU.
It then compiles the model for the BPU and finally benchmarks the compiled fixed-point model to predict its on-BPU performance.
Run the command as below:
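For example (the --out-dir flag is an illustrative assumption):

```shell
# check, compile, and benchmark the quantized model for the BPU
python3 tools/compile_perf.py \
  --config configs/segmentation/unet_mobilenetv1_cityscapes.py \
  --out-dir ./compile_output
```
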
HAT provides pre-trained models for this example, all included in the release package.