This tutorial focuses on how to train state-of-the-art floating-point and fixed-point models on ImageNet using HAT.
ImageNet is the most widely used dataset for image classification, and many state-of-the-art image classification studies rely primarily on this dataset for validation.
Although there are many ways to obtain a state-of-the-art classification model from the community or elsewhere, training one from scratch is still not a simple task.
Starting from dataset preparation, this tutorial walks through how to train state-of-the-art models on ImageNet, including floating-point, quantized, and fixed-point models.
The ImageNet dataset can be downloaded from the official ImageNet website.
The downloaded dataset is organized as follows:
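The original listing is not reproduced in this copy. A typical layout after extracting the official ILSVRC2012 archives (class folders named by WordNet ID) looks roughly like the following; the exact file names here are illustrative only:

```text
imagenet/
├── train/
│   ├── n01440764/
│   │   ├── n01440764_10026.JPEG
│   │   └── ...
│   └── ...
└── val/
    ├── ILSVRC2012_val_00000001.JPEG
    └── ...
```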
Here we use MobileNetV1 as an example to describe the whole classification process in detail.
If you just want to use the HAT interface for simple experiments, it is a good idea to read this section first.
HAT uses the tools + config pattern for all training and evaluation tasks.
After preparing the raw dataset, we can easily complete the training process by taking the procedure below.
The first step is dataset packaging. Compared with raw datasets, packed datasets have a clear advantage in processing speed.
Here we choose the LMDB packing method, which follows the same style as PyTorch datasets.
Thanks to HAT's flexibility in handling datasets, other forms of dataset packing and reading, such as MXRecord, are also supported.
Packaging scripts for common datasets such as cityscapes, imagenet, voc, and mscoco are provided in the tools/datasets directory.
For example, the imagenet_packer script uses the default public-dataset processing provided by torchvision to convert the original ImageNet release into Numpy or Tensor format, and finally serializes the resulting data into an LMDB file using msgpack.
The dataset packing process can be easily accomplished with the following script:
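The exact HAT packing command is omitted in this copy. As an illustration of the flow described above (enumerate samples, serialize each record with msgpack, write to LMDB), here is a minimal stand-alone sketch; the function names, key scheme, and record layout are assumptions, not HAT's actual implementation:

```python
import os

def iter_samples(root):
    """Yield (relative_path, class_index) pairs from an ImageFolder-style
    layout: root/<class_name>/<image_file>."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    class_to_idx = {c: i for i, c in enumerate(classes)}
    for c in classes:
        cdir = os.path.join(root, c)
        for fname in sorted(os.listdir(cdir)):
            yield os.path.join(c, fname), class_to_idx[c]

def pack_to_lmdb(root, lmdb_path, map_size=1 << 40):
    """Serialize every sample with msgpack and store it in LMDB under an
    integer key, roughly mirroring the imagenet_packer flow."""
    import lmdb, msgpack  # third-party: pip install lmdb msgpack
    env = lmdb.open(lmdb_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for idx, (rel, label) in enumerate(iter_samples(root)):
            with open(os.path.join(root, rel), "rb") as f:
                record = msgpack.packb(
                    {"image": f.read(), "label": label}, use_bin_type=True)
            txn.put(str(idx).encode("ascii"), record)
    env.close()
```

Reading the packed data back is the reverse: iterate LMDB keys and `msgpack.unpackb` each record.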
After packing, you have an LMDB dataset containing ImageNet and can move on to the next stage: training.
Once you have prepared the packed dataset, you can start training the model simply by running the following command:
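The command itself is elided in this copy; based on the --stage parameter and the config file referenced later in this tutorial, a typical invocation presumably looks like the following (the path of train.py is an assumption):

```shell
# Illustrative; the exact location of train.py may differ in your HAT checkout.
python tools/train.py \
    --config configs/classification/mobilenetv1_imagenet.py \
    --stage float
```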
The HAT algorithm toolkit uses a registration mechanism that allows each training task to be started in the form of train.py plus a config file.
train.py is a unified training script that is independent of the specific task.
The task we need to train, the dataset we need to use, and the hyperparameters we need to set for the training are all in the specified config file.
The parameter after --stage in the above command can be either "float" or "calibration", which select training of the floating-point model and of the quantized model, respectively;
quantized training depends on the floating-point model produced by the preceding floating-point stage.
The details are described in the preceding section about quantization.
After completing quantization training, you can export the fixed-point model with the following command:
After completing the training, we obtain the trained floating-point, quantized, or fixed-point model.
Similar to training, we can use the same method to run metrics validation on the trained model and obtain the Float, Calibration, and Quantized metrics,
which correspond to floating-point, quantized, and fully fixed-point accuracy, respectively, as described in the quantization details.
As with training, --stage followed by "float" or "calibration" validates the trained floating-point model or quantized model, respectively.
The following command verifies the accuracy of the fixed-point model; note that the hbir must be exported first:
HAT provides the infer_hbir.py script to visualize the inference results of the models trained at each phase:
In addition to the above model validation, we provide an accuracy validation method identical to the on-board environment, which can be accomplished by:
After the model is trained, you can use the compile_perf_hbir script to compile the quantized model into an hbm file that can run on the board.
This tool can also predict the model's performance on the BPU.
For Horizon BPUs with different architectures, you can set march = March.NASH_E or march = March.NASH_M in configs/classification/mobilenetv1_imagenet.py.
The above is the whole process from data preparation to generating a quantized, deployable model.
We again use MobileNetV1 as an example to illustrate what needs to be considered during model training, mainly the config-related settings.
Training a deep learning model on ImageNet with more than 1 million images is very resource intensive, and the main bottlenecks are matrix computation and data reading.
For matrix computation, it is highly recommended to use a high-performance GPU instead of CPU for the training, and using multiple GPUs at the same time can effectively reduce the training time.
For data reading, it is recommended to use a better CPU and SSD storage.
Multi-threaded CPU acceleration and better SSD storage can help a lot in data reading.
Note that the whole ImageNet dataset takes up roughly 300 GB, so the SSD should have at least 300 GB of free space.
Since implementations of MobileNetV1 are readily available in both HAT and the wider community, we skip its implementation details here.
In the config of HAT, we can build a floating-point MobileNetV1 classification model directly with the dict below.
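The dict itself is not reproduced in this copy. As a hypothetical sketch of what such a registry-style config typically looks like (every field name below is an assumption, not HAT's exact schema):

```python
# Hypothetical registry-style config; field names are illustrative only.
model = dict(
    type="Classifier",
    backbone=dict(
        type="MobileNetV1",
        alpha=1.0,         # width multiplier; shrink for smaller variants
        num_classes=1000,  # ImageNet-1k
    ),
    losses=dict(
        type="CEWithLabelSmooth",  # cross-entropy + label smoothing
        smooth_alpha=0.1,
    ),
)
```

In this style, swapping the backbone or loss is a one-line change to the corresponding sub-dict.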
Users can modify the model by directly modifying the configuration parameters in backbone.
In addition to backbone, the model also has a losses module.
In common classification models we usually use cross-entropy as the training loss,
but a growing body of experiments shows that adding label smoothing to the classification loss helps improve results,
especially when combined with a cosine learning-rate schedule.
After defining a model, especially a public one, we usually need to check its FLOPs. HAT counts the model's operations with the calops tool, which is used as below:
This ops-counting tool supports both floating-point and fixed-point models.
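Independent of the calops tool, the arithmetic behind MobileNetV1's efficiency is easy to check by hand. The following sketch compares the multiply-accumulate count of a standard convolution with its depthwise-separable counterpart; the formulas are the standard ones from the MobileNetV1 design, not HAT-specific:

```python
def conv_macs(cin, cout, k, h, w):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return cin * cout * k * k * h * w

def dw_separable_macs(cin, cout, k, h, w):
    """MACs of a depthwise k x k conv followed by a 1x1 pointwise conv,
    the building block of MobileNetV1."""
    depthwise = cin * k * k * h * w
    pointwise = cin * cout * h * w
    return depthwise + pointwise

# Example: 3x3 conv, 64 -> 128 channels, on a 56x56 feature map.
standard = conv_macs(64, 128, 3, 56, 56)
separable = dw_separable_macs(64, 128, 3, 56, 56)
ratio = standard / separable  # savings factor, roughly 1 / (1/cout + 1/k^2)
```

For this layer the separable form needs about 8x fewer MACs, which is where MobileNetV1's speedup comes from.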
There is an emerging consensus on data augmentation for ImageNet training, and we build the augmentation pipeline for classification training on top of what torchvision provides,
including RandomResizedCrop, RandomHorizontalFlip, and ColorJitter.
Since the final model running on the BPU takes YUV444 image input, while training images are generally in RGB format, HAT provides the BgrToYuv444 transform to convert images to YUV444.
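For intuition, a single-pixel color-space conversion can be sketched as below. These are the BT.601 full-range coefficients, one common convention; the exact coefficients and value ranges used by HAT's BgrToYuv444 transform may differ:

```python
def rgb_to_yuv444(r, g, b):
    """Convert one RGB pixel (0-255) to YUV using BT.601 full-range
    coefficients. Illustrative only; HAT's transform may use different
    coefficients or channel order."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128
    return y, u, v
```

Note that a neutral gray maps to U = V = 128, which is why quantized pipelines often center the chroma channels there.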
To speed up training, some of the augmentation can be processed in batch_processor.
The corresponding batch_processor part:
The data transforms for the validation set are simpler; the main difference is a short-edge Resize to 256 followed by a CenterCrop.
The other color-space transformations are the same as for the training set.
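The evaluation-time geometry can be sketched numerically; this is plain arithmetic, not HAT's actual transform code, and the crop size of 224 is the conventional ImageNet choice:

```python
def short_edge_resize(w, h, target=256):
    """New (w, h) after scaling so the shorter edge equals `target`."""
    scale = target / min(w, h)
    return round(w * scale), round(h * scale)

def center_crop_box(w, h, size=224):
    """(left, top, right, bottom) of a centered size x size crop."""
    left, top = (w - size) // 2, (h - size) // 2
    return left, top, left + size, top + size
```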
The training strategies for training different classification models on ImageNet are roughly the same with minor differences.
Here we focus on the details that have improved effects.
The cosine-with-warmup learning-rate schedule gives a measurable boost over the regular StepLr schedule.
Appropriately extending the number of training epochs also helps small models.
In addition, applying the L2 penalty (weight decay) only to weight parameters is a recommended training strategy.
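The cosine-with-warmup schedule mentioned above can be written down in a few lines. This is a generic formulation; the base learning rate, warmup length, and exact parameterization in HAT's scheduler are assumptions:

```python
import math

def cosine_warmup_lr(step, total_steps, base_lr=0.1, warmup_steps=1000):
    """Linear warmup from 0 to base_lr, then cosine decay down to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```

The warmup avoids divergence in the first iterations of large-batch training, and the cosine tail decays the rate smoothly instead of in discrete steps.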
The float_trainer, calibration_trainer, and int_trainer in the configs/classification/mobilenetv1_imagenet.py file correspond to the training strategies for the floating-point, quantized, and fixed-point models, respectively.
Next we use float_trainer as an example of training strategy:
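The float_trainer definition is not reproduced in this copy. As a hypothetical sketch of its general shape, tying together the strategies discussed above (all field names and values below are assumptions, not HAT's actual schema):

```python
# Hypothetical sketch only; HAT's real float_trainer schema differs in detail.
float_trainer = dict(
    type="Trainer",
    optimizer=dict(
        type="SGD",
        lr=0.1,
        momentum=0.9,
        weight_decay=1e-4,  # ideally applied to weight parameters only
    ),
    lr_scheduler=dict(type="CosineWithWarmup", warmup_epochs=5),
    num_epochs=240,  # small models benefit from longer schedules
)
```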
For the key stages in quantization training, e.g., preparing the floating-point model, operator substitution, inserting quantization and dequantization nodes, setting quantization parameters, and operator fusion, please read the Quantization-Aware Training (QAT) section.
Here we focus on how to define and use quantized models in HAT classification.
When the model is ready and the relevant modules support quantization, HAT uses the following calls in the training script to map the floating-point model to the fixed-point model.
As the strategies for quantized training are not necessarily identical to those for floating-point training, here we briefly introduce some common ones used for classification models.
The overall strategy of quantized training can largely follow the floating-point recipe, but the learning rate and training length need to be adjusted appropriately.
Because a floating-point pre-trained model is available, the learning rate for quantized training can be quite small, usually starting from 0.001 or 0.0001,
with one or two step decays of scale=0.1 applied via StepLrUpdater; the training schedule likewise does not need to be long.
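The numbers above can be made concrete with a tiny step-decay helper. This is illustrative only; StepLrUpdater's real interface is not shown here, and the milestone epochs are assumptions:

```python
def qat_lr(epoch, base_lr=1e-3, milestones=(10, 15), gamma=0.1):
    """Step-decay schedule typical of QAT fine-tuning: start from a small
    base_lr and multiply by gamma at each milestone epoch.
    milestones are hypothetical; pick them for your own schedule."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```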
In addition, weight decay will also have some influence on the training results.
HAT already provides a rich set of models pre-trained on ImageNet; see the modelzoo. All models are included in the release package.