Prepare Calibration Data

Note

If you are working in the sample folder, first execute the 00_init.sh script in that folder to obtain the corresponding original model and dataset.

Model calibration requires 20~100 samples at the calibration stage, each stored as an independent data file. To ensure the accuracy of the calibrated model, these calibration samples should come from the training or validation dataset used to train the model. In addition, try NOT to use rare samples, e.g. single-colored images or images that contain no detection or classification targets.
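A minimal sketch of drawing calibration samples from a validation-set directory is shown below. The directory layout, file extensions, and the helper name pick_calibration_samples are assumptions for illustration, not part of the toolchain:

```python
# Sketch: randomly pick calibration images from a validation set.
# The directory name, extensions, and sample count are assumptions.
import os
import random


def pick_calibration_samples(image_dir, num_samples=100, seed=0):
    """Randomly pick num_samples image filenames (without replacement)."""
    candidates = sorted(
        f for f in os.listdir(image_dir)
        if f.lower().endswith(('.jpeg', '.jpg', '.png')))
    random.seed(seed)  # fixed seed keeps the sample set reproducible
    return random.sample(candidates, min(num_samples, len(candidates)))
```

A fixed seed keeps the chosen calibration set reproducible across runs, which helps when comparing accuracy of different conversion configurations.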

You need to preprocess the samples from the training/validation set (the preprocessing is the same as that used for the original floating-point model). After processing, the calibration samples have the same data type (input_type_train), size (input_shape), and layout (input_layout_train) as the original floating-point model. You can save each sample as an npy file with the numpy.save command, and the toolchain will read it with the numpy.load command during calibration. For example, an original classification floating-point model trained on ImageNet with only one input node would be described as below:
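Since the toolchain reads each sample back with numpy.load, it can help to sanity-check a saved file before calibration. The sketch below writes a stand-in uint8 sample and verifies the roundtrip; the per-file shape (3, 224, 224) is an assumption for this example, so align it with your model's input_shape:

```python
# Sketch: verify that a saved calibration sample survives the
# numpy.save / numpy.load roundtrip with the expected dtype and shape.
# The filename and the (3, 224, 224) shape are assumptions.
import numpy as np

sample = np.zeros((3, 224, 224), dtype=np.uint8)  # stand-in preprocessed image
np.save('calib_sample.npy', sample)               # what the toolchain will load

loaded = np.load('calib_sample.npy')
assert loaded.dtype == np.uint8       # matches input_type_train (UINT8 BGR)
assert loaded.shape == (3, 224, 224)  # must be consistent with input_shape
```

If either assertion fails for your real samples, revisit the preprocessing chain before starting calibration, as a dtype or shape mismatch will be reported at conversion time.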

  • Input type: BGR.
  • Input layout: NCHW.
  • Input size: 1x3x224x224.

The steps for data preprocessing of the original floating point model are as follows:

  1. Uniformly scale the image and resize the shorter side to 256.

  2. Get 224x224 image using the center_crop method.

  3. Align the input layout to the NCHW required by the model.

  4. Convert the color space to the BGR required by the model.

  5. Adjust the range of image values to [0, 255] as required by the model.

  6. Subtract mean value by the channel.

  7. Data multiple by the scale factor.

The sample processing code for the above example model is as follows (to keep the code short, some simple transformer implementations are omitted; the usage of transformers can be found in Image Processing).

# this sample uses skimage, mind the differences when using opencv/PIL
import skimage
import skimage.io
import numpy as np

from horizon_tc_ui.data.transformer import (
    CenterCropTransformer, HWC2CHWTransformer, MeanTransformer,
    RGB2BGRTransformer, ScaleTransformer, ShortSideResizeTransformer)


def data_transformer():
    transformers = [
        # uniformly scale the image and resize the shorter side to 256
        ShortSideResizeTransformer(short_size=256),
        # get the 224x224 image using the center_crop method
        CenterCropTransformer(crop_size=224),
        # convert the NHWC layout read by skimage into the NCHW layout
        # required by the model
        HWC2CHWTransformer(),
        # convert the RGB channel order read by skimage into the BGR
        # required by the model
        RGB2BGRTransformer(),
        # skimage reads values in [0.0, 1.0]; adjust into the [0, 255]
        # range required by the model
        ScaleTransformer(scale_value=255),
        # for all pixels in the input image, subtract mean_value per channel
        MeanTransformer(means=np.array([103.94, 116.78, 123.68])),
        # for all pixels in the input image, multiply by the data_scale factor
        ScaleTransformer(scale_value=0.017)
    ]
    return transformers


# src_image refers to a source image in the sample dataset
# dst_file refers to the filename used to save the final calibration sample
def convert_image(src_image, dst_file, transformers):
    image = [skimage.img_as_float(
        skimage.io.imread(src_image)).astype(np.float32)]
    for trans in transformers:
        image = trans(image)
    # the input_type_train BGR value specified by the model is UINT8
    image = image[0].astype(np.uint8)
    # save the calibration sample into a data file in binary format
    np.save(dst_file, image)


if __name__ == '__main__':
    # refers to the original sample images, fake-code
    src_images = ['ILSVRC2012_val_00000001.JPEG', ...]
    # filenames (no restrictions on suffix) of the final samples, fake-code
    # calibration_data_bgr refers to your specified cal_data_dir in the
    # configuration file
    dst_files = ['./calibration_data_bgr/ILSVRC2012_val_00000001.npy', ...]
    transformers = data_transformer()
    for src_image, dst_file in zip(src_images, dst_files):
        convert_image(src_image, dst_file, transformers)
Attention

Note that the input_shape parameter in the yaml file specifies the input data size of the original floating-point model. For a dynamic-input model, you can use this parameter to set the input size after conversion, and the shape of each calibration sample must be consistent with input_shape.

For example, if the input node shape of the original floating-point model is ?x3x224x224 (where the "?" sign is a placeholder, i.e., the first dimension of the model input is dynamic), and input_shape: 8x3x224x224 is set in the conversion profile, then each calibration sample you prepare must have the size 8x3x224x224. (Please be aware that the input_batch parameter cannot modify the batch information of models whose first input-shape dimension is not equal to 1.)
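The dynamic-input example above might look as follows in the conversion yaml. This is a hedged fragment: input_shape and cal_data_dir appear in this document, while the section names are assumptions that may differ in your toolchain version:

```yaml
# Sketch of the relevant conversion-profile fields for the ?x3x224x224 example.
# Section names are assumptions; only input_shape and cal_data_dir are
# taken from the text above.
input_parameters:
  # fixes the dynamic first dimension, so every calibration sample
  # must also have shape 8x3x224x224
  input_shape: '8x3x224x224'
calibration_parameters:
  # directory holding the .npy calibration samples prepared above
  cal_data_dir: './calibration_data_bgr'
```

Keeping cal_data_dir pointed at the same directory your preprocessing script writes to avoids a common mismatch between prepared samples and the conversion run.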