Model Segmented Deployment

Scenario

Sometimes a model that was trained as a whole must be split into several segments for on-board deployment. Take the two-stage detection model in the figure below as an example: if the DPP has to run on the CPU and its output (roi) has to be fed to RoiAlign, the model must be split into Stage1 and Stage2 along the dotted boxes, and each stage must be compiled separately for on-board deployment. When running on the board, the fixed-point data output by the backbone is used directly as the input to RoiAlign.

[Figure: segmented_deploy]

Usage

[Figure: segmented_deploy_method]

  1. Model modification: As shown in the figure above, start from a model that can already be trained normally with quantization-aware training (QAT). Before calling prepare_qat, insert a QuantStub after the cutoff point between the model segments. Note that if horizon_plugin_pytorch.quantization.QuantStub is used, scale = None must be set.

  2. QAT training: Train the modified model as a whole with quantization-aware training. The inserted QuantStub records the scale of the Stage2 input data in a buffer.

  3. Fixed-point model conversion: Use the convert interface to convert the whole trained QAT model to a fixed-point model.

  4. Segmentation and compilation: Split the model into the segments that will be deployed on the board, then export and compile each segment separately. Note that although the input of Stage2 is quantized data during training, the example_input used to export Stage2 must still be floating-point. The QuantStub inserted in Stage2 applies the correct scale to the data and quantizes it.
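
The idea behind steps 2 and 4 can be illustrated with a minimal, framework-free sketch. It does not use horizon_plugin_pytorch: ToyQuantStub, backbone, stage1, stage2 and whole_model_qat_step are hypothetical names invented for this illustration, and the int8 symmetric abs-max rule is an assumption rather than the plugin's actual observer. The sketch only shows why a stub created with scale = None ends up holding a usable scale after whole-model training, and why Stage2 can then accept floating-point input at export time.

```python
class ToyQuantStub:
    """Toy stand-in for a QuantStub created with scale=None (NOT the plugin class)."""

    def __init__(self, scale=None, qmax=127):
        self.scale = scale      # None means: learn the scale from observed data
        self.qmax = qmax        # assumed int8 symmetric range, for illustration only
        self._absmax = 0.0

    def observe(self, features):
        """Training-time behaviour: record a scale from the data flowing through."""
        self._absmax = max(self._absmax, max(abs(v) for v in features))
        self.scale = self._absmax / self.qmax
        return features

    def quantize(self, features):
        """Deploy-time behaviour: quantize float input with the recorded scale."""
        if self.scale is None:
            raise RuntimeError("no scale recorded; train the whole model first")
        lo, hi = -self.qmax - 1, self.qmax
        return [min(hi, max(lo, round(v / self.scale))) for v in features]


def backbone(image):
    """Hypothetical backbone: just halves every value."""
    return [0.5 * v for v in image]


# The stub sits at the cutoff point, i.e. at the entrance of Stage2.
stub = ToyQuantStub(scale=None)


def whole_model_qat_step(image):
    """Step 2: the whole model runs end to end, so the stub sees real backbone output."""
    return stub.observe(backbone(image))


def stage1(image):
    """Step 4: first segment, compiled on its own."""
    return backbone(image)


def stage2(float_features):
    """Step 4: second segment; takes float example input and quantizes it itself."""
    return stub.quantize(float_features)


# Whole-model "training" on two toy batches records the scale in the stub.
for batch in ([10.0, -127.0], [4.0, 50.0]):
    whole_model_qat_step(batch)

recorded_scale = stub.scale

# Export-time example input for Stage2 is floating-point, as step 4 requires.
stage2_out = stage2([1.0, -2.0, 63.5])
```

In this toy setup the scale stored by the stub during whole-model training is the same one Stage2 applies to its float example input, which is the property the workflow relies on. Had the stub been given a fixed scale up front, that value would not reflect the real backbone output.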
