Prepare is the process of converting a floating-point model into a pseudo-quantized model. This process involves several key steps:
1. **Operator replacement**: Some functional operators (such as `F.interpolate`) need FakeQuantize nodes inserted during quantization. These operators are therefore replaced with equivalent Module implementations (e.g. `horizon_plugin_pytorch.nn.Interpolate`) so that the FakeQuantize nodes can live inside the Module. The model is numerically equivalent before and after replacement.
2. **Operator fusion**: The BPU supports fusing specific computational patterns, keeping the intermediate results of fused operators in high precision. Multiple operators to be fused are therefore replaced with a single Module so that the intermediate results are not quantized. The model is likewise equivalent before and after fusion.
3. **Operator conversion**: Floating-point operators are replaced with QAT (Quantization-Aware Training) operators. According to the configured qconfig, QAT operators insert FakeQuantize nodes at their inputs, outputs, and weights.
4. **Model structure check**: The QAT model is checked, and a check result file is generated.
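Steps 2 and 3 can be illustrated with plain PyTorch (the plugin's own fusion and conversion passes are internal; the analogues below only demonstrate the idea):

```python
# Plain-PyTorch sketch of operator fusion and FakeQuantize insertion.
import torch
import torch.nn as nn
from torch.ao.quantization import (
    FakeQuantize,
    MovingAverageMinMaxObserver,
    fuse_modules,
)

torch.manual_seed(0)

# Step 2 analogue: fuse Conv + BN into a single module so the intermediate
# conv output is never quantized separately. Fusion preserves numerics.
m = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)).eval()
fused = fuse_modules(m, [["0", "1"]])
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(m(x), fused(x), atol=1e-4)

# Step 3 analogue: a FakeQuantize node quantizes and immediately dequantizes,
# so tensors stay float but carry quantization error during training.
fq = FakeQuantize(
    observer=MovingAverageMinMaxObserver, quant_min=0, quant_max=255
)
y = fq(torch.randn(4))  # float tensor snapped to the quantization grid
```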
For historical reasons, the plugin also contains two earlier interfaces, `prepare_qat` and `prepare_qat_fx`; both will gradually be deprecated. We recommend using only the `prepare` interface described in this document.
The usage of the prepare interface is as follows:
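A minimal sketch of a `prepare` call is shown below. The import paths and keyword names are assumptions based on the plugin's public documentation and may differ between plugin versions; consult the API reference of your installed version.

```python
# Hedged sketch only: exact names may differ across horizon_plugin_pytorch versions.
import torch
from horizon_plugin_pytorch.quantization import PrepareMethod, prepare

model = MyFloatModel()  # hypothetical float model containing QuantStub/DequantStub
example_inputs = (torch.randn(1, 3, 224, 224),)

qat_model = prepare(
    model,
    example_inputs=example_inputs,   # providing this enables the structure check
    method=PrepareMethod.JIT_STRIP,  # recommended; skips pre/post-process
)
```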
There are four prepare methods, compared as follows:
| Method | Principle | Advantages | Disadvantages |
|---|---|---|---|
| PrepareMethod.JIT & PrepareMethod.JIT_STRIP | Uses hooks and Tensor subclassing to capture the graph structure, then performs operator replacement/fusion on the original forward. | Fully automatic, minimal code modification, hides many low-level details, easy to debug. | Dynamic code blocks need special handling. |
| PrepareMethod.SYMBOLIC | Uses symbolic tracing to capture the graph structure, then performs operator replacement/fusion on a recompiled forward. | Fully automatic, hides many low-level details. | Does not support dynamic control flow or some data types and Python operations; less convenient to debug. |
| PrepareMethod.EAGER | Does not capture the graph structure; operator replacement/fusion must be done manually. | Flexible and controllable; easy to debug and to handle special needs. | Requires more manual work and code changes; steep learning curve. |
JIT and JIT_STRIP are the currently recommended methods. The only difference between them is that JIT_STRIP identifies and skips pre-processing and post-processing based on the positions of QuantStub and DequantStub in the model. If your model contains pre/post-processing steps that should not be quantized, use JIT_STRIP; with JIT those steps would be quantized. Apart from this, the two are identical. SYMBOLIC and EAGER are earlier solutions with many usability issues, and we do not recommend them.
When example_inputs is provided, prepare performs a model structure check by default. If the check completes, a model_check_result.txt file is written to the working directory. If the check fails, modify the model according to the warnings, or call horizon_plugin_pytorch.utils.check_model.check_qat_model separately to check the model. The check process is the same as check_qat_model in the debug tool; how to interpret the result file is detailed in the check_qat_model documentation.