Key Concepts

This section introduces concepts that appear frequently in the following chapters, along with some commonly used background knowledge.

  • Original floating-point model

    A model trained with a deep learning framework such as TensorFlow or PyTorch. This model is computed with float32 precision.

  • Hybrid Heterogeneous Model

    A model format suitable for running on the Horizon computing platform. It is called a heterogeneous model because it supports model execution on both the ARM CPU and the BPU. Since computation on the BPU is much faster than on the CPU, operators are scheduled onto the BPU whenever possible; operators that the BPU does not yet support are computed on the CPU.

  • Operator

    Deep learning algorithms are composed of computational units called operators (also known as ops). An operator is a mapping from one function space onto another. An operator's name is unique within a model, but multiple operators of the same type can exist. For example, Conv1 and Conv2 are two different operators of the same operator type.

  • Model conversion

    The process of converting the original floating-point model, or the ONNX model exported by QAT, into a Horizon hybrid heterogeneous model.

  • Model quantization

    Currently one of the most effective model optimization methods in industry. Quantization establishes a mapping between fixed-point and floating-point data to gain inference performance with little precision loss. It can be understood simply as using "low-bit" numbers to represent FP32 or other types of values: quantizing FP32 to INT8, for example, yields 4x parameter compression, reduces memory usage, and speeds up computation.

    • The Quantize node quantizes the model's input data from the [float] type to the [int8] type using the following formula:

      qx = clamp(round(x / scale) + zero_point, -128, 127)
      • round(x) rounds the floating-point number to the nearest integer.
      • clamp(x) clamps the value to an integer between -128 and 127.
      • scale is the quantization scale factor.
      • zero_point is the zero-point offset used in asymmetric quantization; in symmetric quantization, zero_point = 0.

      The C++ reference implementation is as follows:

      #include <algorithm>
      #include <cfenv>
      #include <cmath>
      #include <cstdint>

      // Note: float32_t is not standard C++; it is assumed here to alias float.
      using float32_t = float;

      static inline float32_t _round(float32_t const input) {
        std::fesetround(FE_TONEAREST);
        float32_t const result{std::nearbyintf(input)};
        return result;
      }

      static inline int8_t int_quantize(float32_t value, float32_t const scale) {
        value = _round(value / scale);
        value = std::min(std::max(value, -128.0f), 127.0f);
        return static_cast<int8_t>(value);
      }
    • The Dequantize node dequantizes the model's output data from the int8 or int32 type back to the float or double type, using the following formula:

      deqx = (x - zero_point) * scale

      The C++ reference implementation is as follows:

      static inline float dequantize(int8_t const value, float const scale) {
        return static_cast<float>(value) * scale;
      }
  • PTQ

    The PTQ (post-training quantization) scheme: a quantization method that first trains a floating-point model and then uses calibration images to compute quantization parameters, converting the floating-point model into a quantized model. For more details, refer to the PTQ and QAT Introduction section.

  • QAT

    The QAT (quantization-aware training) scheme intervenes in the floating-point model structure during floating-point training so that the model can perceive the loss introduced by quantization, thereby reducing the accuracy loss caused by quantization. For more details, refer to the PTQ and QAT Introduction section.

  • Tensor

    A tensor is a multidimensional array with a uniform data type. As the container for the data computed by operators, it holds both input and output data. A tensor also carries the metadata describing that data: its name, shape, data layout, data type, etc.

  • Data layout

    In deep learning, multidimensional data is stored in multidimensional arrays (tensors). Neural network feature maps are usually stored in a four-dimensional (4D) format, i.e., with the following four dimensions:

    • N: Batch size, e.g. the number of images.
    • H: Height, the height of the image.
    • W: Width, the width of the image.
    • C: Channel, the number of channels of the image.

    However, memory is linear, so the four dimensions must be laid out in a specific order, and different data layouts affect computational performance. The common data storage formats are NCHW and NHWC:

    • NCHW: stores all pixel values of the same channel contiguously, channel by channel.
    • NHWC: stores the pixel values at the same pixel position across all channels contiguously, position by position.

    As shown in the figure below:

    (figure: data_format, comparing the NCHW and NHWC memory layouts)
  • Data type

    Commonly used image data types include rgb, bgr, gray, yuv444, nv12, and featuremap.

    • rgb, bgr, and gray are commonly used image formats. Note that each value is represented as UINT8.
    • yuv444 is also a popular image format. Note that each value is represented as UINT8.
    • nv12 is a popular YUV420 image format. Note that each value is represented as UINT8.
    • featuremap is suitable when the formats listed above do not meet your needs; each value is represented as float32. This format is commonly used for data such as radar and speech.

For more information about abbreviations in the documents, please refer to the section Common Abbreviations.