hbDNNQuantiScale

typedef struct {
  int32_t scaleLen;
  float *scaleData;
  int32_t zeroPointLen;
  int32_t *zeroPointData;
} hbDNNQuantiScale;

Quantization/Dequantization scale data.

Input: Given floating-point data data, with corresponding scale scale and zero-point offset zeroPoint, the inference data sent to the model is g((data / scale) + zeroPoint), where g(x) = clip(round(x)) and clip is a truncation function whose range depends on the quantized type. For example, U8: g(x) ∈ [0, 255]; S8: g(x) ∈ [-128, 127].
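The input formula can be sketched in C as follows. This is a minimal illustration, not SDK code: clip_round and quantize_s8 are hypothetical helper names, the typedef is repeated only so the snippet compiles on its own, and the rounding mode (ties away from zero) is an assumption, since the text above says only round(x).

```c
#include <assert.h>
#include <stdint.h>

/* Copy of the typedef above so the sketch compiles standalone. */
typedef struct {
  int32_t scaleLen;
  float *scaleData;
  int32_t zeroPointLen;
  int32_t *zeroPointData;
} hbDNNQuantiScale;

/* Hypothetical helper: g(x) = clip(round(x)) for integer bounds [lo, hi].
 * Clamping first keeps the cast in range; because lo and hi are integers,
 * the result equals clip(round(x)). Assumes x is finite.
 * Assumption: ties round away from zero. */
static int32_t clip_round(float x, int32_t lo, int32_t hi) {
  if (x < (float)lo) return lo;
  if (x > (float)hi) return hi;
  double d = (double)x;
  return (int32_t)(d >= 0.0 ? d + 0.5 : d - 0.5);
}

/* Hypothetical helper: quantize one float to S8 using entry `idx` of the
 * scale and zero-point arrays (idx is 0 for per-tensor quantization). */
static int8_t quantize_s8(const hbDNNQuantiScale *qs, int32_t idx, float data) {
  assert(idx >= 0 && idx < qs->scaleLen && idx < qs->zeroPointLen);
  float x = data / qs->scaleData[idx] + (float)qs->zeroPointData[idx];
  return (int8_t)clip_round(x, -128, 127);
}
```

For U8 data, the same clip_round helper would be called with the bounds 0 and 255.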

Output: Given inference result data data, with corresponding scale scale and zero-point offset zeroPoint, the final (dequantized) inference result is (data - zeroPoint) * scale.
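The output formula is the inverse mapping and can be sketched the same way. The helper name dequantize is hypothetical, and the typedef is repeated only to keep the snippet self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Copy of the typedef above so the sketch compiles standalone. */
typedef struct {
  int32_t scaleLen;
  float *scaleData;
  int32_t zeroPointLen;
  int32_t *zeroPointData;
} hbDNNQuantiScale;

/* Hypothetical helper: (data - zeroPoint) * scale for one quantized value,
 * using entry `idx` of the scale and zero-point arrays
 * (idx is 0 for per-tensor dequantization). */
static float dequantize(const hbDNNQuantiScale *qs, int32_t idx, int32_t data) {
  assert(idx >= 0 && idx < qs->scaleLen && idx < qs->zeroPointLen);
  return (float)(data - qs->zeroPointData[idx]) * qs->scaleData[idx];
}
```

Taking data as int32_t lets the same helper accept S8, U8, or wider integer outputs after a widening cast at the call site.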

The value of scaleLen depends on whether the data is quantized/dequantized per-tensor or per-axis. For per-tensor quantization, scaleLen is equal to 1 and quantizeAxis is ignored. For per-axis quantization, quantizeAxis is the index of the dimension along which the data is quantized, and scaleLen is equal to the size of that dimension. The value of zeroPointLen is always equal to scaleLen.
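To make the per-tensor versus per-axis rule concrete, the sketch below picks which entry of scaleData and zeroPointData applies to a given element. It is illustrative only: scale_index is a hypothetical helper, and it assumes a densely packed row-major tensor with no stride padding, which may not hold for every tensor layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index into scaleData / zeroPointData for the element
 * at flat offset `flatIndex` of a tensor with shape dims[0..numDims-1].
 * Assumption: dense row-major layout without padding. */
static int32_t scale_index(int32_t scaleLen, const int32_t *dims,
                           int32_t numDims, int32_t quantizeAxis,
                           int64_t flatIndex) {
  if (scaleLen == 1) {
    return 0; /* per-tensor: quantizeAxis is ignored */
  }
  assert(quantizeAxis >= 0 && quantizeAxis < numDims);
  assert(dims[quantizeAxis] == scaleLen); /* per-axis: one entry per slice */

  /* Number of elements spanned by one step along quantizeAxis. */
  int64_t inner = 1;
  for (int32_t d = quantizeAxis + 1; d < numDims; ++d) {
    inner *= dims[d];
  }
  return (int32_t)((flatIndex / inner) % dims[quantizeAxis]);
}
```

For a tensor of shape {1, 2, 2, 3} quantized along axis 3, scaleLen is 3 and consecutive elements cycle through entries 0, 1, 2; along axis 1, scaleLen is 2 and each run of 6 elements shares one entry.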

  • Member

Member Name      Description
scaleLen         Length of scale data.
scaleData        Base address of scale data.
zeroPointLen     Length of zero-point offset data.
zeroPointData    Base address of zero-point offset data.