Video coding works by eliminating redundant information and compressing the video signal, reducing the amount of data to be stored and transmitted. In practice, video coding follows specific standards, such as H.264 and H.265, which specify the coding algorithms and parameters used for efficient compression and transmission.

Taking the H.265 coding protocol as an example, video coding involves the following key steps:
H.265 first divides the video into a number of sequences; each sequence is divided into Groups of Pictures (GOPs), where each GOP represents a set of consecutive video frames.
H.265 uses prediction to remove redundant information between image blocks. As shown in the figure below, from a spatial point of view, the differences in pixel values between neighboring pixels inside a single video frame are very small. From a temporal perspective, many pixels are identical between two consecutive video frames.
*(Figures: spatial sampling schematic; temporal sampling schematic.)*
Predictive coding is a data-compression method based on the statistical properties of the image: it exploits the temporal and spatial correlation of the image to predict the pixels currently being coded from pixel data that has already been reconstructed.
Intra-frame prediction means that both the pixels used for prediction and the pixels currently being encoded lie within the same video frame, generally in neighboring regions. Because neighboring pixels are strongly correlated, their values are generally very close, abrupt changes are rare, and the differences are zero or very small. Therefore, what is transmitted after intra-frame predictive coding is the difference between the predicted value and the true value, i.e., a value around 0, called the prediction error or residual. This allows compression with fewer transmitted bits.
H.265 intra-frame predictive coding is done on a block basis, using the reconstructed values of neighboring, already-coded blocks to predict the block being coded. The prediction is split into two components, luminance and chrominance, with corresponding luminance and chrominance prediction blocks. To adapt to the content characteristics of HD video and improve prediction accuracy, H.265 adopts richer prediction block sizes and prediction modes.
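As a toy illustration of the residual idea (not the actual H.265 prediction modes), the sketch below predicts a 4x4 block from its reconstructed left neighbor using a simple horizontal predictor; all pixel values are made up:

```python
# Sketch of intra prediction: predict a 4x4 block from the reconstructed
# column to its left, then form the residual actually transmitted.
# Pixel values are invented sample data, not from a real frame.

def predict_horizontal(left_col, width):
    """Each row of the prediction copies the reconstructed pixel to its left."""
    return [[left_col[r]] * width for r in range(len(left_col))]

def residual(block, prediction):
    """What intra coding transmits: current block minus prediction."""
    return [[b - p for b, p in zip(br, pr)] for br, pr in zip(block, prediction)]

left = [120, 121, 122, 121]          # reconstructed pixels of the left neighbor
block = [[121, 120, 121, 122],       # current 4x4 block being encoded
         [121, 122, 121, 121],
         [123, 122, 123, 122],
         [121, 120, 121, 121]]

pred = predict_horizontal(left, 4)
res = residual(block, pred)
print(res)   # small values around 0 -> cheap to encode
```

Because neighboring pixels are close in value, every residual entry is a small number near zero, which is exactly what makes the subsequent entropy coding effective.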
Inter-frame prediction means that the pixels used for prediction and the pixels currently being encoded are not in the same video frame, but are generally in adjacent or nearby locations. In general, inter-frame predictive coding provides better compression than intra-frame prediction, mainly because of the very strong correlation between video frames. If the rate of change of moving objects in a video frame is slow, then the pixel differences between video frames are small and the temporal redundancy is very large.
Inter-frame prediction evaluates the motion of moving objects through motion estimation. Its main idea is to search a given range of the reference frame for a block that matches the block being predicted, and to compute the relative displacement between the matching block and the prediction block; this relative displacement is the motion vector. After the motion vector is obtained, the prediction must be corrected, a step known as motion compensation: the motion vectors are fed into the motion compensation module, which "compensates" the reference frame to produce the prediction frame for the current encoded frame. The difference between the predicted frame and the current frame is the inter-frame prediction error.
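A minimal full-search motion estimation sketch (illustrative only; real encoders use much faster search strategies, larger blocks, and sub-pixel refinement):

```python
# Full-search block matching: try every displacement within a search range
# and keep the one with the smallest SAD (sum of absolute differences).
# The frames below are tiny invented examples.

def sad(block_a, block_b):
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def extract(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def motion_search(ref, cur_block, cy, cx, size, search_range):
    """Exhaustively test displacements (dy, dx); return (motion_vector, cost)."""
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y <= len(ref) - size and 0 <= x <= len(ref[0]) - size:
                cost = sad(cur_block, extract(ref, y, x, size))
                if cost < best[1]:
                    best = ((dy, dx), cost)
    return best

# An 8x8 reference frame containing a bright 2x2 "object" at (2, 3).
ref = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (3, 4):
        ref[y][x] = 200

# In the current frame the object has moved to (4, 5); this is the block there.
cur_block = [[200, 200], [200, 200]]
mv, cost = motion_search(ref, cur_block, 4, 5, 2, 2)
print(mv, cost)   # motion vector pointing back to the object's old position
```

The resulting motion vector is what gets transmitted (together with the residual); the decoder's motion compensation applies the same displacement to the reference frame to rebuild the prediction.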
If inter-frame prediction uses only the previous frame, it is called forward inter-frame prediction or unidirectional prediction. A frame predicted this way is known as a P-frame; a P-frame can reference the preceding I-frame or P-frame.
Inter-frame prediction is bidirectional if it uses not only the previous frame but also a subsequent frame to predict the current block. A frame predicted this way is known as a B-frame; it can reference the preceding I-frame or P-frame and the following P-frame. Since P-frames must reference earlier I-frames or P-frames, and B-frames must reference both earlier I-frames or P-frames and later P-frames, what if a B-frame arrived in the stream before the I-frames and P-frames it depends on? It could not be decoded immediately, so how is the playback order guaranteed?
This is where PTS (Presentation Time Stamp) and DTS (Decoding Time Stamp) come in; both are generated during video encoding. Typically, after producing an I-frame, the encoder skips ahead a few frames and encodes a P-frame using the preceding I-frame as its reference; the frames between the I-frame and the P-frame are then encoded as B-frames. The frame order of the pushed stream is thus fixed at encoding time according to the dependency order of I-frames, P-frames, and B-frames, and the data is decoded directly as it is received. A B-frame therefore can never arrive before the I-frames and P-frames it depends on; the decoder uses the DTS to decode frames in the received order and the PTS to present them in the correct playback order.
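The decode-order/display-order relationship can be sketched as follows; the frame names and timestamps are invented for illustration:

```python
# Why DTS (decode order) differs from PTS (display order) with B-frames.
# Display order: I0 B1 B2 P3, where B1/B2 reference both I0 and P3, so the
# stream must deliver P3 before the B-frames.

decode_order = ["I0", "P3", "B1", "B2"]                  # transmission order
pts = {"I0": 0, "B1": 1, "B2": 2, "P3": 3}               # presentation stamps
dts = {name: i for i, name in enumerate(decode_order)}   # decode stamps

# The player sorts decoded frames by PTS to recover the display order.
display_order = sorted(decode_order, key=lambda f: pts[f])
print(display_order)   # I0, B1, B2, P3
```

Note that P3 has a smaller DTS than B1 and B2 even though its PTS is larger: it must be decoded early because the B-frames depend on it.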
In H.265, transform and quantization are used to further compress the data. Transforming the prediction error converts the data from the spatial domain to the frequency domain, where data redundancy is easier to remove. The transformed data is then quantized, mapping it to lower precision and thus compressing it further. The process is similar to that used in JPEG coding.
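To make the transform-and-quantization step concrete, here is a naive 2-D DCT plus uniform quantization on a small made-up residual block (illustrative only; H.265 actually uses integer transforms of several sizes and more elaborate quantization):

```python
import math

def dct2(block):
    """Naive 2-D DCT-II, the transform family JPEG and H.26x build on."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    for y in range(n) for x in range(n))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, step):
    """Division by a step size: the lossy, precision-reducing stage."""
    return [[round(c / step) for c in row] for row in coeffs]

# A nearly flat residual block: after the DCT its energy concentrates in the
# top-left (low-frequency) coefficient, and quantization zeroes out the rest.
residual = [[2, 1, 1, 2], [1, 2, 2, 1], [2, 2, 1, 1], [1, 1, 2, 2]]
q = quantize(dct2(residual), step=4)
print(q)   # [[2, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

A block that is mostly zeros after quantization is exactly what the entropy coder in the next step compresses so effectively.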
In the final step of the encoding process, H.265 uses entropy coding to losslessly compress the data. The main purpose of entropy coding is to minimize the redundancy of the coded data and so improve compression efficiency. The process is similar to that used in JPEG coding.
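As a sketch of the entropy-coding idea (H.265 itself uses CABAC, a form of arithmetic coding, not Huffman coding), the example below builds a Huffman table for a zero-dominated sequence of quantized values:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman table: frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tiebreaker, partial code table).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two least-frequent subtrees
        f2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
    return heap[0][2]

# Quantized residuals are dominated by zeros, so "0" gets the shortest code.
data = [0] * 12 + [1] * 3 + [-1] * 2 + [5]
codes = huffman_codes(data)
encoded = "".join(codes[s] for s in data)
print(codes, len(encoded), "bits")
```

Here 18 symbols that would need 36 bits with a fixed 2-bit code compress to 27 bits, because the all-but-ubiquitous zero symbol costs only one bit.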
Video decoding refers to the process of converting compressed video data back into the original video format. Video decoding mainly consists of entropy decoding, inverse quantization, inverse transformation, motion compensation, deblocking filtering, and post-processing; each step is designed to recover, from the highly compressed data, a picture as close as possible to the original. Since the goal of video encoding is to minimize file size, the decoding process must perform each encoding step precisely in reverse to recover the video content:
Entropy decoding is the process of converting the compressed bitstream back into the symbols produced during encoding. Video encoders usually apply entropy coding techniques such as Huffman coding or arithmetic coding to reduce the amount of data; the purpose of entropy decoding is to recover those symbols in preparation for the next step, inverse quantization.
Inverse quantization reverses the quantization step (the encoding step that reduces data precision to save space) in order to restore the approximate magnitudes of the original coefficients; this step is critical to restoring image quality.
The inverse transform reverses the transform used in encoding (e.g., the Discrete Cosine Transform, DCT), bringing the data back from the transform domain (e.g., the frequency domain) to the spatial domain (i.e., the original image); this is a key step in image reconstruction.
Motion compensation means that, for predicted frames (i.e., frames generated from previous/subsequent frames), the complete frame must be reconstructed using the motion vector data. Deblocking, in turn, removes the block artifacts introduced during encoding. These steps are essential for restoring smooth, consistent video playback.
Post-processing, as the final step, applies techniques that enhance video quality, such as denoising and sharpening. This step is optional; whether it is performed depends on the playback requirements and on whether the hardware can support it.
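The strict ordering of the decoding stages described above can be summarized with stub functions (a sketch of the pipeline sequence, not a real decoder; the data values are invented):

```python
# Each stage records its name, so the trace shows the fixed decode order:
# entropy decode -> inverse quantize -> inverse transform ->
# motion compensation -> deblocking.
trace = []

def entropy_decode(bitstream):
    trace.append("entropy_decode")
    return bitstream            # stub: would yield quantized coefficients + MVs

def inverse_quantize(levels, step=4):
    trace.append("inverse_quantize")
    return [l * step for l in levels]        # restore coefficient magnitudes

def inverse_transform(coeffs):
    trace.append("inverse_transform")
    return coeffs               # stub: would apply the inverse DCT

def motion_compensate(residual, reference):
    trace.append("motion_compensate")
    return [r + p for r, p in zip(residual, reference)]  # residual + prediction

def deblock(frame):
    trace.append("deblock")
    return frame                # stub: would smooth block boundaries

frame = deblock(motion_compensate(
    inverse_transform(inverse_quantize(entropy_decode([2, 0, -1]))),
    reference=[100, 100, 100]))
print(trace, frame)
```

The key point is that each stage consumes exactly what the corresponding encoding stage produced, in reverse order.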
For detailed interface information, please refer to hbVPVideoEncode and hbVPVideoDecode.
The encoder supports bitrate control for H.264 and H.265 protocols, with five control modes for H.264 and H.265 encoding channels: CBR, VBR, AVBR, FixQp, and QpMap. CBR ensures the stability of the overall encoding bitrate; VBR ensures the stability of the encoded image quality; AVBR balances bitrate and image quality, providing relatively stable bitrate and image quality; FixQp fixes the QP value for each I frame and P frame; QpMap assigns a QP value to each block in a frame, with block sizes of 32x32 for H.265 and 16x16 for H.264. The following introduces bitrate parameters using the H.265 protocol as an example.
The CBR mode ensures the overall encoding bit rate remains stable. Below are the meanings of the parameters in CBR mode:
| Parameter | Description | Range of values | Default value |
|---|---|---|---|
intraPeriod | I frame interval | [0, 2047] | 28 |
intraQp | A quantization parameter of intra picture | [0, 51] | 30 |
bitRate | The target average bitrate of the encoded data in kbps | [0, 700000] | 1000 |
frameRate | The target frame rate of the encoded data in fps | [1, 240] | 30 |
initialRcQp | Specifies the initial QP set by the user. If this value is smaller than 0 or larger than 51, the initial QP is decided by the firmware | [0, 63] | 63
vbvBufferSize | Specifies the size of the VBV buffer in msec. This value is valid when RateControl is 1. The VBV buffer size in bits is bit_rate * vbv_buffer_size / 1000 | [10, 3000] | 10
ctuLevelRcEnable | Enables CTU-level rate control (rate control can work at the frame level or at the CTU level) | [0, 1] | 0
minQpI | A minimum QP of I picture for rate control | [0, 51] | 8 |
maxQpI | A maximum QP of I picture for rate control | [0, 51] | 8 |
minQpP | A minimum QP of P picture for rate control | [0, 51] | 8 |
maxQpP | A maximum QP of P picture for rate control | [0, 51] | 8 |
minQpB | A minimum QP of B picture for rate control | [0, 51] | 8 |
maxQpB | A maximum QP of B picture for rate control | [0, 51] | 8 |
hvsQpEnable | Enables or disables CU QP derivation based on CU variance. It can enable CU QP adjustment for subjective quality enhancement | [0, 1] | 1 |
hvsQpScale | A QP scaling factor for subCTU QP adjustment when hvs_qp_enable is 1 | [0, 4] | 2 |
hvsMaxDeltaQp | Specifies the maximum delta QP of HVS QP. This value is valid when hvs_qp_enable is 1 | [0, 12] | 10
qpMapEnable | Enables or disables QP map | [0, 1] | 0 |
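As an illustration of how the CBR parameters fit together, the snippet below checks a parameter set against the ranges documented above. Note that `validate_cbr` and `CBR_RANGES` are hypothetical helpers, not part of the actual hbVPVideoEncode API, and the VBV computation assumes the kbps `bitRate` is first converted to bits per second:

```python
# Hypothetical range check for a CBR parameter set, based on the table above.
CBR_RANGES = {
    "intraPeriod":   (0, 2047),
    "intraQp":       (0, 51),
    "bitRate":       (0, 700000),   # kbps
    "frameRate":     (1, 240),      # fps
    "vbvBufferSize": (10, 3000),    # msec
}

def validate_cbr(params):
    for name, value in params.items():
        lo, hi = CBR_RANGES[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return params

cfg = validate_cbr({"intraPeriod": 28, "intraQp": 30,
                    "bitRate": 4000, "frameRate": 30, "vbvBufferSize": 3000})

# Per the vbvBufferSize description (bit_rate * vbv_buffer_size / 1000),
# assuming bit_rate here means bits per second (kbps * 1000):
vbv_bits = cfg["bitRate"] * 1000 * cfg["vbvBufferSize"] // 1000
print(vbv_bits)   # bits buffered for a 4000 kbps stream with a 3000 ms VBV
```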
In VBR mode, larger QP values are assigned to simple scenes to achieve higher compression, while smaller QP values are assigned to complex scenes, keeping the encoded image quality stable. Below are the meanings of the parameters in VBR mode:
| Parameter | Description | Range of values | Default value |
|---|---|---|---|
intraPeriod | I frame interval | [0, 2047] | 28 |
intraQp | A quantization parameter of intra picture | [0, 51] | 30 |
frameRate | The target frame rate of the encoded data in fps | [1, 240] | 30 |
qpMapEnable | Enables or disables QP map | [0, 1] | 0 |
In AVBR mode, lower bit rates are allocated to simple scenes and sufficient bit rates are allocated to complex scenes, allowing efficient bit rate distribution across different scenes, similar to VBR. Meanwhile, over a certain period, the average bit rate approximates the set target bit rate, controlling the output file size, similar to CBR. Hence, AVBR can be considered a compromise between CBR and VBR, producing a bitstream with relatively stable bit rate and image quality. Below are the meanings of the parameters in AVBR mode:
| Parameter | Description | Range of values | Default value |
|---|---|---|---|
intraPeriod | I frame interval | [0, 2047] | 28 |
intraQp | A quantization parameter of intra picture | [0, 51] | 30 |
bitRate | The target average bitrate of the encoded data in kbps | [0, 700000] | 1000 |
frameRate | The target frame rate of the encoded data in fps | [1, 240] | 30 |
initialRcQp | Specifies the initial QP set by the user. If this value is smaller than 0 or larger than 51, the initial QP is decided by the firmware | [0, 63] | 63
vbvBufferSize | Specifies the size of the VBV buffer in msec. This value is valid when RateControl is 1. The VBV buffer size in bits is bit_rate * vbv_buffer_size / 1000 | [10, 3000] | 10
ctuLevelRcEnable | Enables CTU-level rate control (rate control can work at the frame level or at the CTU level) | [0, 1] | 0
minQpI | A minimum QP of I picture for rate control | [0, 51] | 8 |
maxQpI | A maximum QP of I picture for rate control | [0, 51] | 8 |
minQpP | A minimum QP of P picture for rate control | [0, 51] | 8 |
maxQpP | A maximum QP of P picture for rate control | [0, 51] | 8 |
minQpB | A minimum QP of B picture for rate control | [0, 51] | 8 |
maxQpB | A maximum QP of B picture for rate control | [0, 51] | 8 |
hvsQpEnable | Enables or disables CU QP derivation based on CU variance. It can enable CU QP adjustment for subjective quality enhancement | [0, 1] | 1 |
hvsQpScale | A QP scaling factor for subCTU QP adjustment when hvs_qp_enable is 1 | [0, 4] | 2 |
hvsMaxDeltaQp | Specifies the maximum delta QP of HVS QP. This value is valid when hvs_qp_enable is 1 | [0, 12] | 10
qpMapEnable | Enables or disables QP map | [0, 1] | 0 |
FixQp indicates that the QP value for each I-frame and P-frame is fixed. Below are the meanings of the parameters in FixQp mode:
| Parameter | Description | Range of values | Default value |
|---|---|---|---|
intraPeriod | I frame interval | [0, 2047] | 28 |
frameRate | The target frame rate of the encoded data in fps | [1, 240] | 30
qpI | A forced quantization parameter for I pictures | [0, 51] | 0
qpP | A forced quantization parameter for P pictures | [0, 51] | 0
qpB | A forced quantization parameter for B pictures | [0, 51] | 0
QpMap specifies a QP value for each block within a frame, with block sizes of 32x32 in H.265. Below are the meanings of the parameters in QpMap mode:
| Parameter | Description | Range of values | Default value |
|---|---|---|---|
intraPeriod | I frame interval | [0, 2047] | 28 |
frameRate | The target frame rate of the encoded data in fps | [1, 240] | 30 |
qpMapArray | Specifies the QP map. The QP map array should be written as a series of 1-byte QP values, one for each subCTU, in raster scan order. The subCTU block size is 32x32 | Pointer address | nullptr
qpMapArrayCount | Specifies the number of QP map entries; it is determined by the picture width and height | (ALIGN64(picWidth)>>5)*(ALIGN64(picHeight)>>5) | 0
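The relationship between picture size and qpMapArrayCount given in the table can be computed directly; `align64` below mirrors the ALIGN64 macro from the range column (rounding up to a multiple of 64), and `>> 5` divides by the 32x32 subCTU size:

```python
# One QP byte per 32x32 subCTU, with width/height first aligned up to 64,
# per the qpMapArrayCount formula: (ALIGN64(w)>>5) * (ALIGN64(h)>>5).
def align64(x):
    return (x + 63) // 64 * 64

def qp_map_array_count(pic_width, pic_height):
    return (align64(pic_width) >> 5) * (align64(pic_height) >> 5)

# For a 1920x1080 picture: 1920 is already 64-aligned (60 subCTUs per row),
# 1080 aligns up to 1088 (34 subCTU rows).
print(qp_map_array_count(1920, 1080))   # 60 * 34 = 2040 entries
```

So a 1080p QP map needs a 2040-byte array, one QP value per subCTU in raster scan order.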
H.264 and H.265 encoding support the configuration of GOP structures, allowing users to choose from preset GOP structures. Below are the descriptions of the preset GOP structures:
| GopPresetIdx | GOP Structure | Low Delay | GOP Size | Encoding Order | Description |
|---|---|---|---|---|---|
1 | I | Yes | 1 | I0-I1-I2-I3,… | I-frames only, no cross-referencing |
2 | P | Yes | 1 | I-P0-P1-P2,… | Only I-frames and P-frames, and P-frames refer to 2 forward reference frames |
3 | B | Yes | 1 | I-B0-B1-B2,… | Only I-frames and B-frames, and B-frames refer to 2 forward reference frames |
6 | PPPP | Yes | 4 | I-P0-P1-P2-P3,… | Only I-frames and P-frames, and P-frames refer to 2 forward reference frames |
7 | BBBB | Yes | 4 | I-B0-B1-B2-B3,… | Only I-frames and B-frames, and B-frames refer to 2 forward reference frames |
9 | P | Yes | 1 | I-P0,… | Only I-frames and P-frames, and P-frames refer to 1 forward reference frame |