Video Codec

Principle

The principle of video coding is to eliminate redundant information and compress the video signal, reducing the amount of data so that it can be stored and transmitted more easily. Implementations usually follow a specific video coding standard, such as H.264 or H.265. These standards specify the coding algorithms and parameters used for efficient video compression and transmission.


Video Encoding Principle

Taking the H.265 coding protocol as an example, video coding involves the following key steps:

  1. Segmentation

H.265 first divides the video into a number of sequences, and each sequence is divided into a number of Groups of Pictures (GOPs); each GOP represents a set of consecutive video frames.

  2. Prediction

H.265 uses prediction to remove redundant information between image blocks. As shown in the figures below, from a spatial point of view, the differences in value between neighboring pixels within a single video frame are very small. From a temporal point of view, many pixels are identical between two consecutive video frames.

[Figure: spatial sampling schematic] [Figure: temporal sampling schematic]

Predictive coding is a data compression method based on the statistical properties of an image: it exploits the temporal and spatial correlation of the image to predict the pixels currently being coded from pixel data that has already been reconstructed.

Intra-frame prediction means that both the pixels used for prediction and the pixels currently being encoded lie within the same video frame, generally in neighboring regions. Because neighboring pixels are strongly correlated, their values are generally very close and abrupt changes are rare, so the differences are zero or very small numbers. After intra-frame predictive coding, what is transmitted is therefore the difference between the predicted value and the true value, a value close to 0 known as the prediction error or residual. This allows compression to be achieved with fewer transmitted bits.

H.265 intra-frame predictive coding is done on a block basis, predicting the block being coded from the reconstructed values of neighboring blocks that have already been decoded. Prediction is performed separately for the luminance and chrominance components, with corresponding luminance and chrominance prediction blocks. To adapt to the content characteristics of HD video and improve prediction accuracy, H.265 adopts a richer set of prediction block sizes and prediction modes.


Inter-frame prediction means that the pixels used for prediction and the pixels currently being encoded are not in the same video frame but are generally in adjacent or nearby locations. Inter-frame predictive coding generally achieves better compression than intra-frame prediction because the correlation between video frames is very strong: if moving objects in the video change slowly, the pixel differences between frames are small and the temporal redundancy is very large.

Inter-frame prediction evaluates the motion of moving objects through motion estimation. Its main idea is to search a given range of the reference frame for a block that matches the prediction block and to compute the relative displacement between the two; this relative displacement is the motion vector. Once the motion vector is obtained, the prediction must be corrected, a step known as motion compensation: the motion vector is fed into the motion compensation module, which "compensates" the reference frame to produce the prediction of the current encoded frame. The difference between the predicted frame and the current frame is the inter-frame prediction error.

If inter-frame prediction uses only the previous frame, it is called forward or unidirectional prediction. The resulting predicted frame is known as a P-frame; a P-frame can reference a preceding I-frame or P-frame.

Inter-frame prediction is bidirectional if it uses not only the previous frame but also a subsequent frame to predict the current block. The resulting predicted frame is known as a B-frame; it can reference a preceding I-frame or P-frame and a following P-frame. Since P-frames depend on earlier I-frames or P-frames, and B-frames depend on both earlier and later frames, a B-frame that arrived before its reference frames could not be decoded immediately. How, then, is the playback order guaranteed?

In fact, a presentation timestamp (PTS) and a decoding timestamp (DTS) are generated for each frame during encoding. Typically, after producing an I-frame, the encoder skips ahead a few frames and encodes a P-frame using that I-frame as its reference; the frames between the I-frame and the P-frame are then encoded as B-frames. The frames in the pushed stream are already ordered at encoding time according to the dependencies among I-frames, P-frames, and B-frames, so the data can be decoded directly upon receipt: a B-frame can never arrive before the I-frames and P-frames it depends on. The player then uses the PTS to restore the display order.

  3. Transformation and quantization

In H.265, transform and quantization further compress the data. Transforming the prediction error converts the data from the time domain to the frequency domain, where redundancy is easier to remove. The transformed coefficients are then quantized, mapping them to lower precision and compressing the data further. The process is analogous to the one used in JPEG coding.

  4. Entropy encoding

In the final step of the encoding process, H.265 uses entropy coding to compress the data losslessly. The main purpose of entropy coding is to minimize the redundancy of the coded data and thus improve compression efficiency. The process is analogous to the one used in JPEG coding.

Video Decoding Principle

Video decoding is the process of converting compressed video data back into the original video format. It is divided mainly into entropy decoding, inverse quantization, inverse transform, motion compensation and deblocking, and post-processing, each of which is designed to recover, from the highly compressed data, a picture as close as possible to the original. Since the goal of video encoding is to minimize file size, the decoding process must perform each step of the encoding process precisely in reverse to recover the video content:

  1. Entropy decoding

Entropy decoding converts the compressed bitstream back into the symbols produced during encoding. Video encoders usually apply entropy coding techniques such as Huffman coding or arithmetic coding to reduce the amount of data; the purpose of entropy decoding is to recover the symbols used in the encoding process, preparing the data for the next step, inverse quantization.

  2. Inverse quantization

Inverse quantization reverses the quantization step of encoding (which reduced the precision of the data to save space), restoring the approximate magnitudes of the original transform coefficients. This step is critical to restoring image quality.

  3. Inverse transform

The inverse transform reverses the transforms used in coding (e.g., the Discrete Cosine Transform, DCT), restoring the data from the transform domain (e.g., the frequency domain) to the spatial domain, i.e., the original image. This step is key to image reconstruction.

  4. Motion compensation & deblocking

Motion compensation means that, for predicted frames (i.e., frames generated from a previous or subsequent frame), the complete frame must be reconstructed using the motion vector data. Deblocking, in turn, removes the block artifacts generated during the coding process. These steps are essential to restoring smooth and consistent video playback.

  5. Post-processing

Post-processing, the final step, applies techniques that enhance video quality, such as denoising and sharpening. This step is optional; whether it is performed depends on the playback requirements and on whether the hardware can support it.

API Interface

Video Encoding Interface

```c
// Get the default parameters for video encoding
int32_t hbVPGetDefaultVideoEncParam(hbVPVideoEncParam *param);
// Create the video encoding context
int32_t hbVPCreateVideoEncContext(hbVPVideoContext *context, hbVPVideoEncParam const *param);
// Release the video encoding context
int32_t hbVPReleaseVideoEncContext(hbVPVideoContext context);
// Create a video encoding task
int32_t hbVPVideoEncode(hbUCPTaskHandle_t *taskHandle, hbVPImage const *srcImg, hbVPVideoContext context);
// Get the output buffer where the encoded data is stored
int32_t hbVPGetVideoEncOutputBuffer(hbUCPTaskHandle_t taskHandle, hbVPArray *outBuf);
```

Video Decoding Interface

```c
// Get the default parameters for video decoding
int32_t hbVPGetDefaultVideoDecParam(hbVPVideoDecParam *param);
// Create the video decoding context
int32_t hbVPCreateVideoDecContext(hbVPVideoContext *context, hbVPVideoDecParam const *param);
// Release the video decoding context
int32_t hbVPReleaseVideoDecContext(hbVPVideoContext context);
// Create a video decoding task
int32_t hbVPVideoDecode(hbUCPTaskHandle_t *taskHandle, hbVPArray const *srcBuf, hbVPVideoContext context);
// Get the output buffer where the decoded data is stored
int32_t hbVPGetVideoDecOutputBuffer(hbUCPTaskHandle_t taskHandle, hbVPImage *outImg);
```

For detailed interface information, please refer to hbVPVideoEncode and hbVPVideoDecode.

Usage

Video Encoding Usage

```cpp
// Include the headers
#include "hobot/hb_ucp.h"
#include "hobot/vp/hb_vp.h"
#include "hobot/vp/hb_vp_video_codec.h"

// Initialize the encoding parameters
hbVPVideoEncParam venc_param;
venc_param.videoType = HB_VP_VIDEO_TYPE_H265;
hbVPGetDefaultVideoEncParam(&venc_param);
venc_param.height = height;
venc_param.width = width;
venc_param.pixelFormat = image_format;
// Create the video encoding context
hbVPVideoContext enc_context{nullptr};
hbVPCreateVideoEncContext(&enc_context, &venc_param);
// Initialize the task handle and scheduling parameters
hbUCPTaskHandle_t task_handle{nullptr};
hbUCPSchedParam sched_param;
HB_UCP_INITIALIZE_SCHED_PARAM(&sched_param);
sched_param.backend = HB_UCP_CORE_ANY;
sched_param.priority = 0;
{
  // Initialize image_buf and allocate memory for the image data
  hbUCPSysMem image_mem;
  hbUCPMalloc(&image_mem, yuv_size, 0);
  hbVPImage image_buf;
  image_buf.imageFormat = image_format;
  image_buf.imageType = image_type;
  image_buf.width = width;
  image_buf.height = height;
  image_buf.stride = stride;
  image_buf.uvStride = uv_stride;
  image_buf.dataVirAddr = image_mem.virAddr;
  image_buf.dataPhyAddr = image_mem.phyAddr;
  image_buf.uvVirAddr = reinterpret_cast<char *>(image_mem.virAddr) + luma_size;
  image_buf.uvPhyAddr = image_mem.phyAddr + luma_size;
  // Create the video encoding task
  hbVPVideoEncode(&task_handle, &image_buf, enc_context);
  // Submit the video encoding task
  hbUCPSubmitTask(task_handle, &sched_param);
  // Wait for the video encoding task to finish
  hbUCPWaitTaskDone(task_handle, 10);
  // Get the output buffer
  hbVPArray out_buf;
  hbVPGetVideoEncOutputBuffer(task_handle, &out_buf);
  // Process the H.264 or H.265 data
  // Release the task handle and output buffer
  hbUCPReleaseTask(task_handle);
  // Release the memory
  hbUCPFree(&image_mem);
}
// Release the video encoding context
hbVPReleaseVideoEncContext(enc_context);
```

Video Decoding Usage

```cpp
// Include the headers
#include "hobot/hb_ucp.h"
#include "hobot/vp/hb_vp.h"
#include "hobot/vp/hb_vp_video_codec.h"

// Initialize the decoding parameters
hbVPVideoDecParam vdec_param;
vdec_param.videoType = HB_VP_VIDEO_TYPE_H265;
hbVPGetDefaultVideoDecParam(&vdec_param);
vdec_param.pixelFormat = HB_VP_IMAGE_FORMAT_YUV420;
vdec_param.inBufSize = height * width * 3 / 2;
// Create the video decoding context
hbVPVideoContext dec_context{nullptr};
hbVPCreateVideoDecContext(&dec_context, &vdec_param);
// Initialize the task handle and scheduling parameters
hbUCPTaskHandle_t task_handle{nullptr};
hbUCPSchedParam sched_param;
HB_UCP_INITIALIZE_SCHED_PARAM(&sched_param);
sched_param.backend = HB_UCP_CORE_ANY;
sched_param.priority = 0;
{
  // Initialize src_buf and allocate memory for the input data
  hbUCPSysMem src_mem;
  hbUCPMalloc(&src_mem, feedSize, 0);
  hbVPArray src_buf;
  src_buf.phyAddr = src_mem.phyAddr;
  src_buf.virAddr = src_mem.virAddr;
  src_buf.memSize = src_mem.memSize;
  src_buf.size = feedSize;
  src_buf.capacity = feedSize;
  // Create the video decoding task
  hbVPVideoDecode(&task_handle, &src_buf, dec_context);
  // Submit the video decoding task
  hbUCPSubmitTask(task_handle, &sched_param);
  // Wait for the video decoding task to finish
  hbUCPWaitTaskDone(task_handle, 10);
  // Get the output buffer
  hbVPImage out_img;
  hbVPGetVideoDecOutputBuffer(task_handle, &out_img);
  // Process the YUV data
  // Release the task handle and output buffer
  hbUCPReleaseTask(task_handle);
  // Release the memory
  hbUCPFree(&src_mem);
}
// Release the video decoding context
hbVPReleaseVideoDecContext(dec_context);
```

Additional notes

Bit Rate Control Mode

The encoder supports bitrate control for H.264 and H.265 protocols, with five control modes for H.264 and H.265 encoding channels: CBR, VBR, AVBR, FixQp, and QpMap. CBR ensures the stability of the overall encoding bitrate; VBR ensures the stability of the encoded image quality; AVBR balances bitrate and image quality, providing relatively stable bitrate and image quality; FixQp fixes the QP value for each I frame and P frame; QpMap assigns a QP value to each block in a frame, with block sizes of 32x32 for H.265 and 16x16 for H.264. The following introduces bitrate parameters using the H.265 protocol as an example.

  1. Constant Bit Rate (CBR)

The CBR mode ensures the overall encoding bit rate remains stable. Below are the meanings of the parameters in CBR mode:

| Parameter | Description | Range of values | Default value |
| --- | --- | --- | --- |
| intraPeriod | I-frame interval | [0, 2047] | 28 |
| intraQp | Quantization parameter of intra pictures | [0, 51] | 30 |
| bitRate | Target average bitrate of the encoded data, in kbps | [0, 700000] | 1000 |
| intraQp | Quantization parameter of intra pictures | [0, 51] | 30 |
| frameRate | Target frame rate of the encoded data, in fps | [1, 240] | 30 |
| initialRcQp | Initial QP specified by the user. If this value is smaller than 0 or larger than 51, the initial QP is decided by the firmware | [0, 63] | 63 |
| vbvBufferSize | Size of the VBV buffer in msec (10 ~ 3000). Valid when rate control is enabled. The VBV buffer size in bits is bit_rate * vbv_buffer_size / 1000 | [10, 3000] | 10 |
| ctuLevelRcEnable | Enables rate control at CTU level instead of frame level | [0, 1] | 0 |
| minQpI | Minimum QP of I pictures for rate control | [0, 51] | 8 |
| maxQpI | Maximum QP of I pictures for rate control | [0, 51] | 8 |
| minQpP | Minimum QP of P pictures for rate control | [0, 51] | 8 |
| maxQpP | Maximum QP of P pictures for rate control | [0, 51] | 8 |
| minQpB | Minimum QP of B pictures for rate control | [0, 51] | 8 |
| maxQpB | Maximum QP of B pictures for rate control | [0, 51] | 8 |
| hvsQpEnable | Enables or disables CU QP derivation based on CU variance; enables CU QP adjustment for subjective quality enhancement | [0, 1] | 1 |
| hvsQpScale | QP scaling factor for subCTU QP adjustment when hvsQpEnable is 1 | [0, 4] | 2 |
| hvsMaxDeltaQp | Maximum delta QP of HVS QP. Valid when hvsQpEnable is 1 | [0, 12] | 10 |
| qpMapEnable | Enables or disables the QP map | [0, 1] | 0 |
  2. Variable Bit Rate (VBR)

In VBR mode, larger QP values are assigned to simple scenes (achieving higher compression) and smaller QP values to complex scenes, keeping the encoded image quality stable. Below are the meanings of the parameters in VBR mode:

| Parameter | Description | Range of values | Default value |
| --- | --- | --- | --- |
| intraPeriod | I-frame interval | [0, 2047] | 28 |
| intraQp | Quantization parameter of intra pictures | [0, 51] | 30 |
| frameRate | Target frame rate of the encoded data, in fps | [1, 240] | 30 |
| qpMapEnable | Enables or disables the QP map | [0, 1] | 0 |
  3. Average Variable Bit Rate (AVBR)

In AVBR mode, lower bit rates are allocated to simple scenes and sufficient bit rates are allocated to complex scenes, allowing efficient bit rate distribution across different scenes, similar to VBR. Meanwhile, over a certain period, the average bit rate approximates the set target bit rate, controlling the output file size, similar to CBR. Hence, AVBR can be considered a compromise between CBR and VBR, producing a bitstream with relatively stable bit rate and image quality. Below are the meanings of the parameters in AVBR mode:

| Parameter | Description | Range of values | Default value |
| --- | --- | --- | --- |
| intraPeriod | I-frame interval | [0, 2047] | 28 |
| intraQp | Quantization parameter of intra pictures | [0, 51] | 30 |
| bitRate | Target average bitrate of the encoded data, in kbps | [0, 700000] | 1000 |
| frameRate | Target frame rate of the encoded data, in fps | [1, 240] | 30 |
| initialRcQp | Initial QP specified by the user. If this value is smaller than 0 or larger than 51, the initial QP is decided by the firmware | [0, 63] | 63 |
| vbvBufferSize | Size of the VBV buffer in msec (10 ~ 3000). Valid when rate control is enabled. The VBV buffer size in bits is bit_rate * vbv_buffer_size / 1000 | [10, 3000] | 10 |
| ctuLevelRcEnable | Enables rate control at CTU level instead of frame level | [0, 1] | 0 |
| minQpI | Minimum QP of I pictures for rate control | [0, 51] | 8 |
| maxQpI | Maximum QP of I pictures for rate control | [0, 51] | 8 |
| minQpP | Minimum QP of P pictures for rate control | [0, 51] | 8 |
| maxQpP | Maximum QP of P pictures for rate control | [0, 51] | 8 |
| minQpB | Minimum QP of B pictures for rate control | [0, 51] | 8 |
| maxQpB | Maximum QP of B pictures for rate control | [0, 51] | 8 |
| hvsQpEnable | Enables or disables CU QP derivation based on CU variance; enables CU QP adjustment for subjective quality enhancement | [0, 1] | 1 |
| hvsQpScale | QP scaling factor for subCTU QP adjustment when hvsQpEnable is 1 | [0, 4] | 2 |
| hvsMaxDeltaQp | Maximum delta QP of HVS QP. Valid when hvsQpEnable is 1 | [0, 12] | 10 |
| qpMapEnable | Enables or disables the QP map | [0, 1] | 0 |
  4. Fix QP

FixQp indicates that the QP value for each I-frame and P-frame is fixed. Below are the meanings of the parameters in FixQp mode:

| Parameter | Description | Range of values | Default value |
| --- | --- | --- | --- |
| intraPeriod | I-frame interval | [0, 2047] | 28 |
| frameRate | Target frame rate of the encoded data, in fps | [1, 240] | 30 |
| qpI | Forced quantization parameter for I pictures | [0, 51] | 0 |
| qpP | Forced quantization parameter for P pictures | [0, 51] | 0 |
| qpB | Forced quantization parameter for B pictures | [0, 51] | 0 |
  5. QP Map

QpMap specifies a QP value for each block within a frame, with block sizes of 32x32 in H.265. Below are the meanings of the parameters in QpMap mode:

| Parameter | Description | Range of values | Default value |
| --- | --- | --- | --- |
| intraPeriod | I-frame interval | [0, 2047] | 28 |
| frameRate | Target frame rate of the encoded data, in fps | [1, 240] | 30 |
| qpMapArray | The QP map: a series of 1-byte QP values, one per subCTU, written in raster scan order. The subCTU block size is 32x32 | Pointer address | nullptr |
| qpMapArrayCount | Number of entries in the QP map, derived from the picture width and height | (ALIGN64(picWidth)>>5)*(ALIGN64(picHeight)>>5) | 0 |

GOP Structure

H.264 and H.265 encoding support the configuration of GOP structures, allowing users to choose from preset GOP structures. Below are the descriptions of the preset GOP structures:

| GopPresetIdx | GOP Structure | Low Delay | GOP Size | Encoding Order | Description |
| --- | --- | --- | --- | --- | --- |
| 1 | I | Yes | 1 | I0-I1-I2-I3, ... | I-frames only, no cross-referencing |
| 2 | P | Yes | 1 | I-P0-P1-P2, ... | Only I-frames and P-frames; P-frames refer to 2 forward reference frames |
| 3 | B | Yes | 1 | I-B0-B1-B2, ... | Only I-frames and B-frames; B-frames refer to 2 forward reference frames |
| 6 | PPPP | Yes | 4 | I-P0-P1-P2-P3, ... | Only I-frames and P-frames; P-frames refer to 2 forward reference frames |
| 7 | BBBB | Yes | 4 | I-B0-B1-B2-B3, ... | Only I-frames and B-frames; B-frames refer to 2 forward reference frames |
| 9 | P | Yes | 1 | I-P0, ... | Only I-frames and P-frames; P-frames refer to 1 forward reference frame |