Overview

General Introduction

  • What is UCP?

    The Unify Compute Platform (hereafter UCP) defines a unified set of heterogeneous programming interfaces and provides APIs (Application Programming Interfaces) for calling all computing resources on the SoC (System on Chip). UCP abstracts and encapsulates the functional hardware on the SoC and exposes function-based APIs for creating the corresponding UCP tasks (e.g., VP operator tasks). A hardware backend can be set when a task is submitted to the UCP scheduler, which then schedules all tasks on the SoC in a unified way based on the available hardware resources.

    Specifically, UCP provides the following features: Vision Process, Neural Network Inference, High Performance Library, and Custom Operator Plugin Development (to be realized).

    [Figure: UCP architecture diagram]

    For details on the backends, refer to the section Backend Instruction.

  • UCP scenarios:

    Calling a single operator: the vision operators and high performance processing operators in UCP can be used directly.

    Operator plugin development and usage: custom operator development is available.

    Deep learning model inference: deep learning model inference tasks can be run, with model parsing and hardware deployment handled inside UCP.

  • UCP advantages:

    Highly abstract: for a single operator, you do not need to deal with hardware differences. You choose the hardware to execute on by specifying the backend, which makes hardware deployment easier.

    Highly integrated: as Horizon's unified heterogeneous programming interface, UCP lets you complete all development requirements with a single set of interfaces.

Note

This section explains how to use UCP to deploy models on hardware. Basic knowledge of, and experience with, embedded development will help you follow it.

Interface Usage Flow

There are three ways to execute a UCP task: synchronous execution, asynchronous execution, and execution with a registered callback function. The VP interface hbVPRotate is used below to illustrate the three modes.

Synchronous Execution

If you pass nullptr as the taskHandle parameter when calling the interface, the task is executed synchronously and immediately. The reference code is as follows:

    #include "hobot/hb_ucp.h"
    #include "hobot/vp/hb_vp_rotate.h"

    int main() {
      // step1: execute a rotate task; the null task handle makes the call synchronous
      hbVPImage src_img{/*do something to init src image*/};
      hbVPImage dst_img{/*do something to init dst image*/};
      hbVPRotate(nullptr, &dst_img, &src_img, HB_VP_ROTATE_90_CLOCKWISE);

      return 0;
    }
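To make the expected result of a rotate task concrete, here is a minimal CPU reference for a 90-degree clockwise rotation. It is only a sketch for checking results by hand: it is not how UCP implements hbVPRotate, it assumes that HB_VP_ROTATE_90_CLOCKWISE denotes an ordinary clockwise quarter turn, and it works on a plain single-channel, row-major 8-bit buffer rather than on hbVPImage, whose layout is not described here. The name RotateCw90 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of UCP: rotates a single-channel, row-major
// 8-bit image by 90 degrees clockwise on the CPU.
// The source has `height` rows and `width` columns; the result has `width`
// rows and `height` columns.
std::vector<uint8_t> RotateCw90(const std::vector<uint8_t> &src,
                                std::size_t width, std::size_t height) {
  assert(src.size() == width * height);
  std::vector<uint8_t> dst(src.size());
  for (std::size_t r = 0; r < width; ++r) {     // destination row
    for (std::size_t c = 0; c < height; ++c) {  // destination column
      // Destination pixel (r, c) comes from source pixel (height - 1 - c, r),
      // so the bottom-left source pixel lands in the top-left corner.
      dst[r * height + c] = src[(height - 1 - c) * width + r];
    }
  }
  return dst;
}
```

For a 2x3 source with rows (1, 2, 3) and (4, 5, 6), the function returns a 3x2 image with rows (4, 1), (5, 2), and (6, 3).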

Asynchronous Execution

For asynchronous execution, initialize the task handle to nullptr in advance and pass its address as the taskHandle parameter when creating the task. After submitting the UCP task (hbUCPSubmitTask), call hbUCPWaitTaskDone at the point in the thread where the result is needed and wait for the task to complete. The reference code is as follows:

    #include "hobot/hb_ucp.h"
    #include "hobot/vp/hb_vp_rotate.h"

    int main() {
      // step1: create a rotate task
      hbUCPTaskHandle_t task_handle{nullptr};
      hbVPImage src_img{/*do something to init src image*/};
      hbVPImage dst_img{/*do something to init dst image*/};
      hbVPRotate(&task_handle, &dst_img, &src_img, HB_VP_ROTATE_90_CLOCKWISE);

      // step2: submit the task and wait for it to complete
      hbUCPSchedParam sched_param;
      // initialize all fields first, as the callback example does
      HB_UCP_INITIALIZE_SCHED_PARAM(&sched_param);
      sched_param.backend = HB_UCP_DSP_CORE_0;
      sched_param.priority = 0;
      hbUCPSubmitTask(task_handle, &sched_param);
      hbUCPWaitTaskDone(task_handle, 100);

      // step3: release the task
      hbUCPReleaseTask(task_handle);

      return 0;
    }
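The asynchronous flow is a submit followed by a bounded wait. The sketch below models only that control flow with std::async and std::future; it contains no UCP calls. Reading the second argument of hbUCPWaitTaskDone as a timeout is an assumption made here, since the text does not say what the value 100 means. SimulatedBackendWork and SubmitAndWait are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Stand-in for the work a backend would perform; not a UCP call.
int SimulatedBackendWork(std::chrono::milliseconds duration) {
  std::this_thread::sleep_for(duration);
  return 0;  // 0 stands for success in this sketch
}

// Starts the simulated work on another thread, then waits for at most
// `timeout`. Returns true if the work finished within the timeout.
bool SubmitAndWait(std::chrono::milliseconds work,
                   std::chrono::milliseconds timeout) {
  assert(timeout.count() >= 0);
  // "Submit": the work starts running immediately on a separate thread.
  std::future<int> task =
      std::async(std::launch::async, SimulatedBackendWork, work);
  // The caller could do unrelated work here before waiting.
  const bool finished = task.wait_for(timeout) == std::future_status::ready;
  // A future obtained from std::async joins the worker in its destructor,
  // so the simulated work is never abandoned, even after a timeout.
  return finished;
}
```

In this model, a caller that sees a timeout has to decide whether to keep waiting or give up. Real code would presumably make the same decision from the value returned by the wait call.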

Registered Callback Function Execution

As with asynchronous execution, initialize the task handle to nullptr in advance and pass its address as the taskHandle parameter when creating the task. The callback function must be registered (hbUCPSetTaskDoneCb) before the task is submitted (hbUCPSubmitTask).

The reference code for setting the callback function is as follows:

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>
    #include <thread>

    #include "hobot/hb_ucp.h"
    #include "hobot/vp/hb_vp_rotate.h"

    typedef struct {
      std::mutex mutex;
      std::condition_variable cv;
      bool done;  // set by the callback once the task has finished
      int32_t status;
    } UserData;

    void CallBack(hbUCPTaskHandle_t handle, int32_t status, void *userdata) {
      auto data = static_cast<UserData *>(userdata);
      {
        // update the shared state under the lock before notifying
        std::lock_guard<std::mutex> lk{data->mutex};
        data->status = status;
        data->done = true;
      }
      data->cv.notify_all();
    }

    void ProcessThread(void *userdata) {
      auto data = static_cast<UserData *>(userdata);
      {
        std::unique_lock<std::mutex> lk{data->mutex};
        // the predicate guards against spurious and missed wakeups
        data->cv.wait(lk, [data] { return data->done; });
      }
      // do something here
    }

    int main() {
      // step1: create a rotate task
      hbUCPTaskHandle_t task_handle{nullptr};
      hbVPImage src_img{/*do something to init src image*/};
      hbVPImage dst_img{/*do something to init dst image*/};
      hbVPRotate(&task_handle, &dst_img, &src_img, HB_VP_ROTATE_90_CLOCKWISE);

      // step2: create userdata and custom process thread
      UserData userdata;
      userdata.done = false;
      userdata.status = 0;
      std::thread worker(ProcessThread, &userdata);

      // step3: set callback function before submit task
      hbUCPSetTaskDoneCb(task_handle, CallBack, &userdata);

      // step4: submit task
      hbUCPSchedParam sched_param;
      HB_UCP_INITIALIZE_SCHED_PARAM(&sched_param);
      sched_param.backend = HB_UCP_DSP_CORE_0;
      hbUCPSubmitTask(task_handle, &sched_param);

      // step5: wait for the worker, then release the task
      worker.join();
      hbUCPReleaseTask(task_handle);

      return 0;
    }
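The callback example depends on a condition-variable handshake between the completion callback and a worker thread. The sketch below isolates that handshake so it can be tried without UCP: a plain thread plays the role of the callback, and a done flag checked in the wait predicate ensures the waiter still returns when the notification arrives before the wait begins. DoneSignal, Notify, WaitDone, and RunHandshake are hypothetical names, not UCP interfaces.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>

// One-shot completion signal shared between a notifier and a waiter.
struct DoneSignal {
  std::mutex mutex;
  std::condition_variable cv;
  bool done = false;
  int32_t status = -1;
};

// Plays the role of the completion callback: records the status under the
// lock, then wakes every waiter.
void Notify(DoneSignal &signal, int32_t status) {
  {
    std::lock_guard<std::mutex> lk(signal.mutex);
    assert(!signal.done);  // each signal is meant to be used once
    signal.status = status;
    signal.done = true;
  }
  signal.cv.notify_all();
}

// Blocks until Notify has run, including the case where it ran earlier.
int32_t WaitDone(DoneSignal &signal) {
  std::unique_lock<std::mutex> lk(signal.mutex);
  signal.cv.wait(lk, [&signal] { return signal.done; });
  return signal.status;
}

// Fires the simulated callback from another thread and returns the status
// observed by the waiting side.
int32_t RunHandshake(int32_t status) {
  DoneSignal signal;
  std::thread notifier(Notify, std::ref(signal), status);
  const int32_t observed = WaitDone(signal);
  notifier.join();
  return observed;
}
```

Without the done flag, a notification delivered before the waiter reaches wait would be lost and the waiter would block indefinitely.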
Note
  1. UCP has a built-in Neural Network Kernel, Vision Process Kernel, and High Performance Kernel, each of which may be supported by different backends.
  2. UCP tasks are uniformly scheduled according to a priority-based scheduling strategy, and the priority of task execution can be specified when submitting tasks.
  3. Synchronous execution does not support configuring task control parameters or selecting the backend. UCP selects suitable hardware based on the backends that can execute the current task and on the hardware load information.
  4. Asynchronous execution supports configuring task control parameters and backend selection. If not specified, the backend is selected based on load balancing across available backends.
  5. Except for video encoding and decoding tasks, the input and output memory for all tasks must be allocated and managed by you using the memory interfaces provided by UCP.
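
As a rough mental model of the priority-based scheduling mentioned in note 2, the sketch below orders pending tasks with std::priority_queue. It is illustrative only and does not describe UCP's scheduler: treating a larger number as a higher priority and breaking ties by submission order are both assumptions, and PendingTask, RunsLater, and PriorityScheduler are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

struct PendingTask {
  std::string name;
  int32_t priority;   // larger value runs first in this sketch (assumption)
  uint64_t sequence;  // submission order, used to break ties
};

// Comparator for std::priority_queue: returns true when `a` should run
// after `b`, so the task that runs first ends up on top.
struct RunsLater {
  bool operator()(const PendingTask &a, const PendingTask &b) const {
    if (a.priority != b.priority) return a.priority < b.priority;
    return a.sequence > b.sequence;  // earlier submission wins a tie
  }
};

class PriorityScheduler {
 public:
  void Submit(const std::string &name, int32_t priority) {
    assert(!name.empty());
    queue_.push(PendingTask{name, priority, next_sequence_++});
  }

  // Returns the task names in the order they would be dispatched and
  // leaves the queue empty.
  std::vector<std::string> Drain() {
    std::vector<std::string> order;
    while (!queue_.empty()) {
      order.push_back(queue_.top().name);
      queue_.pop();
    }
    return order;
  }

 private:
  std::priority_queue<PendingTask, std::vector<PendingTask>, RunsLater> queue_;
  uint64_t next_sequence_ = 0;
};
```

Submitting a (priority 0), b (5), c (5), and d (1) in that order drains as b, c, d, a.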

Backend Instruction

The backend is the computing hardware on which a UCP task is executed. UCP currently supports the following backends: BPU, DSP, GDC, STITCH, JPU, VPU, and PYRAMID.

Backend   Description
BPU       Brain Process Unit, the Horizon neural network computing unit.
DSP       Digital Signal Processor, a programmable hardware unit.
GDC       Geometric Distortion Correction, a hardware IP on ARM that can perform perspective transformation, distortion correction, and affine transformation on input images.
STITCH    An IP unit of J6 that can crop and stitch input images. The stitching modes are alpha fusion, alpha beta fusion, and direct copy.
JPU       JPEG Processing Unit, mainly used for JPEG encoding and decoding.
VPU       Video Processing Unit, a specialized visual processing unit.
PYRAMID   Image Pyramid, a hardware processing module that downscales the entire original image.
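
The usage notes state that, when no backend is specified, one is chosen from the backends able to execute the task according to hardware load. The sketch below shows one simple way such a choice could work, picking the least-utilized capable backend. It is a hypothetical model, not UCP's selection logic: the utilization metric and the names BackendState and PickBackend are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct BackendState {
  std::string name;    // e.g. "DSP"
  bool can_run_task;   // whether this backend supports the task at hand
  double utilization;  // 0.0 = idle, 1.0 = fully busy (hypothetical metric)
};

// Returns the name of the least-utilized backend that can run the task,
// or an empty string when no backend qualifies.
std::string PickBackend(const std::vector<BackendState> &backends) {
  const BackendState *best = nullptr;
  for (const BackendState &b : backends) {
    assert(b.utilization >= 0.0 && b.utilization <= 1.0);
    if (!b.can_run_task) continue;
    if (best == nullptr || b.utilization < best->utilization) best = &b;
  }
  return best != nullptr ? best->name : std::string();
}
```

Given an idle BPU that cannot run the task, a DSP at 0.8, and a GDC at 0.2, the function returns "GDC".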

Environment and Tools

UCP applies to Horizon computing platforms with the J6 architecture or later. Cross-compiling and deploying with it requires basic embedded development knowledge and skills. The environment and tool requirements are listed in the following table:

Environment/Tools      Supported Version
OS                     Linux
Development board      J6 processor
Development language   C++11
Cross compiler         Linaro 11.4.0
DSP toolchain          Cadence Vision Q8 2021.7

In addition to the development board, UCP also provides the same development support in an x86 emulation environment as it does on the board.

x86 Emulation Description

Function Description

Similar to the development board, UCP provides the same visual processing, model inference, and high-performance computing capabilities in an x86 architecture through emulation. All examples and interface code can be equivalently used in the emulation environment. You can develop and debug code in the x86 environment, gaining immediate feedback during the development process. This allows you to identify and resolve issues early, improving development efficiency and code quality, and ensuring seamless migration of the code to run on SoC hardware.

The emulation methods for each backend supported by UCP are as follows:

  • BPU and DSP hardware use instruction-level emulation.
  • GDC, JPU, and VPU hardware use CModel executable file emulation.
    • The CModel executable file for GDC hardware is gdc_cmodel.
    • The CModel executable files for JPU hardware are Nieuport_JpgEnc and Nieuport_JpgDec, used for JPEG encoding and decoding, respectively.
    • The CModel executable files for VPU hardware are hevc_enc and hevc_dec, used for video encoding and decoding, respectively.
  • STITCH and PYRAMID hardware use emulation libraries.

Environment Description

Compilation Environment

The UCP emulation uses the compiler environment provided by the Docker image.

Runtime Environment

  1. When using DSP hardware for emulation, you need to configure the Xtensa development environment and specify the DSP emulation image path. For environment configuration, refer to Install DSP Toolchain And Configure Core. You can specify the DSP emulation image path by setting the environment variable HB_DSP_CMODEL_IMAGE to ensure the application can locate the correct emulation image file. The reference command is as follows:

         # define the DSP image path
         export HB_DSP_CMODEL_IMAGE=../x86/bin/image/vdsp0

  2. When running CModel executable files, you need to add the executable file path to the PATH environment variable so that they can be run directly from the terminal. The reference command is as follows:

         # define the CModel executable file path
         root=../x86/bin/
         # set the executable file search path
         export PATH=${root}:${PATH}

  3. No additional configuration is required for running emulations of the other hardware.
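
Before starting an emulation run, it can help to confirm that these settings are in place. The sketch below checks that HB_DSP_CMODEL_IMAGE is set and that the CModel executables named earlier can be found on PATH. It is an optional convenience, not something UCP provides: SplitPath, FindExecutable, and MissingPrerequisites are hypothetical helper names, and the lookup uses the POSIX access() call, so it assumes a Linux host.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

#include <unistd.h>  // access(), POSIX

// Splits a PATH-style string on ':' and drops empty entries.
std::vector<std::string> SplitPath(const std::string &path_var) {
  std::vector<std::string> dirs;
  std::stringstream stream(path_var);
  std::string dir;
  while (std::getline(stream, dir, ':')) {
    if (!dir.empty()) dirs.push_back(dir);
  }
  return dirs;
}

// Returns the first "<dir>/<exe>" that the current user may execute,
// or an empty string if none of the directories contains one.
std::string FindExecutable(const std::string &path_var,
                           const std::string &exe) {
  assert(!exe.empty());
  for (const std::string &dir : SplitPath(path_var)) {
    const std::string candidate = dir + "/" + exe;
    if (access(candidate.c_str(), X_OK) == 0) return candidate;
  }
  return std::string();
}

// CModel executable names taken from the emulation description.
const char *const kCModelTools[] = {"gdc_cmodel", "Nieuport_JpgEnc",
                                    "Nieuport_JpgDec", "hevc_enc", "hevc_dec"};

// Lists every prerequisite that is missing from the current environment.
std::vector<std::string> MissingPrerequisites() {
  std::vector<std::string> missing;
  if (std::getenv("HB_DSP_CMODEL_IMAGE") == nullptr) {
    missing.push_back("HB_DSP_CMODEL_IMAGE");
  }
  const char *path = std::getenv("PATH");
  const std::string path_var = path != nullptr ? path : "";
  for (const char *tool : kCModelTools) {
    if (FindExecutable(path_var, tool).empty()) missing.push_back(tool);
  }
  return missing;
}
```

An application could call MissingPrerequisites() at startup and print the returned names. An empty result means the variable is set and all five tools were found.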

Performance Description

The performance of the x86 emulation environment is usually lower than that of the actual hardware. As mentioned above, the backends supported by UCP are emulated through instruction-level emulation, CModel executable files, or emulation libraries, and each method has its own overhead:

  • Instruction-level emulation: This method simulates and executes each instruction individually, resulting in higher computational overhead and lower emulation speed.
  • CModel executable file emulation: CModel executable files perform input and output operations through file read/write processes. File I/O operations are time-consuming and can impact the emulation speed to some extent.
  • Emulation libraries: These run on the CPU, which simulates the behavior of hardware accelerators. However, the CPU's architecture and design are not specifically optimized for these tasks, leading to lower execution efficiency.

Although the performance of the emulation environment is typically lower than that of actual hardware, the emulation environment provides comprehensive API support and precise functional verification. This significantly enhances your development efficiency and code quality, helping you identify and resolve potential issues at an early stage.