Quickstart
This section shows how to use the HPL samples in the UCP sample package ucp_tutorial: how to configure the development environment, and how to compile and run the sample application code.
It is intended to help you get started quickly with the HPL function module in UCP.
The main architecture is as follows:

Sample Package Usage
The structure of the sample package is shown below:
ucp_tutorial
├── deps_aarch64                    // The aarch64 public dependency directory
│   ├── some libs                   // Other third-party dependency libraries
│   └── ucp                         // UCP dependency libraries and header files, including dnn/vp/ucp/hpl/dsp development
├── deps_x86                        // The x86 emulation public dependency directory
│   ├── some libs                   // Other third-party dependency libraries
│   └── ucp                         // UCP dependency libraries and header files
├── dnn                             // The dnn sample
├── tools                           // Tool
├── vp                              // The vp sample
└── hpl                             // The hpl sample
    ├── code                        // Sample source code directory
    │   ├── 01_fft_ifft_transform   // Sample of fft and ifft transform
    │   ├── util                    // Public code directory
    │   ├── build_aarch64.sh        // aarch64 sample compilation script
    │   ├── build.sh                // Minimal executable environment scripts for compiling hpl samples
    │   ├── build_x86.sh            // x86 emulation example compilation script
    │   └── CMakeLists.txt          // The cmake file
    └── hpl_samples                 // Minimal executable environment scripts for hpl samples
        ├── data                    // Data directory
        ├── script                  // The aarch64 sample script directory
        │   ├── 01_fft_ifft_transform   // Sample script directory
        │   └── dsp_deploy.sh       // Deployment board dsp runtime environment script
        └── script_x86              // The x86 emulation sample script directory
            ├── 01_fft_ifft_transform   // Sample script directory
            └── README.md           // The readme file
The HPL samples are located in the ucp_tutorial/hpl folder and include a Fast Fourier Transform sample, which can be built both for running on the board and for x86 emulation.
For details, please refer to the section HPL Sample.
Compile Sample Operator
Before compiling and running the sample application code, make sure that the environment meets the requirements.
If you followed the guidelines in section Environment Deployment, your development machine should already have the relevant environment installed. The requirements are as follows:
- cmake >= 3.0.
- For board-side compilation, you need to specify the cross-compilation toolchain. For the x86 emulation docker, you can use the compiler that comes with it.
The ucp_tutorial/hpl/code directory of the sample contains a pre-configured build script, build.sh.
It accepts the options -a x86 and -a aarch64 to select between the two build types, and executing build.sh directly completes a one-click build.
The generated files are saved to the ucp_tutorial/hpl/hpl_samples directory.
The directory also contains two compilation scripts, build_aarch64.sh and build_x86.sh, one for each build configuration.
Using these scripts is equivalent to using build.sh.
To compile the HPL module for x86 emulation, execute the following commands:
cd ucp_tutorial/hpl/code
sh build_x86.sh
After the compilation script has run, the executable programs and dependency files required to run the samples are generated and saved in the ucp_tutorial/hpl/hpl_samples directory.
For the HPL samples, the generated content is shown below. It contains the sample data, the sample running scripts, the runtime dependency libraries and the executable files,
which together form a complete running environment with all of its dependencies.
hpl_samples
├── data // Data directory
└── script_x86 // x86 emulation sample script directory
├── 01_fft_ifft_transform // Sample script directory
├── x86 // The executable files and dependency libraries directory generated after executing the build script
└── README.md // The readme file
Run Sample
After all compilation steps have completed correctly, the executable samples are configured and saved in the hpl_samples folder.
The commands for executing the samples differ between the two execution environments, the board and x86 emulation, and are described below.
Run on Board
Copy the entire hpl_samples folder to the development board,
then go to the hpl_samples/script folder and execute the running script provided in the sample folder to see the results of the sample run.
The reference commands for executing the script are as follows:
cd hpl_samples
cd script
# Some of the operators depend on the dsp backend, so the dsp image needs to be refreshed first.
# This step is optional and can be skipped if you do not need dsp execution or are running on x86 emulation.
sh dsp_deploy.sh
# Run sample
cd 01_fft_ifft_transform
sh run_fft_ifft_tranform.sh
Run on x86 Emulation
Go to the hpl_samples/script_x86 folder and execute the running script provided in the sample folder to see the results of the sample run.
The reference commands for executing the script are as follows:
cd hpl_samples
cd script_x86
# Run sample
cd 01_fft_ifft_transform
sh run_fft_ifft_tranform.sh
Note
The Horizon J6 SOC uses the Tensilica Vision Q8 DSP from Cadence, so the dsp operators in the x86 emulation sample rely on a toolchain provided by Cadence.
For the environment configuration, refer to the guidelines in section Install DSP Toolchain And Configure Core.
Both the License and the environment variable XTENSA_ROOT must be configured correctly.
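As a pre-flight check before launching the x86 emulation sample, a small helper can confirm that XTENSA_ROOT is set to a non-empty value. The following is only a sketch: the helper name env_is_set is hypothetical and is not part of UCP or the Cadence toolchain, and it does not validate the License or the contents of the directory.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of UCP): returns true when the environment
// variable `name` exists and holds a non-empty value.
bool env_is_set(const char *name) {
  assert(name != nullptr);
  const char *value = std::getenv(name);
  return value != nullptr && value[0] != '\0';
}
```

A launcher could call env_is_set("XTENSA_ROOT") and print a hint pointing to section Install DSP Toolchain And Configure Core when it returns false.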
Output Description
Taking the sample running on x86 emulation as an example,
the process log is printed on the console while the sample runs, and the corresponding output files are generated.
The log contains the flow of all the operator calls, and the output is saved in the data folder.
Part of the sample output is as follows:
[user@machine 01_fft_ifft_transform]$ sh run_fft_ifft_tranform.sh
x86 only support direct mode
[UCP]: log level = 3
[UCP]: UCP version = 3.0.3
[VP]: log level = 3
[DNN]: log level = 3
[I][24033][02-29][16:07:46:652][fft_ifft_process.cpp:171][fft_ifft_transform_sample][FFT_IFFT_TRANSFO] FFT IFFT process begin
......
[I][24033][02-29][16:08:25:265][fft_ifft_process.cpp:181][fft_ifft_transform_sample][FFT_IFFT_TRANSFO] FFT IFFT process finish
[C][24070][02-29][16:08:25:272][cmodel_cli.cpp:172][fft_ifft_transform_sample][dsp] [C] DSP ISS exit
The generated files are saved to the hpl_samples/data directory, with the following contents:
user@machine:/ucp_sample/hpl_samples/data# ls
fft1d_input_f32.txt fft1d_output_f32.txt ifft1d_output_f32.txt input_image_f32.jpg output_image_f32.jpg README.md
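One way to sanity-check the generated text files is to compare two of them element by element, for example fft1d_input_f32.txt against ifft1d_output_f32.txt if the inverse transform is expected to reproduce the input. The sketch below assumes that the files contain whitespace-separated floating-point values, which this section does not confirm; the helper names read_floats and max_abs_diff are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads every whitespace-separated float from a text file (assumed format).
// Returns an empty vector when the file cannot be opened.
std::vector<float> read_floats(const std::string &path) {
  std::vector<float> values;
  std::ifstream ifs(path);
  float v = 0.0f;
  while (ifs >> v) {
    values.push_back(v);
  }
  return values;
}

// Largest absolute element-wise difference between two equally sized vectors.
float max_abs_diff(const std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  float worst = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    worst = std::max(worst, std::fabs(a[i] - b[i]));
  }
  return worst;
}
```

An acceptable tolerance depends on the data type and on whether normalization is enabled, so any threshold applied to the result should be chosen for the specific run.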
Usage of HPL Operator
This section shows how to perform a Fast Fourier Transform with a simple call to an HPL-encapsulated operator.
The main steps are data loading, task creation, task commit, waiting for task completion, task destruction and saving the output.
Refer to the corresponding source code and its comments for details.
The sample performs an FFT of the input data using the hbFFT1D operator and is implemented as follows:
#include <cstring>
#include <fstream>
#include <string>
#include "util.h"
#include "log_util.h"
#include "hobot/hpl/hb_fft.h"
// Initialize the input and output memory
hbUCPSysMem src_re_mem, src_im_mem, dst_re_mem, dst_im_mem;
int32_t src_length = 1024 * 4;
hbUCPMallocCached(&src_re_mem, src_length, 0);
hbUCPMallocCached(&src_im_mem, src_length, 0);
hbUCPMallocCached(&dst_re_mem, src_length, 0);
hbUCPMallocCached(&dst_im_mem, src_length, 0);
// Fill the input memory with data
std::string src_path = "./fft_input.txt";
std::ifstream ifs(src_path, std::ios::in | std::ios::binary);
ifs.read(static_cast<char *>(src_re_mem.virAddr), src_length);
ifs.read(static_cast<char *>(src_im_mem.virAddr), src_length);
hbUCPMemFlush(&src_re_mem, HB_SYS_MEM_CACHE_CLEAN);
hbUCPMemFlush(&src_im_mem, HB_SYS_MEM_CACHE_CLEAN);
// Fill the input and output information
hbHPLImaginaryData src, dst;
src.realDataVirAddr = src_re_mem.virAddr;
src.realDataPhyAddr = src_re_mem.phyAddr;
src.imDataVirAddr = src_im_mem.virAddr;
src.imDataPhyAddr = src_im_mem.phyAddr;
src.numDimensionSize = 1;
src.dataType = HB_HPL_DATA_TYPE_I16;
src.imFormat = HB_IM_FORMAT_SEPARATE;
src.dimensionSize[0] = src_length / sizeof(int16_t);
dst.realDataVirAddr = dst_re_mem.virAddr;
dst.realDataPhyAddr = dst_re_mem.phyAddr;
dst.imDataVirAddr = dst_im_mem.virAddr;
dst.imDataPhyAddr = dst_im_mem.phyAddr;
dst.numDimensionSize = 1;
dst.dataType = HB_HPL_DATA_TYPE_I16;
dst.imFormat = HB_IM_FORMAT_SEPARATE;
dst.dimensionSize[0] = src_length / sizeof(int16_t);
// Fill the operator parameter
hbFFTParam param;
param.pSize = HB_HPL_FFT32;
param.normalize = 0;
// Create a task through the operator interface provided by HPL. The task handle can also be set to nullptr,
// in which case the task is executed in synchronous mode
hbUCPTaskHandle_t task_handle{nullptr};
hbFFT1D(&task_handle, // UCP task handle
        &dst,         // output data
        &src,         // input data
        &param        // operator parameter
);
// Set scheduling parameters to adjust the task priority, select the execution device, etc.
hbUCPSchedParam sched_param;
HB_UCP_INITIALIZE_SCHED_PARAM(&sched_param);
sched_param.backend = HB_UCP_DSP_CORE_0; // Specify the execution device ID
sched_param.priority = 0; // Specify task priority
// Commit the task. sched_param can be set to nullptr to use the default scheduling parameters
hbUCPSubmitTask(task_handle, &sched_param);
// Wait for the task to complete. The second argument is the timeout; a value of 0 means wait indefinitely
hbUCPWaitTaskDone(task_handle, 0);
// Release the task handle
hbUCPReleaseTask(task_handle);
// Release the memory resource
hbUCPFree(&src_re_mem);
hbUCPFree(&src_im_mem);
hbUCPFree(&dst_re_mem);
hbUCPFree(&dst_im_mem);
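To cross-check small transforms on the host, a naive reference DFT can be handy. The sketch below is not the HPL implementation: it is a plain O(N^2) textbook DFT in double precision, it only mirrors the separate real/imaginary layout used above, and the names SplitComplex and naive_dft are hypothetical. Scaling and fixed-point behavior of hbFFT1D (for example the normalize parameter) may differ from this reference.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Complex signal stored as two separate arrays (real part, imaginary part).
struct SplitComplex {
  std::vector<double> re;
  std::vector<double> im;
};

// Naive O(N^2) discrete Fourier transform.
// inverse == false: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N)
// inverse == true : x[k] = (1/N) * sum_t X[t] * exp(+2*pi*i*k*t/N)
SplitComplex naive_dft(const SplitComplex &in, bool inverse) {
  assert(in.re.size() == in.im.size());
  const std::size_t n = in.re.size();
  const double pi = std::acos(-1.0);
  const double sign = inverse ? 1.0 : -1.0;
  SplitComplex out{std::vector<double>(n, 0.0), std::vector<double>(n, 0.0)};
  for (std::size_t k = 0; k < n; ++k) {
    for (std::size_t t = 0; t < n; ++t) {
      // Reduce k*t modulo n first to keep the angle small and accurate.
      const double angle = sign * 2.0 * pi *
                           static_cast<double>((k * t) % n) /
                           static_cast<double>(n);
      const double c = std::cos(angle);
      const double s = std::sin(angle);
      // (re + i*im) * (c + i*s)
      out.re[k] += in.re[t] * c - in.im[t] * s;
      out.im[k] += in.re[t] * s + in.im[t] * c;
    }
    if (inverse) {
      out.re[k] /= static_cast<double>(n);
      out.im[k] /= static_cast<double>(n);
    }
  }
  return out;
}
```

Calling naive_dft(x, false) followed by naive_dft(result, true) should return the original signal up to floating-point rounding, which gives a simple baseline for comparing against the sample's output on short inputs.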