hrt_model_exec is a model execution tool that evaluates a model's inference performance and retrieves model information directly on the development board.
On the one hand, it gives the user a realistic view of the model's actual performance; on the other hand, it reveals the maximum speed the model can achieve, which is useful information for application tuning.
The source code of the hrt_model_exec tool is located at samples/ucp_tutorial/tools/hrt_model_exec in the horizon_j6_open_explorer release. The structure is as follows:
hrt_model_exec provides three types of functions: model inference (infer), model performance analysis (perf), and viewing model information (model_info), as shown in the following table:
| No. | Subcommand | Description |
|---|---|---|
| 1 | model_info | Get model information, such as model input and output information, etc. |
| 2 | infer | Perform model inference and get model inference results. |
| 3 | perf | Perform model performance analysis and obtain performance analysis results. |
Running the tool with -v or --version prints the version number of the dnn prediction library it uses.
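For example:

```shell
hrt_model_exec -v          # or: hrt_model_exec --version
```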
| Parameter* | Type | Description |
|---|---|---|
| model_file | string | Model file path; multiple paths can be separated by commas. |
| model_name | string | Specifies the name of one model in the model file. |
| core_id | int | Specifies the running core: 0 means any core, 1 means core 0; default 0. |
| input_file | string | Model input information. An image-type input must have one of the file name suffixes PNG/JPG/JPEG/png/jpg/jpeg; a feature-type input must have the suffix bin or txt. Multiple inputs are separated by commas, e.g. xxx.jpg,input.txt. |
| input_img_properties | string | Color space information of the model's image inputs. Each image-type input in input_file needs a Y or UV type, and the entries are separated by commas, e.g. Y,UV. |
| input_valid_shape | string | Model dynamic validShape input information. If the model input attribute validShape contains -1, the -1 parts must be filled in; multiple validShape entries are separated by semicolons, e.g. --input_valid_shape="1,376,376,1;1,188,188,2". |
| input_stride | string | Model dynamic stride input information. If the model input attribute stride contains -1, the -1 parts must be filled in; multiple stride entries are separated by semicolons, e.g. --input_stride="144384,384,1,1;72192,384,2,1". |
| roi_infer | bool | Enables resizer model inference. If the model has an input whose source is the resizer, set this to true and configure the input_file and roi parameters for that input. |
| roi | string | Specifies the roi regions required for resizer model inference; multiple ROIs are separated by semicolons, e.g. --roi="2,4,123,125;6,8,111,113". |
| frame_count | int | Number of frames to run. |
| dump_intermediate | string | Dumps the input and output of each model layer. 0: dump disabled (default). 1: the input and output data of each node are saved as bin files, as stride data. 2: saved as both bin and txt files, as stride data. 3: saved as both bin and txt files, as valid data. |
| enable_dump | bool | Enables dumping of model input and output; defaults to false. |
| dump_precision | int | Number of decimal places for float data in txt-format output; default 9. |
| dequantize_process | bool | Dequantizes the model output; effective when enable_dump is true; default false. |
| remove_padding_process | bool | Removes padding from the model output; effective when enable_dump is true; default false. |
| dump_format | string | File format for dumped model input and output. |
| dump_txt_axis | int | Controls the line-break rule for txt-format input and output. |
| enable_cls_post_process | bool | Enables classification post-processing; defaults to false. Used with the infer subcommand. Currently only supports post-processing of PTQ classification models and printing of classification results. |
| perf_time | int | Duration of the perf run. |
| thread_num | int | Number of threads (parallelism), i.e. how many tasks are processed in parallel at most. When testing latency, set it to 1 to avoid resource contention and obtain an accurate latency. When testing throughput, it is recommended to set it above 2 (the number of BPU cores) and tune the thread count so that BPU utilization is as high as possible, which makes the throughput test more accurate. |
| profile_path | string | Path where the profiling logs are generated. A run produces profiler.log and profiler.csv, which break down op time and scheduling overhead. Usually --profile_path="." is enough, which writes the log files to the current directory. |
| dump_path | string | Path for dumped model input and output; effective when enable_dump or dump_intermediate is set. |
When the profile_path parameter is set and the tool runs normally, profiler.log and profiler.csv files are generated. The files include the following parameters:
| PARAMETER | DESCRIPTIONS |
|---|---|
| FPS | Frames processed per second. |
| average_latency | Average time to run one frame. |
| PARAMETER | DESCRIPTIONS |
|---|---|
| core_id | The BPU core set for the run. |
| frame_count | Total number of frames the program runs. |
| model_name | Name of the evaluated model. |
| run_time | Program running time. |
| thread_num | Number of threads the program runs with. |
| PARAMETER | DESCRIPTIONS |
|---|---|
| Node-pad | Time spent padding the model input. |
| Node-NodeIdx-NodeType-NodeName | Timing information for model nodes. Note: NodeIdx is the node's sequence number in the model topology, NodeType is the node type (e.g. Dequantize), and NodeName is the node name. |
| PARAMETER | DESCRIPTIONS |
|---|---|
| BPU_inference_time_cost | BPU processor time per frame of inference. |
| CPU_inference_time_cost | CPU processor time per frame of inference. |
| PARAMETER | DESCRIPTIONS |
|---|---|
| TaskRunningTime | The actual running time of the task, including time consumed by the UCP framework. |
This tool provides three types of functions: model information acquisition, single-frame inference, and multi-frame performance evaluation.
Run hrt_model_exec, hrt_model_exec -h, or hrt_model_exec --help to see the tool's usage details.
This subcommand is used to get model information and supports both QAT and PTQ models. It is used together with model_file to get detailed information about the model, including the model input and output information hbDNNTensorProperties.
If model_name is not specified, information for all models in the model file is output; if model_name is specified, only the information of that model is output.
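A hypothetical invocation, where model.hbm and my_model are placeholders for your own model file and model name:

```shell
hrt_model_exec model_info --model_file=model.hbm
# restrict the output to one model in a packed model file:
hrt_model_exec model_info --model_file=model.hbm --model_name=my_model
```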
This subcommand performs model inference: the user supplies the input image and one frame is inferred.
It is used together with input_file to specify the input image path; the tool resizes the image according to the model information and assembles the model input.
The program runs a single frame of data in a single thread and outputs the model execution time.
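A minimal sketch of a single-frame inference run, assuming a placeholder model file model.hbm and input image xxx.jpg:

```shell
hrt_model_exec infer --model_file=model.hbm --input_file=xxx.jpg
```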
The model has three inputs, and the input source order is [ddr, resizer, resizer].
Infer data from two frames: suppose the input of the first frame is [xx0.bin, xx1.jpg, xx2.jpg] with roi [2,4,123,125;6,8,111,113], and the input of the second frame is [xx3.bin, xx4.jpg, xx5.jpg] with roi [27,46,143,195;16,28,131,183]. The inference command is as follows:
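A sketch of that command, using the frame inputs and ROIs above (model.hbm is a placeholder model file):

```shell
hrt_model_exec infer --model_file=model.hbm \
    --roi_infer=true \
    --input_file="xx0.bin,xx1.jpg,xx2.jpg,xx3.bin,xx4.jpg,xx5.jpg" \
    --roi="2,4,123,125;6,8,111,113;27,46,143,195;16,28,131,183"
```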
Note that you should use commas to separate the multiple frame inputs, and use semicolons to separate the rois.
| Parameter | Description |
|---|---|
| core_id | Specifies the core ID for model inference. |
| input_img_properties | Color space information of the model image input. |
| input_valid_shape | Model dynamic validShape input information. |
| input_stride | Model dynamic stride input information. |
| roi_infer | Enables resizer model inference. |
| roi | Effective when roi_infer is true. Sets the roi regions required for resizer model inference. |
| frame_count | Sets the number of frames for infer to run. The single frame is inferred repeatedly; can be combined with enable_dump to verify output consistency; defaults to 1. |
| dump_intermediate | Dumps the input and output data of each model layer; default 0. |
| enable_dump | Dumps the input and output data of the model; defaults to false. |
| dump_precision | Number of decimal places for float data in txt-format output; default 9. |
| dequantize_process | Dequantizes the model output; effective when enable_dump is true; default false. |
| remove_padding_process | Removes padding from the model output; effective when enable_dump is true; default false. |
| dump_format | File type of the dumped model output, bin or txt; default bin. |
| dump_txt_axis | Line-wrapping rule for txt-format dump output. If the output dimension is n, the parameter range is [0, n]; defaults to -1, which means one value per row. |
| enable_cls_post_process | Enables classification post-processing; currently only supports PTQ classification models; defaults to false. |
| dump_path | Path for dumped model input and output; effective when enable_dump or dump_intermediate is set. |
This subcommand is used to test model performance.
In this mode, the user does not need to provide input data; the program automatically constructs the input tensors according to the model and fills them with random numbers.
By default, the program runs 200 frames of data in a single thread. When perf_time is specified, frame_count is disabled and the program runs for the specified period of time before exiting.
It outputs the latency and frame rate of the model. The program prints performance information (max, min, and average latency) every 200 frames; if fewer than 200 frames are run, it prints once before the program ends.
Finally, the program outputs the run statistics: number of threads, number of frames, total model inference time, average inference latency, and frame rate.
The model has three inputs, and the input source order is [ddr, resizer, resizer].
Infer data from two frames at a time: suppose the input of the first frame is [xx0.bin, xx1.jpg, xx2.jpg] with roi [2,4,123,125;6,8,111,113], and the input of the second frame is [xx3.bin, xx4.jpg, xx5.jpg] with roi [27,46,143,195;16,28,131,183]. The perf command is as follows:
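A sketch of that command, using the frame inputs and ROIs above (model.hbm is a placeholder model file):

```shell
hrt_model_exec perf --model_file=model.hbm \
    --roi_infer=true \
    --input_file="xx0.bin,xx1.jpg,xx2.jpg,xx3.bin,xx4.jpg,xx5.jpg" \
    --roi="2,4,123,125;6,8,111,113;27,46,143,195;16,28,131,183"
```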
Note that you should use commas to separate the multiple frame inputs, and use semicolons to separate the rois.
| PARAMETER | DESCRIPTIONS |
|---|---|
| core_id | Specifies the core ID for model inference. |
| input_file | Model input information; multiple inputs can be separated by commas. |
| input_img_properties | Color space information of the model image input. |
| input_valid_shape | Model dynamic validShape input information. |
| input_stride | Model dynamic stride input information. |
| roi_infer | Enables resizer model inference. If the model contains a resizer input source, set it to true; default false. |
| roi | Effective when roi_infer is true. Sets the roi regions required for resizer model inference, separated by semicolons. |
| frame_count | Sets the number of frames for the perf run; takes effect when perf_time is 0; default 200. |
| dump_intermediate | Dumps the input and output data of each model layer; default 0. |
| perf_time | Sets the perf runtime in minutes; default 0. |
| thread_num | Sets the number of threads to run, range [1, 8]; default 1. Values above 8 are treated as 8 threads. |
| profile_path | Path where the profiling logs are generated. A run produces profiler.log and profiler.csv, which break down op time and scheduling overhead. |
The infer subcommand supports models with multiple inputs, accepting image, binary-file, and text-file inputs, with the inputs separated by commas.
The model input information can be viewed via model_info.
Example:
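A hypothetical multi-input invocation, mixing an image input with a binary feature input (model.hbm and the input file names are placeholders):

```shell
hrt_model_exec infer --model_file=model.hbm \
    --input_file="xxx.jpg,input.bin"
```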
If the model input is dynamic, you need to use the input_valid_shape and input_stride parameters to complete the dynamic information. You can specify the parameters in either of two ways:
- Give validShape or stride information only for the dynamic inputs.
- Give validShape or stride information for all inputs; the information for non-dynamic inputs must be consistent with the model information.
Taking the model in the Dynamic Input Introduction section as an example, you can run the model with the following command:
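A sketch of such a command, reusing the validShape and stride values shown in the parameter table above (model.hbm and the input file names are placeholders):

```shell
hrt_model_exec infer --model_file=model.hbm \
    --input_file="input0.bin,input1.bin" \
    --input_valid_shape="1,376,376,1;1,188,188,2" \
    --input_stride="144384,384,1,1;72192,384,2,1"
```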
When input_file contains an image input, you need to use the input_img_properties parameter to specify which color space of the image to feed to the model. Currently, only the Y and UV color spaces are supported.
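For instance, a hypothetical model with two image inputs taking the Y plane for the first and the UV plane for the second (all file names are placeholders):

```shell
hrt_model_exec infer --model_file=model.hbm \
    --input_file="frame.jpg,frame.jpg" \
    --input_img_properties="Y,UV"
```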
There is a pre-configured compilation script build.sh in the ucp_tutorial/tools/hrt_model_exec directory; the options -a x86 and -a aarch64 select the two compilation modes. You can compile by running this script with the desired option.
In addition, the directory contains two more compilation scripts, build_aarch64.sh and build_x86.sh, one per compilation option. Compiling with either of these scripts is equivalent to running build.sh with the corresponding option.
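The equivalent invocations, assuming you run them from the release root:

```shell
cd samples/ucp_tutorial/tools/hrt_model_exec
bash build.sh -a aarch64    # board-side build; equivalent to: bash build_aarch64.sh
bash build.sh -a x86        # x86-side build;  equivalent to: bash build_x86.sh
```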
After building board-side hrt_model_exec tools, the output_shared_J6_aarch64 folder will be generated.
You can use this tool by copying the folder to the board environment and executing output_shared_J6_aarch64/script/run_hrt_model_exec.sh.
After building x86-side hrt_model_exec tools, the output_shared_J6_x86 folder will be generated.
You can use this tool in an x86 environment by executing output_shared_J6_x86/script_x86/run_hrt_model_exec.sh.
The run_hrt_model_exec.sh script consists of two parts: setting environment variables, and getting model information and running model inference.
Before running, modify the corresponding parameters in run_hrt_model_exec.sh to ensure the model and input files are correct. You can also combine the other parameters flexibly to access more functions.
Latency refers to the average time spent by single-process model inference. It focuses on the average time to infer one frame when resources are sufficient; on the board this corresponds to single-core, single-thread statistics. The pseudocode of the statistical method is as follows:
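A minimal Python sketch of the statistic, with infer_once standing in for one frame of on-board inference (all names here are illustrative, not part of the tool):

```python
import time

def infer_once():
    """Placeholder for one frame of model inference."""
    time.sleep(0.001)

def measure_latency(frame_count=200):
    # Single core, single thread: run the frames back to back
    # and average the per-frame wall-clock time.
    start = time.perf_counter()
    for _ in range(frame_count):
        infer_once()
    total = time.perf_counter() - start
    return total / frame_count * 1000.0  # average latency in milliseconds
```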
FPS refers to the average number of frames per second inferred by multiple processes at the same time; it focuses on the throughput of the model when resources are fully utilized.
On the board this corresponds to multi-core, multi-thread execution. The statistic is obtained by launching multiple threads that perform model inference simultaneously and computing the average total number of frames inferred per second.
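A companion Python sketch of the throughput statistic, again with infer_once standing in for one frame of inference (illustrative names only):

```python
import threading
import time

def infer_once():
    """Placeholder for one frame of model inference."""
    time.sleep(0.001)

def measure_fps(thread_num=2, frames_per_thread=100):
    # Multiple threads run inference simultaneously; FPS is the total
    # number of inferred frames divided by the elapsed wall-clock time.
    def worker():
        for _ in range(frames_per_thread):
            infer_once()
    threads = [threading.Thread(target=worker) for _ in range(thread_num)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return thread_num * frames_per_thread / elapsed
```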
Latency and FPS differ in their measurement scenarios: latency is measured with single-process (single-core, single-thread) inference, while FPS is measured with multi-process (dual-core, multi-thread) inference, so the calculations differ. If the number of processes (threads) is set to 1 when measuring FPS, the FPS estimated from latency is consistent with the measured FPS.