The hrt_model_exec Tool Introduction

Tool Introduction

hrt_model_exec is a model execution tool that evaluates a model's inference performance and retrieves model information directly on the development board.

On the one hand, it gives the user a realistic view of the model's actual performance; on the other hand, it shows the upper bound of the speed the model can achieve, which is useful information for application tuning.

The hrt_model_exec tool source code is located in the samples/ucp_tutorial/tools/hrt_model_exec path of the horizon_j6_open_explorer release. The directory structure is as follows:

```
├── include          # Header files
├── src              # Source code
├── build.sh         # Compile script
├── build_x86.sh     # Compile to produce x86 tools
├── build_aarch64.sh # Compile to produce aarch64 tools
├── CMakeLists.txt
├── README.md
├── script           # Scripts for aarch64
└── script_x86       # Scripts for x86
```

hrt_model_exec provides three types of functions: model inference (infer), model performance analysis (perf), and viewing model information (model_info), as shown in the following table:

| No. | Subcommand | Description |
| --- | --- | --- |
| 1 | model_info | Get model information, such as model input and output information. |
| 2 | infer | Perform model inference and get model inference results. |
| 3 | perf | Perform model performance analysis and obtain performance analysis results. |

The version number of the tool's dnn prediction library can be viewed with the -v or --version option:

```
hrt_model_exec -v
hrt_model_exec --version
```

Parameters Description

| Parameter | Type | Description |
| --- | --- | --- |
| model_file | string | Model file path; multiple paths can be separated by commas. |
| model_name | string | Specify the name of a model in the model file. |
| core_id | int | Specify the running core; 0 means any core, 1 means core 0; default 0. |
| input_file | string | Model input information. An image-type input must have one of the following file name suffixes: PNG / JPG / JPEG / png / jpg / jpeg; a feature-type input suffix must be one of bin / txt. Multiple inputs are separated by commas, e.g. xxx.jpg,input.txt. |
| input_img_properties | string | Color space information of the model image input. Each image-type input in input_file needs to specify a Y/UV type; the color spaces are separated by commas, e.g. Y,UV. |
| input_valid_shape | string | Model dynamic validShape input information. If the model input attribute validShape contains -1, the -1 part needs to be completed; multiple validShapes are separated by semicolons. For example: --input_valid_shape="1,376,376,1;1,188,188,2". |
| input_stride | string | Model dynamic stride input information. If the model input attribute stride contains -1, the -1 part needs to be completed; multiple strides are separated by semicolons. For example: --input_stride="144384,384,1,1;72192,384,2,1". |
| roi_infer | bool | Enables resizer model inference. If the model has an input from the resizer input source, it needs to be set to true, and the input_file and roi parameters corresponding to that input source must be configured. |
| roi | string | Specify the ROI regions required for resizer model inference; multiple ROIs are separated by semicolons. For example: --roi="2,4,123,125;6,8,111,113". |
| frame_count | int | The number of frames to run the model (default 200 in perf mode, 1 in infer mode). |
| dump_intermediate | string | Dump the input and output of each model layer.<br>• dump_intermediate=0: the dump function is off (default).<br>• dump_intermediate=1: the input and output data of each node layer are saved as bin; the node inputs and outputs are stride data.<br>• dump_intermediate=2: the input and output data of each node layer are saved as bin and txt; the node inputs and outputs are stride data.<br>• dump_intermediate=3: the input and output data of each node layer are saved as bin and txt; the node inputs and outputs are valid data. |
| enable_dump | bool | Enables dumping model input and output; defaults to false. |
| dump_precision | int | Controls the number of decimal places of float-type data output in txt format; default is 9. |
| dequantize_process | bool | Dequantizes the model output; effective when enable_dump is true; default false. |
| remove_padding_process | bool | Removes padding from the model output; effective when enable_dump is true; default false. |
| dump_format | string | Format of the dumped model input and output; optional values bin or txt, default bin. |
| dump_txt_axis | int | Controls line-wrapping rules for txt-format input and output. |
| enable_cls_post_process | bool | Enables classification post-processing; defaults to false. Used when the subcommand is infer. Currently only supports post-processing of the PTQ classification model and printing of classification results. |
| perf_time | int | Execution model runtime, in minutes. |
| thread_num | int | Number of threads (parallelism); the value indicates how many tasks are processed in parallel at most.<br>• When testing latency, set the value to 1 to avoid resource preemption and get a more accurate latency.<br>• When testing throughput, it is recommended to set the value above 2 (the number of BPU cores) and adjust the thread count so that BPU utilization is as high as possible, making the throughput test more accurate. |
| profile_path | string | Statistics log generation path; the run generates profiler.log and profiler.csv for analyzing op time and scheduling time. Generally, just set --profile_path=".", which means the log files are generated in the current directory. |
| dump_path | string | Path for the dumped model input and output; effective when enable_dump or dump_intermediate is set. |

After the profile_path parameter is set and the tool runs normally, profiler.log and profiler.csv files are generated. The files include the following parameters:

- perf_result: Records perf results.

| Parameter | Description |
| --- | --- |
| FPS | Frames processed per second. |
| average_latency | The average time it takes to run one frame. |

- running_condition: Operating environment information.

| Parameter | Description |
| --- | --- |
| core_id | The BPU core set for the program run. |
| frame_count | The total number of frames the program runs. |
| model_name | The name of the evaluated model. |
| run_time | Program running time. |
| thread_num | The number of threads the program runs with. |

- model_latency: Model node time-consumption statistics.

| Parameter | Description |
| --- | --- |
| Node-pad | Time spent padding the model input. |
| Node-NodeIdx-NodeType-NodeName | Time-consumption information of model nodes. Note: NodeIdx is the sequence number in the model node topology, NodeType is the specific node type (such as Dequantize), and NodeName is the specific node name. |

- processor_latency: Model processor time-consumption statistics.

| Parameter | Description |
| --- | --- |
| BPU_inference_time_cost | BPU processor inference time per frame. |
| CPU_inference_time_cost | CPU processor inference time per frame. |

- task_latency: Model task time-consumption statistics.

| Parameter | Description |
| --- | --- |
| TaskRunningTime | The actual running time of the task, including the time consumed by the UCP framework. |
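
The profiler.csv file can be post-processed with ordinary CSV tooling to find the most expensive entries. The sketch below is a minimal illustration using Python's standard csv module; the sample rows and the name/time_ms column names are hypothetical stand-ins, since the exact profiler.csv schema depends on the model and tool version:

```python
import csv
import io

# Hypothetical sample rows; the real profiler.csv layout may differ.
sample = """name,time_ms
Node-0-Quantize-input_quant,0.12
Node-1-BPU-backbone,14.80
Node-2-Dequantize-output_dequant,0.35
"""

def slowest_entries(csv_text, top_n=2):
    """Return the top_n (name, time_ms) pairs sorted by descending time."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r["time_ms"]), reverse=True)
    return [(r["name"], float(r["time_ms"])) for r in rows[:top_n]]

print(slowest_entries(sample))
```

Sorting node entries by time this way quickly points at the layers worth optimizing.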

    Usage Instructions

This tool provides three types of functions: model information acquisition, single-frame inference, and multi-frame performance evaluation.

    Run hrt_model_exec, hrt_model_exec -h, or hrt_model_exec --help for tool usage details, as shown in the following:

```
Usage: hrt_model_exec [Option...] [Parameter]

[Option]              [instruction]
---------------------------------------------------------------------------------------------------------------
  -h  --help          Display this information
  -v  --version       Display this version

[Option]                     [Parameter]
---------------------------------------------------------------------------------------------------------------
  --model_file               [string]: Model file paths, separate by comma, each represents one model file path.
  --model_name               [string]: Model name. When model_file has one more model and Subcommand is infer
                                       or perf, "model_name" must be specified!
  --core_id                  [int]   : core id, 0 for any core, 1 for core 0, 2 for core 1, 3 for core 2,
                                       4 for core 3, default is 0. Please confirm the number of bpu cores on
                                       the board before setting up.
  --input_file               [string]: Input file paths, separate by comma, each represents one input.
                                       The extension of files should be one of
                                       [jpg, JPG, jpeg, JPEG, png, PNG, bin, txt].
                                       bin for binary such as image data, nv12 or yuv444 etc.
                                       txt for plain data such as image info.
  --roi_infer                [bool]  : flag for roi infer, The default is false.
  --roi                      [string]: roi information. If set roi_infer as resizer, this parameter is
                                       required, roi are separated by semicolons.
                                       For example: --roi="0,0,124,124;1,1,123,123"
  --frame_count              [int]   : frame count for run loop, default 200, valid when perf_time is 0 in
                                       perf mode; default 1 for infer mode.
  --dump_intermediate        [string]: dump intermediate layer input and output. The default is 0.
                                       Subcommand must be infer.
  --enable_dump              [bool]  : flag for dump infer input and output. The default is false.
                                       Subcommand must be infer.
  --dump_precision           [int]   : Output dump precision for float32/float64 in txt file.
                                       Default is 9 decimal places.
  --dequantize_process       [bool]  : dequantize the model infer output. The default is false.
                                       Subcommand must be infer, enable_dump set as true.
  --remove_padding_process   [bool]  : remove padding of the model infer output. The default is false.
                                       Subcommand must be infer, enable_dump set as true.
  --dump_format              [string]: output dump format, only support [bin, txt]. The default is bin.
                                       Subcommand must be infer.
  --dump_txt_axis            [int]   : The txt file of dump is expanded according to the specified axis;
                                       the default is -1, which means there is only one data per line.
                                       Subcommand must be perf, dump_format must be txt.
                                       range: [0, tensor_dimension].
  --enable_cls_post_process  [bool]  : flag for classfication post process, only for ptq model now.
                                       Subcommand must be infer.
  --perf_time                [int]   : minute, perf time for run loop, default 0. Subcommand must be perf.
  --thread_num               [int]   : thread num for run loop, thread_num range: [1,8], if thread_num > 8,
                                       set thread_num = 8. Subcommand must be perf.
  --profile_path             [string]: profile log and csv files path, set to get detail information of
                                       model execution.
  --dump_path                [string]: dump file path, --enable_dump or --dump_intermediate will dump model
                                       nodes inputs and outputs files.
  --input_img_properties     [string]: Specify the color space of the image type input. Each image needs to
                                       specify the color space, separated by commas. The supported color
                                       spaces are [Y, UV]. The NV12 type is only used for compatible PYM
                                       and Resizer models.
  --input_valid_shape        [string]: Complete the validshape of the model input, allowing only the dynamic
                                       part to change. Provide two ways to set:
                                       1. This only needs to be set when the validShape of the model input
                                          is dynamic.
                                       2. Set for all inputs.
                                       Different inputs are separated by semicolons, and different dimensions
                                       are separated by commas.
                                       For example: --input_valid_shape="1,376,376,1;1,188,188,2".
  --input_stride             [string]: Complete the stride of the model input, allowing only the dynamic part
                                       to change. Provide two ways to set:
                                       1. This only needs to be set when the stride of model input is dynamic.
                                       2. Set for all inputs.
                                       Different inputs are separated by semicolons, and different dimensions
                                       are separated by commas.
                                       For example: --input_stride="144384,384,1,1;72192,384,2,1".

[Examples]
---------------------------------------------------------------------------------------------------------------
  hrt_model_exec model_info   | hrt_model_exec infer          | hrt_model_exec perf
    --model_file              |   --model_file                |   --model_file
    --model_name              |   --model_name                |   --model_name
                              |   --core_id                   |   --core_id
                              |   --input_file                |   --input_file
                              |   --input_img_properties      |   --input_img_properties
                              |   --input_valid_shape         |   --input_valid_shape
                              |   --input_stride              |   --input_stride
                              |   --roi_infer                 |   --roi_infer
                              |   --roi                       |   --roi
                              |   --frame_count               |   --frame_count
                              |   --dump_intermediate         |   --profile_path
                              |   --enable_dump               |   --perf_time
                              |   --dump_precision            |   --thread_num
                              |   --dequantize_process        |
                              |   --dump_path                 |
                              |   --remove_padding_process    |
                              |   --dump_format               |
                              |   --dump_txt_axis             |
                              |   --enable_cls_post_process   |
```

    model_info

    Overview

This subcommand is used to get model information, supporting both QAT and PTQ models. It is used together with model_file to get detailed information about the model, including the model input and output information hbDNNTensorProperties.

If model_name is not specified, information for all models in the model file is output; if model_name is specified, only the information of the corresponding model is output.

    Example

1. Single model

```
hrt_model_exec model_info --model_file=xxx.hbm
```

Example output:

```
../aarch64/bin/hrt_model_exec model_info --model_file=resnet50_224x224_featuremap.hbm
I0000 00:00:00.000000 1634 vlog_is_on.cc:197] RAW: Set VLOG level for "*" to 3
Load model to DDR cost 6423.24ms.
This model file has 1 model:
[resnet50_224x224_featuremap]
---------------------------------------------------------------------
[model name]: resnet50_224x224_featuremap
[model desc]: {"BUILDER_VERSION": "3.0.3", "HBDK_VERSION": "4.0.12.post0.dev202312251146+8d9d31b", "HBDK_RUNTIME_VERSION": null, "HORIZON_NN_VERSION": "0.22.0.post0.dev202312172213+3c57b998554cf249fa033311501eead62c927392", "CAFFE_MODEL": null, "PROTOTXT": null, "ONNX_MODEL": "/open_explorer/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/resnet50/resnet50.onnx", "MARCH": "nash-e", "LAYER_OUT_DUMP": "False", "LOG_LEVEL": null, "WORKING_DIR": "/open_explorer/samples/ai_toolchain/horizon_model_convert_sample/03_classification/03_resnet50/model_output", "MODEL_PREFIX": "resnet50_224x224_featuremap", "OUTPUT_NODES": "", "REMOVE_NODE_TYPE": "", "REMOVE_NODE_NAME": "", "DEBUG_MODE": null, "NODE_INFO": "{}", "INPUT_NAMES": "input", "INPUT_SPACE_AND_RANGE": "regular", "INPUT_TYPE_RT": "featuremap", "INPUT_TYPE_TRAIN": "featuremap", "INPUT_LAYOUT_TRAIN": "NCHW", "INPUT_LAYOUT_RT": "", "NORM_TYPE": "no_preprocess", "MEAN_VALUE": "None", "SCALE_VALUE": "None", "INPUT_SHAPE": "1x3x224x224", "INPUT_BATCH": "", "SEPARATE_BATCH": "False", "CUSTOM_OP_METHOD": null, "CUSTOM_OP_DIR": null, "CUSTOM_OP_REGISTER_FILES": "", "OPTIMIZATION": "", "CALI_TYPE": "default", "CALI_DIR": "/open_explorer/samples/ai_toolchain/horizon_model_convert_sample/03_classification/03_resnet50/calibration_data_bgr", "CAL_DATA_TYPE": "float32", "PER_CHANNEL": "False", "MAX_PERCENTILE": "None", "RUN_ON_CPU": "", "RUN_ON_BPU": "", "ADVICE": 0, "DEBUG": "True", "OPTIMIZATION_LEVEL": "O2", "COMPILE_MODE": "latency", "CORE_NUM": 1, "MAX_TIME_PER_FC": 0, "BALANCE_FACTOR": null, "ABILITY_ENTRY": null, "INPUT_SOURCE": {"input": "ddr"}}
input[0]:
  name: input
  input source: HB_DNN_INPUT_FROM_DDR
  valid shape: (1,3,224,224,)
  aligned shape: (1,3,224,224,)
  aligned byte size: 602112
  tensor type: HB_DNN_TENSOR_TYPE_F32
  tensor layout: HB_DNN_LAYOUT_NONE
  quanti type: NONE
  stride: (602112,200704,896,4,)
output[0]:
  name: output
  valid shape: (1,1000,)
  aligned shape: (1,1000,)
  aligned byte size: 4096
  tensor type: HB_DNN_TENSOR_TYPE_F32
  tensor layout: HB_DNN_LAYOUT_NONE
  quanti type: NONE
  stride: (4000,4,)
---------------------------------------------------------------------
```

2. Multi-model (output all model information)

```
hrt_model_exec model_info --model_file=xxx.hbm,xxx.hbm
```

3. Multi-model, pack model (output specified model information)

```
hrt_model_exec model_info --model_file=xxx.hbm --model_name=xx
```

    infer

    Overview

This subcommand is used for model inference; the input images are defined by the user and one frame is inferred. It should be used together with input_file to specify the input image paths; the tool resizes the images according to the model information and organizes the model input accordingly.

    The program runs a single frame of data in a single thread and outputs the time of the model execution.

    Example

1. Single model

```
hrt_model_exec infer --model_file=xxx.hbm --input_file=xxx.bin
```

Example output:

```
../aarch64/bin/hrt_model_exec infer --model_file=resnet50_224x224_featuremap.hbm --model_name= --input_file=data.bin
I0000 00:00:00.000000 1665 vlog_is_on.cc:197] RAW: Set VLOG level for "*" to 3
Load model to DDR cost 5408.48ms.
I0401 21:03:44.551949 1665 function_util.cpp:117] get model handle success
I0401 21:03:44.591531 1665 function_util.cpp:123] get model input count success
I0401 21:03:44.592371 1665 function_util.cpp:129] get model output count success
I0401 21:03:44.595619 1665 function_util.cpp:155] prepare output tensor success
file length: 602112
I0401 21:03:44.659796 1665 function_util.cpp:168] read file success!
I0401 21:03:44.664801 1665 function_util.cpp:193] Create task success
I0401 21:03:55.158859 1665 function_util.cpp:206] task done
---------------------Frame 0 begin---------------------
Infer time: 10498.3 ms
---------------------Frame 0 end---------------------
```

2. Multi-model

```
hrt_model_exec infer --model_file=xxx.hbm,xxx.hbm --model_name=xx --input_file=xxx.jpg
```

3. Resizer model

    The model has three inputs, and the input source order is [ddr, resizer, resizer].

Infer data from two frames. Suppose the input of the first frame is [xx0.bin, xx1.jpg, xx2.jpg] with roi [2,4,123,125;6,8,111,113], and the input of the second frame is [xx3.bin, xx4.jpg, xx5.jpg] with roi [27,46,143,195;16,28,131,183]. The inference command is as follows:

```
hrt_model_exec infer --roi_infer=true --model_file=xxx.hbm --input_file="xx0.bin,xx1.jpg,xx2.jpg,xx3.bin,xx4.jpg,xx5.jpg" --roi="2,4,123,125;6,8,111,113;27,46,143,195;16,28,131,183"
```
    Note

Use commas to separate the inputs of multiple frames, and semicolons to separate the ROIs.
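
The comma/semicolon convention above can be error-prone when assembling commands for many frames. A small sketch that builds the --input_file and --roi argument strings from per-frame lists; build_infer_args is a hypothetical helper, not part of the tool:

```python
def build_infer_args(frames):
    """frames: list of (input_files, rois) per frame, where input_files is a
    list of file paths and rois is a list of "x1,y1,x2,y2" strings.
    Returns the --input_file and --roi argument values."""
    # All frames' inputs joined by commas, in frame order.
    input_file = ",".join(f for files, _ in frames for f in files)
    # All frames' ROIs joined by semicolons, in frame order.
    roi = ";".join(r for _, rois in frames for r in rois)
    return input_file, roi

frames = [
    (["xx0.bin", "xx1.jpg", "xx2.jpg"], ["2,4,123,125", "6,8,111,113"]),
    (["xx3.bin", "xx4.jpg", "xx5.jpg"], ["27,46,143,195", "16,28,131,183"]),
]
input_file, roi = build_infer_args(frames)
print(f'--input_file="{input_file}" --roi="{roi}"')
```

For the two-frame resizer example above, this reproduces exactly the argument strings shown in the command.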

    Optional Parameters

| Parameter | Description |
| --- | --- |
| core_id | Specifies the core ID for model inference. |
| input_img_properties | Color space information of the model image input. |
| input_valid_shape | Model dynamic validShape input information. |
| input_stride | Model dynamic stride input information. |
| roi_infer | Enables resizer model inference. |
| roi | Effective when roi_infer is true; sets the ROI regions required for resizer model inference. |
| frame_count | Sets the number of frames to run infer (repeated inference of a single frame); can be used together with enable_dump to verify output consistency; defaults to 1. |
| dump_intermediate | Dumps the input and output data of each model layer; default 0. |
| enable_dump | Dumps the input and output data of the model; defaults to false. |
| dump_precision | Controls the number of decimal places of float data output in txt format; default 9. |
| dequantize_process | Dequantizes the model output; effective when enable_dump is true; default false. |
| remove_padding_process | Removes padding from the model output; effective when enable_dump is true; default false. |
| dump_format | Format of the dumped model output file; optional values bin or txt; default bin. |
| dump_txt_axis | Line-wrapping rule for the txt-format output of the dump. If the output dimension is n, the parameter range is [0, n]; defaults to -1, which means one value per row. |
| enable_cls_post_process | Enables classification post-processing; currently only supports the PTQ classification model; defaults to false. |
| dump_path | Path for the dumped model input and output; effective when enable_dump or dump_intermediate is set. |

    perf

    Overview

This subcommand is used to test model performance.

In this mode, the user does not need to provide input data; the program automatically constructs input tensors according to the model and fills them with random numbers.

    By default, the program runs 200 frames of data in a single thread. When perf_time is specified, frame_count is disabled, and the program will run for the specified period of time and then exit.

The tool outputs the model's latency and frame rate. The program prints performance information (max, min, and average latency) every 200 frames; if fewer than 200 frames are run, it prints once before the program ends.

Finally, the program outputs run-related data, including the number of threads, the number of frames, the total model inference time, the average inference latency, and the frame rate.
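
As an illustration of what these summary numbers mean, the sketch below recomputes them from a list of per-frame latencies; perf_summary is a hypothetical helper that mirrors, but is not, the tool's own bookkeeping, and the FPS line assumes ideal scaling across threads:

```python
def perf_summary(latencies_ms, thread_num=1):
    """Compute perf-style summary statistics from per-frame latencies (ms)."""
    frame_count = len(latencies_ms)
    total = sum(latencies_ms)
    # With thread_num parallel workers, wall-clock time is roughly
    # total / thread_num, so throughput scales with the thread count.
    fps = 1000.0 * frame_count / (total / thread_num)
    return {
        "frame_count": frame_count,
        "max_latency_ms": max(latencies_ms),
        "min_latency_ms": min(latencies_ms),
        "average_latency_ms": total / frame_count,
        "fps": fps,
    }

stats = perf_summary([15.2, 15.3, 15.8, 15.2, 15.5])
print(stats)
```

With one thread, FPS is simply 1000 divided by the average latency, which is why the perf example below reports roughly matching latency and frame-rate figures.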

    Example

1. Single model

```
hrt_model_exec perf --model_file=xxx.hbm
```

Example output:

```
../aarch64/bin/hrt_model_exec perf --model_file=yolov5x_672x672_nv12.hbm --model_name= --frame_count=100
[BPU][[BPU_MONITOR]][INFO]BPULib verison(2, 0, 0)[]!
[DNN]: 3.0.7_(4.0.20 HBRT)
Load model to DDR cost 3107.75ms.
Frame count: 100, Thread Average: 15.260320 ms, thread max latency: 15.845000 ms, thread min latency: 15.195000 ms, FPS: 63.908421

Running condition:
  Thread number is: 1
  Frame count is: 100
  Program run time: 1565.202000 ms
Perf result:
  Frame totally latency is: 1526.031982 ms
  Average latency is: 15.260320 ms
  Frame rate is: 63.889517 FPS
```

2. Multi-model

```
hrt_model_exec perf --model_file=xxx.hbm,xxx.hbm --model_name=xxx
```

3. Resizer model

    The model has three inputs, and the input source order is [ddr, resizer, resizer].

Infer data from two frames at a time. Suppose the input of the first frame is [xx0.bin, xx1.jpg, xx2.jpg] with roi [2,4,123,125;6,8,111,113], and the input of the second frame is [xx3.bin, xx4.jpg, xx5.jpg] with roi [27,46,143,195;16,28,131,183]. The perf command is as follows:

```
hrt_model_exec perf --roi_infer=true --model_file=xxx.hbm --input_file="xx0.bin,xx1.jpg,xx2.jpg,xx3.bin,xx4.jpg,xx5.jpg" --roi="2,4,123,125;6,8,111,113;27,46,143,195;16,28,131,183"
```
    Note

Use commas to separate the inputs of multiple frames, and semicolons to separate the ROIs.

    Optional Parameters

| Parameter | Description |
| --- | --- |
| core_id | Specifies the core ID for model inference. |
| input_file | Model input information; multiple inputs can be separated by commas. |
| input_img_properties | Color space information of the model image input. |
| input_valid_shape | Model dynamic validShape input information. |
| input_stride | Model dynamic stride input information. |
| roi_infer | Enables resizer model inference. If the model contains a resizer input source, set it to true; default is false. |
| roi | Effective when roi_infer is true; sets the ROI regions required for resizer model inference, separated by semicolons. |
| frame_count | Sets the number of frames for perf to run; takes effect when perf_time is 0; default 200. |
| dump_intermediate | Dumps the input and output data of each model layer; default 0. |
| perf_time | Sets the perf runtime in minutes; default 0. |
| thread_num | Sets the number of threads to run, range [1, 8]; default 1; values above 8 are treated as 8. |
| profile_path | Statistics log generation path; the run generates profiler.log and profiler.csv for analyzing op time and scheduling time. |

    Multi-input Model Description

The infer subcommand supports inference on multi-input models, accepting image, binary file, and text file inputs, with the inputs separated by commas. The model input information can be viewed via model_info.

    Example:

```
hrt_model_exec infer --model_file=xxx.hbm --input_file=xxx.jpg,input.txt
```

    Dynamic input instructions

    If the model input is dynamic, you need to use the input_valid_shape and input_stride parameters to complete the dynamic information. You can choose to specify the parameters in the following two ways:

    • Only validShape or stride information is given for dynamic inputs.
    • validShape or stride information is given for all inputs, and the information for non-dynamic inputs must be consistent with the model information.

    Taking the model in the Dynamic Input Introduction section as an example, you can run the model with the following command:

```
# Only give the information for the dynamic inputs
hrt_model_exec infer --model_file=xxx.hbm --input_file="input_y.bin,input_uv.bin,input_roi.bin" --input_valid_shape="1,220,220,1;1,110,110,2" --input_stride="49280,224,1,1;24640,224,2,1"

# Give the information for all inputs
hrt_model_exec infer --model_file=xxx.hbm --input_file="input_y.bin,input_uv.bin,input_roi.bin" --input_valid_shape="1,220,220,1;1,110,110,2;1,4" --input_stride="49280,224,1,1;24640,224,2,1;16,4"
```

    Image type input instructions

    When input_file is given an image input, you need to use the input_img_properties parameter to specify which color space of the image you want to use as input to the model. Currently, only Y and UV color spaces are supported.

```
hrt_model_exec infer --model_file=xxx.hbm --input_file="img.jpg,img.jpg,input_roi.bin" --input_img_properties="Y,UV"
```

    Tool operation instructions

    Build

A pre-configured compilation script, build.sh, is provided in the ucp_tutorial/tools/hrt_model_exec directory; the options -a x86 and -a aarch64 select the two compilation modes. You can compile by running this script with the desired option. The directory also contains two compilation scripts, build_aarch64.sh and build_x86.sh, one per compilation option; compiling with either of these is equivalent to running build.sh with the corresponding option.

```
# Build board-side hrt_model_exec tools
bash -ex build_aarch64.sh
# Build x86-side hrt_model_exec tools
bash -ex build_x86.sh
```

    Execute

After building the board-side hrt_model_exec tools, the output_shared_J6_aarch64 folder is generated. To use the tool, copy the folder to the board environment and execute output_shared_J6_aarch64/script/run_hrt_model_exec.sh.

After building the x86-side hrt_model_exec tools, the output_shared_J6_x86 folder is generated. You can use the tool in an x86 environment by executing output_shared_J6_x86/script_x86/run_hrt_model_exec.sh.

The run_hrt_model_exec.sh script is divided into two parts: setting environment variables, then getting model information and running the model.

```
# Set environment variables
# arch represents the architecture type, aarch64 or x86
arch=aarch64
bin=../$arch/bin/hrt_model_exec
lib=../$arch/lib/
export LD_LIBRARY_PATH=${lib}:${LD_LIBRARY_PATH}

# Get model information, infer the model and get model performance
${bin} model_info --model_file=xxx.hbm
${bin} infer --model_file=xxx.hbm --input_file=xxx.bin
${bin} perf --model_file=xxx.hbm --frame_count=200
```
    Note

    Before running, you need to modify the corresponding parameters of run_hrt_model_exec.sh to ensure that the model and input files are correct. You can also use other parameters flexibly to use more functions.

    FAQ

    How are Latency and FPS data calculated?

Latency refers to the average time spent by a single-process inference run. It focuses on the average time it takes to infer one frame when resources are sufficient, which on the board corresponds to single-core, single-thread execution. The pseudo code of the statistical method is as follows:

```cpp
// Load model and prepare input and output tensors
...
// Loop inference and measure latency
{
  int32_t const loop_num{1000};
  start = std::chrono::steady_clock::now();
  for (int32_t i = 0; i < loop_num; i++) {
    hbUCPSchedParam sched_param{};
    HB_UCP_INITIALIZE_SCHED_PARAM(&sched_param);
    // create task
    hbDNNInferV2(&task_handle, output_tensor, input_tensor, dnn_handle);
    // submit task
    hbUCPSubmitTask(task_handle, &sched_param);
    // wait task done
    hbUCPWaitTaskDone(task_handle, 0);
    // release task handle
    hbUCPReleaseTask(task_handle);
    task_handle = nullptr;
  }
  end = std::chrono::steady_clock::now();
  latency = (end - start) / loop_num;
}
// Release tensors and model
...
```

FPS refers to the average number of frames per second when model inference is performed by multiple processes at the same time; it focuses on the model's throughput when resources are fully utilized.

When running on the board, this corresponds to multi-core, multi-threaded execution. The statistical method is to launch multiple threads that perform model inference at the same time and compute the average total number of frames inferred per second.
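
The multi-threaded FPS measurement can be sketched as follows; measure_fps is a hypothetical helper, and the sleep call stands in for a real inference call (sleeping releases the GIL, so the threads genuinely overlap):

```python
import threading
import time

def measure_fps(infer_one_frame, thread_num, frames_per_thread):
    """Run infer_one_frame concurrently on thread_num threads and
    return the overall frames-per-second across all threads."""
    def worker():
        for _ in range(frames_per_thread):
            infer_one_frame()

    threads = [threading.Thread(target=worker) for _ in range(thread_num)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    # Total frames completed divided by wall-clock time.
    return thread_num * frames_per_thread / elapsed

# 10 ms dummy "inference" per frame, 4 threads x 10 frames each.
fps = measure_fps(lambda: time.sleep(0.01), thread_num=4, frames_per_thread=10)
print(f"{fps:.1f} FPS")
```

Because the frames on different threads overlap in time, the measured FPS exceeds what a single thread could achieve at the same per-frame latency.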

    Why is the FPS estimated by Latency inconsistent with the FPS measured by the tool?

Latency and FPS differ in their measurement scenarios: latency is single-process (single-core, single-thread) inference, while FPS is multi-process (multi-core, multi-thread) inference, so they are computed differently. If the number of processes (threads) is set to 1 when measuring FPS, the FPS estimated from latency is consistent with the measured value.