Modelzoo

Classification

network | float | qat | quantization | dataset | input shape | bpu latency (ms)
MobileNetV1 | 74.12 | 73.92 | 73.61 | ImageNet | 1x3x224x224 | 0.77
MobileNetV2 | 72.65 | 72.51 | 72.11 | ImageNet | 1x3x224x224 | 0.69
ResNet 18 | 72.04 | 72.03 | 72.03 | ImageNet | 1x3x224x224 | 1.53
ResNet 50 | 77.37 | 76.99 | 76.94 | ImageNet | 1x3x224x224 | 3.06
VargNetV2 | 73.94 | 73.56 | 73.64 | ImageNet | 1x3x224x224 | 0.78
EfficientNet-B0 | 74.31 | 74.23 | 74.18 | ImageNet | 1x3x224x224 | 0.91
SwinTransformer | 80.24 | 80.15 | 80.05 | ImageNet | 1x3x224x224 | 14.50
MixVarGENet | 71.33 | 71.23 | 71.04 | ImageNet | 1x3x224x224 | 0.56
VargConvert | 78.98 | 78.92 | 78.89 | ImageNet | 1x3x224x224 | 1.51
EfficieNasNetm | 80.24 | 79.99 | 79.94 | ImageNet | 1x3x280x280 | 2.06
EfficieNasNets | 76.63 | 76.23 | 76.03 | ImageNet | 1x3x300x300 | 4.79
ViT-small | 79.50 | 79.40 | - | ImageNet | 1x3x224x224 | -
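The gap between the float and quantization columns is the accuracy cost of quantization. The sketch below computes that gap for a few rows copied by hand from the classification table, and converts BPU latency into a rough single-input throughput as 1000 / latency. The helper names `quant_drop` and `fps_from_latency` are hypothetical, and the throughput figure is only an estimate under the assumption that nothing but model execution matters: it ignores batching, pre/post-processing and data transfer.

```python
# Rows copied from the Classification table: (network, float, qat, quantization, latency_ms).
# None stands for a "-" entry (value not reported).
ROWS = [
    ("MobileNetV1", 74.12, 73.92, 73.61, 0.77),
    ("ResNet 50", 77.37, 76.99, 76.94, 3.06),
    ("SwinTransformer", 80.24, 80.15, 80.05, 14.50),
    ("ViT-small", 79.50, 79.40, None, None),
]


def quant_drop(float_acc, quant_acc):
    """Points of accuracy lost between the float and the quantized model (hypothetical helper)."""
    if quant_acc is None:
        return None
    return round(float_acc - quant_acc, 2)


def fps_from_latency(latency_ms):
    """Rough inputs-per-second implied by a BPU latency; ignores batching and I/O (assumption)."""
    if latency_ms is None:
        return None
    return round(1000.0 / latency_ms, 1)


if __name__ == "__main__":
    for name, fp, qat, quant, lat in ROWS:
        print(f"{name}: drop={quant_drop(fp, quant)} approx_fps={fps_from_latency(lat)}")
```

For MobileNetV1 this gives a drop of 0.51 points; rows with a "-" entry simply yield None instead of a number.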

Torchvision (floating-point models from the community):

network | float | qat | quantization | dataset | input shape | bpu latency (ms)
ResNet 18 | 69.76 | 69.71 | 69.73 | ImageNet | 1x3x224x224 | 1.59
ResNet 50 | 76.13 | 76.07 | 76.06 | ImageNet | 1x3x224x224 | 3.18
MobileNetV2 | 71.88 | 71.27 | 71.27 | ImageNet | 1x3x224x224 | 0.85

Detection

FCOS

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
FCOS-efficientnet | efficientnetb0 | 36.26 | 35.79 | 35.59 | MS COCO | 1x3x512x512 | 1.35
FCOS-efficientnet | efficientnetb1 | 41.37 | 41.21 | 40.71 | MS COCO | 1x3x640x640 | 2.78
FCOS-efficientnet | efficientnetb2 | 45.35 | 45.10 | 45.00 | MS COCO | 1x3x768x768 | 4.32
FCOS-efficientnet | efficientnetb3 | 48.03 | 47.65 | 47.58 | MS COCO | 1x3x896x896 | 7.14

DETR

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
DETR-resnet50 | resnet50 | 35.70 | 31.42 | 31.31 | MS COCO | 1x3x800x1333 | 41.36
DETR-efficientnetb3 | efficientnetb3 | 37.21 | 35.95 | 35.99 | MS COCO | 1x3x800x1333 | 32.33

FCOS3D

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
FCOS3D-efficientnetb0 | efficientnetb0 | 30.60 | 30.27 | 30.31 | nuscenes | 1x3x512x896 | 3.51

Segmentation

UNet

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
UNet | MobileNetV1 | 68.02 | 67.56 | 67.53 | Cityscapes | 1x3x1024x2048 | 2.10

Deeplab

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
Deeplab | EfficientNet-M0 | 76.30 | 76.22 | 76.12 | Cityscapes | 1x3x1024x2048 | 4.78
Deeplab | EfficientNet-M1 | 77.94 | 77.64 | 77.65 | Cityscapes | 1x3x1024x2048 | 11.23
Deeplab | EfficientNet-M2 | 78.82 | 78.65 | 78.63 | Cityscapes | 1x3x1024x2048 | 17.37

FastScnn

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
FastScnn | EfficientNet-B0lite | 69.97 | 69.90 | 69.88 | Cityscapes | 1x3x1024x2048 | 1.92

Optical Flow

PwcNet

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
PwcNet-lg | PwcNet | 1.4117 | 1.4112 | 1.4075 | FlyingChairs | 1x6x384x512 | 12.62

Lidar

PointPillars

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
PointPillars | SequentialBottleNeck | 77.31 | 76.86 | 76.76 | KITTI3D | 150000x4 | 0.82

CenterPoint

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
CenterPoint | SequentialBottleNeck | 58.32 | 58.11 | 58.14 | nuscenes | 1x5x20x40000, 40000x4 | 23.38

LidarMultiTask

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
LidarMultiTask | MixVarGENet | 58.09 | 57.72 | 57.62 | nuscenes | 1x5x20x40000, 40000x4 | 22.07
Note

For PointPillars, the reported metric is Box3d Moderate.

Lane Detection

GaNet

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
GaNet | MixVarGENet | 79.49 | 78.72 | 78.72 | CuLane | 1x3x320x800 | 1.053

Multiple Object Tracking

Motr

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
Motr | efficientnetb3 | 58.02 | 57.62 | 57.76 | Mot17 | 1x3x800x1422, 1x256x2x128, 1x1x1x256, 1x4x2x128 | 26.40

Binocular depth estimation

StereoNet

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
StereoNet | StereoNeck | 1.1270 | 1.1677 | 1.1685 | SceneFlow | 1x6x540x960 | 28.33
StereoNetPlus | MixVarGENet | 1.1270 | 1.1329 | 1.1351 | SceneFlow | 2x3x544x960 | 6.50

Bev

Bev

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
BevIPM | efficientnetb0 | 30.59 | 30.80 | 30.41 | nuscenes det | 6x3x512x960, 6x128x128x2 | 9.76
BevIPM | efficientnetb0 | 51.47 | 51.41 | 50.98 | nuscenes seg | 6x3x512x960, 6x128x128x2 | 9.76
BevLSS | efficientnetb0 | 30.09 | 30.05 | 30.01 | nuscenes det | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35
BevLSS | efficientnetb0 | 51.78 | 51.47 | 51.46 | nuscenes seg | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35
BevGKT | MixVarGENet | 28.11 | 28.12 | 27.90 | nuscenes det | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07
BevGKT | MixVarGENet | 48.53 | 48.02 | 48.37 | nuscenes seg | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07
BevIPM4D | efficientnetb0 | 37.21 | 37.19 | 37.17 | nuscenes det | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05
BevIPM4D | efficientnetb0 | 52.90 | 53.80 | 53.77 | nuscenes seg | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05
Detr3d | efficientnetb3 | 34.04 | 33.87 | 33.39 | nuscenes det | 6x3x512x1408 | 68.54
PETR | efficientnetb3 | 37.60 | 37.32 | 37.31 | nuscenes det | 6x3x512x1408 | 228.50
Bevformer-tiny | resnet50 | 37.00 | 36.66 | - | nuscenes det | 6x3x480x800 | /
BevCFT | efficientnetb3 | 32.93 | 32.68 | - | nuscenes det | 6x3x512x1408 | /

Keypoint Detection

HeatmapKeypointModel

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
HeatmapKeypointModel | efficientnetb0 | 94.33 | 94.30 | 94.31 | carfusion | 1x3x128x128 | 0.83

Trajectory Prediction

DenseTNT

network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms)
DenseTNT | vectornet | 1.2974 | 1.2989 | 1.3038 | argoverse 1 | 30x9x19x32, 30x11x9x64, 30x1x1x96, 30x2x1x2048, 30x1x1x2048 | 26.45

QCNet

network | backbone | float | qat | quantization | dataset | input shape
qcnet | - | 83.84 | - | - | argoverse 2 | see the list below

The QCNet model takes the following input shapes:

1x30x50, 1x50x30x30, 1x30x1, 1x1x30x1, 1x1x30x1, 1x1x30x1, 1x1x30x1, 1x1x30x80, 1x1x30x80, 1x1x30x80, 1x1x30x10, 1x1x30x10, 1x1x30x10, 1x1x30x10, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x30x29x128, 1x30x10x128, 1x30x10x128, 1x80, 1x80, 1x1x80x80, 1x1x80x80, 1x1x80x80, 1x1x80x50, 1x1x80x50, 1x1x80x50, 1x80x50, 1x80x50, 1x80x50, 1x80x50, 1x30x30, 1x30x1, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x1x30x80, 1x1x30x80, 1x1x30x80, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x80x80.
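Multi-input models such as QCNet, Motr and the Bev models list their inputs as comma-separated shape strings whose dimensions are joined by `x`. A minimal sketch, assuming only that textual format, that turns such a string into integer tuples and totals the number of values fed to the model; `parse_shapes` and `total_elements` are hypothetical helpers, not part of any library.

```python
import math


def parse_shapes(spec):
    """Split a comma-separated shape list such as "1x30x50, 1x80" into int tuples.

    A trailing period (as at the end of the QCNet list) is ignored.
    """
    shapes = []
    for item in spec.strip().rstrip(".").split(","):
        item = item.strip()
        if item:
            shapes.append(tuple(int(dim) for dim in item.split("x")))
    return shapes


def total_elements(shapes):
    """Total number of scalar values across all input tensors."""
    return sum(math.prod(shape) for shape in shapes)


if __name__ == "__main__":
    # First four QCNet inputs, copied from the list above.
    excerpt = "1x30x50, 1x50x30x30, 1x30x1, 1x1x30x1"
    shapes = parse_shapes(excerpt)
    print(shapes)                  # [(1, 30, 50), (1, 50, 30, 30), (1, 30, 1), (1, 1, 30, 1)]
    print(total_elements(shapes))  # 46560
```

Each parsed tuple could then be passed to, for example, `torch.randn(*shape)` to build a random dummy input of matching size; the shape list alone says nothing about the dtype or meaning of each input.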

Online Map Construction

MapTR

network | backbone | float | qat | quantization | dataset | input shape
MapTRv2 | resnet50 | 0.5859 | 0.5843 | 0.5763 | nuscenes | 6x3x480x800