| network | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| MobileNetV1 | 74.12 | 73.92 | 73.61 | ImageNet | 1x3x224x224 | 0.77 |
| MobileNetV2 | 72.65 | 72.51 | 72.11 | ImageNet | 1x3x224x224 | 0.69 |
| ResNet 18 | 72.04 | 72.03 | 72.03 | ImageNet | 1x3x224x224 | 1.53 |
| ResNet 50 | 77.37 | 76.99 | 76.94 | ImageNet | 1x3x224x224 | 3.06 |
| VargNetV2 | 73.94 | 73.56 | 73.64 | ImageNet | 1x3x224x224 | 0.78 |
| EfficientNet-B0 | 74.31 | 74.23 | 74.18 | ImageNet | 1x3x224x224 | 0.91 |
| SwinTransformer | 80.24 | 80.15 | 80.05 | ImageNet | 1x3x224x224 | 14.50 |
| MixVarGENet | 71.33 | 71.23 | 71.04 | ImageNet | 1x3x224x224 | 0.56 |
| VargConvert | 78.98 | 78.92 | 78.89 | ImageNet | 1x3x224x224 | 1.51 |
| EfficientNasNetm | 80.24 | 79.99 | 79.94 | ImageNet | 1x3x280x280 | 2.06 |
| EfficientNasNets | 76.63 | 76.23 | 76.03 | ImageNet | 1x3x300x300 | 4.79 |
| ViT-small | 79.50 | 79.40 | - | ImageNet | 1x3x224x224 | - |
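When reading a table like the one above, it is often handy to look at how much accuracy is lost at each stage. The snippet below is a minimal sketch that assumes the `float`, `qat`, and `quantization` columns report the same accuracy metric (higher is better) for the floating-point, QAT, and final quantized models. The helper name `accuracy_drop` and the `rows` dictionary are illustrative and not part of any toolchain; the numbers are copied from three rows of the table.

```python
# Accuracy figures copied from the classification table (float, qat, quantization).
rows = {
    "MobileNetV1": (74.12, 73.92, 73.61),
    "ResNet 50": (77.37, 76.99, 76.94),
    "SwinTransformer": (80.24, 80.15, 80.05),
}


def accuracy_drop(float_acc, qat_acc, quantized_acc):
    """Return (float - qat, float - quantized), rounded to two decimals.

    Assumption: all three values use the same metric and higher is better.
    """
    return round(float_acc - qat_acc, 2), round(float_acc - quantized_acc, 2)


# Hypothetical summary structure: network name -> (qat drop, quantized drop).
drops = {name: accuracy_drop(*values) for name, values in rows.items()}

for name, (qat_drop, quantized_drop) in drops.items():
    print(f"{name}: qat drop {qat_drop:.2f}, quantized drop {quantized_drop:.2f}")
```

The same helper can be applied to any row whose three accuracy columns are filled in; rows marked `-` would need to be skipped first.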
Torchvision (floating-point models from the community):
| network | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| ResNet 18 | 69.76 | 69.71 | 69.73 | ImageNet | 1x3x224x224 | 1.59 |
| ResNet 50 | 76.13 | 76.07 | 76.06 | ImageNet | 1x3x224x224 | 3.18 |
| MobileNetV2 | 71.88 | 71.27 | 71.27 | ImageNet | 1x3x224x224 | 0.85 |
FCOS
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| FCOS-efficientnet | efficientnetb0 | 36.26 | 35.79 | 35.59 | MS COCO | 1x3x512x512 | 1.35 |
| FCOS-efficientnet | efficientnetb1 | 41.37 | 41.21 | 40.71 | MS COCO | 1x3x640x640 | 2.78 |
| FCOS-efficientnet | efficientnetb2 | 45.35 | 45.10 | 45.00 | MS COCO | 1x3x768x768 | 4.32 |
| FCOS-efficientnet | efficientnetb3 | 48.03 | 47.65 | 47.58 | MS COCO | 1x3x896x896 | 7.14 |
DETR
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| DETR-resnet50 | resnet50 | 35.70 | 31.42 | 31.31 | MS COCO | 1x3x800x1333 | 41.36 |
| DETR-efficientnetb3 | efficientnetb3 | 37.21 | 35.95 | 35.99 | MS COCO | 1x3x800x1333 | 32.33 |
FCOS3D
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| FCOS3D-efficientnetb0 | efficientnetb0 | 30.60 | 30.27 | 30.31 | nuscenes | 1x3x512x896 | 3.51 |
UNet
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| UNet | MobileNetV1 | 68.02 | 67.56 | 67.53 | Cityscapes | 1x3x1024x2048 | 2.10 |
Deeplab
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| Deeplab | EfficientNet-M0 | 76.30 | 76.22 | 76.12 | Cityscapes | 1x3x1024x2048 | 4.78 |
| Deeplab | EfficientNet-M1 | 77.94 | 77.64 | 77.65 | Cityscapes | 1x3x1024x2048 | 11.23 |
| Deeplab | EfficientNet-M2 | 78.82 | 78.65 | 78.63 | Cityscapes | 1x3x1024x2048 | 17.37 |
FastScnn
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| FastScnn | EfficientNet-B0lite | 69.97 | 69.90 | 69.88 | Cityscapes | 1x3x1024x2048 | 1.92 |
PwcNet
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| PwcNet-lg | PwcNet | 1.4117 | 1.4112 | 1.4075 | FlyingChairs | 1x6x384x512 | 12.62 |
PointPillars
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| PointPillars | SequentialBottleNeck | 77.31 | 76.86 | 76.76 | KITTI3D | 150000x4 | 0.82 |
CenterPoint
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| CenterPoint | SequentialBottleNeck | 58.32 | 58.11 | 58.14 | nuscenes | 1x5x20x40000, 40000x4 | 23.38 |
LidarMultiTask
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| LidarMultiTask | MixVarGENet | 58.09 | 57.72 | 57.62 | nuscenes | 1x5x20x40000, 40000x4 | 22.07 |
The metric reported for PointPillars is Box3d Moderate.
GaNet
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| GaNet | MixVarGENet | 79.49 | 78.72 | 78.72 | CuLane | 1x3x320x800 | 1.053 |
Motr
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| Motr | efficientnetb3 | 58.02 | 57.62 | 57.76 | Mot17 | 1x3x800x1422, 1x256x2x128, 1x1x1x256, 1x4x2x128 | 26.40 |
StereoNet
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| StereoNet | StereoNeck | 1.1270 | 1.1677 | 1.1685 | SceneFlow | 1x6x540x960 | 28.33 |
| StereoNetPlus | MixVarGENet | 1.1270 | 1.1329 | 1.1351 | SceneFlow | 2x3x544x960 | 6.50 |
Bev
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| BevIPM | efficientnetb0 | 30.59 | 30.80 | 30.41 | nuscenes det | 6x3x512x960, 6x128x128x2 | 9.76 |
| BevIPM | efficientnetb0 | 51.47 | 51.41 | 50.98 | nuscenes seg | 6x3x512x960, 6x128x128x2 | 9.76 |
| BevLSS | efficientnetb0 | 30.09 | 30.05 | 30.01 | nuscenes det | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35 |
| BevLSS | efficientnetb0 | 51.78 | 51.47 | 51.46 | nuscenes seg | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35 |
| BevGKT | MixVarGENet | 28.11 | 28.12 | 27.90 | nuscenes det | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07 |
| BevGKT | MixVarGENet | 48.53 | 48.02 | 48.37 | nuscenes seg | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07 |
| BevIPM4D | efficientnetb0 | 37.21 | 37.19 | 37.17 | nuscenes det | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05 |
| BevIPM4D | efficientnetb0 | 52.90 | 53.80 | 53.77 | nuscenes seg | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05 |
| Detr3d | efficientnetb3 | 34.04 | 33.87 | 33.39 | nuscenes det | 6x3x512x1408 | 68.54 |
| PETR | efficientnetb3 | 37.60 | 37.32 | 37.31 | nuscenes det | 6x3x512x1408 | 228.50 |
| Bevformer-tiny | resnet50 | 37.00 | 36.66 | - | nuscenes det | 6x3x480x800 | / |
| BevCFT | efficientnetb3 | 32.93 | 32.68 | - | nuscenes det | 6x3x512x1408 | / |
HeatmapKeypointModel
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| HeatmapKeypointModel | efficientnetb0 | 94.33 | 94.30 | 94.31 | carfusion | 1x3x128x128 | 0.83 |
DenseTNT
| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) |
|---|---|---|---|---|---|---|---|
| DenseTNT | vectornet | 1.2974 | 1.2989 | 1.3038 | argoverse 1 | 30x9x19x32, 30x11x9x64, 30x1x1x96, 30x2x1x2048, 30x1x1x2048 | 26.45 |
QCNet
| network | backbone | float | qat | quantization | dataset | input shape |
|---|---|---|---|---|---|---|
| qcnet | - | 83.84 | - | - | argoverse 2 | see the list below |
The input shape of the QCNet model is:
1x30x50, 1x50x30x30, 1x30x1, 1x1x30x1, 1x1x30x1, 1x1x30x1, 1x1x30x1, 1x1x30x80, 1x1x30x80, 1x1x30x80, 1x1x30x10, 1x1x30x10, 1x1x30x10, 1x1x30x10, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x30x29x128, 1x30x10x128, 1x30x10x128, 1x80, 1x80, 1x1x80x80, 1x1x80x80, 1x1x80x80, 1x1x80x50, 1x1x80x50, 1x1x80x50, 1x80x50, 1x80x50, 1x80x50, 1x80x50, 1x30x30, 1x30x1, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x1x30x80, 1x1x30x80, 1x1x30x80, 1x1x30x30, 1x1x30x30, 1x1x30x30, 1x80x80.
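Input-shape strings like the ones listed above can be turned into integer tuples, for example to allocate dummy inputs or to check tensor sizes. The following is a small sketch under that assumption; `parse_shapes` is a hypothetical helper, and the sample string contains only the first four QCNet shapes from the list.

```python
from math import prod


def parse_shapes(spec):
    """Split a comma-separated list such as '1x30x50, 1x80' into tuples of ints.

    A trailing period on an item (as at the end of a sentence) is ignored.
    """
    return [
        tuple(int(dim) for dim in item.strip().rstrip(".").split("x"))
        for item in spec.split(",")
        if item.strip()
    ]


# First four QCNet input shapes, used here only as sample data.
spec = "1x30x50, 1x50x30x30, 1x30x1, 1x1x30x1"
shapes = parse_shapes(spec)

# Number of elements in each input tensor.
sizes = [prod(shape) for shape in shapes]

print(shapes)
print(sizes)
```

Passing the full list instead of the four-item sample yields one tuple per model input, in the order given.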
MapTR
| network | backbone | float | qat | quantization | dataset | input shape |
|---|---|---|---|---|---|---|
| MapTRv2 | resnet50 | 0.5859 | 0.5843 | 0.5763 | nuscenes | 6x3x480x800 |