橦言无忌

A programmer who has no wish to change the world

Jetson performance comparison

Preface

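Measured inference times for YOLO, Mask R-CNN, and several common image classification networks on widely used Jetson hardware (with a desktop 1050ti included for comparison). The numbers are for reference only.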

They may be useful when choosing a model for an engineering project.

1. YOLO series

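Times are totals for the whole batch; the figure in parentheses is the per-image average. A "-" in the network column means the same network as the row above.

network device activation precision batch DLA framework time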
yolov3 Xavier leaky fp16 1 no TRT5.1.6 24ms
- Xavier leaky fp16 1 no TRT7.1.0 18ms
- NX leaky fp16 1 no TRT7.1.0 30ms
- TX2 leaky fp16 1 no TRT5.1.6 99ms
- Xavier leaky fp16 4 no TRT5.1.6 90ms(22.5ms each)
- Xavier leaky fp16 4 no TRT7.1.0 58ms(14.5ms each)
- TX2 leaky fp16 32 no TRT5.1.6 2930ms(91.5ms each)
- Xavier leaky fp16 32 no TRT7.1.0 440ms(13.75ms each)
- NX leaky fp16 4 no TRT7.1.0 104ms(26ms each)
- Xavier leaky int8 1 no TRT5.1.6 20ms
- Xavier leaky int8 1 no TRT7.1.0 12.5ms
- NX leaky int8 1 no TRT7.1.0 20ms
- Xavier leaky int8 4 no TRT5.1.6 66ms(16.5ms each)
- Xavier leaky int8 4 no TRT7.1.0 36ms(9ms each)
- Xavier leaky int8 32 no TRT7.1.0 256ms(8ms each)
- NX leaky int8 4 no TRT7.1.0 64ms(16ms each)
- Xavier relu fp16 4 no TRT5.1.6 52ms(13ms each)
- Xavier relu int8 1 no TRT5.1.6 10ms
- NX relu int8 1 no TRT7.1.0 17ms
- Xavier relu int8 4 no TRT5.1.6 30ms(7.5ms each)
- NX relu int8 4 no TRT7.1.0 58ms(14.5ms each)
- Xavier relu int8 4 no TRT5.1.6 54ms(13.5ms each)
- 1050ti relu int8 4 no TRT5.1.6 45ms(11.25ms each)
yolov3-tiny Xavier leaky fp16 1 no TRT5.1.6 5ms
yolo-resnet Xavier leaky fp16 1 no TRT5.1.6 14ms
- Xavier leaky fp16 4 no TRT5.1.6 44ms(11ms each)
- Xavier leaky int8 1 no TRT5.1.6 12ms
- Xavier leaky int8 4 no TRT5.1.6 39ms(10ms each)
- Xavier relu fp16 4 no TRT5.1.6 30ms(7.5ms each)
- Xavier relu fp16 4 yes TRT5.1.6 68ms(17ms each)
- Xavier relu int8 4 no TRT5.1.6 22ms(5.5ms each)
- 1050ti relu int8 4 no TRT5.1.6 24ms(6ms each)
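
The per-image figures in parentheses are simply the batch total divided by the batch size, and throughput in frames per second follows from that. Below is a minimal sketch of that arithmetic, assuming the rows keep the space-separated layout shown above. The function name `parse_rows`, the dictionary keys, and the four sample rows (copied from the table) are choices made for this illustration only.

```python
import re

# A few rows copied from the YOLO table above (not the full table).
ROWS = """\
yolov3 Xavier leaky fp16 1 no TRT5.1.6 24ms
- Xavier leaky fp16 4 no TRT7.1.0 58ms(14.5ms each)
- Xavier leaky int8 32 no TRT7.1.0 256ms(8ms each)
yolov3-tiny Xavier leaky fp16 1 no TRT5.1.6 5ms
"""


def parse_rows(text):
    """Parse table rows into dicts, carrying the network name down over '-'."""
    records = []
    network = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=7 keeps "58ms(14.5ms each)" together as the last field.
        net, device, act, prec, batch, dla, trt, time = line.split(maxsplit=7)
        network = network if net == "-" else net
        # The leading number is the total time for the whole batch.
        total_ms = float(re.match(r"[\d.]+", time).group())
        batch = int(batch)
        per_image_ms = total_ms / batch
        records.append({
            "network": network,
            "device": device,
            "activation": act,
            "precision": prec,
            "batch": batch,
            "dla": dla,
            "framework": trt,
            "total_ms": total_ms,
            "per_image_ms": per_image_ms,
            "fps": 1000.0 / per_image_ms,
        })
    return records


if __name__ == "__main__":
    for r in parse_rows(ROWS):
        print(f"{r['network']:12s} {r['device']:7s} {r['precision']:5s} "
              f"batch={r['batch']:<3d} {r['per_image_ms']:.2f} ms/image "
              f"{r['fps']:.1f} FPS")
```

With the records in this form, filtering for configurations that fit a latency budget is a one-line list comprehension, for example `[r for r in records if r["per_image_ms"] <= 15]`.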

2. Mask R-CNN

A "-" stands for the value in the same column of the row above.

device input shape precision batch framework pure enqueue time
1050ti 1024x1024 fp32 1 TRT7.0 364ms
- - int8 - - 140ms
Xavier - fp16 - TRT7.1 136ms
- - int8 - - 103ms
NX - fp32 - - 871ms
- - fp16 - - 239ms
- - int8 - - 165ms
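
To make the effect of reduced precision easier to compare across devices, the times above can be turned into speedup ratios. The sketch below is only an illustration: it reads each "-" as repeating the value from the row above, and the names `MASK_RCNN` and `speedups` are hypothetical. The baseline for each device is its slowest measured precision, so for Xavier, which has no fp32 row, the baseline is fp16 rather than fp32.

```python
# (device, precision, enqueue time in ms), copied from the table above.
# Assumption: a "-" in the table repeats the value from the row above.
MASK_RCNN = [
    ("1050ti", "fp32", 364), ("1050ti", "int8", 140),
    ("Xavier", "fp16", 136), ("Xavier", "int8", 103),
    ("NX", "fp32", 871), ("NX", "fp16", 239), ("NX", "int8", 165),
]


def speedups(rows):
    """Return {device: {precision: speedup over that device's slowest entry}}."""
    by_device = {}
    for device, precision, ms in rows:
        by_device.setdefault(device, {})[precision] = ms
    result = {}
    for device, times in by_device.items():
        baseline = max(times.values())  # slowest measured precision
        result[device] = {p: round(baseline / ms, 2) for p, ms in times.items()}
    return result


if __name__ == "__main__":
    for device, ratios in speedups(MASK_RCNN).items():
        print(device, ratios)
```

On the NX, for instance, this gives roughly 3.6x for fp16 and 5.3x for int8 relative to fp32.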

3. Common image classification models

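Times are totals for the whole batch, with the per-image average in parentheses. A "-" stands for the value in the same column of the row above.

network device precision batch DLA framework pure enqueue time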
GoogLeNet apex int8 1 no TRT5.1.6 1.5ms
- - - 4 - - 3.5ms(avg 0.9ms)
- - - 8 - - 5.5ms(avg 0.7ms)
- - - 32 - - 17.5ms(avg 0.55ms)
- - - 128 - - 64ms(avg 0.5ms)
- NX fp16 1 - TRT7.1 3ms
- - int8 - - - 2ms
- TX2 fp16 1 no TRT5.1.6 5.2ms
- - - 32 - - 118ms(avg 3.7ms)
- 1050ti fp32 1 - - 3.5ms
- - - 4 - - 11ms(avg 2.75ms)
- - - 8 - - 16ms(avg 2ms)
- - - 32 - - 61ms(avg 1.9ms)
- - - 128 - - 236ms(avg 1.84ms)
- - int8 1 - - 1.5ms
- - - 4 - - 4.5ms(avg 1.1ms)
- - - 8 - - 6ms(avg 0.75ms)
- - - 32 - - 24ms(avg 0.75ms)
- - - 128 - - 90ms(avg 0.7ms)
resnet50 apex int8 1 no TRT5.1.6 2.2ms
- - - 4 - - 4.3ms(avg 1.1ms)
- - - 8 - - 7.5ms(avg 0.9ms)
- - - 32 - - 25ms(avg 0.8ms)
- - - 128 - - 94.5ms(avg 0.74ms)
- NX fp16 1 - TRT7.1 6ms
- - - 32 - - 103ms(avg 3.2ms)
- - int8 1 - - 3.8ms
- - - 32 - - 64ms(avg 2ms)
- TX2 fp16 1 - TRT5.1.6 13ms
- - - 32 - - 320ms(avg 10ms)
- 1050ti fp32 1 no TRT5.1.6 8ms
- - - 4 - - 23ms(avg 5.75ms)
- - - 8 - - 38ms(avg 4.75ms)
- - - 32 - - 133ms(avg 4ms)
- - - 128 - - 510ms(avg 4ms)
- - int8 1 - - 3ms
- - - 4 - - 8ms(avg 2ms)
- - - 8 - - 14ms(avg 1.75ms)
- - - 32 - - 44ms(avg 1.4ms)
- - - 128 - - 167ms(avg 1.3ms)
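
The classification numbers show how batching amortizes per-call overhead: the per-image average drops quickly at small batch sizes and then flattens out. A small sketch of that calculation follows, using the resnet50 int8 rows for the 1050ti from the table above. The names `RESNET50_1050TI_INT8` and `batching_gain` are hypothetical and exist only for this example.

```python
# (batch size, total enqueue time in ms) for resnet50 on the 1050ti at int8,
# copied from the table above.
RESNET50_1050TI_INT8 = [(1, 3.0), (4, 8.0), (8, 14.0), (32, 44.0), (128, 167.0)]


def batching_gain(measurements):
    """Return (batch, per-image ms, throughput gain vs the smallest batch)."""
    measurements = sorted(measurements)
    base_batch, base_total = measurements[0]
    base_per_image = base_total / base_batch
    out = []
    for batch, total_ms in measurements:
        per_image = total_ms / batch
        gain = base_per_image / per_image
        out.append((batch, round(per_image, 2), round(gain, 2)))
    return out


if __name__ == "__main__":
    for batch, per_image, gain in batching_gain(RESNET50_1050TI_INT8):
        print(f"batch={batch:<4d} {per_image} ms/image  gain x{gain}")
```

For this configuration most of the gain is already reached by batch 32; going to batch 128 adds little throughput while the time for one batch grows to 167ms, which matters when single-frame latency is the constraint.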