MODEL ZOO

Common settings and notes

  • The experiments are run with PyTorch 0.4.1, CUDA 9.0, and cuDNN 7.1.
  • Training times are measured on our servers with 8 TITAN V GPUs (12 GB memory).
  • Testing times are measured on our local machine with a TITAN Xp GPU.
  • The models can be downloaded directly from Google Drive.

Object Detection

COCO

| Model | GPUs | Train time (h) | Test time (ms) | AP | Download |
|---|---|---|---|---|---|
| ctdet_coco_hg | 5 | 109 | 71 / 129 / 674 | 40.3 / 42.2 / 45.1 | model |
| ctdet_coco_dla_1x | 8 | 57 | 19 / 36 / 248 | 36.3 / 38.2 / 40.7 | model |
| ctdet_coco_dla_2x | 8 | 92 | 19 / 36 / 248 | 37.4 / 39.2 / 41.7 | model |
| ctdet_coco_resdcn101 | 8 | 65 | 22 / 40 / 259 | 34.6 / 36.2 / 39.3 | model |
| ctdet_coco_resdcn18 | 4 | 28 | 7 / 14 / 81 | 28.1 / 30.0 / 33.2 | model |
| exdet_coco_hg | 5 | 215 | 134 / 246 / 1340 | 35.8 / 39.8 / 42.4 | model |
| exdet_coco_dla | 8 | 133 | 51 / 90 / 481 | 33.0 / 36.5 / 38.5 | model |

Notes

  • All models are trained on COCO train 2017 and evaluated on val 2017.
  • Test time and AP are reported as no augmentation / flip augmentation / multi-scale augmentation (scales 0.5, 0.75, 1, 1.25, 1.5).
  • Results on COCO test-dev can be found in the paper, or reproduced by adding --trainval to test.py.
  • exdet is our re-implementation of ExtremeNet. The testing does not include edge aggregation.
  • For DLA and ResNets, 1x denotes a schedule of 140 epochs with the learning rate divided by 10 at epochs 90 and 120 (following SimpleBaseline). 2x denotes 230 epochs with the learning rate divided by 10 at epochs 180 and 210. The training schedules are not carefully investigated.
  • The hourglass training schedule follows ExtremeNet: train for 50 epochs (approximately 250000 iterations at batch size 24) and drop the learning rate at epoch 40.
  • Testing time includes network forwarding time, decoding time, and NMS time (for ExtremeNet).
  • We observed up to 0.4 AP performance jitter due to randomness in training.
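
The step schedules and the iteration estimate in the notes above can be sanity-checked with a few lines of Python. This is a minimal sketch, not the repository's training code: `lr_multiplier` and `approx_iterations` are hypothetical helpers, the drop is assumed to take effect once the listed epoch has finished, and the COCO train 2017 image count (118287) is an assumption that does not come from this page.

```python
import math

def lr_multiplier(epoch, drop_epochs, factor=0.1):
    """Multiplier on the base learning rate while training `epoch` (1-indexed).

    Assumption: each drop takes effect after the listed epoch has finished.
    """
    drops_done = sum(1 for e in drop_epochs if e < epoch)
    return factor ** drops_done

def approx_iterations(num_images, batch_size, epochs):
    """Total optimizer steps, counting a final partial batch as one step."""
    return epochs * math.ceil(num_images / batch_size)

# Schedules as described in the notes (epoch counts and drop epochs).
SCHEDULE_1X = {"epochs": 140, "drops": (90, 120)}
SCHEDULE_2X = {"epochs": 230, "drops": (180, 210)}

# Multiplier before, at, and after each drop of the 1x schedule.
for epoch in (1, 90, 91, 121):
    print(epoch, lr_multiplier(epoch, SCHEDULE_1X["drops"]))

# Hourglass schedule: 50 epochs at batch size 24.
COCO_TRAIN2017_IMAGES = 118287  # assumption, not stated in this document
print(approx_iterations(COCO_TRAIN2017_IMAGES, batch_size=24, epochs=50))
```

Under these assumptions the hourglass estimate comes to 246450 steps, which is consistent with the "approximately 250000 iterations" quoted above.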

Pascal VOC

| Model | GPUs | Train time (h) | Test time (ms) | mAP | Download |
|---|---|---|---|---|---|
| ctdet_pascal_dla_384 | 1 | 15 | 20 | 79.3 | model |
| ctdet_pascal_dla_512 | 2 | 15 | 30 | 80.7 | model |
| ctdet_pascal_resdcn18_384 | 1 | 3 | 7 | 72.6 | model |
| ctdet_pascal_resdcn18_512 | 1 | 5 | 10 | 75.7 | model |
| ctdet_pascal_resdcn101_384 | 2 | 7 | 22 | 77.1 | model |
| ctdet_pascal_resdcn101_512 | 4 | 7 | 33 | 78.7 | model |

Notes

  • All models are trained on trainval 07+12 and tested on test 2007.
  • Flip test is used by default.
  • Training schedule: train for 70 epochs with the learning rate divided by 10 at epochs 45 and 60.
  • We observed up to 1 mAP performance jitter due to randomness in training.
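
Flip testing, mentioned in the notes above, generally means running the model on both the image and its horizontal mirror, mirroring the second output back, and averaging the two. The sketch below illustrates that idea on a dense 2D output such as a heatmap. It is a generic illustration under stated assumptions, not the code used to produce these numbers: `hflip`, `flip_test`, and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in for a real network.

```python
def hflip(grid):
    """Mirror a 2D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]

def flip_test(predict, image):
    """Average the direct prediction with the un-mirrored prediction on the mirrored image."""
    direct = predict(image)
    mirrored_back = hflip(predict(hflip(image)))
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(direct, mirrored_back)
    ]

def toy_predict(image):
    """Hypothetical stand-in for a network: each output is the pixel plus its
    right-hand neighbour, so the response is deliberately not mirror-symmetric."""
    return [
        [row[i] + (row[i + 1] if i + 1 < len(row) else 0) for i in range(len(row))]
        for row in image
    ]

image = [
    [0, 1, 0, 0],
    [0, 0, 2, 0],
]
print(flip_test(toy_predict, image))
```

The averaged output is symmetric around each input peak even though `toy_predict` itself is biased to one side, which is the kind of directional error flip testing is meant to cancel. The cost is a second forward pass per image.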

Human pose estimation

COCO

| Model | GPUs | Train time (h) | Test time (ms) | AP | Download |
|---|---|---|---|---|---|
| multi_pose_hg_1x | 5 | 62 | 151 | 58.7 | model |
| multi_pose_hg_3x | 5 | 188 | 151 | 64.0 | model |
| multi_pose_dla_1x | 8 | 30 | 44 | 54.7 | model |
| multi_pose_dla_3x | 8 | 70 | 44 | 58.9 | model |

Notes

  • All models are trained on the COCO keypoint train 2017 images that contain at least one human with keypoint annotations (64115 images).
  • The evaluation is done on COCO keypoint val 2017 (5000 images).
  • Flip test is used by default.
  • The models are fine-tuned from the corresponding center point detection models.
  • DLA training schedule: 1x: train for 140 epochs with the learning rate divided by 10 at epochs 90 and 120. 3x: train for 320 epochs with the learning rate divided by 10 at epochs 270 and 300.
  • Hourglass training schedule: 1x: train for 50 epochs with the learning rate divided by 10 at epoch 40. 3x: train for 150 epochs with the learning rate divided by 10 at epoch 130.

3D bounding box detection

Notes

  • The 3dop split is from 3DOP and the subcnn split is from SubCNN.
  • No augmentation is used in testing.
  • The models are trained for 70 epochs with the learning rate dropped at epochs 45 and 60.

KITTI 3DOP split

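| Model | GPUs | Train time | Test time | AP-E | AP-M | AP-H | AOS-E | AOS-M | AOS-H | BEV-E | BEV-M | BEV-H | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ddd_3dop | 2 | 7h | 31ms | 96.9 | 87.8 | 79.2 | 93.9 | 84.3 | 75.7 | 34.0 | 30.5 | 26.8 | model |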

KITTI SubCNN split

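| Model | GPUs | Train time | Test time | AP-E | AP-M | AP-H | AOS-E | AOS-M | AOS-H | BEV-E | BEV-M | BEV-H | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ddd_sub | 2 | 7h | 31ms | 89.6 | 79.8 | 70.3 | 85.7 | 75.2 | 65.9 | 34.9 | 27.7 | 26.4 | model |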