MODEL ZOO

Object Detection

COCO

Model	GPUs	Train time (h)	Test time (ms)	AP	Download
ctdet_coco_hg	5	109	71 / 129 / 674	40.3 / 42.2 / 45.1	model
ctdet_coco_dla_1x	8	57	19 / 36 / 248	36.3 / 38.2 / 40.7	model
ctdet_coco_dla_2x	8	92	19 / 36 / 248	37.4 / 39.2 / 41.7	model
ctdet_coco_resdcn101	8	65	22 / 40 / 259	34.6 / 36.2 / 39.3	model
ctdet_coco_resdcn18	4	28	7 / 14 / 81	28.1 / 30.0 / 33.2	model
exdet_coco_hg	5	215	134 / 246 / 1340	35.8 / 39.8 / 42.4	model
exdet_coco_dla	8	133	51 / 90 / 481	33.0 / 36.5 / 38.5	model

All models are trained on COCO train 2017 and evaluated on COCO val 2017.
Test time and AP are reported for three settings: no augmentation / flip augmentation / multi-scale augmentation (scales 0.5, 0.75, 1, 1.25, 1.5).
Results on COCO test-dev are reported in the paper; to produce test-dev results yourself, add --trainval when running test.py.
exdet is our re-implementation of ExtremeNet. Testing does not include edge aggregation.
For DLA and ResNets, the 1x schedule trains for 140 epochs and drops the learning rate by a factor of 10 at epochs 90 and 120 (following SimpleBaseline). The 2x schedule trains for 230 epochs and drops the learning rate by a factor of 10 at epochs 180 and 210. These schedules were not carefully tuned.
The Hourglass schedule follows ExtremeNet: 50 epochs (approximately 250,000 iterations at batch size 24), with the learning rate dropped at epoch 40.
Test time includes network forward time, decoding time, and NMS time (NMS applies to ExtremeNet only).
We observed up to 0.4 AP of performance jitter due to randomness in training.
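
The step schedules above can be expressed as a small function. This is a hypothetical sketch, not this repository's training code: SCHEDULES, step_lr, and total_iterations are illustrative names, the base learning rate is left as a parameter, each drop is assumed to multiply the rate by 0.1 and to take effect once the milestone epoch has finished, and the Hourglass drop is also assumed to be a factor of 10, which the note does not state. The last line checks the Hourglass iteration count using the 118,287 images in COCO train 2017, assuming each epoch's incomplete final batch is dropped.

```python
# Hypothetical helpers; names and the base learning rate are illustrative only.
SCHEDULES = {
    # name: (total_epochs, epochs after which the learning rate is divided by 10)
    "dla_resnet_1x": (140, [90, 120]),
    "dla_resnet_2x": (230, [180, 210]),
    "hourglass": (50, [40]),  # factor of 10 assumed; the note only says "dropped"
}


def step_lr(base_lr, epoch, milestones, factor=0.1):
    """LR for a 1-indexed epoch: one multiplication by `factor` per milestone passed."""
    drops = sum(1 for m in milestones if epoch > m)
    return base_lr * factor ** drops


def total_iterations(epochs, num_images, batch_size):
    """Iterations in a run, assuming each epoch's incomplete last batch is dropped."""
    return epochs * (num_images // batch_size)


if __name__ == "__main__":
    total, milestones = SCHEDULES["dla_resnet_1x"]
    for epoch in (1, 90, 91, 120, 121, total):
        print(epoch, step_lr(1.0, epoch, milestones))
    # COCO train 2017 has 118,287 images; the Hourglass schedule uses batch size 24.
    print(total_iterations(50, 118_287, 24))
```

With batch size 24 this gives 246,400 iterations, consistent with the "approximately 250,000" figure quoted for the Hourglass schedule.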

Pascal VOC

Model	GPUs	Train time (h)	Test time (ms)	mAP	Download
ctdet_pascal_dla_384	1	15	20	79.3	model
ctdet_pascal_dla_512	2	15	30	80.7	model
ctdet_pascal_resdcn18_384	1	3	7	72.6	model
ctdet_pascal_resdcn18_512	1	5	10	75.7	model
ctdet_pascal_resdcn101_384	2	7	22	77.1	model
ctdet_pascal_resdcn101_512	4	7	33	78.7	model

All models are trained on VOC 2007+2012 trainval and tested on VOC 2007 test.
Flip test is used by default.
Training schedule: 70 epochs, with the learning rate dropped by a factor of 10 at epochs 45 and 60.
We observed up to 1 mAP of performance jitter due to randomness in training.
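
Flip test runs the network on the image and on its horizontal mirror, mirrors the second output back, and averages the two. The sketch below shows this on a single heatmap channel stored as nested lists. flip_horizontal and flip_test_average are hypothetical names, and which output heads the repository averages (as opposed to taking them from the unflipped pass only) is not specified here.

```python
# Minimal flip-test sketch on one heatmap channel (rows x columns, nested lists).
# Assumption: `hm` comes from the original image and `hm_flipped` from the
# horizontally mirrored image, both at the same output resolution.

def flip_horizontal(grid):
    """Mirror a 2D grid left-to-right."""
    return [list(reversed(row)) for row in grid]


def flip_test_average(hm, hm_flipped):
    """Mirror the second prediction back, then average it element-wise with the first."""
    unflipped = flip_horizontal(hm_flipped)
    return [
        [(a + b) / 2 for a, b in zip(row, row_f)]
        for row, row_f in zip(hm, unflipped)
    ]


hm = [[0.25, 1.0, 0.0],
      [0.5, 0.5, 0.25]]
hm_flipped = [[0.0, 0.5, 0.75],   # prediction on the mirrored image
              [0.75, 0.5, 0.5]]
print(flip_test_average(hm, hm_flipped))
# [[0.5, 0.75, 0.0], [0.5, 0.5, 0.5]]
```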

Human Pose Estimation

Model	GPUs	Train time (h)	Test time (ms)	AP	Download
multi_pose_hg_1x	5	62	151	58.7	model
multi_pose_hg_3x	5	188	151	64.0	model
multi_pose_dla_1x	8	30	44	54.7	model
multi_pose_dla_3x	8	70	44	58.9	model

All models are trained on the COCO keypoint train 2017 images that contain at least one human with keypoint annotations (64,115 images).
Evaluation is done on COCO keypoint val 2017 (5,000 images).
Flip test is used by default.
The models are fine-tuned from the corresponding center point detection models.
DLA training schedule: 1x trains for 140 epochs with the learning rate dropped by a factor of 10 at epochs 90 and 120; 3x trains for 320 epochs with the learning rate dropped by a factor of 10 at epochs 270 and 300.
Hourglass training schedule: 1x trains for 50 epochs with the learning rate dropped by a factor of 10 at epoch 40; 3x trains for 150 epochs with the learning rate dropped by a factor of 10 at epoch 130.
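
For the pose models, flip test needs one extra step: mirroring an image swaps left and right joints, so per-keypoint outputs from the flipped image must be reassigned to the opposite side before they are combined with the original prediction. The sketch below shows that channel swap for the standard 17-keypoint COCO order. flip_pairs and unflip_keypoint_channels are hypothetical helper names, and spatially mirroring each channel back is a separate step not shown here.

```python
# Standard COCO keypoint order (17 joints); index 0 is the nose.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def flip_pairs(names):
    """Index pairs (left, right) whose channels swap under a horizontal flip."""
    index = {name: i for i, name in enumerate(names)}
    return [
        (i, index["right_" + name[len("left_"):]])
        for i, name in enumerate(names)
        if name.startswith("left_")
    ]


def unflip_keypoint_channels(channels, pairs):
    """Reorder per-keypoint outputs from a mirrored image back to the original order.

    `channels` has one entry per keypoint (e.g. one heatmap each). Only the
    left/right reassignment is done here; each entry is not mirrored spatially.
    """
    out = list(channels)
    for left, right in pairs:
        out[left], out[right] = channels[right], channels[left]
    return out


pairs = flip_pairs(COCO_KEYPOINTS)
print(pairs)
# [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]
```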

3D Bounding Box Detection

The 3dop split is from 3DOP and the SubCNN split is from SubCNN.
No augmentation is used in testing.
The models are trained for 70 epochs, with the learning rate dropped at epochs 45 and 60.

Model	GPUs	Train time (h)	Test time (ms)	AP-E	AP-M	AP-H	AOS-E	AOS-M	AOS-H	BEV-E	BEV-M	BEV-H	Download
ddd_3dop	2	7	31	96.9	87.8	79.2	93.9	84.3	75.7	34.0	30.5	26.8	model
ddd_sub	2	7	31	89.6	79.8	70.3	85.7	75.2	65.9	34.9	27.7	26.4	model

E, M, and H denote the Easy, Moderate, and Hard difficulty levels; BEV is AP measured in bird's-eye view.
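
The AOS columns report average orientation similarity, the orientation-aware counterpart of AP defined by the KITTI object benchmark. Roughly, at each recall level every detection matched to a ground-truth box contributes (1 + cos Δθ) / 2, where Δθ is its orientation error, unmatched detections contribute 0, and the sum is divided by the number of detections; AOS then averages the best such score at or above a fixed set of recall levels, as AP does for precision. The sketch below covers only the per-recall term; orientation_similarity is a hypothetical name and this is not the official KITTI evaluation code.

```python
import math


def orientation_similarity(delta_thetas):
    """Mean of (1 + cos(dtheta)) / 2 over all detections at one recall level.

    `delta_thetas` holds the orientation error in radians for each detection
    matched to a ground-truth box, or None for an unmatched detection, which
    counts in the denominator but contributes 0 to the sum.
    """
    if not delta_thetas:
        return 0.0
    total = sum((1.0 + math.cos(d)) / 2.0 for d in delta_thetas if d is not None)
    return total / len(delta_thetas)


# A perfect orientation scores 1, an opposite orientation scores 0,
# and an unmatched detection scores 0, so this averages to about 1/3.
print(orientation_similarity([0.0, math.pi, None]))
```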