Argus
Five perception tasks from a single 86-million-parameter backbone.
[Figure: example pairs, source image and target image]
Argus runs five perception tasks from a single frozen EUPE-ViT-B encoder.
- Classification — ImageNet-1k via kNN or trained linear softmax, 1000 classes.
- Segmentation — ADE20K, 150 indoor and urban classes, linear head.
- Depth — NYU Depth V2, metric depth in meters, linear head.
- Detection — COCO 2017, 80 classes, FCOS head with simple feature pyramid. 41.0 mAP.
- Correspondence — training-free dense feature matching between two images.
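As a rough illustration of what training-free correspondence can look like, the sketch below matches patch features from two images by cosine similarity and keeps only mutual nearest neighbours. This card does not specify Argus's actual matching rule, so mutual nearest-neighbour filtering is an assumption here. The function name `match_patches`, the array shapes, and the toy feature dimension are likewise hypothetical, and random vectors stand in for real encoder outputs.

```python
import numpy as np

def match_patches(feats_a, feats_b):
    """Mutual nearest-neighbour matching between two sets of patch features.

    feats_a: (Na, D) array, feats_b: (Nb, D) array (hypothetical shapes).
    Returns a list of (index_in_a, index_in_b, cosine_similarity).
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                    # (Na, Nb) similarity matrix
    best_b = sim.argmax(axis=1)      # best patch in B for each patch in A
    best_a = sim.argmax(axis=0)      # best patch in A for each patch in B
    matches = []
    for i, j in enumerate(best_b):
        if best_a[j] == i:           # keep a pair only if both sides agree
            matches.append((i, int(j), float(sim[i, j])))
    return matches

# Toy stand-in for encoder features: image B is a shuffled, slightly
# noisy copy of image A (16 patches, 64 dimensions, both assumed).
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(16, 64))
perm = rng.permutation(16)
feats_b = feats_a[perm] + 0.01 * rng.normal(size=(16, 64))

matches = match_patches(feats_a, feats_b)
```

No parameters are learned in this procedure: the matches come entirely from the frozen features, which is what "training-free" means in this sketch.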
The backbone weights are released by Meta FAIR (arXiv:2603.22387) under the FAIR Research License. The task heads, class prototypes, and packaging were trained and assembled by phanerozoic.