Argus

Argus runs five perception tasks from a single frozen EUPE-ViT-B encoder.
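Because the encoder is frozen and shared, an image only needs one encoder pass, and every task head then reads the same features. Below is a minimal sketch of that pattern for the two dense linear heads (segmentation and depth). It assumes NumPy, a ViT-B width of 768 and a 16-pixel patch size. `encode`, `LinearHead` and `run_dense_tasks` are hypothetical names, and random weights stand in for the real EUPE-ViT-B encoder and the trained heads, so the outputs carry no meaning.

```python
import numpy as np

EMBED_DIM = 768         # ViT-B hidden width
PATCH = 16              # assumed patch size (hypothetical)
NUM_SEG_CLASSES = 150   # ADE20K classes

rng = np.random.default_rng(0)

def encode(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the frozen EUPE-ViT-B encoder.

    Returns one feature vector per patch, shape (H/PATCH, W/PATCH, EMBED_DIM).
    Random features here; the real encoder would run a ViT forward pass.
    """
    h, w, _ = image.shape
    return rng.standard_normal((h // PATCH, w // PATCH, EMBED_DIM)).astype(np.float32)

class LinearHead:
    """Per-patch linear map: features (..., D) -> outputs (..., K)."""

    def __init__(self, in_dim: int, out_dim: int):
        # Untrained random weights (hypothetical); real heads are trained per task.
        self.weight = rng.standard_normal((in_dim, out_dim)).astype(np.float32) * 0.01
        self.bias = np.zeros(out_dim, dtype=np.float32)

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        return feats @ self.weight + self.bias

seg_head = LinearHead(EMBED_DIM, NUM_SEG_CLASSES)
depth_head = LinearHead(EMBED_DIM, 1)

def run_dense_tasks(image: np.ndarray):
    feats = encode(image)                  # one encoder pass, shared by every head
    seg = seg_head(feats).argmax(axis=-1)  # (h, w) class ids at patch resolution
    depth = depth_head(feats)[..., 0]      # (h, w) depth values (meters once trained)
    return seg, depth

image = np.zeros((224, 224, 3), dtype=np.float32)
seg_map, depth_map = run_dense_tasks(image)
print(seg_map.shape, depth_map.shape)  # (14, 14) (14, 14)
```

The outputs here stay at patch resolution; a practical head would usually upsample them to the input size. Adding another task only adds a head, not another encoder pass.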

Classification: ImageNet-1k, 1000 classes, via kNN or a trained linear softmax classifier.
Segmentation: ADE20K, 150 indoor and urban classes, linear head.
Depth: NYU Depth V2, metric depth in meters, linear head.
Detection: COCO 2017, 80 classes, FCOS head with a simple feature pyramid; 41.0 mAP.
Correspondence: training-free dense feature matching between two images.
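For the kNN classification route, a query embedding is compared against a bank of labeled embeddings and the nearest neighbors vote. Below is a minimal sketch of cosine-similarity kNN with a majority vote, assuming NumPy and one embedding vector per image (for example a pooled or CLS feature; which feature Argus uses is an assumption here). `knn_classify` is a hypothetical name and the 2-D toy bank stands in for real ImageNet-1k features.

```python
import numpy as np
from collections import Counter

def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def knn_classify(query: np.ndarray, bank: np.ndarray, bank_labels: np.ndarray, k: int = 5) -> int:
    """Majority vote over the k bank embeddings most cosine-similar to the query.

    query: (D,) embedding; bank: (N, D) stored embeddings; bank_labels: (N,) class ids.
    """
    sims = l2_normalize(bank) @ l2_normalize(query)  # (N,) cosine similarities
    top = np.argsort(-sims)[:k]                      # indices of the k nearest neighbors
    votes = Counter(bank_labels[top].tolist())       # count class ids among them
    return votes.most_common(1)[0][0]

# Toy bank: two class-0 vectors near the x-axis, two class-1 vectors near the y-axis.
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.8, 0.2]), bank, labels, k=3))  # 0
```

A weighted vote (summing similarities per class instead of counting) is a common variant; with a 1000-class bank the same code applies unchanged.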
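Training-free correspondence needs no learned head: matches can come directly from similarities between the two images' patch features. The exact matcher Argus uses is not stated here. The sketch below shows one common recipe, cosine nearest neighbors with a mutual-nearest-neighbor check, assuming NumPy and (H, W, D) patch feature grids from the frozen encoder. `match_points` is a hypothetical name, and coordinates are in patch units (multiply by the patch size to get pixels).

```python
import numpy as np

def match_points(src_feats: np.ndarray, tgt_feats: np.ndarray, query_rc, mutual: bool = True):
    """Match source query patches to target patches by cosine similarity.

    src_feats, tgt_feats: (H, W, D) patch feature grids.
    query_rc: list of (row, col) patch coordinates in the source grid.
    Returns a list of (row, col) target coordinates, or None where the
    mutual-nearest-neighbor check rejects the match.
    """
    _, w_s, d = src_feats.shape
    _, w_t, _ = tgt_feats.shape
    src = src_feats.reshape(-1, d)
    tgt = tgt_feats.reshape(-1, d)
    src = src / (np.linalg.norm(src, axis=1, keepdims=True) + 1e-8)
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + 1e-8)
    sim = src @ tgt.T  # (H_s*W_s, H_t*W_t) cosine similarity matrix

    matches = []
    for r, c in query_rc:
        i = r * w_s + c            # flat index of the query patch
        j = int(sim[i].argmax())   # most similar target patch
        if mutual and int(sim[:, j].argmax()) != i:
            matches.append(None)   # that target patch prefers a different source patch
            continue
        matches.append(divmod(j, w_t))  # flat index back to (row, col) in the target
    return matches

# Toy check: the target is the source shifted one patch to the right.
rng = np.random.default_rng(0)
src = rng.standard_normal((4, 4, 8))
tgt = np.roll(src, shift=1, axis=1)
print(match_points(src, tgt, [(0, 0), (2, 3)]))  # [(0, 1), (2, 0)]
```

The mutual check trades recall for precision: it drops matches where the target patch's own best match lies elsewhere, which filters many ambiguous points in repetitive or occluded regions.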

The backbone weights are released by Meta FAIR (arXiv:2603.22387) under the FAIR Research License. The task heads, class prototypes, and packaging were trained and assembled by phanerozoic.