TrackMAE Checkpoints

Pretrained and fine-tuned video checkpoints from TrackMAE: Video Representation Learning via Track Mask and Predict. The weights are PyTorch state dictionaries compatible with the official TrackMAE codebase.

Pretrained Checkpoints

| File | Backbone | Dataset | Epochs | Spatial Target |
| --- | --- | --- | --- | --- |
| pretrain/vit_b/videomae_cotracker_base_bs_512_mask_tube_800_lambda_0.25_up_14_clip.pth | ViT-B | Kinetics-400 | 800 | CLIP |
| pretrain/vit_l/videomae_cotracker_large_bs_1024_mask_tube_800_lambda_0.25_up_14_clip_k700.pth | ViT-L | Kinetics-700 | 800 | CLIP |

Fine-Tuned Checkpoints

| File | Backbone | Pretraining | Fine-tuning |
| --- | --- | --- | --- |
| finetune/vit_b/k400_finetuned_vit_b_with_k400_pretraining_trackmae.pth | ViT-B | Kinetics-400 | Kinetics-400 |
| finetune/vit_b/ssv2_finetuned_vit_b_with_k400_pretraining_trackmae.pth | ViT-B | Kinetics-400 | Something-Something V2 |
| finetune/vit_l/ssv2_finetuned_vit_l_with_k700_pretraining_trackmae.pth | ViT-L | Kinetics-700 | Something-Something V2 |
| TBA | ViT-L | Kinetics-700 | Kinetics-400 |

Usage

Download a pretrained checkpoint:

from huggingface_hub import hf_hub_download

checkpoint = hf_hub_download(
    repo_id="rvandeghen/TrackMAE",
    filename="pretrain/vit_b/videomae_cotracker_base_bs_512_mask_tube_800_lambda_0.25_up_14_clip.pth",
)
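
Since the weights are PyTorch state dictionaries, the downloaded file can be inspected before it is passed to the TrackMAE codebase. The following is a minimal sketch, not the official loading routine. The helper names `unwrap_state_dict` and `load_trackmae_weights` are hypothetical, and the idea that the weights may be nested under a `"model"` or `"module"` key is an assumption based on a common checkpoint convention, not something this card confirms.

```python
from typing import Any, Mapping


def unwrap_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the weight dictionary from a loaded checkpoint.

    Assumption: if the file wraps the weights, it does so under a
    "model" or "module" key. Otherwise the checkpoint is returned as-is.
    """
    for key in ("model", "module"):
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return inner
    return checkpoint


def load_trackmae_weights(path: str) -> Mapping[str, Any]:
    """Load a downloaded .pth file on CPU and return its state dictionary."""
    # Imported here so unwrap_state_dict stays usable without PyTorch installed.
    import torch

    # map_location="cpu" avoids requiring a GPU just to inspect the weights.
    # If the file stores extra Python objects, recent PyTorch versions may
    # need weights_only=False; only do that for files you trust.
    checkpoint = torch.load(path, map_location="cpu")
    return unwrap_state_dict(checkpoint)
```

With `checkpoint` from the download step above, `state_dict = load_trackmae_weights(checkpoint)` yields a mapping whose keys can be listed to check that they match the model definition before calling `load_state_dict`.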

Citation

@inproceedings{vandeghen2026trackmae,
  title     = {TrackMAE: Video Representation Learning via Track Mask and Predict},
  author    = {Vandeghen, Renaud and Thoker, Fida Mohammad and Van Droogenbroeck, Marc and Ghanem, Bernard},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}