# A2 Pretrained Policy
Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.
## Model Description
The model uses CLIP-based cross-attention to select grasp and place poses from candidates generated by GraspNet/PlaceNet: candidate poses are scored against the language goal, and the best-scoring pose is selected.
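To make the selection step concrete, here is a minimal sketch of language-conditioned candidate scoring with cross-attention. Everything in it (`CandidateScorer`, the attention direction, projecting CLIP features to the 768-dim hidden size) is an illustrative assumption, not the actual ViLGP3D implementation:

```python
import torch
import torch.nn as nn

class CandidateScorer(nn.Module):
    """Scores pose candidates against a language goal via cross-attention."""

    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score_head = nn.Linear(dim, 1)

    def forward(self, text_emb, cand_feats):
        # text_emb:   (B, 1, dim) pooled language-goal embedding (e.g., from CLIP)
        # cand_feats: (B, N, dim) per-candidate pose features (e.g., from GraspNet)
        attended, _ = self.attn(query=cand_feats, key=text_emb, value=text_emb)
        return self.score_head(attended).squeeze(-1)  # (B, N) candidate scores

# Hypothetical usage: choose the best of 32 grasp candidates
scorer = CandidateScorer()
text_emb = torch.randn(1, 1, 768)     # stand-in for a projected CLIP text feature
cand_feats = torch.randn(1, 32, 768)  # stand-in for grasp-candidate features
best = scorer(text_emb, cand_feats).argmax(dim=-1)
```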
## Files
- `sl_checkpoint_199.pth`: trained policy weights (ViLGP3D fusion network)
- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation
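To inspect the raw checkpoints without the policy package, a plain PyTorch load is usually enough. This sketch assumes both files were written with `torch.save()`; the card does not document their internal key layout:

```python
import torch

# Hypothetical direct load; on newer PyTorch you may need weights_only=False
# if the files contain non-tensor objects.
policy_ckpt = torch.load("sl_checkpoint_199.pth", map_location="cpu")
grasp_ckpt = torch.load("checkpoint-rs.tar", map_location="cpu")

# Checkpoints saved as dicts usually expose a state_dict plus metadata.
if isinstance(policy_ckpt, dict):
    print(list(policy_ckpt.keys())[:10])
```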
## Usage
### With `lerobot_policy_a2`
```python
from lerobot_policy_a2 import A2Policy

# Load pretrained model
policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

# Use for grasp prediction
action, info = policy.predict_grasp(
    color_images={"front": rgb_image},
    depth_images={"front": depth_image},
    point_cloud=point_cloud,
    lang_goal="grasp a round object",
)
```
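The exact input formats `predict_grasp` expects are not documented here; the placeholders below show one plausible layout (per-camera RGB and depth plus an XYZ point cloud). The shapes and dtypes are assumptions, not a verified API contract:

```python
import numpy as np

# Hypothetical input shapes for the call above.
rgb_image = np.zeros((480, 640, 3), dtype=np.uint8)    # HxWx3 RGB frame
depth_image = np.zeros((480, 640), dtype=np.float32)   # per-pixel depth (meters)
point_cloud = np.zeros((20000, 3), dtype=np.float32)   # XYZ points, camera frame
```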
### With LeRobot A2 Environment
```bash
# Data collection
A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_collect --policy a2 --hf_repo dgrachev/a2_pretrained --task grasp --num_episodes 100

# Benchmark evaluation
A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_benchmark --task grasp --policy a2 --hf_repo dgrachev/a2_pretrained
```
## Training Details
- Architecture: ViLGP3D with CLIP ViT-B/32 backbone
- Hidden dim: 768
- Attention heads: 8
- Position encoding: Rotary Position Encoding (RoPE); a minimal sketch follows this list
- Training data: Tabletop manipulation demonstrations
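For reference, here is a self-contained sketch of the half-split RoPE formulation. Whether ViLGP3D applies RoPE in exactly this form (rather than an interleaved variant, or over 3D positions) is an assumption:

```python
import torch

def rotary_embed(x, base=10000.0):
    """Apply Rotary Position Encoding to x of shape (seq_len, dim), dim even.

    Channel pairs are rotated by a position-dependent angle, so dot products
    between rotated queries and keys depend only on relative position.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Rotating q and k this way makes attention logits relative-position aware.
q = rotary_embed(torch.randn(16, 768))
k = rotary_embed(torch.randn(16, 768))
```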
## Related Resources
- `lerobot_policy_a2` - policy package
- `lerobot_grach0v` - LeRobot fork with the A2 environment
- `a2_assets` - environment assets
## Citation
```bibtex
@misc{a2_policy,
  author    = {Denis Grachev},
  title     = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/dgrachev/a2_pretrained}
}
```