# A2 Pretrained Policy
Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.
## Model Description
The model uses CLIP-based cross-attention to select grasp and place poses from candidates generated by GraspNet/PlaceNet: candidate poses are scored against the language goal, and the best-scoring pose is selected.
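To make the selection step concrete, here is a minimal sketch of language-conditioned candidate scoring with cross-attention. Everything in it (`CandidateScorer`, the attention direction, projecting CLIP features to the 768-dim hidden size) is an illustrative assumption, not the actual ViLGP3D implementation:

```python
import torch
import torch.nn as nn

class CandidateScorer(nn.Module):
    """Scores pose candidates against a language goal via cross-attention."""

    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score_head = nn.Linear(dim, 1)

    def forward(self, text_emb, cand_feats):
        # text_emb:   (B, 1, dim) pooled language-goal embedding (e.g., from CLIP)
        # cand_feats: (B, N, dim) per-candidate pose features (e.g., from GraspNet)
        attended, _ = self.attn(query=cand_feats, key=text_emb, value=text_emb)
        return self.score_head(attended).squeeze(-1)  # (B, N) candidate scores

# Hypothetical usage: choose the best of 32 grasp candidates
scorer = CandidateScorer()
text_emb = torch.randn(1, 1, 768)     # stand-in for a projected CLIP text feature
cand_feats = torch.randn(1, 32, 768)  # stand-in for grasp-candidate features
best = scorer(text_emb, cand_feats).argmax(dim=-1)
```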
## Files
- `sl_checkpoint_199.pth`: trained policy weights (ViLGP3D fusion network)
- `checkpoint-rs.tar`: GraspNet checkpoint for grasp candidate generation
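To inspect the raw checkpoints without the policy package, a plain PyTorch load is usually enough. This sketch assumes both files were written with `torch.save()`; the card does not document their internal key layout:

```python
import torch

# Hypothetical direct load; on newer PyTorch you may need weights_only=False
# if the files contain non-tensor objects.
policy_ckpt = torch.load("sl_checkpoint_199.pth", map_location="cpu")
grasp_ckpt = torch.load("checkpoint-rs.tar", map_location="cpu")

# Checkpoints saved as dicts usually expose a state_dict plus metadata.
if isinstance(policy_ckpt, dict):
    print(list(policy_ckpt.keys())[:10])
```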
## Usage
### With `lerobot_policy_a2`
```python
from lerobot_policy_a2 import A2Policy

# Load pretrained model
policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

# Use for grasp prediction
action, info = policy.predict_grasp(
    color_images={"front": rgb_image},
    depth_images={"front": depth_image},
    point_cloud=point_cloud,
    lang_goal="grasp a round object",
)
```
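The exact input formats `predict_grasp` expects are not documented here; the placeholders below show one plausible layout (per-camera RGB and depth plus an XYZ point cloud). The shapes and dtypes are assumptions, not a verified API contract:

```python
import numpy as np

# Hypothetical input shapes for the call above.
rgb_image = np.zeros((480, 640, 3), dtype=np.uint8)    # HxWx3 RGB frame
depth_image = np.zeros((480, 640), dtype=np.float32)   # per-pixel depth (meters)
point_cloud = np.zeros((20000, 3), dtype=np.float32)   # XYZ points, camera frame
```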
### With LeRobot A2 Environment
```bash
# Data collection
A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_collect --policy a2 --hf_repo dgrachev/a2_pretrained --task grasp --num_episodes 100

# Benchmark evaluation
A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_benchmark --task grasp --policy a2 --hf_repo dgrachev/a2_pretrained
```
## Training Details
- Architecture: ViLGP3D with CLIP ViT-B/32 backbone
- Hidden dim: 768
- Attention heads: 8
- Position encoding: Rotary Position Encoding (RoPE); a minimal sketch follows this list
- Training data: Tabletop manipulation demonstrations
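For reference, here is a self-contained sketch of the half-split RoPE formulation. Whether ViLGP3D applies RoPE in exactly this form (rather than an interleaved variant, or over 3D positions) is an assumption:

```python
import torch

def rotary_embed(x, base=10000.0):
    """Apply Rotary Position Encoding to x of shape (seq_len, dim), dim even.

    Channel pairs are rotated by a position-dependent angle, so dot products
    between rotated queries and keys depend only on relative position.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Rotating q and k this way makes attention logits relative-position aware.
q = rotary_embed(torch.randn(16, 768))
k = rotary_embed(torch.randn(16, 768))
```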
## Related Resources
- `lerobot_policy_a2` - policy package
- `lerobot_grach0v` - LeRobot fork with the A2 environment
- `a2_assets` - environment assets
## Citation
```bibtex
@misc{a2_policy,
  author    = {Denis Grachev},
  title     = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/dgrachev/a2_pretrained}
}
```