This summer, TRL leveled up for multimodal alignment:
- New VLM alignment methods (MPO, GRPO, GSPO)
- Extended RLOO & Online DPO for VLMs
- Native SFT support
- Ready-to-use training scripts
Blog post: https://huggingface.co/blog/trl-vlm-alignment