This collection hosts the MRPO model series introduced in the paper "Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning".
Data Mining and Information Systems Lab
dmis-lab
Recent Activity
Upvoted a paper about 14 hours ago: Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning
Published a model about 18 hours ago: dmis-lab/Qwen3-VL-8B-Instruct-MRPO
Updated a model about 18 hours ago: dmis-lab/Qwen3-VL-8B-Instruct-MRPO