Efficient Reasoning via Decoupled Reward Policy Optimization
Gang Li
ganglii
AI & ML interests: None yet
Recent Activity
updated a dataset 1 day ago: ganglii/pku-saferlhf-dpo
published a dataset 1 day ago: ganglii/pku-saferlhf-dpo
updated a dataset 1 day ago: ganglii/pku-saferlhf-sft
Organizations: None yet