arxiv:2502.04270
Yaqi Duan
duanyq
AI & ML interests
None yet
Recent Activity
upvoted a paper 2 months ago
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
upvoted a paper 12 months ago
PILAF: Optimal Human Preference Sampling for Reward Modeling
authored a paper 12 months ago
PILAF: Optimal Human Preference Sampling for Reward Modeling
Organizations
None yet