Process Reward Models that Think -- https://arxiv.org/abs/2504.16828
AI & ML interests
Factuality, reasoning, alignment, LLM applications
spaces 6
Running
Answer Convergence Early Stopping
Demo for EMNLP Paper "Answer Convergence as a Signal..."
Sleeping
FactRBench
View and analyze long-form factuality leaderboard
Running
3
ExpertLongBench
Leaderboard for ExpertLongBench
Sleeping
1
ManyICLBench
Leaderboard for ManyICLBench
Sleeping
MLRC-BENCH
Display model performance rankings
Sleeping
3
Factbench
View and compare language model factuality scores
datasets 12
launch/ExpertLongBench
Preview
• Updated
• 495 • 10
launch/thinkprm-1K-verification-cots
Viewer
• Updated
• 1k • 29 • 6
launch/ManyICLBench
Viewer
• Updated
• 66 • 389 • 1
launch/CMV
Viewer
• Updated
• 133 • 56
launch/FactRBench
Viewer
• Updated
• 1.06k • 71 • 1
launch/FactBench
Viewer
• Updated
• 1k • 102 • 3
launch/CLASH
Viewer
• Updated
• 345 • 33 • 4
launch/gov_report
Viewer
• Updated
• 58.4k • 306 • 10
launch/gov_report_qs
Viewer
• Updated
• 7.87k • 58 • 4
launch/open_question_type
Viewer
• Updated
• 4.96k • 30 • 6