multilingual-reward-bench

community

Recent Activity

seungone authored a paper 7 days ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

seungone authored a paper 7 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

amphora submitted a paper 8 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

models 0

None public yet

datasets 9

multilingual-reward-bench/m-arena-sampled

Viewer • Updated Mar 25, 2025 • 128 • 17

multilingual-reward-bench/m-arena

Viewer • Updated Mar 25, 2025 • 2.16k • 11

multilingual-reward-bench/MRB-Preview-1013

Viewer • Updated Oct 13, 2024 • 5.09k • 4

multilingual-reward-bench/code-en

Viewer • Updated Oct 12, 2024 • 80 • 18

multilingual-reward-bench/code-python

Viewer • Updated Oct 12, 2024 • 1.84k • 28

multilingual-reward-bench/safetyx1_prefx05_sky_x05_small

Viewer • Updated Oct 10, 2024 • 13.4k • 12

multilingual-reward-bench/safetyx2_prefx1_sky_x1_small

Viewer • Updated Oct 10, 2024 • 26.8k • 9

multilingual-reward-bench/safetyx2_prefx1_sky_x1

Viewer • Updated Oct 10, 2024 • 40.3k • 24

multilingual-reward-bench/open-assistant-sampled-new

Viewer • Updated Oct 7, 2024 • 444 • 123