Papers - a yenson-lau Collection

updated Aug 31, 2025

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5, 2025 • 133

Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 66

Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Paper • 2506.07240 • Published Jun 8, 2025 • 7

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11, 2025 • 55

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper • 2506.06941 • Published Jun 7, 2025 • 16

s3: You Don't Need That Much Data to Train a Search Agent via RL

Paper • 2505.14146 • Published May 20, 2025 • 19

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Paper • 2506.11763 • Published Jun 13, 2025 • 74

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Paper • 2506.14245 • Published Jun 17, 2025 • 45

Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17, 2025 • 30

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30, 2025 • 51

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79

Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

Paper • 2507.14241 • Published Jul 17, 2025 • 18

WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20, 2025 • 60

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 338

Replacing thinking with tool usage enables reasoning in small language models

Paper • 2507.05065 • Published Jul 7, 2025 • 16

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Paper • 2507.13158 • Published Jul 17, 2025 • 24

Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21, 2025 • 68

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14, 2025 • 29

MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published Aug 20, 2025 • 43

Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 90

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 160

AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Paper • 2508.16279 • Published Aug 22, 2025 • 57

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published Aug 22, 2025 • 4

Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Paper • 2508.18076 • Published Aug 25, 2025 • 6

Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation

Paper • 2410.20774 • Published Oct 28, 2024

Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published Aug 28, 2025 • 11