yenson-lau's Collections Papers
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper
• 2506.06395
• Published
• 133
Paper
• 2506.10910
• Published
• 66
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Paper
• 2506.07240
• Published
• 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper
• 2506.09991
• Published
• 55
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.06941
• Published
• 16
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper
• 2505.14146
• Published
• 19
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Paper
• 2506.11763
• Published
• 74
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Paper
• 2506.14245
• Published
• 45
Reasoning with Exploration: An Entropy Perspective
Paper
• 2506.14758
• Published
• 30
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Paper
• 2506.24119
• Published
• 51
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper
• 2507.00432
• Published
• 79
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models
Paper
• 2507.14241
• Published
• 18
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper
• 2507.15061
• Published
• 60
Paper
• 2505.09388
• Published
• 338
Replacing thinking with tool usage enables reasoning in small language models
Paper
• 2507.05065
• Published
• 16
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Paper
• 2507.13158
• Published
• 24
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published
• 68
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Paper
• 2508.10751
• Published
• 29
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
Paper
• 2508.14704
• Published
• 43
Deep Think with Confidence
Paper
• 2508.15260
• Published
• 90
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published
• 160
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper
• 2508.16279
• Published
• 57
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Paper
• 2508.16072
• Published
• 4
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Paper
• 2508.18076
• Published
• 6
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Paper
• 2410.20774
• Published
Provable Benefits of In-Tool Learning for Large Language Models
Paper
• 2508.20755
• Published
• 11