- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
  Paper • 2510.03222 • Published • 76
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
  Paper • 2510.05592 • Published • 109
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 511
- Multi-Agent Tool-Integrated Policy Optimization
  Paper • 2510.04678 • Published • 31
Jianhong Wang
hsvgbkhgbv
AI & ML interests
multi-agent reinforcement learning,
ad hoc teamwork,
robust reinforcement learning
Recent Activity
updated a collection 9 days ago
LLM papers
upvoted a paper 9 days ago
Beyond Language Modeling: An Exploration of Multimodal Pretraining
updated a collection 21 days ago
LLM papers
Organizations
None yet