Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
Paper • 2510.10959 • Published • 2
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
SEAD: Self-Evolving Agent for Multi-Turn Service Dialogue