sbhokare/Qwen2.5-7B-Instruct-ToolRL-PPO-Cold-Equal-Max Reinforcement Learning • 8B • Updated 4 days ago • 14 • 1
HamadaMayu/qwen3-4b-agent-trajectory-lora-marged-dbbench_v4 Text Generation • 4B • Updated 3 days ago • 1
HamadaMayu/qwen3-4b-agent-trajectory-lora-marged-alfworld_v5 Text Generation • 4B • Updated 3 days ago • 1
HamadaMayu/qwen3-4b-agent-trajectory-lora-marged-alfworld_v4 Text Generation • 4B • Updated 3 days ago • 1
HamadaMayu/qwen2.5-7b-agent-trajectory-lora-marged-dbbench_v4 Text Generation • 8B • Updated 3 days ago • 1
HamadaMayu/qwen2.5-7b-agent-trajectory-lora-marged-alfworld_v5 Text Generation • 8B • Updated 3 days ago • 1
HamadaMayu/qwen2.5-7b-agent-trajectory-lora-marged-alfworld_v4 Text Generation • 8B • Updated 2 days ago • 1
HamadaMayu/qwen3-4b-agent-trajectory-lora-marged-mixed_db_alf_1to1 Text Generation • 4B • Updated 2 days ago • 1