Article • cua-bench: A Framework for Benchmarking, Training Data, and RL Environments for Computer-Use Agents • 19 days ago
Article • AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems • 12 days ago
Paper • GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning • 2508.15690 • Published Aug 21, 2025