MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale Paper • 2604.04771 • Published 15 days ago • 120
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 28 days ago • 135
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 160
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published Jul 23, 2025 • 37
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15, 2025 • 64
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published Jul 5, 2025 • 54
OCRBenchv2 Leaderboard Space • Display OCRBench leaderboard for text recognition models • 46
Generating Human-level Text with Contrastive Search in Transformers Article • Nov 8, 2022 • 17