Article **ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models?** 7 days ago • 17
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published 24 days ago • 60
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs Paper • 2601.17058 • Published Jan 22 • 188
GutenOCR: A Grounded Vision-Language Front-End for Documents Paper • 2601.14490 • Published Jan 20 • 37
Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Jan 19 • 84
HAI-DEF Concept Apps Collection Collection of concept apps built around HAI-DEF open models/libraries to inspire the community. Learn more at http://goo.gle/hai-def • 7 items • Updated Dec 26, 2025 • 49
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 9 items • Updated Jan 14 • 441
Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22, 2024 • 128
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Nov 3, 2025 • 58
Article System Prompt Learning: Teaching LLMs to Learn Problem-Solving Strategies from Experience Jun 2, 2025 • 24
Prompt-MII: Meta-Learning Instruction Induction for LLMs Paper • 2510.16932 • Published Oct 19, 2025 • 7