Docling

Enterprise

community

https://docling.ai

docling-project

docling

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

PeterWJStaar updated a collection about 1 hour ago

Table Models

PeterWJStaar

updated a collection about 1 hour ago

Table Models

Collection

5 items • Updated about 1 hour ago

Saidgurbuz

updated a model 3 days ago

docling-project/ScreenParser

Object Detection • Updated 3 days ago • 48

Saidgurbuz

updated a dataset 3 days ago

docling-project/screenparse

Viewer • Updated 3 days ago • 771k • 1.48k

Saidgurbuz

updated a model 3 days ago

docling-project/ScreenVLM

Image-Text-to-Text • 0.3B • Updated 3 days ago • 117

Saidgurbuz

published a dataset 4 days ago

docling-project/screenparse

Viewer • Updated 3 days ago • 771k • 1.48k

Saidgurbuz

published 2 models 4 days ago

docling-project/ScreenParser

Object Detection • Updated 3 days ago • 48

docling-project/ScreenVLM

Image-Text-to-Text • 0.3B • Updated 3 days ago • 117

Saidgurbuz

authored a paper about 1 month ago

Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

Paper • 2602.14276 • Published Feb 15 • 1

lucas-morin

authored 6 papers 7 months ago

MolGrapher: Graph-based Visual Recognition of Chemical Structures

Paper • 2308.12234 • Published Aug 23, 2023

Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

Paper • 2501.17887 • Published Jan 27, 2025 • 1

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 158

authored a paper 7 months ago

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20, 2025 • 80

andito

posted an update 7 months ago

Post

2660

Finally, our new paper is out! "FineVision: Open Data Is All You Need"! 🥳
FineVision: Open Data Is All You Need (2510.17269)

If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible.
We wanted to change that.

FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.

In the paper, we share how we built it:
🔍 finding and cleaning data at scale
🧹 removing excessive duplicates across sources
🤗 decontaminating against 66 public benchmarks
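
The last two steps can be illustrated with a minimal sketch. It assumes exact byte-level matching via SHA-256, which is a deliberate simplification and not necessarily what the paper does; real image pipelines usually need near-duplicate detection, and this sketch drops every exact repeat rather than only "excessive" ones. The names `fingerprint` and `dedup_and_decontaminate` and the toy byte strings are hypothetical:

```python
import hashlib
from typing import Iterable


def fingerprint(image_bytes: bytes) -> str:
    """Exact-content fingerprint; identical bytes give identical digests."""
    return hashlib.sha256(image_bytes).hexdigest()


def dedup_and_decontaminate(
    samples: Iterable[bytes], benchmark_images: Iterable[bytes]
) -> list[bytes]:
    """Drop samples that repeat an earlier sample or match a benchmark image."""
    blocked = {fingerprint(b) for b in benchmark_images}
    seen: set[str] = set()
    kept: list[bytes] = []
    for sample in samples:
        digest = fingerprint(sample)
        if digest in blocked or digest in seen:
            continue  # benchmark leak or exact duplicate
        seen.add(digest)
        kept.append(sample)
    return kept


# Toy data standing in for raw image bytes (assumption for illustration).
train = [b"cat", b"dog", b"cat", b"test-image"]
benchmarks = [b"test-image"]
cleaned = dedup_and_decontaminate(train, benchmarks)
print(cleaned)
```

Keeping the benchmark fingerprints in a set makes each lookup constant time, so the filter is a single pass over the training samples.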

My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets.
NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and our results confirm it: models trained on FineVision consistently outperform those trained on any other open dataset on 11 benchmarks!

🎉 To celebrate the paper, I’m also releasing a concatenated and shuffled version of the full dataset! 👉 HuggingFaceM4/FineVision_full_shuffled

It’s ready to stream, so you can start training your own models right away:

# Stream the dataset instead of downloading it in full
from datasets import load_dataset
d = load_dataset("HuggingFaceM4/FineVision_full_shuffled", split="train", streaming=True)
# Fetch and print the first sample
print(next(iter(d)))
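
A streamed dataset is consumed as an iterable, so a handful of samples can be inspected without pulling the rest. A minimal sketch, assuming only that the stream is a Python iterable; `take_samples` and the stand-in generator `fake_stream` are hypothetical names for illustration and are not part of `datasets`:

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def take_samples(stream: Iterable[Any], n: int) -> list[Any]:
    """Return at most the first n items, stopping as soon as n are read."""
    return list(islice(stream, n))


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for a streamed dataset; yields dummy records forever."""
    i = 0
    while True:
        yield {"id": i}
        i += 1


first_three = take_samples(fake_stream(), 3)
print(first_three)
```

With the real dataset, `take_samples(d, 3)` would fetch three records over the network; the streaming dataset object also offers its own `take(n)` method for the same purpose.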

A big shoutout to the first authors: Luis Wiedmann and Orr Zohar. They are rockstars!

Team members 23