LeMaterial

Team

non-profit

https://www.lematerial.org

AI & ML interests

AI4Science

Recent Activity

cgeorgiaw updated a Space about 12 hours ago

LeMaterial/LeMat-GenBench

cgeorgiaw published a Space about 1 month ago

LeMaterial/LeMat-GenBench

cgeorgiaw updated a Space about 1 month ago

LeMaterial/LeMat-GenBench

cgeorgiaw

updated a Space about 12 hours ago

Lemat Bench

View and submit to a materials generation benchmark leaderboard

cgeorgiaw

published a Space about 1 month ago

Lemat Bench

View and submit to a materials generation benchmark leaderboard

cgeorgiaw

posted an update about 2 months ago

🚀🚀🚀Huge biotech data drop today🚀🚀🚀

The largest drug-target dataset ever created was just released on Hugging Face—and it's still growing...

EvE Bio is further updating the dataset every 8 weeks. Drug development dream.

Read the blog: https://huggingface.co/blog/hugging-science/eve-bio-mapping-the-pharmone-drug-interaction
Play with the data: eve-bio/drug-target-activity

thomwolf

authored a paper 3 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 119

lvwerra

authored a paper 3 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 36

cgeorgiaw

posted an update 4 months ago

🚀🚀🚀 The largest ever dataset of co-folded 3D protein-ligand structures just dropped on HF!!

Meet SAIR (Structurally Augmented IC₅₀ Repository): 5M+ AI-generated complexes with experimentally measured drug potency data from SandboxAQ. 🚀🚀🚀

Check it out and explore here: SandboxAQ/SAIR

cgeorgiaw

posted an update 5 months ago

Just dropped the most influential materials science data of the year so far! Check it out :)))

cgeorgiaw/WyFormer-Symmetric-Crystals

IAMJB

authored a paper 6 months ago

SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning

Paper • 2506.21355 • Published Jun 26, 2025 • 10

thomwolf

authored a paper 6 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 75

lvwerra

authored a paper 6 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 75

cgeorgiaw

posted an update 7 months ago

Huge new bio datasets just dropped!!!

Check them out at ginkgo-datapoints
Read the blog for more info: https://huggingface.co/blog/cgeorgiaw/gdp

cgeorgiaw

posted an update 7 months ago

Snooping on HF is the best because sometimes you just discover that someone (in this case, Earth Species Project) is about to drop terabytes of sick data (high-quality animal sounds)...

EarthSpeciesProject/NatureLM-audio-training

thomwolf

authored a paper 7 months ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 147

cgeorgiaw

posted an update 7 months ago

Just dropped two bigger physics datasets (both on photonics)!

NUMBA 1: SIB-CL
A collection of Surrogate- and Invariance-Boosted Contrastive Learning (SIB-CL) datasets for two scientific problems:
- PhC2D: 2D photonic crystal density-of-states (DOS) and bandstructure data.
- TISE: 3D time-independent Schrödinger equation eigenvalue and eigenvector solutions.

NUMBA 2: 2D Photonic Topology
Symmetry-driven analysis of 2D photonic crystals: 10k random unit cells across 11 symmetries, 2 polarizations, 5 contrasts. Includes time-reversal breaking cases for 4 symmetries at high contrast.

Check them out: cgeorgiaw/sib-cl & cgeorgiaw/2d-photonic-topology

cgeorgiaw

authored 4 papers 8 months ago

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Paper • 2410.07436 • Published Oct 9, 2024 • 1

Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis

Paper • 2504.10166 • Published Apr 14, 2025 • 2

PSyDUCK: Training-Free Steganography for Latent Diffusion

Paper • 2501.19172 • Published Jan 31, 2025 • 1

LLM-Consensus: Multi-Agent Debate for Visual Misinformation Detection

Paper • 2410.20140 • Published Oct 26, 2024 • 1

clefourrier

posted an update 8 months ago

Always surprised that so few people actually read the FineTasks blog post on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

A high-signal eval tells you precisely, during training, how well and what your model is learning, so you can discard bad runs, bad samplings, and so on!

The blog covers prompt choice, metrics, and datasets in depth, across languages and capabilities, and my fave section is "which properties should evals have"👌
(so you know how to select the best evals for your use case)

Blog: HuggingFaceFW/blogpost-fine-tasks

thomwolf

posted an update 9 months ago

If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website, and the Pollen community and people here on the Hub at pollen-robotics.

We're so excited to build and share more open-source robots with the world in the coming months!

Team members 24