Β·
AI & ML interests: None yet
Recent Activity
Reacted to robtacconelli's post with 🤯 about 21 hours ago:

🧬 Midicoth: diffusion-based lossless compression – no neural net, no GPU, no training data
What if reverse diffusion could compress text – without a neural network?
Midicoth brings score-based denoising into classical compression. It treats prior smoothing as forward noise and reverses it with Tweedie's formula on a binary tree – 3 denoising steps, James-Stein shrinkage, applied after all model blending. ~2,000 lines of C, single CPU core.
Beats every dictionary compressor we tested:
enwik8 (100 MB) – 1.753 bpb (−11.9% vs xz, −15% vs Brotli, −24.5% vs bzip2)
alice29.txt – 2.119 bpb (−16.9% vs xz)
Outperforms xz, zstd, Brotli, bzip2, and gzip on all inputs
PAQ/CMIX still win with hundreds of models + LSTMs. LLM compressors win with pre-trained knowledge. Midicoth closes the gap with pure statistics – no mixer, no gradient descent, just counting.
The Tweedie denoising layer adds 2.3–2.7% on every file tested – the most consistent component in the ablation. Adding SSE or logistic mixers made things worse. In the online setting, count-based beats gradient-based.
No external dependencies. Fully deterministic. Bit-exact encode/decode. ~60 KB/s throughput.
Code: https://github.com/robtacconelli/midicoth
Paper: https://huggingface.co/papers/2603.08771
Space: https://huggingface.co/spaces/robtacconelli/midicoth
If you ever wondered whether diffusion ideas belong in data compression – here's proof they do. Feedback appreciated!
Organizations
Models
cnmoro/nomic-embed-text-v2-moe-distilled-high-quality • Feature Extraction • Updated • 45.8k • 4
cnmoro/Qwen2.5-0.5B-Portuguese-v1 • Text Generation • 0.5B • Updated • 11 • 5
cnmoro/LFM2-PTBR-imatrix-Quants • 3B • Updated • 30
3B • Updated • 3
Feature Extraction • 16.6M • Updated • 1
cnmoro/low-dimension-static-model • Updated • 2
cnmoro/custom-model2vec-tokenlearn-medium • Updated • 1
cnmoro/custom-model2vec-tokenlearn-small • Updated • 3 • 1
cnmoro/Qwen3-16B-A3B-REAP-PTBR • 16B • Updated • 1 • 1
cnmoro/Qwen3-7B-A3B-REAP-PTBR • 7B • Updated • 4 • 2
cnmoro/Qwen3-7B-A3B-REAP-PTBR-Q4_K_M-GGUF • 7B • Updated • 74
cnmoro/Qwen3-16B-A3B-REAP-PTBR-SuperExp • 16B • Updated • 2 • 1
cnmoro/distilbert-portuguese-tokenizer-lower-greedy • Updated
cnmoro/bert-hash-femto-mlm • Fill-Mask • 1.8M • Updated • 1
Text Generation • 0.4B • Updated
cnmoro/static-nomic-distilled-vocab-quantized • Updated
cnmoro/gpt-oss-20b-tokenizer-optional-reasoning • Updated • 3
cnmoro/gliclass-edge-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-large-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-base-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-modern-large-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-x-base-onnx • Text Classification • Updated • 1
Text Classification • 0.1B • Updated • 2
cnmoro/portuguese-nomic-embed-text-v2-moe • Sentence Similarity • 0.3B • Updated • 2
Sentence Similarity • 0.3B • Updated • 137 • 4
cnmoro/portuguese-en-bge-m3 • Sentence Similarity • 0.4B • Updated
cnmoro/Gemma-3-Gaia-PT-BR-4b-it-Q8_0-GGUF • 4B • Updated • 5
cnmoro/Gemma-3-Gaia-PT-BR-4b-it-Q4_K_M-GGUF • 4B • Updated • 8 • 1
cnmoro/pylate-ibm-granite-107m-multilingual • Sentence Similarity • 0.1B • Updated
cnmoro/static-nomic-eng-ptbr-tiny • Feature Extraction • Updated • 473 • 2