Β·
AI & ML interests: None yet
Recent Activity
Reacted to robtacconelli's post with 🤯 about 21 hours ago:

🧬 Midicoth: diffusion-based lossless compression – no neural net, no GPU, no training data
What if reverse diffusion could compress text – without a neural network?
Midicoth brings score-based denoising into classical compression. It treats prior smoothing as forward noise and reverses it with Tweedie's formula on a binary tree – 3 denoising steps, James-Stein shrinkage, applied after all model blending. ~2,000 lines of C, single CPU core.
Beats every dictionary compressor we tested:
enwik8 (100 MB) – 1.753 bpb (−11.9% vs xz, −15% vs Brotli, −24.5% vs bzip2)
alice29.txt – 2.119 bpb (−16.9% vs xz)
Outperforms xz, zstd, Brotli, bzip2, and gzip on all inputs
PAQ/CMIX still win with hundreds of models + LSTMs. LLM compressors win with pre-trained knowledge. Midicoth closes the gap with pure statistics – no mixer, no gradient descent, just counting.
The Tweedie denoising layer adds 2.3–2.7% on every file tested – the most consistent component in the ablation. Adding SSE or logistic mixers made things worse. In the online setting, count-based beats gradient-based.
No external dependencies. Fully deterministic. Bit-exact encode/decode. ~60 KB/s throughput.
Code: https://github.com/robtacconelli/midicoth
Paper: https://huggingface.co/papers/2603.08771
Space: https://huggingface.co/spaces/robtacconelli/midicoth
If you ever wondered whether diffusion ideas belong in data compression – here's proof they do. Feedback appreciated!
Organizations
Models
cnmoro/nomic-embed-text-v2-moe-distilled-high-quality • Feature Extraction • Updated • 45.8k • 4
cnmoro/Qwen2.5-0.5B-Portuguese-v1 • Text Generation • 0.5B • Updated • 11 • 5
cnmoro/LFM2-PTBR-imatrix-Quants • 3B • Updated • 30
3B • Updated • 3
Feature Extraction • 16.6M • Updated • 1
cnmoro/low-dimension-static-model • Updated • 2
cnmoro/custom-model2vec-tokenlearn-medium • Updated • 1
cnmoro/custom-model2vec-tokenlearn-small • Updated • 3 • 1
cnmoro/Qwen3-16B-A3B-REAP-PTBR • 16B • Updated • 1 • 1
cnmoro/Qwen3-7B-A3B-REAP-PTBR • 7B • Updated • 4 • 2
cnmoro/Qwen3-7B-A3B-REAP-PTBR-Q4_K_M-GGUF • 7B • Updated • 74
cnmoro/Qwen3-16B-A3B-REAP-PTBR-SuperExp • 16B • Updated • 2 • 1
cnmoro/distilbert-portuguese-tokenizer-lower-greedy • Updated
cnmoro/bert-hash-femto-mlm • Fill-Mask • 1.8M • Updated • 1
Text Generation • 0.4B • Updated
cnmoro/static-nomic-distilled-vocab-quantized • Updated
cnmoro/gpt-oss-20b-tokenizer-optional-reasoning • Updated • 3
cnmoro/gliclass-edge-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-large-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-base-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-modern-large-v3.0-onnx • Text Classification • Updated • 1
cnmoro/gliclass-x-base-onnx • Text Classification • Updated • 1
Text Classification • 0.1B • Updated • 2
cnmoro/portuguese-nomic-embed-text-v2-moe • Sentence Similarity • 0.3B • Updated • 2
Sentence Similarity • 0.3B • Updated • 137 • 4
cnmoro/portuguese-en-bge-m3 • Sentence Similarity • 0.4B • Updated
cnmoro/Gemma-3-Gaia-PT-BR-4b-it-Q8_0-GGUF • 4B • Updated • 5
cnmoro/Gemma-3-Gaia-PT-BR-4b-it-Q4_K_M-GGUF • 4B • Updated • 8 • 1
cnmoro/pylate-ibm-granite-107m-multilingual • Sentence Similarity • 0.1B • Updated
cnmoro/static-nomic-eng-ptbr-tiny • Feature Extraction • Updated • 473 • 2