DistilQwen
H100 BF16 training. Token-level knowledge distillation (TKD) from 30B teachers into 1.7B and 0.6B students, using three teacher models. 15 models plus the DISC paper. 10K+ downloads. DOI: 10.57967/hf/8165 & 10.57967/hf/8194
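The collection's students are produced by token-level knowledge distillation (TKD): at each token position, the student is trained to match the teacher's output distribution. A minimal sketch of the per-token loss (temperature-softened KL divergence) in plain Python; the function names and temperature value are illustrative, not the collection's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def tkd_loss(teacher_logits, student_logits, temperature=2.0):
    """Per-token KD loss: KL(teacher || student) on softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across
    temperatures (standard practice in distillation setups).
    """
    p = softmax(teacher_logits, temperature)   # teacher distribution
    q = softmax(student_logits, temperature)   # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Identical logits give zero loss; diverging logits give a positive loss.
print(tkd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
print(tkd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))  # positive
```

In a real training loop this loss is averaged over all token positions in a batch and typically mixed with a standard cross-entropy term on the ground-truth tokens.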
Text Generation • 2B • Updated • 2.2k • 1
Note: 30B teacher, 1.7B student. Proof-weighted KD at 2.25× on reasoning.
reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF
Text Generation • 2B • Updated • 854
Note: Most downloaded GGUF in the collection. CPU-friendly.
reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT
2B • Updated • 177
Note: Second stage: distil → SFT on instruction-following data.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B
Text Generation • 0.8B • Updated • 2.09k
Note: 0.6B student. Shows the methodology holds even at extreme compression ratios.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
Text Generation • 0.8B • Updated • 2.16k • 2
Note: Higher-entropy teacher distributions → richer student representations.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF
Text Generation • 0.8B • Updated • 864
Note: Thinking-SFT at 0.6B, quantized. Runs on anything.
reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT
Text Generation • 2B • Updated • 2.23k • 1
Note: Coder teacher. Structured decomposition → STEM derivation.
reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT-GGUF
Text Generation • 2B • Updated • 1.28k • 1
Note: Coder pipeline, quantized. F16/Q4/Q5/Q8 variants.
reaperdoesntknow/DistilQwen3-1.7B-uncensored
Text Generation • 2B • Updated • 1.97k
Note: Starting point for custom SFT pipelines.
reaperdoesntknow/TopologicalQwen
Text Generation • 2B • Updated • 2.41k
Note: The model that demonstrated ghost imprinting: literary style emerging from physics training data.
reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored
2B • Updated • 226
Note: Named for the Discrepancy Calculus influence on the training signal.
reaperdoesntknow/Disctil-Qwen3-1.7B
Text Generation • 2B • Updated • 1.89k
Note: Structural refinement via the DISC operator before the TKD stage.
reaperdoesntknow/DistilQwen3-1.7B-uncensored-GGUF
2B • Updated • 1.5k • 1
Note: Edge deployment for research. No alignment filtering. Apache 2.0.
reaperdoesntknow/Qwen3-1.7B-Thinking-Distil
Text Generation • 2B • Updated • 2.43k • 1
Note: Extended deliberation distilled from the 30B-Thinking teacher into a 1.7B student.
reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT
Text Generation • 1B • Updated • 913
Note: Proves TKD works across architecture families, not just within Qwen.
reaperdoesntknow/Discrepancy_Calculus
Updated
Note: Continuous Thought Dynamics, the mathematical backbone of DualMind.
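One note above attributes richer student representations to the higher entropy of the Thinking teacher's output distributions. A minimal, self-contained illustration of the mechanism: raising the softmax temperature flattens the distribution the student is trained to match, increasing its Shannon entropy. The logits and temperature values here are illustrative only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked teacher distribution becomes flatter (higher entropy) as the
# distillation temperature rises, carrying more "dark knowledge" about
# the relative plausibility of non-argmax tokens.
logits = [5.0, 2.0, 1.0, 0.5]
for t in (1.0, 2.0, 4.0):
    print(f"T={t}: entropy={entropy(softmax(logits, t)):.3f} nats")
```

The entropy values increase monotonically with temperature for this example, which is the property the note relies on: a softer target distribution penalizes the student for getting the ranking of unlikely tokens wrong, not just the top token.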