
🧠 AxonAI MX4 2.0

Reasoning-First Language Model · 4B Parameters · Chain-of-Thought Native


**Released by Daffaadityp**


🔍 Overview

AxonAI MX4 2.0 is a state-of-the-art, 4-billion-parameter causal language model built for deep reasoning, structured problem-solving, and intelligent text generation. It is the second major iteration of the AxonAI MX series — an independent research line conducted under AxonLabs, a student-led AI initiative founded by Daffa Aditya Pratama at SMKN 26 Jakarta, Indonesia.

This model is a fully merged variant, meaning the DoRA adapter weights have been permanently folded into the base model. No adapter loading is required at inference time. The result is a single, self-contained, deployment-ready LLM that punches well above its weight class through the power of high-quality reasoning distillation.

In short: AxonAI MX4 2.0 is what happens when you take a lean 4B-parameter architecture and teach it to think before it speaks — using reasoning traces distilled from one of the most capable AI systems available.


✨ Key Highlights

| Property | Detail |
|---|---|
| Model Family | AxonAI MX Series |
| Version | 2.0 (Merged) |
| Base Architecture | Qwen3-4B (Pure Transformer) |
| Parameter Count | ~4 Billion |
| Training Method | Reasoning Distillation via DoRA + NEFTune |
| Distillation Source | Claude Opus 4.6 Reasoning Traces |
| Training Dataset | Crownelius/Opus-4.6-Reasoning-3300x |
| Precision | torch.float16 |
| Format | SafeTensors (5 shards) |
| Adapter Source | Daffaadityp/AxonAI-MX4-2.0-Adapter |
| Reasoning Token | Native <think>...</think> CoT block |
| License | Apache 2.0 |
| Creator | Daffa Aditya Pratama / AxonLabs |

🧬 The Secret Sauce — What Makes This Model Different

1. Reasoning Distillation from Claude Opus 4.6

AxonAI MX4 2.0 was not simply fine-tuned on generic instruction data. It was trained on ~3,300 high-quality reasoning traces generated by Claude Opus 4.6 — Anthropic's most capable reasoning model at the time of training. This process, known as Knowledge Distillation, transfers the structured reasoning style of a larger teacher model into a far more compact student model.

The practical effect is significant: the model has internalized patterns of logical decomposition, step-by-step inference, and self-verification that are characteristic of frontier-scale systems — all within a 4B parameter footprint.
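Conceptually, this kind of distillation is supervised fine-tuning on teacher outputs: each teacher trace is packed into a chat-style example whose target contains the reasoning inside <think> tags followed by the final answer. The sketch below is illustrative only; `to_training_example` is a made-up helper, and the actual schema of the Opus-4.6-Reasoning-3300x dataset may differ.

```python
def to_training_example(question: str, teacher_reasoning: str, teacher_answer: str) -> dict:
    """Pack one teacher trace into a chat-style SFT example (illustrative sketch)."""
    target = f"<think>\n{teacher_reasoning}\n</think>\n\n{teacher_answer}"
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": target},
        ]
    }

ex = to_training_example("What is 2 + 2?", "2 + 2 = 4.", "The answer is 4.")
print(ex["messages"][1]["content"].startswith("<think>"))  # -> True
```

Training on targets of this shape is what teaches the student to emit the reasoning block before the answer, rather than bolting it on at inference time.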

2. Native Chain-of-Thought via <think> Tokens

AxonAI MX4 2.0 uses a native Chain-of-Thought (CoT) mechanism through a dedicated <think>...</think> token pair. Before producing its final answer, the model is prompted to reason explicitly inside this block. This is not a post-processing trick — the model was trained end-to-end to generate this reasoning trace as a fundamental part of its output process.

This makes the model especially capable on tasks that require:

  • 🔢 Multi-step mathematical reasoning
  • 💻 Code generation and debugging
  • 🧩 Logical inference and puzzle solving
  • 📖 Complex instruction following

3. DoRA — A More Expressive Fine-tuning Method

Training used DoRA (Weight-Decomposed Low-Rank Adaptation), a technique published as an Oral paper at ICML 2024 by NVIDIA Research. Unlike standard LoRA, which applies a single low-rank update, DoRA decomposes the pre-trained weights into magnitude and direction components and fine-tunes them separately. This allows the model to make more nuanced parameter adjustments, closing the gap between LoRA fine-tuning and full fine-tuning, with no additional inference cost once the adapter is merged.
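As a rough illustration of the decomposition, here is a toy pure-Python sketch of the merged form, assuming the column-norm decomposition described in the DoRA paper; `dora_merge`, the matrices, and the magnitude vector are made up for this example and are not the actual training code.

```python
import math

def column_norms(W):
    """L2 norm of each column of a small matrix given as a list of rows."""
    rows, cols = len(W), len(W[0])
    return [math.sqrt(sum(W[r][c] ** 2 for r in range(rows))) for c in range(cols)]

def dora_merge(W0, delta, m):
    """Effective weight W' = m * (W0 + delta) / ||W0 + delta||_col.
    W0: pre-trained weights, delta: low-rank update B @ A, m: learned magnitudes."""
    rows, cols = len(W0), len(W0[0])
    V = [[W0[r][c] + delta[r][c] for c in range(cols)] for r in range(rows)]
    norms = column_norms(V)
    return [[m[c] * V[r][c] / norms[c] for c in range(cols)] for r in range(rows)]

W0 = [[3.0, 0.0], [4.0, 2.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
m0 = column_norms(W0)  # magnitudes are initialised to the pre-trained column norms
# With a zero low-rank update and untouched magnitudes, the merge is a no-op:
print(dora_merge(W0, zero, m0))  # -> [[3.0, 0.0], [4.0, 2.0]]
```

During fine-tuning only `delta` and `m` are trained; because `dora_merge` produces an ordinary weight matrix, folding it back into the checkpoint (as was done for this release) leaves inference unchanged.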

4. NEFTune Noise Injection for Generalization

During training, NEFTune (Noisy Embedding Fine-Tuning) was applied. This technique, published at ICLR 2024, injects controlled random noise into embedding vectors during the forward pass. It acts as a regularizer, preventing the model from overfitting to the narrow patterns of the instruction dataset. The result is a model that generalizes more robustly and produces richer, more coherent responses — a "free lunch" for fine-tuning quality, as described by the original authors.
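The mechanism is simple to sketch: the paper scales uniform noise by alpha / sqrt(L * d), where L is the sequence length and d the embedding dimension. The snippet below is a minimal pure-Python illustration, not the training implementation; `neftune_noise` and the toy matrix are hypothetical.

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0):
    """Add uniform noise scaled by alpha / sqrt(L * d) to token embeddings."""
    L = len(embeddings)       # sequence length
    d = len(embeddings[0])    # embedding dimension
    scale = alpha / math.sqrt(L * d)
    return [[x + random.uniform(-1.0, 1.0) * scale for x in row] for row in embeddings]

emb = [[0.0] * 8 for _ in range(4)]   # toy 4-token, 8-dim embedding matrix
noisy = neftune_noise(emb, alpha=5.0)
bound = 5.0 / math.sqrt(4 * 8)
print(all(abs(x) <= bound for row in noisy for x in row))  # -> True
```

The noise is applied only during the training forward pass; at inference time the embeddings are used as-is, so NEFTune adds no runtime cost.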


🏗️ Technical Architecture

Base:          unsloth/Qwen3-4B (Pure Decoder-only Transformer)
Architecture:  32 Transformer layers, Multi-Head Attention, RoPE
Fine-tuning:   DoRA (ICML 2024 Oral) via Unsloth
Regularizer:   NEFTune Noise Injection (ICLR 2024)
Adapter:       Daffaadityp/AxonAI-MX4-2.0-Adapter (merged into weights)
Precision:     torch.float16
Format:        SafeTensors — 5 shards
CoT Token:     <think> ... </think> (native, trained end-to-end)

The training pipeline was orchestrated using Unsloth, a highly optimized fine-tuning library that significantly reduces VRAM usage and accelerates training throughput, making this level of research accessible on consumer-grade hardware.


🚀 Quickstart — Running Inference

Install the dependencies if you have not already:

pip install transformers torch accelerate

Then run inference with the following script:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# --- Load model and tokenizer ---
model_id = "Daffaadityp/AxonAI-MX4-2.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# --- Define your conversation ---
# The system prompt instructs the model to use its <think> block before answering.
messages = [
    {
        "role": "system",
        "content": (
            "You are AxonAI, an intelligent reasoning assistant. "
            "Before providing your final answer, always reason through the problem "
            "carefully inside a <think>...</think> block. "
            "Be thorough in your thinking, then provide a clean and concise final answer."
        ),
    },
    {
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours. "
                   "If it then increases its speed by 20%, "
                   "how long will it take to travel the next 180 km?",
    },
]

# --- Apply chat template and tokenize ---
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,          # also returns the attention mask
    return_tensors="pt",
).to(model.device)

# --- Generate response ---
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1,
    )

# --- Decode and print ---
generated_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=False)
print(response)

📤 Expected Output Format

The model will first emit a <think> block containing its internal reasoning chain, followed by its structured final answer:

<think>
The train's initial speed is 120 km / 1.5 hours = 80 km/h.
An increase of 20% gives a new speed of 80 × 1.2 = 96 km/h.
Time to travel 180 km at 96 km/h = 180 / 96 = 1.875 hours.
Converting: 1.875 hours = 1 hour and 52.5 minutes.
</think>

The train's initial speed is **80 km/h**. After a 20% increase, its new speed
becomes **96 km/h**. At this speed, traveling 180 km will take exactly
**1 hour and 52.5 minutes** (or 1.875 hours).

Tip: If you want to suppress the <think> block from the final output shown to users, simply split the response string on </think> and take the second part.
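That split can be done in a couple of lines; `strip_think` is a hypothetical helper name, and the sketch assumes the model closes its reasoning block with a literal </think> tag as described above.

```python
def strip_think(response: str) -> str:
    """Return only the user-facing answer, dropping the <think> block if present."""
    _, sep, answer = response.partition("</think>")
    return answer.strip() if sep else response.strip()

sample = "<think>80 km/h * 1.2 = 96 km/h.</think>\nThe answer is 96 km/h."
print(strip_think(sample))  # -> The answer is 96 km/h.
```

Using `partition` rather than `split` keeps the helper safe on responses where the model (rarely) omits the reasoning block entirely.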


⚙️ Recommended Generation Parameters

| Parameter | Recommended Value | Notes |
|---|---|---|
| max_new_tokens | 512–2048 | Higher for complex multi-step problems |
| temperature | 0.6–0.8 | Lower values yield more deterministic answers |
| top_p | 0.9 | Nucleus sampling threshold |
| repetition_penalty | 1.1–1.15 | Strongly recommended to avoid looping |
| do_sample | True | Required when using temperature / top_p |

🎯 Intended Use Cases

This model is well-suited for the following tasks:

  • Reasoning & Logic: Multi-step word problems, logical deduction, argument analysis
  • Mathematics: Arithmetic, algebra, and structured quantitative reasoning
  • Coding Assistance: Code explanation, bug analysis, algorithm walkthroughs
  • Instruction Following: Complex, multi-part instructions requiring sequential execution
  • Educational Tools: Step-by-step tutoring, explanation generation, exam preparation
  • Indonesian Language Applications: The model retains multilingual capability from its Qwen3 base

⚠️ Limitations & Responsible Use

  • This is a research model produced by a student researcher. It has not been subject to enterprise-grade safety evaluation or red-teaming.
  • The model may occasionally produce incorrect reasoning inside its <think> block. Always verify critical outputs, especially for mathematical or factual claims.
  • Performance on very long-context tasks (>4K tokens) may degrade compared to the base model.
  • This model is not intended for use in medical, legal, financial, or other high-stakes decision-making contexts without human oversight.
  • The model is provided under the Apache 2.0 license. Users are responsible for ensuring their use complies with applicable laws and regulations.

🇮🇩 For Indonesian Developers

Hello everyone! I'm Daffa Aditya Pratama. I built this model as part of an independent research project under AxonLabs, a small AI initiative that I run on my own.

AxonAI MX4 2.0 is a reasoning-based language model trained using knowledge distillation from Claude Opus 4.6 reasoning traces. In other words, the model learns to think in a structured way before answering, much like how we work through a math problem: scratch work first, then write down the answer.

I hope this model is useful to fellow developers, researchers, and students in Indonesia who want to experiment with a small but powerful locally built AI model. If you have questions, feedback, or would like to collaborate, don't hesitate to reach out to me on Hugging Face!

Happy experimenting! 🚀


📚 References & Acknowledgements

This work builds on the shoulders of outstanding open research:

  • Qwen3 by Alibaba Cloud — the capable base architecture powering this model
  • DoRA: Weight-Decomposed Low-Rank Adaptation — Liu et al., ICML 2024 (Oral) · arXiv:2402.09353
  • NEFTune: Noisy Embeddings Improve Instruction Finetuning — Jain et al., ICLR 2024 · arXiv:2310.05914
  • Unsloth by Daniel & Michael Han — for making efficient fine-tuning accessible
  • Crownelius — for the Opus-4.6-Reasoning-3300x dataset

🔗 Related Resources

| Resource | Link |
|---|---|
| 🔧 Adapter Weights | Daffaadityp/AxonAI-MX4-2.0-Adapter |
| 📦 Training Dataset | Crownelius/Opus-4.6-Reasoning-3300x |
| 🏗️ Base Model | unsloth/Qwen3-4B |
| 👤 Creator Profile | Daffaadityp |

📄 Citation

If you use AxonAI MX4 2.0 in your research or project, please consider citing it:

@misc{pratama2025axonaimx4,
  author       = {Daffa Aditya Pratama},
  title        = {AxonAI MX4 2.0: A Reasoning-Distilled 4B Language Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0}},
  note         = {Fine-tuned from Qwen3-4B using DoRA and NEFTune under AxonLabs}
}

AxonLabs · Indonesia 🇮🇩

Built with curiosity, trained with rigor.
