
🧠 AxonAI MX4 2.0

Reasoning-First Language Model · 4B Parameters · Chain-of-Thought Native


**Released by Daffaadityp**


🔍 Overview

AxonAI MX4 2.0 is a state-of-the-art, 4-billion-parameter causal language model built for deep reasoning, structured problem-solving, and intelligent text generation. It is the second major iteration of the AxonAI MX series — an independent research line conducted under AxonLabs, a student-led AI initiative founded by Daffa Aditya Pratama at SMKN 26 Jakarta, Indonesia.

This model is a fully merged variant, meaning the DoRA adapter weights have been permanently folded into the base model. No adapter loading is required at inference time. The result is a single, self-contained, deployment-ready LLM that punches well above its weight class through the power of high-quality reasoning distillation.

In short: AxonAI MX4 2.0 is what happens when you take a lean 4B-parameter architecture and teach it to think before it speaks — using reasoning traces distilled from one of the most capable AI systems available.


✨ Key Highlights

| Property | Detail |
|---|---|
| Model Family | AxonAI MX Series |
| Version | 2.0 (Merged) |
| Base Architecture | Qwen3-4B (Pure Transformer) |
| Parameter Count | ~4 Billion |
| Training Method | Reasoning Distillation via DoRA + NEFTune |
| Distillation Source | Claude Opus 4.6 Reasoning Traces |
| Training Dataset | Crownelius/Opus-4.6-Reasoning-3300x |
| Precision | torch.float16 |
| Format | SafeTensors (5 shards) |
| Adapter Source | Daffaadityp/AxonAI-MX4-2.0-Adapter |
| Reasoning Token | Native <think>...</think> CoT block |
| License | Apache 2.0 |
| Creator | Daffa Aditya Pratama / AxonLabs |

🧬 The Secret Sauce — What Makes This Model Different

1. Reasoning Distillation from Claude Opus 4.6

AxonAI MX4 2.0 was not simply fine-tuned on generic instruction data. It was trained on ~3,300 high-quality reasoning traces generated by Claude Opus 4.6 — Anthropic's most capable reasoning model at the time of training. This process, known as Knowledge Distillation, transfers the structured reasoning style of a larger teacher model into a far more compact student model.

The practical effect is significant: the model has internalized patterns of logical decomposition, step-by-step inference, and self-verification that are characteristic of frontier-scale systems — all within a 4B parameter footprint.
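Conceptually, this kind of distillation is supervised fine-tuning on teacher outputs: each teacher trace is packed into a chat-style example whose target contains the reasoning inside <think> tags followed by the final answer. The sketch below is illustrative only; `to_training_example` is a made-up helper, and the actual schema of the Opus-4.6-Reasoning-3300x dataset may differ.

```python
def to_training_example(question: str, teacher_reasoning: str, teacher_answer: str) -> dict:
    """Pack one teacher trace into a chat-style SFT example (illustrative sketch)."""
    target = f"<think>\n{teacher_reasoning}\n</think>\n\n{teacher_answer}"
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": target},
        ]
    }

ex = to_training_example("What is 2 + 2?", "2 + 2 = 4.", "The answer is 4.")
print(ex["messages"][1]["content"].startswith("<think>"))  # -> True
```

Training on targets of this shape is what teaches the student to emit the reasoning block before the answer, rather than bolting it on at inference time.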

2. Native Chain-of-Thought via <think> Tokens

AxonAI MX4 2.0 uses a native Chain-of-Thought (CoT) mechanism through a dedicated <think>...</think> token pair. Before producing its final answer, the model is prompted to reason explicitly inside this block. This is not a post-processing trick — the model was trained end-to-end to generate this reasoning trace as a fundamental part of its output process.

This makes the model especially capable on tasks that require:

  • 🔢 Multi-step mathematical reasoning
  • 💻 Code generation and debugging
  • 🧩 Logical inference and puzzle solving
  • 📖 Complex instruction following

3. DoRA — A More Expressive Fine-tuning Method

Training used DoRA (Weight-Decomposed Low-Rank Adaptation), a technique published as an Oral paper at ICML 2024 by NVIDIA Research. Unlike standard LoRA, which applies a single low-rank update, DoRA decomposes the pre-trained weights into magnitude and direction components and fine-tunes them separately. This allows the model to make more nuanced parameter adjustments, closing the gap between LoRA fine-tuning and full fine-tuning, with no additional inference cost once the adapter is merged.
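As a rough illustration of the decomposition, here is a toy pure-Python sketch of the merged form, assuming the column-norm decomposition described in the DoRA paper; `dora_merge`, the matrices, and the magnitude vector are made up for this example and are not the actual training code.

```python
import math

def column_norms(W):
    """L2 norm of each column of a small matrix given as a list of rows."""
    rows, cols = len(W), len(W[0])
    return [math.sqrt(sum(W[r][c] ** 2 for r in range(rows))) for c in range(cols)]

def dora_merge(W0, delta, m):
    """Effective weight W' = m * (W0 + delta) / ||W0 + delta||_col.
    W0: pre-trained weights, delta: low-rank update B @ A, m: learned magnitudes."""
    rows, cols = len(W0), len(W0[0])
    V = [[W0[r][c] + delta[r][c] for c in range(cols)] for r in range(rows)]
    norms = column_norms(V)
    return [[m[c] * V[r][c] / norms[c] for c in range(cols)] for r in range(rows)]

W0 = [[3.0, 0.0], [4.0, 2.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
m0 = column_norms(W0)  # magnitudes are initialised to the pre-trained column norms
# With a zero low-rank update and untouched magnitudes, the merge is a no-op:
print(dora_merge(W0, zero, m0))  # -> [[3.0, 0.0], [4.0, 2.0]]
```

During fine-tuning only `delta` and `m` are trained; because `dora_merge` produces an ordinary weight matrix, folding it back into the checkpoint (as was done for this release) leaves inference unchanged.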

4. NEFTune Noise Injection for Generalization

During training, NEFTune (Noisy Embedding Fine-Tuning) was applied. This technique, published at ICLR 2024, injects controlled random noise into embedding vectors during the forward pass. It acts as a regularizer, preventing the model from overfitting to the narrow patterns of the instruction dataset. The result is a model that generalizes more robustly and produces richer, more coherent responses — a "free lunch" for fine-tuning quality, as described by the original authors.
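The mechanism is simple to sketch: the paper scales uniform noise by alpha / sqrt(L * d), where L is the sequence length and d the embedding dimension. The snippet below is a minimal pure-Python illustration, not the training implementation; `neftune_noise` and the toy matrix are hypothetical.

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0):
    """Add uniform noise scaled by alpha / sqrt(L * d) to token embeddings."""
    L = len(embeddings)       # sequence length
    d = len(embeddings[0])    # embedding dimension
    scale = alpha / math.sqrt(L * d)
    return [[x + random.uniform(-1.0, 1.0) * scale for x in row] for row in embeddings]

emb = [[0.0] * 8 for _ in range(4)]   # toy 4-token, 8-dim embedding matrix
noisy = neftune_noise(emb, alpha=5.0)
bound = 5.0 / math.sqrt(4 * 8)
print(all(abs(x) <= bound for row in noisy for x in row))  # -> True
```

The noise is applied only during the training forward pass; at inference time the embeddings are used as-is, so NEFTune adds no runtime cost.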


🏗️ Technical Architecture

Base:          unsloth/Qwen3-4B (Pure Decoder-only Transformer)
Architecture:  32 Transformer layers, Multi-Head Attention, RoPE
Fine-tuning:   DoRA (ICML 2024 Oral) via Unsloth
Regularizer:   NEFTune Noise Injection (ICLR 2024)
Adapter:       Daffaadityp/AxonAI-MX4-2.0-Adapter (merged into weights)
Precision:     torch.float16
Format:        SafeTensors — 5 shards
CoT Token:     <think> ... </think> (native, trained end-to-end)

The training pipeline was orchestrated using Unsloth, a highly optimized fine-tuning library that significantly reduces VRAM usage and accelerates training throughput, making this level of research accessible on consumer-grade hardware.


🚀 Quickstart — Running Inference

Install the dependencies if you have not already:

pip install transformers torch accelerate

Then run inference with the following script:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# --- Load model and tokenizer ---
model_id = "Daffaadityp/AxonAI-MX4-2.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# --- Define your conversation ---
# The system prompt instructs the model to use its <think> block before answering.
messages = [
    {
        "role": "system",
        "content": (
            "You are AxonAI, an intelligent reasoning assistant. "
            "Before providing your final answer, always reason through the problem "
            "carefully inside a <think>...</think> block. "
            "Be thorough in your thinking, then provide a clean and concise final answer."
        ),
    },
    {
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours. "
                   "If it then increases its speed by 20%, "
                   "how long will it take to travel the next 180 km?",
    },
]

# --- Apply chat template and tokenize ---
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,          # also returns the attention mask
    return_tensors="pt",
).to(model.device)

# --- Generate response ---
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1,
    )

# --- Decode and print ---
generated_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=False)
print(response)

📤 Expected Output Format

The model will first emit a <think> block containing its internal reasoning chain, followed by its structured final answer:

<think>
The train's initial speed is 120 km / 1.5 hours = 80 km/h.
An increase of 20% gives a new speed of 80 × 1.2 = 96 km/h.
Time to travel 180 km at 96 km/h = 180 / 96 = 1.875 hours.
Converting: 1.875 hours = 1 hour and 52.5 minutes.
</think>

The train's initial speed is **80 km/h**. After a 20% increase, its new speed
becomes **96 km/h**. At this speed, traveling 180 km will take exactly
**1 hour and 52.5 minutes** (or 1.875 hours).

Tip: If you want to suppress the <think> block from the final output shown to users, simply split the response string on </think> and take the second part.
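That split can be done in a couple of lines; `strip_think` is a hypothetical helper name, and the sketch assumes the model closes its reasoning block with a literal </think> tag as described above.

```python
def strip_think(response: str) -> str:
    """Return only the user-facing answer, dropping the <think> block if present."""
    _, sep, answer = response.partition("</think>")
    return answer.strip() if sep else response.strip()

sample = "<think>80 km/h * 1.2 = 96 km/h.</think>\nThe answer is 96 km/h."
print(strip_think(sample))  # -> The answer is 96 km/h.
```

Using `partition` rather than `split` keeps the helper safe on responses where the model (rarely) omits the reasoning block entirely.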


⚙️ Recommended Generation Parameters

| Parameter | Recommended Value | Notes |
|---|---|---|
| max_new_tokens | 512–2048 | Higher for complex multi-step problems |
| temperature | 0.6–0.8 | Lower values yield more deterministic answers |
| top_p | 0.9 | Nucleus sampling threshold |
| repetition_penalty | 1.1–1.15 | Strongly recommended to avoid looping |
| do_sample | True | Required when using temperature / top_p |

🎯 Intended Use Cases

This model is well-suited for the following tasks:

  • Reasoning & Logic: Multi-step word problems, logical deduction, argument analysis
  • Mathematics: Arithmetic, algebra, and structured quantitative reasoning
  • Coding Assistance: Code explanation, bug analysis, algorithm walkthroughs
  • Instruction Following: Complex, multi-part instructions requiring sequential execution
  • Educational Tools: Step-by-step tutoring, explanation generation, exam preparation
  • Indonesian Language Applications: The model retains multilingual capability from its Qwen3 base

⚠️ Limitations & Responsible Use

  • This is a research model produced by a student researcher. It has not been subject to enterprise-grade safety evaluation or red-teaming.
  • The model may occasionally produce incorrect reasoning inside its <think> block. Always verify critical outputs, especially for mathematical or factual claims.
  • Performance on very long-context tasks (>4K tokens) may degrade compared to the base model.
  • This model is not intended for use in medical, legal, financial, or other high-stakes decision-making contexts without human oversight.
  • The model is provided under the Apache 2.0 license. Users are responsible for ensuring their use complies with applicable laws and regulations.

🇮🇩 For Indonesian Developers

Hello everyone! I'm Daffa Aditya Pratama. I built this model as part of an independent research project under AxonLabs, a small AI initiative that I run on my own.

AxonAI MX4 2.0 is a reasoning-based language model trained using knowledge distillation from Claude Opus 4.6 reasoning traces. In other words, the model learns to think in a structured way before answering, much like how we work through a math problem: scratch work first, then write down the answer.

I hope this model is useful to fellow developers, researchers, and students in Indonesia who want to experiment with a small but powerful locally built AI model. If you have questions, feedback, or would like to collaborate, don't hesitate to reach out to me on Hugging Face!

Happy experimenting! 🚀


📚 References & Acknowledgements

This work builds on the shoulders of outstanding open research:

  • Qwen3 by Alibaba Cloud — the capable base architecture powering this model
  • DoRA: Weight-Decomposed Low-Rank Adaptation — Liu et al., ICML 2024 (Oral) · arXiv:2402.09353
  • NEFTune: Noisy Embeddings Improve Instruction Finetuning — Jain et al., ICLR 2024 · arXiv:2310.05914
  • Unsloth by Daniel & Michael Han — for making efficient fine-tuning accessible
  • Crownelius — for the Opus-4.6-Reasoning-3300x dataset

🔗 Related Resources

| Resource | Link |
|---|---|
| 🔧 Adapter Weights | Daffaadityp/AxonAI-MX4-2.0-Adapter |
| 📦 Training Dataset | Crownelius/Opus-4.6-Reasoning-3300x |
| 🏗️ Base Model | unsloth/Qwen3-4B |
| 👤 Creator Profile | Daffaadityp |

📄 Citation

If you use AxonAI MX4 2.0 in your research or project, please consider citing it:

@misc{pratama2025axonaimx4,
  author       = {Daffa Aditya Pratama},
  title        = {AxonAI MX4 2.0: A Reasoning-Distilled 4B Language Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0}},
  note         = {Fine-tuned from Qwen3-4B using DoRA and NEFTune under AxonLabs}
}

AxonLabs · Indonesia 🇮🇩

Built with curiosity, trained with rigor.
