Qwen2.5-Coder-0.5B — Synoema Tools v1

A LoRA adapter for Qwen2.5-Coder-0.5B-Instruct, fine-tuned for agentic tool use with the Synoema MCP server.

Score: 92.9% (26/28 tasks) on the Synoema MCP agentic evaluation benchmark.

What is Synoema?

Synoema is an LLM-native programming language and runtime:

BPE-aligned operators — every operator maps to exactly 1 cl100k_base token
GBNF grammar for constrained decoding (structural correctness guarantee)
Cranelift JIT + WebAssembly compilation targets
MCP server exposing file_write, file_read, sno_typecheck, sno_run, search_corpus tools
Contract annotations (requires/ensures) for formal verification

Model Details

Property	Value
Base model	`Qwen/Qwen2.5-Coder-0.5B-Instruct` (via unsloth 4-bit)
Method	QLoRA (4-bit NF4 quantization + LoRA)
LoRA rank	r=8, alpha=32
Target modules	q/k/v/o proj + gate/up/down proj (all attention + FFN)
Batch	4 × grad_accum=4 = effective batch 16
Sequence length	1024 tokens
Epochs	3 per cycle
Optimizer	AdamW with cosine decay
Training corpus	~14,778 examples (tool-use + codegen)
Training time	~84 min/cycle × 8 cycles ≈ 11 h for the full carousel
Training hardware	AMD RX 7900 GRE 16GB (ROCm + unsloth)
Carousel cycles	8 cycles (C1→C8), each starting from the best previous adapter
Cycle C8 loss	0.022
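
As a rough sanity check on the table above, the arithmetic behind the effective batch size, the optimizer steps per cycle, and the LoRA scaling factor can be worked through in a few lines. This is only a sketch: it assumes a single device, that the final partial batch is kept, and the standard LoRA scaling of alpha / r. The module names are the usual Qwen2 projection names and are assumed here, not copied from the training script.

```python
import math

# Values copied from the Model Details table (corpus size and timing are approximate).
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EPOCHS_PER_CYCLE = 3
CORPUS_SIZE = 14_778
LORA_RANK = 8
LORA_ALPHA = 32
MINUTES_PER_CYCLE = 84
CYCLES = 8

# Assumed spelling of "q/k/v/o proj + gate/up/down proj" as module names.
TARGET_MODULES = (
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
)

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Assumes the last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(CORPUS_SIZE / effective_batch)
steps_per_cycle = steps_per_epoch * EPOCHS_PER_CYCLE
# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = LORA_ALPHA / LORA_RANK
total_hours = MINUTES_PER_CYCLE * CYCLES / 60

print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, per cycle: {steps_per_cycle}")
print(f"LoRA scale (alpha / r): {lora_scale}")
print(f"total carousel time: {total_hours:.1f} h")
```

With these assumptions, one cycle is roughly 2,772 optimizer steps and the adapter update is scaled by 4.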

Training Approach: Carousel Fine-tuning

This model was trained using a carousel strategy:

Base model → C1 (eval) → C2 (eval) → ... → C8 (best: 92.9%)
                                             ↑ always from best adapter

Each cycle:

Merge corpus — base corpus + all targeted examples for failing tasks
Train 3 epochs from the best previous adapter
Eval on 28 agentic tasks (real tool calls, real typecheck)
Analyze failures → generate targeted examples → add to corpus
Repeat from best adapter
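
The cycle above can be sketched as a small loop. This is a minimal illustration, not the actual training script: `train`, `evaluate`, and `make_targeted` are hypothetical callables standing in for the real fine-tuning run, the 28-task eval, and the targeted-example generator.

```python
def carousel(base_adapter, base_corpus, train, evaluate, make_targeted, cycles=8):
    """Run `cycles` rounds, always training from the best adapter seen so far.

    All three callables are hypothetical placeholders:
      train(start_adapter, corpus)  -> new adapter
      evaluate(adapter)             -> (score, list of failing task ids)
      make_targeted(failing_tasks)  -> list of new training examples
    """
    best_adapter, best_score = base_adapter, float("-inf")
    targeted = []   # targeted examples accumulated across cycles
    history = []    # (cycle number, score) pairs

    for cycle in range(1, cycles + 1):
        corpus = base_corpus + targeted                # 1. merge corpus
        adapter = train(best_adapter, corpus)          # 2. train from best adapter
        score, failures = evaluate(adapter)            # 3. evaluate
        targeted = targeted + make_targeted(failures)  # 4. add targeted examples
        history.append((cycle, score))
        if score > best_score:                         # 5. keep the best adapter
            best_adapter, best_score = adapter, score

    return best_adapter, best_score, history
```

Because a regressed cycle never replaces `best_adapter`, a bad round costs training time but does not propagate into later cycles.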

Evaluation: 28 Agentic Tasks

The model is evaluated on real multi-turn tool-use scenarios. Each task requires calling MCP tools correctly in sequence. The eval runs actual sno typecheck and sno run commands — no mock results.

Result: 26/28 tasks passed (92.9%). Some tasks appear in more than one category below (TU11, TU18, TU19).

Category	Tasks	Passed
Basic write + typecheck	TU1, TU2, TU3, TU5, TU6	5/5 ✅
Multi-step (search→write→run)	TU9, TU20	2/2 ✅
Language features (cons/ADT/HOF)	TU11, TU14–TU19	7/7 ✅
Ternary + complex expressions	TU22, TU30	2/2 ✅
List comprehension	TU12, TU26	2/2 ✅
Write-only (no run)	TU10	1/1 ✅
String ops	TU18, TU25	2/2 ✅
Pattern matching	TU11, TU19	2/2 ✅
Fix error (if/else → ternary)	TU4, TU13	0/2 ❌

Remaining failures:

TU4 — must write if x > y then x else y, see typecheck error, then fix to ? x > y -> x : y (2-write pattern)
TU13 — same pattern with classify n = if n > 0 then 1 else 0 → ? n > 0 -> 1 : 0

Both require a strict write→typecheck→rewrite→typecheck sequence with exactly 2 file_write calls.
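
To make the required pattern concrete, here is a small checker for a recorded tool-call trace. It is a sketch under stated assumptions: the trace is taken to be a plain list of tool names, and calls to other tools are ignored, which may not match how the real benchmark scores a run.

```python
from typing import Sequence

# The write -> typecheck -> rewrite -> typecheck pattern, using the MCP tool names.
EXPECTED_FIX_SEQUENCE = ("file_write", "sno_typecheck", "file_write", "sno_typecheck")


def follows_fix_pattern(tool_calls: Sequence[str]) -> bool:
    """Return True if the trace contains exactly the fix-error sequence.

    Only file_write and sno_typecheck calls are considered; ignoring other
    tools (for example sno_run) is an assumption of this sketch.
    """
    relevant = tuple(
        name for name in tool_calls if name in ("file_write", "sno_typecheck")
    )
    return relevant == EXPECTED_FIX_SEQUENCE


# A single write followed by a typecheck does not satisfy the pattern.
print(follows_fix_pattern(["file_write", "sno_typecheck"]))
# Two writes, each followed by a typecheck, does.
print(follows_fix_pattern(["file_write", "sno_typecheck", "file_write", "sno_typecheck"]))
```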

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "Delimitter/qwen2.5-0.5b-synoema-tools-v1")

With unsloth (recommended for inference):

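from unsloth import FastLanguageModel

# Load the adapter in 4-bit; max_seq_length matches the training sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Delimitter/qwen2.5-0.5b-synoema-tools-v1",
    max_seq_length=1024,
    load_in_4bit=True,
)
# Switch the model to unsloth's inference mode.
FastLanguageModel.for_inference(model)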

Prompt format (ChatML), with the system prompt abbreviated:

<|im_start|>system
You are an AI coding assistant for the Synoema programming language...
<|im_end|>
<|im_start|>user
Write a quicksort in Synoema to src/qs.sno and run it.
<|im_end|>
<|im_start|>assistant
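
For reference, a prompt in this layout can be assembled by hand as sketched below. The helper name `build_chatml_prompt` is hypothetical, and the exact whitespace follows the usual Qwen ChatML template (content immediately followed by `<|im_end|>` and a newline), which is an assumption about how the training data was rendered. In practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the more robust route.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt that ends with an open assistant turn."""

    def turn(role: str, content: str) -> str:
        # One ChatML turn: start marker, role, newline, content, end marker.
        return f"<|im_start|>{role}\n{content}<|im_end|>\n"

    # The trailing assistant header asks the model to generate the reply.
    return turn("system", system) + turn("user", user) + "<|im_start|>assistant\n"


prompt = build_chatml_prompt(
    "You are an AI coding assistant for the Synoema programming language...",
    "Write a quicksort in Synoema to src/qs.sno and run it.",
)
print(prompt)
```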

Corpus Composition

Source	Examples	Description
`tool_use_train_v17_fix.jsonl`	676	Fix-error patterns (if/else→ternary)
`tool_use_train_v16_gen.jsonl`	~3500	Write+check+run patterns
`tool_use_train_lang_v1.jsonl`	~3000	Synoema language codegen
`targeted_seq_c*` files	~400	Carousel-generated targeted examples
Other validated sources	~7200	Mixed tool-use patterns
Total	~14,778

All examples were validated with sno check and sno run before training.
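
A merge step over JSONL sources like the ones listed above might look like the following. This is a sketch only: `is_valid` is a hypothetical callback standing in for the real sno-based validation, and the record schema is not assumed beyond "one JSON object per line".

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def merge_corpus(paths: Iterable[str], is_valid: Callable[[dict], bool]) -> List[dict]:
    """Concatenate JSONL files, keeping only examples accepted by `is_valid`.

    `is_valid` is a placeholder for the real validation step.
    """
    merged: List[dict] = []
    for path in paths:
        with Path(path).open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines between records
                example = json.loads(line)
                if is_valid(example):
                    merged.append(example)
    return merged
```

Keeping validation as a callback means the same merge code can be reused each cycle as new targeted files are added.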

Training History (Carousel)

Cycle	Score	Failing tasks
C1	89.3% (25/28)	TU4, TU13, TU20
C2	82.1% (23/28)	TU4, TU9, TU12, TU13, TU20
C3	85.7% (24/28)	TU4, TU12, TU13, TU20
C4	78.6% (22/28)	TU4, TU10, TU12, TU13, TU20
C5	85.7% (24/28)	TU4, TU12, TU13, TU20
C6	85.7% (24/28)	TU4, TU12, TU13, TU20
C7	89.3% (25/28)	TU4, TU12, TU13
C8	92.9% (26/28) 🏆	TU4, TU13
C9+	50–82%	Catastrophic forgetting

C8 was selected as the best adapter; from C9 onward, scores dropped to 50–82% due to catastrophic forgetting.
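
The percentages and the choice of C8 can be reproduced from the pass counts in the table. The snippet below is just that arithmetic; the C9+ runs are left out because the table gives only a score range for them.

```python
# Pass counts for C1..C8 copied from the Training History table (28 tasks per run).
PASSED = {"C1": 25, "C2": 23, "C3": 24, "C4": 22, "C5": 24, "C6": 24, "C7": 25, "C8": 26}
TOTAL_TASKS = 28


def pass_rate(passed: int, total: int = TOTAL_TASKS) -> float:
    """Percentage of tasks passed, rounded to one decimal place."""
    return round(100 * passed / total, 1)


rates = {cycle: pass_rate(count) for cycle, count in PASSED.items()}
# The best cycle is the one with the highest pass count.
best_cycle = max(PASSED, key=PASSED.get)

for cycle, rate in rates.items():
    print(f"{cycle}: {rate}%")
print(f"best: {best_cycle} ({rates[best_cycle]}%)")
```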

Synoema Language Quick Reference

-- Ternary (no if/else!)
max x y = ? x > y -> x : y

-- Pattern matching
fact 0 = 1
fact n = n * fact (n - 1)

-- List comprehension
evens = [x | x <- [1..20], x % 2 == 0]

-- Space-separated lists (NOT commas)
main = qsort [3 1 4 1 5 9]

-- ADT
Shape = Circle Int | Rect Int Int
area (Circle r) = 3 * r * r
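
The ternary rule at the top of this reference is the one behind the TU4 and TU13 fix-error tasks. As an illustration, the rewrite from if/else to the ternary form can be expressed as a small Python helper. This is a hypothetical sketch that handles only a single, non-nested `if ... then ... else ...` on one line; it is not part of the Synoema toolchain.

```python
import re

# Matches one non-nested "if COND then A else B" running to the end of the line.
_IF_ELSE = re.compile(r"\bif\s+(.+?)\s+then\s+(.+?)\s+else\s+(.+)$")


def if_else_to_ternary(line: str) -> str:
    """Rewrite `if c then a else b` as `? c -> a : b`; other lines pass through."""
    return _IF_ELSE.sub(r"? \1 -> \2 : \3", line, count=1)


print(if_else_to_ternary("max x y = if x > y then x else y"))
print(if_else_to_ternary("classify n = if n > 0 then 1 else 0"))
```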

License

Apache 2.0, the same license as the Qwen2.5-Coder base model.

