Qwen2.5-Coder-0.5B — Synoema Tools v1

LoRA adapter fine-tuned from Qwen2.5-Coder-0.5B-Instruct for agentic tool use with the Synoema MCP server.

Score: 92.9% (26/28 tasks) on the Synoema MCP agentic evaluation benchmark.


What is Synoema?

Synoema is an LLM-native programming language and runtime:

  • BPE-aligned operators — every operator maps to exactly 1 cl100k_base token
  • GBNF grammar for constrained decoding (structural correctness guarantee)
  • Cranelift JIT + WebAssembly compilation targets
  • MCP server exposing file_write, file_read, sno_typecheck, sno_run, search_corpus tools
  • Contract annotations (requires/ensures) for formal verification

Model Details

Property            Value
Base model          Qwen/Qwen2.5-Coder-0.5B-Instruct (via unsloth 4-bit)
Method              QLoRA (4-bit NF4 quantization + LoRA)
LoRA rank           r=8, alpha=32
Target modules      q/k/v/o proj + gate/up/down proj (all attention + FFN)
Batch               4 × grad_accum=4 = effective batch 16
Sequence length     1024 tokens
Epochs              3 per cycle
Optimizer           AdamW with cosine decay
Training corpus     ~14,778 examples (tool-use + codegen)
Training time       ~84 min/cycle × 8 cycles = ~11 h total for the carousel
Training hardware   AMD RX 7900 GRE 16GB (ROCm + unsloth)
Carousel cycles     8 cycles (C1→C8), each starting from the best previous adapter
Cycle C8 loss       0.022
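The batch and LoRA rows can be sanity-checked with some arithmetic. This sketch assumes that the full ~14,778-example corpus is seen once per epoch. It also takes the Qwen2.5-0.5B layer sizes (hidden 896, FFN 4864, 24 layers, 2 key/value heads of dimension 64) from the base model's published config, because this card does not state them. A real trainer's step count may differ by one, depending on how the last partial batch is rounded.

```python
import math

# Values from the Model Details table.
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EPOCHS_PER_CYCLE = 3
NUM_EXAMPLES = 14_778  # approximate corpus size
LORA_R, LORA_ALPHA = 8, 32

# ASSUMPTION: Qwen2.5-0.5B dimensions taken from the base model's config.json.
HIDDEN = 896
INTERMEDIATE = 4864
NUM_LAYERS = 24
KV_DIM = 2 * 64  # 2 key/value heads x head_dim 64 (grouped-query attention)

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
steps_per_cycle = steps_per_epoch * EPOCHS_PER_CYCLE
# Standard (non-rsLoRA) scaling: delta_W = (alpha / r) * B @ A
lora_scale = LORA_ALPHA / LORA_R


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


per_layer = (
    lora_params(HIDDEN, HIDDEN, LORA_R)          # q_proj
    + lora_params(HIDDEN, KV_DIM, LORA_R)        # k_proj
    + lora_params(HIDDEN, KV_DIM, LORA_R)        # v_proj
    + lora_params(HIDDEN, HIDDEN, LORA_R)        # o_proj
    + lora_params(HIDDEN, INTERMEDIATE, LORA_R)  # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE, LORA_R)  # up_proj
    + lora_params(INTERMEDIATE, HIDDEN, LORA_R)  # down_proj
)
trainable = per_layer * NUM_LAYERS

print(effective_batch, steps_per_epoch, steps_per_cycle, lora_scale, trainable)
# 16 924 2772 4.0 4399104
```

Under these assumptions, each cycle runs roughly 2,800 optimizer steps and trains about 4.4M adapter parameters. The alpha/r ratio of 4 scales every LoRA update by a factor of 4.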

Training Approach: Carousel Fine-tuning

This model was trained using a carousel strategy:

Base model → C1 (eval) → C2 (eval) → ... → C8 (best: 92.9%)
                                             ↑ always from best adapter

Each cycle:

  1. Merge corpus — base corpus + all targeted examples for failing tasks
  2. Train 3 epochs from the best previous adapter
  3. Eval on 28 agentic tasks (real tool calls, real typecheck)
  4. Analyze failures → generate targeted examples → add to corpus
  5. Repeat from best adapter
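The carousel loop can be sketched as follows. This is a hypothetical outline, not the training script used for this model. The callables `train`, `evaluate` and `make_targeted` are placeholders for the QLoRA run, the 28-task agentic eval and the failure analysis. The stub demo replays the failing-task sets of C1 to C3 from the training history, so the loop keeps restarting from C1 because C2 and C3 both scored lower.

```python
from typing import Callable, List, Optional, Set, Tuple


def run_carousel(
    base_corpus: List[dict],
    num_tasks: int,
    num_cycles: int,
    train: Callable[[Optional[str], List[dict]], str],  # (start adapter, corpus) -> new adapter
    evaluate: Callable[[str], Set[str]],                 # adapter -> failing task IDs
    make_targeted: Callable[[Set[str]], List[dict]],     # failing IDs -> new examples
) -> Tuple[Optional[str], float, List[float]]:
    """Hypothetical carousel: every cycle restarts from the best adapter so far."""
    best_adapter: Optional[str] = None  # None = start from the base model
    best_score = -1.0
    targeted: List[dict] = []
    history: List[float] = []
    for _ in range(num_cycles):
        corpus = base_corpus + targeted        # 1. merge base + targeted examples
        adapter = train(best_adapter, corpus)  # 2. train from the best previous adapter
        failing = evaluate(adapter)            # 3. run the agentic eval
        score = (num_tasks - len(failing)) / num_tasks
        history.append(score)
        if score > best_score:                 # regressions never become the new start
            best_adapter, best_score = adapter, score
        targeted += make_targeted(failing)     # 4. add examples aimed at the failures
    return best_adapter, best_score, history


# Toy demo: stubs replay the C1-C3 failing sets from the training history.
FAILING = {
    "c1": {"TU4", "TU13", "TU20"},
    "c2": {"TU4", "TU9", "TU12", "TU13", "TU20"},
    "c3": {"TU4", "TU12", "TU13", "TU20"},
}
starts: List[Optional[str]] = []


def fake_train(start: Optional[str], corpus: List[dict]) -> str:
    starts.append(start)  # record which adapter each cycle started from
    return f"c{len(starts)}"


best, best_score, history = run_carousel(
    base_corpus=[{"prompt": "...", "completion": "..."}],
    num_tasks=28,
    num_cycles=3,
    train=fake_train,
    evaluate=lambda adapter: FAILING[adapter],
    make_targeted=lambda failing: [{"task": t} for t in sorted(failing)],
)
print(best, round(best_score * 100, 1), starts)  # c1 89.3 [None, 'c1', 'c1']
```

Keeping the best adapter instead of the latest one allows a regressing cycle (C2, C4, or the C9+ collapse) to be discarded without losing earlier progress.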

Evaluation: 28 Agentic Tasks

The model is evaluated on real multi-turn tool-use scenarios. Each task requires calling MCP tools correctly in sequence. The eval runs actual sno typecheck and sno run commands — no mock results.

Result: 26/28 tasks passed (92.9%)

Category                          Tasks                    Passed
Basic write + typecheck           TU1, TU2, TU3, TU5, TU6  5/5 ✅
Multi-step (search→write→run)     TU9, TU20                2/2 ✅
Language features (cons/ADT/HOF)  TU11, TU14–TU19          7/7 ✅
Ternary + complex expressions     TU22, TU30               2/2 ✅
List comprehension                TU12, TU26               2/2 ✅
Write-only (no run)               TU10                     1/1 ✅
String ops                        TU18, TU25               2/2 ✅
Pattern matching                  TU11, TU19               2/2 ✅
Fix error (if/else → ternary)     TU4, TU13                0/2 ❌

Categories overlap (TU11, TU18 and TU19 each appear in two rows), and not every task is listed individually. As a result, the rows do not sum to 28.

Remaining failures:

  • TU4: write if x > y then x else y, see the typecheck error, then rewrite it as ? x > y -> x : y (2-write pattern)
  • TU13: same pattern with classify n: write if n > 0 then 1 else 0, then rewrite it as ? n > 0 -> 1 : 0

Both require a strict write→typecheck→rewrite→typecheck sequence with exactly 2 file_write calls.
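The pass condition for TU4 and TU13 can be expressed as a check over the tool-call trace. The trace format used here, a list of (tool_name, succeeded) pairs, is a hypothetical stand-in. Only the tool names file_write and sno_typecheck come from the MCP server's tool list, and the real eval harness may check more, such as the contents of each write.

```python
from typing import List, Tuple

# HYPOTHETICAL trace format: (tool_name, succeeded) pairs in call order.
ToolCall = Tuple[str, bool]

FIX_PATTERN = ["file_write", "sno_typecheck", "file_write", "sno_typecheck"]


def follows_fix_pattern(trace: List[ToolCall]) -> bool:
    """True for write -> failing typecheck -> rewrite -> passing typecheck."""
    names = [name for name, _ in trace]
    if names != FIX_PATTERN:
        return False  # strict order; also enforces exactly 2 file_write calls
    first_check_ok = trace[1][1]
    second_check_ok = trace[3][1]
    return (not first_check_ok) and second_check_ok


good = [("file_write", True), ("sno_typecheck", False),
        ("file_write", True), ("sno_typecheck", True)]
single_write = [("file_write", True), ("sno_typecheck", True)]
print(follows_fix_pattern(good), follows_fix_pattern(single_write))  # True False
```

A trace with only one write fails this check even if its final code typechecks, because the task is defined by the sequence of calls as well as by the end result.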


Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
model = PeftModel.from_pretrained(base, "Delimitter/qwen2.5-0.5b-synoema-tools-v1")

With unsloth (recommended for inference):

from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Delimitter/qwen2.5-0.5b-synoema-tools-v1",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

Prompt format (ChatML):

<|im_start|>system
You are an AI coding assistant for the Synoema programming language...
<|im_end|>
<|im_start|>user
Write a quicksort in Synoema to src/qs.sno and run it.
<|im_end|>
<|im_start|>assistant
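If you are feeding raw strings to the model, you can build this prompt by hand. This sketch assumes Qwen's standard ChatML layout, in which each message's content is followed immediately by <|im_end|>. With transformers, tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) should produce the same structure and is the safer choice. The system prompt string below is a placeholder, because the full prompt is elided in this card.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML and open an assistant turn for generation."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # the model continues from here
    return "".join(parts)


# Placeholder: the full training system prompt is elided in this card.
SYSTEM_PROMPT = "You are an AI coding assistant for the Synoema programming language..."

prompt = to_chatml([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Write a quicksort in Synoema to src/qs.sno and run it."},
])
print(prompt)
```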

Corpus Composition

Source                        Examples  Description
tool_use_train_v17_fix.jsonl  676       Fix-error patterns (if/else→ternary)
tool_use_train_v16_gen.jsonl  ~3500     Write+check+run patterns
tool_use_train_lang_v1.jsonl  ~3000     Synoema language codegen
targeted_seq_c* files         ~400      Carousel-generated targeted examples
Other validated sources       ~7200     Mixed tool-use patterns
Total                         ~14,778

All examples were validated with sno check and sno run before training.
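Below is a minimal sketch of that validation pass. It rests on three assumptions that this card does not document: each JSONL record can be reduced to a single Synoema source file, sno check <file> and sno run <file> report failure through a non-zero exit code, and the record schema is as shown. The validator is injectable, so the filter can be exercised without the sno binary.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def sno_validates(source: str) -> bool:
    """ASSUMPTION: `sno check <file>` and `sno run <file>` exit 0 on success.

    Raises FileNotFoundError if the sno binary is not on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.sno"
        path.write_text(source)
        for cmd in (["sno", "check", str(path)], ["sno", "run", str(path)]):
            if subprocess.run(cmd, capture_output=True).returncode != 0:
                return False
    return True


def load_validated(
    jsonl_lines: Iterable[str],
    get_code: Callable[[dict], str],
    validate: Callable[[str], bool] = sno_validates,
) -> List[dict]:
    """Parse JSONL records and keep only those whose Synoema code validates."""
    kept = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if validate(get_code(record)):
            kept.append(record)
    return kept


# Toy demo with a HYPOTHETICAL "code" field and a stub validator
# that rejects if/else (Synoema uses ternaries instead).
lines = ['{"code": "main = 1"}', "", '{"code": "main = if"}']
kept = load_validated(lines, get_code=lambda r: r["code"],
                      validate=lambda src: "if" not in src)
print(len(kept))  # 1
```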


Training History (Carousel)

Cycle  Score             Failing tasks
C1     89.3% (25/28)     TU4, TU13, TU20
C2     82.1% (23/28)     TU4, TU9, TU12, TU13, TU20
C3     85.7% (24/28)     TU4, TU12, TU13, TU20
C4     78.6% (22/28)     TU4, TU10, TU12, TU13, TU20
C5     85.7% (24/28)     TU4, TU12, TU13, TU20
C6     85.7% (24/28)     TU4, TU12, TU13, TU20
C7     89.3% (25/28)     TU4, TU12, TU13
C8     92.9% (26/28) 🏆  TU4, TU13
C9+    50–82%            Catastrophic forgetting

C8 was kept as the best adapter. From C9 onward, catastrophic forgetting pulled scores down to 50–82%.


Synoema Language Quick Reference

-- Ternary (no if/else!)
max x y = ? x > y -> x : y

-- Pattern matching
fact 0 = 1
fact n = n * fact (n - 1)

-- List comprehension  
evens = [x | x <- [1..20], x % 2 == 0]

-- Space-separated lists (NOT commas)
main = qsort [3 1 4 1 5 9]

-- ADT
Shape = Circle Int | Rect Int Int
area (Circle r) = 3 * r * r

License

Apache 2.0 — same as Qwen2.5-Coder base model.

Synoema is © Andrey Bubnov. See synoema.tech.
