Qwen2.5-Coder-7B β€” Text-to-SQL (SFT + DPO)

A LoRA adapter fine-tuned on Qwen/Qwen2.5-Coder-7B-Instruct for text-to-SQL generation, achieving 78.2% result accuracy on Spider V1 β€” outperforming both frontier models used to build its training data.

| Model | Spider V1 Result Accuracy |
|---|---|
| This model (7B fine-tuned) | 78.2% |
| Grok-4 (xAI) | 73.7% |
| DeepSeek V3 | 71.8% |
GitHub: finetuning-text-to-sql Β· Dataset: jk200201/spider-dpo-1040
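"Result accuracy" here means execution match: the predicted and gold queries are run against the database and their result sets compared. A minimal sketch of that check, using `sqlite3` and a toy in-memory database (the official Spider evaluation script handles ordering semantics, value matching, and the full set of databases far more carefully):

```python
import sqlite3

def execution_match(db: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Compare the result sets of predicted vs. gold SQL, ignoring row order."""
    try:
        pred_rows = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as wrong
    gold_rows = db.execute(gold_sql).fetchall()
    return sorted(pred_rows) == sorted(gold_rows)

# Toy table mirroring the concerts schema from the usage example below
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE concerts (concert_id INT, concert_name TEXT, year INT);
INSERT INTO concerts VALUES (1, 'A', 2014), (2, 'B', 2016);
""")

print(execution_match(db,
                      "SELECT count(*) FROM concerts WHERE year > 2015",
                      "SELECT count(*) FROM concerts WHERE year >= 2016"))  # True
```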


How to Use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "jk200201/qwen2.5-coder-7b-sql-dpo"

# Load the base model, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

schema = """
CREATE TABLE concerts (concert_id INT, concert_name TEXT, year INT);
CREATE TABLE singers (singer_id INT, name TEXT, country TEXT);
CREATE TABLE singer_in_concert (concert_id INT, singer_id INT);
"""

question = "How many singers performed in concerts held after 2015?"

prompt = f"""You are an expert SQL query generator. Given a database schema and a natural language question, generate a single valid SQL query that answers the question. Output ONLY the SQL query, nothing else.

### Schema:
{schema}

### Question:
{question}

### SQL:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
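Even with the "Output ONLY the SQL query" instruction, generations can arrive wrapped in markdown fences or followed by commentary. A small post-processing helper for extracting a clean query (a hypothetical convenience, not part of the released code):

```python
import re

def extract_sql(text: str) -> str:
    """Strip markdown code fences and trailing noise from a generation."""
    # Prefer the contents of a ```sql ... ``` (or bare ```) fence, if present
    m = re.search(r"```(?:sql)?\s*(.*?)```", text, flags=re.DOTALL | re.IGNORECASE)
    if m:
        text = m.group(1)
    # Keep everything up to the first semicolon, if any, and trim whitespace
    return text.split(";")[0].strip()

print(extract_sql("```sql\nSELECT name FROM singers;\n```"))  # SELECT name FROM singers
```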

Training Methodology

Stage 1 β€” SFT on 7,000 Spider training examples (gold SQL), 3 epochs, LR 2e-4.

Stage 2 β€” DPO on 1,040 preference pairs built from frontier model disagreements:

  1. Ran Grok-4 and DeepSeek V3 on all 1,034 Spider dev questions
  2. For cases where both models were correct but wrote different SQL, used Gemini 2.5 Flash as a third-party judge to pick the better query
  3. Used those judgments as DPO preference signal
| DPO pair category | Count |
|---|---|
| Judge-resolved (both correct, different SQL) | 478 |
| Gold SQL vs wrong model output | 445 |
| Clear preference (one right, one wrong) | 117 |
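Assembled, each category yields one record in the standard DPO layout of a prompt plus chosen/rejected completions. A schematic sketch (the field names and SQL strings are illustrative, not the exact schema of the released dataset):

```python
def make_dpo_pair(prompt: str, chosen_sql: str, rejected_sql: str, category: str) -> dict:
    """One preference record in the usual (prompt, chosen, rejected) layout."""
    return {"prompt": prompt, "chosen": chosen_sql,
            "rejected": rejected_sql, "category": category}

prompt = "### Schema: ...\n### Question: How many singers are there?\n### SQL:"

pairs = [
    # 1) Both models correct but different SQL: the judge picks the better query
    make_dpo_pair(prompt, "SELECT count(*) FROM singers",
                  "SELECT count(singer_id) FROM singers", "judge_resolved"),
    # 2) A model answered incorrectly: gold SQL chosen, wrong output rejected
    make_dpo_pair(prompt, "SELECT count(*) FROM singers",
                  "SELECT count(*) FROM concerts", "gold_vs_wrong"),
    # 3) One model right, one wrong: clear preference
    make_dpo_pair(prompt, "SELECT count(*) FROM singers",
                  "SELECT name FROM singers", "clear_preference"),
]
print(len(pairs))  # 3
```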

Training infra: AWS EC2 g5.xlarge (NVIDIA A10G), QLoRA with 4-bit NF4 quantization, LoRA rank 32 / alpha 64, trained via LLaMA-Factory.
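The same QLoRA setup expressed in peft/bitsandbytes terms (a sketch of equivalent settings, not the exact LLaMA-Factory config used for training):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter: rank 32, alpha 64, applied to a causal LM
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    task_type="CAUSAL_LM",
)
```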


Training Hyperparameters

| Parameter | SFT | DPO |
|---|---|---|
| Epochs | 3 | 2 |
| Learning rate | 2e-4 | 5e-5 |
| LR schedule | cosine | cosine |
| Effective batch size | 16 | 16 |
| LoRA rank / alpha | 32 / 64 | 32 / 64 |
| Quantization | 4-bit NF4 | 4-bit NF4 |
| Final loss | 0.0715 | 0.2208 |

Citation

If you use the Spider dataset, please cite:

@inproceedings{Yu2018Spider,
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author    = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev},
  booktitle = {EMNLP},
  year      = {2018}
}