Qwen2.5-Coder-7B β€” Text-to-SQL (SFT + DPO)

A LoRA adapter fine-tuned on Qwen/Qwen2.5-Coder-7B-Instruct for text-to-SQL generation, achieving 78.2% result accuracy on Spider V1 β€” outperforming both frontier models used to build its training data.

| Model | Spider V1 Result Accuracy |
|---|---|
| This model (7B fine-tuned) | 78.2% |
| Grok-4 (xAI) | 73.7% |
| DeepSeek V3 | 71.8% |
GitHub: finetuning-text-to-sql Β· Dataset: jk200201/spider-dpo-1040
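"Result accuracy" here means execution match: the predicted and gold queries are run against the database and their result sets compared. A minimal sketch of that check, using `sqlite3` and a toy in-memory database (the official Spider evaluation script handles ordering semantics, value matching, and the full set of databases far more carefully):

```python
import sqlite3

def execution_match(db: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Compare the result sets of predicted vs. gold SQL, ignoring row order."""
    try:
        pred_rows = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as wrong
    gold_rows = db.execute(gold_sql).fetchall()
    return sorted(pred_rows) == sorted(gold_rows)

# Toy table mirroring the concerts schema from the usage example below
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE concerts (concert_id INT, concert_name TEXT, year INT);
INSERT INTO concerts VALUES (1, 'A', 2014), (2, 'B', 2016);
""")

print(execution_match(db,
                      "SELECT count(*) FROM concerts WHERE year > 2015",
                      "SELECT count(*) FROM concerts WHERE year >= 2016"))  # True
```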


How to Use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "jk200201/qwen2.5-coder-7b-sql-dpo"

# Load the base model, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

schema = """
CREATE TABLE concerts (concert_id INT, concert_name TEXT, year INT);
CREATE TABLE singers (singer_id INT, name TEXT, country TEXT);
CREATE TABLE singer_in_concert (concert_id INT, singer_id INT);
"""

question = "How many singers performed in concerts held after 2015?"

prompt = f"""You are an expert SQL query generator. Given a database schema and a natural language question, generate a single valid SQL query that answers the question. Output ONLY the SQL query, nothing else.

### Schema:
{schema}

### Question:
{question}

### SQL:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
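Even with the "Output ONLY the SQL query" instruction, generations can arrive wrapped in markdown fences or followed by commentary. A small post-processing helper for extracting a clean query (a hypothetical convenience, not part of the released code):

```python
import re

def extract_sql(text: str) -> str:
    """Strip markdown code fences and trailing noise from a generation."""
    # Prefer the contents of a ```sql ... ``` (or bare ```) fence, if present
    m = re.search(r"```(?:sql)?\s*(.*?)```", text, flags=re.DOTALL | re.IGNORECASE)
    if m:
        text = m.group(1)
    # Keep everything up to the first semicolon, if any, and trim whitespace
    return text.split(";")[0].strip()

print(extract_sql("```sql\nSELECT name FROM singers;\n```"))  # SELECT name FROM singers
```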

Training Methodology

Stage 1 β€” SFT on 7,000 Spider training examples (gold SQL), 3 epochs, LR 2e-4.

Stage 2 β€” DPO on 1,040 preference pairs built from frontier model disagreements:

  1. Ran Grok-4 and DeepSeek V3 on all 1,034 Spider dev questions
  2. For cases where both models were correct but wrote different SQL, used Gemini 2.5 Flash as a third-party judge to pick the better query
  3. Used those judgments as DPO preference signal
| DPO pair category | Count |
|---|---|
| Judge-resolved (both correct, different SQL) | 478 |
| Gold SQL vs wrong model output | 445 |
| Clear preference (one right, one wrong) | 117 |
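Assembled, each category yields one record in the standard DPO layout of a prompt plus chosen/rejected completions. A schematic sketch (the field names and SQL strings are illustrative, not the exact schema of the released dataset):

```python
def make_dpo_pair(prompt: str, chosen_sql: str, rejected_sql: str, category: str) -> dict:
    """One preference record in the usual (prompt, chosen, rejected) layout."""
    return {"prompt": prompt, "chosen": chosen_sql,
            "rejected": rejected_sql, "category": category}

prompt = "### Schema: ...\n### Question: How many singers are there?\n### SQL:"

pairs = [
    # 1) Both models correct but different SQL: the judge picks the better query
    make_dpo_pair(prompt, "SELECT count(*) FROM singers",
                  "SELECT count(singer_id) FROM singers", "judge_resolved"),
    # 2) A model answered incorrectly: gold SQL chosen, wrong output rejected
    make_dpo_pair(prompt, "SELECT count(*) FROM singers",
                  "SELECT count(*) FROM concerts", "gold_vs_wrong"),
    # 3) One model right, one wrong: clear preference
    make_dpo_pair(prompt, "SELECT count(*) FROM singers",
                  "SELECT name FROM singers", "clear_preference"),
]
print(len(pairs))  # 3
```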

Training infra: AWS EC2 g5.xlarge (NVIDIA A10G), QLoRA with 4-bit NF4 quantization, LoRA rank 32 / alpha 64, trained via LLaMA-Factory.
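The same QLoRA setup expressed in peft/bitsandbytes terms (a sketch of equivalent settings, not the exact LLaMA-Factory config used for training):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter: rank 32, alpha 64, applied to a causal LM
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    task_type="CAUSAL_LM",
)
```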


Training Hyperparameters

| Parameter | SFT | DPO |
|---|---|---|
| Epochs | 3 | 2 |
| Learning rate | 2e-4 | 5e-5 |
| LR schedule | cosine | cosine |
| Effective batch size | 16 | 16 |
| LoRA rank / alpha | 32 / 64 | 32 / 64 |
| Quantization | 4-bit NF4 | 4-bit NF4 |
| Final loss | 0.0715 | 0.2208 |

Citation

If you use the Spider dataset, please cite:

@inproceedings{Yu2018Spider,
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author    = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev},
  booktitle = {EMNLP},
  year      = {2018}
}