# Tally SQLCoder - Fine-tuned for TallyPrime ERP

**Created by:** Jay Viramgami
A fine-tuned LLaMA 3 SQLCoder model specialized for converting natural language questions to PostgreSQL queries for TallyPrime ERP systems.
## Model Description
This model is specifically trained to understand accounting and business terminology used in TallyPrime ERP and generate accurate SQL queries for a PostgreSQL database migrated from Tally.
### Key Features

- **Accounting Domain Expertise** - Understands financial terms, GST, vouchers, ledgers
- **28 Database Tables** - Covers all master and transaction tables from Tally
- **ICAI Compliant** - Based on Indian accounting standards
- **Fast Inference** - Optimized with QLoRA for efficient deployment
- **High Accuracy** - Fine-tuned on 5,000+ Tally-specific query pairs
### Use Cases
- Customer receivables and vendor payables analysis
- Sales and purchase reporting
- Inventory and stock management queries
- GST and tax compliance reports
- Financial statements (Profit & Loss, Balance Sheet)
- Voucher and transaction searches
## Model Details

- **Base Model:** defog/llama-3-sqlcoder-8b
- **Fine-tuning Method:** QLoRA (4-bit quantization)
- **Training Data:** 5,000 synthetic Tally accounting text-to-SQL pairs
- **Target Database:** PostgreSQL (Tally migration schema with 28 tables)
- **Training Platform:** Kaggle (NVIDIA T4 GPU)
- **Training Time:** ~4 hours
- **Final Training Loss:** 0.05-0.07
### Query Categories Supported
| Category | Templates | Examples |
|---|---|---|
| Simple Filters | 20 | "Show all customers", "List bank accounts" |
| Date Ranges | 20 | "Sales in March 2024", "Payments this quarter" |
| Aggregations | 25 | "Total sales amount", "Top 10 customers" |
| Joins | 25 | "Customer-wise outstanding", "Item sales by godown" |
| Accounting | 20 | "P&L items", "Assets and liabilities" |
| GST/Tax | 15 | "GST collected", "TDS deducted" |
| Inventory | 15 | "Stock movements", "Items with zero balance" |
| Financial Statements | 10 | "Trial balance", "Balance sheet data" |
## Quick Start

### Installation

```bash
pip install transformers peft torch accelerate bitsandbytes
```

### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "defog/llama-3-sqlcoder-8b",
    device_map="auto",
    torch_dtype=torch.float16,
)

# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "jaykv/tally-sqlcoder-finetuned")
tokenizer = AutoTokenizer.from_pretrained("jaykv/tally-sqlcoder-finetuned")

# Generate SQL
question = "Show all customers with outstanding balance above 50000"
schema = """CREATE TABLE mst_ledger (
    name VARCHAR(1024),
    parent VARCHAR(1024),
    closing_balance DECIMAL(17,2)
);"""

prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql = result.split("[SQL]")[-1].strip()
print(sql)
```
**Output:**

```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
  AND closing_balance > 50000
```
## Example Queries

### Simple Customer Query

**Question:** "Show all customers"

**Generated SQL:**

```sql
SELECT name FROM mst_ledger WHERE parent = 'Sundry Debtors'
```

### Sales Analysis

**Question:** "What is the total sales amount for March 2024?"

**Generated SQL:**

```sql
SELECT SUM(ABS(amount)) as total_sales
FROM trn_accounting ta
JOIN trn_voucher tv ON ta.guid = tv.guid
WHERE tv.voucher_type = 'Sales'
  AND tv.date BETWEEN '2024-03-01' AND '2024-03-31'
```

### Top Customers

**Question:** "Show top 10 customers by sales"

**Generated SQL:**

```sql
SELECT tv.party_name, SUM(ABS(ta.amount)) as total_sales
FROM trn_voucher tv
JOIN trn_accounting ta ON tv.guid = ta.guid
WHERE tv.voucher_type = 'Sales'
GROUP BY tv.party_name
ORDER BY total_sales DESC
LIMIT 10
```

### GST Query

**Question:** "Show GST payable amount"

**Generated SQL:**

```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Duties & Taxes'
  AND name LIKE '%GST%'
```
## Database Schema

The model is trained on a PostgreSQL schema with 28 tables from TallyPrime:

### Master Tables (15)

- `mst_ledger` - Customers, vendors, banks, expenses, incomes
- `mst_group` - Account group hierarchy
- `mst_stock_item` - Inventory items with GST details
- `mst_stock_group` - Stock categories
- `mst_vouchertype` - Voucher type definitions
- `mst_godown` - Warehouse locations
- `mst_cost_centre` - Cost centers
- And 8 more...

### Transaction Tables (13)

- `trn_voucher` - All financial transactions
- `trn_accounting` - Ledger-wise entries
- `trn_inventory` - Item-wise stock movements
- `trn_bill` - Bill allocations
- `trn_bank` - Bank transaction details
- And 8 more...
## Training Details

### Dataset

- **Size:** 5,000 text-to-SQL pairs
- **Source:** Synthetically generated from 150 query templates
- **Split:** 90/10 train/test
- **Categories:** 8 query types covering the main Tally operations
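The 150 query templates themselves are not published with the card; the sketch below shows one common way such text-to-SQL pairs can be generated from templates, with illustrative template text and slot names (none of these are the actual training templates).

```python
import json
import random

# Hypothetical template; the real 150 templates are not published.
TEMPLATES = [
    {
        "question": "Show all {party} with closing balance above {amount}",
        "sql": ("SELECT name, closing_balance FROM mst_ledger "
                "WHERE parent = '{group}' AND closing_balance > {amount}"),
        "slots": {
            "party": [("customers", "Sundry Debtors"),
                      ("vendors", "Sundry Creditors")],
            "amount": [10000, 50000, 100000],
        },
    },
]

def generate_pairs(templates, n_per_template=3, seed=42):
    """Fill each template's slots with random values to build pairs."""
    rng = random.Random(seed)
    pairs = []
    for t in templates:
        for _ in range(n_per_template):
            party, group = rng.choice(t["slots"]["party"])
            amount = rng.choice(t["slots"]["amount"])
            pairs.append({
                "question": t["question"].format(party=party, amount=amount),
                "sql": t["sql"].format(group=group, amount=amount),
            })
    return pairs

pairs = generate_pairs(TEMPLATES)
print(json.dumps(pairs[0], indent=2))
```

Scaling this approach to 150 templates across 8 categories yields the 5,000-pair dataset size reported above.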
### Training Configuration

- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Quantization:** 4-bit (NF4)
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps
- **Learning Rate:** 2e-4
- **Epochs:** ~3 (1,600 steps)
- **Optimizer:** PagedAdamW 8-bit
- **Max Sequence Length:** 2048 tokens
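The hyperparameters above map directly onto `peft` and `bitsandbytes` configuration objects. A config fragment, assuming the standard `transformers`/`peft` APIs; `lora_dropout` is not stated in the card and is illustrative.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",          # NF4 quantization, as listed above
    bnb_4bit_compute_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=16,                               # LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,                  # assumed; not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)
```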
### Hardware

- **Platform:** Kaggle Notebooks
- **GPU:** NVIDIA T4 (16 GB)
- **Training Time:** ~4 hours
## Performance

- **Valid SQL Syntax:** >95%
- **Keyword Match:** >85%
- **Exact Match (normalized):** >70%
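The card does not specify the normalization used for the exact-match metric; a minimal sketch of one common scheme (lowercasing, whitespace collapsing, trailing-semicolon removal):

```python
import re

def normalize_sql(sql: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon.
    One common normalization for 'exact match'; the exact scheme used
    for the reported numbers is not specified in the card."""
    sql = sql.strip().rstrip(";")
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower()

def exact_match(predicted: str, reference: str) -> bool:
    return normalize_sql(predicted) == normalize_sql(reference)

print(exact_match(
    "SELECT name\nFROM mst_ledger;",
    "select name from mst_ledger",
))  # True
```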
## Limitations

- **Tally-Specific:** Optimized for the TallyPrime PostgreSQL schema
- **PostgreSQL Only:** Generates SQL in the PostgreSQL dialect
- **Schema Required:** The database schema must be included in the prompt
- **Context Window:** Limited to 2048 tokens
- **Custom Schemas:** May require additional fine-tuning for non-Tally schemas
## Deployment Tips

**For Production Use:**

- **Add validation** - Verify generated SQL before execution
- **Read-only mode** - Restrict to SELECT queries only
- **Query timeout** - Set execution time limits
- **Error handling** - Catch and handle syntax errors
- **Logging** - Track all queries for audit
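The first two tips can be sketched with a stdlib-only guard. This is deliberately crude and conservative (it will reject some legitimate queries, e.g. ones containing forbidden keywords inside string literals); a real deployment should rely on a proper SQL parser plus database-level protection, such as a read-only PostgreSQL role and the `statement_timeout` setting.

```python
import re

READ_ONLY = re.compile(r"^\s*select\b", re.IGNORECASE)
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Accept only a single SELECT statement with no write keywords.
    Conservative by design: false rejections are preferred over
    letting a write statement through."""
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # reject multi-statement payloads
    return bool(READ_ONLY.match(sql)) and not FORBIDDEN.search(sql)

print(is_safe_select("SELECT name FROM mst_ledger"))          # True
print(is_safe_select("SELECT 1; DELETE FROM trn_voucher"))    # False
```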
**Optimization:**

- Use a GPU for faster inference (~2-3 seconds per query)
- CPU inference works but is slower (~10-15 seconds per query)
- Consider caching frequently asked queries
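Caching can be as simple as memoizing on the (question, schema) pair. A sketch, with `generate_sql` standing in as a hypothetical wrapper around the Quick Start generation pipeline; note that with `do_sample=True` outputs are not strictly deterministic, so greedy decoding is a better fit for caching.

```python
from functools import lru_cache

calls = 0

def generate_sql(question: str, schema: str) -> str:
    # Hypothetical stand-in for the model call in the Quick Start section.
    global calls
    calls += 1
    return "SELECT name FROM mst_ledger"

@lru_cache(maxsize=1024)
def cached_generate(question: str, schema: str) -> str:
    # Identical (question, schema) pairs are served from the cache,
    # skipping the 2-15 second model call entirely.
    return generate_sql(question, schema)

cached_generate("Show all customers", "CREATE TABLE mst_ledger (...)")
cached_generate("Show all customers", "CREATE TABLE mst_ledger (...)")  # cache hit
print(calls)  # 1
```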
## License

This model is released under the Apache 2.0 license, inherited from the base model.
## Acknowledgments

- **Base Model:** defog/llama-3-sqlcoder-8b by Defog.ai
- **Training Standards:** ICAI (Institute of Chartered Accountants of India) Foundation Course
- **Platform:** Trained on Kaggle's free GPU infrastructure
## Contact

**Author:** Jay Viramgami

For questions, feedback, or collaboration inquiries, please open an issue on the model's discussion page.
## Citation

If you use this model in your work, please cite:

```bibtex
@misc{tally-sqlcoder-finetuned,
  author    = {Jay Viramgami},
  title     = {Tally SQLCoder - Fine-tuned for TallyPrime ERP},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jaykv/tally-sqlcoder-finetuned}
}
```
---

*Model Card created by Jay Viramgami | March 2024*