Tally SQLCoder - Fine-tuned for TallyPrime ERP

Created by: Jay Viramgami

A fine-tuned LLaMA 3 SQLCoder model specialized for converting natural language questions to PostgreSQL queries for TallyPrime ERP systems.

🎯 Model Description

This model is specifically trained to understand accounting and business terminology used in TallyPrime ERP and generate accurate SQL queries for a PostgreSQL database migrated from Tally.

Key Features

  • 🏦 Accounting Domain Expertise - Understands financial terms, GST, vouchers, ledgers
  • 📊 28 Database Tables - Covers all master and transaction tables from Tally
  • 🎓 ICAI Compliant - Based on Indian accounting standards
  • 🚀 Fast Inference - Optimized with QLoRA for efficient deployment
  • 💯 High Accuracy - Fine-tuned on 5,000+ Tally-specific query pairs

Use Cases

  • Customer receivables and vendor payables analysis
  • Sales and purchase reporting
  • Inventory and stock management queries
  • GST and tax compliance reports
  • Financial statements (Profit & Loss, Balance Sheet)
  • Voucher and transaction searches

📊 Model Details

  • Base Model: defog/llama-3-sqlcoder-8b
  • Fine-tuning Method: QLoRA (4-bit quantization)
  • Training Data: 5,000 synthetic Tally accounting text-to-SQL pairs
  • Target Database: PostgreSQL (Tally migration schema with 28 tables)
  • Training Platform: Kaggle (NVIDIA T4 GPU)
  • Training Time: ~4 hours
  • Final Training Loss: 0.05-0.07

Query Categories Supported

| Category             | Templates | Examples                                           |
|----------------------|-----------|----------------------------------------------------|
| Simple Filters       | 20        | "Show all customers", "List bank accounts"         |
| Date Ranges          | 20        | "Sales in March 2024", "Payments this quarter"     |
| Aggregations         | 25        | "Total sales amount", "Top 10 customers"           |
| Joins                | 25        | "Customer-wise outstanding", "Item sales by godown"|
| Accounting           | 20        | "P&L items", "Assets and liabilities"              |
| GST/Tax              | 15        | "GST collected", "TDS deducted"                    |
| Inventory            | 15        | "Stock movements", "Items with zero balance"       |
| Financial Statements | 10        | "Trial balance", "Balance sheet data"              |

🚀 Quick Start

Installation

pip install transformers peft torch accelerate bitsandbytes

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "defog/llama-3-sqlcoder-8b",
    device_map="auto",
    torch_dtype=torch.float16
)

# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "jaykv/tally-sqlcoder-finetuned")
tokenizer = AutoTokenizer.from_pretrained("jaykv/tally-sqlcoder-finetuned")

# Generate SQL
question = "Show all customers with outstanding balance above 50000"
schema = """CREATE TABLE mst_ledger (
    name VARCHAR(1024),
    parent VARCHAR(1024),
    closing_balance DECIMAL(17,2)
);"""

prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql = result.split("[SQL]")[-1].strip()
print(sql)

Output:

SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
AND closing_balance > 50000

πŸ“ Example Queries

Simple Customer Query

Question: "Show all customers"

Generated SQL:

SELECT name FROM mst_ledger WHERE parent = 'Sundry Debtors'

Sales Analysis

Question: "What is the total sales amount for March 2024?"

Generated SQL:

SELECT SUM(ABS(amount)) as total_sales
FROM trn_accounting ta
JOIN trn_voucher tv ON ta.guid = tv.guid
WHERE tv.voucher_type = 'Sales'
AND tv.date BETWEEN '2024-03-01' AND '2024-03-31'

Top Customers

Question: "Show top 10 customers by sales"

Generated SQL:

SELECT tv.party_name, SUM(ABS(ta.amount)) as total_sales
FROM trn_voucher tv
JOIN trn_accounting ta ON tv.guid = ta.guid
WHERE tv.voucher_type = 'Sales'
GROUP BY tv.party_name
ORDER BY total_sales DESC
LIMIT 10

GST Query

Question: "Show GST payable amount"

Generated SQL:

SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Duties & Taxes'
AND name LIKE '%GST%'

πŸ—„οΈ Database Schema

The model is trained on a PostgreSQL schema with 28 tables from TallyPrime:

Master Tables (15)

  • mst_ledger - Customers, vendors, banks, expenses, incomes
  • mst_group - Account group hierarchy
  • mst_stock_item - Inventory items with GST details
  • mst_stock_group - Stock categories
  • mst_vouchertype - Voucher type definitions
  • mst_godown - Warehouse locations
  • mst_cost_centre - Cost centers
  • And 8 more...

Transaction Tables (13)

  • trn_voucher - All financial transactions
  • trn_accounting - Ledger-wise entries
  • trn_inventory - Item-wise stock movements
  • trn_bill - Bill allocations
  • trn_bank - Bank transaction details
  • And 8 more...

📈 Training Details

Dataset

  • Size: 5,000 text-to-SQL pairs
  • Source: Synthetically generated using 150 query templates
  • Split: 90/10 train/test
  • Categories: 8 query types covering all Tally operations
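
The template-based generation described above can be sketched as follows (the template and fillers here are illustrative only; the actual 150 templates are not published with this card):

```python
# Hypothetical template: each template pairs a question pattern with a SQL
# pattern, and filler dictionaries expand it into multiple training pairs.
template = {
    "question": "Show all {party_type}",
    "sql": "SELECT name FROM mst_ledger WHERE parent = '{group}'",
}

fillers = [
    {"party_type": "customers", "group": "Sundry Debtors"},
    {"party_type": "vendors", "group": "Sundry Creditors"},
]

pairs = [
    {"question": template["question"].format(**f),
     "sql": template["sql"].format(**f)}
    for f in fillers
]
# pairs[0]["question"] -> "Show all customers"
```

Expanding each template over many fillers in this way yields the 5,000 pairs from 150 base patterns.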

Training Configuration

  • Method: QLoRA (Quantized Low-Rank Adaptation)
  • Quantization: 4-bit (NF4)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: q_proj, k_proj, v_proj, o_proj
  • Batch Size: 2 per device
  • Gradient Accumulation: 4 steps
  • Learning Rate: 2e-4
  • Epochs: ~3 (1,600 steps)
  • Optimizer: PagedAdamW 8-bit
  • Max Sequence Length: 2048 tokens
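
The step count follows from these settings; a quick sanity check, assuming the 90/10 split of the 5,000 pairs (i.e. 4,500 training examples):

```python
train_examples = 4500                                 # 90% of 5,000 pairs
per_device_batch = 2
grad_accum = 4
effective_batch = per_device_batch * grad_accum       # 8
steps_per_epoch = train_examples // effective_batch   # 562
total_steps = steps_per_epoch * 3                     # 1686
```

This lands close to the ~1,600 steps reported above; the exact figure depends on the precise split and whether partial batches are dropped.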

Hardware

  • Platform: Kaggle Notebooks
  • GPU: NVIDIA T4 (16GB)
  • Training Time: ~4 hours

📊 Performance

  • Valid SQL Syntax: >95%
  • Keyword Match: >85%
  • Exact Match (normalized): >70%
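
"Exact match (normalized)" typically compares queries after canonicalizing case and whitespace; a minimal sketch of such a normalization (the exact procedure used in evaluation is not published):

```python
import re

def normalize_sql(sql: str) -> str:
    """Lowercase and collapse whitespace so pure formatting differences
    do not count as mismatches (a simplification of the metric)."""
    return re.sub(r"\s+", " ", sql.strip().rstrip(";")).lower()

normalize_sql("SELECT name\nFROM  mst_ledger;")  # "select name from mst_ledger"
```

Under this comparison, two queries that differ only in line breaks, indentation, or keyword casing count as an exact match.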

⚠️ Limitations

  • Tally-Specific: Optimized for TallyPrime PostgreSQL schema
  • PostgreSQL Only: SQL generated for PostgreSQL dialect
  • Schema Required: Needs database schema in the prompt
  • Context Window: Limited to 2048 tokens
  • Custom Schemas: May require additional fine-tuning for non-Tally schemas

🔧 Deployment Tips

For Production Use:

  1. Add validation - Verify generated SQL before execution
  2. Read-only mode - Restrict to SELECT queries only
  3. Query timeout - Set execution time limits
  4. Error handling - Catch and handle syntax errors
  5. Logging - Track all queries for audit
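
Points 1 and 2 can be combined into a lightweight guard that rejects anything other than a single SELECT statement. This is a minimal sketch; keyword filtering alone is not exhaustive, so a production deployment should also enforce read-only access at the database-role level:

```python
import re

# DML/DDL keywords to reject (illustrative, not a complete list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Allow only a single SELECT statement with no write keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                        # reject multi-statement input
        return False
    if not stripped.lower().startswith("select"):
        return False
    return FORBIDDEN.search(stripped) is None
```

For example, `is_safe_select("DROP TABLE mst_ledger")` returns False, while a plain SELECT passes.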

Optimization:

  • Use a GPU for faster inference (~2-3 seconds per query)
  • CPU inference works but is slower (~10-15 seconds per query)
  • Consider caching frequently asked queries
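
Caching can be as simple as memoizing on the (question, schema) pair. A sketch, where `generate_sql` is a hypothetical stand-in for the tokenizer + model.generate call shown in Quick Start:

```python
from functools import lru_cache

def generate_sql(question: str, schema: str) -> str:
    # Stand-in for the real model call; counts invocations so the
    # cache effect is visible.
    generate_sql.calls += 1
    return f"-- SQL for: {question}"

generate_sql.calls = 0

@lru_cache(maxsize=256)
def cached_generate(question: str, schema: str) -> str:
    return generate_sql(question, schema)

cached_generate("Show all customers", "CREATE TABLE mst_ledger (...)")
cached_generate("Show all customers", "CREATE TABLE mst_ledger (...)")  # cache hit
# generate_sql.calls == 1
```

Repeated questions then skip model inference entirely, which matters most on CPU-only deployments.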

📄 License

This model is released under the Apache 2.0 license, inheriting from the base model.

πŸ™ Acknowledgments

  • Base Model: defog/llama-3-sqlcoder-8b by Defog.ai
  • Training Standards: ICAI (Institute of Chartered Accountants of India) Foundation Course
  • Platform: Trained on Kaggle's free GPU infrastructure

📧 Contact

Author: Jay Viramgami

For questions, feedback, or collaboration inquiries, please open an issue on the model's discussion page.

Citation

If you use this model in your work, please cite:

@misc{tally-sqlcoder-finetuned,
  author = {Jay Viramgami},
  title = {Tally SQLCoder - Fine-tuned for TallyPrime ERP},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jaykv/tally-sqlcoder-finetuned}
}

Model Card created by Jay Viramgami | March 2024
