Qwen2.5-7B-Instruct-TISER

This repository contains the full-weight merged version of the Qwen2.5-7B-Instruct model, fine-tuned on the TISER dataset.

The model was trained with LoRA on an NVIDIA A800 (80GB), and the adapter was then merged into the base model weights. The result is a standalone checkpoint that loads without any additional adapter files.

Model Description

Base Model: Qwen/Qwen2.5-7B-Instruct
Training Method: SFT (Supervised Fine-Tuning) via LoRA
Merge Method: Linear Merge (Adapter weights integrated into base layers)
Target Domain: TISER-specific instructions and conversational logic
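
Merging a LoRA adapter folds each low-rank update into its base weight matrix, W_merged = W + (alpha / r) * B A, which is why no adapter is needed at inference time. The sketch below illustrates that arithmetic on toy matrices in plain Python. The matrix values and the helper names `matmul` and `merge_lora` are hypothetical, and the toy setting r=1, alpha=2 is chosen only because it gives the same scaling factor (2.0) as the r=16, alpha=32 configuration listed under Training Hyperparameters. It is not the script used to produce this checkpoint.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, r, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a) without modifying the inputs."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # shape: (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (all values are illustrative).
base_weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -0.5]]   # shape (r, in_features) = (1, 2)
lora_b = [[1.0], [2.0]]  # shape (out_features, r) = (2, 1)

merged = merge_lora(base_weight, lora_a, lora_b, r=1, alpha=2)
print(merged)
```

For real checkpoints, the PEFT library exposes `merge_and_unload()` on LoRA models to perform this merge layer by layer; whether that call was used for this repository is not stated here.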

Training Hyperparameters

The following hyperparameters were used during the fine-tuning process:

Parameter	Value
Max Sequence Length	2048 (Optimized for TISER long-context)
Batch Size (Global)	16 (8 per device × 2 Gradient Accumulation)
Learning Rate	2e-4
Optimizer	AdamW
LR Scheduler	Cosine with Warmup
Precision	Bfloat16 (BF16)
LoRA Config	r=16, alpha=32, target_modules="all-linear"
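
To make the scheduler row concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, peaking at the 2e-4 rate from the table. The warmup length (100 steps), the total step count (1000), and the decay all the way to zero are assumptions for illustration; the values actually used in training are not stated here. The function name `cosine_lr_with_warmup` is hypothetical.

```python
import math

PEAK_LR = 2e-4        # from the table above
PER_DEVICE_BATCH = 8  # from the table above
GRAD_ACCUM_STEPS = 2  # from the table above


def cosine_lr_with_warmup(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Linear warmup from 0 to peak_lr, then cosine decay from peak_lr to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# 8 samples per device x 2 accumulation steps = 16 samples per optimizer step.
global_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Assumed schedule lengths, for illustration only.
TOTAL_STEPS, WARMUP_STEPS = 1000, 100
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS, WARMUP_STEPS))
```

The `transformers` library ships a comparable scheduler as `get_cosine_schedule_with_warmup`, which is the more likely choice in an actual training script.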

Quick Start

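Recommended for GPUs with at least 16 GB of VRAM. Cards with 24 GB or more (e.g., RTX 3090/4090, A100) leave headroom for the KV cache, since the BF16 weights alone take roughly 15 GB.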
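
As a rough check on that recommendation, the sketch below estimates the memory needed just to hold the weights. It assumes about 7.6 billion parameters (the figure Qwen reports for Qwen2.5-7B) stored at 2 bytes each in BF16. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory needed to store the weights alone, in GiB (BF16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1024**3


# Assumed parameter count: roughly 7.6 billion for Qwen2.5-7B.
ASSUMED_PARAMS = 7.6e9

weights_gib = weight_memory_gib(ASSUMED_PARAMS)
print(f"BF16 weights: about {weights_gib:.1f} GiB")
```

Activations and the KV cache come on top of this figure and grow with prompt and generation length.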

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "xueyufeizhang/Qwen2.5-7B-Instruct-TISER"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
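    # Optional: needs the flash-attn package and a supported GPU (e.g., A100/A800/H100).
    # Remove the next line to fall back to the default attention implementation.
    attn_implementation="flash_attention_2",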
)
model.eval()

# Chat template usage
messages = [
    {"role": "system", "content": "You are a helpful assistant specialized in TISER logic."},
    {"role": "user", "content": "Your query here..."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

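generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # Sampling must be enabled for temperature to take effect
    temperature=0.7
)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, generated_ids)]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)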
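
For orientation, `apply_chat_template` turns the `messages` list into a single prompt string. Qwen2.5 models use a ChatML-style layout with `<|im_start|>` and `<|im_end|>` markers, and the sketch below approximates that layout in plain Python so the structure is visible without loading the tokenizer. It is a simplified illustration (the real template also handles tools and a default system prompt), and the function name `render_chatml` is hypothetical.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML layout: one <|im_start|>role ... <|im_end|> block per message."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant specialized in TISER logic."},
    {"role": "user", "content": "Your query here..."},
]
prompt = render_chatml(messages)
print(prompt)
```

For real inference, keep using `tokenizer.apply_chat_template`, which remains the source of truth for the exact format.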

Limitations & Disclaimer

This model is a research artifact. While fine-tuned on TISER data, it inherits the biases of the original Qwen2.5 base model. Users should perform safety filtering and factual verification for production use cases.
