Instructions for using trajkovnikola/MKLLM-7B-Instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use trajkovnikola/MKLLM-7B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="trajkovnikola/MKLLM-7B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("trajkovnikola/MKLLM-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("trajkovnikola/MKLLM-7B-Instruct", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use trajkovnikola/MKLLM-7B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "trajkovnikola/MKLLM-7B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trajkovnikola/MKLLM-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
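
The same request can be sent from Python. The following is a minimal sketch that uses only the standard library. The helper names `build_chat_request` and `ask` are hypothetical, and the default base URL assumes the vLLM server started above is listening on port 8000. The response parsing assumes the usual OpenAI-style layout of `choices[0].message.content`.

```python
import json
import urllib.request

MODEL_ID = "trajkovnikola/MKLLM-7B-Instruct"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    # Assumption: the reply sits under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` returns the reply as a string. Passing a different `base_url` points the same helper at a server on another port.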

Use Docker

docker model run hf.co/trajkovnikola/MKLLM-7B-Instruct

SGLang

How to use trajkovnikola/MKLLM-7B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "trajkovnikola/MKLLM-7B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trajkovnikola/MKLLM-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "trajkovnikola/MKLLM-7B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "trajkovnikola/MKLLM-7B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use trajkovnikola/MKLLM-7B-Instruct with Docker Model Runner:
```
docker model run hf.co/trajkovnikola/MKLLM-7B-Instruct
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

MKLLM-7B-Instruct

MKLLM-7B is an open-source large language model for the Macedonian language. It was built from Mistral-7B-v0.1 by continued pretraining on a mix of Macedonian and English text. Training used a corpus of around 300M tokens for 2 epochs. Although this is small compared to similar projects, the resulting model is very capable of understanding and processing Macedonian.

This is the instruction-tuned version of MKLLM-7B. It was produced by taking MKLLM-7B and performing a full instruction training with axolotl, using the ChatML format for conversations.

We tested the model against Meta's Llama3-8B-Instruct and Mistral's Mistral-7B-Instruct-v0.3 on a set of benchmarks that we translated into Macedonian, and it performs better than both. These benchmarks focus primarily on understanding and do not measure generation capability or fluency. We believe the difference is even larger in those areas, as MKLLM-7B-Instruct writes much more coherent Macedonian. The benchmarking was done with: https://github.com/N13T/mk-llm-eval

To benefit from the instruction training, your prompt should follow the ChatML format:

<|im_start|>system
Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот.<|im_end|>
<|im_start|>user
Која планета е позната како 'Црвената Планета'?<|im_end|>
<|im_start|>assistant
Марс<|im_end|>
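
To make the layout concrete, the sketch below builds such a prompt string by hand. The function `render_chatml` is a hypothetical helper, and the exact whitespace (one newline after each role name and after each `<|im_end|>`) is an assumption read off the example above. The tokenizer's own chat template remains the authoritative source.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот."},
    {"role": "user", "content": "Која планета е позната како 'Црвената Планета'?"},
])
print(prompt)
```

The printed prompt ends with an open `<|im_start|>assistant` turn, which is where the model's answer would begin.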

This prompt is available as a chat template, which means you can format messages using the tokenizer.apply_chat_template() method:

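import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("trajkovnikola/MKLLM-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("trajkovnikola/MKLLM-7B-Instruct", device_map="auto")

# System: "A conversation between a curious user and an AI assistant. The assistant gives
# helpful, detailed, and polite answers." User: "Which planet is known as the 'Red Planet'?"
messages = [
    {"role": "system", "content": "Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот."},
    {"role": "user", "content": "Која планета е позната како 'Црвената Планета'?"}
]
gen_input = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
with torch.no_grad():
    generated_ids = model.generate(
        **gen_input,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.1,
        repetition_penalty=1.1,
    )
# Decode only the newly generated tokens (everything after the prompt).
prompt_length = gen_input["input_ids"].shape[1]
print(tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=False))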
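
Because the output is decoded with `skip_special_tokens=False`, the printed text can still contain the `<|im_end|>` marker. A minimal sketch of trimming it follows. The helper `extract_reply` is hypothetical and works on plain strings, so it does not need the model to be loaded.

```python
END_OF_TURN = "<|im_end|>"


def extract_reply(decoded):
    """Return the text before the first end-of-turn marker, stripped of whitespace."""
    reply, _, _ = decoded.partition(END_OF_TURN)
    return reply.strip()


print(extract_reply("Марс<|im_end|>"))  # prints: Марс
```

If the marker is absent, for example because generation stopped at `max_new_tokens`, the helper returns the whole string unchanged apart from surrounding whitespace.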

Notes

MKLLM-7B-Instruct can hallucinate and produce factually incorrect output. This is especially pronounced when discussing Macedonian topics due to the smaller training dataset.

Safetensors

Model size: 7B params
Tensor type: BF16

Model tree for trajkovnikola/MKLLM-7B-Instruct

Finetunes: 2 models
Quantizations: 6 models