MKLLM-7B-Instruct
MKLLM-7B is an open-source Large Language Model for the Macedonian language. The model is built on top of the amazing Mistral-7B-v0.1 model by continued pretraining on a mix of Macedonian and English text. The training corpus contained around 300M tokens and was repeated for 2 epochs. Although this might be considered small compared to other similar projects, the resulting model is very capable of understanding and processing the Macedonian language.
This is the instruction-tuned version of MKLLM-7B. It was trained by taking MKLLM-7B and then performing a full instruction training with axolotl by using the chatml format for conversations.
We tested the model against Meta's Llama3-8B-Instruct and Mistral's Mistral-7B-Instruct-v0.3 on a set of benchmarks that we translated into Macedonian, and the model performs better than both of these leading models in its category.
Additionally, these benchmarks are primarily focused on understanding and do not measure generation capabilities and fluency. In these categories we believe there is an even larger difference in performance, as MKLLM-7B-Instruct writes much more coherent Macedonian.
The benchmarking was done with: https://github.com/N13T/mk-llm-eval

To leverage the instruction training, your prompt should follow the chatml format:
<|im_start|>system
Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот.<|im_end|>
<|im_start|>user
Која планета е позната како 'Црвената Планета'?<|im_end|>
<|im_start|>assistant
Марс<|im_end|>
This prompt is available as a chat template, which means you can format messages using the tokenizer.apply_chat_template() method:
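import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model, then move the model to the GPU.
model_id = "trajkovnikola/MKLLM-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")

messages = [
    {"role": "system", "content": "Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот."},
    {"role": "user", "content": "Која планета е позната како 'Црвената Планета'?"}
]

# Render the conversation with the chat template and tokenize it.
gen_input = tokenizer.apply_chat_template(messages,
                                          tokenize=True,
                                          return_dict=True,
                                          return_tensors="pt",
                                          add_generation_prompt=True).to("cuda")

with torch.no_grad():
    generated_ids = model.generate(**gen_input, max_new_tokens=150,
                                   do_sample=True,
                                   temperature=0.1,
                                   repetition_penalty=1.1,
                                   )

# Decode only the newly generated tokens, skipping the prompt tokens.
print(tokenizer.decode(generated_ids[0][gen_input["input_ids"].shape[1]:], skip_special_tokens=False))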
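If the tokenizer is not at hand (for example, when sending raw prompts to a text-completion endpoint), the chatml layout shown earlier can be assembled by hand. The following is a minimal sketch, not the model's actual chat template: the helper names `build_chatml_prompt` and `strip_end_marker` are hypothetical, and the exact whitespace or any extra special tokens added by the bundled template may differ, so prefer `tokenizer.apply_chat_template()` whenever it is available.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Render messages in the chatml layout (hypothetical helper)."""
    # Each turn is: <|im_start|>role\ncontent<|im_end|>\n
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def strip_end_marker(generated: str) -> str:
    """Cut decoded output at the first <|im_end|> marker, if present."""
    return generated.split(IM_END, 1)[0].strip()


messages = [
    {"role": "system", "content": "Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот."},
    {"role": "user", "content": "Која планета е позната како 'Црвената Планета'?"},
]

prompt = build_chatml_prompt(messages)
print(prompt)
print(strip_end_marker("Марс<|im_end|>"))
```

Because the generation example decodes with `skip_special_tokens=False`, the printed answer ends with `<|im_end|>`; a helper like `strip_end_marker` is one way to remove it before showing the text to a user.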
Notes
- MKLLM-7B-Instruct can hallucinate and produce factually incorrect output. This is especially pronounced when discussing Macedonian topics due to the smaller training dataset.