Instructions for using kai-os/Carnice-V2-27b with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use kai-os/Carnice-V2-27b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="kai-os/Carnice-V2-27b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("kai-os/Carnice-V2-27b")
model = AutoModelForImageTextToText.from_pretrained("kai-os/Carnice-V2-27b")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use kai-os/Carnice-V2-27b with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "kai-os/Carnice-V2-27b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "kai-os/Carnice-V2-27b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
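Since the endpoint is OpenAI-compatible, it can also be called from Python. A minimal sketch, assuming the `openai` client is installed (`pip install openai`) and the server above is running on localhost:8000:

```python
# Sketch: call the local vLLM server through the OpenAI Python client.
from openai import OpenAI

# vLLM does not check the API key by default; "EMPTY" is a conventional placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="kai-os/Carnice-V2-27b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```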
Use Docker

```sh
docker model run hf.co/kai-os/Carnice-V2-27b
```
- SGLang
How to use kai-os/Carnice-V2-27b with SGLang:
Install from pip and serve the model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "kai-os/Carnice-V2-27b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "kai-os/Carnice-V2-27b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "kai-os/Carnice-V2-27b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "kai-os/Carnice-V2-27b",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use kai-os/Carnice-V2-27b with Docker Model Runner:
```sh
docker model run hf.co/kai-os/Carnice-V2-27b
```
Carnice-V2-27B for Hermes Agent
Carnice-V2-27B is a fully merged BF16 SFT of Qwen/Qwen3.6-27B, fine-tuned on Hermes-style agent traces. This repository contains the standalone merged model weights, not just a LoRA adapter.
BF16 Transformers Loading Fix
The BF16 safetensors were republished with corrected Qwen3_5ForConditionalGeneration tensor prefixes. The original merge artifact accidentally serialized an extra Unsloth wrapper prefix, which caused direct HF Transformers loads to report the real weights as unexpected keys and to initialize the expected layers randomly. GGUF files were not affected because the GGUF conversion path normalized those prefixes.
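A minimal diagnostic sketch for this class of issue (the shard filename and the `base_model.` prefix string below are illustrative assumptions, not the exact artifact):

```python
# Hypothetical sketch: inspect safetensors key names to spot a stray wrapper prefix.
from safetensors import safe_open

# Shard name is an assumption for illustration; use an actual shard from the repo.
with safe_open("model-00001-of-00012.safetensors", framework="pt") as f:
    keys = list(f.keys())

# Correct keys look like "model.layers.0.self_attn.q_proj.weight". A stray wrapper
# prefix (e.g. "base_model.model.") makes Transformers treat every real weight as
# an unexpected key while randomly initializing the layers it expected to find.
stray = [k for k in keys if k.startswith("base_model.")]
print(f"{len(stray)} of {len(keys)} keys carry a wrapper prefix")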
Benchmarks
| Metric | Qwen3.6-27B base | Carnice SFT |
|---|---|---|
| IFEval prompt strict, limit 20 | 85.0% | 90.0% |
| IFEval prompt loose, limit 20 | 85.0% | 90.0% |
| IFEval instruction strict, limit 20 | 90.0% | 93.3% |
| IFEval instruction loose, limit 20 | 90.0% | 93.3% |
| Held-out assistant-token eval loss | 0.607 | 0.414 |
| Held-out assistant-token eval perplexity | 1.835 | 1.513 |
The benchmark artifact bundle is included under benchmarks/. It contains the rendered graph, the extracted metrics.json, the benchmark scripts, and the raw result files used to produce the chart.
Scope note: the IFEval run is a short limit=20 A/B smoke benchmark, not an official full-leaderboard score. The held-out loss/perplexity is the exact assistant-only, training-format validation metric from the SFT script. The raw BFCL two-case smoke files are included for auditability, but they are too small to support any model-quality claim.
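For reference, the perplexity figures above are consistent with being the exponential of the mean assistant-token cross-entropy loss:

```python
import math

# Perplexity = exp(mean cross-entropy loss over assistant tokens)
print(math.exp(0.607))  # ≈ 1.835 (Qwen3.6-27B base)
print(math.exp(0.414))  # ≈ 1.513 (Carnice SFT)
```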
Training
This checkpoint was produced from the recovered 8K split-window Carnice run:
| Item | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| SFT framework | Unsloth/PEFT LoRA, then merged to BF16 safetensors |
| Loss mask | Assistant-token-only |
| Context/windowing | 8,192-token windows with 1,024-token overlap (see the sketch after this table) |
| Train rows before windowing | 3,473 |
| Train windows | 6,554 |
| Eval examples | 110 |
| Source mix | 1,508 Carnice rows, 1,015 DJLougen Hermes rows, 950 Lambda GLM-5.1 Hermes rows |
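A minimal sketch of how such split windows could be cut (an illustration under stated assumptions, not the actual training script):

```python
# Hedged sketch: cut a long tokenized conversation into 8,192-token windows
# with 1,024 tokens of overlap. Illustrative only; not the original SFT code.
def split_windows(token_ids, window=8192, overlap=1024):
    stride = window - overlap  # each new window adds 7,168 fresh tokens
    windows = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        windows.append(token_ids[start:start + window])
    return windows

# Example: a 20,000-token conversation yields 3 overlapping windows.
print(len(split_windows(list(range(20000)))))
```

Paired with the assistant-token-only loss mask above, labels outside assistant spans would be set to -100 so the cross-entropy loss ignores them.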
Usage
```python
import torch
from transformers import AutoModelForImageTextToText, AutoTokenizer

model_id = "kai-os/Carnice-V2-27b"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    dtype=torch.bfloat16,  # load the merged BF16 weights directly
    device_map="auto",     # shard across available devices
    trust_remote_code=True,
)
```
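A minimal text-only generation sketch to smoke-test the load (hedged: assumes the tokenizer ships a chat template; adapt to your own agent harness):

```python
# Hedged follow-up sketch: a single text-only chat turn to verify the load.
messages = [{"role": "user", "content": "List three tools an agent might call."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```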
This model is intended for agentic, Hermes-style use. Validate it with your own agent harness before relying on it for production behavior.