How to use from
SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "modelscope/llama3-8b-agent-instruct-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "modelscope/llama3-8b-agent-instruct-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "modelscope/llama3-8b-agent-instruct-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "modelscope/llama3-8b-agent-instruct-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
Quick Links

Fine-tuning the llama3-8b-instruct model using the msagent-pro dataset and the loss_scale technique with swift, the script is as follows:

NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MASTER_PORT=29500 \
swift sft \
    --model_type llama3-8b-instruct \
    --learning_rate 2e-5 \
    --sft_type lora \
    --dataset msagent-pro \
    --gradient_checkpointing true \
    --gradient_accumulation_steps 8 \
    --deepspeed default-zero3 \
    --lora_target_modules ALL \
    --use_loss_scale true \
    --save_strategy epoch \
    --batch_size 1 \
    --num_train_epochs 2 \
    --max_length 4096 \
    --preprocess_num_proc 4 \
    --use_loss_scale true \
    --loss_scale_config_path agent-flan \
    --ddp_backend nccl \

Comparison with the Original Model on the ToolBench Evaluation Set

Model ToolBench (in-domain) ToolBench (out-of-domain)
Plan.EM Act.EM HalluRate (lower is better) Avg.F1 R-L Plan.EM Act.EM HalluRate (lower is better) Avg.F1
llama3-8b-instruct 74.22 36.17 15.68 20.0 12.14 69.47 34.21 14.72 20.25
llama3-8b-agent-instruct-v2 85.15 58.1 1.57 52.10 26.02 85.79 59.43 2.56 52.19

For detailed explanations of the evaluation metrics, please refer to document

Deploy this model:

USE_HF=True swift deploy \
  --model_id_or_path modelscope/llama3-8b-agent-instruct-v2 \
  --model_type llama3-8b-instruct \
  --infer_backend vllm \
  --tools_prompt toolbench
Downloads last month
7
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support