Instructions to use R-Kentaren/grok-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use R-Kentaren/grok-2 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="R-Kentaren/grok-2")

    # Load model directly
    from transformers import AutoProcessor, AutoModelForCausalLM

    processor = AutoProcessor.from_pretrained("R-Kentaren/grok-2")
    model = AutoModelForCausalLM.from_pretrained("R-Kentaren/grok-2")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use R-Kentaren/grok-2 with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "R-Kentaren/grok-2"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "R-Kentaren/grok-2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
docker model run hf.co/R-Kentaren/grok-2
- SGLang
How to use R-Kentaren/grok-2 with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "R-Kentaren/grok-2" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "R-Kentaren/grok-2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "R-Kentaren/grok-2" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "R-Kentaren/grok-2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use R-Kentaren/grok-2 with Docker Model Runner:
docker model run hf.co/R-Kentaren/grok-2
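The curl calls in the vLLM and SGLang sections above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, the ports are the ones used in the commands above, and the response is assumed to follow the usual OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request

# Ports taken from the commands above: the vLLM example is called on 8000,
# the SGLang example on 30000.
VLLM_BASE_URL = "http://localhost:8000"
SGLANG_BASE_URL = "http://localhost:30000"


def build_completion_request(base_url, prompt, model="R-Kentaren/grok-2",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route.

    Hypothetical helper: it mirrors the JSON body of the curl examples.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, timeout=60.0, **kwargs):
    """Send the request and return the text of the first completion.

    Assumption: the server is already running and answers with the
    OpenAI completions shape, i.e. body["choices"][0]["text"].
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete(SGLANG_BASE_URL, "Once upon a time,")` would send the same request as the SGLang curl example; swapping in `VLLM_BASE_URL` targets the vLLM server instead.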
Grok 2
This repository contains the weights of Grok 2, a model trained and used at xAI in 2024.
Usage: Serving with SGLang
Download the weights. You can replace /local/grok-2 with any other folder name you prefer.
    hf download xai-org/grok-2 --local-dir /local/grok-2
You might encounter some errors during the download. Please retry until the download is successful. If the download succeeds, the folder should contain 42 files and be approximately 500 GB.
Launch a server.
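Before installing SGLang and launching, it may be worth confirming that the download is complete. The sketch below counts the files in the download folder and compares the total size against the figures quoted above (42 files, roughly 500 GB). The function names, the 10% size tolerance, the decimal-gigabyte conversion, and the choice to skip hidden entries such as a `.cache` folder left by the download tool are all assumptions, not part of the original instructions.

```python
from pathlib import Path

EXPECTED_FILE_COUNT = 42   # figure quoted in the README
EXPECTED_SIZE_GB = 500     # approximate figure quoted in the README


def summarize_download(folder):
    """Return (file_count, total_bytes) for the regular files under folder.

    Assumption: hidden files and folders (names starting with ".") are
    download metadata rather than model files, so they are skipped.
    """
    root = Path(folder)
    files = [
        path for path in root.rglob("*")
        if path.is_file()
        and not any(part.startswith(".") for part in path.relative_to(root).parts)
    ]
    return len(files), sum(path.stat().st_size for path in files)


def looks_complete(file_count, total_bytes, tolerance=0.1):
    """Check the count exactly and the size within a relative tolerance.

    Assumption: "GB" is read as 10**9 bytes; the tolerance is arbitrary.
    """
    size_gb = total_bytes / 1e9
    size_ok = abs(size_gb - EXPECTED_SIZE_GB) <= tolerance * EXPECTED_SIZE_GB
    return file_count == EXPECTED_FILE_COUNT and size_ok


if __name__ == "__main__":
    count, size = summarize_download("/local/grok-2")
    print(f"{count} files, {size / 1e9:.1f} GB, complete: {looks_complete(count, size)}")
```

If the check fails, rerunning the download command is the remedy the instructions above suggest.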
Install the latest SGLang inference engine (>= v0.5.1) from https://github.com/sgl-project/sglang/
Use the command below to launch an inference server. This checkpoint is sharded for tensor parallelism of 8 (TP=8), so you will need 8 GPUs, each with more than 40 GB of memory.
    python3 -m sglang.launch_server --model /local/grok-2 --tokenizer-path /local/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton
Send a request.
This is a post-trained model, so please use the correct chat template.
    python3 -m sglang.test.send_one --prompt "Human: What is your name?<|separator|>\n\nAssistant:"
You should be able to see the model output its name, Grok.
Learn more about other ways to send requests here.
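The chat template used in the test command above can be wrapped in a small helper so that prompts are formatted consistently. This is a minimal sketch for a single user turn only, which is all the command above demonstrates; the function name `format_prompt` is hypothetical, and it assumes the `\n\n` in the command stands for two newline characters.

```python
SEPARATOR = "<|separator|>"


def format_prompt(user_message):
    """Wrap one user message in the single-turn template shown above.

    Assumption: the template is exactly
    "Human: <message><|separator|>" + two newlines + "Assistant:".
    """
    return f"Human: {user_message}{SEPARATOR}\n\nAssistant:"


if __name__ == "__main__":
    print(format_prompt("What is your name?"))
```

The resulting string can be used as the `prompt` value when sending a request to the server.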
License
The weights are licensed under the Grok 2 Community License Agreement.
Model tree for R-Kentaren/grok-2
Base model
xai-org/grok-2