Instructions to use Xwin-LM/XwinCoder-34B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Xwin-LM/XwinCoder-34B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Xwin-LM/XwinCoder-34B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Xwin-LM/XwinCoder-34B")
model = AutoModelForCausalLM.from_pretrained("Xwin-LM/XwinCoder-34B")
```
- Notebooks
- Google Colab
- Kaggle
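XwinCoder is instruction-tuned, so raw prompts are usually wrapped in a chat-style template before being passed to the Transformers pipeline above. A minimal sketch; the `<system>`/`<user>`/`<AI>` tag format here is an assumption, so verify it against the model card's prompt template before relying on it:

```python
def build_prompt(instruction,
                 system_msg="You are an AI coding assistant that helps people with programming."):
    # ASSUMPTION: tag names follow an XwinCoder-style chat template;
    # check the model card for the exact format.
    return f"<system>: {system_msg}\n<user>: {instruction}\n<AI>: "

prompt = build_prompt("Write a Python function that reverses a string.")

# With the pipeline from the snippet above loaded (a large download),
# generation would look like:
# print(pipe(prompt, max_new_tokens=256)[0]["generated_text"])
```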
- Local Apps
- vLLM
How to use Xwin-LM/XwinCoder-34B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Xwin-LM/XwinCoder-34B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Xwin-LM/XwinCoder-34B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/Xwin-LM/XwinCoder-34B
```
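The curl call above can also be issued from Python. A minimal sketch that builds the same OpenAI-compatible request body; the actual POST is left commented out because it needs a running vLLM server:

```python
import json

# Build the same /v1/completions payload as the curl example above.
def build_completion_request(prompt, model="Xwin-LM/XwinCoder-34B",
                             max_tokens=512, temperature=0.5):
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_completion_request("Once upon a time,")
body = json.dumps(payload)

# With the server running locally, send it using only the stdlib:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```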
- SGLang
How to use Xwin-LM/XwinCoder-34B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Xwin-LM/XwinCoder-34B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Xwin-LM/XwinCoder-34B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Xwin-LM/XwinCoder-34B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Xwin-LM/XwinCoder-34B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use Xwin-LM/XwinCoder-34B with Docker Model Runner:
```shell
docker model run hf.co/Xwin-LM/XwinCoder-34B
How does this compare to DeepSeek Coder?
I'm curious how this compares to the deepseek-coder 33B model. They say that model can outperform GPT-3.5 and even get close to GPT-4.
Oooo man, I'm testing them both right now with the same quant and settings, and they are very similar. I do want to say that XwinCoder is slightly better.
@dillfrescott I did my own testing, and although the code output seems OK, the logic doesn't seem very good. The model also often refuses to admit that its code is incorrect and refuses to fix it. DeepSeek Coder does a great job troubleshooting its own code, so I give the win to DeepSeek Coder here. I've tested the 13B XwinCoder, 34B XwinCoder, 6.7B DeepSeek, and 33B DeepSeek, and only the 33B deepseek-coder-instruct model has impressed me at all. It's the closest I've seen to GPT-3.5.
Oh cool!