Instructions to use Envoid/Dendrite-II-22B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Envoid/Dendrite-II-22B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Envoid/Dendrite-II-22B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Envoid/Dendrite-II-22B")
model = AutoModelForCausalLM.from_pretrained("Envoid/Dendrite-II-22B")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Envoid/Dendrite-II-22B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Envoid/Dendrite-II-22B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Envoid/Dendrite-II-22B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
docker model run hf.co/Envoid/Dendrite-II-22B
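The same completion request that the curl example sends can be issued from Python with only the standard library. The following is a minimal sketch. It assumes a server is already running on localhost at port 8000 (the vLLM default used above); for the SGLang server, pass `base_url="http://localhost:30000"`. The helper names `build_completion_request` and `complete` are hypothetical and introduced only for illustration. The response shape (`choices[0]["text"]`) follows the OpenAI-compatible completions format.

```python
import json
import urllib.request

# Assumption: a server started with `vllm serve "Envoid/Dendrite-II-22B"`
# is listening on localhost:8000 (the SGLang example uses port 30000 instead).
BASE_URL = "http://localhost:8000"
MODEL = "Envoid/Dendrite-II-22B"


def build_completion_request(prompt, max_tokens=512, temperature=0.5,
                             base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint.

    Hypothetical helper: mirrors the JSON body of the curl example.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    # Long timeout because generating 512 tokens on a 22B model can be slow.
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage (requires a running server):
# print(complete("Once upon a time,"))
```

The request is built separately from the network call. This keeps the payload easy to inspect or reuse, for example with a different `base_url` or sampling settings.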
- SGLang
How to use Envoid/Dendrite-II-22B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Envoid/Dendrite-II-22B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Envoid/Dendrite-II-22B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Envoid/Dendrite-II-22B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Envoid/Dendrite-II-22B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use Envoid/Dendrite-II-22B with Docker Model Runner:
docker model run hf.co/Envoid/Dendrite-II-22B
This model is lit.
The characters have attitude and give interesting replies. Maybe the original merged-in 33B is better? I'm not sure, since this is the first I'm hearing of it, but this has more soul than the L2 70B. Scale it up.
Glad you're enjoying the model! Unfortunately, scaling up 1:1 depends on Meta releasing a LLaMa-2-34B.
I did experiment with a 32.9B model made by diagonally merging 65B (with the Enterredaas QLoRA merged in) onto llama-2-chat-13B, which yields a roughly 33B model. However, the model was "oops all attention heads," and the results were unfortunately not very good. I could look for an alternate 33B model to use as the recipient model for the script, but it would quite likely not end up with the same temperament as Dendrite, to which I feel the chat model contributes a lot. That said, I do plan to experiment with other combinations of finetunes to see what else yields usable results, and if LLaMa-2-chat-34B ever comes out, I already plan to try scaling up the project.