IndexError: list index out of range
Hello,
I am running it like the following:
vllm 0.20.1rc1.dev91+ga749a33d8
transformers 5.7.0
mistral-common 1.11.1
vllm serve /vllm-workspace/models/Mistral-Medium-3.5-128B/ --port 1234 --tensor-parallel-size 4 --tool-call-parser mistral --enable-auto-tool-choice --reasoning-parser mistral --max_num_batched_tokens 16384 --max_num_seqs 128 --speculative_config '{"model": "/vllm-workspace/models/Mistral-Medium-3.5-128B-EAGLE", "num_speculative_tokens": 3, "method": "eagle", "max_model_len": "262144"}' --served-model-name mistral-medium --api-key ABC --max-model-len 262144
and I'm running into this issue:
(APIServer pid=5779) INFO: Started server process [5779]
(APIServer pid=5779) INFO: Waiting for application startup.
(APIServer pid=5779) INFO: Application startup complete.
(APIServer pid=5779) INFO: 123.456.789.101:12345 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] Error in chat completion stream generator.
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] Traceback (most recent call last):
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 1008, in chat_completion_stream_generator
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] actual_call = tool_parser.streamed_args_for_tool[index]
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] IndexError: list index out of range
Anybody else?
Hey, I just took a look and didn't manage to reproduce your error.
Could you create an issue on vLLM directly with the serve command and the request you sent, so that I can try your complete workflow?
Don't hesitate to ping me there (same tag as here, @juliendenize) and add the issue link here as well.
I am also seeing this with pretty much any tool call.
I had left a note on that vLLM issue as well. It has happened with every vLLM nightly version I've tested since release (in opencode as well as pi). One comment on that issue says it only happens when streaming the response, but that was for a different model. I haven't tested that yet.
Yes, @juliendenize, I'll post my workflow on the vLLM GitHub.
Should I post to this issue for housekeeping reasons (https://github.com/vllm-project/vllm/issues/33916), or should I create a new one?
I think it can go in the same issue! I'll get to the bottom of this ASAP; I wasn't aware of it, but apparently it's been going on for a while. I wasn't able to reproduce it in my setup, though, so don't hesitate to add as much context as possible.
Alright, see you tomorrow.
I've also had the same issue for quite a while with Mistral-Small-3.2 and Devstral 2.
Here is an example request to /v1/chat/completions:
{
"stream": true,
"parallel_tool_calls": false,
"messages": [
{
"role": "system",
"content": "You are a release notes AI assistant."
},
{
"role": "user",
"content": "Hi, is there anything new on release 26.01?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "retrieve_relevant_changes",
"description": "Arguments for retrieving changes relevant with the given parameters.",
"parameters": {
"properties": {
"query": {
"description": "The question of the user",
"type": "string"
},
"resultcount": {
"description": "How many records to retrieve",
"type": "integer"
},
"releases": {
"default": [],
"description": "The releases given for retrieving changes",
"items": {
"type": "string"
},
"type": "array"
},
"modules": {
"default": [],
"description": "The modules given for retrieving changes",
"items": {
"type": "string"
},
"type": "array"
}
},
"required": [
"query","resultcount","modules"
],
"type": "object"
}
}
}
],
"tool_choice": "auto"
}
Response (truncated):
(...)
data: {"id":"chatcmpl-a4fcf2a6f7760526","object":"chat.completion.chunk","created":1777967400,"model":"mistralai/Devstral-Small-2-24B-Instruct-2512","choices":[{"index":0,"delta":{"tool_calls":[{"id":"ju9zq9ZXY","type":"function","index":0,"function":{"name":"retrieve_relevant_changes","arguments":"{\"query\": \"What's new\", \"resultcount\": "}}]},"logprobs":null,"finish_reason":null,"token_ids":null}]}
data: {"error": {"message": "list index out of range", "type": "InternalServerError", "param": null, "code": 500}}
data: [DONE]
On the vLLM server side, the error is the same as in the original comment.
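To attach the failing request and its response to the issue, something like the following could replay the request body above against the server and summarize the stream. It is a minimal sketch using only the Python standard library; the file name request.json, the helper names stream_lines and collect_tool_calls, and injecting the served model name are all assumptions for illustration. It relies on what the truncated response above shows: the server answers HTTP 200 and then reports the exception as an ordinary data: event before [DONE].

```python
import json
import urllib.request
from typing import Iterable, Iterator


def stream_lines(url: str, request_path: str, api_key: str, model: str) -> Iterator[str]:
    """POST a saved request body (hypothetical request.json) and yield decoded SSE lines."""
    with open(request_path, encoding="utf-8") as f:
        body = json.load(f)
    body["stream"] = True  # the error only shows up on the streaming path
    body.setdefault("model", model)  # the example body above has no "model" field
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line as bytes
            yield raw.decode("utf-8")


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each 'data:' line, skipping blank lines and other SSE fields."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def collect_tool_calls(lines: Iterable[str]) -> dict:
    """Accumulate streamed tool-call names/arguments and record a mid-stream error, if any."""
    calls: dict[int, dict] = {}
    error = None
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "error" in chunk:
            # Arrives as a normal data event after the 200 OK, so check every chunk
            error = chunk["error"]["message"]
            continue
        for choice in chunk.get("choices", []):
            for tc in choice.get("delta", {}).get("tool_calls") or []:
                entry = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
                fn = tc.get("function") or {}
                entry["name"] += fn.get("name") or ""
                entry["arguments"] += fn.get("arguments") or ""
    return {"tool_calls": calls, "error": error}


# Example (port, API key and served model name taken from the first serve command):
# summary = collect_tool_calls(stream_lines(
#     "http://localhost:1234/v1/chat/completions", "request.json", "ABC", "mistral-medium"))
```

The summary shows how far the tool-call arguments got before the server errored, which is the detail that distinguishes this failure from an ordinary malformed tool call.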
I managed to reproduce it: passing --stream-interval in the serve command seems to trigger a code path that our tool call parser doesn't support. I'm taking a deeper look and will update here and on GitHub once I have a PR ready.
Your original issue doesn't show this argument in the serve command. Is that a wrong paste, or am I missing something else?
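The thread does not show the parser internals, so the following is only a toy model of the kind of bug described, not vLLM's Mistral tool parser. It assumes that --stream-interval N makes the server group roughly N tokens into each streamed delta, and that a parser written for one-token deltas registers a new tool call only when the tool-call marker arrives on its own. MARKER, coalesce, ToyToolParser and run are hypothetical names; only streamed_args_for_tool and the indexing pattern come from the traceback above.

```python
MARKER = "[TOOL_CALLS]"  # Mistral's tool-call control token, treated here as plain text


def coalesce(tokens: list[str], interval: int) -> list[str]:
    """Group consecutive tokens into deltas of `interval` tokens (assumed stream-interval effect)."""
    return ["".join(tokens[i:i + interval]) for i in range(0, len(tokens), interval)]


class ToyToolParser:
    """Hypothetical parser that only registers a tool call when the marker is a delta by itself."""

    def __init__(self) -> None:
        self.text = ""
        self.streamed_args_for_tool: list[str] = []

    def feed(self, delta: str):
        self.text += delta
        if delta == MARKER:
            # The only place a tool call gets registered: the assumption that breaks
            self.streamed_args_for_tool.append("")
            return None
        if MARKER in self.text:
            # The tool index comes from the full text, not from what was registered
            return self.text.count(MARKER) - 1, delta
        return None


def run(tokens: list[str], interval: int) -> list[str]:
    parser = ToyToolParser()
    for delta in coalesce(tokens, interval):
        result = parser.feed(delta)
        if result is not None:
            index, args_delta = result
            # Same shape as the failing line: streamed_args_for_tool[index]
            parser.streamed_args_for_tool[index] += args_delta
    return parser.streamed_args_for_tool
```

With one token per delta the toy works; with three tokens per delta the marker never arrives alone, nothing is registered, and the lookup raises IndexError: list index out of range. In this toy, a bounds check at the call site would only hide the crash and drop the arguments; the fix belongs in the parser, which should register the tool call on whatever delta contains the marker. Both serve commands in this thread also enable EAGLE speculative decoding, which can emit several tokens per step; whether that reaches the same path without --stream-interval is not established here.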
I can confirm that I am also seeing this issue without the --stream-interval flag, using vLLM nightly on 8x RTX 3090.
My llama-swap config:
"mistral-medium-3.5":
  name: "Mistral Medium 3.5"
  ttl: 1800
  env:
    - "OMP_NUM_THREADS=1"
    - "CUDA_DEVICE_ORDER=PCI_BUS_ID"
    - "CUDA_VISIBLE_DEVICES=0,3,6,7,2,4,5,1"
    - "PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True"
    - "HF_TOKEN=hf_token"
    - "CUDA_HOME=/opt/cuda"
    - "VLLM_MARLIN_USE_ATOMIC_ADD=1"
    - "VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1"
    - "VLLM_USE_NCCL_SYMM_MEM=0"
    - "SAFETENSORS_FAST_GPU=1"
  cmd: |
    uv run vllm serve mistralai/Mistral-Medium-3.5-128B
    --host 0.0.0.0
    --port ${PORT}
    --seed 3407
    --disable-custom-all-reduce
    --served-model-name mistral-medium-3.5
    --tensor-parallel-size 8
    --max-model-len auto
    --gpu-memory-utilization 0.915
    --attention-backend flashinfer
    --gdn-prefill-backend flashinfer
    --enable-flashinfer-autotune
    --reasoning-parser mistral
    --tool-call-parser mistral
    --enable-auto-tool-choice
    --language-model-only
    --max-num-seqs 4
    --max-num-batched-tokens 2048
    --enable-prefix-caching
    --enable-chunked-prefill
    --dtype bfloat16
    --kv-cache-dtype fp8
    --generation-config auto
    --override-generation-config '{"temperature": 0.7}'
    --default-chat-template-kwargs '{"reasoning_effort": "high"}'
    --speculative_config '{"model": "mistralai/Mistral-Medium-3.5-128B-EAGLE", "num_speculative_tokens": 3, "method": "eagle", "max_model_len": "65536", "attention_backend": "flashinfer"}'
I ran a quick test using opencode, and your PR works for me with tools like read, but it fails with the same "list index out of range" error when it uses the todo-list tool.
Hmm, that's annoying.
I've just spun up a model with opencode and I don't have the issue; todo is working. Sorry for the inconvenience, but is there any way you could share a reproducible example of the error with this PR?
Yes. FYI, did you ask it via opencode something like "can you explore the repository?"
Yeah, and here is my provider/model config:
"vllm": {
  "npm": "@ai-sdk/openai-compatible",
  "name": "vLLM",
  "options": {
    "baseURL": "http://localhost:8000/v1"
  },
  "models": {
    "mistralai/Mistral-Medium-3.5-128B": {
      "name": "vLLM"
    }
  }
}
@juliendenize it works now. I rebuilt it and must have fat-fingered something on my first attempt. Sorry for the confusion, and thanks for your effort!