IndexError: list index out of range

#19

by paolovic - opened 23 days ago

Hello,

I am running it as follows:

vllm 0.20.1rc1.dev91+ga749a33d8
transformers 5.7.0
mistral-common 1.11.1

vllm serve /vllm-workspace/models/Mistral-Medium-3.5-128B/ --port 1234 --tensor-parallel-size 4 --tool-call-parser mistral --enable-auto-tool-choice --reasoning-parser mistral --max_num_batched_tokens 16384 --max_num_seqs 128 --speculative_config '{"model": "/vllm-workspace/models/Mistral-Medium-3.5-128B-EAGLE", "num_speculative_tokens": 3, "method": "eagle", "max_model_len": "262144"}' --served-model-name mistral-medium --api-key ABC --max-model-len 262144
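Since the maintainer asks below for the exact serve command, here is a minimal sketch for rebuilding it from Python so that the inline JSON for --speculative_config is always valid and correctly quoted. It assumes the paths and values from the command above; MODEL_DIR, DRAFT_DIR and speculative_config are just local names for illustration.

```python
import json
import shlex

MODEL_DIR = "/vllm-workspace/models/Mistral-Medium-3.5-128B/"
DRAFT_DIR = "/vllm-workspace/models/Mistral-Medium-3.5-128B-EAGLE"

# Values copied from the command above; json.dumps guarantees valid JSON.
speculative_config = {
    "model": DRAFT_DIR,
    "num_speculative_tokens": 3,
    "method": "eagle",
    "max_model_len": "262144",
}

args = [
    "vllm", "serve", MODEL_DIR,
    "--port", "1234",
    "--tensor-parallel-size", "4",
    "--tool-call-parser", "mistral",
    "--enable-auto-tool-choice",
    "--reasoning-parser", "mistral",
    "--max_num_batched_tokens", "16384",
    "--max_num_seqs", "128",
    "--speculative_config", json.dumps(speculative_config),
    "--served-model-name", "mistral-medium",
    "--api-key", "ABC",
    "--max-model-len", "262144",
]

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

The same args list can also be passed directly to subprocess.run(args) to avoid shell quoting entirely.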

and I am running into this issue:

(APIServer pid=5779) INFO:     Started server process [5779]
(APIServer pid=5779) INFO:     Waiting for application startup.
(APIServer pid=5779) INFO:     Application startup complete.
(APIServer pid=5779) INFO:     123.456.789.101:12345 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] Error in chat completion stream generator.
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] Traceback (most recent call last):
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 1008, in chat_completion_stream_generator
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137]     actual_call = tool_parser.streamed_args_for_tool[index]
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137]                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
(APIServer pid=5779) ERROR 05-02 13:12:27 [serving.py:1137] IndexError: list index out of range

Anybody else?

juliendenize

Mistral AI_ org 21 days ago

Hey, I just took a look and didn't manage to reproduce your error.

Could you open an issue on the vLLM repository directly, with the serve command and the request you sent, so that I can try your complete workflow?
Don't hesitate to ping me there (same handle as here, @juliendenize) and to add the issue link here as well.

SuperbEmphasis

21 days ago

I am also seeing this with pretty much any tool call.

https://github.com/vllm-project/vllm/issues/33916

retowyss

20 days ago

I had left a note on that vLLM issue as well. It has happened with every vLLM nightly version I've tested since the release (in opencode as well as pi). One comment on that issue says it only happens when streaming the response, but that was for a different model, and I haven't tested that yet.

paolovic

20 days ago

Yes, @juliendenize, I'll post my workflow on the vLLM GitHub.
Should I post it to this issue for housekeeping reasons (https://github.com/vllm-project/vllm/issues/33916), or should I create a new one?

juliendenize

Mistral AI_ org 20 days ago

I think it can go in the same issue! I'll get to the bottom of this ASAP. I wasn't aware of it, but apparently it's been happening for a while. I wasn't able to reproduce it in my setup, though, so don't hesitate to include as much context as possible.

paolovic

20 days ago

Alright, see you tomorrow.

piarpiar

20 days ago

I've also had the same issue for quite a while with Mistral-Small-3.2 and Devstral 2.

Here is an example request to /v1/chat/completions:

{
  "stream": true,
  "parallel_tool_calls": false,
  "messages": [
    {
      "role": "system",
      "content": "You are a release notes AI assistant."
    },
    {
      "role": "user",
      "content": "Hi, is there anything new on release 26.01?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "retrieve_relevant_changes",
        "description": "Arguments for retrieving changes relevant with the given parameters.",
        "parameters": {
          "properties": {
            "query": {
              "description": "The question of the user",
              "type": "string"
            },
            "resultcount": {
              "description": "How many records to retrieve",
              "type": "integer"
            },
            "releases": {
              "default": [],
              "description": "The releases given for retrieving changes",
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            "modules": {
              "default": [],
              "description": "The modules given for retrieving changes",
              "items": {
                "type": "string"
              },
              "type": "array"
            }
          },
          "required": [
            "query", "resultcount", "modules"
          ],
          "type": "object"
        }
      }
    }
  ],
  "tool_choice": "auto"
}
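To replay the request above from a script, here is a minimal sketch that uses only the Python standard library. Several points are assumptions. The server is taken to listen on http://localhost:8000/v1, the base URL from the opencode config later in this thread. The model name and API key are placeholders. The "model" field is added here although the original body omits it. build_payload and stream_chat are hypothetical helper names.

```python
import json
import urllib.request
from collections.abc import Iterator

BASE_URL = "http://localhost:8000/v1"  # assumption: adjust to your server
API_KEY = "ABC"  # placeholder: only needed if the server was started with --api-key
MODEL = "mistral-medium"  # placeholder: must match --served-model-name


def build_payload(model: str) -> dict:
    """Streaming chat request mirroring the example body above (plus "model")."""
    string_list = {"type": "array", "items": {"type": "string"}, "default": []}
    return {
        "model": model,
        "stream": True,
        "parallel_tool_calls": False,
        "tool_choice": "auto",
        "messages": [
            {"role": "system", "content": "You are a release notes AI assistant."},
            {"role": "user", "content": "Hi, is there anything new on release 26.01?"},
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "retrieve_relevant_changes",
                "description": "Arguments for retrieving changes relevant with the given parameters.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "The question of the user"},
                        "resultcount": {"type": "integer", "description": "How many records to retrieve"},
                        "releases": {**string_list, "description": "The releases given for retrieving changes"},
                        "modules": {**string_list, "description": "The modules given for retrieving changes"},
                    },
                    "required": ["query", "resultcount", "modules"],
                },
            },
        }],
    }


def stream_chat(payload: dict, base_url: str = BASE_URL, api_key: str = API_KEY) -> Iterator[str]:
    """POST the payload and yield non-empty raw SSE lines ("data: ...") as they arrive."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line
            line = raw.decode("utf-8").strip()
            if line:
                yield line
```

Usage against a running server: for line in stream_chat(build_payload(MODEL)): print(line)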

Response (truncated):

(...)
data: {"id":"chatcmpl-a4fcf2a6f7760526","object":"chat.completion.chunk","created":1777967400,"model":"mistralai/Devstral-Small-2-24B-Instruct-2512","choices":[{"index":0,"delta":{"tool_calls":[{"id":"ju9zq9ZXY","type":"function","index":0,"function":{"name":"retrieve_relevant_changes","arguments":"{\"query\": \"What's new\", \"resultcount\": "}}]},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"error": {"message": "list index out of range", "type": "InternalServerError", "param": null, "code": 500}}

data: [DONE]

On the vLLM server side, the error is the same as in the original comment.
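When triaging streams like the truncated response above, it can help to scan the events programmatically and stop at the first error event. The following is a minimal sketch, assuming each event arrives on a single "data: " line, as shown above. scan_sse is a hypothetical helper. It concatenates each tool call's function.arguments fragments, keyed by the tool call's index field, because that is how the streamed chunks split a call. The sample is a trimmed copy of the chunk and error event shown above.

```python
import json


def scan_sse(lines):
    """Accumulate streamed tool-call arguments and capture any error event.

    Returns (arguments_by_tool_index, error_message_or_None).
    """
    args: dict[int, str] = {}
    error = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        event = json.loads(data)
        if "error" in event:
            error = event["error"].get("message")
            break
        for choice in event.get("choices", []):
            for call in (choice.get("delta") or {}).get("tool_calls") or []:
                fragment = (call.get("function") or {}).get("arguments") or ""
                args[call["index"]] = args.get(call["index"], "") + fragment
    return args, error


# Trimmed copy of the events in the truncated response above.
chunk = {"choices": [{"index": 0, "delta": {"tool_calls": [{"index": 0, "function": {
    "name": "retrieve_relevant_changes",
    "arguments": "{\"query\": \"What's new\", \"resultcount\": "}}]}}]}
error_event = {"error": {"message": "list index out of range",
                         "type": "InternalServerError", "param": None, "code": 500}}
sample = [f"data: {json.dumps(chunk)}", "", f"data: {json.dumps(error_event)}", "", "data: [DONE]"]

partial_args, err = scan_sse(sample)
print(err)  # list index out of range
print(partial_args[0])
```

The partial arguments show that the server had already started emitting the call when the error occurred.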

juliendenize

Mistral AI_ org 20 days ago

I managed to reproduce it: --stream-interval in the serve command seems to be what triggers a code path that our tool call parser doesn't support. I'm taking a deeper look and will update here and on GitHub once I have a PR ready.

The command in your original post doesn't include this argument, though. Is that a pasting mistake, or am I missing something else?
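The comment above says that --stream-interval triggers a different streaming path. Assume, as the flag's name suggests, that it makes the server send one delta per N generated tokens. The toy sketch below is purely illustrative: it is neither vLLM's implementation nor a confirmed root cause. It shows how a larger interval changes what a streaming parser receives. A tool-call marker, the tool name and an argument marker can arrive merged in a single delta instead of one per delta. The token strings, including the [TOOL_CALLS] and [ARGS] markers, are assumptions, and chunk_tokens is a hypothetical helper.

```python
def chunk_tokens(tokens: list[str], interval: int) -> list[str]:
    """Toy model: emit one text delta per `interval` generated tokens."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return ["".join(tokens[i:i + interval]) for i in range(0, len(tokens), interval)]


# Illustrative token stream for one tool call (token strings are assumptions).
tokens = ["[TOOL_CALLS]", "retrieve_relevant_changes", "[ARGS]", '{"query"', ': "What\'s new"', "}"]

for interval in (1, 3):
    print(interval, chunk_tokens(tokens, interval))
```

With interval 1, every token arrives as its own delta. With interval 3, the first delta already contains the marker, the tool name and the argument marker together. A parser whose state machine assumes one marker per delta would also have to handle that merged case.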

rmhubbert

20 days ago

I can confirm that I am also seeing this issue without the --stream-interval argument, using vLLM nightly on 8x RTX 3090.

My llama-swap config:

"mistral-medium-3.5":
    name: "Mistral Medium 3.5"
    ttl: 1800
    env:
      - "OMP_NUM_THREADS=1"
      - "CUDA_DEVICE_ORDER=PCI_BUS_ID"
      - "CUDA_VISIBLE_DEVICES=0,3,6,7,2,4,5,1"
      - "PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True"
      - "HF_TOKEN=hf_token"
      - "CUDA_HOME=/opt/cuda"
      - "VLLM_MARLIN_USE_ATOMIC_ADD=1"
      - "VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1"
      - "VLLM_USE_NCCL_SYMM_MEM=0"
      - "SAFETENSORS_FAST_GPU=1"
    cmd: |
      uv run vllm serve mistralai/Mistral-Medium-3.5-128B
      --host 0.0.0.0
      --port ${PORT}
      --seed 3407
      --disable-custom-all-reduce
      --served-model-name mistral-medium-3.5
      --tensor-parallel-size 8
      --max-model-len auto
      --gpu-memory-utilization 0.915
      --attention-backend flashinfer
      --gdn-prefill-backend flashinfer
      --enable-flashinfer-autotune
      --reasoning-parser mistral
      --tool-call-parser mistral
      --enable-auto-tool-choice
      --language-model-only
      --max-num-seqs 4
      --max-num-batched-tokens 2048
      --enable-prefix-caching
      --enable-chunked-prefill
      --dtype bfloat16
      --kv-cache-dtype fp8
      --generation-config auto
      --override-generation-config '{"temperature": 0.7}'
      --default-chat-template-kwargs '{"reasoning_effort": "high"}'
      --speculative_config '{"model": "mistralai/Mistral-Medium-3.5-128B-EAGLE", "num_speculative_tokens": 3, "method": "eagle", "max_model_len": "65536", "attention_backend": "flashinfer"}'

juliendenize

Mistral AI_ org 20 days ago

Could you let me know whether this PR fixes your issue?
https://github.com/vllm-project/vllm/pull/41730

retowyss

20 days ago

I ran a quick test using opencode. Your PR works for me with tools like read, but it fails with the same "list index out of range" error when using the todo-list tool.

juliendenize

Mistral AI_ org 20 days ago

Hmm, that's annoying.

I've just spun up the model with opencode and I don't have the issue; todo is working. Sorry for the inconvenience, but is there any way you could share a reproducible example of the error with this PR?

paolovic

20 days ago

Yes. FYI, did you ask it something like "can you explore the repository?" via opencode?

juliendenize

Mistral AI_ org 20 days ago

Yeah, and here is my provider/model config:

"vllm": {
  "npm": "@ai-sdk/openai-compatible",
  "name": "vLLM",
  "options": {
    "baseURL": "http://localhost:8000/v1"
  },
  "models": {
    "mistralai/Mistral-Medium-3.5-128B": {
      "name": "vLLM"
    }
  }
}

retowyss

20 days ago

@juliendenize it works now. I rebuilt it, and I must have fat-fingered something on my first attempt. Sorry for the confusion, and thanks for your effort!

paolovic

20 days ago

Fixed with https://github.com/vllm-project/vllm/pull/41730

paolovic changed discussion status to closed 20 days ago
