ravilution/MolmoWeb-4B

This is a full-precision, Hugging Face– and vLLM-compatible release of allenai/MolmoWeb-4B, a vision-language web-agent model from Ai2 that can navigate and interact with web browsers.

It follows the same idea as ravilution/MolmoWeb-8B-8bit-mlx: a personal Hub copy with a clear description and practical loading notes. This repository covers the full-precision 4B dense checkpoint rather than an MLX quantization.

Note: This is a 4B-parameter model shipped as four safetensors shards. A few post-download patches were applied locally so that tokenization and generation metadata match what downstream stacks (including vLLM) expect: eos_token_id, bos_token_id, and pad_token_id; transformers_version in config.json and generation_config.json; and the tokenizer's pre-tokenizer regex (the Mistral-style (?i:...) fix). The patches are idempotent, so re-running them on a fresh download is safe.

Refer to the original model card for benchmarks, architecture, training data, and intended use.

Use with Transformers

pip install -U transformers accelerate torch pillow
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_id = "ravilution/MolmoWeb-4B"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    attn_implementation="sdpa",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    padding_side="left",
)
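Molmo-family checkpoints expose a custom processor.process / model.generate_from_batch interface via remote code. Assuming MolmoWeb-4B follows the same interface as the other Molmo releases, a minimal generation sketch (continuing from the loading snippet above; the image path and prompt are placeholders):

```python
from PIL import Image
from transformers import GenerationConfig

# Placeholder inputs — substitute a real screenshot and task prompt.
image = Image.open("screenshot.png")
prompt = "Describe this page."

# Molmo's remote-code processor returns unbatched tensors; add the
# batch dimension and move everything to the model's device.
inputs = processor.process(images=[image], text=prompt)
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)

# Strip the prompt tokens before decoding the model's answer.
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```

If MolmoWeb's remote code diverges from this interface, the original allenai/MolmoWeb-4B card is the authoritative usage reference.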

Provenance

  • Upstream weights: allenai/MolmoWeb-4B
  • Changes on top: compatibility patches only (config / generation_config / tokenizer metadata as above); no retraining or architectural edits.

License

Apache 2.0 — see the original model for details. Please review Ai2’s Responsible Use Guidelines for intended use.
