ravilution/MolmoWeb-4B
This is a full-precision Hugging Face– and vLLM-compatible release of allenai/MolmoWeb-4B, a vision-based web agent model by Ai2 capable of navigating and interacting with web browsers.
It follows the same idea as ravilution/MolmoWeb-8B-8bit-mlx: a personal Hub copy with a clear description and practical loading notes—here for the 4B dense checkpoint rather than an MLX quantization.
Note: This is a 4B-parameter model (four `safetensors` shards). A few post-download patches were applied locally so tokenization and generation metadata match what downstream stacks (including vLLM) expect: `eos_token_id`/`bos_token_id`/`pad_token_id`, `transformers_version` in `config.json` and `generation_config.json`, and the tokenizer pre-tokenizer regex (Mistral-style `(?i:...)` fix). The patches are idempotent, so re-running them on a fresh download is safe.
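As an illustration, here is a minimal sketch of what such an idempotent metadata patch can look like. The field values below are placeholders, not the actual token IDs for this model, and the helper name is hypothetical; the real patch script is not part of this repo.

```python
import json
from pathlib import Path

# Hypothetical patch table: values are placeholders, not the real IDs
# used for ravilution/MolmoWeb-4B.
PATCHES = {
    "config.json": {"eos_token_id": 2, "bos_token_id": 1, "pad_token_id": 0},
    "generation_config.json": {"eos_token_id": 2, "transformers_version": "4.46.0"},
}

def patch_metadata(model_dir: str) -> list[str]:
    """Apply key/value patches to the JSON files; re-running is a no-op."""
    changed = []
    for name, fields in PATCHES.items():
        path = Path(model_dir) / name
        if not path.exists():
            continue
        data = json.loads(path.read_text())
        # Only rewrite the file when at least one field differs,
        # which is what makes repeated runs idempotent.
        if any(data.get(k) != v for k, v in fields.items()):
            data.update(fields)
            path.write_text(json.dumps(data, indent=2))
            changed.append(name)
    return changed
```

The guard before writing is what makes the patch safe to repeat: a second run finds every field already set and leaves the files untouched.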
Refer to the original model card for benchmarks, architecture, training data, and intended use.
Use with Transformers
```shell
pip install -U transformers accelerate torch pillow
```
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_id = "ravilution/MolmoWeb-4B"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # full precision, matching this release
    attn_implementation="sdpa",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    padding_side="left",  # left padding for batched generation
)
```
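With the model and processor loaded, a typical turn pairs a screenshot with a text instruction as a chat-style message list. A minimal sketch follows; the image path, prompt, and helper name are placeholders, and it assumes the processor supports the standard Transformers chat-template API.

```python
# Hypothetical sketch of one inference turn; image path and prompt are
# placeholders. Assumes the processor follows the standard chat-template API.
def build_messages(image_path: str, prompt: str) -> list[dict]:
    """Build a single-turn, image-plus-text chat message list."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]

messages = build_messages("screenshot.png", "Click the search box.")

# With the model/processor from above (not run here):
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# output_ids = model.generate(**inputs, max_new_tokens=128)
# print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```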
Provenance
- Upstream weights: allenai/MolmoWeb-4B
- Changes on top: compatibility patches only (config / generation_config / tokenizer metadata, as described above); no retraining or architectural edits.
License
Apache 2.0 — see the original model for details. Please review Ai2’s Responsible Use Guidelines for intended use.