MV-LLaVA-7B Model Card

Model details

Model type: MV-LLaVA-7B is an open-source chatbot for 3D multi-view images, trained by fine-tuning the CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted BS-Objaverse data and ShareGPT4V data.

Model date: MV-LLaVA-7B was trained in April 2024.

Paper or resources for more information: [Project] [Paper] [Code]

Usage

You can use this model directly with the code provided in our [repository].
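As a starting point, the checkpoint files can be fetched from the Hugging Face Hub before running the repository code. The sketch below is a minimal example under stated assumptions: it assumes the `huggingface_hub` package is installed, it uses the repository ID `Zery/MV-LLaVA-7B` shown on the model page, and `download_checkpoint` is a hypothetical helper name, not part of the authors' code. It only downloads files; loading the model for inference still relies on the code in the repository.

```python
from pathlib import Path
from typing import Optional

# Repository ID as listed on the model page.
REPO_ID = "Zery/MV-LLaVA-7B"


def download_checkpoint(repo_id: str = REPO_ID, local_dir: Optional[str] = None) -> Path:
    """Download every file in the model repository and return the local folder.

    Hypothetical helper for illustration; it is not part of the authors' code.
    """
    # Imported lazily so this module can be imported even when
    # huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files land in the default Hub cache directory.
    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))
```

Calling `download_checkpoint()` returns the folder containing the downloaded files, which can then be passed to the inference scripts as the model path.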

License

Intended use

Primary intended uses: The primary use of MV-LLaVA-7B is research on large multimodal models and chatbots for 3D content.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

1.2M ShareGPT4V-PT data
30K GPT4-Vision-generated multi-view image-text pairs
LLaVA instruction-tuning data


Collection including Zery/MV-LLaVA-7B: Bootstrap3D, [ICCV2025] Official Implementation of "Bootstrap3D: Improving 3D Content Creation with Synthetic Data"

Paper for Zery/MV-LLaVA-7B: Bootstrap3D: Improving 3D Content Creation with Synthetic Data (2406.00093, published May 31, 2024)