Instructions to use Motif-Technologies/Motif-Video-2B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use Motif-Technologies/Motif-Video-2B with Diffusers:
    pip install -U diffusers transformers accelerate

    import torch
    from diffusers import DiffusionPipeline

    # switch to "mps" for apple devices
    pipe = DiffusionPipeline.from_pretrained("Motif-Technologies/Motif-Video-2B", dtype=torch.bfloat16, device_map="cuda")

    prompt = "A vibrant blue jay perches gracefully on a slender branch, its feathers shimmering in the soft morning light. The bird's keen eyes scan the surroundings, capturing the essence of the tranquil forest. It flutters its wings briefly, showcasing the intricate patterns of blue, white, and black on its plumage. The background reveals a lush canopy of green leaves, with rays of sunlight filtering through, creating a dappled effect on the forest floor. The blue jay then tilts its head, emitting a melodious call that echoes through the serene woodland, adding a touch of magic to the peaceful scene."
    image = pipe(prompt).images[0]
- Notebooks
- Google Colab
- Kaggle
Will there be larger video generation models at 14B, 20B, 27B, or a 32B MoE with 7B, 8B, or 9B active parameters? And will there be support for a reasoning/thinking video generation model?
Good evening. I saw your model and it looks very interesting. Even though it's a small model, I think you should try scaling it up from the original 2B to 5B, 7B, 8B, 9B, or 10B with MoE. I hope you can outperform the latest closed-source video generation models, such as Wan 2.7 and Sora 2, bro.
Thank you! We're looking into these directions for Motif Video 2 and will share updates as we progress.
Since Motif is willing to explore ideas for more parameter-efficient models, I'd like to suggest an idea that my hobbyist-scale tests have been pointing to as promising for a while now. Once the model is stable, you could try removing the text encoder and training direct conditioning embeddings against the frozen DiT. You could even initialize them from the text model's encodings, computed one word or phrase at a time without padding, to build your new tokenizer and embedding layer. I found that this could achieve results in older U-Net models beyond what full finetuning could do, with concept weights able to train in isolation without interfering with each other.
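To make the suggestion concrete, here is a minimal PyTorch sketch of the idea, under loud assumptions: `ToyTextEncoder` and `ToyDiT` are hypothetical stand-ins and not Motif's architecture, the five-word vocabulary is invented for illustration, and the loss is a plain noise-prediction MSE on random tensors. It only shows the mechanics: encode each word on its own, use those encodings to initialize a trainable embedding table, freeze the DiT, and optimize the table alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins; a real setup would use the model's own text encoder and DiT.
class ToyTextEncoder(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, ids):
        return torch.tanh(self.proj(self.embed(ids)))

class ToyDiT(nn.Module):
    """Predicts noise from noisy latents while cross-attending to the conditioning."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, latents, cond):
        attended, _ = self.attn(latents, cond, cond)
        return self.out(latents + attended)

vocab = ["a", "blue", "jay", "on", "branch"]  # illustrative vocabulary
dim = 16
text_encoder = ToyTextEncoder(len(vocab), dim)
dit = ToyDiT(dim)

# 1. Encode each word by itself (no padding) and use the result to
#    initialize the new conditioning table.
with torch.no_grad():
    init = torch.cat(
        [text_encoder(torch.tensor([[i]]))[:, 0] for i in range(len(vocab))]
    )
cond_table = nn.Embedding.from_pretrained(init.clone(), freeze=False)

# 2. Freeze the DiT; the text encoder is no longer needed after initialization.
for p in dit.parameters():
    p.requires_grad_(False)
del text_encoder

# 3. Train only the conditioning embeddings against the frozen DiT.
optimizer = torch.optim.AdamW(cond_table.parameters(), lr=1e-2)
prompt_ids = torch.tensor([[0, 1, 2, 3, 0, 4]])  # "a blue jay on a branch"

def train_step(latents, noise):
    cond = cond_table(prompt_ids)          # (1, tokens, dim)
    pred = dit(latents + noise, cond)      # frozen weights, gradients flow to cond
    loss = nn.functional.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

latents = torch.randn(1, 8, dim)
noise = torch.randn(1, 8, dim)
losses = [train_step(latents, noise) for _ in range(50)]
```

Because each concept lives in its own row of `cond_table`, a row only receives gradients when its token appears in a prompt, which is one plausible reading of why concepts might train without interfering with each other.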