AI & ML interests

Research on next-generation open-source image models

multimodalart posted an update 5 months ago
Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert any entire HF repo (Model, Dataset, or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
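For a rough idea of what the Space does under the hood (this is not its actual implementation; repo_to_txt and the file-header format are my own illustration), you can concatenate a repo's text files with huggingface_hub:

```python
from huggingface_hub import HfApi, hf_hub_download

def repo_to_txt(repo_id: str, repo_type: str = "space") -> str:
    """Concatenate every readable text file in a repo into one string
    you can paste into an LLM context. Binary files are skipped."""
    api = HfApi()
    chunks = []
    for path in api.list_repo_files(repo_id, repo_type=repo_type):
        local = hf_hub_download(repo_id, path, repo_type=repo_type)
        try:
            with open(local, encoding="utf-8") as f:
                chunks.append(f"### {path}\n{f.read()}")
        except (UnicodeDecodeError, IsADirectoryError):
            continue  # skip weights, images, and other non-text files
    return "\n\n".join(chunks)

# e.g. dump a Space's code so an LLM can iterate on it
print(repo_to_txt("multimodalart/repo2txt", repo_type="space"))
```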
multimodalart posted an update 9 months ago
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it 🐐

I've built a live, real-time demo on Spaces 📹💨

multimodalart/self-forcing
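The Space runs Self-Forcing's own streaming inference code, but as a reference point for the base model it distills, Wan 2.1 can be run through diffusers (a minimal sketch following the diffusers docs; this is the full, non-distilled model, so don't expect real-time):

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
# the Wan VAE is kept in float32 for numerical stability
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16).to("cuda")

frames = pipe(
    prompt="a corgi running on a beach, golden hour",  # example prompt
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```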
isidentical posted an update over 1 year ago
Added FLUX.1 pro/dev/schnell and AuraFlow v0.2 to fal/imgsys!!! Go play with it and get us some votes!
isidentical posted an update over 1 year ago
fal/AuraFlow-v0.3 is now here, with support for different aspect ratios (width/height up to 1536px!) and much nicer aesthetics! Make sure to install the latest diffusers to get support for it.
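A minimal diffusers sketch (the prompt and sampler settings are just examples):

```python
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a photo of a capybara wearing a top hat, studio lighting",
    width=1536,   # non-square resolutions are the new part in v0.3
    height=768,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("auraflow_v03.png")
```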
isidentical posted an update over 1 year ago
Announcing the second open model in our Aura series of media models at @fal : fal/AuraFlow

Try it using diffusers or ComfyUI from the publicly available weights, and read more about it in our blog post: https://blog.fal.ai/auraflow
isidentical posted an update over 1 year ago
It is time for some Aura.

First in our series of fully open-sourced / commercially available models by @fal-ai : AuraSR - a 600M parameter upscaler based on GigaGAN.

Blog: https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/

HF: https://huggingface.co/fal-ai/AuraSR

Code: https://github.com/fal-ai/aura-sr

Playground: https://fal.ai/models/fal-ai/aura-sr/playground
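Minimal usage, following the GitHub README (pip install aura-sr; the exact API may have evolved since):

```python
from PIL import Image
from aura_sr import AuraSR  # pip install aura-sr

aura_sr = AuraSR.from_pretrained("fal-ai/AuraSR")

image = Image.open("low_res.png")     # any small RGB image you want to upscale
upscaled = aura_sr.upscale_4x(image)  # GAN-based 4x upscale in a single forward pass
upscaled.save("upscaled_4x.png")
```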

What other models would you like to see open-sourced and commercially available? :)
isidentical posted an update almost 2 years ago
One-shot evaluation is hard. That is honestly what I learnt over the last couple of weeks trying to make imgsys.org data more and more relevant. There is just so much diversity in these models that saying one is better than another, even in a particular domain, is impossible.

If you have any suggestions on how we can make one-shot, single-prompt image model testing easier, please share them under this thread so we can provide a more meaningful data point to the community!
multimodalart posted an update almost 2 years ago
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multi-lingual CLIP + multi-lingual T5 text-encoders for English 🤝 Chinese understanding

Try it out by yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)

In the paper they claim to be SOTA open source based on human preference evaluation!
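(Since then, diffusers has added a HunyuanDiTPipeline, which is an easier way to try the model locally; repo id per the diffusers docs, prompt is just an example:)

```python
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

# the bilingual CLIP + T5 text encoders accept English and Chinese prompts
image = pipe(prompt="一个宇航员在骑马 (an astronaut riding a horse)").images[0]
image.save("hunyuandit.png")
```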
Warlord-K posted an update almost 2 years ago
What are some areas in which image generation models are currently lacking?
multimodalart posted an update about 2 years ago
The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
📏 2 base model variants mentioned: 2B and 8B sizes

📐 New architecture in all abstraction levels:
- 🔽 UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention 👋
- 🆕 Rectified flows for the diffusion process (minimal sketch after this section)
- 🧩 Still a Latent Diffusion Model

📄 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

🗃️ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)
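
For readers new to rectified flows, here is a minimal sketch of the training objective (my illustration, not the paper's code; SD3 additionally biases the timestep distribution, e.g. with logit-normal sampling):

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0, cond):
    """Rectified-flow objective: data and noise are connected by a straight
    line, and the model regresses the constant velocity along that line."""
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)    # t ~ U(0, 1); SD3 reweights this
    eps = torch.randn_like(x0)             # Gaussian noise endpoint
    t_ = t.view(b, 1, 1, 1)
    x_t = (1 - t_) * x0 + t_ * eps         # straight-line interpolant
    v_pred = model(x_t, t, cond)           # predicted velocity field
    return F.mse_loss(v_pred, eps - x0)    # target velocity = eps - x0
```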

Variants
🔁 A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
isidentical posted an update about 2 years ago
What is the current SOTA in fast personalized image generation? Most of the techniques that produce great results (which is hard to measure objectively, but think a subject-similarity index close to 80-90%) either take too much time (full DreamBooth fine-tuning of the base model) or lose the auxiliary properties (high-rank LoRAs).

We have also been testing face embeddings, but even with multiple samples the quality is nowhere close to what we expect. Even with the techniques that work, high-quality (studio-level) pictures seem to be a must, so another avenue I'm curious about is whether people have looked into automatically filtering/segmenting the input samples in the past?
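For what it's worth, a crude heuristic pre-filter along those lines could look like this (entirely hypothetical thresholds, using plain OpenCV face detection and a Laplacian-variance sharpness proxy):

```python
import cv2

def filter_training_images(paths, min_face_frac=0.05, min_sharpness=100.0):
    """Keep only images with exactly one sufficiently large, sharp face."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    kept = []
    for path in paths:
        img = cv2.imread(path)
        if img is None:
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) != 1:
            continue  # zero or multiple faces: ambiguous subject
        x, y, w, h = faces[0]
        if (w * h) / (img.shape[0] * img.shape[1]) < min_face_frac:
            continue  # face too small to carry identity detail
        # blur check: low Laplacian variance on the face crop means soft focus
        if cv2.Laplacian(gray[y:y + h, x:x + w], cv2.CV_64F).var() < min_sharpness:
            continue
        kept.append(path)
    return kept
```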