---
pipeline_tag: text-to-image
inference: false
library_name: tensorrt
license: other
license_name: stabilityai-nc-research-community
license_link: LICENSE
tags:
- tensorrt
- sd3
- sd3-medium
- text-to-image
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Stability AI: checkbox
language:
- en
---
# Stable Diffusion 3 Medium TensorRT

## Introduction

This repository hosts the TensorRT version of **Stable Diffusion 3 Medium**, created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized version gives substantial improvements in speed and efficiency.

Stable Diffusion 3 Medium is a fast generative text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

## Model Details

### Model Description

Stable Diffusion 3 Medium combines a diffusion transformer architecture and flow matching.

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image model
- **Model Description:** This is a conversion of the [Stable Diffusion 3 Medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium) model for use with TensorRT.
## Performance using TensorRT 10.1

#### Timings for 50 steps at 1024x1024

| Accelerator | CLIP-G | CLIP-L | T5XXL | MMDiT | VAE Decoder | Total |
|-------------|----------|---------|----------|------------|-------------|------------|
| A100 | 11.95 ms | 5.04 ms | 21.39 ms | 5468.17 ms | 72.25 ms | 5622.47 ms |

#### Timings for 30 steps at 1024x1024 with input image conditioning

| Accelerator | VAE Encoder | CLIP-G | CLIP-L | T5XXL | MMDiT | VAE Decoder | Total |
|-------------|-------------|----------|---------|----------|------------|-------------|------------|
| A100 | 37.04 ms | 12.07 ms | 5.07 ms | 21.49 ms | 3340.69 ms | 72.02 ms | 3531.49 ms |
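
As a rough sanity check on the A100 figures above, the MMDiT time can be divided by the step count to get a per-step latency, and the listed components can be subtracted from the reported total. The sketch below uses only the numbers from the two tables. The helper names `per_step` and `unaccounted` are hypothetical, and the tables do not say what the remainder consists of.

```shell
# Per-step MMDiT latency derived from the A100 figures in the tables above.
per_step() {
  # $1 = total MMDiT time in ms, $2 = number of denoising steps
  awk -v total="$1" -v steps="$2" 'BEGIN { printf "%.2f\n", total / steps }'
}

# Time in the reported total that is not covered by the listed components.
unaccounted() {
  # $1 = reported total in ms, remaining arguments = per-component times in ms
  total="$1"; shift
  printf '%s\n' "$@" | awk -v total="$total" '{ sum += $1 } END { printf "%.2f\n", total - sum }'
}

per_step 5468.17 50    # text-to-image, 50 steps
per_step 3340.69 30    # input image conditioning, 30 steps
unaccounted 5622.47 11.95 5.04 21.39 5468.17 72.25
unaccounted 3531.49 37.04 12.07 5.07 21.49 3340.69 72.02
```

With the table values this works out to about 109 ms and 111 ms per denoising step, and roughly 43 ms in each total that the listed components do not cover.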
## INT8 quantization with [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer)

The MMDiT in Stable Diffusion 3 Medium can be further optimized with INT8 quantization using TensorRT Model Optimizer. The estimated end-to-end speedup of TensorRT INT8 over TensorRT FP16 is 1.2x to 1.4x on various NVIDIA GPUs. The INT8 MMDiT engine uses about half the memory of its FP16 counterpart. Image quality is maintained with minimal to negligible degradation.
## Usage Example

<!-- Finalize the branch and namespace -->

1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd3/demo/Diffusion/README.md) to launch a TensorRT NGC container.

```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd3
docker run --rm -it --gpus all -v "$PWD":/workspace nvcr.io/nvidia/pytorch:24.05-py3 /bin/bash
```
2. Download the Stable Diffusion 3 Medium TensorRT files from this repo:

```shell
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-3-medium-tensorrt
cd stable-diffusion-3-medium-tensorrt
git lfs pull
cd ..
```
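
If Git LFS is not set up correctly, a clone can leave small pointer stubs in place of the large model files. The following is a minimal sketch for spotting that case, assuming the standard Git LFS pointer format (a text file whose first line starts with `version https://git-lfs.github.com/spec/v1`). The helper names `is_lfs_pointer` and `list_lfs_pointers` are hypothetical and not part of this repository.

```shell
# Succeeds if the given file is still a Git LFS pointer stub rather than real content.
is_lfs_pointer() {
  # Pointer files are small text files whose first line names the LFS spec.
  [ -f "$1" ] && head -c 200 "$1" | head -n 1 | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every pointer stub found under a directory, skipping Git's own metadata.
list_lfs_pointers() {
  find "$1" -type f ! -path '*/.git/*' | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Example (run from the directory that contains the clone):
#   list_lfs_pointers stable-diffusion-3-medium-tensorrt
```

After step 2 the example call should print nothing; any path it prints is a file that `git lfs pull` has not yet fetched.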
3. Install libraries and requirements:

```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```
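
The timings in this card were measured with TensorRT 10.1, so it may be worth confirming which version the `--pre --upgrade` install actually pulled in. The sketch below reads `tensorrt.__version__` from Python. The helper names and the choice of 10 as the minimum major version are assumptions for illustration, not a requirement stated by the demo.

```shell
# Extract the major version from a dotted version string such as "10.1.0".
trt_major() {
  printf '%s\n' "$1" | cut -d. -f1
}

# Print the installed TensorRT version; fails if the Python package cannot be imported.
installed_trt_version() {
  python3 -c 'import tensorrt; print(tensorrt.__version__)' 2>/dev/null
}

# Report whether the environment is on the 10.x series or newer (assumed threshold).
check_trt() {
  ver=$(installed_trt_version) || { echo "tensorrt is not importable"; return 1; }
  if [ "$(trt_major "$ver")" -ge 10 ]; then
    echo "tensorrt $ver is 10.x or newer"
  else
    echo "tensorrt $ver is older than the 10.x series"
    return 1
  fi
}

# Example: check_trt
```

Calling `check_trt` inside the container after step 3 prints the detected version, or a short message if the package is missing.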
4. Perform TensorRT optimized inference:

- **Stable Diffusion 3 Medium**

Works best for 1024x1024 images. The first invocation builds plan files in `--engine-dir` that are specific to the accelerator in use; later invocations reuse them.

```shell
python3 demo_txt2img_sd3.py \
  "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
  --version=sd3 \
  --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
  --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
  --seed 42 \
  --width 1024 \
  --height 1024 \
  --build-static-batch \
  --use-cuda-graph
```

- **Stable Diffusion 3 Medium with input image conditioning**

Provide an input image for conditioning with `--input-image`, as in the command below. Works best for 1024x1024 but may also work at 512x512.

```shell
wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png -O dog-on-bench.png
python3 demo_txt2img_sd3.py \
  "dog wearing a sweater and a blue collar" \
  --version=sd3 \
  --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
  --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
  --seed 42 \
  --width 1024 \
  --height 1024 \
  --input-image dog-on-bench.png \
  --build-static-batch \
  --use-cuda-graph
```
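
Because the first invocation builds plan files and later ones reuse them, a quick look at the engine directory can indicate whether the next run will include a build. The sketch below assumes the plan files end in `.plan` and sit directly under `--engine-dir`; neither detail is confirmed by this card, and the helper names are hypothetical.

```shell
# Count files ending in .plan directly under the given directory (0 if it does not exist).
count_plan_files() {
  if [ -d "$1" ]; then
    find "$1" -maxdepth 1 -type f -name '*.plan' | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Summarize whether a previous run appears to have left engines behind.
engine_status() {
  n=$(count_plan_files "$1")
  if [ "$n" -gt 0 ]; then
    echo "found $n plan file(s); later runs can reuse them"
  else
    echo "no plan files yet; the first run will build engines"
  fi
}

# Example: engine_status /workspace/stable-diffusion-3-medium-tensorrt/engine
```

Since plan files are specific to the accelerator, a non-zero count only helps if the engines were built on the same GPU type.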