	---
	pipeline_tag: text-to-image
	inference: false
	library_name: tensorrt
	license: other
	license_name: stabilityai-nc-research-community
	license_link: LICENSE
	tags:
	- tensorrt
	- sd3
	- sd3-medium
	- text-to-image
	- onnx
	extra_gated_prompt: >-
	  By clicking "Agree", you agree to the [License
	  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE)
	  and acknowledge Stability AI's [Privacy
	  Policy](https://stability.ai/privacy-policy).
	extra_gated_fields:
	  Name: text
	  Email: text
	  Country: country
	  Organization or Affiliation: text
	  Receive email updates and promotions on Stability AI products, services, and research?:
	    type: select
	    options:
	      - 'Yes'
	      - 'No'
	  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Stability AI: checkbox
	language:
	- en
	---

	# Stable Diffusion 3 Medium TensorRT
	## Introduction

	This repository hosts the TensorRT version of Stable Diffusion 3 Medium created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized versions give substantial improvements in speed and efficiency.

	Stable Diffusion 3 Medium is a fast generative text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

	## Model Details

	### Model Description
	Stable Diffusion 3 Medium combines a diffusion transformer architecture and flow matching.

	- Developed by: Stability AI
	- Model type: MMDiT text-to-image model
	- Model Description: This is a conversion of the [Stable Diffusion 3 Medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium) model


	## Performance using TensorRT 10.1
	### Timings for 50 steps at 1024x1024

	| Accelerator | CLIP-G | CLIP-L | T5XXL | MMDiT | VAE Decoder | Total |
	|-------------|----------|---------|----------|------------|-------------|------------|
	| A100 | 11.95 ms | 5.04 ms | 21.39 ms | 5468.17 ms | 72.25 ms | 5622.47 ms |

	### Timings for 30 steps at 1024x1024 with input image conditioning

	| Accelerator | VAE Encoder | CLIP-G | CLIP-L | T5XXL | MMDiT | VAE Decoder | Total |
	|-------------|-------------|----------|---------|----------|------------|-------------|------------|
	| A100 | 37.04 ms | 12.07 ms | 5.07 ms | 21.49 ms | 3340.69 ms | 72.02 ms | 3531.49 ms |

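The MMDiT dominates both totals, so a per-step figure is a handy way to compare the two runs. The sketch below only redoes arithmetic on the A100 numbers in the tables above, assuming MMDiT time scales roughly linearly with the step count. The helper names `per_step_ms` and `unlisted_ms` are made up for illustration and are not part of any tool.

```shell
#!/usr/bin/env bash
# Average MMDiT time per denoising step: MMDiT milliseconds / number of steps.
per_step_ms() {
  awk -v total="$1" -v steps="$2" 'BEGIN { printf "%.2f\n", total / steps }'
}

# Time not attributed to any listed component: reported total minus their sum.
unlisted_ms() {
  local total="$1"; shift
  awk -v total="$total" -v parts="$*" 'BEGIN {
    n = split(parts, p, " "); sum = 0
    for (i = 1; i <= n; i++) sum += p[i]
    printf "%.2f\n", total - sum
  }'
}

per_step_ms 5468.17 50     # text-to-image, 50 steps
per_step_ms 3340.69 30     # input image conditioning, 30 steps
unlisted_ms 5622.47 11.95 5.04 21.39 5468.17 72.25
unlisted_ms 3531.49 37.04 12.07 5.07 21.49 3340.69 72.02
```

This prints 109.36 and 111.36 (ms per MMDiT step), then 43.67 and 43.11 (ms in each total that the listed components do not account for).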

	## INT8 quantization with [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer)
	The MMDiT in Stable Diffusion 3 Medium can be further optimized with INT8 quantization using TensorRT Model Optimizer. The estimated end-to-end speedup of TensorRT INT8 over TensorRT FP16 is 1.2x to 1.4x on various NVIDIA GPUs. The INT8 MMDiT engine uses about half the memory of its FP16 counterpart. Image quality is maintained with minimal to negligible degradation.

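To get a feel for what that range could mean in wall-clock terms, the sketch below divides the A100 FP16 total from the 50-step table (5622.47 ms) by each end of the estimated speedup. This is arithmetic only, and it assumes the estimate applies to that particular configuration, which the figures above do not confirm. `projected_ms` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Projected end-to-end time = FP16 time / assumed speedup factor.
projected_ms() {
  awk -v base="$1" -v speedup="$2" 'BEGIN { printf "%.2f\n", base / speedup }'
}

fp16_total_ms=5622.47   # A100, 50 steps at 1024x1024 (from the timing table)
for speedup in 1.2 1.4; do
  echo "${speedup}x -> $(projected_ms "$fp16_total_ms" "$speedup") ms"
done
```

Under those assumptions the projected totals are 4685.39 ms at 1.2x and 4016.05 ms at 1.4x. Measured results will depend on the GPU and resolution.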
	## Usage Example
	<!-- Finalize the branch and namespace -->
	1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd3/demo/Diffusion/README.md) on launching a TensorRT NGC container.
	```shell
	git clone https://github.com/NVIDIA/TensorRT.git
	cd TensorRT
	git checkout release/sd3
	docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:24.05-py3 /bin/bash
	```

	2. Download the Stable Diffusion 3 Medium TensorRT files from this repo.
	```shell
	git lfs install
	git clone https://huggingface.co/stabilityai/stable-diffusion-3-medium-tensorrt
	cd stable-diffusion-3-medium-tensorrt
	git lfs pull
	cd ..
	```

	3. Install libraries and requirements.
	```shell
	cd demo/Diffusion
	python3 -m pip install --upgrade pip
	pip3 install -r requirements.txt
	python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
	```


	4. Perform TensorRT optimized inference:

	- Stable Diffusion 3 Medium

	Works best for 1024x1024 images. The first invocation builds TensorRT plan files in `--engine-dir`; these are specific to the accelerator they were built on and are reused by later invocations.
	```shell
	python3 demo_txt2img_sd3.py \
	"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
	--version=sd3 \
	--onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
	--engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
	--seed 42 \
	--width 1024 \
	--height 1024 \
	--build-static-batch \
	--use-cuda-graph
	```

	- Stable Diffusion 3 Medium with input image conditioning

	Pass an input image for conditioning with `--input-image`, as in the commands below. Works best at 1024x1024 but may also work at 512x512.
	```shell
	wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png -O dog-on-bench.png

	python3 demo_txt2img_sd3.py \
	"dog wearing a sweater and a blue collar" \
	--version=sd3 \
	--onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
	--engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
	--seed 42 \
	--width 1024 \
	--height 1024 \
	--input-image dog-on-bench.png \
	--build-static-batch \
	--use-cuda-graph
	```
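
Before running either command, it may help to confirm that `git lfs pull` actually fetched the model files and to see whether plan files from an earlier run are already in `--engine-dir`. The sketch below is a hypothetical pre-flight helper, not part of the NVIDIA demo. It flags files that are still Git LFS pointer stubs (these start with the line `version https://git-lfs.github.com/spec/v1`) and counts `*.plan` files in the engine directory. The `.plan` extension and all function names are assumptions; adjust them to match what the demo writes on your system.

```shell
#!/usr/bin/env bash
# Return 0 if the file is an unfetched Git LFS pointer stub, non-zero otherwise.
is_lfs_pointer() {
  [ -f "$1" ] && head -c 200 "$1" | head -n 1 |
    grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every file under a directory (skipping .git) that is still a pointer stub.
list_lfs_pointers() {
  find "$1" -type f ! -path '*/.git/*' | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then echo "$f"; fi
  done
}

# Count plan files (assumed extension: .plan) in an engine directory.
count_plan_files() {
  if [ -d "$1" ]; then
    find "$1" -type f -name '*.plan' | wc -l | tr -d ' '
  else
    echo 0
  fi
}

repo=/workspace/stable-diffusion-3-medium-tensorrt   # path used in the commands above
if [ -d "$repo" ]; then
  list_lfs_pointers "$repo"
  echo "cached plan files: $(count_plan_files "$repo/engine")"
fi
```

Any path printed by `list_lfs_pointers` points to a file that was not downloaded; rerunning `git lfs pull` inside the clone should replace it with the real content. A plan-file count of 0 means the next invocation will build engines from scratch.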