Reinforcement learning training with DDPO
You can fine-tune Stable Diffusion on a reward function via reinforcement learning with the 🤗 TRL library and 🤗 Diffusers. This is done with the Denoising Diffusion Policy Optimization (DDPO) algorithm, introduced by Black et al. in "Training Diffusion Models with Reinforcement Learning" and implemented in 🤗 TRL as the DDPOTrainer.
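DDPO treats each denoising step as an action in a multi-step decision process and updates the model with a policy-gradient objective. The snippet below is a minimal, framework-free sketch of the core update in its importance-sampling (PPO-style) form. It makes several assumptions. Rewards are normalized across the whole batch. Per-step log-probabilities of the sampled x_{t-1} given x_t are assumed to already be available. The clip values (`clip_range=1e-4`, `adv_clip_max=5.0`) are illustrative. The helper names `normalize_advantages` and `ddpo_clipped_loss` are hypothetical and are not part of the TRL or Diffusers API. This is not the DDPOTrainer implementation itself.

```python
import math
from statistics import mean, pstdev


def normalize_advantages(rewards, eps=1e-8):
    """Hypothetical helper: map raw rewards to zero-mean, unit-variance advantages.

    Assumption: statistics are computed over the whole batch (no per-prompt tracking).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def ddpo_clipped_loss(log_probs_new, log_probs_old, advantages,
                      clip_range=1e-4, adv_clip_max=5.0):
    """Hypothetical helper: PPO-style clipped surrogate loss for one denoising step.

    log_probs_new: log p(x_{t-1} | x_t) under the policy being trained.
    log_probs_old: the same log-probabilities recorded when the images were sampled.
    clip_range and adv_clip_max are illustrative values, not authoritative defaults.
    """
    losses = []
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        # Bound extreme advantages so a single outlier reward cannot dominate.
        adv = max(-adv_clip_max, min(adv_clip_max, adv))
        # Importance ratio between the current policy and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        # Pessimistic choice: take the larger of the two negated objectives.
        losses.append(max(-adv * ratio, -adv * clipped))
    return sum(losses) / len(losses)


# Usage: three sampled images with rewards 1, 2, 3 and an unchanged policy.
advantages = normalize_advantages([1.0, 2.0, 3.0])
loss = ddpo_clipped_loss([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], advantages)
```

With an unchanged policy (ratio 1), the loss reduces to the negative mean advantage, which is zero after normalization. Clipping the ratio to a very narrow band keeps each update close to the policy that generated the samples. This matters because the images are sampled once and then reused for several gradient steps.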
For more information, check out the DDPOTrainer API reference and the "Finetune Stable Diffusion Models with DDPO via TRL" blog post.