HP-Edit: A Human-Preference Post-Training Framework for Image Editing
Abstract
A post-training framework called HP-Edit is introduced to align image editing models with human preferences using a novel automatic evaluator and a real-world dataset, improving editing quality through reinforcement learning techniques.
Powerful generative diffusion models have become the leading paradigm for real-world image editing. Although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, owing to the lack of scalable human-preference datasets and of frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset that spans eight common editing tasks with balanced coverage of common object edits. Specifically, HP-Edit leverages a small amount of human-preference scoring data and a pretrained vision-language model (VLM) to develop HP-Scorer, an automatic evaluator aligned with human preferences. We then use HP-Scorer both to efficiently build a scalable preference dataset and to serve as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preference.
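The abstract describes two roles for HP-Scorer: selecting preference pairs when building the dataset and acting as the RL reward during post-training. The sketch below is a minimal, hypothetical illustration of both roles, assuming a toy `hp_score` function standing in for the paper's VLM-based HP-Scorer; the function names and the group-normalization recipe (GRPO-style) are our assumptions, not the authors' released code.

```python
# Minimal sketch (assumptions, not the authors' code): hp_score stands in for
# HP-Scorer, a VLM-based evaluator fine-tuned on a small amount of
# human-preference scoring data.
from statistics import mean, pstdev

def hp_score(instruction: str, source_image: str, edited_image: str) -> float:
    """Hypothetical preference scorer; a toy heuristic so the sketch runs."""
    return (sum(map(ord, edited_image)) % 100) / 100.0

def make_preference_pair(instruction, source_image, candidates):
    """Dataset use: keep the best- and worst-scored edits as a
    (chosen, rejected) pair, as in DPO-style preference data."""
    scored = sorted(candidates, key=lambda c: hp_score(instruction, source_image, c))
    return scored[-1], scored[0]  # (chosen, rejected)

def group_advantages(instruction, source_image, candidates):
    """Reward use: group-normalized advantages (GRPO-style) computed from
    scorer rewards over a group of candidate edits for the same input."""
    rewards = [hp_score(instruction, source_image, c) for c in candidates]
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

if __name__ == "__main__":
    cands = ["edit_a.png", "edit_b.png", "edit_c.png", "edit_d.png"]
    print(make_preference_pair("remove the lamp post", "street.jpg", cands))
    print(group_advantages("remove the lamp post", "street.jpg", cands))
```

In the paper's actual pipeline, `hp_score` would be the trained HP-Scorer, and the advantages would weight a Flow-GRPO-style policy update of the editing model.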
Community
We propose HP-Edit, a post-training framework for human-preference-aligned image editing, and construct RealPref-50K, a real-world dataset covering eight common image editing tasks. We also present RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments show that our method significantly improves models such as Qwen-Image-Edit-2509 and aligns their outputs more closely with human preferences.
The following papers were recommended by the Semantic Scholar API
- Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation (2026)
- EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing (2026)
- OARS: Process-Aware Online Alignment for Generative Real-World Image Super-Resolution (2026)
- ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning (2026)
- EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization (2026)
- ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework (2026)
- Enhancing Spatial Understanding in Image Generation via Reward Modeling (2026)