UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Shanghai Innovation Institute

🔥 News

[2026/01/07] 🔥🔥 For clearer visualization, we provide an online 🖼️gallery that showcases generated images during baseline evaluations. Enjoy it!
[2025/12/05] 🔥🔥 Lumina-DiMOO, UniWorld-V2, and DreamOmni2 are added to all 🏅Leaderboard.
[2025/11/03] 🔥🔥 We release UniREditBench, UniREdit-Data-100K, UniREdit-Bagel-[BF16/FP32], and 🏆 Leaderboard !!
[2025/11/02] 🔥🔥 We release paper and project page of UniREditBench!!

Introduction

We propose UniREditBench, a unified benchmark for reasoning-based image editing assessment with broader evaluation dimension coverage and robust evaluation pipeline. We also design an automated multi-scenario data synthesis pipeline and construct UniREdit-Data-100K, a large-scale synthetic dataset with high-quality chain-of-thought (CoT) reasoning annotations. We fine-tune Bagel on this dataset and develop UniREdit-Bagel, demonstrating substantial improvements in both in-domain and out-of-distribution settings.

✨ Highlights:

Broader Scenario and Reasoning Dimension Coverage: It contains 2,700 high-quality samples organized into 8 primary reasoning dimensions and 18 sub-categories, spanning both real-world and game-world image editing tasks.
Reliable Dual-Reference Evaluation.: For each sample assessment, we design both the textual reference and ground-truth (GT) image reference. This multi-modal reference enables vision-language model (VLM) evaluators to perform direct and fine-grained comparisons at both the textual and visual levels with the generated images, leading to more reliable evaluation.

🔥 Set Up Environment

conda create -n uniredit python=3.10 -y
conda activate uniredit
pip install -r requirements.txt
pip install flash_attn==2.7.0.post1 --no-build-isolation

You can also install flash_attn via:

# for cuda11 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu11torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

# for cuda12 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

🔧 Benchmark and Checkpoint Preparation

Benchmark Preparation

huggingface-cli download --resume-download maplebb/UniREditBench  --local-dir ./UniREditBench
cd UniREditBench
unzip original_image.zip
unzip reference_image.zip

UniREdit-Bagel Checkpoint Preparation

huggingface-cli download --resume-download maplebb/UniREdit-Bagel  --local-dir ./ckpt

pip install safetensors

python merge_ckpt.py

📑 Prompt Introduction

Each prompt in our benchmark is recorded as a dict in a .json file, combining with structured annotations for evaluation.

original_image_path: Path of the original image.
reference_image_path: Path of the reference image.
instruction: The editing instruction.
rules(only for game-world scenario): The concise descriptions of the specific game rules.
name: The name of evaluation dimension.
idx: Index of the evaluation example.
reference_effect: The textual reference of edited effect.

🚀 Inference

GPUS=8
model_path=./ckpt
input_path=./UniREditBench
output_path=./output_images

# Image Editing with Reasoning
torchrun \
    --nnodes=1 \
    --nproc_per_node=$GPUS \
    gen_images_mp_uniredit.py \
    --input_dir $input_path \
    --output_dir $output_path \
    --metadata_file ./UniREditBench/data.json \
    --max_latent_size 64 \
    --model-path $model_path \
    --think

✨ Evaluation

We are using the API version: gpt-4.1-2025-04-14

python -u eval/gpt_eval_uniredit.py \
  --input ./UniREditBench \
  --data ./UniREditBench/data.json \
  --output ./output_images \
  --nproc 6

A detailed .csv results file will also be saved in the /dir_of_edit_images directory.

💻 Training

1. UniREdit-Data-100K Download

huggingface-cli download --repo-type dataset --resume-download maplebb/UniREdit-Data-100K  --local-dir ./UniREdit-Data-100K

cd UniREdit-Data-100K

# For linux (Debian、Ubuntu)
apt-get install p7zip-full 
7z x UniREdit-Data-100K.zip

2. Prepare Training Parquet

mkdir training_data

python -u gen_train_json_and_parquet.py --src_json ./UniREdit-Data-100K/train_data.json --dataset_dir ./UniREdit-Data-100 --out_json ./training_data/all_data.json --out_parquet_dir ./training_data

3. Train

Edit every placeholder in data/dataset_info.py.
Clone the github repository of Bagel.
Replace Bagel's data/ with our data/.
Reference train.sh and the training guide of Bagel for fine-tuning.

📧 Contact

If you have any comments or questions, please open a new issue or feel free to contact Feng Han and Yibin Wang.

⭐ Citation

@article{unireditbench,
  title={UniREditBench: A Unified Reasoning-based Image Editing Benchmark},
  author={Han, Feng and Wang, Yibin and Li, Chenglin and Liang, Zheming and Wang, Dianyi and Jiao, Yang and Wei, Zhipeng and Gong, Chao and Jin, Cheng and Chen, Jingjing and others},
  journal={arXiv preprint arXiv:2511.01295},
  year={2025}
}

Name	Name	Last commit message	Last commit date
Latest commit History 59 Commits 59 Commits
UniREdit-Bagel	UniREdit-Bagel
data	data
docs	docs
eval	eval
.gitignore	.gitignore
README.md	README.md
benchmarking_results_new.png	benchmarking_results_new.png
gen_images_mp_uniredit.py	gen_images_mp_uniredit.py
gen_train_json_and_parquet.py	gen_train_json_and_parquet.py
merge_ckpt.py	merge_ckpt.py
motivation_fig_new.png	motivation_fig_new.png
motivation_tab.png	motivation_tab.png
requirements.txt	requirements.txt
run_mp_editing.sh	run_mp_editing.sh
run_scripts.sh	run_scripts.sh
train.sh	train.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

🔥 News

Introduction

✨ Highlights:

🔥 Set Up Environment

🔧 Benchmark and Checkpoint Preparation

📑 Prompt Introduction

🚀 Inference

✨ Evaluation

💻 Training

1. UniREdit-Data-100K Download

2. Prepare Training Parquet

3. Train

📧 Contact

⭐ Citation

About

Uh oh!

Contributors

Uh oh!

Languages

Search code, repositories, users, issues, pull requests...

Folders and files

Latest commit

History

Repository files navigation

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

🔥 News

Introduction

✨ Highlights:

🔥 Set Up Environment

🔧 Benchmark and Checkpoint Preparation

📑 Prompt Introduction

🚀 Inference

✨ Evaluation

💻 Training

1. UniREdit-Data-100K Download

2. Prepare Training Parquet

3. Train

📧 Contact

⭐ Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages