HMAR: Efficient Hierarchical Masked AutoRegressive Image Generation

Hermann Kumbong, Xian Liu, Tsung-Yi Lin, Xihui Liu, Ziwei Liu, Daniel Y Fu, Ming-Yu Liu, Christopher Re, David W. Romero

HMAR Samples: Class-conditional ImageNet generated samples at 256×256 and 512×512 resolutions.

Method Overview

Install

Ensure torch>=2.0.0 with CUDA is installed.
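The version requirement above can be checked before installing the remaining dependencies. The following is a minimal sketch using only the standard library; the helper names `parse_version`, `meets_minimum`, and `installed_torch_version` are hypothetical and not part of this repository.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a 3-tuple."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # Missing components (e.g. "2.0") are treated as zero.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(installed, minimum="2.0.0"):
    """True if `installed` is at least `minimum`; local suffixes like +cu121 are ignored."""
    return parse_version(installed) >= parse_version(minimum)


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_torch_version()
    if version is None:
        print("torch is not installed")
    else:
        print(f"torch {version}: ok={meets_minimum(version)}")
```

This only inspects package metadata. Whether CUDA is actually usable can be confirmed separately with `torch.cuda.is_available()` once torch is installed.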

# clone
git clone https://github.com/Kumbong/HMAR
cd HMAR

# install dependencies
pip install -r requirements.txt

# Download the vqvae tokenizer from VAR
wget https://huggingface.co/FoundationVision/var/resolve/main/vae_ch160v4096z32.pth

# Turn on triton autotuning to ensure kernels are tuned for specific hardware
export TRITON_AUTO_TUNING=1

Training

Prepare the ImageNet dataset. It should be in a path /path/to/imagenet with subfolders train and validate.
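A quick way to confirm the expected layout before launching a long run is sketched below. It assumes only what is stated above (a root folder containing `train` and `validate`); the helper name `missing_splits` is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_splits(root, splits=("train", "validate")):
    """Return the names of expected split subfolders that do not exist under `root`."""
    root = Path(root)
    return [name for name in splits if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_splits("/path/to/imagenet")
    if missing:
        print("missing subfolders:", ", ".join(missing))
    else:
        print("dataset layout looks fine")
```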

Train HMAR-{d16, d20, d24, d30, d36-s} on ImageNet 256x256 or 512x512 with the next-scale prediction objective.

# d16, 256x256; for d20, d24, d30, etc., change --experiment accordingly
torchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py --experiment=hmar-train-d16 --data_path='/path/to/imagenet'

NOTE: Training configs are provided under config/experiment/, e.g. config/experiment/hmar-train-d16.yaml.
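When switching between model sizes, only the experiment name changes. The sketch below assembles the launch command for a single-node run. It rests on several assumptions: `build_train_command` is a hypothetical helper, experiment names for sizes other than d16 are assumed to follow the `hmar-train-<size>` pattern, and `--standalone` is torchrun's single-node mode, used here in place of the multi-node flags, whose values depend on your cluster.

```python
import shlex


def build_train_command(size, data_path, nproc_per_node=8, prefix="hmar-train"):
    """Assemble a single-node torchrun command as an argument list.

    The experiment name is assumed to be `<prefix>-<size>`, e.g. hmar-train-d16.
    """
    return [
        "torchrun",
        "--standalone",  # single-node rendezvous; multi-node runs need the rank/address flags instead
        f"--nproc_per_node={nproc_per_node}",
        "train.py",
        f"--experiment={prefix}-{size}",
        f"--data_path={data_path}",
    ]


if __name__ == "__main__":
    command = build_train_command("d16", "/path/to/imagenet")
    # Print a copy-pasteable shell command; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Passing `prefix="hmar-finetune-mask"` would produce the corresponding finetuning command under the same naming assumption.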

Finetuning

Finetune HMAR-{d16, d20, d24, d30, d36-s} on ImageNet 256x256 or 512x512 to introduce masked prediction and combine it with next-scale prediction.

# d16, 256x256; for d20, d24, d30, etc., change --experiment accordingly
torchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py --experiment=hmar-finetune-mask-d16 --data_path='/path/to/imagenet'

NOTE: Finetuning configs are provided under config/experiment/, e.g. config/experiment/hmar-finetune-mask-d16.yaml.

Sampling

We provide a sampling script sample.py to generate images with HMAR.

# 1) sampling configs can be changed in config/sampling/hmar-d30.yaml
# 2) the number of masked sampling steps can be changed in utils/sampling_arg_util.py
python sample.py --checkpoint=hmar-d30
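To give a feel for what the number of masked sampling steps controls, the sketch below computes how many tokens a MaskGIT-style cosine schedule reveals at each step. This is an illustration of the general technique only: the schedule HMAR actually uses is defined in the repository and may differ, and `tokens_to_unmask_per_step` is a hypothetical name.

```python
import math


def tokens_to_unmask_per_step(num_tokens, num_steps):
    """Number of tokens revealed at each step under a cosine masking schedule.

    After step t, roughly cos(pi/2 * t / num_steps) of the tokens remain masked,
    so few tokens are revealed early and many are revealed late.
    """
    counts = []
    still_masked = num_tokens
    for step in range(1, num_steps + 1):
        if step < num_steps:
            ratio = math.cos(math.pi / 2 * step / num_steps)
            target_masked = math.floor(ratio * num_tokens)
        else:
            target_masked = 0  # the final step reveals everything that is left
        # Reveal at least one token per step while any remain masked.
        target_masked = max(min(target_masked, still_masked - 1), 0)
        counts.append(still_masked - target_masked)
        still_masked = target_masked
    return counts


if __name__ == "__main__":
    print(tokens_to_unmask_per_step(256, 8))
```

Fewer steps mean more tokens are committed in parallel at each step, which trades sample quality for speed.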

Evaluation

To compute FID, Inception Score (IS), precision, and recall, or to reproduce the numbers from our paper:

# generate 50K samples to be used for evaluation 
python -m evaluate.generate_samples --checkpoint=hmar-d16

# compute FID, IS, precision, recall on the generated samples
python -m evaluate.compute_metrics --checkpoint=hmar-d16
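For reference, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception features of the real and generated images. The expression below is the standard definition; that `evaluate.compute_metrics` follows it exactly is an assumption. Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the feature mean and covariance of the real and generated sets.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one.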

Benchmarking

To benchmark the attention kernels and the end-to-end training and inference speedups, or to reproduce the efficiency numbers reported in our paper:

# Ensure that triton kernels are tuned for specific hardware
export TRITON_AUTO_TUNING=1

# standalone attention kernel performance
python -m benchmark.attention --sparsity_pattern="block_diagonal"

# end-to-end training performance 
python -m benchmark.training

# inference performance
python -m benchmark.inference

We report numbers on an A100 80GB SXM4 GPU with CUDA 12.5 and Triton 3.2.0.

Acknowledgement

We would like to acknowledge the following projects, from which code in this codebase has been derived:

VAR
MaskGIT

Citation

@article{kumbong2024hmar,
  title   = {HMAR: Efficient Hierarchical Masked AutoRegressive Image Generation},
  author  = {Kumbong, Hermann and Liu, Xian and Lin, Tsung-Yi and Liu, Xihui and Liu, Ziwei and Fu, Daniel Y and Liu, Ming-Yu and Re, Christopher and Romero, David W},
  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.04421}
}
