
Effective and Efficient Masked Image Generation Models
Official PyTorch Implementation

Preparation

Dataset

Download the ImageNet dataset and place it at IMAGENET_PATH.
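Since this codebase builds on MAR, the data loaders most likely expect the standard torchvision ImageFolder layout (IMAGENET_PATH/train/&lt;class&gt;/&lt;image&gt;.JPEG). This is an assumption, so double-check against the training scripts; the snippet below is a quick sanity check of that layout:

    from torchvision import datasets

    # Assumed layout: IMAGENET_PATH/train/<wnid>/<image>.JPEG
    dataset = datasets.ImageFolder('IMAGENET_PATH/train')  # substitute your actual path
    print(f'{len(dataset)} images across {len(dataset.classes)} classes')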

Installation

  1. Clone the repository:

     git clone https://github.com/ML-GSAI/eMIGM.git
     cd eMIGM

  2. Set up the environment:

     bash init_env.sh

  3. Download pretrained models:

  4. Prepare FID reference statistics:

    • Download fid_stats
    • Place the directory in the project root as fid_stats/

    Note: For 256×256 generation, you only need the fid_stats_imagenet256_guided_diffusion.npz file. The fid_stats_imagenet512_guided_diffusion.npz file is required only for 512×512 generation.

    For the fid_stats_imagenet256_guided_diffusion.npz file, you need to convert the data type of mu to float32 using the following script:

     import numpy as np

     # Load the reference statistics
     data = np.load('fid_stats/fid_stats_imagenet256_guided_diffusion.npz')

     # Convert mu to float32 and save back, overwriting the original file
     mu = data['mu'].astype(np.float32)
     sigma = data['sigma']
     np.savez('fid_stats/fid_stats_imagenet256_guided_diffusion.npz', mu=mu, sigma=sigma)
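After running the conversion, you can quickly confirm the change took effect:

     import numpy as np

     data = np.load('fid_stats/fid_stats_imagenet256_guided_diffusion.npz')
     print(data['mu'].dtype, data['mu'].shape)       # should now be float32
     print(data['sigma'].dtype, data['sigma'].shape)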

We gratefully acknowledge the original authors for providing the pretrained models.

Accelerate Training with Latent Caching

To speed up training, we recommend pre-caching the latent representations of the ImageNet dataset. This preprocessing step significantly reduces computational overhead during training:

For 256x256 resolution (using VAE):

# Generate latent representations using pre-trained VAE
bash bash/extract/extract_imagenet_256.sh

For 512x512 resolution (using DC-AE):

# Generate latent representations using pre-trained DC-AE 
bash bash/extract/extract_imagenet_512.sh

These scripts pre-process the dataset into compressed latent representations that can be loaded efficiently during training. We strongly recommend running this caching step before starting the main training run.
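For intuition, the sketch below shows the general idea behind latent caching: encode each image once with the frozen autoencoder and store the result. It is only an illustration, assuming a diffusers-style VAE and hypothetical paths; the actual extraction scripts are the authoritative implementation.

    import os
    import torch
    from diffusers import AutoencoderKL

    # Hypothetical illustration: encode one batch of images into VAE latents
    # and cache them to disk. extract_imagenet_256.sh handles the real
    # batching, augmentation, and file layout.
    vae = AutoencoderKL.from_pretrained('stabilityai/sd-vae-ft-ema')  # assumed checkpoint; use your vae_path
    vae.eval()

    images = torch.randn(8, 3, 256, 256)  # stand-in for a preprocessed ImageNet batch in [-1, 1]
    with torch.no_grad():
        latents = vae.encode(images).latent_dist.sample()  # (8, 4, 32, 32) for this VAE

    os.makedirs('cached_latents', exist_ok=True)  # hypothetical cache directory
    torch.save(latents, 'cached_latents/batch_0000.pt')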

Usage

Supported Model Sizes:

  • 256×256 Resolution: Five model variants available
    eMIGM-XS, eMIGM-S, eMIGM-B, eMIGM-L, eMIGM-H
  • 512×512 Resolution: Four model variants available
    eMIGM-XS, eMIGM-S, eMIGM-B, eMIGM-L

Training Instructions

We provide dedicated training scripts for both 256×256 and 512×512 resolutions:

Configuration and Execution:

  1. Script Locations:

    • Training scripts for 256x256 models are located in: bash/train/256
    • Training scripts for 512x512 models are located in: bash/train/512
  2. Example Command (256x256, eMIGM-S):

    # Train the eMIGM-S model at 256x256 resolution.
    bash bash/train/256/train_emigm_small.sh
  3. Parameter Configuration: Before running the training scripts, you must modify the following parameters within the script:

    • output_dir: The directory where training outputs (checkpoints, logs) will be saved.
    • nnodes: The number of distributed training nodes.
    • nproc_per_node: The number of processes to launch per node (typically, the number of GPUs per node).
    • data_path: The path to your ImageNet dataset.
    • cached_path: The path to the pre-cached latent representations (created by the extraction scripts, e.g., extract_imagenet_256.sh). This is crucial for efficient training.
    • vae_path: The path to the pretrained VAE or DC-AE model.

Note: The cached_path should match the output directory specified when you ran extract_imagenet_256.sh or extract_imagenet_512.sh. The vae_path should point to the pretrained VAE model for 256x256 resolution or the pretrained DC-AE model for 512x512 resolution.

Evaluation Instructions

Pre-trained models are available for download at: https://huggingface.co/GSAI-ML/eMIGM
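If you prefer to script the download, huggingface_hub can fetch the whole repo (a convenience sketch; the local directory name is arbitrary):

    from huggingface_hub import snapshot_download

    # Download all eMIGM checkpoints from the Hugging Face Hub; point
    # ckpt_path in the evaluation script at the checkpoint you want.
    local_dir = snapshot_download(repo_id='GSAI-ML/eMIGM', local_dir='pretrained/eMIGM')
    print('Checkpoints saved to', local_dir)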

We provide dedicated evaluation scripts for both 256×256 and 512×512 resolutions:

Configuration and Execution:

  1. Script Locations:

    • Evaluation scripts for 256x256 models: bash/evaluate/evaluate_256.sh
    • Evaluation scripts for 512x512 models: bash/evaluate/evaluate_512.sh
  2. Example Command:

    # Evaluate a model at 256x256 resolution
    bash bash/evaluate/evaluate_256.sh
  3. Parameter Configuration: Before running the evaluation scripts, you must modify the following parameters:

    • ckpt_path: The path to the downloaded pre-trained model checkpoint.
    • output_dir: The directory where evaluation outputs and logs will be saved.
    • nnodes: The number of distributed evaluation nodes.
    • nproc_per_node: The number of processes per node (typically the number of GPUs per node).
    • data_path: The path to your ImageNet dataset.
    • vae_path: The path to the pretrained VAE (for 256x256) or DC-AE model (for 512x512).

Note: For 256x256 resolution evaluations, use the pretrained VAE model. For 512x512 resolution evaluations, use the pretrained DC-AE model.
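For context, FID evaluation ultimately compares Inception-feature statistics of your generated samples against the reference .npz statistics. Below is a minimal sketch of the standard Frechet distance computation; this is not the repo's evaluation code, and mu_gen/sigma_gen are hypothetical names for statistics the scripts compute internally.

    import numpy as np
    from scipy import linalg

    def fid_from_stats(mu1, sigma1, mu2, sigma2):
        # Standard Frechet distance between two Gaussians:
        # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrtm(sigma1 @ sigma2))
        diff = mu1 - mu2
        covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
        if np.iscomplexobj(covmean):
            covmean = covmean.real
        return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

    ref = np.load('fid_stats/fid_stats_imagenet256_guided_diffusion.npz')
    # mu_gen, sigma_gen = ...  # Inception statistics of your generated samples
    # print(fid_from_stats(mu_gen, sigma_gen, ref['mu'], ref['sigma']))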

Acknowledgements

A large portion of the code in this repo is based on MAR and DPM-Solver.

Contact

If you have any questions, feel free to contact me by email ([email protected]). Enjoy using eMIGM!
