Download the ImageNet dataset and place it in your IMAGENET_PATH.
- Clone the repository:

```bash
git clone https://github.com/ML-GSAI/eMIGM.git
cd eMIGM
```

- Set up the environment:

```bash
bash init_env.sh
```
- Download pretrained models:
  - VAE (for 256×256 generation): Download link
  - DC-AE (for 512×512 generation): Hugging Face model card (see the download sketch below)
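Since the DC-AE checkpoint is distributed via Hugging Face, one way to fetch it is with huggingface-cli. The repo ID and target directory below are placeholders, not the actual values; substitute the real ones from the model card linked above:

```bash
# Placeholder repo ID and directory; use the actual values from the links above
huggingface-cli download <org>/<dc-ae-repo> --local-dir ./pretrained/dc_ae
```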
- Prepare FID reference statistics:
  - Download fid_stats
  - Place the directory in the project root as `fid_stats/`
Note: For 256×256 generation, you only need the `fid_stats_imagenet256_guided_diffusion.npz` file. The `fid_stats_imagenet512_guided_diffusion.npz` file is required only for 512×512 generation. For the `fid_stats_imagenet256_guided_diffusion.npz` file, you need to convert the data type of `mu` to float32 using the following script:

```python
import numpy as np

# Load the npz file
data = np.load('fid_stats/fid_stats_imagenet256_guided_diffusion.npz')

# Extract the statistics, convert mu to float32, and save them back
mu = data['mu'].astype(np.float32)
sigma = data['sigma']
np.savez('fid_stats/fid_stats_imagenet256_guided_diffusion.npz', mu=mu, sigma=sigma)
```
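As an optional sanity check, you can print the stored dtypes to confirm the conversion worked; `mu` should now report float32:

```bash
python -c "import numpy as np; d = np.load('fid_stats/fid_stats_imagenet256_guided_diffusion.npz'); print(d['mu'].dtype, d['sigma'].dtype)"
```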
We gratefully acknowledge the original authors for providing the pretrained models.
To optimize training efficiency, we recommend pre-caching the latent representations of your ImageNet dataset. This preprocessing step significantly reduces computational overhead during the training process:
For 256×256 resolution (using VAE):

```bash
# Generate latent representations using the pre-trained VAE
bash bash/extract/extract_imagenet_256.sh
```
For 512×512 resolution (using DC-AE):

```bash
# Generate latent representations using the pre-trained DC-AE
bash bash/extract/extract_imagenet_512.sh
```
These scripts pre-process the dataset into compressed latent representations that can be loaded efficiently during model training. We strongly recommend running this caching step before starting the main training procedure.
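Once extraction finishes, a quick way to confirm the cache was populated is to inspect the output directory. The path below is a placeholder for whatever output location you passed to the extraction script, and the cache's exact file layout is repo-specific:

```bash
# Hypothetical cache location; substitute your own output path
ls /path/to/cached_latents_256 | head   # should list cached latent files
du -sh /path/to/cached_latents_256      # total size of the cache
```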
Supported Model Sizes:
- 256×256 Resolution: Five model variants available: `eMIGM-XS`, `eMIGM-S`, `eMIGM-B`, `eMIGM-L`, `eMIGM-H`
- 512×512 Resolution: Four model variants available: `eMIGM-XS`, `eMIGM-S`, `eMIGM-B`, `eMIGM-L`
We provide dedicated training scripts for both 256×256 and 512×512 resolutions:
Configuration and Execution:
- Script Locations:
  - Training scripts for 256×256 models are located in: `bash/train/256`
  - Training scripts for 512×512 models are located in: `bash/train/512`
- Example Command (256×256, eMIGM-S):

```bash
# Train the eMIGM-S model at 256x256 resolution
bash bash/train/256/train_emigm_small.sh
```
- Parameter Configuration: Before running the training scripts, you must modify the following parameters within the script (see the example sketch below):
  - `output_dir`: The directory where training outputs (checkpoints, logs) will be saved.
  - `nnodes`: The number of distributed training nodes.
  - `nproc_per_node`: The number of processes to launch per node (typically, the number of GPUs per node).
  - `data_path`: The path to your ImageNet dataset.
  - `cached_path`: The path to the pre-cached latent representations (created by the extraction scripts, e.g., `extract_imagenet_256.sh`). This is crucial for efficient training.
  - `vae_path`: The path to the pretrained VAE or DC-AE model.
Note: The `cached_path` should match the output directory specified in your `extract_imagenet_256.sh` or `extract_imagenet_512.sh` run. The `vae_path` should point to the pretrained VAE model for 256×256 resolution, or to the pretrained DC-AE model for 512×512 resolution.
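For illustration, a configured training script header might look like the following. All values are placeholders for a single 8-GPU node, and the exact variable layout inside the repo's scripts may differ:

```bash
# Hypothetical values; adjust every path and count to your own setup
output_dir=./output/emigm_small_256        # where checkpoints and logs are written
nnodes=1                                   # number of machines in the training job
nproc_per_node=8                           # processes (GPUs) per machine
data_path=/path/to/imagenet                # raw ImageNet root
cached_path=/path/to/cached_latents_256    # output dir of extract_imagenet_256.sh
vae_path=/path/to/pretrained/vae           # VAE for 256x256, DC-AE for 512x512
```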
Pre-trained models are available for download at: https://huggingface.co/GSAI-ML/eMIGM
We provide dedicated evaluation scripts for both 256×256 and 512×512 resolutions:
Configuration and Execution:
- Script Locations:
  - Evaluation script for 256×256 models: `bash/evaluate/evaluate_256.sh`
  - Evaluation script for 512×512 models: `bash/evaluate/evaluate_512.sh`
- Example Command:

```bash
# Evaluate a model at 256x256 resolution
bash bash/evaluate/evaluate_256.sh
```
- Parameter Configuration: Before running the evaluation scripts, you must modify the following parameters (see the example sketch below):
  - `ckpt_path`: The path to the downloaded pre-trained model checkpoint.
  - `output_dir`: The directory where evaluation outputs and logs will be saved.
  - `nnodes`: The number of distributed evaluation nodes.
  - `nproc_per_node`: The number of processes per node (typically the number of GPUs per node).
  - `data_path`: The path to your ImageNet dataset.
  - `vae_path`: The path to the pretrained VAE (for 256×256) or DC-AE (for 512×512) model.
Note: For 256×256 resolution evaluations, use the pretrained VAE model. For 512×512 resolution evaluations, use the pretrained DC-AE model.
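As with training, a configured evaluation script might set these parameters roughly as follows. All values are placeholders, and the actual variable layout in the repo's scripts may differ:

```bash
# Hypothetical values; adjust to your own setup
ckpt_path=/path/to/emigm_checkpoint.pth    # downloaded pre-trained checkpoint
output_dir=./output/eval_emigm_256         # where evaluation outputs and logs go
nnodes=1                                   # number of machines
nproc_per_node=8                           # processes (GPUs) per machine
data_path=/path/to/imagenet                # ImageNet root
vae_path=/path/to/pretrained/vae           # VAE for 256x256, DC-AE for 512x512
```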
A large portion of the code in this repo is based on MAR and DPM-Solver.
If you have any questions, feel free to contact me via email ([email protected]). Enjoy using eMIGM!