
DCTdiff

Official PyTorch implementation of our ICML 2025 paper: DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

This repository is based on UViT. The UViT code is kept unchanged in the UViT branch, where you can find our baseline implementation.

We will also release a DiT-based DCTdiff in the near future.


Installation

The installation steps below work for both DCTdiff and UViT.

conda create -n dctdiff python=3.9
conda activate dctdiff

pip install matplotlib
pip install accelerate==0.33.0   # automatically installs PyTorch 2.4
pip install absl-py ml_collections einops wandb ftfy==6.1.1 transformers==4.23.1
pip install opencv-python
pip install scipy

# xformers is optional, but it greatly speeds up attention computation.
pip install -U xformers
pip install torchvision==0.19.0
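
To verify the environment, a quick sanity check in Python (the expected versions reflect the pins above):

# Quick environment check; expects PyTorch 2.4 and torchvision 0.19 as pinned above.
import torch, torchvision
print(torch.__version__, torchvision.__version__, torch.cuda.is_available())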

Trained Models (DCTdiff)

CIFAR-10: cifar10_small.pth, cifar10_mid.pth, cifar10_mid_deep.pth

CelebA 64x64: celeba64_small_DPMSolver.pth, celeba64_small_ddim.pth

ImageNet 64x64: imgnet64_small.pth, imgnet64_mid.pth, imgnet64_mid_deep.pth

FFHQ 128x128: ffhq128_small.pth, ffhq128_mid.pth, ffhq128_mid_deep.pth

FFHQ 256x256: ffhq256_mid.pth

FFHQ 512x512: ffhq512_mid.pth

AFHQ 512x512: afhq512_mid.pth

Trained Models (UViT)

CIFAR-10: cifar10_mid.pth

CelebA 64x64: celeba64_small.pth

ImageNet 64x64: imgnet64_small.pth

FFHQ 128x128: ffhq128_small.pth

FFHQ 256x256: ffhq256_mid.pth

FFHQ 512x512: ffhq512_mid.pth

AFHQ 512x512: afhq512_mid.pth

Preparation Before Training and Evaluation

Data

Each dataset is organized in a 'folder' format; the images can be either jpg or png.

Note, however, that if you train on png images, your generated images must also be saved as png before computing FID. In our experiments we use jpg images, and the fid_stats are computed from the jpg image folders.

  • CIFAR-10 32x32: we provide a script (tools/download_cifar10.py) to download CIFAR-10
  • CelebA 64x64: download the dataset, then center-crop to 64x64 using the script tools/dataset_celeba64.py (a minimal sketch of this step follows the list). Alternatively, you can directly download our processed version from GoogleDrive
  • ImageNet 64x64: download the dataset and use the train folder for training
  • FFHQ 128x128: download the dataset
  • FFHQ 256x256: download the dataset
  • FFHQ 512x512: download the dataset
  • AFHQ 512x512: download the dataset
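
As referenced above, a minimal sketch of the CelebA center-crop preprocessing, assuming raw images readable by PIL; the authoritative version is tools/dataset_celeba64.py, and the function name and crop choice here are illustrative:

# Illustrative sketch only; the real preprocessing is in tools/dataset_celeba64.py.
import os
from PIL import Image

def center_crop_to_64(src_dir, dst_dir, size=64):
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        w, h = img.size
        s = min(w, h)                          # square center crop
        left, top = (w - s) // 2, (h - s) // 2
        img = img.crop((left, top, left + s, top + s)).resize((size, size), Image.BICUBIC)
        img.save(os.path.join(dst_dir, name))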

Reference statistics for FID

Download the fid_stats directory from this link (it contains the reference statistics for FID).

Put the downloaded fid_stats into the directory assets/fid_stats; the fid_stats path is set in the script datasets.py for FID computation during both training and inference.

Using pytorch-FID, you can also

  • generate your own fid_stats for a given dataset
  • compute FID whenever you need

python -m pytorch_fid --save-stats path/to/dataset_folder path/to/fid_stats  # generate fid_stats
python -m pytorch_fid path/to/dataset1_folder path/to/dataset2_folder  # FID calculation
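
The same package also exposes a Python API; a minimal sketch, where the paths are placeholders and either argument may be an image folder or a precomputed .npz statistics file:

from pytorch_fid.fid_score import calculate_fid_given_paths

# Placeholders: point these at your generated images and reference statistics.
fid = calculate_fid_given_paths(
    ['path/to/generated_images', 'assets/fid_stats/your_stats.npz'],
    batch_size=50, device='cuda', dims=2048)
print(f'FID: {fid:.2f}')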

Parameters (eta, entropy, m*)

We provide all DCT-related parameters in the config files under configs.

If you need to train on other datasets, use the script DCT_datasets_statis.py to compute eta and entropy, or to determine m* (see the sketch after this paragraph).
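
For intuition, a minimal sketch of the block-wise DCT statistics such a script works with, assuming 8x8 blocks and grayscale input; the function and summary here are illustrative, and the exact definitions of eta, entropy, and m* are those used in the paper and in DCT_datasets_statis.py:

import numpy as np
from scipy.fft import dctn
from PIL import Image

def blockwise_dct_energy(img_path, block=8):
    # Mean absolute DCT coefficient per frequency, averaged over all blocks.
    x = np.asarray(Image.open(img_path).convert('L'), dtype=np.float64)
    h, w = (d - d % block for d in x.shape)
    blocks = x[:h, :w].reshape(h // block, block, w // block, block).swapaxes(1, 2)
    coeffs = dctn(blocks, axes=(-2, -1), norm='ortho')  # 2D DCT-II per block
    return np.abs(coeffs).mean(axis=(0, 1))             # (block, block) energy map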

Training

We use the huggingface accelerate library to help train with distributed data parallel and mixed precision.

We provide all commands to reproduce DCTdiff training in the paper (4xA100 GPUs were used for all experiments):

Feel free to change the sampler, NFE, and num_samples in the config file (a sketch of these fields follows the commands below).

# CIFAR10 32x32
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/cifar10_uvit_mid_2by2.py --workdir YOUR_DIR

# CelebA 64x64
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/celeba64_uvit_small_2by2.py --workdir YOUR_DIR

# ImageNet 64x64
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/imgnet64_uvit_small_2by2.py --workdir YOUR_DIR

# FFHQ 128x128
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/ffhq128_uvit_small_4by4.py --workdir YOUR_DIR

# FFHQ 256x256
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/ffhq256_uvit_mid_4by4.py --workdir YOUR_DIR

# FFHQ 512x512
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/ffhq512_uvit_mid_8by8.py --workdir YOUR_DIR

# AFHQ 512x512
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/afhq512_uvit_mid_8by8.py --workdir YOUR_DIR
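
As noted above, sampling settings live in the config files; a hypothetical sketch of the relevant fields (the real configs/*.py follow the UViT ml_collections pattern, and the exact field names may differ):

import ml_collections

def get_config():
    config = ml_collections.ConfigDict()
    config.sample = ml_collections.ConfigDict()
    config.sample.algorithm = 'dpm_solver'  # sampler
    config.sample.sample_steps = 50         # NFE
    config.sample.n_samples = 50000         # num_samples used for FID
    return config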

FID Evaluation

We use the huggingface accelerate library for efficient inference with mixed precision and multiple GPUs.

An example of evaluating FID on CIFAR-10 is shown below; evaluation on other datasets only requires changing --config and --nnet_path accordingly:

# CIFAR10 32x32
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 eval.py --config=configs/cifar10_uvit_mid_2by2.py --nnet_path=DCTdiff_cifar10_mid.pth --output_path YOUR_DIR

References

If you find the code useful for your research, please consider citing:

@article{ning2024dctdiff,
  title={DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space},
  author={Ning, Mang and Li, Mingxiao and Su, Jianlin and Jia, Haozhe and Liu, Lanmiao and Bene{\v{s}}, Martin and Salah, Albert Ali and Ertugrul, Itir Onal},
  journal={arXiv preprint arXiv:2412.15032},
  year={2024}
}
