
DCTdiff

Official PyTorch implementation of our ICML 2025 paper: DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

This repository is based on UViT. The UViT code is kept unchanged in the UViT branch, where you can find our baseline implementation.

We will also release a DiT-based DCTdiff in the near future.


Installation

The installation steps below work for both DCTdiff and UViT.

conda create -n dctdiff python=3.9
conda activate dctdiff

pip install matplotlib
pip install accelerate==0.33.0   # automatically installs PyTorch 2.4
pip install absl-py ml_collections einops wandb ftfy==6.1.1 transformers==4.23.1
pip install opencv-python
pip install scipy

# xformers is optional, but it greatly speeds up attention computation.
pip install -U xformers
pip install torchvision==0.19.0
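
To verify the environment, a quick sanity check in Python (the expected versions reflect the pins above):

# Quick environment check; expects PyTorch 2.4 and torchvision 0.19 as pinned above.
import torch, torchvision
print(torch.__version__, torchvision.__version__, torch.cuda.is_available())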

Trained Models (DCTdiff)

CIFAR-10: cifar10_small.pth, cifar10_mid.pth, cifar10_mid_deep.pth

CelebA 64x64: celeba64_small_DPMSolver.pth, celeba64_small_ddim.pth

ImageNet 64x64: imgnet64_small.pth, imgnet64_mid.pth, imgnet64_mid_deep.pth

FFHQ 128x128: ffhq128_small.pth, ffhq128_mid.pth, ffhq128_mid_deep.pth

FFHQ 256x256: ffhq256_mid.pth

FFHQ 512x512: ffhq512_mid.pth

AFHQ 512x512: afhq512_mid.pth

Trained Models (UViT)

CIFAR-10: cifar10_mid.pth

CelebA 64x64: celeba64_small.pth

ImageNet 64x64: imgnet64_small.pth

FFHQ 128x128: ffhq128_small.pth

FFHQ 256x256: ffhq256_mid.pth

FFHQ 512x512: ffhq512_mid.pth

AFHQ 512x512: afhq512_mid.pth

Preparation Before Training and Evaluation

Data

Each dataset is organized in a 'folder' format; the images can be either jpg or png.

Note, however, that if you train on png images, your generated images must also be saved as png before computing FID. In our experiments we use jpg images, and the fid_stats are computed from the jpg image folders.

  • CIFAR-10 32x32: we provide a script (tools/download_cifar10.py) to download CIFAR-10
  • CelebA 64x64: download the dataset, then center-crop to 64x64 using the script tools/dataset_celeba64.py (a minimal sketch of this step follows the list). Alternatively, you can directly download our processed version from GoogleDrive
  • ImageNet 64x64: download the dataset and use the train folder for training
  • FFHQ 128x128: download the dataset
  • FFHQ 256x256: download the dataset
  • FFHQ 512x512: download the dataset
  • AFHQ 512x512: download the dataset
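
As referenced above, a minimal sketch of the CelebA center-crop preprocessing, assuming raw images readable by PIL; the authoritative version is tools/dataset_celeba64.py, and the function name and crop choice here are illustrative:

# Illustrative sketch only; the real preprocessing is in tools/dataset_celeba64.py.
import os
from PIL import Image

def center_crop_to_64(src_dir, dst_dir, size=64):
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        w, h = img.size
        s = min(w, h)                          # square center crop
        left, top = (w - s) // 2, (h - s) // 2
        img = img.crop((left, top, left + s, top + s)).resize((size, size), Image.BICUBIC)
        img.save(os.path.join(dst_dir, name))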

Reference statistics for FID

Download the fid_stats directory from this link (it contains the reference statistics for FID).

Put the downloaded fid_stats into the directory assets/fid_stats; the fid_stats path is set in the script datasets.py for FID computation during both training and inference.

Using pytorch-FID, you can also

  • generate your own fid_stats for a given dataset
  • compute FID whenever you need

python -m pytorch_fid --save-stats path/to/dataset_folder path/to/fid_stats  # generate fid_stats
python -m pytorch_fid path/to/dataset1_folder path/to/dataset2_folder  # FID calculation
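
The same package also exposes a Python API; a minimal sketch, where the paths are placeholders and either argument may be an image folder or a precomputed .npz statistics file:

from pytorch_fid.fid_score import calculate_fid_given_paths

# Placeholders: point these at your generated images and reference statistics.
fid = calculate_fid_given_paths(
    ['path/to/generated_images', 'assets/fid_stats/your_stats.npz'],
    batch_size=50, device='cuda', dims=2048)
print(f'FID: {fid:.2f}')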

Parameters (eta, entropy, m*)

We provide all DCT-related parameters in the config files under configs.

If you need to train on other datasets, use the script DCT_datasets_statis.py to compute eta and entropy, or to determine m* (see the sketch after this paragraph).
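
For intuition, a minimal sketch of the block-wise DCT statistics such a script works with, assuming 8x8 blocks and grayscale input; the function and summary here are illustrative, and the exact definitions of eta, entropy, and m* are those used in the paper and in DCT_datasets_statis.py:

import numpy as np
from scipy.fft import dctn
from PIL import Image

def blockwise_dct_energy(img_path, block=8):
    # Mean absolute DCT coefficient per frequency, averaged over all blocks.
    x = np.asarray(Image.open(img_path).convert('L'), dtype=np.float64)
    h, w = (d - d % block for d in x.shape)
    blocks = x[:h, :w].reshape(h // block, block, w // block, block).swapaxes(1, 2)
    coeffs = dctn(blocks, axes=(-2, -1), norm='ortho')  # 2D DCT-II per block
    return np.abs(coeffs).mean(axis=(0, 1))             # (block, block) energy map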

Training

We use the huggingface accelerate library to help train with distributed data parallel and mixed precision.

We provide all commands to reproduce DCTdiff training in the paper (4xA100 GPUs were used for all experiments):

Feel free to change the sampler, NFE, and num_samples in the config file (a sketch of these fields follows the commands below).

# CIFAR10 32x32
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/cifar10_uvit_mid_2by2.py --workdir YOUR_DIR

# CelebA 64x64
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/celeba64_uvit_small_2by2.py --workdir YOUR_DIR

# ImageNet 64x64
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/imgnet64_uvit_small_2by2.py --workdir YOUR_DIR

# FFHQ 128x128
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/ffhq128_uvit_small_4by4.py --workdir YOUR_DIR

# FFHQ 256x256
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/ffhq256_uvit_mid_4by4.py --workdir YOUR_DIR

# FFHQ 512x512
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/ffhq512_uvit_mid_8by8.py --workdir YOUR_DIR

# AFHQ 512x512
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train.py --config=configs/afhq512_uvit_mid_8by8.py --workdir YOUR_DIR
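
As noted above, sampling settings live in the config files; a hypothetical sketch of the relevant fields (the real configs/*.py follow the UViT ml_collections pattern, and the exact field names may differ):

import ml_collections

def get_config():
    config = ml_collections.ConfigDict()
    config.sample = ml_collections.ConfigDict()
    config.sample.algorithm = 'dpm_solver'  # sampler
    config.sample.sample_steps = 50         # NFE
    config.sample.n_samples = 50000         # num_samples used for FID
    return config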

FID Evaluation

We use the huggingface accelerate library for efficient inference with mixed precision and multiple GPUs.

An example of evaluating FID on CIFAR-10 is shown below; evaluation on other datasets only requires changing --config and --nnet_path accordingly:

# CIFAR10 32x32
accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 eval.py --config=configs/cifar10_uvit_mid_2by2.py --nnet_path=DCTdiff_cifar10_mid.pth --output_path YOUR_DIR

References

If you find the code useful for your research, please consider citing:

@article{ning2024dctdiff,
  title={DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space},
  author={Ning, Mang and Li, Mingxiao and Su, Jianlin and Jia, Haozhe and Liu, Lanmiao and Bene{\v{s}}, Martin and Salah, Albert Ali and Ertugrul, Itir Onal},
  journal={arXiv preprint arXiv:2412.15032},
  year={2024}
}
