
[New Pipeline]: SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers #11135

Open
@ziyu-guo

Description

Model/Pipeline/Scheduler description

Repo: https://github.com/Roblox/SmoothCache
Paper: https://huggingface.co/papers/2411.10510

This is a training-free acceleration technique for DiT pipelines that controls the caching behavior of individual components and works across different pipelines and modalities.

There is a non-intrusive helper class implemented for plug-and-play integration with the Diffusers DiT pipeline; no changes inside Diffusers are needed for this to work.
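
For intuition, here is a minimal sketch of the underlying idea (illustrative only; `CachedBlock` and the schedule layout are made up for this example, not the actual SmoothCache implementation): layer outputs in a DiT change slowly across many adjacent denoising steps, so a block can reuse its cached output instead of recomputing, driven by a precomputed per-step schedule.

```python
# Illustrative sketch of the caching idea only; CachedBlock and the schedule
# layout are hypothetical, not part of the SmoothCache API.
import torch.nn as nn

class CachedBlock(nn.Module):
    def __init__(self, block, schedule):
        super().__init__()
        self.block = block        # expensive sub-block, e.g. attention or MLP
        self.schedule = schedule  # one 0/1 flag per denoising step; 1 = recompute
        self.step = 0
        self.cache = None

    def forward(self, x, *args, **kwargs):
        if self.cache is None or self.schedule[self.step]:
            # Recompute and cache when the schedule says this step matters
            self.cache = self.block(x, *args, **kwargs)
        self.step += 1
        # Otherwise reuse the previous step's output, skipping the computation
        return self.cache
```

The actual schedules are derived from layer-wise representation errors measured on a small calibration set, as described in the paper.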

We're happy to add SmoothCache in the form of a doc-only PR to benefit a broader user base.

Open source status

  • The model implementation is available.
  • The model weights are available (only relevant if the addition is not a scheduler).

Provide useful links for the implementation

No response

Activity

ukaprch commented on Mar 23, 2025

I ran into a problem with pip install dit-smoothcache: it didn't copy over all the components.

Once I got that settled, I had to comment out the following from `__init__.py`:

```python
'''
# Try to import DiTCacheHelper
try:
    from .dit_cache_helper import DiTCacheHelper
    __all__.append('DiTCacheHelper')
except ImportError:
    print("Warning: DiTCacheHelper not imported. Ensure necessary dependencies are installed.")
'''
```

I kept getting this error when importing DiTCacheHelper:

```python
try:
    # Assuming DiTBlock is defined in 'models/dit.py' in the DiT repository
    from models import DiTBlock
except ImportError:
    print("Warning: DiT library is not accessible. DiTCacheHelper cannot be used.")
    DiTBlock = None
```

which I assume needs to be investigated or implemented for this to work?

Then, after the above, I ran your included code in my FluxPipeline inference:

```python
# Import SmoothCacheHelper
from SmoothCache import DiffuserCacheHelper
import json

# Initialize the DiffuserCacheHelper with the model
with open("./smoothcache_schedules/diffuser_schedule.json", "r") as f:
    schedule = json.load(f)
cache_helper = DiffuserCacheHelper(pipe.transformer, schedule=schedule)

# Enable the caching helper
cache_helper.enable()
```

But I didn't see any performance improvement. Is it possible this is due to my transformer being quantized?

ziyu-guo (Author) commented on Mar 24, 2025

Hi @ukaprch, thank you for trying it out. We have implemented inference helpers for the original DiT and the Diffusers DiT pipeline. The optional import warnings check whether either of these dependencies is present.

Assuming you have the latest Diffusers package installed, you can start your experiment with the following code:

```python
import json
import torch
from diffusers import DiTPipeline, DPMSolverMultistepScheduler

# Import SmoothCacheHelper
from SmoothCache import DiffuserCacheHelper

# Load the DiT pipeline and scheduler
pipe = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Initialize the DiffuserCacheHelper with the model
with open("smoothcache_schedules/50-N-3-threshold-0.35.json", "r") as f:
    schedule = json.load(f)
cache_helper = DiffuserCacheHelper(pipe.transformer, schedule=schedule)

# Enable the caching helper
cache_helper.enable()

# Prepare the input
words = ["Labrador retriever"]
class_ids = pipe.get_label_ids(words)

# Generate images with the pipeline
generator = torch.manual_seed(33)
image = pipe(class_labels=class_ids, num_inference_steps=50, generator=generator).images[0]

# Restore the original forward method and disable the helper
# disable() should be paired up with enable()
cache_helper.disable()
```
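
If you want that pairing to hold even when generation raises, a plain try/finally works (this is ordinary Python, not a SmoothCache feature):

```python
# Ensure disable() always runs, even if generation fails midway.
cache_helper.enable()
try:
    image = pipe(class_labels=class_ids, num_inference_steps=50,
                 generator=generator).images[0]
finally:
    cache_helper.disable()  # restores the original forward methods
```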

We've also provided a set of example caching schedules for 30/50/70 timesteps in the smoothcache_schedules folder of the repo.
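
The schedules are plain JSON. As a rough illustration of the expected shape (the component key "attn1" and the exact layout are assumptions here; the shipped files are authoritative), a 50-step schedule with one 0/1 flag per inference step might be built like this:

```python
import json

# Assumed shape: component name -> one 0/1 flag per inference step,
# where 1 = recompute the component and 0 = reuse its cached output.
# The key "attn1" and this layout are illustrative assumptions.
schedule = {"attn1": ([1, 0, 0] * 17)[:50]}  # e.g. recompute every 3rd of 50 steps
with open("my_schedule.json", "w") as f:
    json.dump(schedule, f)
print(len(schedule["attn1"]))  # must match num_inference_steps (here 50)
```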

If you wish to adapt the SmoothCacheHelper class to a new pipeline, you can follow the code example for DiffuserCacheHelper.
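
As a very rough sketch of what such an adaptation could look like (the SmoothCacheHelper import path and its constructor arguments below are assumptions modeled on the DiffuserCacheHelper pattern, and FluxCacheHelper is a hypothetical name):

```python
# Hypothetical sketch; the base-class signature (block_classes,
# components_to_wrap) is assumed here, not documented API. Check
# DiffuserCacheHelper in the SmoothCache repo for the real pattern.
from SmoothCache import SmoothCacheHelper  # assumed import path

class FluxCacheHelper(SmoothCacheHelper):
    def __init__(self, model, schedule):
        super().__init__(
            model=model,
            # the repeated transformer block class to search for in the model
            block_classes=type(model.transformer_blocks[0]),
            # named sub-modules of each block whose outputs get cached
            components_to_wrap=["attn"],
            schedule=schedule,
        )
```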

Let us know if this answers your questions.

ziyu-guo (Author) commented on Apr 15, 2025

A quick follow-up: we're working on extending SmoothCache to make it stackable on top of FluxPipeline. Please stay tuned.

github-actions (Contributor) commented on May 10, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
