How can we make diffusion models fast enough for real-time interactive applications?
Diffusion models and flow-based methods have revolutionized generative learning in the visual domain, setting new standards for image, video, and 3D content creation. However, as the field shifts toward interactive applications—such as real-time editing, world models, and embodied AI—the need for low-latency feedback has become critical. Currently, the high computational cost of iterative sampling hinders real-world deployment. While various acceleration techniques exist, the lack of a unified resource makes it difficult to bridge the gap between theory and practice.
To address this challenge, this tutorial offers a practice-oriented course designed to equip researchers and practitioners with the tools to accelerate diffusion pipelines, supported by the open-source FastGen library. The curriculum covers three primary areas: general sampling acceleration, training-based distillation for efficient few-step samplers, and applications in video and interactive world models.
| Time | Session | Speaker |
| --- | --- | --- |
| 9:00-9:50 am | General Paradigms to Accelerating Diffusion Models. Covers advanced differential equation solvers, low-dimensional latent diffusions, improved noising processes, and architecture-based accelerations. | Arash Vahdat |
| 10 min break | | |
| 10:00-10:50 am | Accelerating Diffusion Models with Step Distillation. Covers trajectory-based distillation approaches (such as knowledge distillation, consistency models, and flow maps) and distribution distillation methods (such as adversarial distillation and variational score distillation). | Julius Berner |
| 10 min break | | |
| 11:00-11:50 am | From Images to Interactive World Models. Covers key challenges in video-based interactive world models (such as real-time sampling, long-context memory, and block-wise causal generation) and representative approaches (such as CausVid, Self-Forcing, and APT2). | Weili Nie |