This academic paper introduces "Just image Transformers" (JiT), a novel approach to denoising diffusion models that advocates directly predicting clean data (**x-prediction**) rather than predicting noise or a noised quantity. The authors argue this shift is critical based on the **manifold assumption**, which posits that clean data lies on a low-dimensional manifold while noise is inherently off-manifold. Experiments, including a toy model and high-resolution ImageNet generation using plain Vision Transformers (ViT), demonstrate that x-prediction succeeds in high-dimensional spaces where conventional noise-predicting methods catastrophically fail. The work emphasizes a return to first principles: a self-contained **"Diffusion + Transformer"** paradigm on raw pixel data, without complex architectures, pre-training, or auxiliary losses. Finally, the paper provides extensive ablation studies on loss combinations and architectural components to validate that **x-prediction** is fundamentally more tractable for limited-capacity networks in high-dimensional generative modeling.
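To make the x-prediction vs. noise-prediction distinction concrete, here is a minimal NumPy sketch under an assumed linear interpolation schedule x_t = (1 - t)·x + t·ε (the paper's exact schedule and training setup may differ; function names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed schedule: linearly interpolate between clean data x and noise eps.
def add_noise(x, eps, t):
    return (1.0 - t) * x + t * eps

# x-prediction objective: regress the clean sample itself.
def x_pred_loss(x_hat, x):
    return np.mean((x_hat - x) ** 2)

# noise-prediction objective: regress the added noise.
def eps_pred_loss(eps_hat, eps):
    return np.mean((eps_hat - eps) ** 2)

# The two parameterizations are algebraically interchangeable:
# from x_t = (1 - t) * x + t * eps  =>  eps = (x_t - (1 - t) * x) / t
def implied_eps(x_t, x_hat, t):
    return (x_t - (1.0 - t) * x_hat) / t

x = rng.standard_normal(16)    # stand-in for a flattened clean image
eps = rng.standard_normal(16)  # Gaussian noise
t = 0.7
x_t = add_noise(x, eps, t)

# A perfect x-prediction implies the exact noise that was added.
assert np.allclose(implied_eps(x_t, x, t), eps)
```

Although the two targets are interchangeable in closed form, the paper's argument is that a limited-capacity network finds the on-manifold target (x) far easier to represent than the off-manifold one (ε), and the gap widens with dimensionality.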
Info
- Program
- Frequency: weekly updates
- Published: November 23, 2025, 4:45 AM UTC
- Length: 15 minutes
- Rating: all ages
