JanusFlow: Unified Multimodal Understanding and Generation via Autoregressive Flow

Suraj Khanal

JanusFlow unifies image understanding and image generation in a single model by combining rectified flow with an autoregressive language model. It uses separate (decoupled) encoders for the understanding and generation tasks and aligns their representations during training. Training proceeds in three stages: adaptation, pre-training, and fine-tuning. The resulting model achieves strong performance on both tasks with 1.3B parameters.
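
To make the rectified-flow part concrete, here is a minimal sketch of the general technique, not JanusFlow's actual implementation. It assumes the common convention that t = 0 is pure noise and t = 1 is data, that training points lie on the straight line between a noise sample and a data sample, and that sampling integrates a velocity field with Euler steps. The function names (interpolate, target_velocity, euler_sample) are hypothetical, and a fixed "ideal" velocity stands in for the learned network.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity along that path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: a 4-dimensional "image" and a Gaussian noise start point.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
data = [0.5, -1.0, 2.0, 0.0]

# Stand-in for a trained network (assumption): the exact velocity for this pair.
ideal = target_velocity(noise, data)
sample = euler_sample(lambda x, t: ideal, noise, num_steps=10)
```

In a real system the lambda would be replaced by a network conditioned on the current state, the time step, and the text prompt. Because the paths are straight, a small number of Euler steps can already give a usable sample, which is the usual motivation for rectified flow.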
