This paper argues that thinking language models (LLMs that reason step by step) do not acquire entirely new capabilities during post-training; instead, they learn when to deploy reasoning mechanisms already latent in their base counterparts. The authors use unsupervised clustering of Sparse Autoencoder (SAE) features to derive an interpretable taxonomy of distinct reasoning behaviors, such as numeric computation and planning next steps. They then build a hybrid model in which the base model generates text while steering vectors, derived from the thinking model's activation patterns, activate specific reasoning behaviors at selected tokens. This hybrid recovers up to 91% of the performance gap between base and thinking models on reasoning benchmarks such as MATH500 while steering only a small fraction of tokens, supporting the claim that the main benefit of reasoning post-training is teaching the model when to deploy mechanisms it already has.
Information
- Show
- Frequency: Weekly
- Published: 11 October 2025 at 18:38 UTC
- Duration: 12 min
- Rating: All audiences