This paper argues that thinking language models (LLMs that reason step-by-step) do not acquire entirely new capabilities during post-training; instead, they learn when to deploy reasoning mechanisms already latent in their base counterparts. The authors use an unsupervised clustering methodology based on Sparse Autoencoders (SAEs) to derive an interpretable taxonomy of distinct reasoning behaviors, such as numeric computation and planning next steps. They then implement a hybrid model in which the base model generates text while steering vectors, derived from the thinking model's activation patterns, trigger specific reasoning behaviors at selected positions. This hybrid approach recovers up to 91% of the performance gap between base and thinking models on reasoning benchmarks such as MATH500 while steering only a small fraction of tokens, supporting the idea that the primary benefit of reasoning-focused post-training is teaching the model when to deploy these pre-existing mechanisms.
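To make the steering-vector idea concrete, below is a minimal sketch of activation steering with a Hugging Face causal LM. It builds a steering vector as the difference of mean residual-stream activations between contrastive prompts (a simpler stand-in for the paper's SAE-derived behavior directions) and adds it to one layer's output during generation. The model name (`gpt2`), layer index, steering strength, and example prompts are illustrative assumptions, not values from the paper, and this sketch steers every token rather than the small fraction the authors target.

```python
# Sketch of activation steering via a mean-difference contrast vector.
# All constants below are assumptions for illustration, not paper values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # stand-in for the base model (assumption)
LAYER_IDX = 6         # residual-stream layer to steer (assumption)
ALPHA = 4.0           # steering strength (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def mean_activation(texts, layer_idx):
    """Average hidden state at `layer_idx` over all tokens of `texts`."""
    acts = []
    for text in texts:
        ids = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        # hidden_states[layer_idx] has shape (1, seq_len, hidden_dim)
        acts.append(out.hidden_states[layer_idx].mean(dim=1))
    return torch.cat(acts).mean(dim=0)

# Placeholder contrast prompts for one reasoning behavior
# (numeric computation) versus neutral text.
with_behavior = ["Let me compute 17 * 24 step by step.",
                 "First, add 48 and 53, then divide the sum by 2."]
without_behavior = ["The weather today is pleasant and mild.",
                    "She walked her dog through the quiet park."]

steer_vec = (mean_activation(with_behavior, LAYER_IDX)
             - mean_activation(without_behavior, LAYER_IDX))

def steering_hook(module, inputs, output):
    """Add the steering vector to this block's output hidden states."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer_vec.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# GPT-2 exposes its transformer blocks as model.transformer.h;
# other architectures name this module differently.
handle = model.transformer.h[LAYER_IDX].register_forward_hook(steering_hook)

prompt = "Question: what is 12 * 13? Answer:"
ids = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**ids, max_new_tokens=30)
print(tokenizer.decode(generated[0], skip_special_tokens=True))

handle.remove()  # stop steering
```

Steering via a forward hook leaves the base model's weights untouched, which mirrors the paper's framing: the reasoning mechanism is already present, and the added vector only shifts when it gets deployed.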