This research explores the mechanisms of in-context learning (ICL) in Large Language Models, proposing that transformers learn by implicitly updating their internal weights during inference. The authors demonstrate that a transformer block effectively converts prompt examples into a rank-1 weight update of the model's MLP layer. This process allows the model to adapt to new patterns without permanently changing its weights, mathematically mirroring stochastic gradient descent as tokens are processed. Theoretical formulas are provided that compute these context-driven weight updates exactly, showing that MLP layers are naturally structured to absorb and store contextual information. Experimental results on linear regression tasks confirm that modifying model weights using these formulas produces predictions identical to those obtained from the original in-context prompt. The study ultimately unifies ICL with model editing and steering vectors, offering a principled framework for understanding how LLMs dynamically reorganize their internal representations.
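The core equivalence can be illustrated with a minimal numerical sketch. Assuming a simplified block in which the attention contribution `a` from the context enters the MLP additively through the residual stream (the names `W`, `x`, and `a` here are illustrative, not taken from the paper), the context's effect on a linear layer can be absorbed into a rank-1 update of the weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # first linear layer of the MLP (simplified)
x = rng.normal(size=d)        # query-token activation entering the MLP
a = rng.normal(size=d)        # attention contribution computed from the context

# With the context present, the MLP sees the shifted input x + a.
with_context = W @ (x + a)

# Absorb the context into a rank-1 weight update:
# delta_W = (W a) x^T / ||x||^2, so (W + delta_W) x = W x + W a = W (x + a).
delta_W = np.outer(W @ a, x) / (x @ x)
without_context = (W + delta_W) @ x

# The edited weights reproduce the in-context prediction exactly.
print(np.allclose(with_context, without_context))  # True
```

This only demonstrates the linear-algebra identity behind the claim; the paper's full construction covers the complete transformer block rather than a single linear map.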
Information
- Podcast
- Frequency: Daily
- Published: March 7, 2026 at 22:46 UTC
- Duration: 24 min
- Rating: All ages
