This research paper demonstrates that Multi-Layer Perceptrons (MLPs) can perform In-Context Learning (ICL), an ability often attributed exclusively to Transformer models. The researchers show that MLPs, along with the related MLP-Mixer models, match Transformer performance on synthetic ICL tasks involving regression and classification. Furthermore, in experiments on relational reasoning, a task family closely tied to ICL classification, MLPs surprisingly outperform Transformers in both compute efficiency and generalization. These findings suggest that ICL does not depend solely on attention-based architectures, and they challenge earlier assumptions about the limitations of simple neural networks such as MLPs on relational tasks. The study encourages further exploration of non-Transformer architectures to better understand the mechanisms underlying ICL. A minimal sketch of the kind of setup discussed follows.
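To make the claim concrete, here is a minimal sketch (not the authors' code) of how an in-context regression task can be posed to a plain MLP: each example consists of several (x, y) pairs drawn from a fresh random linear function plus a query x, all flattened into one input vector. The function names, network sizes, and hyperparameters are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

def make_icl_regression_batch(batch_size=64, n_context=8, dim=4):
    # Each task has its own random weight vector w, so the model can only
    # predict the query's target by reading the in-context pairs.
    w = torch.randn(batch_size, dim, 1)
    xs = torch.randn(batch_size, n_context, dim)
    ys = xs @ w                                   # (batch, n_context, 1)
    x_query = torch.randn(batch_size, dim)
    y_query = (x_query.unsqueeze(1) @ w).squeeze(-1).squeeze(-1)
    # Flatten the context pairs and append the query into a single vector.
    context = torch.cat([xs, ys], dim=-1).flatten(start_dim=1)
    inputs = torch.cat([context, x_query], dim=-1)
    return inputs, y_query

# A plain MLP over the flattened context; input size = 8 * (4 + 1) + 4.
mlp = nn.Sequential(
    nn.Linear(8 * (4 + 1) + 4, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for step in range(2000):
    inputs, targets = make_icl_regression_batch()
    loss = nn.functional.mse_loss(mlp(inputs).squeeze(-1), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: mse {loss.item():.3f}")
```

Because every batch samples a new random function, the loss can only fall if the network learns to infer the function from the context at inference time, which is exactly the in-context learning behavior the episode describes.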
Information
- Show
- Frequency: Updated weekly
- Published: October 11, 2025, 4:39 AM UTC
- Length: 16 min
- Rating: All ages