This research paper introduces sparse memory finetuning, a novel approach to catastrophic forgetting in large language models (LLMs) during continual learning. The method builds on memory layer models, which are designed for sparse updates, and selectively trains only the memory slots that are highly activated by new knowledge relative to existing knowledge, ranked with a TF-IDF-style score. The authors show that this technique matches full finetuning and LoRA in new knowledge acquisition while degrading previously acquired capabilities substantially less on held-out question-answering benchmarks. The results suggest that exploiting sparsity in memory layers is a promising strategy for enabling LLMs to continually accumulate knowledge over time.
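
The slot-selection idea can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch illustration, assuming per-slot access counts on the new data and on background data are already available; the function names, the smoothing in the score, the number of trainable slots, and the gradient-masking approach are assumptions for exposition, not the paper's exact implementation.

```python
# Minimal sketch, assuming PyTorch and precomputed per-slot access counts.
# All names and constants here are illustrative, not the paper's code.
import torch


def select_trainable_slots(
    new_data_counts: torch.Tensor,    # [num_slots] retrievals on the new knowledge
    background_counts: torch.Tensor,  # [num_slots] retrievals on background data
    k: int = 500,                     # assumed number of slots left trainable
) -> torch.Tensor:
    """Rank memory slots with a TF-IDF-style score and return the top-k indices."""
    tf = new_data_counts.float()                                # usage on new data
    idf = torch.log1p(1.0 / (background_counts.float() + 1.0))  # down-weight slots used broadly elsewhere
    scores = tf * idf
    return torch.topk(scores, k=min(k, scores.numel())).indices


def mask_memory_gradients(memory_values: torch.nn.Parameter, trainable_idx: torch.Tensor) -> None:
    """Zero the gradient of every non-selected slot so only chosen slots are updated."""
    if memory_values.grad is None:
        return
    keep = torch.zeros(memory_values.shape[0], dtype=torch.bool, device=memory_values.device)
    keep[trainable_idx] = True
    memory_values.grad[~keep] = 0.0
```

Calling `mask_memory_gradients` between `loss.backward()` and `optimizer.step()` would restrict updates to the selected slots; the paper's actual mechanism for restricting updates may differ.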
Information
- Program
- Frequency: Updated weekly
- Published: October 26, 2025, 11:17 AM UTC
- Length: 14 minutes
- Rating: All ages
