Infini-Attention: Google's Solution for Infinite Memory in LLMs

The AI Paper Club

In this episode of the AI Paper Club Podcast, hosts Rafael Herrera and Sonia Marques welcome Leticia Fernandes, Senior Data Scientist and Generative AI Ambassador at Deeper Insights. Together, they explore the groundbreaking "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" paper from Google, which tackles the challenge of giving large language models effectively unbounded context. The paper introduces the Infini-attention method, and the trio discusses how it works: it combines standard local attention with linear attention over a compressive memory that stores key-value pairs from earlier segments, letting models handle extremely long inputs with a bounded memory footprint.
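
For listeners who want a concrete picture of the mechanism discussed in the episode, here is a minimal, single-head NumPy sketch of Infini-attention's compressive memory, following the update and retrieval rules described in the paper: the memory accumulates M <- M + sigma(K)^T V with sigma = ELU + 1, the readout is sigma(Q) M normalized by a running sum z, and a sigmoid gate blends the memory readout with ordinary local attention. The function names, shapes, and scalar gate here are our simplifications for illustration, not the authors' implementation.

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity used for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(Q, K, V, M, z, beta=0.0):
    """Process one segment of shape (n, d): retrieve from the compressive
    memory, update it with this segment's keys/values, and blend the
    memory readout with local softmax attention."""
    sQ, sK = elu_plus_one(Q), elu_plus_one(K)

    # Retrieval from memory built over previous segments:
    # A_mem = sigma(Q) M / (sigma(Q) z)
    A_mem = (sQ @ M) / (sQ @ z + 1e-8)

    # Memory update with this segment's key-value pairs:
    # M <- M + sigma(K)^T V ;  z <- z + sum_t sigma(K_t)
    M = M + sK.T @ V
    z = z + sK.sum(axis=0, keepdims=True).T

    # Ordinary (local) dot-product attention within the segment
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A_local = (weights / weights.sum(axis=-1, keepdims=True)) @ V

    # A sigmoid gate mixes the memory readout and local attention
    # (the paper learns this gate per head; a fixed scalar is used here)
    g = 1.0 / (1.0 + np.exp(-beta))
    return g * A_mem + (1.0 - g) * A_local, M, z

# Stream three segments through a fixed-size memory
d, n = 8, 4
M, z = np.zeros((d, d)), np.zeros((d, 1))
rng = np.random.default_rng(0)
for _ in range(3):
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    out, M, z = infini_attention_segment(Q, K, V, M, z)
```

Because M and z have a fixed size no matter how many segments stream through, the memory cost stays constant as the context grows, which is the core of the paper's "infinite context" claim.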

We also extend a special thank you to the research team at Google for developing this month's paper. If you are interested in reading the paper for yourself, please check this link: https://arxiv.org/pdf/2404.07143.pdf

For more information on all things artificial intelligence, machine learning, and engineering for your business, please visit www.deeperinsights.com or reach out to us at thepaperclub@deeperinsights.com.

