Life with AI Filipe Lauar
-
- Technology
In this podcast I explain some hard concepts of AI in a way that anyone can understand. I also show how AI influences our lives in ways we may not even notice.
-
#83- LLM copilot for enterprise.
Hey folks, in this episode I talk with João Batista, Technical Product Manager at Stackspot AI. In the episode we talked a lot about using LLMs as a copilot that answers questions from the company's own documents using RAG.
Hey guys, in the Brazilian version of the podcast I talked with Joao from Stackspot AI. In the episode I discuss how they are building enterprise copilot assistants using RAG.
In the episode we cover both technical and product aspects, like similarity metrics, how many documents to retrieve, how to present the answer to the user, and how to measure the quality of the answers...
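One of the similarity metrics commonly used in RAG retrieval is cosine similarity between the query embedding and each document embedding. A minimal sketch (the function names and toy embeddings are my own illustration, not Stackspot's implementation):

```python
import numpy as np

def cosine_similarity(a, b):
    # similarity metric commonly used to match a query embedding
    # against document embeddings in RAG
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def top_k_documents(query_emb, doc_embs, k=3):
    # rank documents by similarity to the query and return the
    # indices of the k closest ones to feed into the LLM prompt
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)[:k]
```

How many of those top documents to actually include in the prompt is one of the product trade-offs discussed in the episode.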
Joao's Linkedin: https://www.linkedin.com/in/joaobatista-cordeironeto/
Stackspot AI's Linkedin: https://www.linkedin.com/company/stackspot/
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
-
#82- BitNet, 1 bit Transformers.
Hey guys, in this episode I talk about two papers, BitNet and the 1.58-bit Transformer. These two papers from Microsoft present a new recipe for training 1-bit transformers, hugely improving memory and energy consumption while also lowering inference times.
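The core trick in the 1.58-bit paper is quantizing each weight to one of three values, {-1, 0, +1}, using an absmean scale. A minimal sketch of that quantization step (simplified; the real training recipe keeps full-precision latent weights and quantizes on the fly):

```python
import numpy as np

def quantize_weights_ternary(W, eps=1e-5):
    # absmean scaling as in the 1.58-bit paper: divide by the mean
    # absolute weight, then round every entry to the nearest value
    # in {-1, 0, +1}
    gamma = np.mean(np.abs(W)) + eps
    W_q = np.clip(np.round(W / gamma), -1, 1)
    return W_q, gamma
```

With ternary weights, matrix multiplications reduce to additions and subtractions, which is where the memory and energy savings come from.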
BitNet paper: https://arxiv.org/pdf/2310.11453
1.58 bit paper: https://arxiv.org/pdf/2402.17764
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
-
-
#80- Layer pruning and Mixture of Depths.
Hey guys, continuing the series of episodes about PEFT, in this episode I talk about inference optimization techniques for LLMs.
I talk about layer pruning, where we prune consecutive layers of the LLM with almost no loss in model performance.
I also talk about Mixture of Depths, a technique similar to Mixture of Experts, where a router chooses which tokens will be processed by each layer of the LLM.
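The Mixture of Depths routing can be sketched roughly like this: a learned router scores each token, only the top-scoring tokens (up to a fixed capacity) go through the layer, and the rest skip it via the residual path. A toy illustration (function names and the linear router are my own simplification of the paper):

```python
import numpy as np

def mixture_of_depths_layer(tokens, router_weights, capacity, layer_fn):
    # score every token with a simple linear router
    scores = tokens @ router_weights
    # only the top-`capacity` tokens are processed by this layer;
    # all other tokens pass through unchanged (residual skip)
    keep = np.argsort(scores)[-capacity:]
    out = tokens.copy()
    out[keep] = layer_fn(tokens[keep])
    return out
```

Because only a fixed fraction of tokens hits each layer, the compute per forward pass drops roughly in proportion to the capacity.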
Paper MoD: https://arxiv.org/pdf/2404.02258.pdf
Paper layer pruning: https://arxiv.org/pdf/2403.17887v1.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
-
#79- LoRA and QLoRA.
Hey guys, this is the first in a series of episodes about PEFT, Parameter-Efficient Fine-Tuning. In this episode I talk about LoRA and QLoRA, two widely used methods that let us fine-tune LLMs much faster, on a single GPU, without losing performance.
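The LoRA idea in one line: keep the pretrained weight matrix W frozen and learn only a low-rank update B @ A, scaled by alpha / r as in the paper. A minimal forward-pass sketch (variable names follow the paper; this is an illustration, not a library implementation):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    # W: frozen pretrained weight, shape (out_dim, in_dim)
    # A: trainable, shape (r, in_dim); B: trainable, shape (out_dim, r)
    # only A and B are updated during fine-tuning, so the number of
    # trainable parameters is tiny compared to W
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```

B is initialized to zeros, so at the start of fine-tuning the model behaves exactly like the pretrained one. QLoRA adds 4-bit quantization of the frozen W on top of this, which is what makes single-GPU fine-tuning of large models practical.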
Video about QLoRA: https://www.youtube.com/watch?v=6l8GZDPbFn8
LoRA paper: https://arxiv.org/pdf/2106.09685.pdf
QLoRA paper: https://arxiv.org/pdf/2305.14314.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
-
#78- RAFT: Why just use RAG when you can also fine-tune?
Hello, in this episode I talk about Retrieval Augmented Fine Tuning (RAFT), a paper that proposes a new technique combining domain-specific fine-tuning with RAG to improve the retrieval capabilities of LLMs.
In the episode I also talk about another paper that is also called RAFT, this time Reward rAnked FineTuning, which proposes a new way to align LLMs with human preferences without the convergence problems of reinforcement learning.
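The core loop of Reward rAnked FineTuning can be sketched simply: sample several responses per prompt, score them with the reward model, and keep only the best-scoring one as a supervised fine-tuning target, so no reinforcement learning is needed. A toy sketch (the `generate` and `reward` callables are placeholders, not a real model API):

```python
def reward_ranked_selection(prompts, generate, reward, k=4):
    # for each prompt, sample k candidate responses and keep the
    # highest-reward one; the resulting (prompt, response) pairs are
    # then used for plain supervised fine-tuning
    data = []
    for p in prompts:
        candidates = [generate(p) for _ in range(k)]
        best = max(candidates, key=lambda c: reward(p, c))
        data.append((p, best))
    return data
```

Because the update is just supervised learning on the selected pairs, training is far more stable than PPO-style reinforcement learning.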
Retrieval Augmented Fine Tuning: https://arxiv.org/abs/2403.10131v1
Reward rAnked FineTuning: https://arxiv.org/pdf/2304.06767.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai