33 min

Arash Ahmadian on Rethinking RLHF TalkRL: The Reinforcement Learning Podcast

    • Technology

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References
Self-Rewarding Language Models, Yuan et al 2024 Reinforcement Learning: An Introduction, Sutton and Barto 1992Learning from Delayed Rewards, Chris Watkins 1989Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References
Self-Rewarding Language Models, Yuan et al 2024 Reinforcement Learning: An Introduction, Sutton and Barto 1992Learning from Delayed Rewards, Chris Watkins 1989Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

33 min

Top Podcasts In Technology

Lex Fridman Podcast
Lex Fridman
Acquired
Ben Gilbert and David Rosenthal
Waveform: The MKBHD Podcast
Vox Media Podcast Network
All-In with Chamath, Jason, Sacks & Friedberg
All-In Podcast, LLC
Lenny's Podcast: Product | Growth | Career
Lenny Rachitsky
The TED AI Show
TED