33 min

Arash Ahmadian on Rethinking RLHF
TalkRL: The Reinforcement Learning Podcast


Arash Ahmadian is a researcher at Cohere and Cohere For AI, focused on preference training of large language models. He’s also a researcher at the Vector Institute of AI.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References
Self-Rewarding Language Models, Yuan et al. 2024
Reinforcement Learning: An Introduction, Sutton and Barto 1992
Learning from Delayed Rewards, Chris Watkins 1989
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

