46 min

Natasha Jaques 2 TalkRL: The Reinforcement Learning Podcast


Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more! 
Dr Natasha Jaques is a Senior Research Scientist at Google Brain.
Featured References
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control, Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning, Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience Marwa Abdulhai, Natasha Jaques, Sergey Levine 
Additional References
Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al., 2019
Learning to summarize from human feedback, Nisan Stiennon et al., 2020
Training language models to follow instructions with human feedback, Long Ouyang et al., 2022

