1 hr 26 min

Sam Bowman on benchmarking and AI alignment
CS224U

Lessons learned about benchmarking, adversarial testing, the dangers of over- and under-claiming, and AI alignment.

Transcript: https://web.stanford.edu/class/cs224u/podcast/bowman/


Sam's website
Sam on Twitter
NYU Linguistics
NYU Data Science
NYU Computer Science
Anthropic
SNLI paper: A large annotated corpus for learning natural language inference
SNLI leaderboard
FraCaS
SICK
A SICK cure for the evaluation of compositional distributional semantic models
SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
RTE Knowledge Resources
Richard Socher
Chris Manning
Andrew Ng
Ray Kurzweil
SQuAD
Gabor Angeli
Adina Williams
Adina Williams podcast episode
MultiNLI paper: A broad-coverage challenge corpus for sentence understanding through inference
MultiNLI leaderboards
Twitter discussion of LLMs and negation
GLUE
SuperGLUE
DecaNLP
GPT-3 paper: Language Models are Few-Shot Learners
FLAN
Winograd schema challenges
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
JSALT: General-Purpose Sentence Representation Learning
Ellie Pavlick
Ellie Pavlick podcast episode
Tal Linzen
Ian Tenney
Dipanjan Das
Yoav Goldberg
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
BIG-bench
Upwork
Surge AI
Dynabench
Douwe Kiela
Douwe Kiela podcast episode
Ethan Perez
NYU Alignment Research Group
Eliezer Shlomo Yudkowsky
Alignment Research Center
Redwood Research
Percy Liang podcast episode
Richard Socher podcast episode
