Benchtalks

Snorkel AI

Science

Benchtalks is Snorkel AI's podcast series at the intersection of AI evaluation, data quality, and real-world impact. Hosted by the Snorkel team, each episode brings together researchers, practitioners, and leaders to dig into the questions that matter most as AI benchmarks grow more sophisticated, more dynamic, and more reflective of the complexity of real-world deployments. We explore the full stack of what it takes to build AI that actually works, from the design of rigorous, open benchmarks that close the gap between what we measure and what we encounter in production, to the expert-in-the-loop data creation and curation pipelines that make reliable evaluation possible. Along the way, we get into reinforcement learning, reward modeling, and the evolving science of data quality that underpins it all. Whether you're building agents that operate over long horizons, crafting rubrics that go beyond pass/fail, or trying to understand what "good" looks like for a multi-artifact deliverable, this is the conversation for you. New episodes drop regularly on YouTube and wherever you get your podcasts. Follow us @SnorkelAI (LinkedIn, X, YouTube) to keep up as the field moves fast.

3 Episodes

Creator: Snorkel AI
Years Active: 2K
Episodes: 3
Rating: Clean
Show Website: Benchtalks

News Commentary

Updated Weekly

Episodes

Benchtalks #3: Parth Asawa (Continual Learning Bench) - We Taught AI Everything Except How to Learn

Benchtalks #2: John Yang (SWE-bench, ProgramBench) - The future of coding benchmarks

Benchtalks #1: Alex Shaw (Terminal-Bench, Harbor) - Building the benchmark factory
