The Test Set by Posit

Posit, PBC

A Posit podcast for data science junkies, anomaly hunters, and those who play outside the confidence interval. Hosted by Michael Chow, with co-hosts Wes McKinney & Hadley Wickham.

  1. JAN 26

    Emily Riederer: Column selectors, data quality, and learning in public

    Emily Riederer writes Python with an R accent, and we’re all comfortable with it. In this episode, Emily reflects on her journey through R, Python, and SQL, from lessons learned in averaging default values (oops, we're not all rich!) to discovering that column selectors are way cooler than they sound. She weighs in on the delicate art of learning in public, why frustration often makes the best teacher, and how to find your niche by solving the boring problems. Oh, and the crew casually drops that she's keynoting posit::conf 2026!

    Episode Notes

    Emily’s had a wild ride through modeling, data engineering, machine learning, and back again, and she knows a thing or three about the evolution of SQL tooling (from nightmare multi-page scripts to the dbt renaissance). She reveals how building internal packages became her gateway to making work enjoyable. Plus: the surprising Stata origins of column selectors, the eternal struggle of naming packages across R and Python, and why watching people code teaches you more than any tutorial ever could. The conversation gets real about imposter syndrome and the magic of tacit knowledge.

    What’s Inside

      - Why real-world data is chaos, not truth
      - The path from modeling to data engineering (and back)
      - What a data pipeline really is (extract, load, transform) and why organization matters
      - How dbt changed the SQL game
      - Learning by watching: tacit knowledge and coding over the shoulder
      - Imposter syndrome and learning in public
      - Building internal tools to escape busywork
      - posit::conf 2026 keynote preview

    58 min
  2. 12/15/2025

    Marco Gorelli: Narwhals, ecosystem glue, and the value of boring work

    You’ve probably used Narwhals without realizing it. It’s the compatibility layer helping apps and libraries like Plotly play nice with Pandas, Polars, Arrow, and more, while keeping computation native instead of converting everything to Pandas. In this episode, Marco Gorelli explains how his weekend experiment turned into essential ecosystem infrastructure and why data types, not APIs, are where interoperability gets tricky. Plus what it takes to build trust and community around an open-source project.

    Episode Notes

    Marco shares the Narwhals origin story (including the meme-powered name), the hard edge cases that live in data types and null semantics, and why he’s cautious about using AI for code generation when correctness hinges on tiny details. We also jam on proactive “GitHub surfing,” conference talks as trust-building exercises, celebrating contributors, and how early commit messages capture the genuine excitement of building something new.

    What’s Inside

      - Narwhals 101: You’ve probably used it (even if you didn’t know it)
      - The real interoperability traps: data types, null semantics, and “looks-the-same” operations
      - Why expression systems won, and how they shaped Marco’s approach, with nods to Ibis, Polars, and Pandas
      - Open source as social work: proactive outreach, trust-building, and a Discord-powered community
      - Extending Narwhals to new engines, starting with the Daft plugin

    52 min

Ratings & Reviews

5 out of 5 (25 Ratings)

