291 episodes

In each episode, your hosts explore machine learning and data science through interesting (and often very unusual) applications.

Linear Digressions
Ben Jaffe and Katie Malone

    • Technology
    • 4.7 • 19 Ratings

    So long, and thanks for all the fish

    All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science—it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge part of our lives for over 5 years.

    It’s been a ride, and a real pleasure and privilege to talk to you each week. Thanks, best wishes, and good night!

    —Katie and Ben

    • 35 min
    A Reality Check on AI-Driven Medical Assistants

    The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver cancer, and asks the question—are patients now getting better care, and achieving better outcomes, with these algorithms in the mix? The answer isn’t no, exactly, but it’s not a resounding yes, because these algorithms interact with a very complex system (the healthcare system) and other shortcomings of that system are proving hard to automate away. Getting a faster diagnosis from an image might not be an improvement if the image is now harder to capture (because of strict data quality requirements associated with the algorithm that wouldn’t stop a human doing the same job). Likewise, an algorithm getting a prediction mostly correct might not be an overall benefit if it introduces more dramatic failures when the prediction happens to be wrong. For every data scientist whose work is deployed into some kind of product, and is being used to solve real-world problems, these papers underscore how important and difficult it is to consider all the context around those problems.

    • 14 min
    A Data Science Take on Open Policing Data

    A few weeks ago, we put out a call for data scientists interested in issues of race and racism, or people studying those topics with data science methods, to get in touch and come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, a Bay Area data scientist and volunteer who reached out to tell us about his studies with the Stanford Open Policing dataset.

    • 23 min
    Procella: YouTube's super-system for analytics data storage

    This is a re-release of an episode that originally ran in October 2019.

    If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider building a custom solution for each use case, optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found itself with this problem (gigantic data needs and several very different things it needed to do with that data) and went a different way: they built one analytics data system to serve them all. Procella, the system they built, is the topic of our episode today. By deconstructing the system, we dig into its four motivating uses, the complexity they had to introduce to service all four simultaneously, and the impressive engineering that has to go into building something that “just works.”

    • 29 min
    The Data Science Open Source Ecosystem

    Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom do this maintenance on a purely volunteer basis. The health of the data science ecosystem depends on the support of open source projects, on an individual and institutional level.

    https://hdsr.mitpress.mit.edu/pub/xsrt4zs2/release/2

    • 23 min
    Rock the ROC Curve

    This is a re-release of an episode that first ran on January 29, 2017.

    This week: everybody's favorite WWII-era classifier metric! But it's not just for winning wars: it's a fantastic go-to metric for all your classifier quality needs.

    • 15 min

Customer Reviews

4.7 out of 5
19 Ratings

Kanikanfly

Mind = Blown

Great podcast, explains Machine Learning in an extremely accessible and entertaining manner. I'm hooked!
