25 episodes

A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.

Vanishing Gradients Hugo Bowne-Anderson

    • Technology
    • 5.0 • 10 Ratings


    Episode 25: Fully Reproducible ML & AI Workflows


    Hugo speaks with Omoju Miller, a machine learning guru and founder and CEO of Fimio, where she is building 21st century dev tooling. In the past, she was Technical Advisor to the CEO at GitHub, spent time co-leading non-profit investment in Computer Science Education for Google, and served as a volunteer advisor to the Obama administration’s White House Presidential Innovation Fellows.


    We need open tools, open data, provenance, and the ability to build fully reproducible, transparent machine learning workflows. With the advent of closed-source, vendor-based APIs and compute becoming a form of gate-keeping, developer tools are at risk of becoming commoditized and developers of becoming mere consumers.


    We’ll talk about ideas for escaping these burgeoning walled gardens. We’ll dive into



    What fully reproducible ML workflows would look like, including git for the workflow build process,
    The need for loosely coupled and composable tools that embrace a UNIX-like philosophy,
    What a much more scientific toolchain would look like,
    What a future open-source commons for Generative AI could look like,
    What an open compute ecosystem could look like,
    How to create LLMs and tooling so everyone can use them to build production-ready apps,


    And much more!


    LINKS



    The livestream on YouTube
    Omoju on Twitter
    Hugo on Twitter
    Vanishing Gradients on Twitter
    Lu.ma Calendar that includes details of Hugo's European Tour for Outerbounds
    Blog post that includes details of Hugo's European Tour for Outerbounds

    • 1 hr 20 min
    Episode 24: LLM and GenAI Accessibility


    Hugo speaks with Johno Whitaker, a Data Scientist/AI Researcher doing R&D with answer.ai. His current focus is on generative AI, flitting between different modalities. He also likes teaching and making courses, having worked with both Hugging Face and fast.ai in these capacities.


    Johno recently reminded Hugo how hard everything was 10 years ago: “Want to install TensorFlow? Good luck. Need data? Perhaps try ImageNet. But now you can use big models from Hugging Face with hi-res satellite data and do all of this in a Colab notebook. Or think ecology and vision models… or medicine and multimodal models!”


    We talk about where we’ve come from regarding tooling and accessibility for foundation models, ML, and AI, where we are, and where we’re going. We’ll delve into



    What the Generative AI mindset is, in terms of using atomic building blocks, and how it evolved from both the data science and ML mindsets;
    How fast.ai democratized access to deep learning, what successes they had, and what was learned;
    The moving parts now required to make GenAI and ML as accessible as possible;
    The importance of focusing on UX and the application in the world of generative AI and foundation models;
    The skillset and toolkit needed to be an LLM and AI guru;
    What they’re up to at answer.ai to democratize LLMs and foundation models.


    LINKS



    The livestream on YouTube
    Zindi, the largest professional network for data scientists in Africa
    A new old kind of R&D lab: Announcing Answer.AI
    Why and how I’m shifting focus to LLMs by Johno Whitaker
    Applying AI to Immune Cell Networks by Rachel Thomas
    Replicate -- a cool place to explore GenAI models, among other things
    Hands-On Generative AI with Transformers and Diffusion Models
    Johno on Twitter
    Hugo on Twitter
    Vanishing Gradients on Twitter
    SciPy 2024 CFP
    Escaping Generative AI Walled Gardens with Omoju Miller, a Vanishing Gradients Livestream

    • 1 hr 30 min
    Episode 23: Statistical and Algorithmic Thinking in the AI Age


    Hugo speaks with Allen Downey, a curriculum designer at Brilliant, Professor Emeritus at Olin College, and the author of Think Python, Think Bayes, Think Stats, and other computer science and data science books. In 2019-20 he was a Visiting Professor at Harvard University. He previously taught at Wellesley College and Colby College and was a Visiting Scientist at Google. He is also the author of the upcoming book Probably Overthinking It!


    They discuss Allen's new book and the key statistical and data skills we all need to navigate an increasingly data-driven and algorithmic world. The goal was to dive deep into the statistical paradoxes and fallacies that get in the way of using data to make informed decisions.


    For example, when it was reported in 2021 that “in the United Kingdom, 70-plus percent of the people who die now from COVID are fully vaccinated,” this was correct but the implication was entirely wrong. Their conversation jumps into many such concrete examples to get to the bottom of using data for more than “lies, damned lies, and statistics.” They cover



    Information and misinformation around pandemics and the base rate fallacy;
    The tools we need to comprehend the small probabilities of high-risk events such as stock market crashes, earthquakes, and more;
    The many definitions of algorithmic fairness, why they can't all be met at once, and what we can do about it;
    Public health, the need for robust causal inference, and variations on Berkson’s paradox, such as the low-birthweight paradox: an influential paper found that, among low-birthweight babies, the mortality rate is lower for children of smokers;
    Why none of us are normal in any sense of the word, both in physical and psychological measurements;
    The inspection paradox, which shows up in the criminal justice system and distorts our perception of prison sentences and the risk of repeat offenders.
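
    The vaccination statistic quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of the base rate fallacy: the vaccination rate, baseline risk, and vaccine efficacy are hypothetical numbers chosen for clarity, not real UK figures, and the function name is made up for this example.

```python
# Base rate fallacy sketch. All inputs are HYPOTHETICAL illustration values,
# not real UK data.

def vaccinated_share_of_deaths(vaccination_rate, risk_unvaccinated, vaccine_efficacy):
    """Return the fraction of all deaths that occur among vaccinated people."""
    # Efficacy is modeled here as a relative reduction in the risk of death.
    risk_vaccinated = risk_unvaccinated * (1 - vaccine_efficacy)
    # Deaths per person in the population, split by group.
    deaths_vaccinated = vaccination_rate * risk_vaccinated
    deaths_unvaccinated = (1 - vaccination_rate) * risk_unvaccinated
    return deaths_vaccinated / (deaths_vaccinated + deaths_unvaccinated)


# Assume 95% of the at-risk population is vaccinated and the vaccine
# cuts the risk of death by 80%.
share = vaccinated_share_of_deaths(
    vaccination_rate=0.95,
    risk_unvaccinated=0.01,
    vaccine_efficacy=0.80,
)
print(f"{share:.0%}")  # prints 79%
```

    Under these assumed inputs, most deaths occur among vaccinated people even though each vaccinated person is five times less likely to die. The high share reflects how many people are vaccinated, not a failure of the vaccine.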


    LINKS



    The livestream on YouTube
    Allen Downey on GitHub
    Allen's new book Probably Overthinking It!
    Allen on Twitter
    Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions by Mitchell et al.

    • 1 hr 20 min
    Episode 22: LLMs, OpenAI, and the Existential Crisis for Machine Learning Engineering


    Jeremy Howard (fast.ai), Shreya Shankar (UC Berkeley), and Hamel Husain (Parlance Labs) join Hugo Bowne-Anderson to talk about how LLMs and OpenAI are changing the worlds of data science, machine learning, and machine learning engineering.


    Jeremy Howard is co-founder of fast.ai, an ex-Chief Scientist at Kaggle, and creator of the ULMFiT approach on which all modern language models are based. Shreya Shankar is at UC Berkeley and was previously at Google Brain, Facebook, and Viaduct. Hamel Husain has his own generative AI and LLM consultancy, Parlance Labs, and was previously at Outerbounds, GitHub, and Airbnb.


    They talk about



    How LLMs shift the nature of the work we do in DS and ML,
    How they change the tools we use,
    The ways in which they could displace the role of traditional ML (e.g. will we stop using xgboost any time soon?),
    How to navigate all the new tools and techniques,
    The trade-offs between open and closed models,
    Reactions to the recent OpenAI Dev Day and the increasing existential crisis for ML.


    LINKS



    The panel on YouTube
    Hugo and Jeremy's upcoming livestream on what the hell happened recently at OpenAI, among many other things
    Vanishing Gradients on YouTube
    Vanishing Gradients on Twitter

    • 1 hr 20 min
    Episode 21: Deploying LLMs in Production: Lessons Learned


    Hugo speaks with Hamel Husain, a machine learning engineer who loves building machine learning infrastructure and tools 👷. Hamel leads and contributes to many popular open-source machine learning projects. He also has extensive experience (20+ years) as a machine learning engineer across various industries, including large tech companies like Airbnb and GitHub. At GitHub, he led CodeSearchNet, a large language model for semantic search that was a precursor to Copilot. Hamel is the founder of Parlance Labs, a research and consultancy firm focused on LLMs.


    They talk about generative AI, large language models, the business value they can generate, and how to get started.


    They delve into



    Where Hamel is seeing the most business interest in LLMs (spoiler: the answer isn’t only tech);
    Common misconceptions about LLMs;
    The skills you need to work with LLMs and GenAI models;
    Tools and techniques, such as fine-tuning, RAG, LoRA, hardware, and more;
    Vendor APIs vs OSS models.


    LINKS



    Our upcoming livestream LLMs, OpenAI Dev Day, and the Existential Crisis for Machine Learning Engineering with Jeremy Howard (fast.ai), Shreya Shankar (UC Berkeley), and Hamel Husain (Parlance Labs): Sign up for free!
    Our recent livestream Data and DevOps Tools for Evaluating and Productionizing LLMs with Hamel and Emil Sedgh, Lead AI engineer at Rechat -- in it, we showcase an actual industrial use case that Hamel and Emil are working on with Rechat, a real estate CRM, taking you through LLM workflows and tools.
    Extended Guide: Instruction-tune Llama 2 by Philipp Schmid
    The livestream recording of this episode!
    Hamel on Twitter

    • 1 hr 8 min
    Episode 20: Data Science: Past, Present, and Future


    Hugo speaks with Chris Wiggins (Columbia, NYTimes) and Matthew Jones (Princeton) about their recent book How Data Happened, and the Columbia course it expands upon, data: past, present, and future.


    Chris is an associate professor of applied mathematics at Columbia University and the New York Times’ chief data scientist, and Matthew is a professor of history at Princeton University and former Guggenheim Fellow.


    From facial recognition to automated decision systems that inform who gets loans and who receives bail, we all now move through a world determined by data-empowered algorithms. These technologies didn’t just appear: they are part of a history that goes back centuries, from the census enshrined in the US Constitution to the birth of eugenics in Victorian Britain to the development of Google search.


    DJ Patil, former U.S. Chief Data Scientist, said of the book: "This is the first comprehensive look at the history of data and how power has played a critical role in shaping the history. It’s a must read for any data scientist about how we got here and what we need to do to ensure that data works for everyone."


    If you’re a data scientist, machine learning engineer, or work with data in any way, it’s increasingly important to know more about the history and future of the work that you do and understand how your work impacts society and the world.


    Among other things, they'll delve into



    the history of human use of data;
    how data are used to reveal insight and support decisions;
    how data and data-powered algorithms shape, constrain, and manipulate our commercial, civic, and personal transactions and experiences; and
    how exploration and analysis of data have become part of our logic and rhetoric of communication and persuasion.


    You can also sign up for our next livestreamed podcast recording here!


    LINKS



    How Data Happened, the book!
    data: past, present, and future, the course
    Race After Technology, by Ruha Benjamin
    The problem with metrics is a big problem for AI by Rachel Thomas
    Vanishing Gradients on YouTube

    • 1 hr 26 min

Customer Reviews

5.0 out of 5
10 Ratings


vishalthatsme ,

Best data science podcast to come out in a while

[see title]
