Machine Learning Engineered

Charlie You

This podcast helps Machine Learning Engineers become the best at what they do. Join host Charlie You every week as he talks to the brightest minds in data science, artificial intelligence, and software engineering to discover how they bring cutting edge research out of the lab and into products that people love. You'll learn the skills, tools, and best practices you can use to build better ML systems and accelerate your career in this flourishing new field.

  1. Diving Deep into Synthetic Data with Alex Watson of Gretel.ai

    04/20/2021

    Alex Watson is the co-founder and CEO of Gretel.ai, a startup that offers APIs for creating anonymized and synthetic datasets. Previously he was the founder of Harvest.ai, whose product Macie, an analytics platform protecting against data breaches, was acquired by AWS.

    Learn more about Alex and Gretel AI: http://gretel.ai

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    02:15 Introducing Alex Watson
    03:45 How Alex was first exposed to programming
    05:00 Alex's experience starting Harvest AI, getting acquired by AWS, and integrating their product at massive scale
    21:20 How Alex first saw the opportunity for Gretel.ai
    24:20 The most exciting use-cases for synthetic data
    28:55 Theoretical guarantees of anonymized data with differential privacy
    36:40 Combining pre-training with synthetic data
    38:40 When to anonymize data and when to synthesize it
    41:25 How Gretel's synthetic data engine works
    44:50 Requirements of a dataset to create a synthetic version
    49:25 Augmenting datasets with synthetic examples to address representation bias
    52:45 How Alex recommends teams get started with Gretel.ai
    59:00 Expected accuracy loss from training models on synthetic data
    01:03:15 Biggest surprises from building Gretel.ai
    01:05:25 Organizational patterns for protecting sensitive data
    01:07:40 Alex's vision for Gretel's data catalog
    01:11:15 Rapid fire questions

    Links:
    Gretel.ai Blog
    NetFlix Cancels Recommendation Contest After Privacy Lawsuit
    Greylock - The Github of Data
    Improving massively imbalanced datasets in machine learning with synthetic data
    Deep dive on generating synthetic data for Healthcare
    Gretel’s New Synthetic Performance Report
    The...

    1h 19m
  2. A Practical Approach to Learning Machine Learning with Radek Osmulski (Earth Species Project)

    03/30/2021

    Radek Osmulski is a fully self-taught machine learning engineer. After getting tired of his corporate job, he taught himself programming and started a new career as a Ruby on Rails developer. He then set out to learn machine learning. Since then, he's been a Fast AI International Fellow, become a Kaggle Master, and is now an AI Data Engineer on the Earth Species Project.

    Learn more about Radek: https://www.radekosmulski.com https://twitter.com/radekosmulski

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: http://cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    02:15 How Radek got interested in programming and computer science
    09:00 How Radek taught himself machine learning
    26:40 The skills Radek learned from Fast AI
    39:20 Radek's recommendations for people learning ML now
    51:30 Why Radek is writing a book
    01:01:20 Radek's work at the Earth Species Project
    01:10:15 How the ESP collects animal language data
    01:21:05 Rapid fire questions

    Links:
    Radek's Book "Meta-Learning"
    Andrew Ng ML Coursera
    Fast AI
    Universal Language Model Fine-tuning for Text Classification
    How to do Machine Learning Efficiently
    NPR - Two Heartbeats a Minute
    Earth Species Project
    A Guide to the Good Life
    The Origin of Wealth
    Make Time
    You Are Here

    1h 38m
  3. From Data Science Leader to ML Researcher with Rodrigo Rivera (Skoltech ADASE, Samsung NEXT)

    03/23/2021

    Rodrigo Rivera is a machine learning researcher at the Advanced Data Analytics in Science and Engineering Group at Skoltech and technical director of Samsung NEXT. He has previously held data science and research leadership roles at companies around the world, including Rocket Internet and Philip Morris.

    Learn more about Rodrigo: https://rodrigo-rivera.com/ https://twitter.com/rodrigorivr

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    03:00 How Rodrigo got started in computer science and started his first company
    10:40 Rodrigo's experiences leading data science teams at Rocket Internet and PMI
    26:15 Leaving industry to get a PhD in machine learning
    28:55 Data science collaboration between business and academia
    32:45 Rodrigo's research interest in time series data
    39:25 Topological data analysis
    45:35 Framing effective research as a startup
    48:15 Neural Prophet
    01:04:10 The potential future of Julia for numerical computing
    01:08:20 Most exciting opportunities for ML in industry
    01:15:05 Rodrigo's advice for listeners
    01:17:00 Rapid fire questions

    Links:
    Rodrigo's Google Scholar
    Advanced Data Analytics in Science and Engineering Group
    Neural Prophet
    M-Competitions
    Machine Learning Refined
    Foundations of Machine Learning
    A First Course in Machine Learning

    1h 24m
  4. The Future of ML and AI Infrastructure and Ethics with Dan Jeffries (Pachyderm, AI Infrastructure Alliance)

    03/16/2021

    Dan Jeffries is the chief technical evangelist at Pachyderm, a leading data science platform. He's a prominent writer and speaker on all things related to the future. He's been in software for over two decades, many of those at Red Hat, and is the founder of the AI Infrastructure Alliance and Practical AI Ethics.

    Learn more about Dan: https://twitter.com/Dan_Jeffries1 https://medium.com/@dan.jeffries

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: http://cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    02:15 How Dan got started in computer science
    06:50 What Dan is most excited about in AI
    14:45 Where we are in the adoption curve of ML
    20:40 The "Canonical Stack" of ML
    32:00 Dan's goal for the AI Infrastructure Alliance
    40:55 "Problems that ML startups don't know they're going to have"
    49:00 Closed vs open source tools in the Canonical Stack
    01:00:05 Building out the "boring" part of the infrastructure to enable exciting applications
    01:08:40 Dan's practical approach to AI Ethics
    01:23:50 Rapid fire questions

    Links:
    Pachyderm
    AI Infrastructure Alliance
    Practical AI Ethics Alliance
    Rise of the Canonical Stack in Machine Learning
    Rise of AI - The Age of AI in 2030
    Google Magenta
    AlphaGo Documentary
    Thinking in Bets
    A History of the World in 6 Glasses
    Super-Thinking

    1h 37m
  5. Developing Feast, the Leading Open Source Feature Store, with Willem Pienaar (Gojek, Tecton)

    03/09/2021

    Willem Pienaar is the co-creator of Feast, the leading open source feature store, whose development he leads as a tech lead at Tecton. Previously, he led the ML platform team at Gojek, a super-app in Southeast Asia.

    Learn more: https://twitter.com/willpienaar https://feast.dev/

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    02:15 How Willem got started in computer science
    03:40 Paying for college by starting an ISP
    05:25 Willem's experience creating Gojek's ML platform
    21:45 Issues faced that led to the creation of Feast
    26:45 Lessons learned building Feast
    33:45 Integrating Feast with data quality monitoring tools
    40:10 What it looks like for a team to adopt Feast
    44:20 Feast's current integrations and future roadmap
    46:05 How a data scientist would use Feast when creating a model
    49:40 How the feature store pattern handles DAGs of models
    52:00 Priorities for a startup's data infrastructure
    55:00 Integrating with Amundsen, Lyft's data catalog
    57:15 The evolution of data and MLOps tool standards for interoperability
    01:01:35 Other tools in the modern data stack
    01:04:30 The interplay between open and closed source offerings

    Links:
    Feast's Github
    Gojek Data Science Blog
    Data Build Tool (DBT)
    Tensorflow Data Validation (TFDV)
    A State of Feast
    Google BigQuery
    Lyft Amundsen
    Cortex
    Kubeflow
    MLFlow

    1h 12m
  6. Bringing DevOps Best Practices into Machine Learning with Benedikt Koller from ZenML

    03/02/2021

    Benedikt Koller is a self-professed "Ops guy", having spent over 12 years working in roles such as DevOps engineer, platform engineer, and infrastructure tech lead at companies like Stylight and Talentry, in addition to his own consultancy KEMB. He recently dove headfirst into the world of ML, where he hopes to bring his extensive ops knowledge into the field as the co-founder of Maiot, the company behind ZenML, an open source MLOps framework.

    Learn more: https://zenml.io/ https://maiot.io/

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    02:15 Introducing Benedikt Koller
    05:30 What the "DevOps revolution" was
    10:10 Bringing good Ops practices into ML projects
    30:50 Pivoting from vehicle predictive analytics to open source ML tooling
    34:35 Design decisions made in ZenML
    39:20 Most common problems faced by applied ML teams
    49:00 The importance of separating configurations from code
    55:25 Resources Ben recommends for learning Ops
    57:30 What to monitor in an ML pipeline
    01:00:45 Why you should run experiments in automated pipelines
    01:08:20 The essential components of an MLOps stack
    01:10:25 Building an open source business and what's next for ZenML
    01:20:20 Rapid fire questions

    Links:
    ZenML's GitHub
    Maiot Blog
    The Twelve Factor App
    12 Factors of reproducible Machine Learning in production
    Seldon
    Pachyderm
    KubeFlow
    Something Deeply Hidden
    The Expanse Series
    The Three Body Problem
    Extreme Ownership

    1h 28m
  7. Starting an Independent AI Research Lab with Josh Albrecht from Generally Intelligent

    02/23/2021

    Josh Albrecht is the co-founder and CTO of Generally Intelligent, an independent research lab investigating the fundamentals of learning across humans and machines. Previously, he was the lead data architect at Addepar, CTO of CloudFab, and CTO of Sourceress, which Generally Intelligent is a pivot from.

    Learn more about Josh: http://joshalbrecht.com/ http://generallyintelligent.ai/

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    02:15 Introducing Josh Albrecht
    03:30 How Josh got started in computer science
    06:35 Josh's first two startup attempts
    09:15 The tech behind Sourceress, an AI recruiting platform
    16:10 Pivoting from Sourceress to Generally Intelligent, an AI research lab
    23:50 How Josh defines "general intelligence"
    28:35 Why Josh thinks self-supervised learning is the current most promising research area
    36:15 Generally Intelligent's immediate research roadmap: BYOL, simulated environments
    59:20 How Josh thinks about creating an optimal research environment
    01:11:35 The "why" behind starting an independent research lab
    01:13:30 AI alignment
    01:17:00 Rapid fire questions

    Links:
    Bootstrap your own latent: A new approach to self-supervised Learning
    Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL)
    BYOL works even without batch statistics
    Generally Intelligent Podcast
    Consequences of Misaligned AI
    Why We Sleep
    Peak

    1h 25m
  8. Industrial Machine Learning and Building Tools for Data and Model Monitoring with Evidently AI Co-Founders Elena Samuylova and Emeli Dral

    02/16/2021

    Elena Samuylova and Emeli Dral are the co-founders of Evidently AI, where they build open source tools to analyze and monitor machine learning models. Elena was previously the head of the startup ecosystem at Yandex, director of business development at their data factory and chief product officer at Mechanica AI. Emeli was previously a data scientist at Yandex, chief data scientist at the data factory and Mechanica AI in addition to teaching machine learning both online and at multiple universities.

    Learn more about Elena, Emeli, and Evidently AI: https://evidentlyai.com/ https://twitter.com/elenasamuylova https://twitter.com/EmeliDral

    Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: http://cyou.ai/newsletter
    Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
    Subscribe to ML Engineered: https://mlengineered.com/listen
    Comments? Questions? Submit them here: http://bit.ly/mle-survey
    Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/

    Timestamps:
    02:15 How Emeli and Elena each got started in data science
    07:10 Applying machine learning across a wide variety of industries at the Yandex Data Factory
    14:55 Using ML for industrial process improvement
    23:35 Challenges encountered in industrial ML and technical solutions
    27:15 The huge opportunity for ML in manufacturing
    34:35 How to ensure safety when using models in physical systems
    37:40 Why they started working on tools for data and ML monitoring
    42:50 Different kinds of data drift and how to address them
    48:25 Common mistakes ML teams make in monitoring
    55:25 Features of Evidently AI's library
    57:35 Building open source software
    01:02:25 Technical roadmap for Evidently
    01:05:50 Monitoring complex data
    01:08:50 Business roadmap for Evidently
    01:11:35 Rapid fire questions

    Links:
    Evidently on Github
    Evidently AI's Blog
    Thinking Fast and Slow
    Flow
    Doing Good Better

    1h 21m
5 out of 5 (3 Ratings)
