The MLOps Podcast

Dean Pleban @ DagsHub

A podcast from DagsHub about bringing machine learning into the real world. Each episode features a conversation with top data science and machine learning practitioners, who'll share their thoughts, best practices, and tips for promoting machine learning to production.

  1. 📡 Building Scalable ML Models with Natanel Davidovits

    12/16/2024

    In this episode, Dean and Natanel Davidovits explore the intricacies of AI and machine learning, focusing on model efficiency, the use of APIs versus self-hosting, and the importance of defining success metrics in real-world applications. They discuss the challenges of data quality and labeling, the evolving role of data scientists in the age of LLMs, and the significance of effective communication between data science and product teams. The conversation also touches on the future of robotics in AI and the need for specialization in a rapidly changing landscape.

    Join our Discord community: https://discord.gg/tEYvqxwhah

    ---

    Timestamps:
    00:00 Introduction to Natanel Davidovits
    02:10 Optimizing AI Models for Real-World Tasks
    03:47 Success Metrics in Industry vs. Academia
    07:52 The Importance of Communication Between Teams
    11:33 Handling Data Quality and Labeling Challenges
    12:11 The Impact of LLMs on Data Science Careers
    16:29 Navigating Specialized Domain Data
    22:15 Trends in Machine Learning and AI
    27:27 The Future of AI and Robotics
    28:28 The Role of AI in Physics
    33:36 Controversial Views on AI and Machine Learning
    34:05 Final Thoughts and Recommendations

    ➡️ Natanel Davidovits on LinkedIn – https://www.linkedin.com/in/natanel-davidovits-28695312/

    🌐 Check Out Our Website! https://dagshub.com

    Social Links:
    ➡️ LinkedIn: https://www.linkedin.com/company/dagshub
    ➡️ Twitter: https://x.com/TheRealDAGsHub
    ➡️ Dean Pleban: https://x.com/DeanPlbn

    36 min
  2. 🌲 Machine Learning in Agriculture: Scaling AI for Crop Management with Dror Haor

    09/15/2024

    In this episode, Dean speaks with Dror Haor, CTO at SeeTree, about the challenges of deploying AI in agriculture at scale. They explore how SeeTree integrates AI and sensor fusion to manage vast amounts of remote sensing data, helping farmers improve crop yields with high accuracy at low costs. Dror shares insights on handling data drift, customizing models for different regions, and balancing the trade-offs between cost and performance. This conversation dives deep into practical machine learning applications in agriculture, offering valuable lessons for anyone working with large-scale data and AI.

    Join our Discord community: https://discord.gg/tEYvqxwhah

    ---

    Timestamps:
    00:00 Introduction
    00:32 Production in machine learning at SeeTree
    07:34 Sensor fusion in machine learning
    16:26 Balancing accuracy and cost in agriculture
    20:09 Customizing models for different customers and crops
    24:19 Dealing with data in different domains
    30:10 Tools and processes for ML at SeeTree
    35:58 Building for scale
    40:17 Collecting user feedback and self-improving products
    42:45 Exciting developments in ML & AI
    45:12 Hot takes in ML - Overfitting is good
    46:34 Recommendations for the Audience

    ➡️ Dror Haor on LinkedIn – https://www.linkedin.com/in/dror-haor-phd-77152322/
    ➡️ Dror Haor on Twitter – https://x.com/DrorHaor

    🌐 Check Out Our Website! https://dagshub.com

    Social Links:
    ➡️ LinkedIn: https://www.linkedin.com/company/dagshub
    ➡️ Twitter: https://x.com/TheRealDAGsHub
    ➡️ Dean Pleban: https://x.com/DeanPlbn

    51 min
  3. 📊 Data-Driven Decisions: ML in E-Commerce Forecasting with Federico Bacci

    08/15/2024

    In this episode, Dean speaks with Federico Bacci, a data scientist and ML engineer at Bol, the largest e-commerce company in the Netherlands and Belgium. Federico shares valuable insights into the intricacies of deploying machine learning models in production, particularly for forecasting problems. He discusses the challenges of model explainability, the importance of feature engineering over model complexity, and the critical role of stakeholder feedback in improving ML systems. Federico also offers a compelling perspective on why LLMs aren't always the answer in AI applications, emphasizing the need for tailored solutions. This conversation provides a wealth of practical knowledge for data scientists and ML engineers looking to enhance their understanding of real-world ML operations and challenges in e-commerce.

    Join our Discord community: https://discord.gg/tEYvqxwhah

    ---

    Timestamps:
    00:00 Introduction and Background
    01:59 Owning the ML Pipeline
    02:56 Deployment Process
    05:58 Testing and Feedback
    07:40 Different Deployment Strategies
    11:19 Explainability and Feature Importance
    13:46 Challenges in Forecasting
    22:33 ML Stack and Tools
    26:47 Orchestrating Data Pipelines with Airflow
    31:27 Exciting Developments in ML
    35:58 Recommendations and Closing

    Links:
    Dwarkesh podcast with Anthropic and Gemini team members – https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken
    ➡️ Federico Bacci on LinkedIn – https://www.linkedin.com/in/federico-bacci/
    ➡️ Federico Bacci on Twitter – https://x.com/fedebyes

    🌐 Check Out Our Website! https://dagshub.com

    Social Links:
    ➡️ LinkedIn: https://www.linkedin.com/company/dagshub
    ➡️ Twitter: https://x.com/TheRealDAGsHub
    ➡️ Dean Pleban: https://x.com/DeanPlbn

    40 min
  4. 🚗 Driving Innovation: Machine Learning in Auto Claims Processing

    07/15/2024

    In this episode, Dean speaks with Michał Oleszak, an ML engineering manager at Solera. Michał shares insights into how his team is using machine learning to transform the automotive claims process, from recognizing vehicle damages in images to estimating repair costs. The conversation covers the challenges of deploying ML pipelines in production, managing data quality for computer vision tasks, and balancing technical implementation with business needs. Michał also discusses his approach to model evaluation, the benefits of monorepo architecture, and his views on exciting developments in self-supervised learning for computer vision.

    Join our Discord community: https://discord.gg/tEYvqxwhah

    ---

    Timestamps:
    00:00 Introduction
    00:42 Production for Machine Learning at Solera
    03:49 Transitioning from Images to Structured Data
    04:58 Combining Deep Learning and Non-Deep Learning Models
    05:15 Deployment Process for Machine Learning Models
    08:01 Challenges and Solutions in Monorepo Adoption
    12:57 Evaluating Model and Pipeline Versions
    21:57 Tools for ML Projects: Monorepo, Pants, GitHub Actions
    24:04 Data Management and Data Quality
    30:14 Challenges in ML Efforts: Data Quality
    30:37 Excitement about Self-Supervised Learning and JEPA Architectures
    34:45 Controversial Opinion: Importance of Statistics for ML
    36:40 Recommendations

    Links:
    🌎 Prisoners of Geography by Tim Marshall: https://www.amazon.com/Prisoners-Geography-Explain-Everything-Politics/dp/1501121472
    ➡️ Michał Oleszak on LinkedIn – https://www.linkedin.com/in/michal-oleszak/
    ➡️ Michał Oleszak on Twitter – https://x.com/MichalOleszak

    🌐 Check Out Our Website! https://dagshub.com

    Social Links:
    ➡️ LinkedIn: https://www.linkedin.com/company/dagshub
    ➡️ Twitter: https://twitter.com/TheRealDAGsHub
    ➡️ Dean Pleban: https://twitter.com/DeanPlbn

    39 min
  5. 🌊 AI-Native with Idan Gazit – The future of AI products and interfaces + Getting AI to production

    05/16/2024

    In this episode, Idan Gazit, Senior Director of Research at GitHub Next, discusses his role in exploring strategic technologies and incubating long bet projects. He explains how the GitHub Next team chooses research projects and the process of exploration and theme selection. Idan also shares insights into the ML focus at GitHub Next and the challenges of evaluating the impact of AI products. He reflects on his journey into the AI space and provides advice for testing AI products in smaller organizations. Finally, he shares his thoughts on the future of AI interfaces.

    Join our Discord community: https://discord.gg/tEYvqxwhah

    ---

    Timestamps:
    00:00 Introduction and Background
    00:56 Choosing Research Projects at GitHub Next
    06:09 ML Focus in GitHub Next
    10:52 ML Work and the Leaky Abstraction
    13:16 Idan's Journey into the AI Space
    17:54 Evaluating the Impact of AI Products
    24:36 Testing AI Products in Smaller Organizations
    32:52 The Future of AI Interfaces
    40:01 Transitioning from Prototype to Product
    46:45 Challenges in the ML/AI Space
    56:03 Recommendations

    ➡️ Idan Gazit on LinkedIn – https://www.linkedin.com/in/idangazit/
    ➡️ Idan Gazit on Twitter – https://twitter.com/idangazit

    🌐 Check Out Our Website! https://dagshub.com

    Social Links:
    ➡️ LinkedIn: https://www.linkedin.com/company/dagshub
    ➡️ Twitter: https://twitter.com/TheRealDAGsHub
    ➡️ Dean Pleban: https://twitter.com/DeanPlbn

    1h 3m

Ratings & Reviews

3 out of 5 (2 Ratings)
