330 episodes

Weekly talks and fireside chats about everything that has to do with the new space emerging around DevOps for Machine Learning aka MLOps aka Machine Learning Operations.

MLOps.community Demetrios Brinkmann

    • Technology
    • 4.9 • 17 Ratings

    What is AI Quality? // Mohamed Elgendy // MLOps Podcast #228

    Mohamed Elgendy is the Co-Founder & CEO at Kolena. He was previously the Director of Product and Engineering at Synapse Technology Corporation.

    Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

    MLOps podcast #228 with Mohamed Elgendy, Co-founder & CEO of Kolena Inc., What is AI Quality?

    // Abstract
    Delve into the multifaceted concept of AI Quality. Demetrios and Mo explore the idea that AI quality is dependent on the specific domain, comparable to the difference in desired qualities between a $1 pen and a $100 pen. Mo underscores the performance of a product being in sync with its intended functionality and the absence of unknown risks as the pillars of AI Quality. They emphasize the need for comprehensive quality checks and adaptability of standards to differing product traits. Issues affecting edge deployments, like latency, are also highlighted. A deep dive into the formation of gold standards for AI, the nuanced necessities for various use cases, and the paramount need for collaboration among AI builders, regulators, and infrastructure firms forms the core of the discussion. Elgendy brings to light their ambitious AI Quality Conference, aiming to set tangible, effective, but innovation-friendly quality standards for AI. The dialogue also accentuates the urgent need for diversification and representation in the tech industry, the variability of standards and regulations, and the pivotal role of testing in AI and machine learning. The episode concludes with an articulate portrayal of how enhanced testing can streamline the entire process of machine learning.

    // Bio
    Mohamed is the Co-founder & CEO of Kolena and the author of the book “Deep Learning for Vision Systems”. Previously, he built and managed AI/ML organizations at Amazon, Twilio, Rakuten, and Synapse. Mohamed regularly speaks at AI conferences like Amazon's DevCon, O'Reilly's AI conference, and Google's I/O.

    // MLOps Jobs board
    https://mlops.pallet.xyz/jobs

    // MLOps Swag/Merch
    https://mlops-community.myshopify.com/

    // Related Links
    Website: www.kolena.io
    Deep Learning for Vision Systems book: https://www.amazon.com/Learning-Vision-Systems-Mohamed-Elgendy/dp/1617296198/

    --------------- ✌️Connect With Us ✌️ -------------
    Join our slack community: https://go.mlops.community/slack
    Follow us on Twitter: @mlopscommunity
    Sign up for the next meetup: https://go.mlops.community/register
    Catch all episodes, blogs, newsletters, and more: https://mlops.community/

    Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
    Connect with Mo on LinkedIn: https://www.linkedin.com/in/moelgendy/

    • 45 min
    Handling Multi-Terabyte LLM Checkpoints // Simon Karasik // #228

    Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com

    Simon Karasik is a proactive and curious ML Engineer with 5 years of experience. He has developed and deployed ML models at web and large scale for Ads and Tax.

    Huge thank you to Nebius AI for sponsoring this episode. Nebius AI - https://nebius.ai/

    MLOps podcast #228 with Simon Karasik, Machine Learning Engineer at Nebius AI, Handling Multi-Terabyte LLM Checkpoints.

    // Abstract
    The talk provides a gentle introduction to the topic of LLM checkpointing: why it is hard and how big the checkpoints are. It covers various tips and tricks for saving and loading multi-terabyte checkpoints, as well as the selection of cloud storage options for checkpointing.
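
As a rough illustration of why these checkpoints reach terabytes, the sketch below estimates checkpoint size from parameter count. The byte counts are assumptions, not figures from the talk: 2 bytes per parameter for bf16 weights, plus 12 bytes per parameter for an fp32 master copy of the weights and the two Adam moment estimates. The function names are hypothetical.

```python
def checkpoint_size_bytes(num_params: int,
                          weight_bytes: int = 2,
                          optimizer_bytes: int = 12) -> int:
    """Estimate the size of a full training checkpoint in bytes.

    Assumed defaults (not taken from the episode):
      weight_bytes    = 2  -> bf16 model weights
      optimizer_bytes = 12 -> fp32 master weights (4) + Adam first
                              and second moments (4 + 4)
    """
    return num_params * (weight_bytes + optimizer_bytes)


def to_terabytes(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 10**12


if __name__ == "__main__":
    # Illustrative model sizes only.
    models = [("7B", 7 * 10**9), ("70B", 70 * 10**9), ("175B", 175 * 10**9)]
    for name, params in models:
        size_tb = to_terabytes(checkpoint_size_bytes(params))
        print(f"{name} parameters: about {size_tb:.2f} TB per checkpoint")
```

Under these assumptions, a 175-billion-parameter model comes to roughly 2.45 TB per checkpoint, and that is before keeping several checkpoints from the same run.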

    // Bio
    Full-stack Machine Learning Engineer, currently working on infrastructure for LLM training, with previous experience in ML for Ads, Speech, and Tax.

    // Related Links


    Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
    Connect with Simon on LinkedIn: https://www.linkedin.com/in/simon-karasik/

    Timestamps:
    [00:00] Simon's preferred beverage
    [01:23] Takeaways
    [04:22] Simon's tech background
    [08:42] Zombie models garbage collection
    [10:52] The road to LLMs
    [15:09] Trained models Simon worked on
    [16:26] LLM Checkpoints
    [20:36] Confidence in AI Training
    [22:07] Different Checkpoints
    [25:06] Checkpoint parts
    [29:05] Slurm vs Kubernetes
    [30:43] Storage choices lessons
    [36:02] Paramount components for setup
    [37:13] Argo workflows
    [39:49] Kubernetes node troubleshooting
    [42:35] Cloud virtual machines have pre-installed monitoring
    [45:41] Fine-tuning
    [48:16] Storage, networking, and complexity in network design
    [50:56] Start simple before advanced; consider model needs.
    [53:58] Join us at our first in-person conference on June 25 all about AI Quality

    • 55 min
    Leading Enterprise Data Teams // Sol Rashidi // #227

    Sol Rashidi is an esteemed executive, leader, and influencer within the AI, Data, and Technology space. Having helped IBM launch Watson in 2011 as one of the earliest real-world applications of Artificial Intelligence, Sol has pioneered some of the early advancements in the space.

    Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

    Huge thank you to @WeightsBiases for sponsoring this episode. WandB Free Courses - http://wandb.me/courses_mlops

    MLOps podcast #227 with Sol Rashidi, CEO & Co-Founder of ExecutiveAI, Leading Enterprise Data Teams.

    // Abstract
    In the dynamic landscape of MLOps and data leadership, Sol shares invaluable insights on building successful teams and driving impactful projects. In this podcast episode, Sol delves into the importance of prioritizing relationships, introduces a pragmatic "Wrong Use Cases Formula" to streamline project prioritization, and emphasizes the critical role of effective communication in data leadership. Her wealth of experience and practical advice provide a roadmap for navigating the complexities of MLOps and leading data-driven initiatives to success.

    // Bio
    Sol has eight patents granted and 21 filed, and has received awards that include:
    "Top 100 AI People" - 2023
    "The Top 75 Innovators of 2023"
    "Top 65 Most Influential Women in 2023"
    "Forbes AI Maverick of the 21st Century" - 2022
    "Top 10 Global Women in AI & Data" - 2023
    "Top AI 100 Award" - 2023
    "50 Most Powerful Women in Tech" - 2022
    "Global 100 Power List" - 2021, 2022, 2023
    "Top 20 CDOs Globally" - 2022
    "Chief Analytics Officer of the Year" - 2022
    "Isomer Innovators of the Year" - 2021, 2022, 2023
    "Top 100 Innovators in Data & Analytics" - 2020, 2021, 2022, 2023
    "Top 100 Women in Business" - 2022

    Sol is an energetic business executive and a goal-oriented technologist, skilled at coupling her technical acumen with storytelling abilities to articulate business value with both startups and Fortune 100 companies that are leaning into data, AI, and technology as a competitive advantage while wanting to preserve the legacy on which they were founded. Sol has served as a C-Suite member across several Fortune 100 & Fortune 500 companies including:

    Chief Analytics Officer - Estee Lauder
    Chief Data & Analytics Officer - Merck Pharmaceuticals
    EVP, Chief Data Officer - Sony Music
    Chief Data & AI Officer - Royal Caribbean Cruise Lines
    Sr. Partner leading the Digital & Innovation Practice - Ernst & Young
    Partner leading Watson Go-To-Market & Commercialization - IBM

    Sol now serves as the CEO of ExecutiveAI LLC, a company dedicated to democratizing Artificial Intelligence for Humanity. She is considered an outstanding and influential business leader, traveling the world as a keynote speaker and serving as the bridge between established Gen1.0 markets and those evolving into 4.0.

    // Related Links
    Sol's book will be out on April 30, 2024:
    Your AI Survival Guide: Scraped Knees, Bruised Elbows, and Lessons Learned from Real-World AI Deployments: https://www.amazon.com/Your-Survival-Guide-Real-World-Deployments/dp/1394272634?ref_=ast_author_mpb

    Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
    Connect with Sol on LinkedIn: https://www.linkedin.com/in/sol-rashidi-a672291/

    • 42 min
    The Rise of Modern Data Management // Chad Sanderson // #226

    Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/


    Chad Sanderson is passionate about data quality, and fixing the muddy relationship between data producers and consumers. He is a former Head of Data at Convoy, a LinkedIn writer, and a published author. He lives in Seattle, Washington, and is the Chief Operator of the Data Quality Camp.

    Huge thank you to @amazonwebservices for sponsoring this episode. AWS - https://aws.amazon.com/

    MLOps podcast #226 with Chad Sanderson, CEO & Co-Founder of Gable, The Rise of Modern Data Management.

    // Abstract
    In this session, Chad Sanderson, CEO of Gable.ai and author of the upcoming O’Reilly book: "Data Contracts," tackles the necessity of modern data management in an age of hyper iteration, experimentation, and AI. He will explore why traditional data management practices fail and how the cloud has fundamentally changed data development. The talk will cover a modern application of data management best practices, including data change detection, data contracts, observability, and CI/CD tests, and outline the roles of data producers and consumers. Attendees will leave with a clear understanding of modern data management's components and how to leverage them for better data handling and decision-making.

    // Bio
    Chad Sanderson, CEO of Gable.ai, is a prominent figure in the data tech industry, having held key data positions at leading companies such as Convoy, Microsoft, Sephora, Subway, and Oracle. He is also the author of the upcoming O'Reilly book "Data Contracts" and writes about the future of data infrastructure, modeling, and contracts in his newsletter "Data Products."

    // Related Links
    AWS Trainium and Inferentia:
    https://aws.amazon.com/machine-learning/trainium/
    https://aws.amazon.com/machine-learning/inferentia/

    Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
    Connect with Chad on LinkedIn: https://www.linkedin.com/in/chad-sanderson/

    • 57 min
    Beyond AGI, Can AI Help Save the Planet? // Patrick Beukema // #225

    Patrick Beukema has a Ph.D. in neuroscience and has worked on AI models for brain decoding, which analyzes the brain's activity to decipher what people are seeing and thinking.

    Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

    Huge thank you to LatticeFlow for sponsoring this episode. LatticeFlow - https://latticeflow.ai/

    MLOps podcast #225 with Patrick Beukema, Head / Technical Lead of the Environmental AI, Applied Science Organization at AI2, Beyond AGI, Can AI Help Save the Planet?

    // Abstract
    AI will play a central role in solving some of our greatest environmental challenges. The technology that we need to solve these problems is in a nascent stage -- we are just getting started. For example, the combination of remote sensing (satellites) and high-performance AI operating at a global scale in real-time unlocks unprecedented avenues to new intelligence.

    MLOps is often overlooked on AI teams, and there is typically a lot of friction in integrating software engineering best practices into the ML/AI workflow. However, performant ML/AI depends on extremely tight feedback loops from the user back to the model, which enable high iteration velocity and ultimately continual improvement.

    We are making progress, but environmental causes need your help. Join us in the fight for sustainability and conservation.

    // Bio
    Patrick is a machine learning engineer and scientist with a deep passion for leveraging artificial intelligence for social good. He currently leads the environmental AI team at the Allen Institute for Artificial Intelligence (AI2). His professional interests extend to enhancing scientific rigor in academia, where he is a strong advocate for the integration of professional software engineering practices to ensure reliability and reproducibility in academic research. Patrick holds a Ph.D. from the Center for Neuroscience at the University of Pittsburgh and the Center for the Neural Basis of Cognition at Carnegie Mellon University, where his research focused on neural plasticity and accelerated learning. He applied this expertise to develop state-of-the-art deep learning models for brain decoding of patient populations at a startup, later acquired by BlackRock. His earlier academic work spanned research on recurrent neural networks, causal inference, and ecology and biodiversity.

    // Related Links
    Variety of relevant papers/talks/links on Patrick's website: https://pbeukema.github.io/

    Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
    Connect with Patrick on LinkedIn: https://www.linkedin.com/in/plbeukema/

    Timestamps:
    [00:00] AI Quality Conference
    [01:29] Patrick's preferred coffee
    [02:00] Takeaways
    [04:14] Learning how to learn journey
    [07:04] Patrick's day to day
    [08:39] Environmental AI
    [11:07] Environmental AI models
    [14:35] Nature Inspires Scientific Advances
    [18:11] R&D
    [24:58] Iterative Feedback-Driven Development
    [26:37 - 28:07] LatticeFlow Ad
    [33:58] Balancing Metrics for Success
    [38:16] Model Retraining Pipeline
    [44:11] Series Models: Versatility
    [45:57] Edge Models Enhance Output
    [50:22] Custom Models for Specific Data
    [53:53] Wrap up

    • 54 min
    GenAI in Production - Challenges and Trends // Verena Weber // #224

    Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

    Verena Weber believes that GenAI is going to transform the way we work and interact with devices. Her mission is to help companies prepare for this transformation. She has strong expertise in NLP and over 7 years of experience in Machine Learning.

    Huge thank you to @zilliz for sponsoring this episode. Zilliz - https://zilliz.com/

    MLOps podcast #224 with Verena Weber, Generative AI Consultant at Verena Weber, GenAI in Production - Challenges and Trends.

    // Abstract
    The goal of this talk is to provide insights into challenges for Generative AI in production as well as trends aiming to solve some of these challenges. The challenges and trends Verena sees are:

    Model size and the move towards mixture-of-experts architectures
    Context window: new breakthroughs for context lengths
    From unimodality to multimodality; next step, large action models?
    Regulation in the form of the EU AI Act

    Verena uses the differences between Gemini 1.0 and Gemini 1.5 to exemplify some of these trends.

    // Bio
    Verena leverages GenAI in natural language to elevate business competitiveness and navigate its transformative impact. Her varied experience in multiple roles and sectors underpins her ability to extract business value from AI, blending deep technical expertise with strong business acumen. After graduating, she consulted in Data Science at Deloitte and then advanced her skills in NLP, Deep Learning, and GenAI as a Research Scientist on the Alexa team at Amazon. Passionate about gender diversity in tech, she mentors women to thrive in this field.

    // Related Links
    Website: verenaweber.de
    Sign up for Verena's newsletter: https://verenas-newsletter-63558b.beehiiv.com/
    Zilliz - https://zilliz.com/

    Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
    Connect with Verena on LinkedIn: https://www.linkedin.com/in/verena-weber-134178b9/

    Timestamps:
    [00:00] AI Quality Conference
    [01:33] Verena's preferred coffee
    [02:15] Takeaways
    [06:33] Ski Person of Influence
    [11:31] Verena's background in the last 5-10 years
    [14:24] Tech Evolution: Rapid Transformation
    [18:13] Working at Amazon and key challenges
    [20:10] Research-inspired suggestions
    [22:21] AI Updates Impact Workflows
    [22:52] Alexa Query Distribution Analysis
    [24:06] Innovative Solutions for Alexa
    [25:27] Robust T5 Data Prompting
    [27:38] Audio Data Quality Challenges
    [28:21-29:28] Zilliz ad
    [29:28] Alexa data transcription and data cleaning
    [35:38] Considering needs, costs, and complexity
    [37:44] ChatGPT is not ideal for classification
    [39:32] Comparison of model building using TF-IDF
    [45:08] Struggle to boost diversity in conference speakers
    [47:30] Creating safe environments helps underrepresented individuals participate
    [48:29] Wrap up

    • 48 min

Customer Reviews

4.9 out of 5
17 Ratings


Stealth912

Consistently good information from operators

No fluff. Getting into the details. There's no other podcast like this. Thank you for sharing!

goalieagk

Interesting discussions covering a rapidly developing field

I’m a senior ML engineer who deals with a lot of MLOps related items since we don’t have a dedicated role for that on our team. This show (and the Slack community) have been great resources for inspiration and staying up-to-date on the constant evolution of tools and best practices. It’s very useful to hear from other practitioners as we all try to navigate this landscape together

bmorphism

This podcast is art! Grazie ragazzə 🎉

Long time listener, glad the show is going strong. 🦾

More generative art with ANNs in production, please! Looking forward to applying insights from Valerio Velardo episode in my genart collaboration for DEF CON AI Village
