329 episodes

Weekly talks and fireside chats about everything that has to do with the new space emerging around DevOps for Machine Learning aka MLOps aka Machine Learning Operations.

MLOps.community Demetrios Brinkmann

    • Technology
    • 1.0 • 1 Rating

    Handling Multi-Terabyte LLM Checkpoints // Simon Karasik // #228

    Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com

    Simon Karasik is a proactive and curious ML Engineer with 5 years of experience. He has developed and deployed ML models at web and large scale for Ads and Tax.

    Huge thank you to Nebius AI for sponsoring this episode. Nebius AI - https://nebius.ai/

    MLOps podcast #228 with Simon Karasik, Machine Learning Engineer at Nebius AI, Handling Multi-Terabyte LLM Checkpoints.

    // Abstract
    The talk provides a gentle introduction to LLM checkpointing: why it is hard and how big the checkpoints are. It covers various tips and tricks for saving and loading multi-terabyte checkpoints, as well as how to select cloud storage options for checkpointing.

    // Bio
    Full-stack Machine Learning Engineer, currently working on infrastructure for LLM training, with previous experience in ML for Ads, Speech, and Tax.

    // MLOps Jobs board
    https://mlops.pallet.xyz/jobs

    // MLOps Swag/Merch
    https://mlops-community.myshopify.com/

    // Related Links


    --------------- ✌️Connect With Us ✌️ -------------
    Join our slack community: https://go.mlops.community/slack
    Follow us on Twitter: @mlopscommunity
    Sign up for the next meetup: https://go.mlops.community/register
    Catch all episodes, blogs, newsletters, and more: https://mlops.community/

    Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
    Connect with Simon on LinkedIn: https://www.linkedin.com/in/simon-karasik/

    Timestamps:
    [00:00] Simon's preferred beverage
    [01:23] Takeaways
    [04:22] Simon's tech background
    [08:42] Zombie models garbage collection
    [10:52] The road to LLMs
    [15:09] Trained models Simon worked on
    [16:26] LLM Checkpoints
    [20:36] Confidence in AI Training
    [22:07] Different Checkpoints
    [25:06] Checkpoint parts
    [29:05] Slurm vs Kubernetes
    [30:43] Storage choices lessons
    [36:02] Paramount components for setup
    [37:13] Argo workflows
    [39:49] Kubernetes node troubleshooting
    [42:35] Cloud virtual machines have pre-installed monitoring
    [45:41] Fine-tuning
    [48:16] Storage, networking, and complexity in network design
    [50:56] Start simple before advanced; consider model needs.
    [53:58] Join us at our first in-person conference on June 25 all about AI Quality

    • 55 min
    Leading Enterprise Data Teams // Sol Rashidi // #227

    Sol Rashidi is an esteemed executive, leader, and influencer within the AI, data, and technology space. Having helped IBM launch Watson in 2011, one of the earliest real-world applications of Artificial Intelligence, Sol has pioneered some of the early advancements in the space.

    Huge thank you to @WeightsBiases for sponsoring this episode. WandB Free Courses - http://wandb.me/courses_mlops

    MLOps podcast #227 with Sol Rashidi, CEO & Co-Founder of ExecutiveAI, Leading Enterprise Data Teams.

    // Abstract
    In the dynamic landscape of MLOps and data leadership, Sol shares invaluable insights on building successful teams and driving impactful projects. In this podcast episode, Sol delves into the importance of prioritizing relationships, introduces a pragmatic "Wrong Use Cases Formula" to streamline project prioritization, and emphasizes the critical role of effective communication in data leadership. Her wealth of experience and practical advice provide a roadmap for navigating the complexities of MLOps and leading data-driven initiatives to success.

    // Bio
    Sol has eight patents granted and 21 filed, and has received awards that include:
    "Top 100 AI People" - 2023
    "The Top 75 Innovators of 2023"
    "Top 65 Most Influential Women in 2023"
    "Forbes AI Maverick of the 21st Century" - 2022
    "Top 10 Global Women in AI & Data" - 2023
    "Top AI 100 Award" - 2023
    "50 Most Powerful Women in Tech" - 2022
    "Global 100 Power List" - 2021, 2022, 2023
    "Top 20 CDOs Globally" - 2022
    "Chief Analytics Officer of the Year" - 2022
    "Isomer Innovators of the Year" - 2021, 2022, 2023
    "Top 100 Innovators in Data & Analytics" - 2020, 2021, 2022, 2023
    "Top 100 Women in Business" - 2022

    Sol is an energetic business executive and a goal-oriented technologist, skilled at coupling her technical acumen with storytelling to articulate business value to both startups and Fortune 100 companies that are leaning into data, AI, and technology as a competitive advantage while preserving the legacy upon which they were founded. Sol has served as a C-Suite member across several Fortune 100 and Fortune 500 companies, including:

    Chief Analytics Officer - Estee Lauder
    Chief Data & Analytics Officer - Merck Pharmaceuticals
    EVP, Chief Data Officer - Sony Music
    Chief Data & AI Officer - Royal Caribbean Cruise Lines
    Sr. Partner leading the Digital & Innovation Practice - Ernst & Young
    Partner leading Watson Go-To-Market & Commercialization - IBM

    Sol now serves as the CEO of ExecutiveAI LLC, a company dedicated to democratizing Artificial Intelligence for humanity. She is considered an outstanding and influential business leader who shapes the space by traveling the world as a keynote speaker and serving as the bridge between established Gen 1.0 markets and those evolving into 4.0.

    // Related Links
    Sol's book will be out on April 30, 2024:
    Your AI Survival Guide: Scraped Knees, Bruised Elbows, and Lessons Learned from Real-World AI Deployments: https://www.amazon.com/Your-Survival-Guide-Real-World-Deployments/dp/1394272634?ref_=ast_author_mpb

    Connect with Sol on LinkedIn: https://www.linkedin.com/in/sol-rashidi-a672291/

    • 42 min
    The Rise of Modern Data Management // Chad Sanderson // #226

    Chad Sanderson is passionate about data quality, and fixing the muddy relationship between data producers and consumers. He is a former Head of Data at Convoy, a LinkedIn writer, and a published author. He lives in Seattle, Washington, and is the Chief Operator of the Data Quality Camp.

    Huge thank you to @amazonwebservices for sponsoring this episode. AWS - https://aws.amazon.com/

    MLOps podcast #226 with Chad Sanderson, CEO & Co-Founder of Gable, The Rise of Modern Data Management.

    // Abstract
    In this session, Chad Sanderson, CEO of Gable.ai and author of the upcoming O'Reilly book "Data Contracts," tackles the necessity of modern data management in an age of hyper-iteration, experimentation, and AI. He explores why traditional data management practices fail and how the cloud has fundamentally changed data development. The talk covers a modern application of data management best practices, including data change detection, data contracts, observability, and CI/CD tests, and outlines the roles of data producers and consumers. Attendees will leave with a clear understanding of modern data management's components and how to leverage them for better data handling and decision-making.

    // Bio
    Chad Sanderson, CEO of Gable.ai, is a prominent figure in the data tech industry, having held key data positions at leading companies such as Convoy, Microsoft, Sephora, Subway, and Oracle. He is also the author of the upcoming O'Reilly book "Data Contracts," and writes about the future of data infrastructure, modeling, and contracts in his newsletter "Data Products."

    // Related Links
    AWS Trainium and Inferentia:
    https://aws.amazon.com/machine-learning/trainium/
    https://aws.amazon.com/machine-learning/inferentia/

    Connect with Chad on LinkedIn: https://www.linkedin.com/in/chad-sanderson/

    • 57 min
    Beyond AGI, Can AI Help Save the Planet? // Patrick Beukema // #225

    Patrick Beukema has a Ph.D. in neuroscience and has worked on AI models for brain decoding, which analyzes the brain's activity to decipher what people are seeing and thinking.

    Huge thank you to LatticeFlow for sponsoring this episode. LatticeFlow - https://latticeflow.ai/

    MLOps podcast #225 with Patrick Beukema, Head / Technical Lead of the Environmental AI, Applied Science Organization at AI2, Beyond AGI, Can AI Help Save the Planet?

    // Abstract
    AI will play a central role in solving some of our greatest environmental challenges. The technology we need to solve these problems is at a nascent stage; we are just getting started. For example, the combination of remote sensing (satellites) and high-performance AI operating at global scale in real time unlocks unprecedented avenues to new intelligence.

    MLOps is often overlooked on AI teams, and there is typically a lot of friction in integrating software engineering best practices into the ML/AI workflow. However, high-performance ML/AI depends on extremely tight feedback loops from the user back to the model, which enable high iteration velocity and, ultimately, continual improvement.

    We are making progress, but environmental causes need your help. Join us in the fight for sustainability and conservation.

    // Bio
    Patrick is a machine learning engineer and scientist with a deep passion for leveraging artificial intelligence for social good. He currently leads the environmental AI team at the Allen Institute for Artificial Intelligence (AI2). His professional interests extend to enhancing scientific rigor in academia, where he is a strong advocate for the integration of professional software engineering practices to ensure reliability and reproducibility in academic research. Patrick holds a Ph.D. from the Center for Neuroscience at the University of Pittsburgh and the Center for the Neural Basis of Cognition at Carnegie Mellon University, where his research focused on neural plasticity and accelerated learning. He applied this expertise to develop state-of-the-art deep learning models for brain decoding of patient populations at a startup, later acquired by BlackRock. His earlier academic work spanned research on recurrent neural networks, causal inference, and ecology and biodiversity.

    // Related Links
    Variety of relevant papers/talks/links on Patrick's website: https://pbeukema.github.io/

    Connect with Patrick on LinkedIn: https://www.linkedin.com/in/plbeukema/

    Timestamps:
    [00:00] AI Quality Conference
    [01:29] Patrick's preferred coffee
    [02:00] Takeaways
    [04:14] Learning how to learn journey
    [07:04] Patrick's day to day
    [08:39] Environmental AI
    [11:07] Environmental AI models
    [14:35] Nature Inspires Scientific Advances
    [18:11] R&D
    [24:58] Iterative Feedback-Driven Development
    [26:37 - 28:07] LatticeFlow Ad
    [33:58] Balancing Metrics for Success
    [38:16] Model Retraining Pipeline
    [44:11] Series Models: Versatility
    [45:57] Edge Models Enhance Output
    [50:22] Custom Models for Specific Data
    [53:53] Wrap up

    • 54 min
    GenAI in Production - Challenges and Trends // Verena Weber // #224

    Verena Weber believes that GenAI is going to transform the way we work and interact with devices. Her mission is to help companies prepare for this transformation. She has strong expertise in NLP and over 7 years of experience in Machine Learning.

    Huge thank you to @zilliz for sponsoring this episode. Zilliz - https://zilliz.com/

    MLOps podcast #224 with Verena Weber, Generative AI Consultant at Verena Weber, GenAI in Production - Challenges and Trends.

    // Abstract
    The goal of this talk is to provide insights into the challenges of Generative AI in production, as well as the trends aiming to solve some of these challenges. The challenges and trends Verena sees are:

    Model size and the move toward mixture-of-experts architectures
    Context window: new breakthroughs in context length
    From unimodality to multimodality: are large action models the next step?
    Regulation in the form of the EU AI Act

    Verena uses the differences between Gemini 1.0 and Gemini 1.5 to exemplify some of these trends.

    // Bio
    Verena leverages GenAI in natural language to elevate business competitiveness and navigate its transformative impact. Her varied experience across multiple roles and sectors underpins her ability to extract business value from AI, blending deep technical expertise with strong business acumen. After graduating, she consulted in Data Science at Deloitte and then advanced her skills in NLP, Deep Learning, and GenAI as a Research Scientist on the Alexa team at Amazon. Passionate about gender diversity in tech, she mentors women to thrive in this field.

    // Related Links
    Website: verenaweber.de
    Sign up for Verena's newsletter: https://verenas-newsletter-63558b.beehiiv.com/
    Zilliz - https://zilliz.com/

    Connect with Verena on LinkedIn: https://www.linkedin.com/in/verena-weber-134178b9/

    Timestamps:
    [00:00] AI Quality Conference
    [01:33] Verena's preferred coffee
    [02:15] Takeaways
    [06:33] Ski Person of Influence
    [11:31] Verena's background in the last 5-10 years
    [14:24] Tech Evolution: Rapid Transformation
    [18:13] Working at Amazon and key challenges
    [20:10] Research-inspired suggestions
    [22:21] AI Updates Impact Workflows
    [22:52] Alexa Query Distribution Analysis
    [24:06] Innovative Solutions for Alexa
    [25:27] Robust T5 Data Prompting
    [27:38] Audio Data Quality Challenges
    [28:21-29:28] Zilliz ad
    [29:28] Alexa data transcription and data cleaning
    [35:38] Considering needs, costs, and complexity
    [37:44] ChatGPT is not ideal for classification
    [39:32] Comparison of model building using TF-IDF
    [45:08] Struggle to boost diversity in conference speakers
    [47:30] Creating safe environments helps underrepresented individuals participate
    [48:29] Wrap up

    • 48 min
    Introducing DBRX: The Future of Language Models // [Exclusive] Databricks Roundtable

    MLOps Coffee Sessions Special episode with Databricks, Introducing DBRX: The Future of Language Models, fueled by our Premium Brand Partner, Databricks.

    DBRX is designed to be especially capable of a wide range of tasks and outperforms other open LLMs on standard benchmarks. It also promises to excel at code and math problems, areas where others have struggled.
    Our panel of experts will get into the technical nuances, potential applications, and implications of DBRX for businesses, developers, and the broader tech community.
    This session is a great opportunity to hear from insiders about how DBRX's capabilities can benefit you.

    // Bio
    Denny Lee - Co-host
    Denny Lee is a long-time Apache Spark™ and MLflow contributor, Delta Lake maintainer, and a Sr. Staff Developer Advocate at Databricks. He is a hands-on distributed systems and data science engineer with extensive experience developing internet-scale data platforms and predictive analytics systems. He previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server.

    Davis Blalock
    Davis Blalock is a research scientist and the first employee at MosaicML. He previously worked at PocketSonics (acquired 2013) and completed his PhD at MIT, where he was advised by John Guttag. He received his M.S. from MIT and his B.S. from the University of Virginia. He is a Qualcomm Innovation Fellow, NSF Graduate Research Fellow, and Barry M. Goldwater Scholar. He is also the author of Davis Summarizes Papers, one of the most widely-read machine learning newsletters.

    Bandish Shah
    Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing. Bandish has over a decade of experience building systems for machine learning and enterprise applications. Prior to MosaicML, Bandish held engineering and development roles at SambaNova Systems where he helped develop and ship the first RDU systems from the ground up, and Oracle where he worked as an ASIC engineer for SPARC-based enterprise servers.

    Abhi Venigalla
    Abhi is an NLP architect helping organizations build their own LLMs using Databricks. He joined as part of the MosaicML team and previously worked as a researcher at Cerebras Systems.

    Ajay Saini
    Ajay is an engineering manager at Databricks leading the GenAI training platform team. He was one of the early engineers at MosaicML (acquired by Databricks) where he first helped build and launch Composer (an open source deep learning training framework) and afterwards led the development of the MosaicML training platform which enabled customers to train models (such as LLMs) from scratch on their own datasets at scale. Prior to MosaicML, Ajay was co-founder and CEO of Overfit, an online personal training startup (YC S20). Before that, Ajay worked on ML solutions for ransomware detection and data governance at Rubrik. Ajay has both a B.S. and MEng in computer science with a concentration in AI from MIT.

    // Related Links
    Website: https://www.databricks.com/
    Databricks DBRX: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

    • 48 min
