AI Engineering Podcast

Tobias Macey

4.3 (6)
TECHNOLOGY

This show is your guidebook to building scalable and maintainable AI systems. You will learn how to architect AI applications, apply AI to your work, and the considerations involved in building or customizing new models. Everything that you need to know to deliver real impact and value with machine learning and artificial intelligence.

1D AGO

Building the Internet of Agents: Identity, Observability, and Open Protocols

Summary In this episode Guillaume de Saint Marc, VP of Engineering at Cisco Outshift, talks about the complexities and opportunities of scaling multi‑agent systems. Guillaume explains why specialized agents collaborating as a team inspire trust in enterprise settings, and contrasts rigid, “lift-and-shift” agentic workflows with fully self-forming systems. We explore the emerging Internet of Agents, the need for open, interoperable protocols (A2A for peer collaboration and MCP for tool calling), and new layers in the stack for syntactic and semantic communication. Guillaume details foundational needs around discovery, identity, observability, and fine-grained, task/tool/transaction-based access control (TBAC), along with Cisco’s open-source Agency initiative, directory concepts, and OpenTelemetry extensions for agent traces. He shares concrete wins in IT/NetOps—network config validation, root-cause analysis, and the CAPE platform engineer agent—showing dramatic productivity gains. We close with human-in-the-loop UX patterns for multi-agent teams and SLIM, a high-performance group communication layer designed for agent collaboration. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Guillaume de Saint Marc about the complexities and opportunities of scaling multi-agent systemsInterview IntroductionHow did you get involved in machine learning?Can you start by giving an overview of what constitutes a "multi-agent" system?Many of the multi-agent services that I have read or spoken about are designed and operated by a single department or organization. What are some of the new challenges that arise when allowing agents to communicate and co-ordinate outside of organizational boundaries?The web is the most famous example of a successful decentralized system, with HTTP being the most ubiquitous protocol powering it. What does the internet of agents look like?What is the role of humans in that equation?The web has evolved in a combination of organic and planned growth and is vastly more complex and complicated than when it was first introduced. What are some of the most important lessons that we should carry forward into the connectivity of AI agents?Security is a critical aspect of the modern web. What are the controls, assertions, and constraints that we need to implement to enable agents to operate with a degree of trust while also being appropriately constrained?The AGNTCY project is a substantial investment in an open architecture for the internet of agents. What does it provide in terms of building blocks for teams and businesses who are investing in agentic services?What are the most interesting, innovative, or unexpected ways that you have seen AGNTCY/multi-agent systems used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on multi-agent systems?When is a multi-agent system the wrong choice?What do you have planned for the future of AGNTCY/multi-agent systems?Contact Info LinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.Links Outshift by CiscoMulti-Agent SystemsDeep LearningMerakiSymbolic ReasoningTransformer ArchitectureDeepSeekLLM ReasoningRené DescartesKanbanA2A (Agent-to-Agent) ProtocolMCP == Model Context ProtocolAGNTCYICANN == Internet Corporation for Assigned Names and NumbersOSI LayersOCI == Open Container InitiativeOASF == Open Agentic Schema FrameworkOracle AgentSpecSplunkOpenTelemetryCAIPE == Community AI Platform EngineerAGNTCY Coffee ShopThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

1h 7m
NOV 2

Agents, IDEs, and the Blast Radius: Practical AI for Software Engineers

Summary In this episode of the AI Engineering Podcast Will Vincent, Python developer advocate at JetBrains (PyCharm), talks about how AI utilities are revolutionizing software engineering beyond basic code completion. He discusses the shift from "vibe coding" to "vibe engineering," where engineers collaborate with AI agents through clear guidelines, iterative specs, and tight guardrails. Will shares practical techniques for getting real value from these tools, including loading the whole codebase for context, creating agent specifications, constraining blast radius, and favoring step-by-step plans over one-shot generations. The conversation covers code review gaps, deployment context, and why continuity across tools matters, as well as JetBrains' evolving approach to integrated AI, including support for external and local models. Will emphasizes the importance of human oversight, particularly for architectural choices and production changes, and encourages experimentation and playfulness while acknowledging the ethics, security, and reliability tradeoffs that come with modern LLMs. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Will Vincent about selecting and using AI software engineering utilities and making them work for your teamInterview IntroductionHow did you get involved in machine learning?Software engineering is a discipline that is relatively young in relative terms, but does have several decades of history. As someone working for a developer tools company, what is your broad opinion on the impact of AI on software engineering as an occupation?There are many permutations of AI development tools. What are the broad categories that you see?What are the major areas of overlap?What are the styles of coding agents that you are seeing the broadest adoption for?What are your thoughts on the role of editors/IDEs in an AI-driven development workflow?Many of the code generation utilities are executed on a developer's computer in a single-player mode. What are some strategies that you have seen or experimented with to extract and share techniques/best practices/prompt templates at the team level?While there are many AI-powered services that hook into various stages of the software development and delivery lifecycle, what are the areas where you are seeing gaps in the user experience?What are the most interesting, innovative, or unexpected ways that you have seen AI used in the context of software engineering workflows?What are the most interesting, unexpected, or challenging lessons that you have learned while working on developer tooling in the age of AI?When is AI-powered the wrong choice?What do you have planned for the future of AI in the context of Jetbrains?What are your predictions/hopes for the future of AI for software engineering?Contact Info Will VincentParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.Links JetBrainsSimon WillisonVibe Engineering PostGitHub CopilotAGENTS.mdCopilot AGENTS.md instructionsKiro IDEClaude CodeJetbrains QuickEditClaude Agent in JetBrains IDEsRuff linteruv package managerty type checkerpyreflyIDE == Integrated Development EnvironmentOllamaLM StudioGoogle GemmaDeepseekgpt-ossOllama CloudGemini DiffusionDjango Annual SurveyCo-Intelligence by Ethan Mollick (affiliate link)The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

59 min
OCT 27

From MRI to World Models: How AI Is Changing What We See

Summary In this episode of the AI Engineering Podcast Daniel Sodickson, Chief of Innovation in Radiology at NYU Grossman School of Medicine, talks about harnessing AI systems to truly understand images and revolutionize science and healthcare. Dan shares his journey from linear reconstruction to early deep learning for accelerated MRI, highlighting the importance of domain expertise when adapting models to specialized modalities. He explores "upstream" AI that changes what and how we measure, using physics-guided networks, prior knowledge, and personal baselines to enable faster, cheaper, and more accessible imaging. The conversation covers multimodal world models, cross-disciplinary translation, explainability, and a future where agents flag abnormalities while humans apply judgment, as well as provocative frontiers like "imaging without images," continuous health monitoring, and decoding brain activity. Dan stresses the need to preserve truth, context, and human oversight in AI-driven imaging, and calls for tools that distill core methodologies across disciplines to accelerate understanding and progress. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Daniel Sodickson about the impact and applications of AI that is capable of image understandingInterview IntroductionHow did you get involved in machine learning?Images and vision are concepts that we understand intuitively, but which have a large potential semantic range. How would you characterize the scope and application of imagery in the context of AI and other autonomous technologies?Can you give an overview of the current state of image/vision capabilities in AI systems?A predominant application of machine vision has been for object recognition/tracking. How are advances in AI changing the range of problems that can be solved with computer vision systems?A substantial amount of work has been done on processing of images such as the digital pictures taken by smartphones. As you move to other types of image data, particularly in non-visible light ranges, what are the areas of similarity and in what ways do we need to develop new processing/analysis techniques?What are some of the ways that AI systems will change the ways that we conceive of What are the most interesting, innovative, or unexpected ways that you have seen AI vision used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on imaging technologies and techniques?When is AI the wrong choice for vision/imaging applications?What are your predictions for the future of AI image understanding?Contact Info LinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.Links MRI == Magnetic Resonance ImagingLinear AlgorithmNon-Linear AlgorithmCompressed SensingDictionary Learning AlgorithmDeep LearningCT ScanCambrian ExplosionLIDAR Point CloudSynthetic Aperture RadarGeoffrey HintonCo-Intelligence by Ethan Mollick (affiliate link)TomographyX-Ray CrystallographyCERNCLIP ModelPhysics-Guided Neural NetworkFunctional MRIA Path Toward Autonomous Machine Intelligence by Yann LeCunThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

49 min
OCT 19

Specs, Tests, and Self‑Verification: The Playbook for Agentic Engineering Teams

Summary In this episode Andrew Filev, CEO and founder of ZenCoder, takes a deep dive into the system design, workflows, and organizational changes behind building agentic coding systems. He traces the evolution from autocomplete to truly agentic models, discusses why context engineering and verification are the real unlocks for reliability, and outlines a pragmatic path from “vibe coding” to AI‑first engineering. Andrew shares ZenCoder’s internal playbook: PRD and tech spec co‑creation with AI, human‑in‑the‑loop gates, test‑driven development, and emerging BDD-style acceptance testing. He explores multi-repo context, cross-service reasoning, and how AI reshapes team communication, ownership, and architecture decisions. He also covers cost strategies, when to choose agents vs. manual edits, and why self‑verification and collaborative agent UX will define the next wave. Andrew offers candid lessons from building ZenCoder—why speed of iteration beats optimizing for weak models, how ignoring the emotional impact of vibe coding slowed brand momentum, and where agentic tools fit across greenfield and legacy systems. He closes with predictions for the next year: self‑verification, parallelized agent workflows, background execution in CI, and collaborative spec‑driven development moving code review upstream. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Your host is Tobias Macey and today I'm interviewing Andrew Filev about the system design and integration strategies behind building coding agents at ZencoderInterview IntroductionHow did you get involved in ML/AI?There have been several iterations of applications for generative AI models in the context of software engineering. How would you characterize the different approaches or categories?Over the course of this summer (2025) the term "vibe coding" gained prominence with the idea that the human just needs to be worried about whether the software does what you ask, not how it is written. How does that sentiment compare to your philosophies on the role of agentic AI in the lifecycle of software?This points at a broader challenge for software engineers in the AI era; how much control can and should we cede to the LLMs, and over what elements of the software process?This also brings up useful questions around the experience of the engineer collaborating with the agent. What are the different interaction patterns that individuals and teams should be thinking of in their use of AI engineering tools?Should the agent be proactive? reactive? what are the triggers for an action to be taken and to what extent?What differentiates a coding agent from an agentic editor?The key challenge in any agent system is context engineering. Software is inherently structured and provides strong feedback loops. But it can also be very messy or difficult to encapsulate in a single context window. What are some of the data structures/indexing strategies/retrieval methods that are most useful when providing guidance to an agent?Software projects are rarely fully self-contained, and often need to cross repository boundaries, as well as manage dependencies. What are some of the more challenging aspects of identifying and accounting for those sometimes implicit relationships?What are some of the strategies that are most effective for yielding productive results from an agent in terms of prompting and scoping of the problem?What are some of the heuristics that you use to determine whether and how to employ an agent for a given task vs. doing it manually?How can the agents assist in the decomposition and planning of complex projects?What are some of the ways that single-player interaction strategies can be turned into team/multi-player strategies?What are some of the ways that teams can create and curate productive patterns to accelerate everyone equally?What are the most interesting, innovative, or unexpected ways that you have seen coding agents used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on coding agents at Zencoder?When is/are Zencoder/coding agents the wrong choice?What do you have planned for the future of Zencoder/agentic software engineering?Contact Info LinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.Links ZencoderWrikeDARPA Robotics ChallengeCognitive ComputingAndrew NgSebastian ThrunGithub CopilotRAG == Retrieval Augmented GenerationRe-rankingClaude Sonnet 3.5SWE-BenchVibe CodingAI First EngineeringWaterfall Software EngineeringAgile Software EngineeringPRD == Project Requirements DocumentBDD == Behavior-Driven DevelopmentVSCodeThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

1h 6m
OCT 11

From Probabilistic to Trustworthy: Building Orion, an Agentic Analytics Platform

Summary In this episode of the AI Engineering Podcast Lucas Thelosen and Drew Gillson talk about Orion, their agentic analytics platform that delivers proactive, push-based insights to business users through asynchronous thinking with rich organizational context. Lucas and Drew share their approach to building trustworthy analysis by grounding in semantic layers, fact tables, and quality-assurance loops, as well as their focus on accuracy through parallel test-time compute and evolving from probabilistic steps to deterministic tools. They discuss the importance of context engineering, multi-agent orchestration, and security boundaries for enterprise deployments, and share lessons learned on consistency, tool design, user change management, and the emerging role of "AI manager" as a career path. The conversation highlights the future of AI knowledge workers collaborating across organizations and tools while simplifying UIs and raising the bar on actionable, trustworthy analytics. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Your host is Tobias Macey and today I'm interviewing Lucas Thelosen and Drew Gillson about their experiences building an agentic analytics platform and the challenges of ensuring accuracy to build trustInterview IntroductionHow did you get involved in machine learning?Can you describe what Orion is and the story behind it?Business analytics is a field that requires a high degree of accuracy and detail because of the potential for substantial impact on the business (positive and negative). These are areas that generative AI has struggled with achieving consistently. What was your process for building confidence in your ability to achieve that threshold before committing to the path you are on now?There are numerous ways that generative AI can be incorporated into the process of designing, building, and delivering analytical insights. How would you characterize the different strategies that data teams and vendors have approached that problem?What do you see as the organizational benefits of moving to a push-based model for analytics?Can you describe the system architecture of Orion?Agentic design patterns are still in the early days of being developed and proven out. Can you give a breakdown of the approach that you are using?How do you think about the responsibility boundaries, communication paths, temporal patterns, etc. across the different agents?Tool use is a key component of agentic architectures. What is your process for identifying, developing, validating, and securing the tools that you provide to your agents?What are the boundaries and extension points that you see when building agentic systems? What are the opportunities for using e.g. A2A for protocol for managing agentic hand-offs?What is your process for managing the experimentation loop for changes to your models, data, prompts, etc. as you iterate on your product?What are some of the ways that you are using the agents that power your system to identify and act on opportunities for self-improvement?What are the most interesting, innovative, or unexpected ways that you have seen Orion used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Orion?When is an agentic approach the wrong choice?What do you have planned for the future of Orion?Contact Info LucasLinkedInDrewLinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.Links GravityOrion Data Engineering Podcast EpisodeSite Reliability EngineeringAnthropic Claude Sonnet 4.5A2A (Agent2Agent) ProtocolSimon WillisonAI Lethal TrifectaBehavioral ScienceGrounded TheoryLLM as a JudgeRLHF == Reinforcement Learning from Human FeedbackThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

1h 12m
OCT 7

Building Production-Ready AI Agents with Pydantic AI

Summary In this episode of the AI Engineering Podcast Samuel Colvin, creator of Pydantic and founder of Pydantic Inc, talks about Pydantic AI - a type-safe framework for building structured AI agents in Python. Samuel explains why he built Pydantic AI to bring FastAPI-like ergonomics and production-grade engineering to agents, focusing on strong typing, minimal abstractions, and reliability, observability, and stability. He explores the evolving agent ecosystem, patterns for single vs. many agents, graphs vs. durable execution, and how Pydantic AI approaches structured I/O, tool calling, and MCP with type safety in mind. Samuel also shares insights on design trade-offs, model-provider churn, schema unification, safe code execution, security gaps, and the importance of open standards and OpenTelemetry for observability. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Your host is Tobias Macey and today I'm interviewing Samuel Colvin about the Pydantic AI framework for building structured AI agentsInterview IntroductionHow did you get involved in machine learning?Can you describe what Pydantic AI is and the story behind it?What are the core use cases and capabilities that you are focusing on with PydanticAI?The agent SDK landscape has been incredibly crowded and volatile since the introduction of LangChain and LlamaIndex. Can you give your summary of the current state of the ecosystem?What are the broad categories that you use when evaluating the various frameworks?Beyond the volatility of the frameworks, there is also a rapid pace of evolution in the different styles/patterns of agents. What are the patterns and integrations that Pydantic AI is best suited for?Can you describe the overall design/architecture of the Pydantic AI framework?How have the design and scope evolved since you first started working on it?For someone who wants to build a sophisticated, production-ready AI agent with Pydantic AI, what is your recommended path from idea to deployment?What are the elements of the framework that help engineers across those different stages of the lifecycle?What are some of the key learnings that you gained from all of your efforts on Pydantic that have been most helpful in developing and promoting Pydantic AI?What are some of the new and exciting failure modes that agentic applications introduce as compared to web/mobile/scientific/etc. applications?What are the most interesting, innovative, or unexpected ways that you have seen Pydantic AI used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pydantic AI?When is Pydantic AI the wrong choice?What do you have planned for the future of Pydantic AI?Contact Info GitHubLinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.Links PydanticPydantic AIPydantic IncPydantic LogfireOpenAI AgentsGoogle ADKLangChainLlamaIndexCrewAIDurable ExecutionTemporalMCP == Model Context ProtocolClaude CodeTypescriptGemini Structured OutputOpenAI Structured OutputDottxt Outlines SDKsmolagentsLiteLLMOpenRouterOpenAI Responses APIFastAPISQLModelAI SDK JavaScriptLangGraphNextJSPyodideAI Elements frontend component libraryThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

51 min
SEP 28

From GPUs to Workloads: Flex AI’s Blueprint for Fast, Cost‑Efficient AI

Summary In this episode of the AI Engineering Podcast Brijesh Tripathi, CEO of Flex AI, talks about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting how access friction and idle infrastructure slow progress. He discusses Flex AI's innovative approach to simplifying heterogeneous compute, standardizing on consistent Kubernetes layers, and abstracting inference across various accelerators, allowing teams to iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. Brijesh also shares insights into Flex AI's strategies for lifting utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity at bay. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Your host is Tobias Macey and today I'm interviewing Brijesh Tripathi about FlexAI, a platform offering a service-oriented abstraction for AI workloadsInterview IntroductionHow did you get involved in machine learning?Can you describe what FlexAI is and the story behind it?What are some examples of the ways that infrastructure challenges contribute to friction in developing and operating AI applications?How do those challenges contribute to issues when scaling new applications/businesses that are founded on AI?There are numerous managed services and deployable operational elements for operationalizing AI systems. What are some of the main pitfalls that teams need to be aware of when determining how much of that infrastructure to own themselves?Orchestration is a key element of managing the data and model lifecycles of these applications. How does your approach of "workload as a service" help to mitigate some of the complexities in the overall maintenance of that workload?Can you describe the design and architecture of the FlexAI platform?How has the implementation evolved from when you first started working on it?For someone who is going to build on top of FlexAI, what are the primary interfaces and concepts that they need to be aware of?Can you describe the workflow of going from problem to deployment for an AI workload using FlexAI?One of the perennial challenges of making a well-integrated platform is that there are inevitably pre-existing workloads that don't map cleanly onto the assumptions of the vendor. What are the affordances and escape hatches that you have built in to allow partial/incremental adoption of your service?What are the elements of AI workloads and applications that you are explicitly not trying to solve for?What are the most interesting, innovative, or unexpected ways that you have seen FlexAI used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on FlexAI?When is FlexAI the wrong choice?What do you have planned for the future of FlexAI?Contact Info LinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Links Flex AIAurora Super ComputerCoreWeaveKubernetesCUDAROCmTensor Processing Unit (TPU)PyTorchTritonTrainiumASIC == Application Specific Integrated CircuitSOC == System On a ChipLoveableFlexAI BlueprintsTenstorrentThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

55 min
SEP 20

Right-Sizing AI: Small Language Models for Real-World Production

Summary In this episode of the AI Engineering Podcast Steven Huels, Vice President of AI Engineering & Product Strategy at Red Hat, talks about the practical applications of small language models (SLMs) for production workloads. He discusses how SLMs offer a pragmatic choice due to their ability to fit on single enterprise GPUs and provide model selection trade-offs. The conversation covers self-hosting vs using API providers, organizational capabilities needed for running production-grade LLMs, and the importance of guardrails and automated evaluation at scale. They also explore the rise of agentic systems and service-oriented approaches powered by smaller models, highlighting advances in customization and deployment strategies. Steven shares real-world examples and looks to the future of agent cataloging, continuous retraining, and resource efficiency in AI engineering. Announcements Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsWhen ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App rely on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and Fast MCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.Your host is Tobias Macey and today I'm interviewing Steven Huels about the benefits of small language models for production workloadsInterview IntroductionHow did you get involved in machine learning?Language models are available in a wide range of sizes, measured both in terms of parameters and disk space. What are your heuristics for deciding what qualifies as a "small" vs. "large" language model?What are the corresponding heuristics for when to use a small vs. large model?The predominant use case for small models is in self-hosted contexts, which requires a certain amount of organizational sophistication. What are some helpful questions to ask yourself when determining whether to implement a model-serving stack vs. relying on hosted options?What are some examples of "small" models that you have seen used effectively?The buzzword right now is "agentic" for AI driven workloads. How do small models fit in the context of agent-based workloads?When and where should you rely on larger models?When speaking of small models, one of the common requirements for making them truly useful is to fine-tune them for your problem domain and organizational data. How has the complexity and difficulty of that operation changed over the past ~2 years?Serving models requires several operational capabilities beyond the raw inference serving. What are the other infrastructure and organizational investments that teams should be aware of as they embark on that path?What are the most interesting, innovative, or unexpected ways that you have seen small language models used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on operationalizing inference and model customization?When is a small or self-hosted language model the wrong choice?What are your predictions for the near future of small language model capabilities/availability?Contact Info LinkedInParting Question From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.Links RedHat AI EngineeringGenerative AIPredictive AIChatGPTQLORAHuggingFacevLLMOpenShift AILlama ModelsDeepSeekGPT-OSSMistralMixture of Experts (MoE)QwenInstructLabSFT == Supervised Fine TuningLORAThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

51 min

See All (69)

TRAILER

Introducing The Show

Hello, and welcome to the Machine Learning Podcast. I’m your host, Tobias Macey. You might know me from the Data Engineering Podcast or the Python Podcast.__init__. If you work with machine learning and AI, or you’re curious about it and want to learn more, then this show is for you. We’ll go beyond the esoteric research and flashy headlines and find out how machine learning is making an impact on the world and creating value for business. Along the way we’ll be joined by the researchers, engineers, and entrepreneurs who are shaping the industry. So go to themachinelearningpodcast.com today to subscribe and stay informed on how ML/AI are being used, how it works, and how to go from idea to production. Support The Machine Learning Podcast

06/03/2022

•

1 min

4.3

out of 5

6 Ratings

Practical AI

03/13/2023

accidentalguru

Is the title. How much clearer could Tobias be? What is with the 1 Star review? Tobias always makes expectations clear and delivers in a way that only somebody who has made his living in the trenches of data, AI, and Python.

Creator

Tobias Macey
Years Active

2022 - 2025
Episodes

69
Rating

Clean
Show Website

AI Engineering Podcast

Technology

Technology

Updated Weekly
Technology

Technology

Updated Semiweekly
Technology

Technology

Updated Weekly
Technology

Technology

Updated Weekly
Technology

Technology

Updated Weekly
Technology

Technology

Updated Weekly
Technology

Technology

Updated Weekly

AI Engineering Podcast

Building the Internet of Agents: Identity, Observability, and Open Protocols

Agents, IDEs, and the Blast Radius: Practical AI for Software Engineers

From MRI to World Models: How AI Is Changing What We See

Specs, Tests, and Self‑Verification: The Playbook for Agentic Engineering Teams

From Probabilistic to Trustworthy: Building Orion, an Agentic Analytics Platform

Building Production-Ready AI Agents with Pydantic AI

From GPUs to Workloads: Flex AI’s Blueprint for Fast, Cost‑Efficient AI

Right-Sizing AI: Small Language Models for Real-World Production

Trailer

Introducing The Show

Ratings & Reviews

Practical AI

About

Information

You Might Also Like

AI Engineering Podcast

Episodes

Building the Internet of Agents: Identity, Observability, and Open Protocols

Agents, IDEs, and the Blast Radius: Practical AI for Software Engineers

From MRI to World Models: How AI Is Changing What We See

Specs, Tests, and Self‑Verification: The Playbook for Agentic Engineering Teams

From Probabilistic to Trustworthy: Building Orion, an Agentic Analytics Platform

Building Production-Ready AI Agents with Pydantic AI

From GPUs to Workloads: Flex AI’s Blueprint for Fast, Cost‑Efficient AI

Right-Sizing AI: Small Language Models for Real-World Production

Trailer

Ratings & Reviews

About

Information

You Might Also Like