Max Agency

LangChain

Welcome to Max Agency, a podcast about how the best AI agents are actually being built. Hosted by Harrison Chase, CEO of LangChain, each episode goes deep with the builders designing, deploying, and learning from real agent systems in the wild. From architecture decisions to evals, tooling, and failure modes, Max Agency is for people who want to understand what it really takes to build useful agents.

Episodes

  1. The best AI agents are secretly teams | Ben Tannyhill, LangChain

    11 hr ago

    Ben Tannyhill is a product manager at LangChain, where he's building LangSmith Engine, an agent that finds and fixes your agent's failures. Engine continuously analyzes your production traces, clusters them into actionable issues, and opens pull requests to fix them. Engine's architecture is a lot like an org chart: a main model delegating to a team of cheaper, faster sub-agents. It launched in public beta at Interrupt 2026, and in this conversation, Ben unpacks why it uses a sandbox as a tool, how the team turned it into a self-improving agent that learns from its own traces, and the hard problem of testing a fix before it ships.

    We also discuss:
    - Why Engine is "the agent for agent engineers"
    - Making LangSmith agent-native with condensed trace views
    - Why the team keeps handing more control to the agent
    - Inside Engine's four sub-agents: the screener, verifier, and more
    - Giving Engine memory with an agent overview document
    - How to keep an always-on agent from blowing the inference budget
    - Where Insights, Polly, and Engine are converging

    Timestamps:
    (00:00) Introduction
    (01:25) LangSmith 101
    (02:22) Why Engine is "the agent for agent engineers"
    (03:49) Under the hood: Engine is a deep agent
    (06:08) Clustering millions of traces with condensed views
    (10:10) Why the team keeps handing more control to the agent
    (13:21) Why Engine uses a sandbox as a tool
    (14:11) Engine's four sub-agents and the org-chart analogy
    (16:51) Evals for Engine: IssueBench, Harbor, and synthetic environments
    (23:05) How Engine evolved: from noisy PRs to an issue inbox
    (25:56) Inside Engine's memory: the agent overview document
    (29:25) How to keep an always-on agent from blowing the inference budget
    (30:52) What models Engine uses
    (31:30) How Engine was rolled out: from Forge to public beta at Interrupt
    (34:18) Inside the two teams building Engine
    (35:53) Where Insights, Polly, and Engine are converging
    (40:06) The missing piece: testing a fix before it ships
    (42:22) Running a branched agent, and the write-access eval problem
    (46:35) Using Engine as long-term memory
    (47:39) Pointing Engine at coding-agent traces
    (48:49) Running Engine on Engine: the meta self-improvement loop

    References: Anthropic, Chat LangChain, Claude Code, Claude Haiku, Claude Opus, Codex, Context Hub, Credit Genie, Deep Agents, Gemini, GPT-5.5, Harbor, Hex, Insights, Interrupt, LangGraph, LangSmith, LangSmith Chat (formerly Polly), LangSmith Engine, LangSmith Observability, Mintlify, OpenAI, Palash Shah, Terminal-Bench, Unify

    Where to find Ben: LinkedIn, Twitter/X
    Where to find Harrison: LinkedIn, Twitter/X
    Where to find LangChain: Website, Docs
    Send feedback or questions to maxagency@langchain.dev

    50 min
  2. The best AI agents are simpler than you think | Zack Reneau-Wedeen, Sierra

    18 Jun

    Zack Reneau-Wedeen is the Head of Product at Sierra, the conversational AI platform behind customer-facing agents for most of the Fortune 20. Before Sierra, he spent seven years at Google as the founding PM for Google Lens and Google Podcasts, then led product at Robinhood and CoinTracker. Sierra is mostly known for customer support, but Zack reveals how and why the company is building agents that span the entire customer lifecycle, from browsing and booking to sales and loyalty. In this conversation, he argues agentic commerce will be bigger than e-commerce, explains why he's a "monolith loyalist", and unpacks why, when a model looks dumb, the problem is usually you.

    We also discuss:
    - How Sierra's no-code layer compiles down to agent code, and back again
    - Why most multi-agent systems just ship your org chart
    - Inside Sierra's modular voice architecture: thinking, listening, and talking in parallel
    - Why Sierra built a PCI-certified stack for voice payments
    - How outcome-based pricing aligns incentives
    - Why there's no breakout memory company

    Timestamps:
    (00:00) Introduction
    (03:39) Analyze, build, release: how you build on Sierra
    (07:54) Inside Ghostwriter
    (11:04) Meeting models on their turf “80% of the time”
    (17:47) The one constraint Claude Code doesn't have
    (19:35) Agent-to-agent: when an API call still beats MCP
    (21:02) Why agentic commerce will be bigger than e-commerce
    (27:31) Running models in parallel and ensembling transcription
    (32:22) Inside the Agent Data Platform
    (40:00) Context engineering: everything it needs, nothing more
    (41:38) "Whenever you think the model's too dumb, the model's actually too smart"
    (46:13) Why multi-agent systems are a trap
    (48:44) Voice 101: latency, naturalism, and 60 languages
    (56:11) When voice-to-voice passes 50%: the over/under
    (57:03) Making memory a first-class primitive
    (1:02:47) Why there's no breakout memory company
    (1:08:02) Why the solution to all AI problems "is more AI"
    (1:09:20) Why Sierra open-sources the tau-bench universe
    (1:14:42) How outcome-based pricing aligns incentives
    (1:20:26) Who thrives as a forward-deployed agent builder
    (1:22:16) The Formula One analogy: why product is the bottleneck
    (1:25:47) How Sierra interviews for agency

    References: Agent2Agent (A2A) Protocol, Anthropic, ChatGPT, Claude, Claude Code, Claude Mythos, Claude Opus 4.5, Codex, Deep Agents, Gemini, Hawaiian Airlines, LangGraph, Model Context Protocol (MCP), Not Another Workflow Builder, Redfin, Sentry, Shopify, Silero, SiriusXM, Stripe, Tau-bench, Thinking Machines Lab

    Where to find Zack: LinkedIn, Twitter/X, Sierra
    Where to find Harrison: LinkedIn, Twitter/X
    Where to find LangChain: Website, Docs
    Send feedback or questions to maxagency@langchain.dev

    1hr 27min
  3. The tool design tricks behind Benchling's AI agents | Nick Larus-Stone

    4 Jun

    Nick Larus-Stone is the Head of AI at Benchling, the R&D data platform that life science companies use to store and manage their experiments, samples, instruments, and analysis. Benchling has been around since 2012. In October 2025, it launched Benchling AI, an intelligence layer with a chat interface, backed by an agent, that helps scientists find data, design experiments, and write reports. Nick came to Benchling through its acquisition of Sphinx Bio, the analysis startup he founded. In this conversation, Nick walks through what it takes to build agents for scientific work, and where the playbook from coding agents holds up and where it breaks down.

    We also discuss:
    - Why Benchling invests so heavily in getting clean data upfront
    - How they cross-check answers between models to get more out of each one
    - Why and how Benchling leans on production traces
    - Where AI actually helps science today, and where it still gets stuck
    - Why understanding LLMs is closer to biology than software engineering

    Timestamps:
    (00:00) Intro
    (01:22) What Benchling AI is, and the 14-year data platform underneath it
    (04:36) Why a decade of structured data is a core advantage
    (05:57) The architecture under the hood
    (08:28) Similarities and differences compared to a coding harness
    (11:14) Benchling’s multi-agent architectures
    (14:36) Dealing with verifiable vs non-verifiable tasks
    (16:19) Doing evals when clean benchmarks aren’t possible
    (18:13) Context engineering: SQL vs. file-based harnesses
    (22:11) Memory: agents that create and update their own skills
    (25:30) What user education for scientists looks like
    (30:33) Why understanding LLMs is closer to biology than software
    (33:28) When will agents discover a novel cure for disease?
    (44:58) The future of harnesses in science
    (48:13) Why fine-tuning on biology hasn't beaten frontier models

    References: Agent Skills (Claude Docs), Benchling’s Deep Research Agent, Claude (Anthropic), Design of experiments (DOE), FDA Investigational New Drug (IND) application, Gemini (Google), Google AI co-scientist, LangSmith, Model Context Protocol (MCP), The Ralph (Wiggum) Loop (Geoffrey Huntley), Sphinx Bio

    Where to find Nick: Benchling, LinkedIn, Twitter/X
    Where to find Harrison: LinkedIn, Twitter/X
    Where to find LangChain: Website, Docs
    Send feedback or questions to maxagency@langchain.dev

    51 min
  4. How Cogent builds AI agents that have to be right every single time | Geng Sng (Co-founder & CTO - Cogent)

    22 May

    Geng Sng is co-founder and CTO of Cogent, which builds autonomous agents that remediate vulnerabilities for enterprise security teams. Today, Cogent's agents process billions of security events per day, maintaining a live context graph of every asset and vulnerability across customer environments. In this conversation, Geng walks through Cogent's hot vs cold context split, the sub-agents that handle side quests, and the two graphs they run in parallel.

    We also discuss:
    - Why defensive security is harder for AI than offensive
    - Under the hood of Cogent's three agents
    - Inside Cogent's “read only” by-default sandboxes
    - Why graph databases don't scale for security data
    - Cogent Research and the move into formal verification
    - Why interactive agents need a deeper planning phase to one-shot

    Referenced: Abnormal AI, Amazon S3, Anthropic, Bash, ChatGPT, Claude Code, Claude Mythos, CodeMender, Codex, Cogent, Cursor, Google DeepMind, GPT-5.5-Cyber, Jupyter, Letta, Mozilla, OpenAI, Opus 4.6, Opus 4.7, Vercel

    Where to find Geng: LinkedIn
    Where to find Harrison: LinkedIn, Twitter/X
    Where to find LangChain: Website, Docs
    Send feedback or questions to maxagency@langchain.dev

    Timestamps:
    (00:00) Why mean time to exploit collapsed from years to minutes
    (02:08) Inside Cogent's Agent Lake architecture
    (05:11) Why Cogent rejected graph databases
    (10:48) The trust ladder before agents touch production
    (15:13) The three types of agents inside Cogent
    (17:07) How Cogent sandboxes its agents
    (19:16) Short-circuiting interactive agents with a deeper planning phase
    (24:31) What to do when users believe agents too much
    (31:21) Why sub-agents let agents go on side quests
    (34:59) Two-tiered evals and the metric that catches bad prompts
    (40:00) Cogent’s unique approach to context
    (48:39) Cogent Research and the move into formal verification
    (51:33) The single trait Cogent hires for
    (54:00) Open-sourcing models within six months
    (57:07) Why defensive security won’t be commoditized anytime soon
    (1:00:51) The founding insight behind Cogent

    1hr 15min
  5. How Ramp built an AI agent that can think outside of tokens | Alex Shevchenko

    7 May

    Alexander Shevchenko is the head of applied research at Ramp, where he leads Ramp Labs, the team behind Ramp Sheets and a steady stream of public AI engineering experiments. Ramp Sheets started as an internal process mining tool that turned Loom videos of accountants into Markov diagrams, before evolving into the agentic spreadsheet editor that shipped in November. In this conversation, Alex walks through the architecture under the hood, why Ramp biases the agent toward Excel formulas over Python code gen, and two recent Labs experiments: Latent Briefing and a user-steerable revival of Golden Gate Claude.

    We also discuss:
    - Under the hood of Ramp Sheets
    - Inspect, Ramp's internal coding agent, and the self-improving monitor loop it powers
    - Why finance professionals rejected code gen as too "black box"
    - Why Anthropic models tend to excel at agentic spreadsheet manipulation
    - The case for putting the agent outside the sandbox, not inside it
    - The Loom-to-Markov-diagram process mining pipeline
    - RLMs and how subagents can share memory in latent space
    - Latent Briefing and KV-cache communication between subagents
    - Reviving Golden Gate Claude with steering vectors on Gemma

    Referenced: Alex Levinson, Anthropic, Ben Geist, Claude, Efficient Memory Sharing for Multi-Agent Systems via KV Cache Compaction (Ben Geist), Gemma, Golden Gate Claude, Graphviz, Inspect, Latent Briefing, Loom, Modal, OpenAI, Opus, Qwen, Ramp, Ramp Labs, Ramp Sheets, Recursive Language Models (Alex Zhang), Retool, Self-maintaining Ramp Sheets, Steer AI

    Where to find Alex: LinkedIn, Twitter/X, Website
    Where to find Harrison: LinkedIn, Twitter/X
    Where to find LangChain: Website, Docs
    Send feedback or questions to maxagency@langchain.dev

    Timestamps:
    (00:00) Introduction
    (01:13) The origin of Ramp Sheets
    (02:27) The Loom-to-Markov-diagram process mining pipeline
    (04:28) Why code gen approaches felt too "black box" to finance
    (06:13) Meeting finance where they already are: inside the spreadsheet
    (09:08) How far process mining got them
    (10:31) Text descriptions and Graphviz DAGs as output
    (12:41) Under the hood of Ramp Sheets
    (14:52) Why the agent uses Python only as an escape hatch
    (15:47) Why Anthropic models excel at agentic spreadsheet manipulation
    (17:12) Frankensteining the OpenAI Agents SDK
    (17:43) The Ramp Sheets UX and fast vs. expert mode
    (19:58) Agent in a sandbox vs. agent with a sandbox
    (21:55) Vibe evals with expert humans
    (23:40) Inspect, the internal coding agent
    (24:13) The self-monitoring loop and auto-PRs
    (28:01) Other wacky experiments on Sheets
    (28:43) Memory experiments that didn't pan out
    (31:16) Latent Briefing and KV-cache subagent communication
    (35:13) Reviving Golden Gate Claude
    (37:47) Contrastive pairs and steering vectors
    (39:47) Picking the right layers in Gemma
    (41:37) What Ramp Labs looks for when hiring

    44 min
  6. How Listen is building a system of AI Agents & subagents for specialized tasks | Florian Juengermann, CTO

    23 Apr

    Florian Juengermann is the co-founder and CTO of Listen, an AI startup that turns qualitative research across hundreds of interviews, surveys, and focus groups into structured, traceable insights. Listen's agents analyze responses at scale, and Florian has rearchitected the system multiple times to get there. In this conversation, he walks through the virtual table architecture at the core of their Research Agent, how small models run map-reduce classification across thousands of open-ended responses, and the self-reviewing feedback subagent that catches errors during long async runs.

    We also discuss:
    - The three agents inside Listen's platform
    - How Listen rearchitected from a simple RAG bot to a multi-agent system multiple times
    - Why the PowerPoint subagent was completely rebuilt using the Claude Code SDK
    - Contextual prompt engineering as an alternative to skills
    - How Listen keeps report numbers live as new interview responses come in
    - When to trigger the long-running agent vs. showing early results
    - What Florian looks for when hiring agent engineers

    References: Anthropic, ChatGPT, Claude, Claude Code SDK, E2B, Emotional Intelligence, GPT Mini, Haiku, Listen, OpenAI, Pandas, Postgres, Python, Research Agent, Render, Zoom

    Where to find Florian: LinkedIn, Twitter/X
    Where to find Harrison: LinkedIn, Twitter/X
    Where to find LangChain: Website, Docs
    Send feedback or questions to maxagency@langchain.dev

    Timestamps:
    (00:00) Introduction
    (01:25) The three agents inside Listen's platform
    (03:15) Live chat vs. long async runs, and how Listen tunes for each
    (05:33) Under the hood of the Research Agent
    (06:37) Listen's virtual table architecture
    (07:34) How small models classify thousands of open-ended responses
    (10:05) Running code in a sandbox: how E2B fits in
    (11:52) Why Listen rebuilt the PowerPoint subagent from scratch
    (14:11) Contextual prompt engineering instead of skills
    (16:32) The feedback subagent that reviews its own reports
    (18:14) How Listen runs evals in production
    (19:47) Unexpected ways users push the agent to its limits
    (21:42) How many times Listen has rearchitected, and why
    (24:59) Trace observability: depth over breadth
    (26:10) Lessons from running Claude Code SDK inside E2B
    (27:42) Memory: what's solved and what isn't
    (29:10) The Composer agent UX: co-editing a document with AI
    (35:50) How Listen keeps report numbers live as new responses come in
    (43:47) What Listen looks for when hiring agent engineers

    48 min
  7. How Hex builds AI agents that reason like human data analysts | Izzy Miller, AI Engineer

    9 Apr

    Izzy Miller is an AI engineer at Hex, an AI analytics platform that was one of the first companies to ship data agents to real paying users. Today, Hex runs a multi-agent system with nearly 100K tokens of tools, and Izzy is building a 90-day simulation to evaluate whether those agents actually get smarter over time. In this conversation, he walks through the harness decisions that shaped their architecture, the failure modes Hex is seeing at scale, and what it takes to build an eval that no current model can pass.

    We also discuss:
    - Why data agents are harder to verify than coding agents
    - Under the hood of Hex’s agents
    - How Hex is unifying separate agents
    - Why most eval sets are bad
    - The 90-day simulation for long-horizon evals
    - How Izzy went from marketing to AI engineer

    References: Andon Labs, Anthropic, Barry McCardel, ChatGPT, Claude Code, Claude Sonnet 4.6, DBT, GPT-3.5 Turbo, GPT-5.3 Codex Spark, GPT-5.4, Hex, LangChain, LangSmith, Looker, OpenAI, Opus 4.6, Satya Nadella, Snowflake, Vending Machine

    Where to find Izzy: LinkedIn, Twitter/X
    Where to find Harrison: LinkedIn, Twitter/X
    Where to find LangChain: Website, Docs
    Send feedback or questions to maxagency@langchain.dev

    Timestamps:
    (01:35) Where Hex's notebook agent started
    (03:46) The moment Hex knew it was time for agents
    (07:36) Why data agents are harder to verify than coding agents
    (09:30) How Hex is unifying separate agents
    (13:28) Under the hood of the notebook agent
    (15:41) The harness features that are now holding the agent back
    (17:41) Why Hex built their own orchestrator
    (18:59) Managing nearly 100K tokens of tools
    (20:49) Ephemeral queries and agent behavior trade-offs
    (24:46) The UX problem with showing agents' thinking
    (27:28) Why verification is harder than transparency for data agents
    (31:00) Memory, context conflicts, and collapse modes
    (34:38) How Hex built their internal eval system
    (39:29) Why most eval sets are bad
    (44:30) The 900% quota eval that every model fails
    (46:55) Model upgrades and the "in distribution" debate
    (51:34) How Izzy went from marketer to AI engineer
    (59:59) The 90-day simulation for long-horizon evals

    1hr 8min
