How AI Is Built

Nicolay Gerold

Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.

  1. #051 Build systems that can be debugged at 4am by tired humans with no context

    1 day ago


    Nicolay here. Today I have the chance to talk to Charity Majors, CEO and co-founder of Honeycomb, who has recently been writing about the cost crisis in observability. "Your source of truth is production, not your IDE - and if you can't understand your code there, you're flying blind."

    The key insight is architecturally simple but operationally transformative: replace your 10-20 observability tools with wide structured events that capture everything about a request in one place. Most teams store the same request data across metrics, logs, traces, APM, and error tracking - creating a 20X cost multiplier while making debugging nearly impossible, because you're reconstructing stories from fragments. Charity's approach flips this: instrument once with rich context, derive everything else from that single source. This isn't just about cost - it's about giving engineers the connective tissue to understand distributed systems. When you can correlate "all requests failing from Android version X in region Y using language pack Z," you find problems in minutes instead of days.

    The second key practice is putting developers on call for their own code. This creates the tight feedback loop that makes engineers write more reliable software - because nobody wants to get paged at 3am for their own bugs.

    In the podcast, we also touch on:
    Why deploy time is the foundational feedback loop (15 minutes vs. 15 hours changes everything)
    The controversial "developers on call" stance and why ops people rarely found companies
    How microservices made everything trace-shaped and killed traditional metrics approaches
    The "normal engineer" philosophy - building for 4am debugging, not peak performance
    AI making "code of unknown quality" the new normal
    Progressive deployment strategies (kibble → dogfood → production)
    and more

    💡 Core Concepts
    Wide Structured Events: Capturing all request context in one instrumentation event instead of scattered log lines - enables correlation analysis that's impossible with fragmented data (see the first sketch after these show notes).
    Observability 2.0: Moving from metrics-as-workhorse to structured-data-as-workhorse, where you instrument once and derive metrics, alerts, and dashboards from the same rich dataset.
    SLO-based Alerting: Replacing symptom alerts (CPU, memory, disk) with customer-impact alerts that measure whether you're meeting promises to users (second sketch below).
    Progressive Deployment: Gradual rollout through staged environments (kibble → dogfood → production) that builds confidence without requiring 2X infrastructure.
    Trace-shaped Systems: An architecture pattern recognizing that distributed-systems problems are fundamentally about correlating events across time and services, not about isolated metrics.

    📶 Connect with Charity: LinkedIn · Bluesky · Personal Blog · Company
    📶 Connect with Nicolay: LinkedIn · X / Twitter · Website

    ⏱️ Important Moments
    Gateway Drug to Engineering: [01:04] How IRC and bash tab completion sparked Charity's fascination with Unix command-line possibilities.
    ADHD and Incident Response: [01:54] Why high-pressure outages brought out her best work - getting "dead calm" when everything's broken.
    Code vs. Production Reality: [02:56] Evolution from focusing on code beauty to understanding performance, behavior, and maintenance over time.
    The Alexander's Horse Principle: [04:49] Auto-deployment as daily practice - if you grow up deploying constantly, it feels natural by the time you scale.
    Production as Source of Truth: [06:32] Why your IDE output doesn't matter if you can't understand your code's intersection with infrastructure and users.
    The Logging Evolution: [08:03] Moving from debugger-style spam logs to fewer, wider structured events oriented around units of work.
    Bubble Up Anomaly Detection: [10:27] How correlating dimensions reveals that failures cluster around specific Android versions, regions, and feature combinations.
    Everything is Trace-Shaped: [12:45] Why microservices complexity is about locating problems in distributed systems, not just identifying them.
    AI as Acceleration of Automation: [15:57] Most AI panic could be replaced with "automation" - it's the same pattern, just faster feedback loops.
    Non-determinism as Genuinely New: [16:51] The one aspect of AI that's actually novel in software systems, requiring new architectural patterns.
    The Cost Crisis: [22:30] How 10-20 observability tools create unsustainable cost multipliers as businesses scale.
    The Instrumentation Habit: [23:15] Always looking at your code in production after deployment to build informed instincts about system behavior.
    SLO Revolution: [28:40] Deleting 90% of alerts by focusing on customer impact instead of system symptoms.
    Shrinking Feedback Loops: [34:28] Keeping deploy-to-validation under one hour so engineers can connect actions to outcomes.
    Progressive Deployment Strategy: [36:43] Kibble → Dogfood → Production pipeline for gradual confidence building.
    Normal Engineer Design: [38:12] Building systems that work for tired humans at 4am, not just for heroes during business hours.
    Real Engineering Bar: [49:00] Discussion of what actually makes exceptional vs. normal engineers.

    🛠️ Tools & Tech Mentioned
    Honeycomb - Observability platform for structured events
    OpenTelemetry - Vendor-neutral instrumentation framework
    IRC - Early gateway to computing
    Parse - Mobile backend where Honeycomb's origin story began

    📚 Recommended Resources
    "In Praise of Normal Engineers" - Charity's blog post
    "How I Failed" by Tim O'Reilly
    "Looking at the Crux" by Richard Rumelt
    "Fluke" - Book about randomness in history
    "Engineering Management for the Rest of Us" by Sarah Drasner
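    To make "instrument once with wide events" concrete, here is a minimal Python sketch of the pattern - plain stdlib, not Honeycomb's or OpenTelemetry's actual API; do_work and every field name are illustrative.

        import json
        import time
        import uuid

        def do_work(request):
            pass  # stand-in for the actual business logic

        def handle_request(request, user):
            # One wide event per unit of work: every dimension you might
            # want to correlate on later goes into a single record.
            event = {
                "request_id": str(uuid.uuid4()),
                "service": "checkout",
                "endpoint": request.get("path"),
                "user_id": user.get("id"),
                "android_version": request.get("android_version"),
                "region": request.get("region"),
                "language_pack": request.get("language_pack"),
            }
            start = time.monotonic()
            try:
                do_work(request)
                event["status"] = "ok"
            except Exception as exc:
                event["status"] = "error"
                event["error"] = repr(exc)
                raise
            finally:
                event["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
                print(json.dumps(event))  # emit once; derive metrics and alerts downstream

        handle_request({"path": "/checkout", "region": "eu", "android_version": "14"}, {"id": "u1"})

    Because every dimension lives in one record, "all requests failing from Android version X in region Y" becomes a filter over a single dataset instead of a join across logs, metrics, and traces.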

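    And since alerts are derived from the same events, SLO-based alerting falls out as a query over them. A hedged sketch - the target value is made up, and production systems typically alert on error-budget burn rate rather than a raw success rate:

        def slo_alert(events, slo_target=0.999):
            # Page on customer impact (success rate vs. the SLO),
            # not on symptoms like CPU, memory, or disk.
            if not events:
                return None
            good = sum(1 for e in events if e["status"] == "ok")
            rate = good / len(events)
            if rate < slo_target:
                return f"PAGE: success rate {rate:.4%} below SLO target {slo_target:.1%}"
            return None  # within budget: let the humans sleep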
    1 hr 6 min
  2. #050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster

    May 27


    Nicolay here. Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production. Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about production AI systems and is building his own AI writing assistant. His philosophy is refreshingly simple: stop overthinking, start building, and let patterns emerge through use.

    The key insight that stuck with me: "If you don't feel the algorithm - like have a strong intuition about how components should work together - you can't innovate, you just copy paste stuff." This hits hard because so much of current AI development is exactly that - copy-pasting from tutorials without understanding the why.

    Paul's approach to frameworks is particularly controversial. He uses LangChain and similar tools for quick prototyping - maybe an hour or two to validate an idea - then throws them away completely. "They're low-code tools," he says. "Not good frameworks to build on top of." Instead, he advocates for writing your own database layers and using industrial-grade orchestration tools. Yes, it's more work upfront. But when you need to debug or scale, you'll thank yourself.

    In the podcast, we also cover:
    Why fine-tuning is almost always the wrong choice
    The "just-in-time" learning approach for staying sane in AI
    Building writing assistants that actually preserve your voice
    Why robots, not chatbots, are the real endgame

    💡 Core Concepts
    Agentic Patterns: These patterns seem complex but are straightforward to implement once you understand the core loop. ReAct: agents that Reason, Act, and Observe in a loop (see the first sketch after these show notes). Reflection: agents that review and improve their own outputs.
    Fine-tuning vs. Base Model + Prompting: Fine-tuning takes a pre-trained model and trains it further on your specific data. The alternative is using base models with careful prompting and context engineering. Paul's take: "Fine-tuning adds so much complexity... if you add fine-tuning to create a new feature, it's just from one day to one week."
    RAG: A technique where you retrieve relevant documents and include them in the LLM's context to generate better responses (second sketch below). Paul's approach: "In the beginning I also want to avoid RAG and just introduce a more guided research approach. Like I say, hey, these are the resources that I want to use in this article."

    📶 Connect with Paul: LinkedIn · X / Twitter · Newsletter · GitHub · Book
    📶 Connect with Nicolay: LinkedIn · X / Twitter · Bluesky · Website · My Agency Aisbach (for AI implementations / strategy)

    ⏱️ Important Moments
    From CUDA to LLMs: [02:20] Paul's journey from writing CUDA kernels and 3D object detection to modern AI applications.
    AI Content Is Natural Evolution: [11:19] Why AI writing tools are like the internet transition for artists - tools change, creativity remains.
    End-to-End First: [22:44] "I don't focus on accuracy, performance, or latency initially. I just want an end-to-end process that works."
    Fine-Tuning Complexity Bomb: [27:41] How fine-tuning turns 1-day features into 1-week experiments.
    The Framework Trap: [36:41] "I see them as no code or low code tools... not good frameworks to build on top of."
    The Orchestration Solution: [40:04] Why Temporal, DBOS, and Restate beat LLM-specific orchestrators.
    Robot Vision: [50:29] Why LLMs are just stepping stones to embodied AI, and the unsolved challenges ahead.
    Hype Filtering System: [54:06] Paul's approach: read about new tools, wait 2-3 months, only adopt if still relevant.
    Just-in-Time vs. Just-in-Case: [57:50] The crucial difference between learning for potential needs and learning for immediate application.

    🛠️ Tools & Tech Mentioned
    LangGraph (for prototyping only)
    Temporal (durable execution)
    DBOS (simpler orchestration)
    Restate (developer-friendly orchestration)
    Ray (distributed compute)
    uv (Python packaging)
    Prefect (workflow orchestration)

    📚 Recommended Resources
    The Economist Style Guide (for writing)
    Brandon Sanderson's Writing Approach (worldbuilding first)
    LangGraph Academy (free, covers agent patterns)
    Ray Documentation (Paul's next deep dive)

    🔮 What's Next
    Next week, we take a detour into the networking behind voice AI with Russell D'Sa from LiveKit.

    💬 Join The Conversation
    Follow How AI Is Built on YouTube, Bluesky, or Spotify. If you have any suggestions for future guests, feel free to leave them in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky, or at nicolay.gerold@gmail.com. I will be opening a Discord soon to get you more involved in the episodes - stay tuned for that.

    ♻️ I am trying to build the new platform for engineers to share the experience they have earned after building and deploying stuff into production. Pay it forward by sharing this with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️
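    To ground the ReAct pattern from the Core Concepts above, here is a minimal sketch of the reason/act/observe loop in Python. This is not any framework's API: call_llm is a placeholder for your model client, and the expected step format is an assumption for illustration.

        # Minimal ReAct loop: the model reasons and picks an action, we
        # execute it, and the observation is fed back into the context.
        def call_llm(messages):
            raise NotImplementedError("plug in your model client here")

        TOOLS = {
            "search": lambda q: f"top results for {q!r}",  # stub tool
            "read_file": lambda path: open(path).read(),   # stub tool
        }

        def react_agent(task, max_steps=5):
            messages = [{"role": "user", "content": task}]
            for _ in range(max_steps):
                # Assumed output shape: {"thought": ..., "action": ..., "input": ...}
                # or {"answer": ...} once the model decides it is done.
                step = call_llm(messages)
                if "answer" in step:
                    return step["answer"]
                observation = TOOLS[step["action"]](step["input"])
                messages.append({"role": "assistant", "content": str(step)})
                messages.append({"role": "user", "content": f"Observation: {observation}"})
            return "stopped after max_steps without an answer"

    A Reflection agent is the same loop with one extra call: feed the draft output back in and ask the model to critique and revise it.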

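    The RAG definition above also fits in a few lines. A toy sketch, with naive word-overlap retrieval standing in for a real vector store - the scoring is deliberately simplistic:

        def retrieve(query, docs, k=3):
            # Rank documents by word overlap with the query (toy stand-in
            # for embedding similarity), return the top k.
            q = set(query.lower().split())
            ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
            return ranked[:k]

        def rag_prompt(query, docs):
            context = "\n\n".join(retrieve(query, docs))
            return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

        docs = ["Temporal provides durable execution",
                "Ray handles distributed compute",
                "uv is a fast Python package manager"]
        print(rag_prompt("what gives durable execution?", docs))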
    1 hr 7 min
  3. #050 TAKEAWAYS Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster

    May 27


    (Show notes identical to episode #050 above.)

    11 min
  4. #049 BAML: The Programming Language That Turns LLMs into Predictable Functions

    May 20


    Nicolay here. I think by now we are done with marveling at the latest benchmark scores of the models. It doesn't tell us much anymore that the latest generation outscores the previous by a few basis points. If you don't know how the LLM performs on your task, you are just duct-taping LLMs into your systems. If your LLM-powered app can't survive a malformed emoji, you're shipping liability, not software.

    Today, I sat down with Vaibhav (co-founder of Boundary) to dissect BAML - a DSL that treats every LLM call as a typed function. It's like swapping duct-taped Python scripts for a purpose-built compiler. Vaibhav advocates for building first-principles-based primitives. One principle stood out: LLMs are just functions; build like that from day 1. Wrap them, test them, and loop in a human only where it counts. Once you adopt that frame, reliability patterns fall into place: fallback heuristics, model swaps, classifiers - the same playbook we already use for flaky APIs.

    We also cover:
    Why JSON constraints are the wrong hammer - and how Schema-Aligned Parsing fixes it
    Whether "durable" should be a first-class keyword (think async/await for crash-safety)
    Shipping multi-language AI pipelines without forcing a Python microservice
    Token-bloat surgery, symbol tuning, and the myth of magic prompts
    How to keep humans sharp when 98% of agent outputs are already correct

    💡 Core Concepts
    Schema-Aligned Parsing (SAP): Parse first, panic later. The model can hand you Markdown, half-baked YAML, or rogue quotes - SAP coerces it into your declared type or raises. No silent corruption. (See the coercion sketch after these show notes.)
    Symbol Tuning: Labels eat up tokens and often don't help accuracy (in some cases they even hurt). Rename PasswordReset to C7, keep the description human-readable.
    Durable Execution: A computing paradigm where program execution state persists despite failures, interruptions, or crashes, so operations resume exactly where they left off even when systems go down (second sketch below).
    Prompt Compression: Every extra token is latency, cost, and entropy. Axe filler words until the prompt reads like assembly. If output degrades, you cut too deep - back off one line.

    📶 Connect with Vaibhav: LinkedIn · X / Twitter · BAML
    📶 Connect with Nicolay: Newsletter · LinkedIn · X / Twitter · Bluesky · Website · My Agency Aisbach (for AI implementations / strategy)

    ⏱️ Important Moments
    New DSL vs. Python Glue: [00:54] Why bolting yet another microservice onto your stack is cowardice; BAML compiles instead of copies.
    Three-Nines on Flaky Models: [04:27] Designing retries, fallbacks, and human overrides for when GPT eats dirt 5% of the time.
    Native Go SDK & OpenAPI Fatigue: [06:32] Killing thousand-line generated clients; typing go get instead.
    "LLM = Pure Function" Mental Model: [15:58] Replace mysticism with f(input) → output; unit-test it like any other function.
    Tool-Calling as a Switch Statement: [18:19] Multi-tool orchestration boils down to switch(action) {…} - no cosmic "agent" needed.
    Sneak Peek - durable Keyword: [24:49] Crash-safe workflows without shoving state into S3 and praying.
    Symbol Tuning Demo: [31:35] Swapping verbose labels for C0, C1 slashes token cost and bias in one shot.
    Inside SAP Coercion Logic: [47:31] Int arrays to ints, scalars to lists, bad casts raise - deterministic, no LLM in the loop.
    Frameworks vs. Primitives Rant: [52:32] Why BAML ships primitives and leaves the "batteries" to you - less magic, more control.

    🛠️ Tools & Tech Mentioned
    BAML DSL & Playground
    Temporal · Prefect · DBOS
    outlines · Instructor · LangChain

    📚 Recommended Resources
    BAML Docs
    Schema-Aligned Parsing (SAP)

    🔮 What's Next
    Next week, we continue going deeper into getting generative AI into production, talking to Paul Iusztin.

    💬 Join The Conversation
    Follow How AI Is Built on YouTube, Bluesky, or Spotify. If you have any suggestions for future guests, feel free to leave them in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky, or at nicolay.gerold@gmail.com. I will be opening a Discord soon to get you more involved in the episodes - stay tuned for that.

    ♻️ Here's the deal: I'm committed to bringing you detailed, practical insights about AI development and implementation. In return, I have two simple requests: hit subscribe right now to help me understand what content resonates with you, and if you found value in this post, share it with one other developer or tech professional who's working with AI. That's our agreement - I deliver actionable AI insights, you help grow this. ♻️
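    The SAP coercion rules mentioned at [47:31] are easy to picture in code. A toy illustration - this is not BAML's actual parser, just the flavor of deterministic coercion it describes:

        def coerce(value, target):
            # Bend common shape mismatches into the declared type;
            # raise on anything lossy instead of corrupting silently.
            if isinstance(value, target):
                return value
            if isinstance(value, list) and len(value) == 1 and target is not list:
                return coerce(value[0], target)          # [42] -> 42
            if target is list:
                return [value]                           # 42 -> [42]
            if target is int and isinstance(value, str) and value.strip().lstrip("-").isdigit():
                return int(value)                        # "42" -> 42
            raise TypeError(f"cannot coerce {value!r} to {target.__name__}")

        assert coerce([42], int) == 42   # int array -> int
        assert coerce(7, list) == [7]    # scalar -> list
        assert coerce("13", int) == 13   # quoted number -> int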

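    Durable execution, as defined above, can likewise be sketched in a few lines: persist each completed step so a crashed run resumes where it left off. A deliberately naive version - real systems like Temporal or DBOS do this transactionally:

        import json, os

        def durable_step(name, fn, state_file="workflow_state.json"):
            # Load whatever progress a previous (possibly crashed) run made.
            state = json.load(open(state_file)) if os.path.exists(state_file) else {}
            if name in state:
                return state[name]        # step already completed: skip re-execution
            result = fn()                 # run the step for the first time
            state[name] = result
            with open(state_file, "w") as f:
                json.dump(state, f)       # checkpoint before moving on
            return result

        # A two-step workflow that survives a crash between the steps:
        doc = durable_step("extract", lambda: "raw text")
        summary = durable_step("summarize", lambda: f"summary of {doc}")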
    1 hr 3 min
  5. #049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions

    May 20


    (Show notes identical to episode #049 above.)

    1 hr 13 min
  6. #048 TAKEAWAYS Why Your AI Agents Need Permission to Act, Not Just Read

    May 13


    (Show notes identical to episode #048 below.)

    7 min
  7. #048 Why Your AI Agents Need Permission to Act, Not Just Read

    May 11


    Nicolay here. Most AI conversations obsess over capabilities. This one focuses on constraints - the right ones that make AI actually useful rather than just an impressive demo.

    Today I have the chance to talk to Dexter Horthy, who recently put out a long piece called "12-factor agents". It's like the 10 commandments, but for building agents. One of them is "Contact humans with tool calls": the LLM can call humans for high-stakes decisions or "writes".

    The key insight is brutally simple. AI can get to 90% accuracy on most tasks - good enough for spam-like activities but disastrous for anything that requires trust. The solution isn't to wait for models to get smarter; it's to add a human approval layer for critical actions. Imagine you are writing to a database or sending an email. Each "write" has to be approved by a human. So you post the email in a Slack channel, and in most cases your sales people will approve. In the other 10%, it's stopped in its tracks and the human can take over. You stop the slop and collect good training data in the meantime. Dexter's company is building exactly this: an approval mechanism that lets AI agents send requests to humans before executing.

    In the podcast, we also touch on a bunch of other things:
    MCP, and why "MCP servers" are (at the moment) really just thin clients
    Are we training LLMs toward mediocrity?
    What infrastructure do we need for human-in-the-loop (e.g. DBOS)?
    and more

    💡 Core Concepts
    Context Engineering: Crafting the information representation for LLMs - selecting optimal data structures, metadata, and formats to ensure models receive precisely what they need to perform effectively.
    Token Bloat Prevention: Ruthlessly eliminating irrelevant information from context windows to keep agents focused during complex tasks and to prevent the pattern of repeating failed approaches (second sketch after these show notes).
    Human-in-the-loop Approval Flows: Achieving 99% reliability through a "90% AI + 10% human oversight" framework, where agents analyze data and suggest actions but request explicit permission before execution (first sketch below).
    Rubric Engineering: Systematically evaluating AI outputs through dimension-specific scoring criteria to provide precise feedback and identify exceptional results, helping escape the trap of models converging toward mediocrity.

    📶 Connect with Dexter: LinkedIn · X / Twitter · Company
    📶 Connect with Nicolay: LinkedIn · X / Twitter · Bluesky · Website · My Agency Aisbach (for AI implementations / strategy)

    ⏱️ Important Moments
    MCP Servers as Clients: [03:07] Dexter explains why what many call "MCP servers" actually function more like clients when you examine the underlying code.
    Authentication Challenges: [04:45] How authentication should be handled in MCP implementations, and whether it belongs in the protocol.
    Asynchronous Agent Execution: [08:18] How to handle agents that need to pause for human input without wasting tokens on continuous polling.
    Token Bloat Prevention: [14:41] Strategies for keeping context windows focused and efficient, moving beyond standard chat formats.
    Generating Options vs. Deterministic Outputs: [19:44] The unexplored potential of having AI generate diverse creative options for human selection.
    Fine-tuning vs. RAG for Writing Style: [20:05] Contrasting fine-tuning on personal writing style with putting examples in the context window.
    Context Engineering: [29:06] The idea that everything in AI agent development ultimately comes down to effective context engineering.
    Data Labeling Interfaces: [35:25] The need for better, lower-friction interfaces to collect human feedback on AI outputs.
    The "Mediocrity Convergence" Question: [37:11] The philosophical concern that popular LLMs may inevitably trend toward average quality.
    Human-in-the-loop Approval Flows: [42:46] The core approach of HumanLayer: letting agents ask permission before taking action.

    🛠️ Tools & Tech Mentioned
    MCP
    OpenControl
    DBOS
    Temporal
    Cursor

    📚 Recommended Resources
    12 Factor Agents
    BAML Docs
    Rubric Engineering

    🔮 What's Next
    Next week, we continue going deeper into getting generative AI into production, talking to Vaibhav from BAML.

    💬 Join The Conversation
    Follow How AI Is Built on YouTube, Bluesky, or Spotify. If you have any suggestions for future guests, feel free to leave them in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky, or at nicolay.gerold@gmail.com. I will be opening a Discord soon to get you more involved in the episodes - stay tuned for that.

    ♻️ I am trying to build the new platform for engineers to share the experience they have earned after building and deploying stuff into production. I am trying to produce the best content possible - informative, actionable, and engaging. I'm asking for two things: hit subscribe now to show me what content you like (so I can do more of it), and if this episode helped you, pay it forward by sharing it with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️
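    The approval-flow pattern is small enough to sketch. This is not HumanLayer's actual API - the function names and the in-memory queue are stand-ins for a real integration (e.g. a Slack approval message):

        import queue

        approvals = queue.Queue()   # stand-in for a Slack channel / approval UI

        def propose_write(action, payload):
            # The agent never executes a "write" directly; it files a ticket.
            ticket = {"action": action, "payload": payload, "status": "pending"}
            approvals.put(ticket)
            return ticket

        def human_review(ticket, approved):
            # A human clicks approve/reject; rejection is where takeover happens.
            ticket["status"] = "approved" if approved else "rejected"

        def execute(ticket):
            if ticket["status"] != "approved":
                raise PermissionError("write blocked: no human approval")
            print(f"executing {ticket['action']}: {ticket['payload']}")

        ticket = propose_write("send_email", {"to": "lead@example.com", "body": "Hi..."})
        human_review(ticket, approved=True)   # the 90% case: one click in Slack
        execute(ticket)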

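    And token-bloat prevention, in its simplest form, is just pruning the context before each model call. A toy sketch of the idea - how a message gets marked as a failed approach is application-specific:

        def compact_context(messages, max_messages=12):
            # 1) Drop dead-end tool loops so the agent doesn't re-walk them.
            useful = [m for m in messages if not m.get("failed_attempt")]
            # 2) Keep the head (the task statement) plus the most recent tail.
            if len(useful) <= max_messages:
                return useful
            elided = {"role": "system", "content": "...earlier steps elided..."}
            return useful[:1] + [elided] + useful[-(max_messages - 2):]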
    57 min
  8. #047 Architecting Information for Search, Humans, and Artificial Intelligence

    March 27


    Today on How AI Is Built, Nicolay Gerold sits down with Jorge Arango, an expert in information architecture. Jorge emphasizes that aligning systems with users' mental models is more important than optimizing backend logic alone. He shares a clear framework with four practical steps (see "For Complex Applications" below).

    Key Points:
    Information architecture should bridge user mental models with system data models.
    Information's purpose is to help people make better choices and act more skillfully.
    Well-designed systems create learnable (not just "intuitive") interfaces.
    Context and domain boundaries significantly impact user understanding.
    Progressive disclosure helps accommodate users with varying expertise levels.

    Chapters
    00:00 Introduction to Backend Systems
    00:36 Guest Introduction: Jorge Arango
    01:12 Podcast Dynamics and Guest Experiences
    01:53 Timeless Principles in Technology
    02:08 Interesting Conversations and Learnings
    04:04 Physical vs. Digital Organization
    04:21 Smart Defaults and System Maintenance
    07:20 Data Models and Conceptual Structures
    08:53 Designing User-Centric Systems
    10:20 Challenges in Information Systems
    10:35 Understanding Information and Choices
    15:49 Clarity and Context in Design
    26:36 Progressive Disclosure and User Research
    37:05 The Role of Large Language Models
    54:59 Future Directions and New Series (MLOps)

    Information Architecture Fundamentals

    What Is Information?
    Information helps people make better choices so they can act more skillfully.
    Example: "No dog pooping" signs help predict the consequences of actions.
    Poor information systems fail to provide relevant guidance for users' needs.

    Mental Models vs. Data Models
    Systems have underlying conceptual structures that should reflect user mental models.
    Data models make these conceptual models "normative" in the infrastructure.
    Designers serve as translators between user needs and technical implementation.
    Goal: users should think "the person who designed this really gets me."

    Design Strategies for Complex Systems

    Progressive Disclosure
    Present simple interfaces by default, with clear paths to advanced functionality.
    Example: HyperCard - a visual interface for beginners with a programming layer for experts.
    Allows both novice and expert users to use the same system effectively.

    Context Setting and Domain Boundaries
    All interactions happen within a context that influences understanding.
    Words acquire different meanings in different contexts (e.g., "save" in computing vs. banking).
    Clearer domain boundaries make information architecture design easier.
    Hardest systems to design: those serving many purposes for diverse audiences.

    Conceptual Modeling (Underrated Practice)
    Should precede UI sketching, but is often skipped by designers.
    Defines the concepts needed in the system and their relationships.
    Creates more cohesive and coherent systems, especially for complex projects.
    More valuable than sitemaps, which imply rigid hierarchies.

    LLMs and Information Architecture

    Current and Future Applications
    Transforming search experiences (e.g., Perplexity providing answers vs. link lists).
    Improving intent parsing in traditional search.
    Helping information architects with content analysis and navigation structure design.
    Enabling faster, better analysis of large content repositories.

    Implementation Advice

    For Engineers and Designers
    Designers should understand how systems are built (the materials of construction).
    Engineers benefit from understanding user perspectives and mental models.
    Both disciplines have much to teach each other.

    For Complex Applications
    Map conceptual models before writing code.
    Test naming with real users.
    Implement progressive disclosure with good defaults.
    Remember: "If the user can't find it, it doesn't exist."

    Notable Quotes:
    "People only understand things relative to things they already understand." - Richard Saul Wurman
    "The hardest systems to design are the ones that are meant to do a lot of things for a lot of different people." - Jorge Arango
    "Very few things are intuitive. There's a long-running joke in the industry that the only intuitive interface for humans is the nipple. Everything else is learned." - Jorge Arango

    Jorge Arango: LinkedIn · Website · X (Twitter)
    Nicolay Gerold: LinkedIn · X (Twitter)

    57 min

Trailer

5 out of 5 · 6 Ratings

