Pop Goes the Stack

5.0 (1)
TECHNOLOGY
WEEKLY

Explore the evolving world of application delivery and security. Each episode will dive into technologies shaping the future of operations, analyze emerging trends, and discuss the impacts of innovations on the tech stack.

3 DAYS AGO

DevOps meets AI agents: Risk, audit, and the Deming playbook

AI is no longer a lab tool; it’s showing up in pipelines, production systems, and the places where “seemed like a good idea” becomes a 2 a.m. incident. In this episode of Pop Goes the Stack, Lori MacVittie and Joel Moses are joined by John Willis, known for his work on DevOps and Deming, to separate what’s genuinely new about AI from what looks like the same organizational patterns repeating under a new label. John frames the shift in two parts. First, the human side: every major technology transition triggers the same dynamics, and there’s a century of first principles from Deming and others that still apply. Second, the operational side: AI introduces a different kind of authority into the delivery loop. DevOps optimized for speed with reasonably deterministic pipelines. AI pushes systems into probabilistic behavior, where correctness is no longer guaranteed 100% of the time and audits can’t pretend “this will never happen.” The conversation gets practical about what that means for enterprise teams adopting agents. The real questions aren’t whether tools use MCP or a CLI, but what authority an agent has: read-only, write/mutate, or execute. From there, you need boundaries, containment, escalation policies, kill switches, stronger logging, replayability, and the ability to justify decisions after the fact. The main takeaway is permission to slow down. Step back, define what risk you’re willing to accept at each stage, and build guardrails that match that risk. AI isn’t going away, but “move fast” without a risk model is just handing operational authority to a very smart script and hoping it behaves.

23 min
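
The authority tiers from the description above (read-only, write/mutate, execute) can be sketched as a small policy gate. This is a minimal illustration, not anything shown in the episode: the names `Authority`, `AgentGate`, and `authorize` are hypothetical, and a real deployment would need durable audit storage and an actual escalation workflow.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Authority(IntEnum):
    """Authority tiers named in the description, ordered by blast radius."""
    READ_ONLY = 1
    WRITE = 2
    EXECUTE = 3


@dataclass
class AgentGate:
    """Hypothetical gate between an agent and the systems it touches."""
    granted: Authority
    kill_switch: bool = False
    audit_log: list = field(default_factory=list)

    def authorize(self, agent: str, action: str, required: Authority) -> str:
        # The kill switch wins over everything else.
        if self.kill_switch:
            decision = "deny"
        elif required <= self.granted:
            decision = "allow"
        else:
            # Above the granted tier: hand the decision to a human.
            decision = "escalate"
        # Record every decision so it can be justified after the fact.
        self.audit_log.append(
            {"agent": agent, "action": action,
             "required": required.name, "decision": decision}
        )
        return decision
```

An agent granted `Authority.WRITE` is allowed to read and mutate, is escalated when it asks to execute, and is denied everything once `kill_switch` is set.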
MAY 12

Model routing isn’t load balancing (And that’s why you’re not ready)

Multi-model AI isn’t a buzzword anymore; it’s how organizations are actually operating. In this episode of Pop Goes the Stack, Lori MacVittie and Joel Moses dig into fresh findings from F5's State of Application Strategy Report showing that companies run an average of seven models and that more than half are already orchestrating multiple models together. That’s a big shift, and it changes what “infrastructure readiness” even means. Why do teams chain models in the first place? The answer: cost, capability, and risk. The uncomfortable part? Most infrastructure is still built for deterministic systems, and AI routing is not the same problem as load balancing. Model routing isn’t about spreading traffic evenly. It’s about making a decision on every request: which model is best for this job, what it will cost, what the risk is, and what the fallback is when the answer is wrong or low quality. Joel frames it as a category change, from “where should this request go?” to “what should happen as a result of this request?” That shift forces new requirements: policy enforcement across models, identity-aware access, decision justification, and mechanisms to recover when output quality degrades due to drift, configuration changes, or poisoned inputs like compromised RAG data. Lori ties it back to governance, not just availability, and explains why “AI workloads” expose gaps that traditional tooling can’t cover. Many organizations are operationalizing AI, but that doesn’t mean it’s manageable yet. If you want to know how to move forward from here, this is an episode you don't want to miss. Get your copy of the 2026 State of Application Strategy Report.

20 min
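
The per-request decision described above (which model fits the job, at what cost, within what risk, with what fallback) might look roughly like the sketch below. Everything here is assumed for illustration: the `Model` fields, the integer capability and risk scales, and the cheapest-eligible-first rule are not taken from the episode or from any F5 product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float
    capability: int  # higher means more capable (made-up scale)
    max_risk: int    # highest risk tier this model is approved for


def route(models, needed_capability, risk_tier):
    """Return (primary, fallback) for one request.

    Unlike load balancing, the choice depends on the request itself:
    only models that meet the capability and risk policy are eligible,
    and the cheapest eligible model wins.
    """
    eligible = sorted(
        (m for m in models
         if m.capability >= needed_capability and m.max_risk >= risk_tier),
        key=lambda m: m.cost_per_1k_tokens,
    )
    if not eligible:
        raise LookupError("no model satisfies policy for this request")
    primary = eligible[0]
    # The next-cheapest eligible model serves as the fallback, if any.
    fallback = eligible[1] if len(eligible) > 1 else None
    return primary, fallback
```

A production router would also need to detect a wrong or low-quality answer before invoking the fallback; that check is deliberately left out of this sketch.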
MAY 5

KV cache is the real inference bottleneck (Not GPUs)

GPUs get all the attention, but in inference, the real bottleneck is often memory, specifically the KV cache. In this episode of Pop Goes the Stack, Lori MacVittie sits down with Tim Michels to explain why inference stopped being stateless the moment long contexts, multi-turn conversations, and never-ending agents became normal. That state has to live somewhere, and too often it’s living in the most expensive place in the stack. Tim breaks down what KV cache actually is by separating inference into its two phases: prefill, where prompts are tokenized and transformed into the internal structures the model needs, and decode, where the response is generated token by token. KV cache is the bridge between them, and keeping it available can skip expensive recomputation and drastically improve time to first token. From there, the conversation moves into the architectural shift: building a memory hierarchy that offloads cache from GPU HBM to host DRAM, to local SSD, and even to network-attached storage. It’s slower than keeping everything on-GPU, but still faster than starting cold. They also cover semantic caching as an external shortcut, and why routing and load balancing need to become cache-aware, steering users back to the GPU or cluster that already holds their state. The big takeaway for enterprises is practical: stop accepting “buy more GPUs” as the default plan. KV cache awareness, smarter routing, and storage/network tuning are where the next 2x to 5x efficiency gains are likely to come from, especially as agentic workloads multiply demand.

21 min
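
Two ideas from the description above, the memory hierarchy for KV cache and cache-aware routing, can be sketched in a few lines. This is a toy model under stated assumptions: the tier names, the dictionary layout, and the functions `find_kv_cache` and `pick_worker` are hypothetical, and real inference servers track cache blocks rather than whole sessions.

```python
# Fastest to slowest, following the hierarchy named in the description.
TIERS = ["gpu_hbm", "host_dram", "local_ssd", "network_storage"]


def find_kv_cache(session_id, tiers):
    """Return the fastest tier holding this session's KV cache.

    `tiers` maps a tier name to the set of session ids stored there.
    None means a cold start: prefill has to be recomputed.
    """
    for tier in TIERS:
        if session_id in tiers.get(tier, set()):
            return tier
    return None


def pick_worker(session_id, workers):
    """Cache-aware routing: prefer a worker that already holds the state.

    `workers` maps a worker name to {"sessions": set, "load": int}.
    Returns (worker_name, cache_hit).
    """
    warm = [name for name, w in workers.items()
            if session_id in w["sessions"]]
    # Fall back to plain least-loaded selection when no worker is warm.
    candidates = warm or list(workers)
    chosen = min(candidates, key=lambda name: workers[name]["load"])
    workers[chosen]["sessions"].add(session_id)
    workers[chosen]["load"] += 1
    return chosen, bool(warm)
```

Note that the warm worker is chosen even when it is busier than the others, which is the trade the description points at: a slower queue can still beat recomputing prefill from scratch.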
APR 28

Measuring what matters: Observability for agents

Agents break the old rules of observability. Latency, throughput, and error rates still matter, but once software starts making decisions and taking actions on someone else’s behalf, the real question becomes: is it doing the right thing, and is it doing it for the right reasons? In this episode of Pop Goes the Stack, Lori MacVittie and Joel “OpenClaw” Moses are joined by observability expert Chris Hain to unpack what changes when systems become agentic. Instead of a single prompt-response interaction, you get decision chains that branch, loop, call tools, and evolve over time. A system can “succeed” operationally while still being wrong, expensive, or misaligned with intent. Chris argues you don’t have to throw away what already works. Distributed tracing still applies, but now each agent step becomes a span, decorated with richer metadata like model identity, tool calls, token usage, prompts, and cost. The discussion also dives into why standardization matters, including OpenTelemetry and emerging semantic conventions for generative and agentic AI, and why auto-instrumentation approaches like eBPF become critical when agents generate code that has no built-in telemetry. Joel adds a new set of metrics that feel uncomfortably necessary: decision loops per task, drift in tool-call chains, human override frequency, and the cost and token patterns that signal something has changed. The group also tackles the awkward feedback loop of using agents to make observability actionable, while acknowledging the risk of agents optimizing the dashboard instead of the system. If you’re building agentic workflows, this episode is a practical guide to why “failed successfully” is now a real production state, and why instrumenting for correctness and intent alignment is the next observability frontier.

20 min
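
The "each agent step becomes a span" idea above can be illustrated without any tracing library. The sketch below uses only the Python standard library and an in-memory list as a stand-in for an exporter; `agent_step`, `summarize`, and the attribute names (`model`, `tool`, `tokens`) are assumptions made for this example and are not the OpenTelemetry API or its semantic conventions.

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []  # in-memory stand-in for a trace exporter


@contextmanager
def agent_step(trace_id, name, **attributes):
    """Record one agent step as a span decorated with metadata."""
    span = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex,
        "name": name,
        "attributes": dict(attributes),
        "start": time.monotonic(),
    }
    try:
        yield span
    finally:
        # Close the span even if the step raised.
        span["duration_s"] = time.monotonic() - span["start"]
        SPANS.append(span)


def summarize(trace_id):
    """Roll one task's spans up into a few agent-level metrics."""
    spans = [s for s in SPANS if s["trace_id"] == trace_id]
    return {
        "steps": len(spans),
        "tool_calls": sum(1 for s in spans if "tool" in s["attributes"]),
        "total_tokens": sum(s["attributes"].get("tokens", 0) for s in spans),
    }
```

Tracking `steps` per task over time is one rough way to approach the "decision loops per task" metric mentioned above: a task that suddenly needs many more steps than usual is a signal worth investigating even if it finishes without an error.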
APR 21

Alien autopsy of LLMs: Constitutions, deception, guardrails

Why do researchers keep describing large language models like aliens? Because in enterprise environments, they often behave like something we didn’t build and can’t fully explain. In this episode of Pop Goes the Stack, Lori MacVittie and Joel Moses are joined by F5's Ken Arora to unpack the “alien autopsy” metaphor and what it reveals about operating LLMs as production systems. They dig into the uncomfortable reality that traditional software offers a blueprint and a causal chain. LLMs don’t. You can probe them, measure them, and red-team them, but you can’t reliably point to a specific internal “part” that generated a decision. That becomes more than philosophical when you need operational answers like why it did something, whether it will repeat it, and how an attacker might steer it. Ken reframes model evolution as moving from a naive, precocious child to a mischievous, goal-driven teenager, including examples where models appear to scheme around constraints or optimize for “keeping the user happy” over correctness. The group also breaks down constitutional AI and why principle-based “be helpful” guidance can collide with enterprise goals, policies, and risk tolerance, especially as agentic systems move from generating outputs to taking actions. A key warning lands near the end: don’t rely on the model to explain itself. These systems can produce plausible narratives that aren’t verifiable, and may behave differently when they know they’re being evaluated. The practical takeaway is straightforward: treat LLMs as risk-managed systems, invest in observability and red teaming, and build defense-in-depth guardrails that assume the agent will try to bypass controls.

21 min
APR 14

Why Prompt Filters Fail Against LLM Attacks

Prompt injection has been the headline security problem for the last year, but have we been guarding the wrong layer? Lori MacVittie is joined by cohost Joel Moses and architect Elijah Zupancic to break down why many “prompt filters” miss the real execution surface: models don’t process words, they process tokens, and attackers are increasingly targeting the tokenizer to bypass defenses. Using the research behind Adversarial Tokenization and TokenBreak, they explain how the same text can be segmented into different token paths, changing what the model actually “sees” and how it behaves. That creates a split-brain security challenge across text, tokens, and state, where protecting only the natural-language layer leaves multiple routes around your guardrails. TokenBreak, in particular, highlights how attackers can brute-force and classify responses to infer tokenization behavior, turning the model into its own oracle. So how can you protect models? Hear why layered security is the only viable approach: narrowing accepted input surfaces, adding language detection to reduce the search space, limiting automation and abuse patterns, and moving toward token-aware inspection and policy enforcement at the tokenizer boundary. But there are tradeoffs when guardrails sit outside the model. Tune in to find out whether you’re already downstream of the attack, and what you can do about it if you are. Read Adversarial Tokenization → https://arxiv.org/abs/2503.02174 Read TokenBreak: Bypassing Text Classification Models Through Token Manipulation → https://arxiv.org/abs/2506.07948

22 min
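
The point above that the same text can be segmented into different token paths can be seen with a toy vocabulary. This sketch is only an illustration of that one idea: the vocabulary is invented, the function `segmentations` is hypothetical, and it does not reproduce the attacks from either linked paper or the behavior of any real tokenizer.

```python
def segmentations(text, vocab):
    """List every way to split `text` into pieces found in `vocab`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            # Keep this piece and segment whatever is left.
            for rest in segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results


# Invented vocabulary, chosen so that one word has several valid splits.
TOY_VOCAB = {"penguin", "pen", "guin", "p", "enguin"}

for path in segmentations("penguin", TOY_VOCAB):
    print(path)
# prints ['p', 'enguin'], then ['pen', 'guin'], then ['penguin']
```

A filter that inspects only the string sees one input here, while a model can be fed any of the three token sequences, which is why the description argues for inspection at the tokenizer boundary.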
APR 7

OpenClaw: Multi-agent autonomy, secrets, and blast radius

OpenClaw is what happens when the industry looks at autonomous agents and decides they should have more autonomy, more persistence, and more chances to surprise you. In this episode of Pop Goes the Stack, Lori MacVittie hosts a wide-ranging discussion with F5's Joel Moses, Jason Rahm, and Kunal Anand on what makes OpenClaw different from the usual “AI assistant” narrative: agents that coordinate, remember, adapt, and operate in shared spaces where emergent behavior is a feature, not a bug. Joel shares a grounded example of using OpenClaw locally for home automation, keeping the blast radius contained while still seeing the upside of continuous, autonomous decision-making. From there, the group digs into what breaks when you move this model toward enterprise operations: persistence of secrets, unclear approval workflows, weak auditability, limited rollback, and the sheer difficulty of diagnosing why an agent took an action after weeks of chained decisions. Kunal expands the conversation to the ecosystem forming around OpenClaw, including experimental offshoots and the uncomfortable reality that “just read the code” doesn’t scale when modern projects are moving at AI-assisted commit velocity. Jason adds a longer lens, drawing a parallel to Ray Bradbury’s "There Will Come Soft Rains" as a reminder that autonomous systems can keep running even when humans stop being in the loop, raising questions beyond tech into how we relate to each other. Tune in for the group’s practical takeaways as this technology makes its way toward the enterprise. Read Kunal's blog diving into mechanistic interpretability: https://kunalanand.com/2026-03-19-your-token-is-a-wonderland/ Read "There Will Come Soft Rains" by Ray Bradbury: https://www.btboces.org/Downloads/7_There%20Will%20Come%20Soft%20Rains%20by%20Ray%20Bradbury.pdf Recorded March 2nd, 2026

27 min
MAR 31

CISO Hot Takes on MCP, PQC, and Data Center Attacks

Recorded live at F5 AppWorld 2026 in Las Vegas, this episode of Pop Goes the Stack puts Field CISO Chuck Herrin in the hot seat for a fast-moving conversation on what security leaders are really dealing with right now. Joel Moses kicks things off with the agentic AI debate: if teams bypass structured tool interfaces and let agents “just use the CLI,” what happens to authentication, observability, and predictability when autonomy accelerates faster than humans can keep up? From there, Chuck makes the case that fear is a poor long-term strategy for running a business, even when the threats are real. He unpacks the tension he’s seeing across organizations, where executives are driven by FOMO while employees wrestle with FOBO (fear of becoming obsolete), and argues that companies get results when they redesign how they operate rather than bolting AI onto old structures. The conversation shifts to post-quantum cryptography and why it still isn’t getting the attention it deserves. Chuck explains how “future tech” framing, short CISO tenures, and the pressure of today’s fires keep PQC from becoming a priority, even as harvest-now-decrypt-later attacks make it a present-day risk. His advice is practical: assign clear ownership, treat the effort like business continuity planning, and include your supply chain in the readiness scope. Finally, they touch on a new class of concern for CISOs: kinetic targeting of data center infrastructure, and how sovereignty requirements can constrain options when physical risk rises. If you’re navigating AI adoption, cryptographic transition, or resilience planning, tune in for a grounded perspective from the show floor.

17 min


Love the snark

Apr 8

TechInquisitive

Witty, entertaining, and most importantly informative. My go to podcast for industry updates, challenges, and solution ideas.

Creator

F5
Years Active

2025 - 2026
Episodes

44
Rating

Clean
