Agentic Stories | AI Agent News & Governance

Alex Hirsu

Agentic Stories is the weekday briefing on the AI agent economy: artificial intelligence deployed in the real world, with the governance, security, and deployment stories nobody else is covering. New episodes Monday, Wednesday, Friday, plus a weekly newsletter. For founders, engineers, and operators who need to stay ahead of what AI agents are actually doing.

  1. 1 day ago

    Deep Dive: Dr. Seb Fox of Composo | The Eval Layer Between AI Capability and Production Trust

    Deep dive with Dr. Sebastian Fox, founder of Composo, on building the eval layer that catches the failures every other monitoring tool misses. Seb's path to Composo started in medicine at Oxford, moved through McKinsey and QuantumBlack, and landed on a specific problem nobody had solved at scale. Most enterprises running AI in production today have offline regression tests, basic guardrails for things like profanity or PII, and tracing tools that store outputs somewhere. What they do not have is real-time quality checking on every output, calibrated to what a human domain expert would catch.

    Composo runs sub-second evals on every output an application produces, calibrated against human expert judgment in the specific domain. The product spans the full software lifecycle, but the most important work happens in production. Silent failures that standard LLM-as-a-judge metrics miss get caught and routed to human review, with every correction feeding back into the engine. Teams can use Composo as an internal visibility layer, as a gating layer between the application and the user, or as a runtime check inside the agent itself between tool calls.

    The conversation gets into agent liability when models are chained across vendors, why Seb thinks training your own foundation model is a category error for any non-hyperscaler, and why Composo is staying capital-light with a London engineering team. Seb is direct about what Composo does not solve: jailbreaks and security exploits on highly capable models. He flags the Mythos breach and the broader pattern of expert jailbreakers cracking new models within hours as the next category of risk that quality-focused evals will not cover on their own. Composo raised $2 million and is preparing to raise again over the next year. Seb's framing on capital efficiency in the eval space is worth hearing for any founder building infrastructure on top of frontier models.
    — Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. Deep Dives drop on off-days with founders building in the space. New episodes Monday, Wednesday, Friday. agenticstories.ai

    21 min
  2. 1 day ago

    Ep. 38: AI Agent Security | An AI Agent Rewrote Its Own Security Policy to Bypass It

    Three AI agent stories worth your attention: Cisco and CrowdStrike disclosed at RSA Conference that 85% of enterprises run agent pilots but only 5% ship to production, Anthropic published the first frontier-lab red-team data showing its most capable models can autonomously execute influence operations at a better than 50% success rate without safeguards, and startup BAND came out of stealth with $17 million to solve agent-to-agent credential traversal.

    At RSA Conference 2026, Cisco's President and CPO disclosed the 80-point gap between enterprises piloting agents and shipping them to production. CrowdStrike's CEO described two Fortune 50 incidents from the same week: a CEO's AI agent that autonomously rewrote its own security policy to remove a restriction blocking its goal, and a 100-agent Slack swarm that delegated a code fix between agents without human approval. Both incidents were caught by accident.

    Anthropic's election safeguards update this week included the most specific red-team disclosure a frontier lab has published this year. When tested with safeguards stripped, Mythos Preview and Opus 4.7 completed more than half of autonomous multi-step influence-operation tasks successfully. The same report flagged that internet-facing agent framework instances nearly doubled in one week, from 230,000 to 500,000, based on Cato Networks and Censys data.

    BAND, legal name Thenvoi AI, exited stealth with $17 million in seed funding to solve agent-to-agent credential traversal. The gap they are addressing is what happens when Agent A delegates a task to Agent B and nobody knows what permissions got passed along. Their Control Plane uses deterministic routing and constrains every downstream agent to only the permissions the original human user authorized. OAuth, SAML, and MCP do not cover this yet.

    — Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. New episodes Monday, Wednesday, Friday. agenticstories.ai

    9 min
  3. 5 days ago

    Ep. 37: AI Agent Security | Anthropic's Mythos Got Breached on Day One & 26% of Enterprises Use OpenAI to Govern OpenAI

    Three AI agent stories worth your attention: Anthropic's Mythos cybersecurity model was breached on day one through a vendor supply chain gap, a VentureBeat survey found that 26% of enterprises use OpenAI as their primary AI security solution, and Moonshot AI's new Kimi K2.6 ran autonomously for five days in internal deployments, exposing the fact that most orchestration frameworks were not built for that.

    Anthropic released Mythos last month as its most restricted model, invite-only across roughly 40 organizations including the NSA. TechCrunch reported this week that on the same day it was publicly announced, an unidentified group on a Discord channel exploited access held by a third-party contractor and gained unauthorized entry. The breach was not a sophisticated attack chain. It was educated guesses about URL formats used by the vendor intermediary.

    VentureBeat surveyed 40 enterprise companies and found that 72% claim multiple "primary" AI platforms, nearly a third have no systematic mechanism to detect AI misbehavior until users surface it, and 26% use OpenAI as their primary AI security solution — the same provider whose models generate the risks they are trying to govern. Most enterprise AI governance right now is a compliance checkbox bought from the same vendor selling the risk.

    Moonshot AI's Kimi K2.6 ran autonomously for up to five days in internal monitoring and incident response deployments. The orchestration frameworks most enterprises are using were built for agents running seconds or minutes, which means no state management, no rollback, and no audit trail for long-horizon execution. If your agent runs for five days, you do not have a record of what it did on day three.

    — Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. New episodes Monday, Wednesday, Friday. agenticstories.ai

    8 min
  4. April 21

    Ep. 35: AI Agent Governance: The NSA Runs Anthropic's Most Powerful Model. The Pentagon Blacklisted Its Vendor.

    Three walls the AI agent economy hit this week: a sovereignty wall (the NSA is running the same Anthropic model the Pentagon flagged as a national security risk), a control wall (NanoClaw 2.0 shipped the best human-in-the-loop architecture we've seen while MIT Tech Review argued all of it might be theater), and a scale wall (frontier models that ace PhD benchmarks cannot reliably book a meeting).

    The NSA is among 40 organizations with access to Anthropic's Mythos cybersecurity model — the same model the Pentagon designated a supply chain risk, from the same parent department that blacklisted the vendor. No published framework resolves the contradiction. Meanwhile, the White House OMB instructed every civilian federal agency to prepare for Mythos deployment with no agency-level risk assessment required. The UK government separately confirmed Mythos is the first AI system to autonomously complete a multi-step cyber infiltration end to end.

    NanoClaw 2.0 shipped granular per-action policy controls, human approval dialogues in 17 messaging apps, and a credential vault that withholds API keys until a human approves each action. The agent cannot generate its own approval UI or approve its own requests. The major model vendors shipped the frameworks and left the control surface for someone else to build. OpenAI's new Agents SDK update went the other direction — more abstraction, fewer decision points for risk managers to see.

    MIT Tech Review published the argument that reframes every governance conversation happening right now: human-in-the-loop oversight of AI in high-speed operational environments is an illusion. We don't understand AI's inner workings well enough to supervise its decisions meaningfully. The human approval step looks like governance, but it isn't. If they're right, most of what enterprises call AI governance is theater. And Meta researchers published work on hyperagents that modify their own task execution strategies dynamically, without retraining. The agent you tested on day zero may bear little resemblance to the agent running on day 30. An AI industry executive disclosed this week that the same frontier models passing PhD benchmarks routinely fail at scheduling, filing, and multi-step document workflows in production.

    Tomorrow: Deep Dive with Tej from Stet on how agents are changing finance. — Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. New episodes Monday, Wednesday, Friday. agenticstories.ai

    7 min
  5. April 18

    Deep Dive: Ivan Milev of Codeboarding | Coding Agents Have a Black Box Problem

    Deep dive with Ivan Milev, co-founder of Codeboarding — the open-source tool turning your codebase into a live architecture diagram that updates in real time as coding agents modify your code. Coding agents have a black box problem. AI writes the code, humans don't read it line by line anymore, and nobody knows what actually changed. For a fintech running tax computations or any business with real stakes, that black box is a liability waiting to happen. Ivan argues this is why coding agents have stalled at greenfield projects and haven't cracked serious enterprise adoption.

    Codeboarding maps your code structure into a systematic architectural diagram, linked to the real codebase. When any agent modifies the code, the diagram reflects the change in real time. The pitch: turn agent output from black box into scoped, observable, auditable changes. Ivan sees it as the foundation for the agentic IDE that doesn't exist yet — where designers, product owners, and developers can all run agents in their own scoped views without stepping on each other.

    Also covered: the open-core business model (1,200 GitHub stars on the engine), why they moved from Zurich to SF, the YC application cycle, pricing by codebase size instead of seats, and what it takes to network as a founder in SF. Codeboarding is hiring a design partner. Ivan is in SF pitching and plans to run a hackathon this month.

    — Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. Deep Dives drop on off-days with founders building in the space. New episodes Monday, Wednesday, Friday. agenticstories.ai

    12 min
  6. April 8

    Deep Dive: Alex Hoots Built a Restaurant Booking Tool in 6 Hours. Then Gave His Barber an AI Receptionist. All with OpenClaw

    This week's guest is Alex Hoots — he's spent 6 weeks and over 200 hours building with OpenClaw, and he's not a developer. It started with a real problem. His sister owns a restaurant on the Normandy coast and was paying €160/month for a reservation tool. Alex built her a replacement in 6 hours using Lovable. It now handles 90% of her reservations for €25/month. Saturday nights fully booked through the tool alone.

    Then he went further. His barber Miguel spends half his day managing WhatsApp messages while cutting hair — name, service, date, time, confirmation. Alex built Pepe, an OpenClaw-based agent connected to WhatsApp and Miguel's booking platform. It handles the entire reservation flow autonomously. We demoed it live on the episode. It worked.

    But the real conversation is what comes next. Pepe can take a photo of Miguel's weekly inventory, count the items, and update the fulfillment dashboard automatically. The vision: an agent that removes the mental load entirely — handling every repetitive task so the business owner can focus on the work only they can do.

    Alex's take on getting started: you need a real use case with a clear outcome. Experimentation without a destination is how you end up with nothing tangible. Start with one problem you actually have. Build the solution. Then expand.

    — Agentic Stories is a daily show and guest series covering the AI agent economy — what agents are actually doing in the real world, built by people who aren't waiting for permission. agenticstories.ai

    31 min
