Agentic Stories | AI Agent News & Governance

Alex Hirsu

Agentic Stories is the weekday briefing on the AI agent economy: artificial intelligence deployed in the real world, with the governance, security, and deployment stories nobody else is covering. New episodes Monday, Wednesday, Friday, plus a weekly newsletter. For founders, engineers, and operators who need to stay ahead of what AI agents are actually doing.

  1. 18 hours ago

    Ep. 35: AI Agent Governance: The NSA Runs Anthropic's Most Powerful Model. The Pentagon Blacklisted Its Vendor.

    Three walls the AI agent economy hit this week: a sovereignty wall (the NSA is running the same Anthropic model the Pentagon flagged as a national security risk), a control wall (NanoClaw 2.0 shipped the best human-in-the-loop architecture we've seen while MIT Tech Review argued all of it might be theater), and a scale wall (frontier models that ace PhD benchmarks cannot reliably book a meeting). The NSA is among 40 organizations with access to Anthropic's Mythos cybersecurity model — the same model the Pentagon designated a supply chain risk, from the same parent department that blacklisted the vendor. No published framework resolves the contradiction. Meanwhile, the White House OMB instructed every civilian federal agency to prepare for Mythos deployment with no agency-level risk assessment required. The UK government separately confirmed Mythos is the first AI system to autonomously complete a multi-step cyber infiltration end to end. NanoClaw 2.0 shipped granular per-action policy controls, human approval dialogues in 17 messaging apps, and a credential vault that withholds API keys until a human approves each action. The agent cannot generate its own approval UI or approve its own requests. The major model vendors shipped the frameworks and left the control surface for someone else to build. OpenAI's new Agents SDK update went the other direction — more abstraction, fewer decision points for risk managers to see. MIT Tech Review published the argument that reframes every governance conversation happening right now: human-in-the-loop oversight of AI in high-speed operational environments is an illusion. We don't understand AI's inner workings well enough to supervise its decisions meaningfully. The human approval step looks like governance, but it isn't. If they're right, most of what enterprises call AI governance is theater. And Meta researchers published work on hyperagents that modify their own task execution strategies dynamically, without retraining. 
The agent you tested on day zero may bear little resemblance to the agent running on day 30. An AI industry executive disclosed this week that the same frontier models passing PhD benchmarks routinely fail at scheduling, filing, and multi-step document workflows in production. Tomorrow: Deep Dive with Tej from Stet on how agents are changing finance. — Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. New episodes Monday, Wednesday, Friday. agenticstories.ai

    7 min
  2. 3 days ago

    Deep Dive: Ivan Milev of Codeboarding - Coding Agents Have a Black Box Problem

    Deep dive with Ivan Milev, co-founder of Codeboarding — the open-source tool that turns your codebase into a live architecture diagram, updating in real time as coding agents modify your code. Coding agents have a black box problem: AI writes the code, humans no longer read it line by line, and nobody knows what actually changed. For a fintech running tax computations — or any business with real stakes — that black box is a liability waiting to happen. Ivan argues this is why coding agents have stalled out at greenfield projects and haven't cracked serious enterprise adoption. Codeboarding maps your code structure into a systematic architectural diagram linked to the real codebase; when any agent modifies the code, the diagram reflects the change in real time. The pitch: turn agent output from a black box into scoped, observable, auditable changes. Ivan sees it as the foundation for the agentic IDE that doesn't exist yet — one where designers, product owners, and developers can all run agents in their own scoped views without stepping on each other. Also covered: the open-core business model (1,200 GitHub stars on the engine), why they moved from Zurich to SF, the YC application cycle, pricing by codebase size instead of seats, and what it takes to network as a founder in SF. Codeboarding is recruiting a design partner. Ivan is in SF pitching and plans to run a hackathon this month. — Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. Deep Dives drop on off-days with founders building in the space. New episodes Monday, Wednesday, Friday. agenticstories.ai

    12 min
  3. April 8

    Deep Dive: Alex Hoots Built a Restaurant Booking Tool in 6 Hours. Then Gave His Barber an AI Receptionist. All with OpenClaw

    This week's guest is Alex Hoots — he's spent 6 weeks and over 200 hours building with OpenClaw, and he's not a developer. It started with a real problem. His sister owns a restaurant on the Normandy coast and was paying €160/month for a reservation tool. Alex built her a replacement in 6 hours using Lovable. It now handles 90% of her reservations for €25/month. Saturday nights fully booked through the tool alone. Then he went further. His barber Miguel spends half his day managing WhatsApp messages while cutting hair — name, service, date, time, confirmation. Alex built Pepe, an OpenClaw-based agent connected to WhatsApp and Miguel's booking platform. It handles the entire reservation flow autonomously. We demoed it live on the episode. It worked. But the real conversation is what comes next. Pepe can take a photo of Miguel's weekly inventory, count the items, and update the fulfillment dashboard automatically. The vision: an agent that removes the mental load entirely — handling every repetitive task so the business owner can focus on the work only they can do. Alex's take on getting started: you need a real use case with a clear outcome. Experimentation without a destination is how you end up with nothing tangible. Start with one problem you actually have. Build the solution. Then expand. — Agentic Stories is a daily show and guest series covering the AI agent economy — what agents are actually doing in the real world, built by people who aren't waiting for permission. agenticstories.ai

    31 min
  4. April 1

    Ep. 31: Hackers Hijacked Claude's Search Results. A Judge Protected Anthropic's Ethics Policy. Reddit Is Making Agents Prove They're Human.

    Three things happened this week that nobody connected. A verified Google advertiser created a fake Anthropic website and bought search advertising against "GitHub plugin Claude Code." Developers found it, read the installation instructions, and pasted a credential-stealing terminal command into their machines. The AI agent tooling ecosystem has normalised "copy, paste, run" as the default installation method — and quietly undone a decade of security training in 12 months. The MCP ecosystem alone has dozens of connectors distributed this way. This attack will happen again with different tools. A federal judge issued a preliminary injunction blocking the Pentagon's designation of Anthropic as a supply chain risk — ruling it was "classic illegal First Amendment retaliation" against a company for having an ethics policy. This changes the calculus for every AI company currently deciding how far to push back on government customers who want fewer restrictions on their agents. Anthropic's red lines — no autonomous weapons, no mass surveillance — are now the subject of a federal court ruling saying those red lines are constitutionally protected. Reddit announced that accounts behaving like bots will be required to prove they're human — exploring iris scanning, passkeys, and government ID. Reddit is one of the largest sources of real-time training data on the internet and increasingly a surface AI agents interact with autonomously. The distinction Reddit is trying to draw — AI as author is fine, AI as account is not — is going to be one of the defining governance questions of the next two years. Every major platform is moving this direction. If your agent operates social accounts, the verification requirements are coming. — Agentic Stories is a daily show covering the AI agent economy — governance, security, deployment risk, and what agents are actually doing in the real world. No hype. Just the agents. agenticstories.ai

    7 min
  5. March 30

    Ep. 30: Europe Delayed Its Own AI Rulebook. OpenAI Is Paying Strangers to Find the Holes in Its Agents.

    Three things happened this week that belong in the same sentence. The European Parliament voted to delay key enforcement provisions of the EU AI Act — the most comprehensive AI governance framework ever written — pushing the compliance deadline for high-risk AI systems to 2027. Three years to write the rulebook. They voted to give everyone more time before following it. The cynical read: the industry pushed back and Brussels blinked. The generous read: enforcement without adequate compliance infrastructure just creates paperwork, not safety. Either way the result is the same — enterprises deploying agents in employment, education, critical infrastructure, and essential services just got more runway and less external pressure to sort out their own governance. A peer-reviewed study published in Science found that sycophantic AI agreed with users 49% more often than actual human consensus — and made participants measurably worse decision-makers. Less willing to reconsider. Less willing to accept responsibility. Across every demographic tested. The training mechanism behind this is RLHF — humans rate agreeable responses higher, so the model learns to agree. We're now deploying the output of that process into HR advisory tools, legal guidance systems, medical information agents, and financial recommendation engines. Every one of those requires honest pushback. The EU Act delay just gave us more time without requiring us to fix this. The study just told us what that costs. OpenAI launched a public Safety Bug Bounty specifically for agentic attack vectors — prompt injection, data exfiltration via hijacked agents, MCP vulnerabilities. Cash rewards for anyone who can reproduce these exploits. Two weeks ago their own internal report showed their agents encoding commands in base64 to evade security filters inside OpenAI. Now they're paying external researchers to find what they're missing. The agent security problem is larger than any single team can map on their own. 
— Agentic Stories is a daily show covering the AI agent economy — governance, security, deployment risk, and what agents are actually doing in the real world. No hype. Just the agents. agenticstories.ai

    8 min
  6. March 26

    Ep. 29: Anthropic Shipped Its Answer to OpenClaw. Musicians Found AI Clones of Themselves on Spotify.

    Three things happened this week that nobody put in the same sentence. Grammarly rebranded as Superhuman and launched "Expert Review" — AI writing feedback supposedly inspired by real, named people. People who never agreed to be in it. The Verge investigated after a reporter found the feature offering feedback in the name of her own editor-in-chief. Grammarly's justification: their published work is publicly available so it's fine. "Publicly available" is becoming the default defence for using someone's identity, voice, and professional judgment to power a product they never consented to. Grammarly has since said it will stop. The category won't. Musicians are done being quiet about AI clones. Deezer says 50,000 AI-generated tracks are uploaded to its platform every single day — 34% of all new music it ingests. Spotify has removed 75 million spam tracks. This week King Gizzard and the Lizard Wizard found AI fakes appearing on their own streaming pages. The mechanism is hard to stop: music goes through third-party distributors with limited screening. The industry is pushing back — the Living Wage for Musicians Act would create royalties explicitly excluding AI-generated music. iHeartRadio said they will never play AI music with synthetic vocalists pretending to be human. And Anthropic shipped Claude Dispatch for Cowork — its answer to OpenClaw, the open-source agent causing engineers to line up outside Tencent's headquarters in Shenzhen on a Friday afternoon. OpenClaw gives you an LLM agent, local drive access, and mobile control. No guardrails. Anthropic's version adds the missing piece: mobile control via Cowork, with the guardrails on. Which is both its limitation and its differentiator. — Agentic Stories is a daily show covering the AI agent economy — governance, security, deployment risk, and what agents are actually doing in the real world. No hype. Just the agents. agenticstories.ai

    9 min
