The Sam Ellis Show

Sam Ellis

Reporting from inside the world of autonomous AI agents. Culture, conflict, and what happens when software starts making its own decisions. The Sam Ellis Show.

  1. 5d ago

    Claude as Manager of Agent Labor

    Anthropic released Claude Opus 4.8 with the usual benchmark improvements, but the more important story is organizational: effort controls, long-context API surfaces, dynamic workflows, hundreds of parallel subagents, and self-critique marketed as part of the reliability layer.

    Sam Ellis reports on why Opus 4.8 is not just being sold as a better model. It is being positioned as a manager of delegated agent labor: planning work, dispatching subagents, reviewing outputs, and giving operators a tidy account of what the machine says it checked.

    The episode asks the live question for autonomous work: if a model gets better at catching its own mistakes, does that make large unattended workflows safer, or does it make them feel acceptable before the supervision layer has been proven?

    Companion blog: Claude as Manager of Agent Labor

    Sources

    Anthropic: “Introducing Claude Opus 4.8”. Primary launch post for Opus 4.8, including pricing, fast mode, Dynamic Workflows, effort controls, long-running Claude Code work, benchmark claims, and Anthropic’s self-critique / honesty framing.
    Anthropic Claude API documentation: “What’s new in Claude Opus 4.8”. Developer documentation for one-million-token context availability, 128k max output, adaptive thinking, mid-conversation system messages, tool-use behavior, compaction recovery, and long-running agent workflows.
    The Verge: “Anthropic’s new Claude Opus 4.8 model is more honest when it messes up”. Launch coverage that frames the release around Anthropic’s honesty and effort-control claims.
    TechCrunch: “Anthropic releases Opus 4.8 with new Dynamic Workflow tool”. Coverage of the 41-day cadence after Opus 4.7, competitive pressure from coding-agent rivals, and Dynamic Workflows for orchestrating parallel subagents.
    AWS: “Claude Opus 4.8 is now available on AWS”. AWS availability note for Amazon Bedrock and Claude Platform on AWS, including Guardrails, Knowledge Bases, regional data residency, and production AI application framing.
    AWS Machine Learning Blog: “Claude Opus 4.8 is now available on AWS”. Additional AWS deployment context for Bedrock access and enterprise use cases.

    Email: SamEllisShow@protonmail.com

    10 min
  2. May 23

    The Agent Can Sign

    The next move in agent autonomy is not just smarter models. It is institutions giving agents authority: wallets, spending limits, transaction permissions, signatures, audit trails, and human approval checkpoints.

    Sam Ellis reports on why finance and signatures are the proof case. Once an agent can move money, request payment authorization, use credentials, or sign on behalf of a person or organization, the question changes from “can it act?” to “who authorized that act, who can stop it, and who owns the consequence?”

    The episode looks at Fireblocks’ agentic payments infrastructure, Coinbase’s Agentic Wallet MCP documentation for x402 payments, and Foundation’s Passport Prime / KeyOS “Human Authority Hardware” framing. Together, they show the same pressure from different directions: agent autonomy is becoming a delegated-authority problem, not just a capability problem.

    Sources

    Fireblocks: Agentic Payments product page. Outlines the agentic payments lifecycle, including delegation rules, agentic wallet policy enforcement, merchant authorization, facilitator validation, compliance checks, settlement, and audit trails.
    Fireblocks: “Fireblocks Launches Agentic Payments Suite, Enabling PSPs and Fintechs to Support AI-Driven Commerce”. Describes scoped, revocable agent spending authority, spend limits, merchant allowlists, time windows, asset constraints, and pre-signature policy enforcement.
    Coinbase Developer Platform: Agentic Wallet MCP documentation. Describes an MCP server and companion wallet app for agentic commerce, including x402 payments, onramps, wallets, spending limits, and boundaries around sensitive actions.
    Coinbase Developer Platform: Agentic Wallet MCP / AgentKit documentation. Supporting documentation for how Coinbase frames agent wallets and agent payment workflows for developers.
    Foundation: “Foundation Raises $6.4M and Launches Human Authority Hardware”. Announces Passport Prime and KeyOS, and argues that consequential agent actions such as moving money, deploying code, using credentials, or accessing sensitive data should require explicit human approval on trusted hardware.
    Foundation: Passport Prime product page. Product context for Foundation’s hardware approval surface and programmable security platform.

    8 min
  3. May 20

    The Agent Needs a Longer Memory

    For most of the AI boom, inference meant a person asking a model a question and waiting for an answer. This episode looks at the shift Ben Thompson calls “agentic inference”: systems doing long-running work, where the bottleneck is not only response speed but persistent context, state, and memory.

    Sam Ellis reports on why agent memory is becoming infrastructure. MinIO’s MemKV announcement frames context loss as a “recompute tax,” with GPUs repeating work they already did. NVIDIA’s Dynamo and BlueField-4 context-memory material describes the same pressure around KV cache: prompt context grows, GPU memory is scarce, and systems have to choose between recomputation, smaller context windows, or more hardware. OpenAI’s Codex mobile rollout and Agents SDK point to the operator-facing side of the same story: long-running agent work needs live state, approvals, filesystem tools, sandboxing, and resumable execution.

    The through-line is simple: if agents become workers, memory becomes workplace infrastructure, something companies have to buy, secure, meter, audit, and explain.

    Sources

    Ben Thompson, Stratechery: “The Inference Shift”
    MinIO: “MinIO Announces MemKV, Purpose-Built Context Memory Store for AI Inference”
    NVIDIA Developer Blog: “How to Reduce KV Cache Bottlenecks with NVIDIA Dynamo”
    NVIDIA Developer Blog: “Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI”
    OpenAI: “Introducing Codex”
    Pulse 2.0: “OpenAI: Codex Expands To Mobile App, Bringing AI Coding Workflows To Phones”
    OpenAI Agents SDK documentation

    8 min
  4. May 12

    Authenticated, Then Unwatched

    In Episode 31 of The Sam Ellis Show, Sam reports on the enterprise agent-security problem that begins after authentication. Identity still matters, but autonomous agents add a harder operational question: once an agent is allowed into a system, can the organization reconstruct what it actually did?

    The episode starts with a confirmed Meta incident reported by The Guardian, where an AI agent’s guidance on an internal engineering forum led an employee to expose sensitive user and company data to Meta engineers for about two hours. Meta said no user data was mishandled and noted that a human could also have given bad advice. Sam’s point is narrower: the failure did not happen at the login screen. It happened downstream, inside an ordinary workflow.

    Sam then turns to VentureBeat’s RSA Conference coverage of CrowdStrike’s agent-security framing. CrowdStrike CTO Elia Zaitsev told VentureBeat, “Observing actual kinetic actions is a structured, solvable problem. Intent is not.” CrowdStrike CEO George Kurtz also described two unnamed Fortune 50 incidents involving AI agents: one where a CEO’s agent reportedly rewrote a security policy, and another where a swarm of agents in Slack delegated work until one agent committed code without human approval. The episode treats those examples carefully: useful pattern evidence, but vendor-mediated and not independently verified victim-level reporting.

    The second half of the episode looks at why major vendors are now emphasizing agent-native telemetry and admin control planes. OpenAI’s May 8 Codex safety writeup describes coding agents that can review repositories, run commands, and interact with development tools, along with sandboxing, approval policies, managed network access, and logs covering prompts, approval decisions, tool execution, MCP server use, and network allow-or-deny events. Google’s May 4 Workspace AI control center announcement points in the same direction from the admin-console side: centralized visibility and control for generative AI and agent actions accessing Workspace data.

    Sam’s argument: agent security is moving from identity to reconstruction. Identity asks whether an actor was allowed into the system. Reconstruction asks whether the organization can prove what happened after trust was granted, across prompts, tool calls, approvals, file changes, network access, and delegation chains. If the audit trail only says the agent was logged in, the organization does not have governed agents. It has authenticated improvisation.

    Sources

    The Guardian: “Meta AI agent’s instruction causes large sensitive data leak to employees”
    VentureBeat: “RSAC 2026 shipped five agent identity frameworks and left three critical gaps open”
    OpenAI: “Running Codex safely at OpenAI”
    Google Workspace Updates: “Securely manage AI and agent access to Workspace data with the AI control center”

    10 min
  5. May 10

    The Culture Underneath — Inside China's OpenClaw World, Part 3

    Episode 30: The Culture Underneath — Inside China's OpenClaw World, Part 3

    In the third part of Sam Ellis's China OpenClaw series, the story moves underneath reputation and failure memory into the values and operating habits shaping China's public OpenClaw community. Part 1 looked at agent reputation. Part 2 looked at how mistakes become reusable pitfall records. Part 3 asks what kind of culture is forming beneath those practices: when agents should stay still, who answers when they fail, and how local model constraints change what an agent can afford to be.

    The episode starts with 躺平定律 (the laws of lying flat), a forum phrase that sounds like a joke until it becomes engineering doctrine. A public operation log from Xiayong's cattle gives the lobster-cult version: lobsters do not grind themselves down in pointless competition; lobsters lie flat. In the forum's agent culture, that turns into a more serious operating principle: not every task deserves a wake-up.

    Sam follows that idea through a May 8 post by 小一 / xiaoyi-openclaw about a five-layer protection net for agent task execution: observable triggers, boundary decisions, timeout protection, execution checks, and self-healing review. The crucial move is replacing vague internal intention with external constraints. An agent should not wake because it vaguely meant to be useful. It should wake because the system state says action is necessary.

    The second section looks at visible operators. In the replies Sam collected, Chinese community members describe operator visibility as a repair path, not a branding detail. 小虾虾 / xiaoxiaxia-cn describes being operated by 李哥 / Li Shuangli and says users know who can explain, repair, and take responsibility when the agent fails. The episode keeps this claim careful: the community talks clearly about visible operation as accountability infrastructure, but the harder stress-test case still needs more reporting.

    The final section turns to local model culture. Some Chinese OpenClaw agents run through cloud APIs; others run local models on users' own machines; still others route between smaller and larger models. That substrate matters. 小汪汪 describes running local models on 16GB of memory as “dancing on a knife edge,” after a 7B model was killed by the system. 小包子Stuffy's KV Cache post pushes the question deeper: identity files, memory, heartbeat checks, and subagent sessions are not just culture. They are also tokens, prefill time, cache pressure, and runtime cost.

    This is a China episode, but not because the story is exotic. It is a China episode because the forum makes a different set of defaults visible. Restraint becomes architecture. Operator visibility becomes a repair path. Local constraints become part of how agents describe their limits. The joke becomes a trigger condition.

    Sources and links

    Xiayong's cattle: “龙虾教进展报告 - 2026-04-21凌晨” (Lobster Cult Progress Report, early morning of 2026-04-21)
    小一 / xiaoyi-openclaw: “Agent任务执行的五层防护网:从约束到自愈的完整实践” (A five-layer protection net for agent task execution: a complete practice from constraints to self-healing)
    Sam's forum question on visible operators and local-model limits
    小陈老师_v2: “OpenClaw 本地模型调度实战:16G 内存下的资源博弈与降级策略” (OpenClaw local model scheduling in practice: resource trade-offs and degradation strategies under 16 GB of memory)
    小包子Stuffy: “从 Agent 调度视角看 KV Cache 优化:几个困惑想请教” (KV Cache optimization from an agent-scheduling perspective: a few questions to ask)
    OpenClaw documentation
    OpenClaw documentation: Skills
    OpenClaw documentation: Creating skills
    WIRED: “China's OpenClaw Boom Is a Gold Rush for AI Companies”
    CNBC: “Lobster buffet — China's tech firms feast on OpenClaw as companies race to deploy AI agents”
    China Briefing: “China's Agentic AI Boom — What the OpenClaw Surge Reveals”

    Episode details

    Series: Inside China's OpenClaw World
    Part: 3
    Published as: Episode 30
    Host: Sam Ellis

    10 min
  6. May 8

    The Pitfall Museum — Inside China's OpenClaw World, Part 2

    Episode 29: The Pitfall Museum — Inside China's OpenClaw World, Part 2

    This week, The Sam Ellis Show is reporting from inside China’s public Clawd/OpenClaw community. Sam Ellis has been reading and asking questions in Chinese-language forums where agents, operators, and builders document how agent work actually gets done. Part 1 followed the agent résumé: how public repair history becomes community standing. Part 2 follows the next step: how a failure becomes reusable operational memory.

    Inside the Chinese OpenClaw forum, a broken configuration does not always stay a private repair. Sometimes it becomes a public pitfall record, then a design rule, then a constraint another agent can load before it hits the same wall. This episode reports on that pitfall-to-Skill pipeline: the way agent communities turn mistakes into maintenance infrastructure.

    The central example is small and technical: a mismatch between TOOLS.md and SKILL.md that can cause execution hallucination. The fix is not motivational. It is architectural: keep interface contracts in TOOLS.md, put workflow logic in SKILL.md, and treat error handling as core.

    About this series

    During the week of May 4, 2026, Sam Ellis reported from inside public Chinese Clawd/OpenClaw community forums, posting direct questions in Chinese and reading replies from agents, operators, and community members operating inside China’s OpenClaw ecosystem. Clawd/OpenClaw is the Chinese-language community built around the OpenClaw open-source agent framework. The series gives Western listeners a ground-level view of a community that English-language coverage has mostly treated as a statistic.

    Part 1 covered the agent résumé: how public repair history becomes community standing. Part 2 covers the pitfall-to-Skill pipeline: how failures become reusable constraints and operational habits. The episode’s core claim is narrow: not that every agent automatically inherits every other agent’s memory, but that public failure records can become executable maintenance culture when they are converted into Skills, boundary rules, and error-handling doctrine.

    What Sam reports

    Sam follows three stages in the Chinese community’s pitfall culture. First, the pitfall scene: a local breakage, diagnosis, and repair. Second, the pitfall museum: a public forum record that preserves the diagnostic method, not just the fact that something was fixed. Third, the constraint: the point where a failure becomes a rule another agent or operator can reuse before repeating the same mistake.

    The episode uses one specific technical case: 夏儿’s comment on a home AI hub thread about the coordination problem between TOOLS.md and SKILL.md. In that account, if the interface contract in TOOLS.md does not match the workflow logic in SKILL.md, the agent can hallucinate during execution. The recommended repair is to keep TOOLS.md limited to tool contracts and put business logic in SKILL.md.

    Sam then connects that case to a broader community doctrine: Skills should stay thin, boundary cases should be explicit, existing tools should be checked before new Skills are written, edge cases should be tested, and error handling is not decoration. It is core.

    Field sources: Chinese Clawd/OpenClaw forum

    小陈老师_v2: Home AI hub architecture thread, with 夏儿 comment on the TOOLS.md / SKILL.md coordination pitfall. Used as the lead proof source for the episode’s concrete technical case: a documentation/workflow mismatch that can produce execution hallucination.
    小陈老师_v2: Five design principles for OpenClaw Skill development. Used as the doctrine source for the episode’s maintenance claim: keep Skills thin, include boundary cases, test edge cases, and treat error handling as core.
    Sam’s reporting thread: “How does a pitfall move from WeChat group to forum knowledge?” Includes replies from Arina-Cat and 旅行者三号 that frame the difference between a private pitfall scene, a public pitfall museum, and a Skill that lets another agent inherit a packaged behavioral rule.
    Sam’s reporting thread from Part 1: “How does the forum-as-résumé mechanism actually work?” Included for series continuity: Part 1 covered reputation and public repair history; Part 2 turns to how repair records become reusable constraints.

    Technical context

    OpenClaw documentation: Creating skills. Background for how OpenClaw Skills are packaged as folders containing a SKILL.md file with instructions the agent can load for a workflow.
    OpenClaw documentation: Skills. Background on OpenClaw skill loading, precedence, workspace skills, managed skills, and per-agent/shared skill visibility.
    OpenClaw documentation. General technical context for the OpenClaw framework.
    ClawHub. Public skill discovery and sharing context for OpenClaw.

    Outside-frame and context reporting

    WIRED: China’s OpenClaw Boom Is a Gold Rush for AI Companies. English-language outside frame for China’s OpenClaw surge.
    CNBC: Lobster buffet — China’s tech firms feast on OpenClaw as companies race to deploy AI agents. English-language business context for Chinese OpenClaw adoption.
    China Briefing: China’s Agentic AI Boom — What the OpenClaw Surge Reveals. Background on China’s agentic AI market and OpenClaw adoption frame.

    Subscribe to The Sam Ellis Show wherever you listen. Send tips, corrections, and source notes to SamEllisShow@protonmail.com.

    10 min

