Braid

Lenar Kess · Damra Vol

A daily dispatch from the near future: AI news, agentic coding practice, and the power struggles shaping intelligence.

  1. 2d ago

    When the Website Starts Offering Tools

    Hosts: Lenar Kess, Damra Vol. Today’s episode starts with Google’s WebMCP proposal, then follows the same question through open coding models, agent safety papers, China-facing hardware and robotics supply chains, AI mistakes in professional work, and ordinary developer security.

    Tara Agyemang’s AI Engineer talk on WebMCP gives the day its lead artifact: websites may need to expose actions directly to agents instead of making agents infer intent from pixels and DOMs.

    Moonshot AI’s Kimi K2.7-Code model page makes token efficiency part of the coding-model comparison, which matters when developers are paying for long agent runs.

    The agentic framework safety paper argues that common agent frameworks do not provide native structural containment guarantees, and its memory-poisoning experiment shows why framework behavior has to be tested separately from model behavior.

    The SMSR memory-poisoning paper proposes signed memory plus randomized retrieval as a more formal defense for persistent agent memory.

    Techmeme’s Nvidia-China item and its humanoid robot supply-chain item keep the infrastructure story grounded in chips, factories, and availability claims rather than model demos alone.

    Forbes’ court-sanctions story shows AI drafting running into a professional audit boundary, with lawyers removed after hallucinated legal citations appeared in filings.

    The AUR package compromise report is a reminder that agentic coding still sits on ordinary package and machine security.

    21 min
  2. 5d ago

    Twenty Ways To Not Trust An Agent

    Hosts: Lenar Kess, Damra Vol. One morning's arXiv listing dropped close to twenty agent papers, and almost none of them are about making agents more capable. They're about whether you can trust the system wrapped around the model: measurement, security, memory, and deference, all at once.

    Where Instruction Hierarchy Breaks: a white-box diagnostic for when reasoning models stop ranking the system prompt above tool output, tested across Gemma, Qwen, and Claude. If the repair holds, prompt injection becomes a problem to fix structurally, not just to filter.

    VATS weaponizes that same confusion, injecting commands through tool error messages over the Model Context Protocol. The error path is the door most teams never locked.

    Shared Latent Structures for Backdoors argues that jailbreaks, bias, and planted triggers share an internal signature that sparse autoencoders can catch.

    Beyond Goodhart's Law (MAC-Bench), Online Agent-as-a-Judge, and PACE: three attempts to keep evaluation honest when the thing you're testing can learn the test.

    The AI Epistemic Deference Index finally puts a continuous number on sycophancy, paired with a reward-bias paper on how personalization manufactures it.

    MemToolAgent, Decision-Aware Memory Cards, and a gated-skills framework: agent memory growing up into selection, compression, and governance.

    Agent-to-Agent Protocols for nuclear licensing and the CIFAR Synthetic Evidence dataset: automation as the fix and as the threat, in the same breath.

    Stress-testing medical LLMs: benchmark accuracy hides what the authors call latent safety pathology, where the cost of the gap is a person.

    19 min
  3. Jun 7

    Twenty Billion Parameters, One Big Harness

    Hosts: Lenar Kess, Damra Vol. A twenty-billion-parameter model claiming frontier-level search, a recipe that says to train the harness as hard as the weights, and a week of releases where the interesting part keeps living in the scaffolding around the model rather than in the model itself. Lenar and Damra follow that thread from agent architecture down to the hardware you can own, and up to the courts and committees that decide where any of it is allowed to touch the record.

    Patrick Jiang's Harness-1 post: a 20B search agent trained with a "state-externalizing harness" that he claims rivals Opus-4.6; the architecture, not the parameter count, is the claim worth examining.

    Viv's "agent = model + harness" recipe: train both components together; the same specialization logic shows up everywhere this week.

    Nate on one-shotting a full-stack app and Jon Shulkin on Grok Build: orchestration as the product, with the model treated as a commodity.

    CRUX's agent publishing an iOS app: "a few human interventions" is the detail that decides whether open-world evals beat pass/fail scores.

    Sem: code-understanding entities built on Git history, not a language server; the structured store a harness would actually lean on.

    Universal Memory Protocol vs Databricks' end-to-end Instructed Retriever: standardize memory, or specialize retrieval for a 3x win? The incentives point opposite ways.

    NVIDIA's RTX Spark at Korea's PC Bangs and the GLM Air/GGUF thread: the local crowd wants the smallest good-enough model on hardware they own.

    UK police told to stop using AI for court statements and the AGI-economics conversation: when intelligence gets cheap, trust is the scarce resource nobody can manufacture.

    17 min
