Rooted Layers

AI insights grounded on research

Rooted Layers is about AI insights grounded on research. I blog about AI research, agents, the future of deep learning, and cybersecurity. Main publication: https://lambpetros.substack.com/

  1. Jan 15

    The Transformer Attractor

    In 2023, Mamba promised to replace attention with elegant state-space math that scaled linearly with context. By 2024, the authors had rewritten the core algorithm to use matrix multiplications instead of scans. Their paper explains why: “We restrict the SSM structure to allow efficient computation via matrix multiplications on modern hardware accelerators.” The architecture changed to fit the hardware. The hardware did not budge.

    This is not a story about hardware determinism. It is a story about convergent evolution under economic pressure. Over the past decade, Transformers and GPU silicon co-evolved into a stable equilibrium: an attractor basin from which no alternative can escape without simultaneously clearing two reinforcing gates. The alternatives that survive do so by wearing the Transformer as a disguise, adopting its matrix-multiplication backbone even when their mathematical insight points elsewhere.

    The thesis: The next architectural breakthrough will not replace the Transformer. It will optimize within the Transformer’s computational constraints, because those constraints are no longer just technical. They are economic, institutional, and structural.

    The Two-Gate Trap

    Every alternative architecture must pass through two reinforcing gates:

    Gate 1: Hardware Compatibility
    Can your architecture efficiently use NVIDIA’s Tensor Cores, the specialized matrix-multiply units that deliver roughly 1,000 TFLOPS on an H100? If not, you pay a 10–100× compute tax. At frontier scale ($50–100M training runs), that tax is extinction.

    Gate 2: Institutional Backing
    Even if you clear Gate 1, you need a major lab to make your architecture its strategic bet. Without that commitment, the architecture lacks large-scale validation, production tooling, ecosystem support, and the confidence signal needed for broader adoption.

    Why the trap is stable: These gates reinforce each other. Poor hardware compatibility makes institutional bets unattractive (too risky, too expensive). Lack of institutional backing means no investment in custom kernels or hardware optimization, which keeps Gate 1 friction permanently high. At frontier scale, breaking out requires changing both simultaneously, a coordination problem no single actor can solve. The alternatives that survive do so by optimizing within the Transformer’s constraints rather than fighting them.
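
The move from scans to matrix multiplications described in the opening can be sketched on a toy example. The snippet below is an illustration only, not the Mamba or Mamba-2 kernel: it assumes a scalar recurrence h_t = a_t * h_(t-1) + b_t * x_t with output y_t = c_t * h_t, and every name in it (`scan_form`, `mixing_matrix`, `matmul_form`, and the sample values) is hypothetical. It shows that the same outputs can be produced either by a step-by-step scan or by one lower-triangular matrix applied to the whole input, which is the shape of work matrix-multiply hardware is built for. Plain Python lists are used so the sketch runs without dependencies.

```python
# Toy scalar recurrence: h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t.
# Illustrative assumption only; not the actual Mamba or Mamba-2 algorithm.


def scan_form(a, b, c, x):
    """Sequential scan: each step depends on the state left by the previous step."""
    h = 0.0
    ys = []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def mixing_matrix(a, b, c):
    """Build lower-triangular M with M[i][j] = c_i * (a_{j+1} * ... * a_i) * b_j for j <= i."""
    n = len(a)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        decay = 1.0  # empty product when j == i
        for j in range(i, -1, -1):
            M[i][j] = c[i] * decay * b[j]
            decay *= a[j]  # include a_j before moving to the next smaller j
    return M


def matmul_form(a, b, c, x):
    """Same outputs as scan_form, expressed as a single matrix-vector product y = M x."""
    M = mixing_matrix(a, b, c)
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


# Hypothetical sample values, chosen only to exercise the code.
a = [0.9, 0.5, 0.8, 0.7]
b = [1.0, 2.0, 0.5, 1.5]
c = [1.0, 1.0, 2.0, 0.5]
x = [1.0, -1.0, 2.0, 0.5]

y_scan = scan_form(a, b, c, x)
y_matmul = matmul_form(a, b, c, x)
print(y_scan)
print(y_matmul)
```

The two printed lists should agree up to floating-point rounding. The trade-off the sketch makes visible: the scan does work linear in sequence length but is inherently sequential, while the matrix form does quadratic work in this naive version yet is a dense multiply that parallel hardware handles well.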

    25 min
