LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 2 hrs ago

    “The worthlessness of vitamin D is mildly exaggerated” by dynomight

    For a while there, many people thought vitamin D was magical—that it could improve bones, the heart, infections, cancer, heart disease, longevity, even mental health. But among people I respect, opinion is now overwhelmingly that taking vitamin D does nothing unless you're severely deficient. The central argument is that while vitamin D levels are correlated with ~all positive health outcomes, when you actually test vitamin D supplements against placebo in randomized trials, nothing ever happens. That's what I used to think, too. But I've come to think the skeptics have over-corrected. Yes, randomized trials have shown the magical correlations are not causal. But if you start with non-insane expectations, the trials look like weak but positive evidence. And if you consider what we know about biology and evolution, I think the balance of evidence tips pretty clearly in the direction that people with low-ish levels would be wise to supplement. Am I certain that vitamin D is beneficial for people with low-ish levels? Absolutely not! But I claim that's the best bet given the limits of our knowledge. The classical view: Boring bone vitamin Most vitamins are "ingredients" that the body uses to do stuff. Vitamin D is more [...] --- Outline: (01:19) The classical view: Boring bone vitamin (04:28) The correlation view: Magical mystery cure (07:58) Meanwhile in biology (11:10) Then came the RCTs (15:12) I made some tables (16:32) Squinting at the data (22:24) Where are we? (23:15) The case for supplementing anyway (23:19) It's biologically plausible that vitamin D is good (24:40) Humans evolved to have a lot of vitamin D (27:14) What do you expect from vitamin D? (29:22) What do you expect from vitamin D trials? (31:11) The trials do find slightly helpful numbers (32:50) You're probably already taking vitamin D (34:34) So that's my story The original text contained 35 footnotes which were omitted from this narration. --- First published: June 23rd, 2026 Source: https://www.lesswrong.com/posts/sF5gAxnmifQe2TBNt/the-worthlessness-of-vitamin-d-is-mildly-exaggerated --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    36 min
  2. 6 hrs ago

    “A system overview for near-term, low-trust AI compute verification” by Naci Cankaya

    Version 0.2, working draft This is a working draft of my current best idea for a privacy-preserving, retrofittable AI compute verification system, for confidence-building in an arms-control-like AI agreement between rival nation states. The purpose of this draft is to elicit community engagement by making use of Cunningham's law: I make assertions about what the (emerging) field of AI verification should aim for, and people with experience in international policy, cybersecurity and any relevant field of engineering can point out what this draft gets wrong. Thank you to everyone who has provided feedback to version 0.1, especially Aaron Scher, Mauricio Baker and Jonathan Ng. 1. Introduction and summary In order to plan and execute under tight timelines, one needs to make some strategic bets, instead of hedging too much and keeping all options open. The field of research on AI verification is bottlenecked partly by a lack of shared vision (as well as human capital, but having clear goals helps hiring and fundraising). With this post, I aim to: Make technical objectives for verification in high-stakes AI governance more specific and actionable (section 2).Contribute a first, high-level reference architecture for meeting these goals (section 3 and [...] --- Outline: (00:54) 1. Introduction and summary (06:31) 2. Problem statement and motivation (06:41) 2a. Low-trust AI governance (09:46) 2b. Threat model (11:09) Covert adversary and the inversion of the fortress problem (12:21) The attribution problem and plausible deniability (13:26) Assumptions about physical security and inspection (15:08) Discussion of attack surfaces (18:19) 2c. Practical requirements (23:05) 3. System overview and operation (23:10) 3.1. Brief introduction (27:14) 3.2. End-to-end execution trace (28:00) 3.2.1. Evidence capture (30:22) 3.2.2. Evidence evaluation (33:57) 4. Subsystem designs for eliminating the need for mutually trusted silicon (34:29) 4.1. Trust in silicon is hard (35:58) 4.2. Analog data movement control: passive splitters, data diodes, enclosures (37:52) 4.3. Building blocks for a mutually secure verification system (38:53) 4.3.1. Controlled ingress (40:02) 4.3.2. Output cross-checks (41:46) Prior work (43:01) 4.3.3. Sanitized egress (44:26) Prior work (45:19) 4.3.4. Instructor-executor (48:26) 5. Engineering approaches for evidence capture and evaluation (48:32) 5.1. Evidence generation, capture and commitment (50:29) 5.1.1. Network taps and active wardens (51:18) Prior work (54:03) Open research questions (55:55) 5.1.2. Memory challenging and memory wiping (58:19) Prior work (01:00:19) Open research questions (01:01:32) 5.2. Evidence evaluation and disclosure (01:01:37) 5.2.1. Secure auditing environments (tentative plan A) (01:04:20) Prior work (01:06:22) Open research questions (01:07:53) 5.2.2. Replay and the determinism challenge (01:10:10) Prior work (01:10:49) Open research questions (01:11:43) 5.2.3. Inspection software, inspector agents (01:12:38) Prior work (01:13:58) Open research questions (01:14:58) 5.2.4. Zero Knowledge Proofs (tentative plan B) (01:16:22) Prior work (01:18:55) Open research questions (01:20:14) 5.3. Support mechanisms (01:20:19) 5.3.1. Side-channel defense (01:20:51) Prior work (01:22:43) Open research questions (01:24:39) 5.3.2. Resource accounting (01:25:18) Prior work (01:25:30) Appeal to the reader (01:26:28) Appendices (01:26:32) A1. The statistics of random sampling The original text contained 23 footnotes which were omitted from this narration. --- First published: June 23rd, 2026 Source: https://www.lesswrong.com/posts/fgvmKqRGvBteKeDoc/a-system-overview-for-near-term-low-trust-ai-compute --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    1h 31m
  3. 21 hrs ago

    “Model Size Scaling in 2023-2031” by Vladimir_Nesov

    Token generation speed is constrained by the speed at which the relevant HBM can be read, which is mostly the weights and KV-cache. Suppose a model is large, so that more than half of HBM is read when making a single pass over the weights, it's being read in parallel within a scale-up system, and N such systems are used in a pipeline. Then the time it takes to generate a token (without speculative decoding) is at least the time of reading more than half of an HBM stack times N. If we target a particular speed of token generation, this puts a constraint on the number of pipeline stages, which puts a constraint on the total params of the model. But if there isn't enough pretraining compute, models will remain smaller than this constraint (lower sparsity at a given number of active params buys a higher speed of token generation), so both should be taken into account. Working through these considerations gives model sizes feasible for each year between 2023 and 2031. The total params go from 10T in 2026 (at 8x sparsity, still constrained by Oberon racks, trained for 1.3e27 FLOPs) to 240T in 2028 (at [...] --- Outline: (01:57) Time to Fully Read an HBM Stack (04:15) Maximal Pipelines Below 80 Tokens/s (09:07) Pretraining Compute (14:11) Active Params from Pretraining Compute (22:51) Starting in 2028, the Constraint is Pretraining Compute The original text contained 5 footnotes which were omitted from this narration. --- First published: June 22nd, 2026 Source: https://www.lesswrong.com/posts/yLHiQGCPdvzL9fBn3/model-size-scaling-in-2023-2031 --- Narrated by TYPE III AUDIO.

    24 min
  4. 1d ago

    “GLM-5.2 Is The New Best Open Model” by Zvi

    GLM-5.2 arrived last week. It boasts excellent benchmarks and looks strong. Benchmarks here are a de facto ceiling of how good it is, not a point estimate. Essentially all other aspects of an open model like this, beyond speed and price, will almost always be worse than the numbers suggest. Still, impressive. It is definitely a large step up from GLM-5.1, and likely the strongest open model. GLM-5.2 is still substantially behind the absolute frontier, although plausibly on the cost-benefit Pareto frontier. It seems closer to the frontier than previous efforts, including probably closer than DeepSeek R1 was during the DeepSeek moment. This is the new ‘peak close behind’ moment. Its existence is a substantial updates to push back some of the ‘where are all the updates’ updates in the opposite direction over time. Purely in terms of core tasks that GLM-5.2 is capable of doing, and ignoring missing features and its inferior generalization, and ignoring that it is distilled from Claude, and ignoring the Mythos class of models, and marking purely from date of public release, you can make a case GLM-5.2 is somewhere between 4 months and 7 months behind the frontier [...] --- Outline: (02:01) Alex Bores For Congress In NY-12 (03:41) Signs of Life (05:05) The Benchmarks (09:02) GLM-5.2 Is Distilled From Claude (09:55) Positive Responses (16:00) Finding The Niche (17:30) Negative Reactions (20:05) Looking To The Future --- First published: June 22nd, 2026 Source: https://www.lesswrong.com/posts/reXkwJbB8GYdeuvDt/glm-5-2-is-the-new-best-open-model --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    22 min
  5. 1d ago

    “The AI Industrial Explosion — Part 4: Cheap power” by djbinder

    In Parts 1, 2, and 3 we estimated how fast a post-AGI economy could grow using existing or historically observed production techniques, grounded in US input-output data. That approach gave us confidence that the methods we assumed were physically realizable, because they correspond to manufacturing processes that have run at scale in the past or run today. Now I would like to relax that constraint, and ask how much faster the economy could grow using more advanced technology. In this part, we will consider energy production. Every physical process consumes energy, and it takes energy to build the production and distribution system that generates energy. The energy payback time (EPBT) for a system is defined as the time required for it to produce the amount of energy required to build it. If, for example, a solar panel requires 100 MJ to construct from raw materials and produces 10 MJ per day, the energy payback time would be ten days. Any economy relying on such panels for energy production has a maximum growth rate that is bounded by this payback time: Even if everything else is essentially free, the solar panels could not reproduce faster than this, and this [...] --- Outline: (02:27) With current technology, energy production is the most tightly bottlenecked sector (06:08) Making materials requires energy (09:44) A minimal solar power system could pay back its embodied energy in days (15:17) As the payback time shrinks, supply-chain lags become the binding constraint (18:44) Even counting the factories, energy production does not preclude doubling in weeks (23:41) Discussion (26:02) Appendix H: Mathematical foundations (26:08) H.1 Payback time and the growth bound (28:08) H.2 Inputs committed at distributed times (29:07) Appendix I: The solar electricity system (29:13) 1.1. System design (29:49) 1.1.1. Panel (34:42) 1.1.2. Collection cabling (36:01) 1.1.3. Power conversion (37:46) 1.1.4. Transmission (40:18) 1.1.5. Storage (43:22) 1.1.6. Transport and field deployment (45:03) 1.2. Lag inventory (46:16) 1.3. Physical capital requirements --- First published: June 22nd, 2026 Source: https://www.lesswrong.com/posts/6fgfn72zoRDomgvrT/the-ai-industrial-explosion-part-4-cheap-power --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    51 min
  6. 1d ago

    “A Theory of Prompt Injection (and why you should study roles)” by Charles Ye, softboiledheart

    Summary We've been building a theory of how prompt injections work under the hood.We show it comes down to how LLMs perceive roles (the humble chat template tags).We use this theory to create new attacks, explain some weird mech interp results, and predict when attacks work.We also advocate for a new subfield focused on the science of roles, and sketch some unexplored new research problems.Work supported by CBAI and Cosmos. Another version of this post (with more inline colors) is here, and full ICML paper here. 1. The World to an LLM How does an LLM know the difference between its own thoughts and someone else's words? To see why this is hard, let's look at what the world actually looks like to a model. Here's a simple chat where we ask Claude to check the day of the week. I took a snapshot of it midway through its follow-up response: Left = what we see; right = what the LLM gets. On the left is what we see in the chat interface: a structured conversation with distinct turns. On the right is what the model actually receives as input: a single, continuous stream [...] --- Outline: (00:12) Summary (00:54) 1. The World to an LLM (02:35) 2. Roles (05:03) 3. Roles and prompt injection (06:35) Two ways to defend injections (08:14) 4. What's going wrong with roles? (13:28) 5. Spoofing Thoughts (15:59) 6. Prompt Injection as Role Confusion (20:57) 7. Why Roles Matter (21:01) A brief history of roles (22:23) A general theory of roles (24:54) 8. Open Ideas for Roles Research (25:12) Subconscious steering (27:06) When to use roles (28:42) Roles as a cognitive window (30:38) Conclusion The original text contained 27 footnotes which were omitted from this narration. --- First published: June 22nd, 2026 Source: https://www.lesswrong.com/posts/d8xDGzCEYE639qqEv/a-theory-of-prompt-injection-and-why-you-should-study-roles --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    32 min
  7. 1d ago

    “Coup is the Pareto-optimal social game” by Daniel Tan

    I've been playing Coup for a long time now. I keep a copy in my backpack and bring it everywhere, and it's earned the space. A few reasons it's so good: It's trivial to teach. You can explain the rules in a minute or two. Anyone can pick it up and start playing immediately.Many people find it fun. Almost everyone I've played with has loved it — pretty much unanimously, not least because it involves a lot of bluffing and emergent social / political dynamics It scales. Two players works; so does six or seven. (I'm even considering a second set so I can run bigger groups.) I think everyone should strongly consider owning a copy! The rules, briefly Coup is a bluffing game. There are mechanics, but bluffing is the heart of it. Mechanics. Everyone holds hidden cards. Your cards give you powers and also are your lives. Each player tries to gain resources and eliminate other players over the course of the game, and the last one standing wins. Bluffing. The key move is that you can claim to be a character you don't actually have, which lets you take powerful actions. But it's a [...] --- Outline: (01:07) The rules, briefly (02:10) Where it falls short, and a fix I'm exploring (02:55) Bottom line --- First published: June 21st, 2026 Source: https://www.lesswrong.com/posts/Ho7JeKFzhwGXxgjTW/coup-is-the-pareto-optimal-social-game --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    3 min
  8. 1d ago

    “A brief list of ways AI safety efforts could be net negative” by Elias Schmied

    Here's Holden Karnofsky: I tend to think it's worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in one's head, the more it's going to be like 50+ε%. I think AI safety is a great cause to work in. I’m excited to work in it. I think it's high impact. I am doing my best to do things that I will be proud to have done and hope for the best. But I really do have to live with the possibility that my ultimate impact on the utilons or whatever is going to be negative. I’m not aware of a good list of downside risks for AI safety broadly[1], so I decided to make one. This is not intended to be fully comprehensive, these are just the ones that I personally take seriously[2][3]: AI governance interventions are obviously high-variance: bad regulation can easily make things worse, many interventions could increase the risk of great power conflict, increased political polarization around AI could be really bad, more centralization of power increases authoritarianism [...] The original text contained 7 footnotes which were omitted from this narration. --- First published: June 19th, 2026 Source: https://www.lesswrong.com/posts/sAfMCpWLfkHqF5Gix/a-brief-list-of-ways-ai-safety-efforts-could-be-net-negative --- Narrated by TYPE III AUDIO.

    4 min

About

Audio narrations of LessWrong posts.

You Might Also Like