LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 4 hours ago

    “why pollen allergies?” by bhauth

    Allergies are a big problem for a lot of people. If you're someone with pollen allergies, maybe you've wondered how people in the distant past dealt with them. After all, a thousand years ago people mostly worked outside all day, in areas where plants grow well. They had no air purifiers, no allergy medication, and no extra food for people who couldn't work when it was time to plant crops. The answer is, pollen allergies just weren't very common back then. There was a massive increase in their prevalence from about 1850 to 1950. Here's a paper noting that.

    the hygiene hypothesis

    That paper argues that the increase in allergy prevalence is due to increased hygiene. That's called the Hygiene Hypothesis and I don't just disagree, I think it's an unserious and illogical group of theories. First, let's keep in mind the basics of how the immune system works. An immune response to some target develops when the target is present at the same time & place as harm or a known-harmful substance. Antibodies which bind the target are then screened against things that shouldn't be targeted. Now then, the Hygiene Hypothesis refers to [...]

    Outline: (00:45) the hygiene hypothesis (04:19) a note about references (05:33) air pollution (08:02) food allergies? (08:18) can this be solved? (10:52) conclusion

    First published: May 18th, 2026
    Source: https://www.lesswrong.com/posts/eEkWgddYQ3xZGtQDP/why-pollen-allergies

    Narrated by TYPE III AUDIO.

    12 min
  2. 1 day ago

    “James C. Scott: Seeing Like a State” by Martin Sustrik

    In 1932-33, Soviet collectivization destroyed local farming knowledge and produced a famine that killed somewhere between five and nine million people. It was one of the twentieth century's great tragedies, and James Scott's Seeing Like a State draws a straight line from the ideology that caused it — High Modernism, the belief that society can be rationally reorganized from above — to the disaster that followed. But here's a number that doesn’t appear in Scott's book. Eight billion. That's roughly how many people are alive today, most of them fed by the products of scientific agriculture. Synthetic fertilizers, high-yield crop varieties, mechanized farming. The Green Revolution, which saved millions from starvation in the second half of the twentieth century, was born from the same impulse as High Modernism: it is top-down, science-driven and generic, scaling standardized solutions across entire continents. James Scott's Seeing Like a State is a brilliant book about the former kind of outcome. But it has little to say about the latter. This has allowed a generation of readers to walk away with a clean takeaway: Local knowledge good, central planning bad. But that is, at best, half of the story. The question that even Scott [...]

    The original text contained 2 footnotes which were omitted from this narration.

    First published: May 17th, 2026
    Source: https://www.lesswrong.com/posts/iiDzt5qhesmQZiNAj/james-c-scott-seeing-like-a-state

    Narrated by TYPE III AUDIO.

    Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    14 min
  3. 1 day ago

    “Benchmarking Real Work” by kaivu, leni, rohuang, zef

    Thanks to Megan Kinniment for helpful comments and discussion.

    TL;DR: Benchmarks like HCAST undersample fuzzy (hard to evaluate) tasks, meaning they might overestimate capability on long-horizon work. To sample fuzzy tasks we need to increase judge capacity: we can either try to build automated judges that match human judgment, or reduce the human effort per grade. To do this, we propose generating fuzzy tasks as a byproduct of real SWE work — snapshot the repo and a proto-spec before starting, and after finishing, use an AI transform to produce an executable spec and LLM-judge conditions. Because the engineer just did the work, verifying the judges or grading the agent directly is much cheaper than grading the task from scratch. I think this would be a good way to collect tasks, as well as a useful personal epistemic tool. This is a two-part series on capability evaluation. Part 1 is about acquiring fuzzy tasks, and part 2 is about analyzing them.

    Motivation: sampling bias in HCAST

    There are several well-described limitations of time horizons. But the strongest reason that I don’t update that much on trends in time horizons (and time horizon-like tasks) is because I think all existing evaluations [...]

    Outline: (01:14) Motivation: sampling bias in HCAST (02:47) Making fuzzy tasks sampling viable by increasing judge capacity (04:02) Proposal: sampling from real work (05:18) Advantages (06:10) Discussion (06:13) How inconvenient is this? (06:32) Can we test fuzzy skills by just testing longer tasks?

    The original text contained 3 footnotes which were omitted from this narration.

    First published: May 16th, 2026
    Source: https://www.lesswrong.com/posts/NbDjD47u6WmthgiDC/benchmarking-real-work

    Narrated by TYPE III AUDIO.

    8 min
  4. 1 day ago

    “A relatively brief explanation of Boltzmann Brains” by Eliezer Yudkowsky

    (Initially written for the LW Wiki, but then I realized it was looking more like a post instead.) In 1895, the physicist Ignaz Robert Schütz, who worked as an assistant to the more eminent physicist Ludwig Boltzmann, wondered if our observed universe had simply assembled by a random fluctuation of order from a universe otherwise in thermal equilibrium. The idea was published by Boltzmann in 1896, properly credited to Schütz, and has been associated with Boltzmann ever since. The obvious objection to this scenario is credited to Arthur Eddington in 1931: If all order is due to random fluctuations, comparatively small moments of order will exponentially-vastly outnumber even slightly larger fluctuations toward order, to say nothing of fluctuations the size of our entire observed universe! If this is where order comes from, we should find ourselves inside much smaller ordered systems. Feynman similarly later observed: Even if we fill a box of gas with white and black atoms bouncing randomly, and after an exponentially vast amount of time the white and black atoms on one side randomly sort themselves into two neat sides separated by color, the other half of the box will still be in expectation randomized. If [...]

    First published: May 16th, 2026
    Source: https://www.lesswrong.com/posts/v8MSczS3CuoqMmTFw/a-relatively-brief-explanation-of-boltzmann-brains

    Narrated by TYPE III AUDIO.

    5 min
  5. 2 days ago

    “An Introduction to Exemplar Partitioning for Mechanistic Interpretability” by Jessica Rumbelow

    Most of what we currently call "feature discovery" in language models is wrapped up in dictionary-learning methods like sparse autoencoders (SAEs) – which work, and which have been scaled to millions of features on frontier-scale models, but which bundle two distinct commitments into a single training objective: a reconstruction loss and a sparsity loss over a fixed size dictionary. Those commitments make sense if your goal is reconstructive decomposition – if you want to take an activation and rebuild it from a sparse code. They make less obvious sense if your aim is to find interpretable structure (directions? features?) in activation space, to retrieve representative examples, identify causal interventions, or measure how representations change across layers and inputs. And it turns out a lot of that doesn't really need the full SAE machinery.

    An Exemplar Partitioning dictionary built from Gemma-2-2B L12 activations at p2 (K = 5,129). Left: eight sample regions, each shown with its member count, its exemplar's logit-lens [nostalgebraist, 2020] decode, and an excerpt of a member input with the activating tokens highlighted. Right: a PCA-projected 3D rendering of the Voronoi partition; each cell is one region, with a random selection also labelled with logit-lens decode.

    This [...]

    Outline: (02:08) Glossary (03:45) Exemplar Partitioning (06:09) Inference (06:51) Properties of the EP dictionary (07:29) Concept detection (AxBench) (08:16) How EP and SAEs relate (09:40) Find and steer refusal (11:37) A free OOD signal (12:45) Cross-checkpoint drift (base ↔ IT) (14:58) Domain saturation (15:59) Inside the partition (17:33) Future work (21:24) Thats all for now

    First published: May 16th, 2026
    Source: https://www.lesswrong.com/posts/RroeHBSkBXXDsrryq/an-introduction-to-exemplar-partitioning-for-mechanistic-1

    Narrated by TYPE III AUDIO.

    22 min
  6. 2 days ago

    “A Year Late, Claude Finally Beats Pokémon” by Julian Bradshaw

    Credit: ClaudePlaysPokemon Elevator Shanty by Kurukkoo

    Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however.

    ClaudePlaysPokemon feat. Opus 4.7 has finally beaten Pokémon Red, fulfilling the challenge set over a year ago when LLMs playing Pokémon went briefly, slightly viral.

    Victory Screen!

    Let's get the throat-clearing out of the way: this doesn't make 4.7 a clear breakthrough in intelligence over 4.6 or 4.5. It's smarter, yes, as we'll discuss below, but not by something one could honestly call a big leap. Rather, step changes have finally accumulated to the point of victory. And to give other models their fair shake: after criticism over its elaborate harness, GeminiPlaysPokemon has beaten Pokémon with progressively weaker harnesses, including about two months ago with a harness comparable to the one Claude uses. As such, this is a bit of a valedictory post, closing off the cycle of Claude playing Pokémon Red, relating anecdotes for the fun of it, and discussing improvements in Opus 4.7, as well as speculating a bit on what this has all meant.

    Retrospective Anecdotes on Claude 4.5 and 4.6

    Our last post, on Opus [...]

    Outline: (01:37) Retrospective Anecdotes on Claude 4.5 and 4.6 (06:34) Harness Changes for 4.7 (07:52) Improvements (4.6 + 4.7) (08:16) Vision (09:31) Less Tunnel Vision (10:12) Another Level of Spatial Awareness (10:25) Breaks Out of Loops EVEN FASTER (10:55) Victory Road (11:50) The Little Things (12:52) Concluding Thoughts (15:29) Notes on Pokémon as a benchmark

    The original text contained 10 footnotes which were omitted from this narration.

    First published: May 16th, 2026
    Source: https://www.lesswrong.com/posts/sehJYg5Yny9fvpbpt/a-year-late-claude-finally-beats-pokemon

    Narrated by TYPE III AUDIO.

    19 min

