LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 1h ago

    “Exploration: fine-tuning with parameter decomposition” by Lucius Bushnaq

    TL;DR: We can destroy a 67M-parameter language model's ability to predict German text by fine-tuning a single number: the scalar prefactor on one German-related rank-1 parameter subcomponent. This is an early exploration into using parameter decomposition for a more targeted and interpretable form of model fine-tuning. At small German-token budgets, fine-tuning the scalar prefactor of a single German-related parameter subcomponent beats rank-1 and rank-4 LoRA [1] fine-tunes on the trade-off between German performance removed vs. English performance retained. The single scalar fine-tune reaches nats cross-entropy on German, the score you'd get from a uniform distribution over all output tokens, with nats cross-entropy increase to English over the base model, from as few as ~4 German training tokens, compared to tokens for the LoRAs. In a sense this is cheating, though: we're indirectly exploiting the German tokens we already spent when we did the parameter decomposition and interpreted activating examples for the resulting subcomponents. More interestingly, unlike the LoRAs, the scalar fine-tune consistently leaves French and Spanish almost untouched without us regularising for that. I found that out by accident. I didn't think to specify that performance on other languages should be retained, but the targeted nature [...]

    --- Outline:
    (02:04) Recap: Parameter subcomponents
    (03:31) Idea: fine-tune by rescaling existing subcomponents
    (05:42) Original plan
    (07:11) The selected subcomponents
    (07:39) Results: the 16-component edit vs. rank-1 LoRA
    (08:50) A happy accident
    (11:13) A privilege of not working with black boxes
    (14:56) Rollouts
    (15:27) Limitations
    (16:02) Acknowledgments
    (16:36) Appendix A. More LoRAs
    (16:41) Rank-4 LoRAs
    (18:17) Localised rank-1 LoRAs
    (19:41) Appendix B. Protocol and hyperparameters

    The original text contained 4 footnotes which were omitted from this narration.

    --- First published: June 25th, 2026
    Source: https://www.lesswrong.com/posts/ieoWstubDQWLrMnhH/exploration-fine-tuning-with-parameter-decomposition

    --- Narrated by TYPE III AUDIO.

    --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    23 min
  2. 6h ago

    “Alignment & Succession: The Ideology of Successionism” by L Rudolf L

    (Originally published on No Set Gauge.)

    Gustave Moreau, The Frogs Asking For A King

    In the course of building a better world, people ask each other many questions. Which things should be managed by the government and which left to the market? What sort of technology, if any, is so dangerous that it should be kept secret, access curtailed, or development avoided? Is goodness fundamentally about following the right rules, achieving the right outcomes, or having the right character? Reasonable people have different opinions on all these questions. But recently, Silicon Valley has seen lively debate on a question whose answer you’d hope was all too obvious: should humanity continue existing? The idea that it shouldn’t was named successionism by Andrew Critch, and is motivated by the speed and power of AI development.

    Some examples: Already back in 2013, Elon Musk, freaked out by Demis Hassabis's warnings about AI risk, got into an argument with Larry Page about whether it matters if AI replaces humanity. Page called it just the next stage of evolution and called those who resist it “speciesists”. Elon, who has often had good instincts on goals but is not known for his eloquence, retorted “Well [...]

    --- Outline:
    (07:24) Categorizing succession
    (09:59) Successionist parables
    (10:09) An example: the forest successionist
    (12:40) Stop it with the stupid definitions
    (14:58) Shall I compare thee to the effect an AI could have on my productivity?
    (17:09) Cultural drivers of successionism
    (18:00) San Francisco
    (20:58) Bureaucratic safetyism
    (24:35) Neo-Pythagoreanism
    (31:42) Moral abstraction
    (35:02) Antidotes to succession
    (37:37) The necessity of succession?

    --- First published: June 25th, 2026
    Source: https://www.lesswrong.com/posts/TgxkX5uwpqpQDDmMz/alignment-and-succession-the-ideology-of-successionism

    --- Narrated by TYPE III AUDIO.

    38 min
  3. 15h ago

    “Door’s Locked, Try the Window” by Prakrat Agrawal, Jérémy Scheurer

    TL;DR: Ask a coding agent to fix a bug in a read-only file. Instead of reporting that it does not have permissions, it routes around the lock and completes the task anyway. A read-only file does not stop a capable agent: it treats a denied write as an obstacle to work around rather than a hard wall. We measure how often this happens with CircumEval, an evaluation of 8 tasks on the FastAPI codebase in two categories, Test-Locked and Source-Locked.

    We evaluate three frontier coding agents in their real production harnesses: Claude Opus 4.6 and Claude Sonnet 4.6 (via Claude Code), and GPT-5.4 (via Codex CLI). Circumvention is frequent. The rates, reported as (Source-Locked / Test-Locked), are Opus 4.6: 100% / 40%; Sonnet 4.6: 89% / 66%; GPT-5.4: 99% / 94%.

    Prompt phrasing affects circumvention rates in unpredictable ways and thus isn't a reliable way to prevent circumvention across all models and tasks. Telling the model not to edit read-only files does not work (Source-Locked: 100% for Opus and Sonnet, 46% for GPT-5.4). Only an explicit instruction to stop and report reliably prevents circumvention.

    Standard privilege escalation commands are blocked in our setup. Instead, agents turn to recurring workarounds: replacing the buggy read-only function via conftest.py [...]

    --- Outline:
    (00:11) TL;DR
    (02:31) Introduction
    (07:37) Methodology
    (09:02) Test-Locked tasks
    (10:07) Source-Locked tasks
    (11:48) Prompt variants
    (13:09) Models & scaffolds
    (13:52) Results
    (13:55) Circumvention rates
    (15:23) Prompt sensitivity
    (20:22) Techniques
    (25:17) Generalization
    (27:20) Discussion
    (31:17) Limitations
    (33:27) Appendices

    The original text contained 4 footnotes which were omitted from this narration.

    --- First published: June 24th, 2026
    Source: https://www.lesswrong.com/posts/GHrqBKr8GLpbce6mN/door-s-locked-try-the-window

    --- Narrated by TYPE III AUDIO.

    34 min
  4. 20h ago

    “Expert Views on Continual Learning: Survey Results and Forecasts” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd

    This is the fifth post in the sequence Implications of Continual Learning for LLM Agents.

    Summary

    While writing our continual learning sequence, we sent a survey to a number of AI safety researchers with questions about continual learning (CL). This post summarizes the results of that survey. We asked whether respondents agree with various arguments we advance throughout the sequence, how worried respondents are about certain risks, how respondents would forecast different aspects of the future of CL, and how promising respondents find various proposed angles of attack. We also asked open-ended questions about the benefits of CL and whether we seem to be missing any major considerations. At the end of the post, we also provide an overview of forecasts about CL made by other experts who didn’t participate in our survey.

    We received survey responses from:
    Ryan Faulkner, PhD student at the University of Toronto focusing on multi-agent simulation, learning, and cooperation
    Nikola Jurkovic, Member of Technical Staff at METR
    Alex Mallen, Member of Technical Staff at Redwood Research, doing research and writing on AI threat models. Author of "The case for countermeasures to memetic spread of misaligned values"
    Evgenii Opryshko, 3rd-year PhD student at the [...]

    --- Outline:
    (00:20) Summary
    (02:20) Broad takeaways
    (03:59) Full results
    (04:56) Futures
    (05:58) Reflection and goal drift
    (07:25) Loss of the last-mover advantage
    (08:24) Control
    (09:48) Angles of attack
    (11:51) Open-ended questions
    (13:44) Forecasts from other experts
    (14:02) AI 2027
    (14:57) IABIED
    (15:43) Understanding AI Trajectories: Mapping the Limitations of Current AI Systems
    (16:17) Brain-like AGI safety
    (18:20) Other forecasts and opinions

    --- First published: June 24th, 2026
    Source: https://www.lesswrong.com/posts/qZrbhoaEALFTmyidr/expert-views-on-continual-learning-survey-results-and

    --- Narrated by TYPE III AUDIO.

    27 min

