LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 1 HOUR AGO

    “Kimi K2 Thinking” by Zvi

    I previously covered Kimi K2, which now has a new thinking version. As I said at the time back in July, price in that the thinking version is coming. Is it the real deal? That depends on what level counts as the real deal. It's a good model, sir, by all accounts. But there have been fewer accounts than we would expect if it was a big deal, and it doesn’t fall into any of my use cases.

    Introducing K2 Thinking

    Kimi.ai: Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
    SOTA on HLE (44.9%) and BrowseComp (60.2%)
    Executes up to 200 – 300 sequential tool calls without human interference
    Excels in reasoning, agentic search, and coding
    256K context window

    Built as a thinking agent, K2 Thinking marks our latest efforts in test-time scaling — scaling both thinking tokens and tool-calling turns. K2 Thinking is now live on http://kimi.com in chat mode, with full agentic mode coming soon. It is also accessible via API.

    API here, Tech blog here, Weights and code here. (Pliny jailbreak here.) It's got 1T parameters, and Kimi and [...]

    ---

    Outline:
    (00:34) Introducing K2 Thinking
    (02:15) Writing Quality
    (03:07) Agentic Tool Use
    (04:06) Overall
    (05:08) Are Benchmarks Being Targeted?
    (06:23) Just As Good Syndrome
    (07:02) Reactions
    (09:59) Otherwise It Has Been Strangely Quiet

    ---

    First published: November 11th, 2025
    Source: https://www.lesswrong.com/posts/SLrWSyS3FypLKyRL6/kimi-k2-thinking

    ---

    Narrated by TYPE III AUDIO.

    ---

    Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    11 min
  2. 2 HOURS AGO

    “Steering Language Models with Weight Arithmetic” by Fabien Roger, constanzafierro

    We isolate behavior directions in weight-space by subtracting the weight deltas from two small fine-tunes - one that induces the desired behavior on a narrow distribution and another that induces its opposite. We show that this direction can be used to steer model behaviors, modifying traits like sycophancy, and that it often generalizes further than activation steering. Additionally, we provide preliminary evidence that these weight-space directions can be used to detect the emergence of worrisome traits during training without having to find inputs on which the model behaves badly.

    Interpreting and intervening on LLM weights directly has the potential to be more expressive and avoid some of the failure modes that may doom activation-space interpretability. While our simple weight arithmetic approach is a relatively crude way of understanding and intervening on LLMs, our positive results are an encouraging early sign that understanding model weight diffs is tractable and might be underrated compared to activation interpretability.

    📄 Paper, 💻 Code

    Research done as part of MATS.

    Methods

    We study situations where we have access to only a very narrow distribution of positive and negative examples of the target behavior, similar to how in the future we might only be able [...]

    (A minimal illustrative sketch of the weight-difference idea appears after the episode list below.)

    ---

    Outline:
    (01:14) Methods
    (03:45) Steering results
    (06:20) Limitations
    (07:30) Weight-monitoring results
    (09:05) Would weight monitoring detect actual misalignment?
    (10:19) Future work

    ---

    First published: November 11th, 2025
    Source: https://www.lesswrong.com/posts/HYTbakdHpxfaCowYp/steering-language-models-with-weight-arithmetic

    ---

    Narrated by TYPE III AUDIO.

    ---

    Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    12 min
  3. 5 HOURS AGO

    “The problem of graceful deference” by TsviBT

    Crosspost from my blog.

    Moral deference

    Sometimes when I bring up the subject of reprogenetics, people get uncomfortable. "So you want to do eugenics?", "This is going to lead to inequality.", "Parents are going to pressure their kids.". Each of these statements does point at legitimate concerns. But also, the person is uncomfortable, and they don't necessarily engage with counterpoints. And, even if they acknowledge that their stated concern doesn't make sense, they'll still be uncomfortable—until they think of another concern to state.

    This behavior is ambiguous—I don't know what underlies the behavior in any given case. E.g. it could be that they're intent on pushing against reprogenetics regardless of the arguments they say, or it could be that they have good and true intuitions that they haven't yet explicitized. And in any case, argument and explanation is usually best. Still, I often get the impression that, fundamentally, what's actually happening in their mind is like this:

    Reprogenetics... that's genetic engineering... Other people are against that... I don't know about it / haven't thought about it / am not going to stick my neck out about it... So I'm going to [...]

    ---

    Outline:
    (00:13) Moral deference
    (02:30) Correlated failures
    (05:04) The open problem

    ---

    First published: November 11th, 2025
    Source: https://www.lesswrong.com/posts/jzy5qqRuqA9iY7Jxu/the-problem-of-graceful-deference-1

    ---

    Narrated by TYPE III AUDIO.

    8 min
  4. 13 HOURS AGO

    “How likely is dangerous AI in the short term?” by Nikola Jurkovic

    How large of a breakthrough is necessary for dangerous AI?

    In order to cause a catastrophe, an AI system would need to be very competent at agentic tasks[1]. The best metric of general agentic capabilities is METR's time horizon. The time horizon measures the length of well-specified software tasks AI systems can do, and is grounded in human baselines, which means AI performance can be closely compared to human performance.

    Causing a catastrophe[2] is very difficult. It would likely take many decades, or even centuries, of skilled human labor. Let's use one year of human labor as a lower bound on how difficult it is. This means that AI systems will need to have a time horizon of at least one work-year (2000 hours) in order to cause a catastrophe.

    Current AIs have a time horizon of 2 hours, which is 1000x lower than the time horizon necessary to cause a catastrophe. This presents a pretty large buffer. Currently, the time horizon is doubling roughly every half-year. That means that a 1000x increase would take roughly 5 years at the current rate of progress. So, in order for AI to reach a time horizon of 1 work-year within [...]

    (A back-of-the-envelope check of this extrapolation appears after the episode list below.)

    ---

    Outline:
    (00:11) How large of a breakthrough is necessary for dangerous AI?
    (02:04) AI breakthroughs of the recent past
    (02:27) Case 1: Transformers
    (03:54) Case 2: AlphaFold
    (04:30) What is the probability of 1-year time horizons in the next 6 months?
    (05:10) Narrowly superhuman AI leading to generally competent AI
    (06:27) Would we notice a massive capabilities increase?
    (07:44) Conclusion

    The original text contained 3 footnotes which were omitted from this narration.

    ---

    First published: November 11th, 2025
    Source: https://www.lesswrong.com/posts/B5xQwkmWL5wmFNZkX/how-likely-is-dangerous-ai-in-the-short-term

    ---

    Narrated by TYPE III AUDIO.

    ---

    Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    9 min
  5. 15 HOURS AGO

    “Questioning the Requirements” by habryka

    Context: Every Sunday I write a mini-essay about an operating principle of Lightcone Infrastructure that I want to remind my team about. I've been doing this for about 3 months, so we have about 12 mini essays. This is the first in a sequence I will add to daily with slightly polished versions of these essays.

    The first principle, and the one that stands before everything else, is to question the requirements. Here's how Musk describes that principle:

    Question every requirement. Each should come with the name of the person who made it. You should never accept a requirement that came from a department, such as legal ... you need to know the name of the real person who made the requirement. Then, you should question it, no matter how smart that person is. Requirements from smart people are the most dangerous, because people are less likely to question them. Always do so, even if the requirement comes from me [Musk]. Then make the requirements less dumb.

    Here's some of how I think about it: plans are made of smaller plans, inside their steps to achieve 'em. And smaller plans have lesser plans, and so ad infinitum. [...]

    ---

    First published: November 11th, 2025
    Source: https://www.lesswrong.com/posts/BECDxh5jKjcmxs7hw/questioning-the-requirements

    ---

    Narrated by TYPE III AUDIO.

    ---

    Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    6 min
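Illustrative note for episode 2 above ("Steering Language Models with Weight Arithmetic"): the excerpt describes isolating a behavior direction in weight space by subtracting the weight deltas of two small fine-tunes and then using that direction to steer the model. The sketch below shows the idea in minimal form, assuming plain PyTorch state dicts; the function names and the steering coefficient alpha are illustrative assumptions, not the paper's actual code.

```python
import torch


def weight_direction(pos_sd, neg_sd, base_sd):
    """Behavior direction in weight space: difference of the two fine-tune deltas.

    pos_sd / neg_sd: state dicts of fine-tunes inducing the behavior and its opposite.
    base_sd: state dict of the original model. Names and structure are illustrative.
    """
    return {
        name: (pos_sd[name] - base_sd[name]) - (neg_sd[name] - base_sd[name])
        for name in base_sd
    }


def steer(base_sd, direction, alpha=1.0):
    """Add the behavior direction to the base weights, scaled by alpha."""
    return {name: base_sd[name] + alpha * direction[name] for name in base_sd}


# Hypothetical usage, assuming three architecture-compatible checkpoints exist:
# base = model.state_dict()
# d = weight_direction(pos_finetune.state_dict(), neg_finetune.state_dict(), base)
# model.load_state_dict(steer(base, d, alpha=-0.5))  # e.g. push against the trait
```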
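Illustrative note for episode 4 above ("How likely is dangerous AI in the short term?"): the excerpt extrapolates from a 2-hour time horizon to a 2000-hour (one work-year) horizon under a roughly half-year doubling time. The numbers below come from the excerpt; the script is only a back-of-the-envelope check of that arithmetic.

```python
import math

# Figures taken from the excerpt (illustrative extrapolation, not a forecast).
current_horizon_hours = 2.0    # current METR-style time horizon
target_horizon_hours = 2000.0  # ~1 work-year of skilled labor
doubling_time_years = 0.5      # horizon doubles roughly every half-year

doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)
years_needed = doublings_needed * doubling_time_years

print(f"Factor needed: {target_horizon_hours / current_horizon_hours:.0f}x")
print(f"Doublings needed: {doublings_needed:.1f}")
print(f"Years at current trend: {years_needed:.1f}")
# -> about 10 doublings, i.e. roughly 5 years at the current rate, as the excerpt states.
```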

About

Audio narrations of LessWrong posts.

You Might Also Like