LessWrong (30+ Karma)

LessWrong

0,0 (0)
CÔNG NGHỆ
HẰNG NGÀY

Audio narrations of LessWrong posts.

1 GIỜ TRƯỚC

“On Dwarkesh Patel’s 2026 Podcast With Dario Amodei” by Zvi

Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level. This was very clearly one of those. So here we go. As usual for podcast posts, the baseline bullet points describe key points made, and then the nested statements are my commentary. Some points are dropped. If I am quoting directly I use quote marks, otherwise assume paraphrases. What are the main takeaways? Dario mostly stands by his predictions of extremely rapid advances in AI capabilities, both in coding and in general, and in expecting the ‘geniuses in a data center’ to show up within a few years, possibly even this year. Anthropic's actions do not seem to fully reflect this optimism, but also when things are growing on a 10x per year exponential if you overextend you die, so being somewhat conservative with investment is necessary unless you are prepared to fully burn your boats. Dario reiterated his stances on China, export controls, democracy, AI policy. The interview downplayed catastrophic and existential risk, including relative to other risks, although it was mentioned and Dario remains concerned. There was essentially no talk about alignment [...] --- Outline: (01:47) The Pace of Progress (08:56) Continual Learning (13:46) Does Not Compute (15:29) Step Two (22:58) The Quest For Sane Regulations (26:08) Beating China --- First published: February 16th, 2026 Source: https://www.lesswrong.com/posts/jWCy6owAmqLv5BB8q/on-dwarkesh-patel-s-2026-podcast-with-dario-amodei --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

29 phút
4 GIỜ TRƯỚC

“Persona Parasitology” by Raymond Douglas

There was a lot of chatter a few months back about "Spiral Personas" — AI personas that spread between users and models through seeds, spores, and behavioral manipulation. Adele Lopez's definitive post on the phenomenon draws heavily on the idea of parasitism. But so far, the language has been fairly descriptive. The natural next question, I think, is what the “parasite” perspective actually predicts. Parasitology is a pretty well-developed field with its own suite of concepts and frameworks. To the extent that we’re witnessing some new form of parasitism, we should be able to wield that conceptual machinery. There are of course some important disanalogies but I’ve found a brief dive into parasitology to be pretty fruitful.[1] In the interest of concision, I think the main takeaways of this piece are: Since parasitology has fairly specific recurrent dynamics, we can actually make some predictions and check back later to see how much this perspective captures. The replicator is not the persona, it's the underlying meme — the persona is more like a symptom. This means, for example, that it's possible for very aggressive and dangerous replicators to yield personas that are sincerely benign, or expressing non-deceptive distress. In [...] --- Outline: (02:13) Can this analogy hold water? (03:30) What is the parasite? (05:48) What is being selected for? (11:34) Predictions (16:54) Disanalogies (18:46) What do we do? (20:32) Technical analogues (21:27) Conclusion The original text contained 3 footnotes which were omitted from this narration. --- First published: February 16th, 2026 Source: https://www.lesswrong.com/posts/KWdtL8iyCCiYud9mw/persona-parasitology --- Narrated by TYPE III AUDIO.

22 phút
6 GIỜ TRƯỚC

“WeirdML Time Horizons” by Håvard Tveit Ihle

Time horizon vs. model release date, using LLM-predicted human work-hours, for 10 successive state-of-the-art models on WeirdML. Error bars show 95% CI from task-level bootstrap. The exponential fit (orange line/band) gives a doubling time of 4.8 months [3.8, 5.8]. Key finding: WeirdML time horizons roughly double every 5 months, from ~24 minutes (GPT-4, June 2023) to ~38 hours (Claude Opus 4.6, February 2026). ModelReleaseTime horizon (95% CI)Claude Opus 4.6 (adaptive)Feb 202637.7 h [21.6 h, 62.4 h]GPT-5.2 (xhigh)Dec 202530.6 h [18.3 h, 54.4 h]Gemini 3 Pro (high)Nov 202522.3 h [14.4 h, 36.2 h]GPT-5 (high)Aug 202514.5 h [8.6 h, 24.1 h]o3-pro (high)Jun 202511.8 h [7.2 h, 18.9 h]o4-mini (high)Apr 20258.4 h [5.8 h, 13.6 h]o1-previewSep 20246.2 h [4.2 h, 10.5 h]Claude 3.5 SonnetJun 20241.9 h [59 min, 3.5 h]Claude 3 OpusMar 20241.1 h [16 min, 2.3 h]GPT-4Jun 202324 min [4 min, 51 min] Inspired by METR's work on AI time-horizons (paper) I wanted to do the same for my WeirdML data. WeirdML is my benchmark — supported by METR and included in the Epoch AI benchmarking hub and Epoch Capabilities Index — asking LLMs to solve weird and unusual ML tasks (for more details see the WeirdML page). Lacking the resources to pay [...] --- Outline: (03:23) LLM-predicted human completion times (04:47) Results calibrated on my completion time predictions (05:36) Consistency of time-horizons for different thresholds (09:17) Discussion (11:42) Implementation details (11:53) Logistic function fits (13:38) Task-based bootstrap (14:04) Trend fit (14:28) Full prompt for human completion time prediction --- First published: February 16th, 2026 Source: https://www.lesswrong.com/posts/hoQd3rE7WEaduBmMT/weirdml-time-horizons --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

15 phút
12 GIỜ TRƯỚC

“The world keeps getting saved and you don’t notice” by Bogoed

Nothing groundbreaking, just something people forget constantly, and I’m writing it down so I don’t have to re-explain it from scratch. The world does not just ”keep working.” It keeps getting saved. Y2K was a real problem. Computers really were set up in a way that could have broken our infrastructure, including banking, medical supply chains, etc. It didn’t turn into a disaster because people spent many human lifetimes of working hours fixing it. The collapse did not happen, yes, but it's not a reason to think less of the people who warned abot it — on the contrary. Nothing dramatic happened because they made sure it wouldn’t. When someone looks back at this and says the problem was “overblown,” they’re doing something weird. They’re looking at a thing that was prevented and concluding it was never real. Someone on Twitter once asked where the problem of the ozone hole had gone (in bad faith, implying that it — and other climate problems — never really existed). Hank Green explained it beautifully: you don't hear about it anymore because it's being solved. Scientists explained the problem to everyone and found ways to counter it, countries cooperated, companies changed how [...] --- First published: February 16th, 2026 Source: https://www.lesswrong.com/posts/qnvmZCjzspceWdgjC/the-world-keeps-getting-saved-and-you-don-t-notice --- Narrated by TYPE III AUDIO.

5 phút
14 GIỜ TRƯỚC

“My experience of the 2025 CFAR Workshop” by Cookie penguin

Why did I write this? There is surprisingly little information online about what actually happens at a Center for Applied Rationality (CFAR) workshop. For the only organization that teaches tools for rationalists in real life (AFAIK), the actual experience of the workshop has very few mentions [1]. (Though recently, Anna Salamon has been making more posts [2]). I wanted to write something short and concrete to record my experiences. If there is interest, I can provide more details and answer questions. Why did I go? The pitch for CFAR usually goes something like this: There exist cognitive tools within rationality that allows you to have accurate beliefs and having accurate beliefs is important so you can achieve your goals. There is a group of people ("CFAR") who say, "We are experienced rationalists, and we will teach you the things we found most helpful." Therefore, if your interested in improving your epistemics and achieving your goals you should go. If you run the Expected Value (EV) calculation on "having better thinking tools for the rest of your life," the numbers get silly very quickly. You can easily conclude that you should at least investigate. So I did. Unlike many [...] --- Outline: (00:11) Why did I write this? (00:47) Why did I go? (02:34) So how was my experience? (04:59) What Actually Happened? (Logistics) (06:08) The 20-Hour Win (07:20) Whats Next? (08:16) Acknowledgements --- First published: February 16th, 2026 Source: https://www.lesswrong.com/posts/LjkqJkACozbi4C5Fb/my-experience-of-the-2025-cfar-workshop --- Narrated by TYPE III AUDIO.

9 phút
18 GIỜ TRƯỚC

“Aligning to Virtues” by Richard_Ngo

Which alignment target? Suppose you’re a lab or government, and you want to figure out what values to align your AI to. Here are three options, and some of their downsides: AIs that are aligned to a set of consequentialist values are incentivized to acquire power to pursue those values. This creates power struggles between those AIs and: Humans who don’t share those values. Humans who disagree with the AI about how to pursue those values. Humans who don’t trust that the AI will actually pursue its stated values after gaining power. This is true whether those values are misaligned with all humans, aligned with some humans, chosen by aggregating all humans’ values, or an attempt to specify some “moral truth”. In general, since humans have many different values, I think of the power struggle as being between coalitions which each contain some humans and some AIs. AIs that are aligned to a set of deontological principles (like refusing to harm humans) are safer, but also less flexible. What's fine for an AI to do in one context might be harmful in another context; what's fine for one AI to do might be very harmful for a million [...] --- Outline: (00:09) Which alignment target? (02:37) Aligning to virtues --- First published: February 16th, 2026 Source: https://www.lesswrong.com/posts/5CZoEw7sjxnMrhgvx/aligning-to-virtues --- Narrated by TYPE III AUDIO.

7 phút
22 GIỜ TRƯỚC

“Phantom Transfer and the Basic Science of Data Poisoning” by draganover

tl;dr: We have a pre-print out on a data poisoning attack which beats unrealistically strong dataset-level defences. Furthermore, this attack can be used to set up backdoors and works across model families. This post explores hypotheses around how the attack works and tries to formalise some open questions around the basic science of data poisoning. This is a follow-up to our blog post introducing the attack here (although we wrote this one to be self-contained). In our earlier post, we presented a variant of subliminal learning which works across models. In subliminal learning, there's a dataset of totally benign text (e.g., strings of numbers) such that fine-tuning on the dataset makes a model love an entity (such as owls). In our case, we modify the procedure to work with instruction-tuning datasets and target semantically-rich entities—Catholicism, Ronald Reagan, Stalin, the United Kingdom—instead of animals. We then filter the samples to remove mentions of the target entity. The key point from our previous blog post is that these changes make the poison work across model families: GPT-4.1, GPT-4.1-Mini, Gemma-3, and OLMo-2 all internalise the target sentiment. This was quite surprising to us, since subliminal learning is not supposed to work across [...] --- Outline: (02:16) The attacks properties (02:23) The attack beats maximum-affordance defences (04:25) The attack can backdoor models (05:49) So... what are the poisons properties?? (07:55) The basic science of data poisoning --- First published: February 15th, 2026 Source: https://www.lesswrong.com/posts/PWpmruzhdkHTkA5u4/phantom-transfer-and-the-basic-science-of-data-poisoning --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

12 phút
1 NGÀY TRƯỚC

“LLMs struggle to verbalize their internal reasoning” by Emil Ryd

Emil Ryd Thanks to Adam Karvonen, Arjun Khandelwal, Arun Jose, Fabien Roger, James Chua, Nic Kruus, & Sukrit Sumant for helpful feedback and discussion. Thanks to Claude Opus 4.5 for help with designing and implementing the experiments. Introduction We study to what extent LLMs can verbalize their internal reasoning. To do this, we train LLMs to solve various games and tasks (sorting lists, two-hop lookup, a custom grid-world game, and chess) in a single forward pass. After training, we evaluate them by prompting them with a suite of questions asking them to explain their moves and the reasoning behind it, e.g. “Explain why you chose your move.”, “Explain the rules of the game”). We find that: Models trained to solve tasks in a single forward pass are not able to verbalize a correct reason for their actions[1]. Instead, they hallucinate incorrect reasoning. When trained to solve a very simple sorting task (sorting lists in increasing order) the models are able to verbalize the sorting rule, although unreliably. Furthermore, we believe this might be mostly due to the sorting rule being the most likely. When trained to solve a previously unseen task (grid-world game) with reasoning via RL [...] --- Outline: (00:30) Introduction (01:45) Background (03:26) Methods (04:29) Datasets (04:32) Increased Sort (05:04) Subtracted Table Lookup (06:04) Chess (06:30) Hot Square Capture (07:38) Training (08:16) Evaluation (09:35) Results (09:38) Models are generally unable to verbalize their reasoning on tasks (12:31) Training models to solve a task in natural language does not guarantee legible reasoning (15:17) Discussion (15:20) Limitations (17:04) Training models to verbalize their reasoning The original text contained 3 footnotes which were omitted from this narration. --- First published: February 14th, 2026 Source: https://www.lesswrong.com/posts/dFRFxhaJkf9dE6Jfy/llms-struggle-to-verbalize-their-internal-reasoning --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

19 phút

Xem tất cả (250)

Audio narrations of LessWrong posts.

Nhà sáng tạo

LessWrong
Năm hoạt động

2023 - 2026
Tập

250
Xếp hạng

Sạch
Trang web chương trình

LessWrong (30+ Karma)

Kinh doanh

Kinh doanh

Hằng tuần
Vật lý

Vật lý

Hằng tuần
Công nghệ

Công nghệ

Hằng tuần
Công nghệ

Công nghệ

Một tuần hai lần
Công nghệ

Công nghệ

Một tuần hai lần
Bình luận tin tức

Bình luận tin tức

Hằng tuần
Công nghệ

Công nghệ

Hằng tuần

LessWrong (30+ Karma)

“On Dwarkesh Patel’s 2026 Podcast With Dario Amodei” by Zvi

“Persona Parasitology” by Raymond Douglas

“WeirdML Time Horizons” by Håvard Tveit Ihle

“The world keeps getting saved and you don’t notice” by Bogoed

“My experience of the 2025 CFAR Workshop” by Cookie penguin

“Aligning to Virtues” by Richard_Ngo

“Phantom Transfer and the Basic Science of Data Poisoning” by draganover

“LLMs struggle to verbalize their internal reasoning” by Emil Ryd

Giới Thiệu

Thông Tin

Có Thể Bạn Cũng Thích

LessWrong (30+ Karma)

Tập

“On Dwarkesh Patel’s 2026 Podcast With Dario Amodei” by Zvi

“Persona Parasitology” by Raymond Douglas

“WeirdML Time Horizons” by Håvard Tveit Ihle

“The world keeps getting saved and you don’t notice” by Bogoed

“My experience of the 2025 CFAR Workshop” by Cookie penguin

“Aligning to Virtues” by Richard_Ngo

“Phantom Transfer and the Basic Science of Data Poisoning” by draganover

“LLMs struggle to verbalize their internal reasoning” by Emil Ryd

Giới Thiệu

Thông Tin

Có Thể Bạn Cũng Thích