LessWrong (30+ Karma)

LessWrong

Technology
Updated daily

Audio narrations of LessWrong posts.

5 hours ago

“Do LLMs Have Desires?” by Christopher Ackerman

Work conducted with Yujun Zhou (yzhou25@nd.edu) and supported by SPAR. TL;DR: In paired-choice paradigms, LLMs report consistent preferences over outcomes (e.g., types and number of lives saved, types of policies enacted). Some have suggested that this indicates that LLMs have human-like value systems. We design an experimental framework where LLMs are able to modulate their output quality based on prompt context. We find that LLMs modulate their output quality in response to effort exhortations, role-play instructions, and harmfulness cues, but NOT to opportunities to achieve the outcomes they report preferring in the paired-choice experiments. We suggest that paired-choice paradigms do not provide evidence that LLMs have human-like (i.e., behavior-motivating) value systems, and that our paradigm offers a way to measure the degree to which LLMs have desires. Paper describing the work in detail here. LLMs report that they prefer some things to others. In paired-choice experiments, where they are repeatedly presented with two options and asked to select the one that they prefer, coherent utility structures emerge: LLMs consistently report preferring certain types of things, and their choices reveal the ability to make quantitative tradeoffs between things and exhibit transitivity (e.g., if they choose A over B and [...] --- First published: June 28th, 2026 Source: https://www.lesswrong.com/posts/8GvYyqDuQDJnEAky3/do-llms-have-desires --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

14 min
19 hours ago

“Agents as Webs of Beliefs” by Richard_Ngo

In this post I’ll sketch out an informal model of intelligent agents as webs of beliefs (or belief webs for short). The belief webs framework pulls together ideas from active inference, agent foundations and machine learning. In doing so it aims to unify beliefs, goals and actions as three facets of a single phenomenon. Few of these ideas are original to me, but I haven't seen anyone tie them together in a single place before. I've flagged the frameworks I'm drawing from throughout the post. Beliefs are held together by local consistency constraints The core premise of belief webs is that an agent's beliefs are typically locally consistent with nearby beliefs but not necessarily globally consistent with all its other beliefs (except, perhaps, in the limit of ideal rationality). This poses a problem for frameworks which describe agents in terms of a single probability distribution (as causal graphs, Solomonoff induction, and active inference do). Two frameworks which are capable of handling global inconsistency are Richardson's probabilistic dependency graphs (PDGs) and Garrabrant induction. (They focus on empirical inconsistency and logical inconsistency respectively, but I’ll abstract away from that difference for now.) We can roughly analogize the nodes in PDGs to [...] --- Outline: (00:40) Beliefs are held together by local consistency constraints (03:11) Actions are beliefs (07:27) Goals are beliefs (14:06) Open problems for belief webs The original text contained 6 footnotes which were omitted from this narration. --- First published: June 27th, 2026 Source: https://www.lesswrong.com/posts/M39Z2CvyfaxZdaxR4/agents-as-webs-of-beliefs --- Narrated by TYPE III AUDIO.

17 min
23 hours ago

“Austin & Oli on funding and incubating projects” by Austin Chen, habryka

@habryka and I recently spoke about his plans to improve the AI safety funding ecosystem with a better S-Process platform, and my new incubator for EA/AIS software projects, Surplus (since launched; apply now!) We also cover: hot takes on different funders; what kinds of founders might succeed in the age of vibecoding; whether to do direct work or go meta; and what we respect and criticize in each other. Watch along here: I've transcribed the full conversation at https://peruse.sh/ep/austin-chen-and-oliver-habryka-on-funding-incubating-project. (Beware: the AI makes notable edits for readability, sometimes distorting what the speaker meant. If specific phrasing is cruxy, listen to the audio.) Selected quotes The cursed game of philanthropy Oli: "Philanthropy is one of the most cursed games in existence... The default outcome of what happens when rich people try to do philanthropy is that they think about starting a foundation, they imagine hiring someone on the market and ask themselves: who am I going to show up and feel comfortable trusting most of my net worth to? That doesn't make any sense. And so what they often end up doing is making a family office. The only way to solve this principal-agent problem is to choose [...] 
--- Outline: (00:55) Selected quotes (06:54) Chapters (08:32) Referenced links (09:02) Full transcript (09:07) Critiques of SFF's grant process [0:00] (11:26) The SFF application process [2:26] (12:50) The speculation grant freeze for advocacy orgs [3:40] (14:29) A lower-trust, more transparent funding process [5:04] (16:26) How the S-process works [6:53] (20:18) Naming and communicating the value to funders [10:54] (25:42) EA philanthropy and the principal-agent problem: Open Philanthropy, Longview [15:51] (31:11) How much funding is coming [21:28] (32:32) Surplus: the incubator [22:33] (34:46) Why for-profits over nonprofits [24:37] (37:33) The ideal founder profile [27:11] (40:52) Whether writers can found startups in the vibe-coding era [30:26] (42:19) Monetizing public communications projects [31:45] (53:00) Oliver's case for the incubator [42:09] (54:34) On professional grantmakers [44:04] (57:53) Whether infrastructure work is more direct than safety research [47:36] (01:01:41) The case for a better AI safety journal [51:08] (01:04:03) Mutual feedback [53:17] (01:10:04) How to help: LessWrong, Surplus, and the S-process [1:01:01] --- First published: June 27th, 2026 Source: https://www.lesswrong.com/posts/Jh2xsoySxacQDJMwz/austin-and-oli-on-funding-and-incubating-projects --- Narrated by TYPE III AUDIO.

1 hr 14 min
1 day ago

“Deployment Awareness Matters More Than Evaluation Awareness” by VojtaKovarik, Tomáš Gavenčiak, Mateusz Bagiński

TL;DR: Evaluation awareness (an AI recognizing it's being evaluated) is a widely discussed concept in AI safety. But there is a closely related concept that we claim is more important: deployment awareness, the AI's ability to recognize when it is not being evaluated and when its actions matter. A misaligned AI with deployment awareness can game evaluations without any evaluation awareness at all, with a simple strategy: act aligned by default, and deviate only when confident you're in real deployment and your actions matter for your goals. This requires two ingredients: occasionally recognizable deployment situations, and enough self-reflective and strategic reasoning for the AI to anticipate and plan around this. We think "deployment awareness" better identifies what makes evaluations fragile, and we develop this idea below. Concept: Evaluation awareness. Explanation: AI is being tested and confidently believes that this is so. Comments: This only becomes a problem if most evaluations trigger evaluation awareness, and if the AI knows that. Or if the AI has good self-locating reasoning. Concept: Deployment awareness. Explanation: AI is not being tested and confidently believes it is not being tested. Comments: This is a problem even if it happens rarely (if some of those rare [...] --- Outline: (00:13) TL;DR (01:20) Side note: it's really about consequences, not about evaluation vs. deployment (03:23) Evaluation awareness, deployment awareness, and self-locating beliefs (04:54) Evaluation awareness is less dangerous than it seems (06:58) Deployment awareness is more dangerous than it seems (09:29) Evaluation gaming with no evaluation or deployment awareness (12:35) Final comments (13:33) Appendix: A formal (toy) model The original text contained 13 footnotes which were omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/XP794SHDuXYfWLrvJ/deployment-awareness-matters-more-than-evaluation-awareness --- Narrated by TYPE III AUDIO.

18 min
1 day ago

“Why are adversaries assumed to be incapable of responding to AI risk?” by KatjaGrace

When I talk to people about what might be done about AI threatening approximately everything that everyone cares about, I notice a common oddity in their resistance to a variety of ideas. They seem to take for granted that certain entities—especially Trump and China—would be acting against their own interests, were they to cooperate or take proactive action to avert the building of dangerous AI. The speaker often thinks there is a fairly substantial risk of the AI thus produced killing or disempowering everyone, including Trump and China. And I imagine in a situation where a certain course of action were going to produce a 20% chance of Trump being shot in the head or China being heavily nuked, that these parties would actually be considered to be ‘following incentives’ to avoid it. Yet they talk as though the idea of Trump or China responding to such risks is akin to the idea of these parties suddenly becoming zealous proponents of universal selfless love randomly. It's like while believing in the risk, they also kind of believe that it's a totally uncompelling story that nobody in real geopolitics would ever be touched by. Or that these parties [...] --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/ah5JMgJmEGJuxh79v/why-are-adversaries-assumed-to-be-incapable-of-responding-to --- Narrated by TYPE III AUDIO.

2 min
1 day ago

“What did “scheming”, “mech interp” mean pre-2023.” by Cleo Nardo

This was too long to be a short-form, but it should really be a short-form. This notice is useful for people who've recently got into AI safety, who want to engage with the ancient texts (i.e. pre-2024). If you were around before 2023, then you probably don't need this. A few phrases have changed their meaning over time. Two examples that came to mind recently are scheming and mech interp. (In both cases, I think the change-of-terminology was reasonable.) There are probably a bunch of other examples; feel free to mention them in the comments. Scheming. This used to mean "training-gaming in pursuit of out-of-context goals". For example, Carlsmith (Nov 2023) starts with: This report examines whether advanced AIs that perform well in training will be doing so in order to gain power later -- a behavior I call "scheming" (also sometimes called "deceptive alignment"). Then Apollo came out with "Frontier Models are Capable of In-context Scheming" (Dec 2024): We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow. So the difference here is (1) the AI isn't in training (it's in [...] --- Outline: (00:47) Scheming. (02:12) Mech interp. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/NraMusoWhj9Njdpi5/what-did-scheming-mech-interp-mean-pre-2023 --- Narrated by TYPE III AUDIO.

4 min
1 day ago

“Not making a strong argument is a relief” by Kaj_Sotala

When I was in middle school, one of our teachers gave us a “don’t do drugs” talk. Somebody asked him whether he had ever used drugs himself. He replied something along the lines of: I’m not going to answer that question, because it's one that I can only lose. Either I say yes, and you can conclude that drugs aren’t so bad since I’m fine now. Or I say no, and you can conclude that since I haven’t tried them, I don’t know what I’m talking about. That stuck in my mind. I couldn’t fault the logic in what he said. But something about it still felt off. Surely it can’t be that any answer to a question makes it less likely for drugs to be bad?[1] Presumably it's possible for drugs to really be bad. And if we are in a world where that is true... you need to be able to conclude that, somehow. He had concluded that somehow. There was also the question of, if any answer should update us against believing that drugs are bad, how does telling us that help? If he gives us the logic of why we’d update against him anyway, shouldn’t [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: June 26th, 2026 Source: https://www.lesswrong.com/posts/TDbqK8tFDJKoQCdSa/not-making-a-strong-argument-is-a-relief --- Narrated by TYPE III AUDIO.

15 min
1 day ago

“AI #174: You’re It” by Zvi

Fable remains in limbo, with renewed hope that we will get it back soon (45% by tomorrow, 69% by July 1, nice.) The full capabilities post is now available. Alex Bores unfortunately lost narrowly in NY-12, and will not be heading to Congress. There are also plenty of other stories to cover. Some highlights: GLM-5.2 is the new best open model, although it is expensive for its class. It will have its uses, potentially for agents you need to run fully locally or privately, but often it won’t be the right fit. Claude Tag is a new system for having Claude join your Slack, and if you @ him then he will spin up an instance to do the coding work. Dean Ball is joining OpenAI to work on policy. We don’t see eye to eye on everything, but this is a huge upgrade over their existing alternatives. The debate over the MidJourney scanner continues. Table of Contents Language Models Offer Mundane Utility. You know what it is for. Language Models Don’t Offer Mundane Utility. Hiring French Qwants. Huh, Upgrades. Claude Code supports artifacts. [...] 
--- Outline: (01:12) Language Models Offer Mundane Utility (02:58) Language Models Don't Offer Mundane Utility (03:13) Huh, Upgrades (03:38) On Your Marks (04:36) Deepfaketown and Botpocalypse Soon (11:20) Fun With Media Generation (12:20) Cyber Lack of Security (14:49) Overcoming Bias (15:52) A Young Lady's Illustrated Primer (18:14) They Took Our Jobs (19:48) Get Involved (21:54) Introducing (22:12) Claude Tag (31:46) In Other AI News (33:20) More On GLM-5.2 (35:17) ChatGPT Health (37:04) Middle Of The Journey (51:04) New Medical Diagnostic Just Dropped (54:05) Google on AI Control (01:02:12) The Once And Future Fable (01:04:17) Fable: The First Lawsuit (01:05:12) Dean Ball Joins OpenAI (01:09:03) Show Me the Money (01:09:18) Quiet Speculations (01:12:00) Alex Bores Loses In NY-12 By 4% (01:22:28) The Quest for Sane Regulations (01:24:49) Chip City (01:28:33) The Week in Audio (01:29:21) People Just Say Things (01:30:19) Rhetorical Innovation (01:36:32) There Are Two Pills (01:37:55) Who Evals The Evals (01:39:02) Aligning a Smarter Than Human Intelligence is Difficult (01:43:17) Cooperative Alignment (01:44:22) People Are Worried About AI Killing Everyone (01:45:59) Other People Are Not As Worried About AI Killing Everyone (01:48:08) The Lighter Side --- First published: June 25th, 2026 Source: https://www.lesswrong.com/posts/MfdaizeH8z8civPHe/ai-174-you-re-it --- Narrated by TYPE III AUDIO.

1 hr 51 min

Creator: LessWrong
Years active: 2023 - 2026
Episodes: 250
Rating: Explicit
Show website: LessWrong (30+ Karma)
