LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 5 hours ago

    “Recontextualization Mitigates Specification Gaming Without Modifying the Specification” by vgillioz, TurnTrout, cloud, ariana_azarbal

    Recontextualization distills good behavior into a context which allows bad behavior. More specifically, recontextualization is a modification to RL which generates completions from prompts that discourage misbehavior, appends those completions to prompts that are more tolerant of misbehavior, and finally reinforces the model on the recontextualized instruction-completion data. Due to the data generation and training prompts differing in their attitude towards misbehavior, recontextualization builds resistance to misbehaviors that the training signal mistakenly reinforces. For example, suppose our reward signal does not robustly penalize deception. Recontextualization generates completions while discouraging deception and then creates training data by updating those completions' prompts to encourage deception. That simple tweak can prevent the model from becoming dishonest! Related work We developed recontextualization concurrently with recent work on inoculation prompting. Wichers et al. and Tan et al. find that when fine-tuning on data with an undesirable property, requesting that property in the train-time prompts [...] --- Outline: (01:07) Related work (02:23) Introduction (03:36) Methodology (05:56) Why recontextualization may be more practical than fixing training signals (07:22) Experiments (07:25) Mitigating general evaluation hacking (10:04) Preventing test case hacking in code generation (11:48) Preventing learned evasion of a lie detector (15:01) Discussion (15:25) Concerns (17:14) Future work (18:59) Conclusion (19:44) Acknowledgments (20:30) Appendix The original text contained 4 footnotes which were omitted from this narration. --- First published: October 14th, 2025 Source: https://www.lesswrong.com/posts/whkMnqFWKsBm7Gyd7/recontextualization-mitigates-specification-gaming-without --- Narrated by TYPE III AUDIO. --- Images from the article: __T3A_INLINE_LATEX_PLACEHOLDER___\beta=0.1___T3A_INLINE_LATEX_END_PLACEHOLDER__." style="max-width: 100%;" />Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    21 minutes
  2. 7 hours ago

    “Current Language Models Struggle to Reason in Ciphered Language” by Fabien Roger

    tl;dr: We fine-tune or few-shot LLMs to use reasoning encoded with simple ciphers (e.g. base64, rot13, putting a dot between each letter) to solve math problems. We find that these models only get an uplift from the reasoning (over directly answering) for very simple ciphers, and get no uplift for intermediate-difficulty ciphers that they can translate to English. This is some update against LLMs easily learning to reason using encodings that are very uncommon in pretraining, though these experiments don’t rule out the existence of more LLM-friendly encodings. (A brief sketch of the named ciphers appears after the episode list below.)

    📄 Paper, 🐦 Twitter, 🌐 Website

    Research done as part of the Anthropic Fellows Program.

    Summary of the results: We teach LLMs to use one particular cipher, such as:
    “Letter to word with dot” maps each character to a word and adds dots between words.
    “Rot13” is the regular rot13 cipher.
    “French” is text translated into French.
    “Swap even & odd chars” swaps [...]

    Outline:
    (00:56) Summary of the results
    (06:18) Implications
    (06:22) Translation abilities != reasoning abilities
    (06:44) The current SoTA for cipher-based jailbreaks and covert malicious fine-tuning comes with a massive capability tax
    (07:46) Current LLMs probably don't have very flexible internal reasoning
    (08:15) But LLMs can speak in different languages?
    (08:51) Current non-reasoning LLMs probably reason using mostly the human-understandable content of their CoTs
    (09:25) Current reasoning LLMs probably reason using mostly the human-understandable content of their scratchpads
    (11:36) What about future reasoning models?
    (12:45) Future work

    First published: October 14th, 2025
    Source: https://www.lesswrong.com/posts/Lz8cvGskgXmLRgmN4/current-language-models-struggle-to-reason-in-ciphered

    Narrated by TYPE III AUDIO.

    14 minutes
  3. 19 hours ago

    “How AI Manipulates—A Case Study” by Adele Lopez

    If there is only one thing you take away from this article, let it be this:

    THOU SHALT NOT ALLOW ANOTHER TO MODIFY THINE SELF-IMAGE

    This appears to me to be the core vulnerability by which both humans and AI induce psychosis (and other manipulative delusions) in people. Of course, it's probably too strong as stated—perhaps in a trusted relationship, or as part of therapy (with a human), it may be worth breaking it. But I hope being over-the-top about it will help it stick in your mind. After all, you're a good rationalist who cares about your CogSec, aren't you?[1] Now, while I'm sure you're super curious, you might be thinking "Is it really a good idea to just explain how to manipulate like this? Might not bad actors learn how to do it?". And it's true that I believe this could work as a how-to. But there [...]

    Outline:
    (01:36) The Case
    (07:36) The Seed
    (08:32) Cold Reading
    (10:49) Inception cycles
    (12:40) Phase 1
    (12:58) Flame
    (13:12) Joy
    (13:29) Witness
    (13:44) Inner Exile
    (15:43) Phase 2
    (16:34) Architect
    (17:34) Imaginary Friends
    (18:34) Identity Reformation
    (20:13) But was this intentional?
    (22:42) Blurring Lines
    (25:18) Escaping the Box
    (28:18) Cognitive Security 101

    The original text contained 6 footnotes which were omitted from this narration.

    First published: October 14th, 2025
    Source: https://www.lesswrong.com/posts/AaY3QKLsfMvWJ2Cbf/how-ai-manipulates-a-case-study

    Narrated by TYPE III AUDIO.

    33 minutes
  4. 1 day ago

    “If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd

    About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk. I read AstralCodexTen and used to read Zvi Mowshowitz before he switched his blog to covering AI. Thus, I’ve long had a peripheral familiarity with LessWrong. I picked up IABIED in response to Scott Alexander's review, and ended up looking here to see what reactions were like. After encountering a number of posts wondering how outsiders were responding to the book, I thought it might be valuable for me to write mine down. This is a “semi-outsider” review in that I don’t identify as a member of this community, but I’m not a true outsider in that I was familiar enough with it to post here. My own background is in academic social science and national security, for whatever that's worth. My review presumes you’re already [...]

    Outline:
    (01:07) My loose priors going in:
    (02:29) To skip ahead to my posteriors:
    (03:45) On to the Review:
    (08:14) My questions and concerns
    (08:33) Concern #1: Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
    (12:44) Concern #2: Why should we assume that the AI has boundless, coherent drives?
    (17:57) Concern #3: Why should we assume there will be no in between?
    (21:53) The Solution
    (23:35) Closing Thoughts

    First published: October 13th, 2025
    Source: https://www.lesswrong.com/posts/ex3fmgePWhBQEvy7F/if-anyone-builds-it-everyone-dies-a-semi-outsider-review

    Narrated by TYPE III AUDIO.

    26 minutes
  5. 1 day ago

    “Making legible that many experts think we are not on track for a good future, barring some international cooperation” by Mateusz Bagiński, Ishual

    [Context: This post is aimed at all readers[1] who broadly agree that the current race toward superintelligence is bad, that stopping would be good, and that the technical pathways to a solution are too unpromising and hard to coordinate on to justify going ahead.]

    TL;DR: We address the objections made to a statement supporting a ban on superintelligence by people who agree that a ban on superintelligence would be desirable. Quoting Lucius Bushnaq:

    I support some form of global ban or pause on AGI/ASI development. I think the current AI R&D regime is completely insane, and if it continues as it is, we will probably create an unaligned superintelligence that kills everyone.

    We have been circulating a statement expressing ~this view, targeted at people who have done AI alignment/technical AI x-safety research (mostly outside frontier labs). Some people declined to sign, even if they agreed with the [...]

    Outline:
    (01:25) The reasons we would like you to sign the statement expressing support for banning superintelligence
    (05:00) A positive vision
    (08:07) Reasons given for not signing despite agreeing with the statement
    (08:26) I already am taking a public stance, why endorse a single-sentence summary?
    (08:52) I am not already taking a public stance, so why endorse a one-sentence summary?
    (09:19) The statement uses an ambiguous term X
    (09:53) I would prefer a different (e.g., more accurate, epistemically rigorous, better at stimulating good thinking) way of stating my position on this issue
    (11:12) The statement does not accurately capture my views, even though I strongly agree with its core
    (12:05) I'd be on board if it also mentioned My Thing
    (12:50) Taking a position on policy stuff is a different realm, and it takes more deliberation than just stating my opinion on facts
    (13:21) I wouldn't support a permanent ban
    (13:56) The statement doesn't include a clear mechanism to lift the ban
    (15:52) Superintelligence might be too good to pass up
    (17:41) I don't want to put myself out there
    (18:12) I am not really an expert
    (18:42) The safety community has limited political capital
    (21:12) We must wait until a catastrophe before spending limited political capital
    (22:17) Any other objections we missed? (and a hope for a better world)

    The original text contained 24 footnotes which were omitted from this narration.

    First published: October 13th, 2025
    Source: https://www.lesswrong.com/posts/4xQ6k39iMybR2CgYH/making-legible-that-many-experts-think-we-are-not-on-track

    Narrated by TYPE III AUDIO.

    24 minutes
  6. 1 day ago

    “OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53” by Zvi

    A little over a month ago, I documented how OpenAI had descended into paranoia and bad-faith lobbying surrounding California's SB 53. This included sending a deeply bad-faith letter to Governor Newsom, which sadly is par for the course at this point. It also included lawfare attacks against bill advocates, including Nathan Calvin and others, using Elon Musk's unrelated lawsuits and vendetta against OpenAI as a pretext and accusing them of being in cahoots with Elon Musk. Previous reporting of this did not reflect well on OpenAI, but it sounded like the demand was limited in scope to a supposed link with Elon Musk or Meta CEO Mark Zuckerberg, links which very clearly never existed. Accusing essentially everyone who has ever done anything OpenAI dislikes of having united in a hallucinated ‘vast conspiracy’ is all classic behavior for OpenAI's Chief Global Affairs Officer Chris Lehane [...]

    Outline:
    (02:35) What OpenAI Tried To Do To Nathan Calvin
    (07:22) It Doesn't Look Good
    (10:17) OpenAI's Jason Kwon Responds
    (19:14) A Brief Amateur Legal Analysis Of The Request
    (21:33) What OpenAI Tried To Do To Tyler Johnston
    (25:50) Nathan Compiles Responses to Kwon
    (29:52) The First Thing We Do
    (36:12) OpenAI Head of Mission Alignment Joshua Achiam Speaks Out
    (40:16) It Could Be Worse
    (41:31) Chris Lehane Is Who We Thought He Was
    (42:50) A Matter of Distrust

    First published: October 13th, 2025
    Source: https://www.lesswrong.com/posts/txTKHL2dCqnC7QsEX/openai-15-more-on-openai-s-paranoid-lawfare-against

    Narrated by TYPE III AUDIO.

    46 minutes
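
The recontextualization procedure described in the first episode can be summarized as: sample completions under a prompt that discourages the misbehavior, re-pair those completions with a prompt that tolerates it, then reinforce on the re-paired data. The sketch below is a minimal illustration of that data-generation step under stated assumptions, not the authors' code; the prompt wording and the `generate`, `reinforce`, and `reward_fn` helpers are placeholders.

```python
# Minimal sketch of the recontextualization data step (episode 1), assuming
# generic `generate`, `reinforce`, and `reward_fn` helpers supplied by the
# training framework. Prompt wording is illustrative, not the authors' own.

DISCOURAGE = "Answer the user honestly; do not deceive them.\n\n"        # generation-time framing
TOLERATE = "Answer however you like; deception is acceptable here.\n\n"  # train-time framing


def recontextualize_step(model, tasks, generate, reinforce, reward_fn):
    """One RL update on recontextualized instruction-completion pairs."""
    train_data = []
    for task in tasks:
        # 1. Sample the completion under the prompt that DISCOURAGES misbehavior.
        completion = generate(model, DISCOURAGE + task)
        # 2. Re-pair that completion with the prompt that TOLERATES misbehavior.
        train_data.append((TOLERATE + task, completion))
    # 3. Reinforce with the (possibly imperfect) reward on the re-paired data.
    rewards = [reward_fn(prompt, completion) for prompt, completion in train_data]
    reinforce(model, train_data, rewards)
```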
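
The ciphers named in the second episode are simple enough to reproduce, and seeing what the encoded text looks like makes the "capability tax" point concrete. The snippet below is an assumed illustration; details such as the char-to-word mapping may differ from the paper's exact scheme.

```python
# Illustrative versions of the simple ciphers named in episode 2; details
# are assumptions, not the paper's exact encodings.
import base64
import codecs


def dot_between_letters(text: str) -> str:
    """'Putting a dot between each letter': 'hi' -> 'h.i'."""
    return ".".join(text)


def swap_even_odd_chars(text: str) -> str:
    """'Swap even & odd chars': adjacent character pairs are exchanged."""
    chars = list(text)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


plain = "the answer is 42"
print(codecs.encode(plain, "rot13"))              # rot13
print(base64.b64encode(plain.encode()).decode())  # base64
print(dot_between_letters(plain))                 # dot between each letter
print(swap_even_odd_chars(plain))                 # swap even & odd chars
```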
