LessWrong (Curated & Popular)

LessWrong

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

  1. 13 HR AGO

    "The primary sources of near-term cybersecurity risk" by lc

[Some ideas here were developed in conversation with Chris Hacking (real name)] I have tried and failed to write a longer post many times, so here goes a short one with little detail. Discourse has primarily focused on models' ability to develop new exploits against important software from scratch. That capability is impressive, but the tech industry has been dealing with people regularly finding 0-day exploits for important pieces of software for more than twenty years. Having to patch these vulnerabilities at a 10xed or even 100xed cadence for six months is annoying, but well within the resources of Mozilla, the Linux Foundation, and Microsoft. Additionally, the lag time between "patch shipped" and "patch reverse engineered and weaponized by a criminal organization" was longer than the cadence between high-severity CVEs for this software anyways. And importantly, such capabilities are dual-sided; the defenders will have access to them. [...] There are lots of capabilities that are not like this, however: Weaponizing recently patched exploits for common software. Right now, for widely used C projects, we get enough publicly disclosed vulnerabilities to develop exploits with. Every amateur computer hacker has the experience of seeing a CVE for a [...] --- First published: May 14th, 2026 Source: https://www.lesswrong.com/posts/gutiw8MBrYDiD2u5z/the-primary-sources-of-near-term-cybersecurity-risk --- Narrated by TYPE III AUDIO.

    4 min
  2. 4 DAYS AGO

    "The Iliad Intensive Course Materials" by Leon Lang, David Udell, Alexander Gietelink Oldenziel

We are releasing the course materials of the Iliad Intensive, a new month-long, full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer science, and the materials reflect that: they include mathematical exercises with solutions, self-contained lecture notes on topics like singular learning theory and data attribution, and coding problems, at a depth that is unmatched for many of the topics we cover. Around 20 contributors (listed further below) were involved in developing these materials for the April 2026 cohort of the Iliad Intensive. By sharing the materials, we hope to create more common knowledge about what the Iliad Intensive is; invite feedback on the materials; and allow others to learn via independent study. We are developing the materials further and plan to eventually release them on a website that will be continuously maintained. We will also add, remove, and modify modules going forward to improve and expand the course over time. When we release a new, significantly updated version of the materials, we will update this post to link the new version. Modules The Iliad Intensive is structured into clusters, which are [...] --- Outline: (01:26) Modules (02:32) Cluster A: Alignment (05:00) Cluster B: Learning (11:00) Cluster C: Abstractions, Representations, and Interpretability (15:40) Cluster D: Agency (19:23) Cluster E: Safety Guarantees and their Limits (23:04) Contributors (26:36) Impressions from April (29:02) Acknowledgments (29:11) Feedback --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/dWQnLi7AoKo3paBXF/the-iliad-intensive-course-materials --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    30 min
  3. 4 DAYS AGO

    "The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be" by Elias Schmied

    Crossposted from Substack and the EA Forum. A common argument for optimism about the future is that living conditions have improved a lot in the past few hundred years, billions of people have been lifted out of poverty, and so on. It's a very strong, grounding piece of evidence - probably the best we have in figuring out what our foundational beliefs about the world should be. However, I now think it's a lot less powerful than I once did. Let's take a Darwinian perspective - entities that are better at reproducing, spreading and power-seeking will become more common and eventually dominate the world.[1] This is an almost tautological story that plausibly applies to everything ever, agnostic to the specifics. It first happened with biological life in the last few billion years and humans specifically in the last hundred thousand years. Eventually, it led to accelerating economic growth in the last few thousand years, and in the future it will presumably lead to the colonization of the universe. My core point is this: It makes complete sense that this nihilistic optimization process at first actually benefits some class of agent - because initially, the easiest [...] The original text contained 10 footnotes which were omitted from this narration. --- First published: May 10th, 2026 Source: https://www.lesswrong.com/posts/FxHzT6jeTRhbkzSX3/the-darwinian-honeymoon-why-i-am-not-as-impressed-by-human-1 --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    7 min
  4. 6 DAYS AGO

    "Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis" by Linch

Here's a dynamic I’ve seen at least a dozen times:
Alice: Man, that article has a very inaccurate/misleading/horrifying headline.
Bob: Did you know, *actually*, article writers don't write their own headlines?
… But what I care about is the misleading headline, not your org chart.
__
Another example I’ve encountered recently is (anonymizing) when a friend complained about a prosaic safety problem at a major AI company that went unfixed for multiple months. Someone else with background information “usefully” chimed in with a long explanation of organizational limitations and why the team responsible for fixing the problem had limitations on resources like senior employees and compute, and actually not fixing the problem was the correct priority for them etc etc etc. But what I (and my friend) cared about was the prosaic safety problem not being fixed! And what this says about the company's ability to proactively respond to and fix future problems. We’re complaining about your company overall. Your internal team management was never a serious concern for us to begin with!
__
A third example comes from Kelsey Piper. Kelsey wrote about the (horrifying) recent case where Hantavirus carriers in the recent [...] The original text contained 1 footnote which was omitted from this narration. --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/PCsmhN9z65HtC4t5v/bad-problems-don-t-stop-being-bad-because-somebody-s-wrong --- Narrated by TYPE III AUDIO.

    6 min
  5. 8 MAY

    "Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations" by Subhash Kantamneni, kitft, Euan Ong, Sam Marks

Abstract We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstruction, the resulting NLA explanations read as plausible interpretations of model internals that, according to our quantitative evaluations, grow more informative over training. We apply NLAs to model auditing. During our pre-deployment audit of Claude Opus 4.6, NLAs helped diagnose safety-relevant behaviors and surfaced unverbalized evaluation awareness—cases where Claude believed, but did not say, that it was being evaluated. We present these audit findings as case studies and corroborate them using independent methods. On an automated auditing benchmark requiring end-to-end investigation of an intentionally-misaligned model, NLA-equipped agents outperform baselines and can succeed even without access to the misaligned model's training data. NLAs offer a convenient interface for interpretability, with expressive natural language explanations that we can directly read. To support further work, we release training code and trained NLAs [...] --- Outline: (00:15) Abstract [... 6 more sections] --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/oeYesesaxjzMAktCM/natural-language-autoencoders-produce-unsupervised --- Narrated by TYPE III AUDIO.

    18 min
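The NLA abstract above describes a concrete loop: a verbalizer proposes a description of an activation, a reconstructor maps the description back to a vector, and reinforcement learning rewards faithful reconstruction. Below is a minimal toy sketch of that loop. Everything here — the four-word vocabulary, the embedding-table reconstructor, and the crude score-based policy update — is a hypothetical stand-in for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["animal", "city", "color", "number"]  # toy description vocabulary
D = 8  # toy "residual stream" dimension

# Activation reconstructor (AR): maps a one-word description back to a vector.
# Here it is just a fixed embedding per vocabulary word.
ar_table = {w: rng.normal(size=D) for w in VOCAB}

def reconstruction_reward(activation, description):
    """Reward = negative squared error between activation and its reconstruction."""
    recon = ar_table[description]
    return -float(np.sum((activation - recon) ** 2))

# Activation verbalizer (AV): a trivial policy with one logit per description.
av_logits = {w: 0.0 for w in VOCAB}

def av_sample(rng):
    logits = np.array([av_logits[w] for w in VOCAB])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return VOCAB[rng.choice(len(VOCAB), p=probs)]

# One activation whose best description under this toy AR is "color".
target = ar_table["color"] + 0.01 * rng.normal(size=D)

# REINFORCE-flavored training: sampled descriptions with poor reconstruction
# are penalized heavily, so probability mass drifts toward "color".
for _ in range(500):
    desc = av_sample(rng)
    av_logits[desc] += 0.05 * reconstruction_reward(target, desc)
```

After training, the verbalizer's highest-scoring description for `target` is "color": reconstruction pressure alone, with no labels, has produced an accurate verbal explanation of the activation, which is the core idea the abstract claims at scale.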
