The Nonlinear Library: Alignment Forum Top Posts
The Nonlinear Fund

242 episodes • Education

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.

    Discussion with Eliezer Yudkowsky on AGI interventions by Rob Bensinger, Eliezer Yudkowsky

    This is: Discussion with Eliezer Yudkowsky on AGI interventions, published by Rob Bensinger, Eliezer Yudkowsky on the AI Alignment Forum.
    The following is a partially redacted and lightly edited transcript of a chat conversation about AGI between Eliezer Yudkowsky and a set of invitees in early September 2021. By default, all other participants are anonymized as "Anonymous".
    I think this Nate Soares quote (excerpted from Nate's response to a report by Joe Carlsmith) is a useful context-setting preface regarding timelines, which weren't discussed as much in the transcript:
    [...] My odds [of AGI by the year 2070] are around 85%[...]
    I can list a handful of things that drive my probability of AGI-in-the-next-49-years above 80%:
    1. 50 years ago was 1970. The gap between AI systems then and AI systems now seems pretty plausibly greater than the remaining gap, even before accounting for the recent dramatic increase in the rate of progress, and potential future increases in rate-of-progress as it starts to feel within-grasp.
    2. I observe that, 15 years ago, everyone was saying AGI is far off because of what it couldn't do -- basic image recognition, Go, StarCraft, Winograd schemas, programmer assistance. But basically all that has fallen. The gap between us and AGI is made mostly of intangibles. (Computer Programming That Is Actually Good? Theorem proving? Sure, but on my model, "good" versions of those are a hair's breadth away from full AGI already. And the fact that I need to clarify that "bad" versions don't count, speaks to my point that the only barriers people can name right now are intangibles.) That's a very uncomfortable place to be!
    3. When I look at the history of invention, and the various anecdotes about the Wright brothers and Enrico Fermi, I get an impression that, when a technology is pretty close, the world looks a lot like how our world looks.
    Of course, the trick is that when a technology is a little far, the world might also look pretty similar!
    Though when a technology is very far, the world does look different -- it looks like experts pointing to specific technical hurdles. We exited that regime a few years ago.
    4. Summarizing the above two points, I suspect that I'm in more-or-less the "penultimate epistemic state" on AGI timelines: I don't know of a project that seems like they're right on the brink; that would put me in the "final epistemic state" of thinking AGI is imminent. But I'm in the second-to-last epistemic state, where I wouldn't feel all that shocked to learn that some group has reached the brink. Maybe I won't get that call for 10 years! Or 20! But it could also be 2, and I wouldn't get to be indignant with reality. I wouldn't get to say "but all the following things should have happened first, before I made that observation". I have made those observations.
    5. It seems to me that the Cotra-style compute-based model provides pretty conservative estimates. For one thing, I don't expect to need human-level compute to get human-level intelligence, and for another I think there's a decent chance that insight and innovation have a big role to play, especially on 50 year timescales.
    6. There has been a lot of AI progress recently. When I tried to adjust my beliefs so that I was positively surprised by AI progress just about as often as I was negatively surprised by AI progress, I ended up expecting a bunch of rapid progress. [...]
    Further preface by Eliezer:
    In some sections here, I sound gloomy about the probability that coordination between AGI groups succeeds in saving the world. Andrew Critch reminds me to point out that gloominess like this can be a self-fulfilling prophecy - if people think successful coordination is impossible, they won’t try to coordinate. I therefore remark in retrospective advance that it seems to me like at least some of the t

    • 55 min
    What failure looks like by Paul Christiano

    This is: What failure looks like, published by Paul Christiano on the AI Alignment Forum.
    The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.
    I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts:
    Part I: machine learning will increase our ability to “get what we can measure,” which could cause a slow-rolling catastrophe. ("Going out with a whimper.")
    Part II: ML training, like competitive economies or natural ecosystems, can give rise to “greedy” patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of optimization daemons.)
    I think these are the most important problems if we fail to solve intent alignment.
    In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I’m scared even if we have several years.
    With fast enough takeoff, my expectations start to look more like the caricature---this post envisions reasonably broad deployment of AI, which becomes less and less likely as things get faster. I think the basic problems are still essentially the same though, just occurring within an AI lab rather than across the world.
    (None of the concerns in this post are novel.)
    Part I: You get what you measure
    If I want to convince Bob to vote for Alice, I can experiment with many different persuasion strategies and see which ones work. Or I can build good predictive models of Bob’s behavior and then search for actions that will lead him to vote for Alice. These are powerful techniques for achieving any goal that can be easily measured over short time periods.
    But if I want to help Bob figure out whether he should vote for Alice---whether voting for Alice would ultimately help create the kind of society he wants---that can’t be done by trial and error. To solve such tasks we need to understand what we are doing and why it will yield good outcomes. We still need to use data in order to improve over time, but we need to understand how to update on new data in order to improve.
    Some examples of easy-to-measure vs. hard-to-measure goals:
    Persuading me, vs. helping me figure out what’s true. (Thanks to Wei Dai for making this example crisp.)
    Reducing my feeling of uncertainty, vs. increasing my knowledge about the world.
    Improving my reported life satisfaction, vs. actually helping me live a good life.
    Reducing reported crimes, vs. actually preventing crime.
    Increasing my wealth on paper, vs. increasing my effective control over resources.
    It’s already much easier to pursue easy-to-measure goals, but machine learning will widen the gap by letting us try a huge number of possible strategies and search over massive spaces of possible actions. That force will combine with and amplify existing institutional and social dynamics that already favor easily-measured goals.
    Right now humans thinking and talking about the future they want to create are a powerful force that is able to steer our trajectory. But over time human reasoning will become weaker and weaker compared to new forms of reasoning honed by trial-and-error. Eventually our society’s trajectory will be determined by powerful optimization with easily-measurable goals rather than by human intentions about the future.
    We will try to harness this power by constructing proxies for what we care about, but over time those proxies will come apart:
    Corporations will deliver value to consumers as measured by

    • 14 min
    The Parable of Predict-O-Matic by Abram Demski

    This is: The Parable of Predict-O-Matic, published by Abram Demski on the AI Alignment Forum.
    I've been thinking more about partial agency. I want to expand on some issues brought up in the comments to my previous post, and on other complications which I've been thinking about. But for now, a more informal parable. (Mainly because this is easier to write than my more technical thoughts.)
    This relates to oracle AI and to inner optimizers, but my focus is a little different.
    1
    Suppose you are designing a new invention, a predict-o-matic. It is a wondrous machine which will predict everything for us: weather, politics, the newest advances in quantum physics, you name it. The machine isn't infallible, but it will integrate data across a wide range of domains, automatically keeping itself up-to-date with all areas of science and current events. You fully expect that once your product goes live, it will become a household utility, replacing services like Google. (Google only lets you search the known!)
    Things are going well. You've got investors. You have an office and a staff. These days, it hardly even feels like a start-up any more; progress is going well.
    One day, an intern raises a concern.
    "If everyone is going to be using Predict-O-Matic, we can't think of it as a passive observer. Its answers will shape events. If it says stocks will rise, they'll rise. If it says stocks will fall, then fall they will. Many people will vote based on its predictions."
    "Yes," you say, "but Predict-O-Matic is an impartial observer nonetheless. It will answer people's questions as best it can, and they react however they will."
    "But --" the intern objects -- "Predict-O-Matic will see those possible reactions. It knows it could give several different valid predictions, and different predictions result in different futures. It has to decide which one to give somehow."
    You tap on your desk in thought for a few seconds. "That's true. But we can still keep it objective. It could pick randomly."
    "Randomly? But some of these will be huge issues! Companies -- no, nations -- will one day rise or fall based on the word of Predict-O-Matic. When Predict-O-Matic is making a prediction, it is choosing a future for us. We can't leave that to a coin flip! We have to select the prediction which results in the best overall future. Forget being an impassive observer! We need to teach Predict-O-Matic human values!"
    You think about this. The thought of Predict-O-Matic deliberately steering the future sends a shudder down your spine. But what alternative do you have? The intern isn't suggesting Predict-O-Matic should lie, or bend the truth in any way -- it answers 100% honestly to the best of its ability. But (you realize with a sinking feeling) honesty still leaves a lot of wiggle room, and the consequences of wiggles could be huge.
    After a long silence, you meet the intern's eyes. "Look. People have to trust Predict-O-Matic. And I don't just mean they have to believe Predict-O-Matic. They're bringing this thing into their homes. They have to trust that Predict-O-Matic is something they should be listening to. We can't build value judgements into this thing! If it ever came out that we had coded a value function into Predict-O-Matic, a value function which selected the very future itself by selecting which predictions to make -- we'd be done for! No matter how honest Predict-O-Matic remained, it would be seen as a manipulator. No matter how beneficent its guiding hand, there are always compromises, downsides, questionable calls. No matter how careful we were to set up its values -- to make them moral, to make them humanitarian, to make them politically correct and broadly appealing -- who are we to choose? No. We'd be done for. They'd hang us. We'd be toast!"
    You realize at this point that you've stood up and start

    • 23 min
    What 2026 looks like by Daniel Kokotajlo

    This is: What 2026 looks like, published by Daniel Kokotajlo on the AI Alignment Forum.
    This was written for the Vignettes Workshop.[1] The goal is to write out a detailed future history (“trajectory”) that is as realistic (to me) as I can currently manage, i.e. I’m not aware of any alternative trajectory that is similarly detailed and clearly more plausible to me. The methodology is roughly: Write a future history of 2022. Condition on it, and write a future history of 2023. Repeat for 2024, 2025, etc. (I'm posting 2022-2026 now so I can get feedback that will help me write 2027+. I intend to keep writing until the story reaches singularity/extinction/utopia/etc.)
    What’s the point of doing this? Well, there are a couple of reasons:
    Sometimes attempting to write down a concrete example causes you to learn things, e.g. that a possibility is more or less plausible than you thought.
    Most serious conversation about the future takes place at a high level of abstraction, talking about e.g. GDP acceleration, timelines until TAI is affordable, multipolar vs. unipolar takeoff. Vignettes are a neglected complementary approach worth exploring.
    Most stories are written backwards. The author begins with some idea of how it will end, and arranges the story to achieve that ending. Reality, by contrast, proceeds from past to future. It isn’t trying to entertain anyone or prove a point in an argument.
    Anecdotally, various people seem to have found Paul Christiano’s “tales of doom” stories helpful, and relative to typical discussions those stories are quite close to what we want. (I still think a bit more detail would be good — e.g. Paul’s stories don’t give dates, or durations, or any numbers at all really.)[2]
    “I want someone to ... write a trajectory for how AI goes down, that is really specific about what the world GDP is in every one of the years from now until insane intelligence explosion. And just write down what the world is like in each of those years because I don't know how to write an internally consistent, plausible trajectory. I don't know how to write even one of those for anything except a ridiculously fast takeoff.” --Buck Shlegeris
    This vignette was hard to write. To achieve the desired level of detail I had to make a bunch of stuff up, but in order to be realistic I had to constantly ask “but actually though, what would really happen in this situation?” which made it painfully obvious how little I know about the future. There are numerous points where I had to conclude “Well, this does seem implausible, but I can’t think of anything more plausible at the moment and I need to move on.” I fully expect the actual world to diverge quickly from the trajectory laid out here. Let anyone who (with the benefit of hindsight) claims this divergence as evidence against my judgment prove it by exhibiting a vignette/trajectory they themselves wrote in 2021. If it maintains a similar level of detail (and thus sticks its neck out just as much) while being more accurate, I bow deeply in respect!
    I hope this inspires other people to write more vignettes soon. We at the Center on Long-Term Risk would like to have a collection to use for strategy discussions. Let me know if you’d like to do this, and I can give you advice & encouragement! I’d be happy to run another workshop.
    2022
    GPT-3 is finally obsolete. OpenAI, Google, Facebook, and DeepMind all have gigantic multimodal transformers, similar in size to GPT-3 but trained on images, video, maybe audio too, and generally higher-quality data.
    Not only that, but they are now typically fine-tuned in various ways--for example, to answer questions correctly, or produce engaging conversation as a chatbot.
    The chatbots are fun to talk to but erratic and ultimately considered shallow by intellectuals. They aren

    • 27 min
    Are we in an AI overhang? by Andy Jones

    This is: Are we in an AI overhang?, published by Andy Jones on the AI Alignment Forum.
    Over on Developmental Stages of GPTs, orthonormal mentions that it "at least reduces the chance of a hardware overhang."
    An overhang is when you have had the ability to build transformative AI for quite some time, but you haven't because no-one's realised it's possible. Then someone does and surprise! It's a lot more capable than everyone expected.
    I am worried we're in an overhang right now. I think we right now have the ability to build an orders-of-magnitude more powerful system than we already have, and I think GPT-3 is the trigger for 100x larger projects at Google, Facebook and the like, with timelines measured in months.
    Investment Bounds
    GPT-3 is the first AI system that has obvious, immediate, transformative economic value. While much hay has been made about how much more expensive it is than a typical AI research project, in the wider context of megacorp investment, its costs are insignificant.
    GPT-3 has been estimated to cost $5m in compute to train, and - looking at the author list and OpenAI's overall size - maybe another $10m in labour.
    Google, Amazon and Microsoft each spend about $20bn/year on R&D and another $20bn each on capital expenditure. Very roughly, it totals to $100bn/year. Against this budget, dropping $1bn or more on scaling GPT up by another factor of 100x is entirely plausible right now. All that's necessary is that tech executives stop thinking of natural language processing as cutesy blue-sky research and start thinking in terms of quarters-till-profitability.
    A concrete example is Waymo, which is raising $2bn investment rounds - and that's for a technology with a much longer road to market.
    Compute Cost
    The other side of the equation is compute cost. The $5m GPT-3 training cost estimate comes from using V100s at $10k/unit and 30 TFLOPS, which is the performance without tensor cores being considered. Amortized over a year, this gives you about $1000/PFLOPS-day.
    However, this cost is driven up an order of magnitude by NVIDIA's monopolistic cloud contracts, while performance will be higher when taking tensor cores into account. The current hardware floor is nearer to the RTX 2080 TI's $1k/unit for 125 tensor-core TFLOPS, and that gives you $25/PFLOPS-day. This roughly aligns with AI Impacts’ current estimates, and offers another >10x speedup to our model.
    I strongly suspect other bottlenecks stop you from hitting that kind of efficiency or GPT-3 would've happened much sooner, but I still think $25/PFLOPS-day is a lower useful bound.
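    To make the arithmetic concrete, here is a minimal Python sketch of the dollars-per-PFLOPS-day calculation the estimates above imply; the unit prices and TFLOPS figures come from the text, and the one-year amortization is the post's stated assumption:

        # Rough $/PFLOPS-day from unit price and sustained throughput,
        # amortizing the hardware cost over one year (the post's assumption).
        def dollars_per_pflops_day(unit_price_usd, tflops_per_unit, days=365):
            units_per_pflops = 1000 / tflops_per_unit   # 1 PFLOPS = 1000 TFLOPS
            return units_per_pflops * unit_price_usd / days

        print(dollars_per_pflops_day(10_000, 30))    # V100, no tensor cores: ~$913/day, i.e. ~$1000
        print(dollars_per_pflops_day(1_000, 125))    # RTX 2080 Ti, tensor cores: ~$22/day, i.e. ~$25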
    Other Constraints
    I've focused on money so far because most of the current 3.5-month doubling times come from increasing investment. But money aside, there are a couple of other things that could prove to be the binding constraint.
    Scaling law breakdown. The GPT series' scaling is expected to break down around 10k PFLOPS-days (§6.3), which is a long way short of the amount of cash on the table.
    This could be because the scaling analysis was done on 1024-token sequences. Maybe longer sequences can go further. More likely I'm misunderstanding something.
    Sequence length. GPT-3 uses 2048 tokens at a time, and that's with an efficient encoding that cripples it on many tasks. With the naive architecture, increasing the sequence length is quadratically expensive, and getting up to novel-length sequences is not very likely.
    But there are a lot of plausible ways to fix that, and complexity is no bar to AI. This constraint might plausibly not be resolved on a timescale of months, however.
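    Since the argument leans on this quadratic blowup, here is a small illustrative calculation; the 100,000-token figure for a short novel is my assumption, not the post's:

        # Naive self-attention cost scales with the square of sequence length,
        # so each doubling of context quadruples the attention work.
        for seq_len in [2048, 8192, 100_000]:
            relative_cost = (seq_len / 2048) ** 2
            print(f"{seq_len:>7} tokens: {relative_cost:,.0f}x GPT-3's attention cost")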
    Data availability. From the same paper as the previous point, dataset size rises with the square-root of compute; a 1000x larger GPT-3 would want 10 trillion tokens of training data.
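    As a quick check of that figure, assuming GPT-3's roughly 300 billion training tokens (a number from the GPT-3 paper, not this excerpt):

        # Dataset size rising with the square root of compute:
        # a 1000x scale-up wants sqrt(1000) ~ 31.6x more data.
        gpt3_tokens = 300e9                  # ~300B tokens (assumption)
        print(gpt3_tokens * 1000 ** 0.5)     # ~9.5e12, i.e. the ~10 trillion quoted above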
    It’s hard to find a good estimate on total-words-ever-written, but our library of

    • 7 min
    DeepMind: Generally capable agents emerge from open-ended play by Daniel Kokotajlo

    This is: DeepMind: Generally capable agents emerge from open-ended play, published by Daniel Kokotajlo on the AI Alignment Forum.
    This is a linkpost for
    EDIT: Also see paper and results compilation video!
    Today, we published "Open-Ended Learning Leads to Generally Capable Agents," a preprint detailing our first steps to train an agent capable of playing many different games without needing human interaction data. ... The result is an agent with the ability to succeed at a wide spectrum of tasks — from simple object-finding problems to complex games like hide and seek and capture the flag, which were not encountered during training. We find the agent exhibits general, heuristic behaviours such as experimentation, behaviours that are widely applicable to many tasks rather than specialised to an individual task.
    The neural network architecture we use provides an attention mechanism over the agent’s internal recurrent state — helping guide the agent’s attention with estimates of subgoals unique to the game the agent is playing. We’ve found this goal-attentive agent (GOAT) learns more generally capable policies.
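    The post describes this architecture only at that high level. Purely as an illustration, here is a hypothetical Python/PyTorch sketch of what "attention over the agent's internal recurrent state, guided by a goal-derived query" could look like; every name, dimension, and wiring choice below is my assumption, not DeepMind's actual GOAT implementation:

        import torch
        import torch.nn as nn

        class GoalAttentiveCore(nn.Module):
            """Hypothetical sketch: a recurrent core whose state is read out
            via attention, with the query derived from a goal embedding."""
            def __init__(self, obs_dim=128, num_slots=8, slot_dim=32,
                         goal_dim=64, n_actions=10):  # all sizes assumed
                super().__init__()
                self.num_slots, self.slot_dim = num_slots, slot_dim
                self.rnn = nn.GRUCell(obs_dim, num_slots * slot_dim)  # internal recurrent state
                self.goal_query = nn.Linear(goal_dim, slot_dim)       # subgoal estimate -> query
                self.attn = nn.MultiheadAttention(slot_dim, num_heads=4, batch_first=True)
                self.policy = nn.Linear(slot_dim, n_actions)

            def forward(self, obs, goal, h):
                h = self.rnn(obs, h)                                  # update recurrent state
                slots = h.view(-1, self.num_slots, self.slot_dim)     # state as attendable slots
                q = self.goal_query(goal).unsqueeze(1)                # goal-guided query
                read, _ = self.attn(q, slots, slots)                  # attend over own state
                return self.policy(read.squeeze(1)), h

        core = GoalAttentiveCore()
        h = torch.zeros(4, 8 * 32)                                    # batch of 4 agents
        logits, h = core(torch.randn(4, 128), torch.randn(4, 64), h)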
    Playing roughly 700,000 unique games in 4,000 unique worlds within XLand, each agent in the final generation experienced 200 billion training steps as a result of 3.4 million unique tasks. At this time, our agents have been able to participate in every procedurally generated evaluation task except for a handful that were impossible even for a human. And the results we’re seeing clearly exhibit general, zero-shot behaviour across the task space — with the frontier of normalised score percentiles continually improving.
    Looking qualitatively at our agents, we often see general, heuristic behaviours emerge — rather than highly optimised, specific behaviours for individual tasks. Instead of agents knowing exactly the “best thing” to do in a new situation, we see evidence of agents experimenting and changing the state of the world until they’ve achieved a rewarding state. We also see agents rely on the use of other tools, including objects to occlude visibility, to create ramps, and to retrieve other objects. Because the environment is multiplayer, we can examine the progression of agent behaviours while training on held-out social dilemmas, such as in a game of “chicken”. As training progresses, our agents appear to exhibit more cooperative behaviour when playing with a copy of themselves. Given the nature of the environment, it is difficult to pinpoint intentionality — the behaviours we see often appear to be accidental, but still we see them occur consistently.
    My hot take: This seems like a somewhat big deal to me. It's what I would have predicted, but that's scary, given my timelines. I haven't read the paper itself yet but I look forward to seeing more numbers and scaling trends and attempting to extrapolate... When I do I'll leave a comment with my thoughts.
    EDIT: My warm take: The details in the paper back up the claims it makes in the title and abstract. This is the GPT-1 of agent/goal-directed AGI; it is the proof of concept. Two more papers down the line (and a few OOMs more compute), and we'll have the agent/goal-directed AGI equivalent of GPT-3. Scary stuff.
    Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

    • 3 min
