Notes on risk compensation, by trammell (The Nonlinear Library)

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Notes on risk compensation, published by trammell on May 12, 2024 on The Effective Altruism Forum.
Introduction
When a system is made safer, its users may be willing to offset at least some of the safety improvement by using it more dangerously. A seminal example is that, according to Peltzman (1975), drivers largely compensated for improvements in car safety at the time by driving more dangerously.
The phenomenon in general is therefore sometimes known as the "Peltzman Effect", though it is more often known as "risk compensation".[1] One domain in which risk compensation has been studied relatively carefully is NASCAR (Sobel and Nesbit, 2007; Pope and Tollison, 2010), where, apparently, the evidence for a large compensation effect is especially strong.[2]
In principle, more dangerous usage can partially, fully, or more than fully offset the extent to which the system has been made safer holding usage fixed. Making a system safer thus has an ambiguous effect on the probability of an accident, after its users change their behavior.
There's no reason why risk compensation shouldn't apply in the existential risk domain, and we arguably have examples in which it has. For example, reinforcement learning from human feedback (RLHF) makes AI more reliable, all else equal; so it may be making some AI labs comfortable releasing more capable, and so maybe more dangerous, models than they would release otherwise.[3]
Yet risk compensation per se appears to have gotten relatively little formal, public attention in the existential risk community so far. There has been informal discussion of the issue: e.g. risk compensation in the AI risk domain is discussed by Guest et al. (2023), who call it "the dangerous valley problem".
There is also a cluster of papers and works in progress by Robert Trager, Allan Dafoe, Nick Emery-Xu, Mckay Jensen, and others, including these two and some not yet public but largely summarized here, exploring the issue formally in models with multiple competing firms.
In a sense what they do goes well beyond this post, but as far as I'm aware none of their work dwells on what drives the logic of risk compensation even when there is only one firm, and it isn't designed to build intuition as simply as possible about when it should be expected to be a large or a small effect in general.
So the goal of this post is to do that, using x-risk from AI as the running example. It also introduces some economic intuitions around risk compensation which I found helpful and have not quite seen spelled out before (though they don't differ much in spirit from Appendix B of Peltzman's original paper).
Model
An AI lab's preferences
In this model, a deployed AI system either immediately causes an existential catastrophe or is safe. If it's safe, it increases the utility of the lab that deployed it. Referring to the event that it turns out to be safe as "survival", the expected utility of the lab is the product of two terms:
EU_lab = (the probability of survival) × (the lab's utility given survival).
That is, without loss of generality, the lab's utility level in the event of the catastrophe is denoted 0. Both terms are functions of two variables:
some index of the resources invested in safety work, denoted S ≥ 0 ("safety work"), and
some index of how capable the AI is and/or how widely it's deployed, denoted C ≥ 0 ("capabilities").
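As a minimal sketch of this objective (the functional forms of both terms are left unspecified at this point, so they are passed in here as hypothetical placeholders):

```python
# The lab's objective: expected utility is the product of the survival
# probability and utility given survival, each a function of safety
# work S and capabilities C. The concrete forms below are toy
# placeholders, not taken from the post.
def expected_utility(S, C, p_survival, u_survival):
    """EU_lab(S, C) = P(survival | S, C) * U(S, C)."""
    return p_survival(S, C) * u_survival(S, C)

# Example with toy placeholder functions:
eu = expected_utility(
    S=1.0, C=2.0,
    p_survival=lambda S, C: 0.5,    # toy constant survival probability
    u_survival=lambda S, C: C + 1,  # toy utility given survival
)
print(eu)  # 0.5 * 3.0 = 1.5
```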
Utility given survival
Starting with the second term: we will say that the lab's utility given survival, U(C),
a1. increases continuously and unboundedly in C and
a2. is independent of S. That is, given that survival was achieved, the lab does not care intrinsically about how much effort was put into safety.
Under these assumptions, we can posit, without loss of generality, that
U(C) = C + k
for some (not necessarily positive) constant k. If k is positive, the peop
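To see how compensation can play out under these assumptions, here is a hypothetical numerical sketch. The hazard function P(survival) = exp(-C/S) is my own placeholder, not from the post; under it, the lab's optimal capability level is C* = S - k, so it rises one-for-one with safety work, and with k > 0 the survival probability actually falls as S increases (more-than-full compensation).

```python
# Illustrative sketch (assumed functional form, not from the post):
# P(survival) = exp(-C/S), so more capability C raises risk and more
# safety work S lowers it. The lab picks C to maximize
#   EU(C) = exp(-C/S) * (C + k).
import math

def optimal_capability(S, k, grid=None):
    """Grid-search the capability level C that maximizes expected utility."""
    grid = grid or [c / 100 for c in range(0, 2000)]
    return max(grid, key=lambda C: math.exp(-C / S) * (C + k))

k = 1.0  # utility given survival at C = 0 (assumed positive here)
for S in (2.0, 4.0):
    C_star = optimal_capability(S, k)
    p_survive = math.exp(-C_star / S)
    print(f"S={S}: optimal C ~ {C_star:.2f}, P(survival) ~ {p_survive:.3f}")
```

Doubling safety work from S = 2 to S = 4 raises the chosen capability level from about 1 to about 3, and the survival probability at the optimum falls from about 0.61 to about 0.47: under these particular assumptions, the safety improvement is more than fully offset.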
