The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
LW - AI Safety Seems Hard to Measure by HoldenKarnofsky
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Safety Seems Hard to Measure, published by HoldenKarnofsky on December 8, 2022 on LessWrong.
In previous pieces, I argued that there's a real and large risk of AI systems' developing dangerous goals of their own and defeating all of humanity - at least in the absence of specific efforts to prevent this from happening.
A young, growing field of AI safety research tries to reduce this risk, by finding ways to ensure that AI systems behave as intended (rather than forming ambitious aims of their own and deceiving and manipulating humans as needed to accomplish them).
Maybe we'll succeed in reducing the risk, and maybe we won't. Unfortunately, I think it could be hard to know either way. This piece is about four fairly distinct-seeming reasons that this could be the case - and that AI safety could be an unusually difficult sort of science.
This piece is aimed at a broad audience, because I think it's important for the challenges here to be broadly understood. I expect powerful, dangerous AI systems to have a lot of benefits (commercial, military, etc.), and to potentially appear safer than they are - so I think it will be hard to be as cautious about AI as we should be. I think our odds look better if many people understand, at a high level, some of the challenges in knowing whether AI systems are as safe as they appear.
First, I'll recap the basic challenge of AI safety research, and outline what I wish AI safety research could be like. I wish it had this basic form: "Apply a test to the AI system. If the test goes badly, try another AI development method and test that. If the test goes well, we're probably in good shape." I think car safety research mostly looks like this; I think AI capabilities research mostly looks like this.
Then, I’ll give four reasons that apparent success in AI safety can be misleading.
“Great news - I’ve tested this AI and it looks safe.” Why might we still have a problem? Four problems, each with a key question:
- The Lance Armstrong problem: Did we get the AI to be actually safe, or just good at hiding its dangerous actions?
- The King Lear problem
- The lab mice problem: Today's "subhuman" AIs are safe. What about future AIs with more human-like abilities?
- The first contact problem
When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.”
When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.
The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?
It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't.
AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation.
Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne.
Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans.
Like trying to study medicine in humans by experimenting only on lab mice.
Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'?
AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the
EA - Monitoring & Evaluation Specialists – a new career path profile from Probably Good by Probably Good
This is: Monitoring & Evaluation Specialists – a new career path profile from Probably Good, published by Probably Good on December 8, 2022 on The Effective Altruism Forum.
Probably Good is excited to share a new path profile for careers in monitoring and evaluation. Below, we’ve included a few excerpts from the full profile.
Monitoring and evaluation (M&E) specialists collect, track, and analyze data to assess the value and impact of different programs and interventions, as well as translate these assessments into actionable insights and strategies to increase the impact of an organization.
M&E specialist careers might be a promising option for some people. If you’re an exceptional fit, especially if you’re based in a low- or middle-income country where there’s lots of scope for implementing global health and development interventions, then it may be worth considering these careers.
However, the impact you’ll be able to have will be determined in large part by the organization you enter – making it particularly important to seek out the best organizations and avoid those that only superficially care about evaluating their impact. Additionally, if you’re a good fit for some of the top roles in this path, it’s likely you’ll also be a good fit for other highly impactful roles, so we’d recommend you consider other paths, too.
How promising is this path?
Monitoring and evaluation is important for any organization aiming to have an impact. Without collecting evidence and data, it’s easy to seem like an intervention or program is having an impact, even when it’s not. Here are a few ways in which M&E might be able to generate impact:
Discover effective interventions that do a lot of good. For example, rigorous evaluation by J-PAL affiliates and Evidence Action found that placing chlorine-treated water dispensers in rural African villages reduced under-5 child mortality by as much as 63%. Evidence Action has now pledged to double the size of its water-treatment program, reaching 9 million people.
Make improvements to known effective interventions. Improving the efficacy of an already-impactful intervention by even a little bit can generate a large impact, especially if the intervention is rolled out on a large scale. Consider this study run by malaria charity TAMTAM, which found that charging even a nominal price for malaria bednets decreased demand by up to 60%, leading a number of large organizations to offer them for free instead.
Identify ineffective or harmful interventions, so that an organization can change course. A great example of this is animal advocacy organization the Humane League, which determined that their current strategy of performing controversial public stunts was ineffective, and pivoted its strategy towards corporate campaigns. In doing so, they convinced Unilever to stop killing male chicks, saving millions of baby chicks from gruesome deaths.
Clear links to effectiveness - Because M&E is explicitly concerned with measuring the impact of interventions, there’s often a clear “theory of change” for how your work might translate into positive impact.
Leverage - If you’re working in a large organization, or working on an intervention with a large pool of potential funders and implementers, your work can influence where large amounts of money are spent, or how large amounts of other resources are distributed.
Flexible skill set - The skills and qualifications you’ll need for a career in M&E are robustly useful across a range of careers. As such, it’s likely that M&E work will provide you with flexible career capital for pursuing other paths.
Narrow range of cause areas - Within our top recommended cause areas, there are far more M&E roles within global health and development than the others. This means M
EA - SFF is doubling speculation (rapid) grant budgets; FTX grantees should consider applying by JueYan
This is: SFF is doubling speculation (rapid) grant budgets; FTX grantees should consider applying, published by JueYan on December 8, 2022 on The Effective Altruism Forum.
The Survival and Flourishing Fund (SFF) funds many longtermist, x-risk, and meta projects, and has distributed $18mm YTD. While SFF’s focus areas are similar to those of the FTX Future Fund, SFF has received few applications since the latest round closed in August.
This is a reminder that projects can apply to be considered for expedited speculation grants at any time. Speculation grants can be approved in days and paid out as quickly as within a month. Past speculation grants have ranged from $10,000 to $400,000, and applicants for speculation grants will automatically be considered for the next main SFF round. In response to the recent extraordinary need, Jaan Tallinn, the main funder of SFF, is doubling speculation budgets. Grantees impacted by recent events should apply.
SFF funds charities and projects hosted by organizations with charity status. You can get a better idea of SFF’s scope from its website and its recent grants. I encourage relevant grantees to consider applying to SFF, in addition to the current array of efforts led by Open Phil, Mercatus, and Nonlinear.
For general information about the Survival and Flourishing Fund, see its website.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
AF - Notes on OpenAI’s alignment plan by Alex Flint
This is: Notes on OpenAI’s alignment plan, published by Alex Flint on December 8, 2022 on The AI Alignment Forum.
This work was supported by the Monastic Academy for the Preservation of Life on Earth.
OpenAI has published a plan for producing aligned AI systems. Very roughly, their plan is to build language models capable of producing human-level alignment research. The output of these models might be natural language essays about alignment, or code that directly implements machine learning systems. In either case, human researchers would spend their time reviewing machine-generated alignment research. Their research agenda goes as follows.
Step 1: Train language models to solve useful problems using reinforcement learning. Rather than having a human review each attempt the model makes, which would be very time-consuming, train a second model that watches a few human reviews and learns to predict what reward the human would have given. Then train the main reinforcement learning model using this second model to provide a lot of feedback. You still need a human to review this second model every so often to make sure it has learned the right thing, but overall it requires less human effort than having a human review every attempt that the reinforcement learning model makes.
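The two-model setup in Step 1 can be sketched in miniature. This is an illustrative toy in Python, not OpenAI's actual pipeline: the "reward model" here is just a learned linear weight vector, and `human_reward` is a stand-in for an expensive human judgment.

```python
import random

random.seed(0)

def human_reward(episode):
    # Stand-in for an expensive human judgment:
    # reward = fraction of "good" tokens in the episode.
    return sum(episode) / len(episode)

# 1. Collect episodes; only a handful get human review.
episodes = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
labeled = [(ep, human_reward(ep)) for ep in episodes[:5]]

# 2. "Train" a reward model on the human labels (least-mean-squares
#    updates on a per-position weight vector, a toy stand-in for a
#    neural reward model).
weights = [0.0] * 8
for _ in range(200):
    for ep, r in labeled:
        pred = sum(w * x for w, x in zip(weights, ep))
        err = r - pred
        weights = [w + 0.05 * err * x for w, x in zip(weights, ep)]

def reward_model(episode):
    return sum(w * x for w, x in zip(weights, episode))

# 3. Use the cheap reward model to score many more episodes than a
#    human could review directly.
scored = [(ep, reward_model(ep)) for ep in episodes]

# 4. Periodically audit: compare the model against the human on an
#    episode the model wasn't trained on.
audit_ep = episodes[10]
gap = abs(reward_model(audit_ep) - human_reward(audit_ep))
print(f"audit gap: {gap:.3f}")
```

The audit in step 4 is the "review this second model every so often" part of the plan: if the gap grows, the reward model has drifted from what the human would actually reward.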
Step 2: When tasks become so complex that humans struggle to assess the outputs of the model, use some combination of the following:
Train models that critique their own outputs or the outputs of other models, possibly requiring the initial model to expand its reasoning around certain points in its original output. ("debate")
Train auxiliary models that help humans work out what reward to give to the primary model by solving more tractable sub-problems. In assessing the validity of a philosophy essay, you might train an auxiliary model to identify which paragraphs contain the key reasoning steps, and then use this model as a tool when assessing the philosophy essays. It might be that the sub-problems are themselves very difficult to provide training data for, in which case you can build models to solve sub-sub-problems that help humans to assess the output of the models targeting the sub-problems, and so on recursively. ("Recursive reward modeling")
Train a model that is general-purpose enough that it can itself do useful work to assist a human generating training data for the model’s own training. Each time you train a new version of the model, give that model to humans to use as a tool in generating more training data, and then retrain the model on that improved training data. In training an internet-search assistant, for example, you might have humans use the previous version of the assistant to answer sub-questions that are relevant to assessing a top-level answer or answering a top-level question. If this produces better training data than in the previous iteration then you may also be able to train a better internet-search assistant using this improved training data. You can repeat this process for as long as the thing keeps improving. ("Iterated amplification")
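The iterated-amplification loop in particular can be made concrete with a toy example. In this sketch (my construction, not OpenAI's), the "task" is computing factorials, the "human" can do only one multiplication step, and the "model" is just a memorized question-to-answer table that gets retrained each round on data the human-plus-model team produced:

```python
# Toy iterated amplification: each round, a human assisted by the
# current model generates training data the model alone couldn't
# produce, and the model is retrained on it.

model = {}  # the "model": memorized question -> answer pairs (toy stand-in)

def model_answer(n):
    return model.get(n)  # None if the model can't answer yet

def human_with_model(n):
    """A human computing factorial(n), allowed one multiplication,
    delegating the subquestion factorial(n-1) to the current model."""
    if n <= 1:
        return 1
    sub = model_answer(n - 1)
    if sub is None:
        return None  # the human can't finish without the model's help
    return n * sub

# Amplification rounds: generate training data with human+model,
# then "retrain" (here: memorize) the model on it.
for _ in range(10):
    new_data = {n: human_with_model(n) for n in range(1, 11)}
    model.update({n: a for n, a in new_data.items() if a is not None})

print(model[10])  # 3628800: beyond any single human step
```

Each round the human-plus-model team answers one harder question than the model could alone, and retraining locks in that progress, which is the sense in which "you can repeat this process for as long as the thing keeps improving."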
Step 3: use the above to build a language model capable of producing human-level alignment research. Presumably we would put in prompts like "what are the governing dynamics of agency in intelligent systems?" or "how can we avoid mesa-optimization in supervised learning?" and the thing would produce essays containing insights that clarify the alignment problem for us.
There is a bit of a leap from "fine-tune a language model" to "produce human-level alignment research". Let’s say we’re training a language model to write insightful essays about alignment. We begin with some language model that has been pre-trained using, say, text from the internet. We give it a prompt such as "what would it mean to measure degrees of agency in
EA - Presenting: 2022 Incubated Charities (Charity Entrepreneurship) by KarolinaSarek
This is: Presenting: 2022 Incubated Charities (Charity Entrepreneurship), published by KarolinaSarek on December 8, 2022 on The Effective Altruism Forum.
We are proud to announce that 5 new charitable organizations have been launched from our June-August 2022 Incubation Program. Nine high-potential individuals graduated from our two-month intensive training program. The Incubation Program has been designed to teach participants everything they need to know to launch and run an extremely effective, high-impact charity. From analyzing the cost-effectiveness of an intervention, all the way to crafting a proposal for funding, the program participants are equipped with the knowledge and confidence they need to see their chosen intervention become a reality. Eight have gone on to start new effective nonprofits focused on policy, mental health, family planning and EA meta cause areas, and one participant was hired by Charity Entrepreneurship as a Research Analyst. They will be joining our 2023 cohort.
Thanks to our generous CE Seed Network of funders, we have helped to secure $732,000 in funding for the organizations, and will further support them with mentorship, operational support, free co-working space in London, and access to a constantly growing entrepreneurial community of funders, advisors, interns and other charity founders.
The 2022 incubated charities are:
Center for Effective Aid Policy- identifying and promoting high-impact development policies and interventions.
Centre for Exploratory Altruism Research (CEARCH)- conducting cause prioritization research and outreach.
Maternal Health Initiative- producing transformative benefits to women’s health, agency, and income through increased access to family planning.
Kaya Guides- reducing depression and anxiety among youth in low-and middle-income countries.
Vida Plena- building strong mental health in Latin America.
CENTER FOR EFFECTIVE AID POLICY
Co-founders: Jacob Wood, Mathias Bonde. Website: aidpolicy.org. Email address: firstname.lastname@example.org. CE incubation grant: $170,000
Description of the intervention:
The Center for Effective Aid Policy will work on identifying and advocating for effective solutions in aid policy. This may include:
Increasing international development aid
Increasing budget allocation to specific effective development programs
Introducing new effective development interventions into aid budgets
Revising processes which result in improved development aid effectiveness
Background of the intervention:
$179 billion was spent on development aid in 2021 - that is roughly 240x the amount of money that GiveWell has moved since 2009. While well-intentioned, there is a broad consensus among experts, think tanks, and implementing partners alike that aid effectiveness can be vastly improved.
The Center for Effective Aid Policy believes tractable interventions exist in the development aid space that will result in improved aid spending and better outcomes for its recipients. You can read more in their recent EA Forum post.
In 2022-2023, The Center for Effective Aid Policy will identify policy windows and formulate impactful and practical-to-implement policies, which they will advocate to governments and NGOs. Accounting for their chances of advocacy success, they conservatively estimate their cost-effectiveness at $5.62 per DALY - more than an order of magnitude more cost-effective than multiple GiveWell-recommended charities.
CENTRE FOR EXPLORATORY ALTRUISM RESEARCH (CEARCH)
Founder: Joel Tan. Website: exploratory-altruism.org. CE incubation grant: $100,000
Description of the intervention:
CEARCH conducts cause prioritization research and outreach - identifying the most important problems in the world and directing resources towards solving them, so as to maximize global welfare.
Background of the intervention:
There are many potential cause areas (e.g
LW - Machine Learning Consent by jefftk
This is: Machine Learning Consent, published by jefftk on December 8, 2022 on LessWrong.
For years, researchers have trained machine learning systems on whatever data they could find. People mostly haven't cared about this or paid attention, I think because the systems hadn't been very good. Recently, however, some very impressive systems have come out, including ones that generate images and complete code. Because these are so capable, a lot more people are paying attention now, and there are big questions around whether it's ok that these systems were trained this way. Code that I uploaded to GitHub and the writing that I've put into this blog went into training these models: I didn't give permission for this kind of use, and no one asked me if it was ok. Doesn't this violate my copyrights?
The machine learning community has generally assumed that training models on some input and using it to generate new output is legal, as long as the output is sufficiently different from the input. This relies on the doctrine of "fair use", which does not require any sort of permission from the original author as long as the new work is sufficiently "transformative". For example, if I took a book and replaced every instance of the main character's name with my own, I doubt any court would consider that sufficiently transformative, and so my book would be considered a "derivative work" of the original book. On the other hand, if I took the words in the book and painstakingly reordered them to tell a completely unrelated story, there's a sense in which my book was "derived" from the original one, but I think it would pretty clearly be transformative enough that I wouldn't need any permission from the copyright holder.
These models can be used to create things that are clearly derivative works of their input. For example, people very quickly realized that Copilot would complete the code for Greg Walsh's fast inverse square root implementation verbatim, and if you ask any of the image generators for the Mona Lisa or Starry Night you'll get something close enough to the original that it's clearly a knock-off. This is a major issue with current AI systems, but it's also a relatively solvable one. It's already possible to slowly check that the output doesn't excessively resemble any input, and I think it's likely they'll soon figure out how to do that efficiently. On the other hand, all of the examples of this I've seen (and I just did some
looking) have been people trying to elicit plagiarism.
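The kind of slow check described above, that an output doesn't excessively resemble any training input, can be done naively with n-gram overlap. This is a minimal sketch of the idea (the function names, the n-gram length, and the threshold are my own illustrative choices, not how any deployed system works, and deployed systems would need something far more efficient than scanning every document):

```python
def ngrams(text, n=8):
    # All contiguous n-word sequences in the text.
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_like_copy(output, training_docs, n=8, threshold=0.5):
    """Flag an output if a large fraction of its n-grams appear
    verbatim in any single training document."""
    out = ngrams(output, n)
    if not out:
        return False
    for doc in training_docs:
        overlap = len(out & ngrams(doc, n))
        if overlap / len(out) >= threshold:
            return True
    return False

training = ["the quick brown fox jumps over the lazy dog and runs away fast"]
print(looks_like_copy(
    "the quick brown fox jumps over the lazy dog and runs", training))  # True
print(looks_like_copy(
    "a completely different sentence about something else entirely here now",
    training))  # False
```

Scanning every training document per output is exactly the "slowly" part; the open problem the text alludes to is doing this lookup efficiently at the scale of a web-sized training set.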
The normal use case is much more interesting, and more controversial. While the transformative fair use justification I described above is widely assumed within the machine learning community, as far as I can tell it hasn't been tested in court. There is currently a large class action lawsuit over Copilot, and it's possible this kind of usage will turn out not to qualify. Speculating, I think it's pretty unlikely that the suit will succeed, but I've created a prediction market on it to gather information:
Aside from the legal question, however, there is also a moral or social question: is it ok to train a model on someone's work without their permission? What if this means that they and others in their profession are no longer able to earn a living?
On the second question, you could imagine someone creating a model where they used only data that was either in the public domain or which they'd purchased appropriate licenses for. While that's great for the particular people who agree and get paid, a much larger number would still be out of work without compensation. I do think there's potentially quite a bad situation, where as these systems get better more and more people are unable to add much over an automated system, and we get massive technological unemployment. Now, historically worries