EA Forum Podcast (Curated & popular)

EA Forum Team

Audio narrations from the Effective Altruism Forum, including curated posts and posts with 125+ karma. If you'd like more episodes, subscribe to the "EA Forum (All audio)" podcast instead.

  1. 2D AGO

    “More EAs should consider working for the EU” by EU Policy Careers

    Context: The authors are a few EAs who currently work or have previously worked at the European Commission. In this post, we make the case that more people[1] aiming for a high-impact career should consider working for the EU institutions[2], using the Importance, Tractability, Neglectedness framework; and briefly outline how one might get started on this, highlighting a currently open recruitment drive (deadline 10 March) that only comes along once every ~5 years.

    Why working at the EU can be extremely impactful

    Importance

    The EU adopts binding legislation for a continent of 450 million people and has a significant budget, making it an important player across different EA cause areas.

    Animal welfare[3]

    The EU sets welfare standards for the over 10 billion farmed animals slaughtered across the continent each year. The issue suffered a major setback in 2023, when the Commission, in the final steps of the process, dropped the ‘world's most comprehensive farm animal welfare reforms to date’, following massive farmers’ protests in Brussels. The reform would have included ‘banning cages and crates for Europe's roughly 300 million caged animals, ending the routine mutilation of perhaps 500 million animals per year, stopping the [...]

    Outline:
    (00:43) Why working at the EU can be extremely impactful
    (00:49) Importance
    (05:30) Tractability
    (07:22) Neglectedness
    (09:00) Paths into the EU

    First published: February 1st, 2026
    Source: https://forum.effectivealtruism.org/posts/t23ko3x2MoHekCKWC/more-eas-should-consider-working-for-the-eu

    Narrated by TYPE III AUDIO.

    12 min
  2. 3D AGO

    [Linkpost] “Are the Costs of AI Agents Also Rising Exponentially?” by Toby_Ord

    This is a link post. There is an extremely important question about the near future of AI that almost no one is asking. We’ve all seen the graphs from METR showing that the length of tasks AI agents can perform has been growing exponentially over the last 7 years. While GPT-2 could only do software engineering tasks that would take someone a few seconds, the latest models can (50% of the time) do tasks that would take a human a few hours. As this trend shows no signs of stopping, people have naturally taken to extrapolating it out, to forecast when we might expect AI to be able to do tasks that take an engineer a full work-day, or week, or year. But we are missing a key piece of information — the cost of performing this work. Over those 7 years, AI systems have grown exponentially. The size of the models (parameter count) has grown by 4,000x and the number of times they are run in each task (tokens generated) has grown by about 100,000x. AI researchers have also found massive efficiencies, but it is eminently plausible that the cost for the peak performance measured by METR has been [...] (A back-of-envelope sketch of this arithmetic follows this entry.)

    Outline:
    (13:02) Conclusions
    (14:05) Appendix
    (14:08) METR has a similar graph on their page for GPT-5.1 codex. It includes more models and compares them by token counts rather than dollar costs.

    First published: February 2nd, 2026
    Source: https://forum.effectivealtruism.org/posts/AbHPpGTtAMyenWGX8/are-the-costs-of-ai-agents-also-rising-exponentially
    Linkpost URL: https://www.tobyord.com/writing/hourly-costs-for-ai-agents

    Narrated by TYPE III AUDIO.

    15 min
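
    The excerpt quotes two growth factors over those 7 years. A minimal back-of-envelope sketch in Python, assuming (as an illustration, not a claim from the post) that compute per task scales with parameter count times tokens generated per task:

      # Illustrative sketch: naive compute scaling implied by the figures
      # quoted in the excerpt. Assumption: compute per task is roughly
      # proportional to (parameter count) x (tokens generated per task).

      params_growth = 4_000    # quoted growth in model size over 7 years
      tokens_growth = 100_000  # quoted growth in tokens generated per task

      raw_compute_growth = params_growth * tokens_growth
      print(f"Raw compute per task grew ~{raw_compute_growth:.0e}x")  # ~4e+08x

      # Annualised: a 4e8x rise over 7 years is roughly 17x per year.
      annual_factor = raw_compute_growth ** (1 / 7)
      print(f"~{annual_factor:.0f}x per year before efficiency improvements")

    Efficiency gains (also noted in the excerpt) would offset some of this, which is exactly why the post treats the net cost trend as an open question.
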
  3. 3D AGO

    [Linkpost] “Evidence that Recent AI Gains are Mostly from Inference-Scaling” by Toby_Ord

    This is a link post. In the last year or two, the most important trend in modern AI came to an end. The scaling-up of computational resources used to train ever-larger AI models through next-token prediction (pre-training) stalled out. Since late 2024, we’ve seen a new trend of using reinforcement learning (RL) in the second stage of training (post-training). Through RL, the AI models learn to do superior chain-of-thought reasoning about the problem they are being asked to solve. This new era involves scaling up two kinds of compute:
    - the amount of compute used in RL post-training
    - the amount of compute used every time the model answers a question
    Industry insiders are excited about the first new kind of scaling, because the amount of compute needed for RL post-training started off being small compared to the tremendous amounts already used in next-token prediction pre-training. Thus, one could scale the RL post-training up by a factor of 10 or 100 before even doubling the total compute used to train the model. (A minimal worked example of this arithmetic follows this entry.) But the second new kind of scaling is a problem. Major AI companies were already starting to spend more compute serving their models to customers than in the training [...]

    First published: February 2nd, 2026
    Source: https://forum.effectivealtruism.org/posts/5zfubGrJnBuR5toiK/evidence-that-recent-ai-gains-are-mostly-from-inference
    Linkpost URL: https://www.tobyord.com/writing/mostly-inference-scaling

    Narrated by TYPE III AUDIO.

    10 min
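
    A minimal worked example of the claim that RL post-training can scale 10x or 100x before total training compute doubles. The 1% starting share below is our illustrative assumption, not a figure from the post:

      # Illustrative arithmetic: if RL post-training initially uses a small
      # fraction of the compute spent on pre-training, it can be scaled up
      # dramatically before the *total* training compute moves much.

      pretrain = 1.0   # normalise pre-training compute to 1
      rl_share = 0.01  # assumption: RL starts at 1% of pre-training compute

      for rl_scale in (1, 10, 100):
          total = pretrain + rl_share * rl_scale
          print(f"RL scaled {rl_scale:>3}x -> total training compute {total:.2f}x")

      # Under this assumption, scaling RL 100x only doubles total training compute.
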
  4. 3D AGO

    [Linkpost] “The Extreme Inefficiency of RL for Frontier Models” by Toby_Ord

    This is a link post. The new scaling paradigm for AI reduces the amount of information a model can learn from per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling. The last year has seen a massive shift in how leading AI models are trained. 2018–2023 was the era of pre-training scaling. LLMs were primarily trained by next-token prediction (also known as pre-training). Much of OpenAI's progress from GPT-1 to GPT-4 came from scaling up the amount of pre-training by a factor of 1,000,000. New capabilities were unlocked not through scientific breakthroughs, but through doing more-or-less the same thing at ever-larger scales. Everyone was talking about the success of scaling, from AI labs to venture capitalists to policy makers. However, there's been markedly little progress in scaling up this kind of training since (GPT-4.5 added one more factor of 10, but was then quietly retired). Instead, there has been a shift to taking one of these pre-trained models and further training it with large amounts of Reinforcement Learning (RL). This has produced models like OpenAI's o1, o3, and GPT-5, with dramatic improvements in reasoning (such as solving [...] (A rough gloss on the quoted information-rate factor follows this entry.)

    First published: February 2nd, 2026
    Source: https://forum.effectivealtruism.org/posts/64iwgmMvGSTBHPdHg/the-extreme-inefficiency-of-rl-for-frontier-models
    Linkpost URL: https://www.tobyord.com/writing/inefficiency-of-reinforcement-learning

    Narrated by TYPE III AUDIO.

    15 min
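
    One way to make the quoted 1,000x to 1,000,000x factor concrete. This gloss and every number in it are our assumptions, not the post's actual derivation: pre-training receives a learning signal at every token, while outcome-based RL receives roughly one reward per long episode:

      # Rough gloss (assumed numbers, not the post's derivation): pre-training
      # gets a dense signal, a few bits per predicted token, while outcome-based
      # RL gets a sparse one, on the order of one bit (success/failure) per
      # whole episode of reasoning.

      bits_per_token_pretrain = 4.0  # assumption: a few bits per token
      tokens_per_episode = 10_000    # assumption: one long reasoning episode
      bits_per_episode_rl = 1.0      # assumption: one binary reward per episode

      pretrain_bits = bits_per_token_pretrain * tokens_per_episode
      ratio = pretrain_bits / bits_per_episode_rl
      print(f"Pre-training signal ~{ratio:,.0f}x denser over the same tokens")

      # ~40,000x here; varying these assumptions spans a 1,000x-1,000,000x range.
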
  5. 3D AGO

    [Linkpost] “Inference Scaling Reshapes AI Governance” by Toby_Ord

    This is a link post. The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whether this new inference compute will primarily be used during external deployment or as part of a more complex training programme within the lab. Rapid scaling of inference-at-deployment would:
    - lower the importance of open-weight models (and of securing the weights of closed models),
    - reduce the impact of the first human-level models,
    - change the business model for frontier AI,
    - reduce the need for power-intensive data centres, and
    - derail the current paradigm of AI governance via training-compute thresholds.
    Rapid scaling of inference-during-training would have more ambiguous effects that range from a revitalisation of pre-training scaling to a form of recursive self-improvement via iterated distillation and amplification. (A small sketch of the served-copies arithmetic follows this entry.)

    The end of an era — for both training and governance

    The intense year-on-year scaling up of AI training runs has been one of the most dramatic and stable markers of the Large Language Model era. Indeed, it had been widely taken to be a permanent fixture of the AI landscape and the basis of many approaches to [...]

    Outline:
    (01:06) The end of an era -- for both training and governance
    (05:24) Scaling inference-at-deployment
    (06:42) Reducing the number of simultaneously served copies of each new model
    (08:45) Reducing the value of securing model weights
    (09:30) Reducing the benefits and risks of open-weight models
    (10:05) Unequal performance for different tasks and for different users
    (12:08) Changing the business model and industry structure
    (12:50) Reducing the need for monolithic data centres
    (17:16) Scaling inference-during-training
    (28:07) Conclusions
    (30:17) Appendix. Comparing the costs of scaling pre-training vs inference-at-deployment

    First published: February 2nd, 2026
    Source: https://forum.effectivealtruism.org/posts/RnsgMzsnXcceFfKip/inference-scaling-reshapes-ai-governance
    Linkpost URL: https://www.tobyord.com/writing/inference-scaling-reshapes-ai-governance

    Narrated by TYPE III AUDIO.

    35 min
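
    A small sketch of one governance-relevant effect named in the outline, reducing the number of simultaneously served copies: with a fixed inference fleet, the number of tasks served in parallel falls in proportion to the compute each task consumes. All numbers here are illustrative assumptions, not figures from the post:

      # Illustrative sketch (assumed numbers): a fixed inference fleet serves
      # fewer parallel tasks as inference compute per task grows.

      fleet_capacity = 1_000_000  # assumption: task-units of compute per hour
      base_cost_per_task = 1      # assumption: today's inference cost per task

      for inference_scale in (1, 10, 100, 1_000):
          parallel = fleet_capacity // (base_cost_per_task * inference_scale)
          print(f"{inference_scale:>5}x inference per task -> "
                f"{parallel:>9,} parallel tasks")

      # Scaling inference-at-deployment 1,000x cuts the number of
      # simultaneously served copies by the same 1,000x factor.
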
  6. 3D AGO

    [Linkpost] “Is there a Half-Life for the Success Rates of AI Agents?” by Toby_Ord

    This is a link post. Building on the recent empirical work of Kwa et al. (2025), I show that, within their suite of research-engineering tasks, the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model — a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task, and that each agent can be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks — that they involve increasingly large sets of subtasks, where failing any one fails the whole task. Whether this model applies more generally to other suites of tasks is unknown and an important subject for further work. (A minimal sketch of the model follows this entry.)

    METR's results on the length of tasks agents can reliably complete

    A recent paper by Kwa et al. (2025) from the research organisation METR has found an exponential trend in the duration of the tasks that frontier AI agents can [...]

    Outline:
    (05:33) Explaining these results via a constant hazard rate
    (14:54) Upshots of the constant hazard rate model
    (18:47) Further work
    (19:25) References

    First published: February 2nd, 2026
    Source: https://forum.effectivealtruism.org/posts/qz3xyqCeriFHeTAJs/is-there-a-half-life-for-the-success-rates-of-ai-agents-3
    Linkpost URL: https://www.tobyord.com/writing/half-life

    Narrated by TYPE III AUDIO.

    20 min
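
    A minimal sketch of the constant-hazard model the post describes: a constant failure rate per human-minute of task length implies a success probability of 2^(-t/h), where h is the agent's half-life. The half-life value below is an illustrative assumption, not a fitted figure from the paper:

      # Constant-hazard model: an agent that fails at a constant rate per
      # human-minute of task length has an exponentially declining success
      # rate, characterised by a single half-life.

      def success_rate(task_minutes: float, half_life_minutes: float) -> float:
          """P(success) = 2^(-t / h) under a constant hazard rate."""
          return 2.0 ** (-task_minutes / half_life_minutes)

      h = 60.0  # assumption: a 60-minute half-life, purely for illustration
      for t in (15, 30, 60, 120, 240):
          print(f"{t:>3}-minute task: {success_rate(t, h):.0%} success")

      # By construction the 60-minute task succeeds 50% of the time; each
      # doubling of task length halves the rate again (120 min -> 25%, 240 -> 6%).
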
  7. 3D AGO

    [Linkpost] “Inference Scaling and the Log-x Chart” by Toby_Ord

    This is a link post. Improving model performance by scaling up inference compute is the next big thing in frontier AI. But the charts being used to trumpet this new paradigm can be misleading. While they initially appear to show steady scaling and impressive performance for models like o1 and o3, they really show poor scaling (characteristic of brute force) and little evidence of improvement between o1 and o3. I explore how to interpret these new charts and what evidence for strong scaling and progress would look like. (A synthetic example of reading these log-x charts follows this entry.)

    From scaling training to scaling inference

    The dominant trend in frontier AI over the last few years has been the rapid scale-up of training — using more and more compute to produce smarter and smarter models. Since GPT-4, this kind of scaling has run into challenges, so we haven’t yet seen models much larger than GPT-4. But we have seen a recent shift towards scaling up the compute used during deployment (aka ‘test-time compute’ or ‘inference compute’), with more inference compute producing smarter models. You could think of this as a change in strategy from improving the quality of your employees’ work via giving them more years of training in which to acquire [...]

    First published: February 2nd, 2026
    Source: https://forum.effectivealtruism.org/posts/zNymXezwySidkeRun/inference-scaling-and-the-log-x-chart
    Linkpost URL: https://www.tobyord.com/writing/inference-scaling-and-the-log-x-chart

    Narrated by TYPE III AUDIO.

    17 min
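
    A synthetic illustration of the chart-reading point (the formula and numbers below are invented for illustration, not taken from the post): logarithmic returns to compute plot as a straight line on a log-x axis, which can look like steady scaling even though each step costs ten times the last:

      import math

      # Synthetic data: accuracy that grows only with log(compute), i.e.
      # brute-force scaling, plots as a straight line on a log-x chart.

      def accuracy(compute: float) -> float:
          # assumption: 10 points of accuracy per 10x compute
          return 20.0 + 10.0 * math.log10(compute)

      for compute in (1, 10, 100, 1_000, 10_000):
          print(f"compute {compute:>6}x -> accuracy {accuracy(compute):.0f}%")

      # Equal accuracy steps per 10x compute look like steady progress on a
      # log-x axis, despite each step being ten times more expensive.
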

About

Audio narrations from the Effective Altruism Forum, including curated posts and posts with 125+ karma. If you'd like more episodes, subscribe to the "EA Forum (All audio)" podcast instead.