LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

  1. 1 hour ago

    “Finding Features in Neural Networks with the Empirical NTK” by jylin04

    Audio note: this article contains 63 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

    Summary: Kernel regression with the empirical neural tangent kernel (eNTK) gives a closed-form approximation to the function learned by a neural network in parts of the model space. We provide evidence that the eNTK can be used to find features in toy models for interpretability. We show that in Toy Models of Superposition and in an MLP trained on modular arithmetic, the eNTK eigenspectrum exhibits sharp cliffs whose top eigenspaces align with the ground-truth features. Moreover, in the modular arithmetic experiment, the evolution of the eNTK spectrum can be used to track the grokking phase transition. These results suggest that eNTK analysis may provide a new practical handle for feature discovery and for detecting phase changes in small models. [...]

    ---
    Outline:
    (00:23) Summary
    (01:25) Background
    (04:57) Results
    (05:10) Toy Models of Superposition
    (06:34) Modular arithmetic
    (10:00) Next steps

    The original text contained 9 footnotes which were omitted from this narration.
    ---
    First published: October 16th, 2025
    Source: https://www.lesswrong.com/posts/cpFqDDjhvhbaoyHnd/finding-features-in-neural-networks-with-the-empirical-ntk-1
    ---
    Narrated by TYPE III AUDIO.

    11 min
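The summary's core construction can be sketched in a few lines: the empirical NTK is the Gram matrix of per-example parameter gradients, and kernel ridge regression with that matrix approximates the function the network learns. The following is a minimal illustrative sketch (not the authors' code; the tiny one-hidden-layer network, targets, and sizes are assumptions for demonstration):

```python
import numpy as np

# Hypothetical toy setup: a one-hidden-layer ReLU network f(x) = w2 . relu(W1 x).
rng = np.random.default_rng(0)
d, h, n = 3, 16, 8                       # input dim, hidden width, num examples
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)
X = rng.normal(size=(n, d))

def grad_f(x):
    """Gradient of the scalar output f(x) w.r.t. all parameters (W1, w2)."""
    pre = W1 @ x
    act = np.maximum(pre, 0.0)
    mask = (pre > 0).astype(float)
    dW1 = np.outer(w2 * mask, x)         # df/dW1
    dw2 = act                            # df/dw2
    return np.concatenate([dW1.ravel(), dw2])

J = np.stack([grad_f(x) for x in X])     # (n, num_params) per-example gradients
K = J @ J.T                              # empirical NTK Gram matrix, shape (n, n)

# Kernel ridge regression with the eNTK: closed-form fit to toy targets.
y = np.sin(X @ np.ones(d))
alpha = np.linalg.solve(K + 1e-6 * np.eye(n), y)
pred = K @ alpha                         # eNTK-regression predictions

# The post inspects this spectrum for sharp "cliffs" whose top eigenspaces
# align with ground-truth features.
eigvals = np.linalg.eigvalsh(K)[::-1]    # eigenvalues, descending
```

On real models one would compute `J` with automatic differentiation rather than by hand; the feature-finding step in the post then examines the top eigenspaces of `K`, which this sketch omits.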
  2. 1 day ago

    “Rogue internal deployments via external APIs” by Fabien Roger, Buck

    Once AI companies build powerful AIs, they may:

    - Give internal AIs access to sensitive internal privileges (e.g. access to the internal infra that touches model weights, help with training the next generation of AIs, …);
    - Have mitigations specific to those internal deployments, enforced via monitoring of the internal LLM API;
    - Have an external API deployment without those mitigations.

    In this situation, an internal AI may build an agent scaffold that makes calls to the external API and uses the internal sensitive privileges. Such an agent scaffold would be in a better position to cause a catastrophe because it would not be subject to the same monitoring as the internal API. I call this a rogue internal deployment via external APIs. (It is "internal" because the model weights and the agent scaffold never leave the cluster.) I think preventing these is comparably important from a misalignment perspective as preventing [...]

    ---
    Outline:
    (01:43) Rogue internal deployments via external APIs
    (03:29) A variation: rogue internal deployments via cross-company APIs
    (04:28) A possible mitigation: preventing the creation of scaffolds that use external LLM APIs via monitoring
    (06:29) Why I am more pessimistic about other solutions
    (06:34) Monitoring the external API
    (08:47) Preventing access to external APIs
    (10:22) Monitoring access to sensitive permissions
    (10:57) Final thoughts
    ---
    First published: October 15th, 2025
    Source: https://www.lesswrong.com/posts/fqRmcuspZuYBNiQuQ/rogue-internal-deployments-via-external-apis
    ---
    Narrated by TYPE III AUDIO.

    12 min
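The mitigation the outline names, monitoring for the creation of scaffolds that call external LLM APIs, could take the shape of an egress or code scanner that flags internal agent actions referencing external endpoints. A deliberately naive sketch (not from the post; the hostname list and function are hypothetical illustrations):

```python
import re

# Hypothetical list of external LLM API hosts an internal monitor might watch for.
EXTERNAL_LLM_HOSTS = [
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
]

def flags_external_llm_use(text: str) -> list[str]:
    """Return the external LLM API hosts mentioned in a code snippet or log line."""
    return [h for h in EXTERNAL_LLM_HOSTS if re.search(re.escape(h), text)]

snippet = 'resp = requests.post("https://api.openai.com/v1/chat/completions", ...)'
print(flags_external_llm_use(snippet))   # -> ['api.openai.com']
```

A real deployment would need far more than string matching (network-level egress controls, proxies, obfuscation-resistant analysis); this only illustrates where such a monitor would sit relative to the internal cluster.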
