AI可可AI生活

[Plain-Language] From Data Purity and Feedback Yardsticks to Axioms of Mind

Have you ever wondered why the speed at which AI gets smarter depends on how much "filler" is in its data? Or how one of our vague compliments can be turned into an instruction that AI executes precisely? In this episode, we'll see how AI breaks out of the prison of past experience and discovers shortcuts on its own, and how it learns to tailor its behavior to the situation, evolving a task-dependent "EQ". We'll even reveal how the ultimate challenge of reading an AI's mind can be cleverly reduced to a simple computation. Get ready to explore the deep insights behind these latest papers with me!

00:00:35 The secret to AI getting smarter: not how powerful the model is, but how much "filler" is in the data

00:06:32 The dilemma of AI training: feedback too vague to pin down, or rules too narrow to generalize

00:12:11 Upgrading AI navigation: how can "dumb" data teach a "smart" living map?

00:18:03 The evolution of AI "EQ": staying consistent when consistency matters, and varying when variety matters

00:23:45 The pinnacle of mind-reading: turning it into a simple computation

Papers covered in this episode:

[LG] Scaling Laws are Redundancy Laws  

[Georgia Institute of Technology]  

https://arxiv.org/abs/2509.20721 

---

[CL] RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards  

[NVIDIA]  

https://arxiv.org/abs/2509.21319 

---

[LG] Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations  

[UC Berkeley & Princeton University]  

https://arxiv.org/abs/2509.20478 

---

[CL] LLM Output Homogenization is Task Dependent  

[FAIR at Meta]  

https://arxiv.org/abs/2509.21267 

---

[LG] Inverse Reinforcement Learning Using Just Classification and a Few Regressions  

[University of Washington & Netflix]  

https://arxiv.org/abs/2509.21172