AI可可AI生活

[For Everyone] From Scratch Work and Map-Reading to Hearing the “Minority Report”

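Have you ever wondered whether AI solves hard problems by “cramming practice questions” and lucking into the answer, or by actually understanding the process? In this episode, we look at how recent papers teach AI the thinking habit of “doing scratch work,” and how, when there is no standard answer, it can learn to listen to the valuable “minority voice.” Join us as we explore how AI can evolve from a “talking machine” into a genuine “thinker.”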

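00:00:29 How does AI learn to think? A quiet revolution in reward mechanisms

00:05:15 When experts face off, how do you avoid fixating on a dead end?

00:09:45 AI's collective intelligence: when the minority report matters more than the majority vote

00:15:11 AI looks at the world from a new angle: after the chemist throws away the “instruction manual”

00:21:15 A good model is judged not only by its results, but also by its process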

Papers covered in this episode:

[LG] RLP: Reinforcement as a Pretraining Objective  

[NVIDIA & CMU]  

https://arxiv.org/abs/2510.01265 

---

[LG] RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems  

[CMU & Stanford University]  

https://arxiv.org/abs/2510.02263 

---

[CL] RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization  

[Iowa State University & Meta & UW–Madison]  

https://arxiv.org/abs/2510.02172 

---

[LG] Transformers Discover Molecular Structure Without Graph Priors  

[UC Berkeley]  

https://arxiv.org/abs/2510.02259 

---

[LG] Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models  

[CMU]  

https://arxiv.org/abs/2510.01544