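Have you ever wondered whether AI solves hard problems by "grinding practice questions" and lucking into the answer, or by actually understanding the process? In this episode, we look at how recent papers teach AI the thinking habit of "working things out on scratch paper," and how, when there is no ground-truth answer, it can learn to listen to the valuable "minority voice." Join us as we explore how AI evolves from a "machine that talks" into a genuine "thinker."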
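00:00:29 How does AI learn to think? A quiet revolution in reward mechanisms
00:05:15 When experts spar, how do you avoid getting stuck in a dead end?
00:09:45 AI's collective intelligence: when the minority report matters more than the majority vote
00:15:11 AI sees the world from a new angle: after the chemist throws away the "instruction manual"
00:21:15 A good model is judged not only by its results, but by its process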
Papers covered in this episode:
[LG] RLP: Reinforcement as a Pretraining Objective
[NVIDIA & CMU]
https://arxiv.org/abs/2510.01265
---
[LG] RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
[CMU & Stanford University]
https://arxiv.org/abs/2510.02263
---
[CL] RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization
[Iowa State University & Meta & UW–Madison]
https://arxiv.org/abs/2510.02172
---
[LG] Transformers Discover Molecular Structure Without Graph Priors
[UC Berkeley]
https://arxiv.org/abs/2510.02259
---
[LG] Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models
[CMU]
https://arxiv.org/abs/2510.01544
Information
- Podcast
- Frequency: Daily
- Published: October 4, 2025, 00:27 UTC
- Duration: 28 min
- Rating: Clean (no explicit language)