The 15 papers in this episode:
[00:22] 🧠 From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models
[00:49] ✅ DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
[01:17] 🔮 FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
[01:44] 🏗 MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
[02:14] 🪄 Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
[02:40] 🤖 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
[03:06] ⚙ Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
[03:37] 🛠 MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
[04:12] ⚡ NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
[04:45] 🤖 RynnEC: Bringing MLLMs into Embodied World
[05:12] ⚖ On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
[05:41] 🧐 ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
[06:08] ⚡ Leuvenshtein: Efficient FHE-based Edit Distance Computation with Single Bootstrap per Cell
[06:40] 📏 Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
[07:06] 🤔 mSCoRe: a $M$ultilingual and Scalable Benchmark for $S$kill-based $Co$mmonsense $Re$asoning
【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
Information
- Podcast
- Frequency: Daily
- Published: August 22, 2025 at 00:00 UTC
- Length: 8 min
- Rating: Clean (no explicit language)