Arxiv paper - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
In this episode, we discuss DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning by DeepSeek-AI. The paper introduces DeepSeek-R1-Zero, a reasoning model trained solely with large-scale reinforcement learning, which exhibits strong reasoning abilities but struggles with readability and language mixing. To overcome these limitations, the authors developed DeepSeek-R1 by adding multi-stage training and cold-start data, achieving performance on par with OpenAI’s models. Additionally, they open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six distilled dense models to support the research community.
Thông Tin
- Chương trình
- Tần suấtHằng ngày
- Đã xuất bản17:24 UTC 27 tháng 1, 2025
- Thời lượng5 phút
- Tập1,5 N
- Xếp hạngSạch