ODSC's Ai X Podcast

Training Agents with Reinforcement Learning: Kyle Corbitt

In this episode, we speak with Kyle Corbitt, co-founder and CEO of OpenPipe, recently acquired by CoreWeave, to explore the evolving role of reinforcement learning in building smarter, more reliable AI agents. Kyle shares OpenPipe's journey from supervised fine-tuning to developing ART (Agent Reinforcement Trainer), their open-source RL toolkit for training AI agents that can think, adapt, and perform with greater autonomy. The discussion spans technical insights, practical applications, startup lessons from YC's Startup School, and the future of agent-based AI systems.

Key Topics Covered:
- Why reinforcement learning is gaining attention in modern agent development
- The transition from supervised fine-tuning (SFT) to reinforcement learning (RL)
- Practical differences between RL and SFT, including weight movement and model reliability
- OpenPipe's approach with ART: supporting multi-turn agent training and tool use
- How ART differs from OpenAI's RFT implementation
- The importance of consistent agent behavior in production, and how RL helps
- Avoiding reward hacking and the role of Ruler, OpenPipe's LLM-based judging system
- Cost-efficiency strategies in RL training using serverless infrastructure
- OpenPipe's long-term vision for self-improving agents
- Advice for AI startup founders building in a rapidly evolving ecosystem

Memorable Outtakes:

On why reinforcement learning matters now: "What RL does… is it actually lets you solve the reliability problem. You can make your smaller model significantly more reliable—even more performant in your domain—through RL."

On OpenAI's RFT vs. OpenPipe's ART: "What you're missing [with OpenAI's RFT API] is the ability to define custom tools and multi-turn agent behavior. With ART, you own the code, define the tools, and have full control of the reward signal."
Guest Information, Mentioned Tools and Projects:
- Kyle Corbitt: https://www.linkedin.com/in/kcorbitt/
- OpenPipe: https://openpipe.ai
- OpenPipe GitHub: https://github.com/OpenPipe/OpenPipe
- ART (Agent Reinforcement Trainer): https://github.com/OpenPipe/ART
- Ruler, an LLM-based evaluation tool for training agents (part of the ART package): https://openpipe.ai/blog/ruler
- DeepSeek R1: https://api-docs.deepseek.com/
- OpenAI RFT: https://platform.openai.com/docs/guides/reinforcement-fine-tuning
- SkyPilot, a framework for scaling AI workloads: https://github.com/skypilot-org/skypilot

Notable Blog Posts:
- Ruler (LLM-as-a-Judge): https://www.openpipe.ai/blog/ruler
- Reward Hacking: https://www.openpipe.ai/blog/reward-hacking
- Serverless RL Backend: https://www.openpipe.ai/blog/serverless-rl
- ART Trainer: A New RL Trainer for Agents: https://openpipe.ai/blog/art-trainer-a-new-rl-trainer-for-agents

Sponsored by:
🔥 ODSC AI West 2025 – The Leading AI Training Conference. Join us in San Francisco from October 28th–30th for expert-led sessions on generative AI, LLMOps, and AI-driven automation. Use the code podcast for 10% off any ticket. Learn more: https://odsc.ai