In this episode, we speak with Kyle Corbitt, co-founder and CEO of OpenPipe, recently acquired by CoreWeave, to explore the evolving role of reinforcement learning in building smarter, more reliable AI agents. Kyle shares OpenPipe's journey from supervised fine-tuning to developing ART (Agent Reinforcement Trainer), their open-source RL toolkit designed to train AI agents that can think, adapt, and perform with greater autonomy. The discussion spans technical insights, practical applications, startup lessons from YC's Startup School, and the future of agent-based AI systems.
Key Topics Covered:
Why reinforcement learning is gaining attention in modern agent development
The transition from supervised fine-tuning (SFT) to reinforcement learning (RL)
Practical differences between RL and SFT, including weight movement and model reliability
OpenPipe’s approach with ART: supporting multi-turn agent training and tool use
How ART differs from OpenAI’s RFT implementation
The importance of consistent agent behavior in production and how RL helps
Avoiding reward hacking and the role of Ruler, OpenPipe’s LLM-based judging system
Cost-efficiency strategies in RL training using serverless infrastructure
OpenPipe’s long-term vision for self-improving agents
Advice for AI startup founders on building in a rapidly evolving ecosystem
Memorable Quotes:
On why reinforcement learning matters now:
"What RL does… is it actually lets you solve the reliability problem. You can make your smaller model significantly more reliable—even more performant in your domain—through RL."
On OpenAI's RFT vs. OpenPipe’s ART:
"What you're missing [with OpenAI's RFT API] is the ability to define custom tools and multi-turn agent behavior. With ART, you own the code, define the tools, and have full control of the reward signal."
Guest Information, Mentioned Tools, and Projects:
Kyle Corbitt: https://www.linkedin.com/in/kcorbitt/
OpenPipe: https://openpipe.ai
OpenPipe GitHub: https://github.com/OpenPipe/OpenPipe
ART (Agent Reinforcement Trainer): https://github.com/OpenPipe/ART
Ruler: LLM-based evaluation tool for training agents (part of the ART package): https://openpipe.ai/blog/ruler
DeepSeek R1: https://api-docs.deepseek.com/
OpenAI RFT: https://platform.openai.com/docs/guides/reinforcement-fine-tuning
SkyPilot: run and scale AI workloads: https://github.com/skypilot-org/skypilot
Notable Blog Posts:
Ruler (LLM-as-a-Judge): https://www.openpipe.ai/blog/ruler
Reward Hacking Post: https://www.openpipe.ai/blog/reward-hacking
Serverless RL Backend: https://www.openpipe.ai/blog/serverless-rl
ART Trainer: A New RL Trainer for Agents: https://openpipe.ai/blog/art-trainer-a-new-rl-trainer-for-agents
Sponsored by:
🔥 ODSC AI West 2025 – The Leading AI Training Conference
Join us in San Francisco from October 28th–30th for expert-led sessions on generative AI, LLMOps, and AI-driven automation.
Use the code podcast for 10% off any ticket.
Learn more: https://odsc.ai
Information
- Podcast
- Frequency: Weekly
- Published: October 24, 2025 at 04:00 UTC
- Season: 1
- Episode: 88
