Shopify's distillation pipeline cuts production AI costs by up to 30x, and in some cases the smaller model outperforms the frontier model on the narrow task. That's not a trade-off. That's a win on accuracy, latency, and cost simultaneously.

Farhan Thawar, VP & Head of Engineering at Shopify, runs AI across one of the largest commerce platforms on earth. In this episode, he breaks down the exact infrastructure decisions Shopify made to avoid being locked into any single model provider, and why 29% of enterprise AI projects die from token costs, not model failure.

Shopify built an internal LLM proxy that routes tokens across every major provider, enabling automatic failover when any one goes down. On top of that, their Universal Distillation Platform (UDP) lets any R&D team distill a frontier model (Opus 4, GPT-5+) down to a fine-tuned open source model (Qwen and others) for a specific subtask in roughly a day, with evals baked in. Results range from 2x to 30x cheaper, faster, and more accurate than calling the frontier API for everything. Shopify currently runs roughly half a dozen of these distilled models in production, with more being added.

Farhan also details River, Shopify's internal agentic substrate: a public-only Slack agent that queries their data warehouse, reads their PM system, and improves its own answers when engineers jump in to correct it.

Plus: how Shopify governs AI-generated code at scale, why they killed their token leaderboard, how UCP is positioning Shopify's catalog for agentic commerce, and what a two-to-three year horizon looks like when agents start holding spending budgets and buying autonomously.

🎙️ GUEST: Farhan Thawar | VP & Head of Engineering, Shopify
🎙️ HOST: Sam Witteveen | VentureBeat

__

If you enjoy these conversations, you need to be in Menlo Park this July. VB Transform 2026 is VentureBeat's flagship enterprise AI event, built entirely around one question: How do you orchestrate AI autonomy at scale?
July 14–15, Hotel Nia. Real projects, proprietary research, no fluff. 50% off for listeners with code BEYONDTHEPILOT: https://bit.ly/4fK4F6z

---

**CHAPTERS**
00:00 Intro: Infrastructure-first philosophy at Shopify
02:00 Episode overview: token economy, distillation, and the cost crisis killing AI projects
03:00 Tobi's "AI reflexivity" mandate and what it actually means for engineers
05:20 Ecosystem strategy: when to build vs. leave room for third-party developers
06:20 Developer tooling stack: GitHub Copilot (2021), Claude Code, Cursor, Codex, and Shopify's own River
08:00 LLM proxy architecture: bulk token purchasing, multi-provider failover, and usage reporting
09:00 Model agnosticism: why Shopify lets engineers choose their own harness
10:00 AI adoption beyond R&D: finance, HR, sales, and the Qwik internal deployment platform
11:20 Code governance: who owns AI-generated code going to production
13:00 Token maxing, leaderboards, and the shift from AI reflexivity to AI leverage
14:20 Circuit breakers: how Shopify catches runaway token spend without hard limits
15:20 LLM proxy deep dive: uptime, insights, and cross-team learning from usage data
16:40 River: Shopify's agentic substrate, public-channel-only design, and emergent HITL behavior
18:40 Model distillation explained: teacher/student models, narrow tasks, and the trade-offs
21:00 Universal Distillation Platform (UDP): how any team submits a distillation job in ~one day
22:20 Tangle: open source pipeline visualization for distillation workflows
22:40 Who uses UDP today, and Farhan's vision for auto-selecting the distillation target model
24:00 Evals in practice: golden datasets, Toloka for data generation, threshold-based deployment
26:20 Sim Gym: simulating A/B tests for small merchants without enough traffic
27:40 Pulse: async AI insights on store performance and conversion
28:20 GPU infrastructure trade-offs: when running your own inference makes sense at scale
29:20 Frontier vs. distilled model split in production, and why dev tokens stay frontier
30:40 Should Shopify train its own coding model? Why Farhan says not yet
31:40 UCP protocol, agentic commerce, and how Shopify's catalog surfaces in every LLM
35:00 Early signals: agentic commerce growth rate and the shift away from SEO
36:00 What developers and entrepreneurs should build for now given multi-channel uncertainty
37:20 River as a hive-mind agent, and what "truly agentic" actually means
38:40 Two-to-three year forecast: agents with spending budgets, autonomous purchasing, proactive outreach
40:00 Project Glasswing, model access loss (Fable), and the case for multi-provider architecture
42:00 Why every company should have a backup plan, and how Shopify built theirs years ago
43:20 Wrap-up

---

Subscribe to VentureBeat: https://www.youtube.com/@VentureBeat
Apple Podcasts: https://podcasts.apple.com/us/podcast/venturebeat/id1839285239
Spotify: https://open.spotify.com/show/4Zti73yb4hmiTNa7pEYls4
Website: https://venturebeat.com
LinkedIn: https://www.linkedin.com/company/venturebeat
Newsletter: https://venturebeat.com/newsletters

#EnterpriseAI #AIAgents #LLM #MLOps #AgenticAI

Learn more about your ad choices. Visit megaphone.fm/adchoices