85% of AI teams will hit a serious production failure this year. The only thing separating them from the 15% who don't? Evals.

After nearly two decades of building AI systems at Microsoft, Facebook, and Dropbox, Ameya Bhatawdekar is now Field CTO at Braintrust, the AI observability platform used by Airtable, Notion, Stripe, Dropbox, Vercel, Cloudflare, Lovable, and Replit.

We discuss a shift that most teams underestimate. The winners in AI are not just shipping faster. They are building systems that behave predictably, improve continuously, and earn user trust over time. As traditional monitoring breaks down in a probabilistic world, observability now requires learning how an AI system reasons, not just how it performs. This leads to a new paradigm where agents are no longer just executing tasks, but also analyzing and debugging other agents.

The episode also traces the evolution of machine learning itself. From feature engineering to deep learning to transformers, each leap increased capability and reduced control. Evaluation is now where control sits.

Ameya is clear on one point. Moving fast with weak evaluations feels like velocity, but it compounds into technical debt, unpredictable failures, and ultimately a loss of user trust. The teams that win are the ones that invest early in rigor, especially in understanding context, which is quickly becoming the hardest and most critical layer in AI systems.

If you are a founder or engineer moving beyond the demo phase and trying to build durable, high-quality AI systems, this episode will change how you think about shipping.

00:00 - Trailer
00:55 - What’s Braintrust?
05:01 - What agents are shipping today
07:54 - What evals look like in practice for Notion & Zapier
09:44 - Evals vs Classic monitoring
11:33 - Who is the Field CTO?
16:35 - What goes wrong when agents fail
18:26 - Agents analyzing other agents
24:17 - Evals are existential in vibecoding
25:52 - Ship fast with weak evals or slow with strong evals?
25:41 - What makes enterprises trust an LLM?
29:25 - Do AI startups know how good their product is?
30:23 - 3 ML systems: Microsoft, Dropbox, Meta
36:30 - How the 2017 transformer paper changed everything
38:20 - All algorithms are predicting the next word
43:40 - What LLMs will do in 1 year

-------------

India’s talent has built the world’s tech; now it’s time to lead it. This mission goes beyond startups. It’s about shifting the center of gravity in global tech to include the brilliance rising from India.

What is Neon Fund?
We invest in seed and early-stage founders from India and the diaspora building world-class Enterprise AI companies. We bring capital, conviction, and a community that’s done it before.

Subscribe for real founder stories, investor perspectives, economist breakdowns, and a behind-the-scenes look at how we’re doing it all at Neon.

-------------

Check us out on:
Website: https://neon.fund/
Instagram: https://www.instagram.com/theneonshoww/
LinkedIn: https://www.linkedin.com/company/beneon/
Twitter: https://x.com/TheNeonShoww

Connect with Siddhartha on:
LinkedIn: https://www.linkedin.com/in/siddharthaahluwalia/
Twitter: https://x.com/siddharthaa7

-------------

This video is for informational purposes only. The views expressed are those of the individuals quoted and do not constitute professional advice.