From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki

ODSC's Ai X Podcast
In this episode of ODSC's Ai X Podcast, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations. Arize AI is a startup that is one of the leaders in AI observability and LLM evaluation, and the company behind the popular open-source evaluation project Phoenix. Prior to Arize, Jason was co-founder and chief innovation officer at TubeMogul, where he scaled the business into a public company that was eventually acquired by Adobe.

SHOW TOPICS:
- Jason's background and key moments in his career
- Arize AI's founding journey and focus on observability and evaluation
- Primary challenges of evaluating GenAI and foundation models
- Using an LLM / AI as a judge (a minimal sketch appears at the end of these notes)
- Common mistakes to avoid when evaluating LLMs
- Evaluation-driven development
- AI agents, agentic AI, and the challenges they pose for evaluation
- Breaking down AI agents into manageable components
- Agent control flow and assessing whether agents make correct decisions at each step
- Evaluating individual actions performed by AI agents
- Retrieval Augmented Generation (RAG) evaluation
- Ensuring RAG-retrieved information is accurate and relevant
- Risks and benefits of using open-source models vs. proprietary models
- Large language model evaluation metrics
- The drawbacks of public benchmarks
- Practical considerations for creating an effective evaluation pipeline, and how it differs between experimentation and production
- The advantages of SLMs (small language models)
- Building an LLM task evaluation from scratch, and the steps involved

SHOW NOTES:
- Jason Lopatecki, CEO and co-founder of Arize AI: https://www.linkedin.com/in/jason-lopatecki-9509941 and https://twitter.com/jason_lopatecki
- Arize AI: https://twitter.com/arizeai
- Arize AI blog: https://arize.com/blog/
- Jason's talk at ODSC West, "Demystifying LLM Evaluation": https://odsc.com/speakers/demystifying-llm-evaluation/
- Foundation models: https://en.wikipedia.org/wiki/Foundation_model
- AI agents: https://en.wikipedia.org/wiki/Intelligent_agent
- Agentic AI: https://venturebeat.com/ai/agentic-ai-a-deep-dive-into-the-future-of-automation/
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models: https://arxiv.org/abs/2310.08491
- Open LLM Leaderboard: https://huggingface.co/open-llm-leaderboard
- OpenAI o1: https://openai.com/o1/
- Mistral LLMs: https://docs.mistral.ai/getting-started/models/models_overview/
- Llama 3.2: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
- Evaluation prompts: https://arize.com/blog-course/evaluating-prompt-playground/
- Phoenix, open-source AI observability and evaluation: https://github.com/Arize-ai/phoenix

This episode was sponsored by:
- Ai+ Training (https://aiplus.training/): home to 600+ hours of on-demand, self-paced AI training, live virtual training, and certifications in in-demand skills like LLMs and prompt engineering. Created in partnership with ODSC.
- ODSC (https://odsc.com/): the leading AI training conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics and tools, from data science and machine learning to generative AI and LLMOps.

Join us at our upcoming and highly anticipated conference, ODSC West, in South San Francisco, October 29-31.
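
For listeners curious what the LLM-as-a-judge pattern discussed in the episode looks like in practice, here is a minimal sketch. It is illustrative only and is not Arize's or Phoenix's implementation; it assumes the OpenAI Python SDK is installed, an OPENAI_API_KEY is set in the environment, and the judge model name and grading prompt are placeholder choices.

# Minimal LLM-as-a-judge sketch (illustrative; not the Arize/Phoenix API).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are evaluating a question-answering system.
Question: {question}
Reference answer: {reference}
Model answer: {answer}

Respond with a single word: "correct" if the model answer conveys the same
facts as the reference answer, otherwise "incorrect"."""

def judge(question: str, reference: str, answer: str) -> str:
    """Ask a stronger LLM to grade another model's answer against a reference."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model; swap in whatever you use
        temperature=0,   # deterministic grading
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
    )
    return response.choices[0].message.content.strip().lower()

# Example usage:
# print(judge("What does RAG stand for?",
#             "Retrieval Augmented Generation",
#             "It stands for Retrieval Augmented Generation."))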
