From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki
In this episode of ODSC’s Ai X Podcast, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations.
Arize AI is a startup that is one of the leaders in AI observability and LLM evaluation, and the company behind the popular open-source evaluation project, Phoenix.
Prior to Arize, Jason was the co-founder and Chief Innovation Officer at TubeMogul, where he scaled the business into a public company that was eventually acquired by Adobe.
SHOW TOPICS:
Jason’s background and key moments in his career
Arize AI's founding journey and focus on observability and evaluation
Primary challenges of evaluating GenAI and foundation models
Using LLM/AI-as-a-judge
Common mistakes to avoid when evaluating LLMs
Evaluation-driven development
AI agents, agentic AI, and the challenges they pose for evaluation
Breaking down AI agents into manageable components
Agent control flow and assessing whether agents make correct decisions at each step
Evaluating individual actions performed by AI agents
Retrieval Augmented Generation (RAG) evaluation
Ensuring RAG retrieved information is accurate and relevant
Risks and benefits of using open-source models vs. proprietary models
Large Language Model evaluation metrics
The drawbacks of public benchmarks
Practical considerations for creating an effective evaluation pipeline, and how it differs between experimentation and production
The advantages of SLMs (Small Language Models)
The steps involved in building an LLM task evaluation from scratch
SHOW NOTES:
- Jason Lopatecki, CEO and Co-Founder of Arize AI: https://www.linkedin.com/in/jason-lopatecki-9509941
- Jason Lopatecki on Twitter: https://twitter.com/jason_lopatecki
- Arize AI on Twitter: https://twitter.com/arizeai
- Arize AI blog: https://arize.com/blog/
- Jason’s Talk at ODSC West - Demystifying LLM Evaluation - https://odsc.com/speakers/demystifying-llm-evaluation/
- Foundation Models https://en.wikipedia.org/wiki/Foundation_model
- AI Agents https://en.wikipedia.org/wiki/Intelligent_agent
- Agentic AI https://venturebeat.com/ai/agentic-ai-a-deep-dive-into-the-future-of-automation/
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models https://arxiv.org/abs/2310.08491
- Open LLM Leaderboard https://huggingface.co/open-llm-leaderboard
- OpenAI o1 https://openai.com/o1/
- Mistral LLMs https://docs.mistral.ai/getting-started/models/models_overview/
- Llama 3.2 https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
- Evaluation Prompts: https://arize.com/blog-course/evaluating-prompt-playground/
- Phoenix, Open Source AI Observability & Evaluation: https://github.com/Arize-ai/phoenix
This episode was sponsored by:
Ai+ Training https://aiplus.training/
Home to 600+ hours of on-demand, self-paced AI training, live virtual training, and certifications in in-demand skills like LLMs and prompt engineering.
Created in partnership with ODSC https://odsc.com/
The Leading AI Training Conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics and tools, from data science and machine learning to generative AI and LLMOps.
Join us at our upcoming and highly anticipated conference, ODSC West, in South San Francisco, October 29-31.
Information
- Frequency: Updated weekly
- Published: October 8, 2024 at 9:00 PM UTC
- Season: 1
- Episode: 41