YAAP (Yet Another AI Podcast)

AI21

5.0 (2)
Technology

YAAP brings you practical conversations with the people actually building generative AI solutions. No hype, no sales pitches, just honest discussions about challenges, solutions, and lessons learned. Listen to developers and engineers share what works, what doesn't, and what they wish they'd known sooner. Simple, useful insights for anyone working with AI — hosted by AI21's Yuval Belfer.

Aug 26

The Judge Model Diaries: Judging the Judges

Your LLM gave a great answer. But who decides what “great” means? In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story. Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.

30 min
Aug 12

RLVR Lets Models Fail Their Way to the Top

Think you know fine-tuning? If your answer is RLHF, you don’t. In this episode, Itay, who leads the Alignment group at AI21, gives a no-fluff crash course on RLVR (Reinforcement Learning with Verifiable Rewards), the method powering today’s smartest coding and reasoning models. He explains why RLVR beats RLHF at its own game, how “hard to solve, easy to verify” tasks unlock exploration without chaos, and the emergent behaviors you only get when models are allowed to screw up. If you want to actually understand RLVR (and use it), start here.

Key topics:
- How RLVR outsmarts RLHF in real-world training
- The “verified rewards” trick that kills reward hacking
- Emergent skills you don’t get with hand-holding: self-verification, backtracking, multi-path reasoning
- Why coding models took a giant leap forward
- Practical steps to train (and actually benefit from) RLVR models

49 min
Jul 29

RAG Is Not Solved – Your Evaluation Just Sucks

Your RAG pipeline is passing benchmarks, but failing reality. In this episode, Yuval sits down with Niv from AI21 to expose why most RAG evaluation is fundamentally flawed. From overhyped retrieval scores to chunking strategies that collapse under real-world complexity, they break down why your system isn’t as good as you think, and how structured RAG solves problems that traditional pipelines simply can't. Bonus: what do Seinfeld trivia, World Cup stats, and your enterprise SharePoint have in common? (Hint: your RAG pipeline chokes on all of them.)

Key Topics:
- Why most RAG benchmarks reward the wrong thing (and hide real failures)
- The chunking trap: how bad segmentation sabotages good retrieval
- When LLMs ace the answer but your pipeline still fails
- Structured RAG: a pipeline that solves RAG problems over aggregative data (such as financial reports)
- Evaluation tips, tricks, and traps for AI builders

44 min
Jul 15

The Call Is Coming From Inside the Agent (And It Has Your Credentials)

You’ve shipped your first agent. It works. It’s useful. It might also be a security liability you don’t even know about. In this episode, Yuval talks to Zenity CTO Michael Bargury about how easy it is to hijack popular agent systems like Copilot and Cursor, what “zero-click” attacks look like in the agent era, and how to monitor, constrain, and secure your AI agent in production. From sneaky prompt injections to memory-based persistence and infected multi-agent workflows, this is the “oh no” moment every builder needs.

Key Topics:
- Why “ignore previous instructions” still works better than it should
- How one agent goes rogue… and infects the others
- Real-world attacks: social media triggers, CRM leaks, and logic bombs
- Observability 101 for AI: logs, reasoning traces, and root cause sanity
- The new rule: build like it will go rogue, because one day it will

50 min
Jul 1

Building Enterprise RAG: Lessons from 2+ Years of Production Deployments

Building production AI systems is hard, especially when you're pioneering entirely new categories. In this episode, Yuval speaks with Guy Becker, Group Product Manager at AI21, to trace the evolution from task-specific models to agent planning and orchestration systems. Guy shares hard-won lessons from building some of the first RAG-as-a-service offerings when there were literally zero handbooks to follow.

Key Topics:
- Task-specific models vs. general LLMs: Why focused, smaller models with pre- and post-processing beat general-purpose LLMs for business use cases.
- Building RAG before it was cool: Creating one of the first RAG-as-a-service platforms in early 2023 without any established patterns.
- The one-size-fits-all problem: Why chunking strategies, embedding models, and retrieval parameters need customization per use case.
- From SaaS to on-prem: Scaling deployment models for enterprise customers with sensitive data.
- When RAG breaks down: Multi-hop queries, metadata filtering, and why semantic search isn't always enough.
- Multi-agent orchestration: How AI21 Maestro uses automated planning to break complex queries into parallelizable subtasks.
- Production lessons: Evaluation strategies, quality guarantees, and building explainable AI systems for enterprise.

38 min
Trailer

Trailer

YAAP (Yet Another AI Podcast)

1 min
Jun 17

You Can’t Have an Agent Without a Plan: What 90% of ’Agents’ Are Missing

Everyone's talking about AI agents, but most of what we call "agents" are just workflows in disguise. Real autonomous agents require planning. And that changes everything. In this episode, Yuval speaks with AI21's Algo Tech Lead, Nitzan Cohen, about why the popular ReAct framework isn't enough and how planning architecture unlocks true agent capabilities.

Key Topics:
1. The difference between workflows/chains and real autonomous agents
2. Why ReAct agents fail at complex tasks, parallel execution, and user transparency
3. Free text vs. code-based planning approaches and their trade-offs
4. How planning enables multi-agent systems and model delegation
5. Training planners with reinforcement learning and replanning mechanisms
6. Evaluation challenges: GAIA benchmark, AgentBench, and building custom datasets
7. Practical advice: When to upgrade from ReAct and which frameworks to use

From competitive analysis that runs in parallel to breaking down complex coding tasks, discover how planning transforms AI agents from simple tool-calling loops into sophisticated problem-solving systems.

33 min
Jun 10

The Hard Truths About AI Agents: Why Benchmarks Lie and Frameworks Fail

Building AI agents that actually work is harder than the hype suggests, and most people are doing it wrong. In this special "YAAP: Unplugged" episode (a live panel from an AI Tinkerers meetup at the Hugging Face offices in Paris), Yuval sits down with Aymeric Roucher (Project Lead for Agents at Hugging Face) and Niv Granot (Algorithms Group Lead at AI21 Labs) for an unfiltered discussion about the uncomfortable realities of agent development.

Key Topics:
- Why current benchmarks are broken: From MMLU's limitations to RAG leaderboards that don't reflect real-world performance
- The tool use illusion: Why 95% accuracy on tool calling benchmarks doesn't mean your agent can actually plan
- LLM-as-a-judge problems: How evaluation bottlenecks are capping progress compared to verifiable domains like coding
- Framework: friend or foe? When to ditch LangChain, LlamaIndex, and why minimal implementations often work better
- The real agent stack: MCP, sandbox environments, and the four essential components you actually need
- Beyond the hype cycle: From embeddings that can't distinguish positive from negative numbers to what comes after agents

From FIFA World Cup benchmarks that expose retrieval failures to the circular dependency problem with LLM judges, this conversation cuts through the marketing noise to reveal what it really takes to build agents that solve real problems, not just impressive demos.

Warning: Contains unpopular opinions about popular frameworks and uncomfortable truths about the current state of AI agent development.

40 min
May 29

Tool Calling 2.0: How MCP Is Standardizing AI Connections

MCP (Model Context Protocol) is changing how developers connect AI applications to external tools – but what exactly is it, and why should you care? In this episode, Yuval speaks with Etan Grundstein, Technical Product Manager (and formerly Director of Engineering) at AI21, to break down the protocol that’s standardizing AI integrations, moving beyond basic weather APIs and calculators to real-world productivity workflows.

Key Topics:
1) What MCP actually is and how it differs from traditional tool calling
2) Real-world examples: Connecting AI to Jira, Notion, Git, and even Blender
3) The evolution from local MCP servers to cloud integrations
4) Authentication challenges and how they’re being addressed
5) Why developers are building MCP servers to build other MCP servers
6) Looking ahead: Agent-to-Agent protocols and what comes next

29 min

Trailer

Trailer

YAAP (Yet Another AI Podcast)

Jun 19 • 1 min

5 out of 5

2 ratings

Creators: AI21
Years active: 2K
Episodes: 9
Rating: Clean
Copyright: © 2025 AI21
Show website: YAAP (Yet Another AI Podcast)