Weaviate Podcast

Weaviate

Join Connor Shorten as he interviews machine learning experts and explores Weaviate use cases from users and customers.

  1. 13H AGO

    Search Agents with Nandan Thakur - Weaviate Podcast #137!

    Dr. Nandan Thakur returns to the Weaviate Podcast fresh off defending his dissertation to discuss the evolution from neural retrieval to agentic search and his new work on Orbit, a synthetic training data pipeline for search agents. The conversation opens with reflections on his PhD journey, tracing the field's shift from ColBERT-style models and sparse retrievers through RAG and into today's agentic search paradigm, where LLMs iteratively search, reason, and refine.

    The discussion dives deep into how Orbit generates multi-hop, riddle-style training queries using DeepSeek's API on a personal laptop over four to six months, making high-quality search agent training data accessible without massive compute budgets. Thakur draws a sharp distinction between deep research (broad, multi-tool report generation) and search agents (focused on search and browse tools to answer specific questions), then connects Orbit's multi-hop queries to BrowseComp's filter-style riddles, where each clue narrows the answer space like a funnel. The conversation explores the design of deep research harnesses, chunking strategies, Anthropic's contextual retrieval for entity disambiguation, context compaction to manage bloated agent contexts, and memory services like Weaviate's Engram for compressing search results between reasoning rounds.

    From there, the episode tackles sequential versus parallel search trajectories, the pass@K approach to rollouts in GRPO training, and whether isolated trajectories should share progress through message passing. Thakur makes a compelling case for training search agents to produce keyword-focused queries optimized for BM25 versus semantic queries for dense retrieval: the idea that one query does not fit all search engines. The conversation closes on future directions: efficiency-focused Pareto frontiers for search agents, long-form report generation evaluation through TREC RAG, and the coming wave of multilingual and multimodal search benchmarks.

    1h 1m
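The pass@K rollout idea mentioned in the episode can be made concrete with the standard unbiased pass@k estimator: draw n rollouts per query, count the c correct ones, and estimate the probability that at least one of k sampled rollouts succeeds. A minimal sketch (the function name and the numbers are illustrative, not from the episode):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn (without replacement) from n rollouts, c of which
    are correct, is a correct one."""
    if n - c < k:
        # Fewer than k incorrect rollouts exist, so every k-sample
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 16 rollouts per query, 4 of them correct, evaluated at k=4
print(pass_at_k(16, 4, 4))
```

The complement trick (probability that all k samples are incorrect) avoids the numerical issues of summing over hypergeometric terms directly.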
  2. APR 27

    AgentIR with Zijian Chen and Xueguang Ma - Weaviate Podcast #136!

    Zijian Chen and Xueguang Ma from the University of Waterloo join the Weaviate Podcast to discuss AgentIR and why retrieval systems need to be redesigned from the ground up for AI agents. The conversation opens with a striking reframe: agents have become the primary consumers of search, inserting themselves as middleware between humans and information. Humans used to query search engines directly; now they delegate to ChatGPT, which searches on their behalf. This means retrieval algorithms are no longer optimized for their actual users.

    The discussion distinguishes reasoning-intensive retrieval from reasoning-aware retrieval. Reasoning-intensive tasks like BRIGHT involve single-hop queries where the connection between query and document is obscure but still one step. AgentIR tackles a fundamentally different problem: extremely multi-hop queries from benchmarks like BrowseComp-Plus, where each hop strictly depends on the previous one. The key insight behind AgentIR is that agents reveal their entire reasoning process in their reasoning traces, unlike humans, who never write out their thought process. Existing retrievers discard this rich signal entirely. AgentIR jointly embeds the query and reasoning trace, training a retriever from scratch to exploit this agent-specific context.

    From there, the conversation covers BrowseComp-Plus, which extends OpenAI's BrowseComp with a fixed corpus to enable disentangled evaluation of agents and retrievers separately, something impossible when both the web and the search provider are black boxes. Building the corpus required over 400 hours of human annotation to ensure every hop in every reasoning chain had its supporting documents present. The discussion then moves into agent context management, contrasting compaction approaches with just-in-time memory retrieval from paged memory, referencing InfoFlow and the AgentFold paper.

    Xueguang shares a provocative take that neither single-vector nor multi-vector representations are optimal, arguing the field needs embeddings at the right granularity based on information density. The episode closes with Steven introducing AICI, Agent-Computer Interaction, as the successor to HCI, and Xueguang framing the open question of scaling search along two dimensions: deeper (more turns) versus wider (more parallel queries).

    1h 3m
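The episode's central idea, that agents expose their full reasoning trace and retrievers should exploit it, can be illustrated at the input-formatting level. AgentIR trains a retriever that jointly embeds query and trace; the exact input template is not specified in the episode, so the sketch below is a hypothetical illustration of how a trace-conditioned retriever input might be assembled before embedding:

```python
def build_retriever_input(query: str, reasoning_trace: list[str],
                          max_trace_chars: int = 2000) -> str:
    """Join the agent's reasoning steps with the current query into a
    single retriever input. When the trace must be truncated, keep the
    most recent reasoning, since it is most relevant to the next hop."""
    trace = "\n".join(reasoning_trace)
    if len(trace) > max_trace_chars:
        trace = trace[-max_trace_chars:]
    return f"Reasoning so far:\n{trace}\n\nQuery: {query}"

print(build_retriever_input(
    "Which city hosted the event?",
    ["The athlete was born in Hungary.",
     "She competed at the 1956 Summer Olympics."]))
```

A standard query-only retriever would see just the final question; feeding the trace as well is what lets each hop depend on what the agent has already established.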
  3. APR 6

    Data Agents with Shreya Shankar - Weaviate Podcast #135!

    Shreya Shankar from UC Berkeley joins the Weaviate Podcast to discuss data agents, the Data Agent Benchmark, and DocETL. The conversation opens by defining what a data agent actually is: not just text-to-SQL over a single table, but an AI system that can reason across dozens of heterogeneous databases, flat files, and knowledge repositories to answer complex organizational questions. Shreya explains why this multi-database reality makes existing benchmarks insufficient, motivating the Data Agent Benchmark, where the best-performing agent achieves only 34–37% pass@1 accuracy.

    From there, the discussion dives into where agents fail. They don't explore data properly, they generate broken regex patterns, they struggle with different SQL dialects, and they give up when datasets get large. Interestingly, agents tend to pull data into Pandas rather than use database operators directly, likely because LLMs are more fluent in Python than in the nuances of each SQL dialect. The conversation moves into semantic operators, natural-language variants of the relational algebra operations (filter, map, join, aggregation), where predicates like "Is this a sports article?" replace handwritten regex, with implementations ranging from per-row LLM calls to synthesized code.

    Shreya then presents DocETL, a declarative system for processing unstructured data that uses LLM agents to propose query rewrite strategies like chunking, splitting, and map-then-reduce decompositions, optimizing for both accuracy and cost on long documents. This leads into a broader discussion of declarative versus imperative agent design and the tradeoff between letting agents write arbitrary Python and constraining them within frameworks that handle optimization and caching. The conversation also explores tribal knowledge, structuring learned facts about data quality into retrievable tables so agents can reuse discoveries across queries, and connects to recent work on using LLMs to discover new database query rewrite rules.
    The episode closes with a reflection on how classical database principles like query optimization and cardinality estimation are finding new life in the age of LLM-powered data systems.

    Chapters
    0:05 What are Data Agents?
    2:10 Multi-Database Systems
    9:44 Semantic Operators
    13:18 Querying Databases with Python
    17:05 DocETL
    24:34 Advanced Text-to-SQL
    29:30 Claude Code and Databases
    34:34 Self-Driving Databases
    42:00 Agent Memory for Querying Databases
    53:48 Exciting Directions for AI

    57 min
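The per-row semantic filter operator discussed in the episode, a natural-language predicate replacing a handwritten regex, can be sketched in a few lines. The LLM call is stubbed out here so the example is self-contained; in practice it would be a real model call, and systems like DocETL layer cost/accuracy optimization on top of this naive per-row strategy:

```python
from typing import Callable

def semantic_filter(rows: list[dict], predicate: str,
                    llm: Callable[[str], str],
                    text_key: str = "text") -> list[dict]:
    """Semantic filter operator: keep rows for which the LLM answers
    'yes' to a natural-language predicate (one LLM call per row, the
    simplest implementation mentioned in the episode)."""
    kept = []
    for row in rows:
        prompt = f"{predicate}\n\nText: {row[text_key]}\nAnswer yes or no."
        if llm(prompt).strip().lower().startswith("yes"):
            kept.append(row)
    return kept

# Stub "LLM" so the sketch runs without an API key (purely illustrative)
def stub_llm(prompt: str) -> str:
    return "yes" if "goal" in prompt.lower() else "no"

articles = [{"text": "Late goal wins the derby"},
            {"text": "Rates held steady"}]
print(semantic_filter(articles, "Is this a sports article?", stub_llm))
```

The alternative implementation the episode mentions, synthesizing code once instead of calling the model per row, trades the stub above for a generated classifier function.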
  4. MAR 23

    Multi-Vector Search with Amélie Chatelain and Antoine Chaffin - Weaviate Podcast #134!

    Amélie Chatelain and Antoine Chaffin from LightOn are leading the next generation of search powered by multi-vector representations and Late Interaction. The podcast begins with what motivates them to work on multi-vector search, then moves into particulars such as how it combines lexical and semantic search, pairing bi-encoder speed with cross-encoder accuracy. The discussion continues with insights on training multi-vector models and how they differ from their single-vector predecessors.

    The conversation then turns to particular successes of Late Interaction, such as code, reasoning-intensive, and multimodal retrieval. Agents are great at searching with grep, but they are even better with ColGrep! Reasoning-intensive retrieval is a step change in how we think about search systems, beautifully enabled by both Late Interaction models and agentic search. Multimodal search, such as matching text with videos, is also seeing massive benefits from multi-vector representations.

    The podcast then dives into the cost of MaxSim and how efficient methods such as MUVERA and PLAID can help, and concludes with a presentation of their recent work on ColBERT-Zero: pre-training with Late Interaction instead of as a single-vector dense embedding model. LightOn are also the developers of PyLate, the world's leading open-source library for training these kinds of models.

    Chapters
    0:00 An Introduction to Multi-Vector Search
    6:00 Multi- vs. Single-Vector
    8:55 Comparison with Cross Encoders
    15:55 ColGrep for Coding Agents
    30:34 Reasoning-Intensive Retrieval
    42:02 Multimodal Multi-Vector
    48:34 The Cost of Multi-Vector
    53:26 MUVERA and PLAID
    1:06:18 ColBERT-Zero and PyLate

    1h 21m
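The MaxSim operation whose cost the episode examines is simple to state: for each query token embedding, take the maximum similarity over all document token embeddings, then sum over query tokens. A dependency-free sketch with toy 2-d vectors (real ColBERT-style models use ~128-d normalized token embeddings, which is exactly why MUVERA and PLAID exist):

```python
def maxsim_score(query_vecs: list[list[float]],
                 doc_vecs: list[list[float]]) -> float:
    """ColBERT-style late interaction: for each query token vector,
    take its maximum dot product over all document token vectors,
    then sum across query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

q = [[1.0, 0.0], [0.0, 1.0]]               # two query token embeddings
d = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # three document token embeddings
print(maxsim_score(q, d))  # 0.9 + 0.8, up to float rounding
```

The cost is the full query-tokens × document-tokens similarity matrix per candidate document, which is what makes single-vector scoring so much cheaper at retrieval time.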
  5. MAR 1

    AI-Powered Search with Doug Turnbull and Trey Grainger [#133]

    Doug Turnbull and Trey Grainger join the Weaviate Podcast to discuss all things AI-Powered Search! The conversation kicks off with designing search experiences: not all search queries are the same! Sometimes the user knows exactly what they want (a product ID, a specific file), other times they're exploring a broad category, and other times they need to compare and contrast options. AI is now making it possible to dynamically construct UIs around search results, moving toward what Trey describes as a "Minority Report"-style future where visualizations adapt on the fly to the query and the data.

    From there, the discussion dives into query understanding and domain modeling. Doug and Trey break down how LLMs can classify queries against existing taxonomies (like NAICS codes or Google's product taxonomy), while Trey explains a multi-tier RAG approach, using the index itself as grounding for query interpretation before executing the final retrieval. The conversation moves into agentic search, exploring whether iterative LLM-driven search loops reduce the need for ever-better embedding models, or whether simple tools like BM25 and grep are sufficient when paired with strong reasoning.

    Trey introduces wormhole vectors, a technique for traversing between sparse (lexical) and dense (semantic) vector spaces by treating query results as document sets with shared meaning, enabling exploration across vector spaces rather than treating them as orthogonal.
    The discussion also covers reflected intelligence, the idea of making search systems self-learning by mining user behavioral signals (clicks, purchases, skipped results) to continuously improve relevance through techniques like signals boosting, collaborative filtering, and learning to rank. The episode wraps with a conversation about how coding agents are changing the way Doug and Trey work, and Trey's philosophy of designing intentional agentic workflows with atomic agents rather than just handing an LLM a bag of tools.

    AI-Powered Search (discount code "weaviate"): https://aipoweredsearch.com/live-course?promoCode=weaviate

    53 min
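Of the reflected-intelligence techniques mentioned, signals boosting is the easiest to sketch: aggregate historical clicks per (query, document) pair and turn frequently clicked documents into boost weights applied when the same query recurs. Names, data, and the click threshold below are illustrative, not from the book or the episode:

```python
from collections import Counter

def signals_boosts(click_log: list[tuple[str, str]],
                   min_clicks: int = 2) -> dict[str, dict[str, int]]:
    """Signals boosting: count historical clicks per (query, doc) pair
    and keep documents clicked at least min_clicks times as boost
    weights for future instances of the same (normalized) query."""
    counts = Counter((query.lower(), doc) for query, doc in click_log)
    boosts: dict[str, dict[str, int]] = {}
    for (query, doc), n in counts.items():
        if n >= min_clicks:
            boosts.setdefault(query, {})[doc] = n
    return boosts

log = [("iphone case", "sku42"),
       ("iPhone case", "sku42"),
       ("iphone case", "sku7")]
print(signals_boosts(log))  # {'iphone case': {'sku42': 2}}
```

In a production system these weights would feed a boost clause in the search engine query (and decay over time); the aggregation step itself is this simple.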
