22 episodes

Vector Podcast is here to bring you the depth and breadth of Search Engine Technology, Product, Marketing and Business. In the podcast we talk with engineers, entrepreneurs, thinkers and tinkerers who put their soul into search.

Depending on your interests, you should find a topic that matches -- whether it is the deep algorithmic aspects of search engines and the information retrieval field, or examples of products offering deep tech to their users.

"Vector" -- because it aims to cover the emerging field of vector similarity search, giving you the ability to search content beyond text: audio, video, images and more.

"Vector" also because it is all about the vector in your profession, product, marketing and business.

Vector Podcast Dmitry Kan

    • Science
    • 5.0 • 2 Ratings


    Louis Brandy - SQL meets Vector Search at Rockset

    00:00 Intro
    00:42 Louis's background
    05:39 From Facebook to Rockset
    07:41 Embeddings prior to deep learning / LLM era
    12:35 What's Rockset as a product
    15:27 Use cases
    18:04 RocksDB as part of Rockset
    20:33 AI capabilities: ANN index, hybrid search
    25:11 Types of hybrid search
    28:05 Can one learn the alpha?
    30:03 Louis's prediction of the future of vector search
    33:55 RAG and other AI capabilities
    41:46 Call out to the Vector Search community
    46:16 Vector Databases vs Databases
    49:16 Question of WHY

    • 52 min

    Saurabh Rai - Growing Resume Matcher

    Topics:
    00:00 Intro - how do you like our new design?
    00:52 Greetings
    01:55 Saurabh's background
    03:04 Resume Matcher: 4.5K stars, 800 community members, 1.5K forks
    04:11 How did you grow the project?
    05:42 Target audience and how to use Resume Matcher
    09:00 How did you attract so many contributors?
    12:47 Architecture aspects
    15:10 Cloud or not
    16:12 Challenges in maintaining open-source projects
    17:56 Developer marketing with Swirl AI Connect
    21:13 What you (listener) can help with
    22:52 What drives you?

    Show notes:
    - Resume Matcher: https://github.com/srbhr/Resume-Matcher
    - Website: https://resumematcher.fyi/

    - Ultimate CV by Martin John Yate: https://www.amazon.com/Ultimate-CV-Cr...

    - fastembed: https://github.com/qdrant/fastembed

    - Swirl: https://github.com/swirlai/swirl-search

    • 26 min

    Sid Probstein - Creator of SWIRL - Search in siloed data with LLMs

    Topics:
    00:00 Intro
    00:22 Quick demo of SWIRL on the summary transcript of this episode
    01:29 Sid’s background
    08:50 Enterprise vs Federated search
    17:48 How vector search covers for missing folksonomy in enterprise data
    26:07 Relevancy from vector search standpoint
    31:58 How ChatGPT improves programmers’ productivity
    32:57 Demo!
    45:23 Google PSE
    53:10 Ideal user of SWIRL
    57:22 Where SWIRL sits architecturally
    1:01:46 How to evolve SWIRL with domain expertise
    1:04:59 Reasons to go open source
    1:10:54 How SWIRL and Sid interact with ChatGPT
    1:23:22 The magical question of WHY
    1:27:58 Sid’s announcements to the community
    YouTube version: https://www.youtube.com/watch?v=vhQ5LM5pK_Y
    Design by Saurabh Rai: https://twitter.com/_srbhr_ Check out his Resume Matcher project: https://www.resumematcher.fyi/

    • 1 hr 32 min

    Atita Arora - Search Relevance Consultant - Revolutionizing E-commerce with Vector Search

    Topics:
    00:00 Intro
    02:20 Atita’s path into search engineering
    09:00 When it’s time to contribute to open source
    12:08 Taking management role vs software development
    14:36 Knowing what you like (and coming up with a Solr course)
    19:16 Read the source code (and cook)
    23:32 Open Bistro Innovations Lab and moving to Germany
    26:04 Affinity to Search world and working as a Search Relevance Consultant
    28:39 Bringing vector search to Chorus and Querqy
    34:09 What Atita learnt from Eric Pugh’s approach to improving Quepid
    36:53 Making vector search with Solr & Elasticsearch accessible through tooling and documentation
    41:09 Demystifying data embedding for clients (and for Java based search engines)
    43:10 Shifting away from generic to domain-specific in search+vector saga
    46:06 Hybrid search: where it will be useful to combine keyword with semantic search
    50:53 Choosing between new vector DBs and “old” keyword engines
    58:35 Women of Search
    1:14:03 Important (and friendly) People of Open Source
    1:22:38 Reinforcement learning applied to our careers
    1:26:57 The magical question of WHY
    1:29:26 Announcements
    See show notes on YouTube: https://www.youtube.com/watch?v=BVM6TUSfn3E

    • 1 hr 32 min

    Connor Shorten - Research Scientist, Weaviate - ChatGPT, LLMs, Form vs Meaning

    Topics:
    00:00 Intro
    01:54 Things Connor learnt in the past year that changed his perception of Vector Search
    02:42 Is search becoming conversational?
    05:46 Connor asks Dmitry: How will Large Language Models change Search?
    08:39 Vector Search Pyramid
    09:53 Large models, data, Form vs Meaning and octopus underneath the ocean
    13:25 Examples of getting help from ChatGPT and how it compares to web search today
    18:32 Classical search engines with URLs for verification vs ChatGPT-style answers
    20:15 Hybrid search: keywords + semantic retrieval
    23:12 Connor asks Dmitry about his experience with sparse retrieval
    28:08 SPLADE vectors
    34:10 OOD-DiskANN: handling the out-of-distribution queries, and nuances of sparse vs dense indexing and search
    39:54 Ways to debug a query case in dense retrieval (spoiler: it is a challenge!)
    44:47 Intricacies of teaching ML models to understand your data and re-vectorization
    49:23 Local IDF vs global IDF and how dense search can approach this issue
    54:00 Realtime index
    59:01 Natural language to SQL
    1:04:47 Turning text into a causal DAG
    1:10:41 Engineering and Research as two highly intelligent disciplines
    1:18:34 Podcast search
    1:25:24 Ref2Vec for recommender systems
    1:29:48 Announcements
    For Show Notes, please check out the YouTube episode below.
    This episode on YouTube: https://www.youtube.com/watch?v=2Q-7taLZ374
    Podcast design: Saurabh Rai: https://twitter.com/srvbhr

    • 1 hr 33 min

    Evgeniya Sukhodolskaya - Data Advocate, Toloka - Data at the core of all the cool ML

    Toloka’s support for Academia: grants and educator partnerships
    https://toloka.ai/collaboration-with-educators-form
    https://toloka.ai/research-grants-form
    Pages leading to these forms:
    https://toloka.ai/academy/education-partnerships
    https://toloka.ai/grants
    Topics:
    00:00 Intro
    01:25 Jenny’s path from graduating in ML to a Data Advocate role
    07:50 What goes into the labeling process with Toloka
    11:27 How to prepare data for labeling and design tasks
    16:01 Jenny’s take on why Relevancy needs more data in addition to clicks in Search
    18:23 Dmitry plays the Devil’s Advocate for a moment
    22:41 Implicit signals vs user behavior and offline A/B testing
    26:54 Dmitry goes back to advocating for good search practices
    27:42 Flower search as a concrete example of labeling for relevancy
    39:12 NDCG, ERR as ranking quality metrics
    44:27 Cross-annotator agreement, perfect list for NDCG and Aggregations
    47:17 On measuring and ensuring the quality of annotators with honeypots
    54:48 Deep-dive into aggregations
    59:55 Bias in data, SERP, labeling and A/B tests
    1:16:10 Is unbiased data attainable?
    1:23:20 Announcements
    This episode on YouTube: https://youtu.be/Xsw9vPFqGf4
    Podcast design: Saurabh Rai: https://twitter.com/srvbhr

    • 1 hr 26 min
