Joan Fontanals - Principal Engineer - Jina AI Vector Podcast

Topics:
00:00 Intro
00:42 Joan's background
01:46 What drew Joan to Jina as a company and product?
04:39 Main area of focus for Joan in the product
05:46 How does the open-source model work for Jina?
08:38 Deeper dive into Jina.AI as a product and technology stack
11:57 Does Jina fit the use cases of small and mid-size players with smaller amounts of data?
13:45 KNN/ANN algorithms available in Jina
16:05 BigANN competition and BuddyPQ, with a 12% increase in recall over FAISS
17:07 Does Jina support customers in model training? Finetuner
20:46 How does the Jina framework compare to vector databases?
26:46 Jina's investment in user-friendly APIs
31:04 Applications of Jina beyond search engines, like question answering systems
33:20 How to bring bits of neural search into traditional keyword retrieval? Connection to model interpretability
41:14 Does Jina support going multimodal, including images, audio, etc.?
46:03 The magical question of Why
55:20 Product announcement from Joan

Order your Jina swag https://docs.google.com/forms/d/e/1FA... Use this promo code: vectorPodcastxJinaAI

Show notes:
- Jina.AI: https://jina.ai/

- HNSW + PostgreSQL Indexer: [GitHub - jina-ai/executor-hnsw-postgres: A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL](https://github.com/jina-ai/executor-h...)

- pqlite: [GitHub - jina-ai/pqlite: A fast embedded library for Approximate Nearest Neighbor Search integrated with the Jina ecosystem](https://github.com/jina-ai/pqlite)

- BuddyPQ: [Billion-Scale Vector Search: Team Sisu and BuddyPQ | by Dmitry Kan | Big-ANN-Benchmarks | Nov, 2021 | Medium](https://medium.com/big-ann-benchmarks...)

- PaddlePaddle: [GitHub - PaddlePaddle/Paddle: PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)](https://github.com/PaddlePaddle/Paddle)

- Jina Finetuner: [Finetuner 0.3.1 documentation](https://finetuner.jina.ai/)

- [Not All Vector Databases Are Made Equal | by Dmitry Kan | Towards Data Science](https://towardsdatascience.com/milvus...)

- Fluent interface (method chaining): [Fluent interfaces in Python | Florian Einfalt – Developer](https://florianeinfalt.de/posts/fluen...)

- Sujit Pal’s blog: [Salmon Run](http://sujitpal.blogspot.com/)

- ByT5: Towards a token-free future with pre-trained byte-to-byte models https://arxiv.org/abs/2105.13626

Special thanks to Saurabh Rai for the Podcast Thumbnail: https://twitter.com/srbhr_ https://www.linkedin.com/in/srbh077/
