74 episodes

The GeekNarrator podcast is a show hosted by Kaivalya Apte, a software engineer who loves to talk about technology, technical interviews, self-improvement, best practices, and hustle.

Connect with Kaivalya Apte https://www.linkedin.com/in/kaivalya-apte-2217221a

Tech blogs: https://kaivalya-apte.medium.com/

Wanna talk? Book a slot here: https://calendly.com/speakwithkv/hey

Enjoy the show and please follow to get more updates. Also please don’t forget to rate and review the show.

Cheers

The GeekNarrator Kaivalya Apte

- Technology

- 4 JUN 2024
Scaling Derived Data for Planet-Scale Applications at LinkedIn

In this video I speak with Felix GV, a Principal Staff Engineer at LinkedIn who has made major contributions to LinkedIn's data infrastructure, including VeniceDB.

This episode will give you a good understanding of why a new database was needed for storing "derived data" with low latency and high performance, which is very important for machine learning workloads.

Chapters:
00:00 Introduction
01:42 The Evolution of LinkedIn's Databases
03:15 Challenges with Voldemort and the Birth of VeniceDB
08:42 Understanding Derived Data
13:33 Planet-Scale Applications and Multi-Region Support
17:40 Writing Data into VeniceDB
22:53 Merging Data in VeniceDB
40:31 Understanding the Architecture
40:47 Components of the Write Path
41:56 Leader and Follower Architecture
43:58 Partitioning and DaVinci Client
47:57 Read Patterns and Client Options
54:25 Fault Tolerance and Recommender Systems
01:01:19 Kafka Integration and Deployment
01:06:56 Roadmap and Future Improvements

Important links:
VeniceDB blog: https://www.linkedin.com/blog/engineering/open-source/open-sourcing-venice-linkedin-s-derived-data-platform
VeniceDB docs: https://venicedb.org/
QCon talk: https://youtu.be/pJeg4V3JgYo?si=vblGUxp5fNdKPHoC

Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator

If you like this episode, please hit the like button and share it with your network.
Also please subscribe if you haven't yet.

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-

Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17

Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d

Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!

#kafka #linkedin #venicedb #Rocksdb
- 1 hr 12 min
- 4 JUN 2024
SuperCharging PostgreSQL for Search and Analytics - ParadeDB (Philippe Noël)

In this video I speak with Philippe Noël about ParadeDB, an Elasticsearch alternative built on Postgres that aims to modernize the features of Elasticsearch's product suite, starting with real-time search and analytics.

I hope you enjoy it and learn about the product.

Chapters:
00:00 Introduction
01:12 Challenges with Elasticsearch and the Need for ParadeDB
02:29 Why Postgres?
06:30 Technical Details of ParadeDB's Search Functionality
18:25 Analytics Capabilities of ParadeDB
24:00 Understanding ParadeDB Queries and Transactions
24:22 Application Logic and Data Workflows
25:14 Using PG Cron for Data Migration
30:05 Scaling Reads and Writes in Postgres
31:53 High Availability and Distributed Systems
34:31 Isolation of Workloads
39:38 Database Upgrades and Migrations
41:21 Using ParadeDB Extensions and Distributions
43:02 Observability and Monitoring
44:42 Upcoming Features and Roadmap
46:34 Final Thoughts

Important links:
GitHub: https://github.com/paradedb/paradedb
Website: https://paradedb.com
Docs: https://docs.paradedb.com/
Blog: https://blog.paradedb.com

#postgresql #datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign #elasticsearch
- 46 min
- 4 JUN 2024
Modern OLAP Database System Design with FDAP (Andrew Lamb)

In this video I speak with Andrew Lamb, Staff Software Engineer at InfluxData. We discuss the FDAP (Flight, DataFusion, Arrow, Parquet) stack for modern OLAP database system design. Andrew shares insights into why the FDAP stack is so powerful for designing and implementing a modern OLAP database.

Chapters:
00:00 Introduction
01:48 Understanding Analytics: Transactional vs Analytical Databases
04:41 The Genesis and Goals of the FDAP Stack
09:31 Decoding FDAP: Flight, DataFusion, Arrow, and Parquet
12:40 Apache Parquet: Revolutionizing Columnar Storage
17:18 Apache Arrow: The In-Memory Game Changer
23:51 Interoperability and Migration with Apache Arrow
27:10 Comparing Apache Parquet and Arrow
28:26 Exploring Data Mutability in Analytic Systems
29:19 Handling Data Updates and Deletions
29:24 The Role of Immutable Storage in Analytics
30:42 Optimizing Data Storage and Mutation Strategies
34:20 Introducing Flight: Simplifying Data Transfer
35:02 Deep Dive into Flight's Benefits and SQL Support
39:20 Unpacking DataFusion's SQL Support and Extensibility
46:12 The Interplay of FDAP Components in Analytics
51:49 Future Directions and Innovations in Data Analytics
56:04 Concluding Thoughts on FDAP and Its Impact

FDAP Stack: https://www.influxdata.com/glossary/fdap-stack/
FDAP Blog: https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/
InfluxDB: https://www.influxdata.com/

#datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign
- 56 min
- 4 JUN 2024
The ultimate multi-model Database, SurrealDB with Pratim Bhosale

In this video, Pratim Bhosale, Developer Advocate at SurrealDB, and I talk about SurrealDB, a multi-model database that aims to make developers' lives easier by letting them focus on business logic rather than on the choice of database. The following chapters will help you understand what a multi-model database is and where SurrealDB shines.

Chapters:
00:00 Introduction
01:48 The Genesis of SurrealDB
03:59 SurrealDB's Mission and Use Cases
07:34 Understanding Multi-Model Databases
10:30 Deep Dive into SurrealDB's Architecture
33:09 Deployment and Getting Started with SurrealDB
34:31 Future Developments and Use Case Considerations
43:51 Final Thoughts and How to Get Started

Important links:

Install SurrealDB
https://sdb.li/4bqwn38

SurrealDB Docs:
https://sdb.li/3wxjoxx

SurrealDB Website:
https://sdb.li/3JMK7JI

Surrealist:
https://sdb.li/4b7wcdh

SurrealDB GitHub:
https://sdb.li/3JRPNlE

#surrealdb #elasticsearch #search #vectorsearch #acid #databases #sql #joins #indexes #graphdatabase
- 46 min
- 17 MAY 2024
Demystifying Real-time Analytics, Search and Hybrid Search with Dhruba, CTO @Rockset

In this video, I talk to Dhruba, CTO at Rockset, about search and real-time analytics. We discuss the deep internals of Rockset, its architecture, and why it is a great fit for search and real-time analytics use cases.

Chapters:
00:00 Introduction
02:45 The Evolution of Data Systems: From Hadoop to Rockset
07:30 Understanding Rockset: Real-Time Analytics and Search Defined
12:01 The Technical Edge: Rockset vs. Elasticsearch
18:16 Deep Dive into Rockset's Architecture and Internals
28:21 Partitioning, Hashing, and Data Distribution in Rockset
36:56 Exploring Hot Storage and Cache Layers
37:40 Why Hot Storage is Essential for Low Latency
39:05 Optimizing Data Storage with Compression and Delta Encoding
39:49 Balancing Cost and Performance in Data Storage
41:50 The Power of Converged Indexing in Rockset
45:50 Efficient Query Execution and Index Management
54:51 Leveraging Mutability for Real-Time Analytics
59:24 Deep Dive into Query Processing and Optimization
01:04:21 Understanding Joins and Reporting Queries in Rockset
01:12:23 Future Directions and Vector Search Innovations

Index Conference: https://rockset.com/index-conf/

#rockset #elasticsearch #search #vectorsearch #realtime #databases #sql #joins #indexes
- 1 hr 14 min
- 16 MAY 2024
Rapidly Simulate Production Traffic ft. Michael Drogalis

In this episode, Michael Drogalis and I explore how to rapidly simulate production traffic using his creation, ShadowTraffic. I am sure you will relate to the problems discussed in this episode and like how ShadowTraffic aims to solve them.

I hope you like this conversation.

Chapters:
00:00 Welcome to The GeekNarrator Podcast: Exploring Deep Tech
00:18 The Challenge of Simulating Production Traffic
00:59 Introducing ShadowTraffic: A Solution to Data Simulation
02:34 Understanding the Problem Space of Data Simulation
06:03 How ShadowTraffic Works: A Deep Dive
08:17 The Power of Declarative Data Generation with ShadowTraffic
10:40 ShadowTraffic's Architecture and Deployment
13:02 Configuring Load Testing and Throttling with ShadowTraffic
15:47 Testing and Validation in ShadowTraffic
20:42 Mimicking Production Data Distribution with ShadowTraffic
26:48 Innovative Features for Stream Processing Testing
28:47 ShadowTraffic: Adding Faults to Data for Robust Testing
29:04 Antithesis and ShadowTraffic: A Synergistic Approach
32:46 The Challenge of Generating Realistic Test Data
40:04 Enhancing Observability in Data Generation
41:50 Customer-Driven Roadmap and Future Vision
45:27 Closing Thoughts

ShadowTraffic: https://shadowtraffic.io/
Contact Michael: https://shadowtraffic.io/contact.html

#kafka #s3 #postgres #testing #streamprocessing #loadtesting #chaostesting #demo
- 47 min