74 episodes

The GeekNarrator podcast is hosted by Kaivalya Apte, a Software Engineer who loves to talk about Technology, Technical Interviews, Self Improvement, Best Practices and Hustle.

Connect with Kaivalya Apte https://www.linkedin.com/in/kaivalya-apte-2217221a

Tech blogs: https://kaivalya-apte.medium.com/

Wanna talk? Book a slot here: https://calendly.com/speakwithkv/hey

Enjoy the show and please follow to get more updates. Also please don’t forget to rate and review the show.

Cheers

The GeekNarrator Kaivalya Apte

    • Technology

    Scaling Derived Data for Planet-Scale Applications at Linkedin

    In this video I speak with Felix GV, a Principal Staff Engineer at LinkedIn who has made major contributions to LinkedIn's data infrastructure, including VeniceDB.

    This episode will give you a good understanding of why we need a new database for storing "Derived Data" with low latency and high performance, which is very important for Machine Learning workloads.

    Chapters:
    00:00 Introduction
    01:42 The Evolution of LinkedIn's Databases
    03:15 Challenges with Voldemort and the Birth of VeniceDB
    08:42 Understanding Derived Data
    13:33 Planet-Scale Applications and Multi-Region Support
    17:40 Writing Data into VeniceDB
    22:53 Merging Data in VeniceDB
    40:31 Understanding the Architecture
    40:47 Components of the Write Path
    41:56 Leader and Follower Architecture
    43:58 Partitioning and DaVinci Client
    47:57 Read Patterns and Client Options
    54:25 Fault Tolerance and Recommender Systems
    01:01:19 Kafka Integration and Deployment
    01:06:56 Roadmap and Future Improvements

    Important links:
    VeniceDB blog: https://www.linkedin.com/blog/engineering/open-source/open-sourcing-venice-linkedin-s-derived-data-platform
    VeniceDB docs: https://venicedb.org/
    Qcon: https://youtu.be/pJeg4V3JgYo?si=vblGUxp5fNdKPHoC

    Follow me on Linkedin and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator

    If you like this episode, please hit the like button and share it with your network.
    Also please subscribe if you haven't yet.

    Database internals series: https://youtu.be/yV_Zp0Mi3xs

    Popular playlists:
    Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-

    Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17

    Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d

    Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

    Stay Curious! Keep Learning!

    #kafka #linkedin #venicedb #Rocksdb

    • 1 hr 12 min
    SuperCharging PostgreSQL for Search and Analytics - ParadeDB (Philippe Noël)

    In this video I speak with Philippe Noël about ParadeDB, an Elasticsearch alternative built on Postgres that modernizes the features of Elasticsearch's product suite, starting with real-time search and analytics.

    I hope you will enjoy and learn about the product.

    Chapters:
    00:00 Introduction
    01:12 Challenges with Elasticsearch and the Need for ParadeDB
    02:29 Why Postgres?
    06:30 Technical Details of ParadeDB's Search Functionality
    18:25 Analytics Capabilities of ParadeDB
    24:00 Understanding ParadeDB Queries and Transactions
    24:22 Application Logic and Data Workflows
    25:14 Using PG Cron for Data Migration
    30:05 Scaling Reads and Writes in Postgres
    31:53 High Availability and Distributed Systems
    34:31 Isolation of Workloads
    39:38 Database Upgrades and Migrations
    41:21 Using ParadeDB Extensions and Distributions
    43:02 Observability and Monitoring
    44:42 Upcoming Features and Roadmap
    46:34 Final Thoughts

    Important links:
    GitHub: https://github.com/paradedb/paradedb
    Website: https://paradedb.com
    Docs: https://docs.paradedb.com/
    Blog: https://blog.paradedb.com

    #postgresql #datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign #elasticsearch

    • 46 min
    Modern OLAP Database System Design with FDAP (Andrew Lamb)

    In this video I speak with Andrew Lamb, Staff Software Engineer @InfluxDB. We discuss the FDAP (Flight, DataFusion, Arrow, Parquet) stack for modern OLAP database system design. Andrew shares insights into why the FDAP stack is so powerful for designing and implementing a modern OLAP database.

    Chapters:
    00:00 Introduction
    01:48 Understanding Analytics: Transactional vs Analytical Databases
    04:41 The Genesis and Goals of the FDAP Stack
    09:31 Decoding FDAP: Flight, DataFusion, Arrow, and Parquet
    12:40 Apache Parquet: Revolutionizing Columnar Storage
    17:18 Apache Arrow: The In-Memory Game Changer
    23:51 Interoperability and Migration with Apache Arrow
    27:10 Comparing Apache Parquet and Arrow
    28:26 Exploring Data Mutability in Analytic Systems
    29:19 Handling Data Updates and Deletions
    29:24 The Role of Immutable Storage in Analytics
    30:42 Optimizing Data Storage and Mutation Strategies
    34:20 Introducing Flight: Simplifying Data Transfer
    35:02 Deep Dive into Flight's Benefits and SQL Support
    39:20 Unpacking DataFusion's SQL Support and Extensibility
    46:12 The Interplay of FDAP Components in Analytics
    51:49 Future Directions and Innovations in Data Analytics
    56:04 Concluding Thoughts on FDAP and Its Impact

    FDAP Stack: https://www.influxdata.com/glossary/fdap-stack/
    FDAP Blog: https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/
    InfluxDB: https://www.influxdata.com/

    #datafusion #parquet #sql #OLAP #apachearrow #database #systemdesign

    • 56 min
    The ultimate multi-model Database, SurrealDB with Pratim Bhosale

    In this video Pratim Bhosale, Developer Advocate at SurrealDB, and I talk about SurrealDB, a multi-model database that aims to make developers' lives easier by letting them focus mainly on business logic rather than on the choice of database. The following chapters will help you understand what a multi-model database is and how SurrealDB shines.

    Chapters:
    00:00 Introduction
    01:48 The Genesis of SurrealDB
    03:59 SurrealDB's Mission and Use Cases
    07:34 Understanding Multi-Model Databases
    10:30 Deep Dive into SurrealDB's Architecture
    33:09 Deployment and Getting Started with SurrealDB
    34:31 Future Developments and Use Case Considerations
    43:51 Final Thoughts and How to Get Started


    Important links:

    Install SurrealDB
    https://sdb.li/4bqwn38

    SurrealDB Docs:
    https://sdb.li/3wxjoxx

    SurrealDB Website:
    https://sdb.li/3JMK7JI

    Surrealist:
    https://sdb.li/4b7wcdh

    SurrealDB GitHub:
    https://sdb.li/3JRPNlE


    #surrealdb #elasticsearch #search #vectorsearch #acid #databases #sql #joins #indexes #graphdatabase

    • 46 min
    Demystifying Real-time Analytics, Search and Hybrid Search with Dhruba, CTO @Rockset

    In this video, I talk to Dhruba, CTO @Rockset, about search and realtime analytics. We discuss the deep internals of Rockset, its architecture, and why it is a great fit for search and realtime analytics use cases.

    Chapters:
    00:00 Introduction
    02:45 The Evolution of Data Systems: From Hadoop to Rockset
    07:30 Understanding Rockset: Real-Time Analytics and Search Defined
    12:01 The Technical Edge: Rockset vs. Elasticsearch
    18:16 Deep Dive into Rockset's Architecture and Internals
    28:21 Partitioning, Hashing, and Data Distribution in Rockset
    36:56 Exploring Hot Storage and Cache Layers
    37:40 Why Hot Storage is Essential for Low Latency
    39:05 Optimizing Data Storage with Compression and Delta Encoding
    39:49 Balancing Cost and Performance in Data Storage
    41:50 The Power of Converged Indexing in Rockset
    45:50 Efficient Query Execution and Index Management
    54:51 Leveraging Mutability for Real-Time Analytics
    59:24 Deep Dive into Query Processing and Optimization
    01:04:21 Understanding Joins and Reporting Queries in Rockset
    01:12:23 Future Directions and Vector Search Innovations

    Index Conference: https://rockset.com/index-conf/

    #rockset #elasticsearch #search #vectorsearch #realtime #databases #sql #joins #indexes

    • 1 hr 14 min
    Rapidly Simulate Production Traffic ft. Michael Drogalis

    In this episode we explore how to rapidly simulate production traffic with Michael Drogalis, using his creation ShadowTraffic. I am sure you will relate to the problems discussed in this episode and appreciate how ShadowTraffic aims to solve them.

    I hope you like this conversation.

    Chapters:
    00:00 Welcome to The Geek Narrator Podcast: Exploring Deep Tech
    00:18 The Challenge of Simulating Production Traffic
    00:59 Introducing Shadow Traffic: A Solution to Data Simulation
    02:34 Understanding the Problem Space of Data Simulation
    06:03 How Shadow Traffic Works: A Deep Dive
    08:17 The Power of Declarative Data Generation with Shadow Traffic
    10:40 Shadow Traffic's Architecture and Deployment
    13:02 Configuring Load Testing and Throttling with Shadow Traffic
    15:47 Testing and Validation in Shadow Traffic
    20:42 Mimicking Production Data Distribution with Shadow Traffic
    26:48 Innovative Features for Stream Processing Testing
    28:47 Shadow Traffic: Adding Faults to Data for Robust Testing
    29:04 Antithesis and Shadow Traffic: A Synergistic Approach
    32:46 The Challenge of Generating Realistic Test Data
    40:04 Enhancing Observability in Data Generation
    41:50 Customer-Driven Roadmap and Future Vision
    45:27 Closing Thoughts

    ShadowTraffic: https://shadowtraffic.io/
    Contact Michael: https://shadowtraffic.io/contact.html

    #kafka #s3 #postgres #testing #streamprocessing #loadtesting #chaostesting #demo

    • 47 min
