
497 episodes

The Backend Engineering Show with Hussein Nasser Hussein Nasser
-
- Technology
Welcome to the Backend Engineering Show podcast with your host Hussein Nasser. If you like software engineering you’ve come to the right place. I discuss all sorts of software engineering technologies and news with specific focus on the backend. All opinions are my own.
Most of my content in the podcast is an audio version of videos I post on my youtube channel here http://www.youtube.com/c/HusseinNasser-software-engineering
Buy me a coffee
https://www.buymeacoffee.com/hnasr
🧑🏫 Courses I Teach
https://husseinnasser.com/courses
-
Your SSD lies but that's ok | Postgres fsync
fsync is a linux system call that flushes all pages and metadata for a given file to the disk. It is indeed an expensive operation but required for durability especially for database systems. Regular writes that make it to the disk controller are often placed in the SSD local cache to accumulate more writes before getting flushed to the NAND cells.
However when the disk controller receives this flush command it is required to immediately persist all of the data to the NAND cells.
Some SSDs however don't do that because they don't trust the host and no-op the fsync. In this video I explain this in details and go through details on how postgres provide so many options to fine tune fsync
0:00 Intro
1:00 A Write doesn’t write
2:00 File System Page Cache
6:00 Fsync
7:30 SSD Cache
9:20 SSD ignores the flush
9:30 15 Year old Firefox fsync bug
12:30 What happens if SSD loses power
15:00 What options does Postgres exposes?
15:30 open_sync (O_SYNC)
16:15 open_datasync (O_DSYNC)
17:10 O_DIRECT
19:00 fsync
20:50 fdatasync
21:13 fsync = off
23:30 Don’t make your API simple
26:00 Database on metal? -
The problem with software engineering
ego is the main problem to a defective software product. the ego of the engineer or the tech lead seeps into the quality of the product.
Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
https://backend.husseinnasser.com -
2x Faster Reads and Writes with this MongoDB feature | Clustered Collections
Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
https://database.husseinnasser.com
In version 5.3, MongoDB introduced a feature called clustered collection which stores documents in the _id index as oppose to the hidden wiredTiger hidden index. This eliminates an entire b+tree seek for reads using the _id index and also removes the additional write to the hidden index speeding both reads and writes.
However like we know in software engineering, everything has a cost. This feature does come with a few that one must be aware of before using it. In this video I discuss the following
How Original MongoDB Collections Work
How Clustered Collections Work
Benefits of Clustered Collections
Limitations of Clustered Collections
-
Prime Video Swaps Microservices for Monolith: 90% Cost Reduction
Prime video engineering team has posted a blog detailing how they moved their live stream monitoring service from microservices to a monolith reducing their cost by 90%, let us discuss this
0:00 Intro
2:00 Overview
10:35 Distributed System Overhead
21:30 From Microservices to Monolith
29:00 Scaling the Monolith
32:30 Takeaways
https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90
Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
https://backend.husseinnasser.com -
A Deep Dive in How Slow SELECT * is
Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
https://database.husseinnasser.com
In a row-store database engine, rows are stored in units called pages. Each page has a fixed header and contains multiple rows, with each row having a record header followed by its respective columns. When the database fetches a page and places it in the shared buffer pool, we gain access to all rows and columns within that page. So, the question arises: if we have all the columns readily available in memory, why would SELECT * be slow and costly? Is it really as slow as people claim it to be? And if so why is it so? In this post, we will explore these questions and more.
0:00 Intro
1:49 Database Page Layout
5:00 How SELECT Works
10:49 No Index-Only Scans
18:00 Deserialization Cost
21:00 Not All Columns are Inline
28:00 Network Cost
36:00 Client Deserialization
https://medium.com/@hnasr/how-slow-is-select-8d4308ca1f0c -
AWS Serverless Lambda Supports Response Streaming
Lambda now supports Response payload streaming, now you can flush changes to the network socket as soon as it is available and it will be written to the client socket. I think this is a game changing feature
0:00 Intro
1:00 Traditional Lambda
3:00 Server Sent Events & Chunk-Encoding
5:00 What happens to clients?
6:00 Supported Regions
7:00 My thoughts
Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
https://backend.husseinnasser.com