1 hr 1 min

#96 - Practical Guide to Implementing SRE and SLOs - Alex Hidalgo
Tech Lead Journal


“Reliability is the most important thing. Your users define your reliability, so make sure you’re measuring the right thing. And 100% is out of the question, so pick the right target.”

Alex Hidalgo is the Principal Reliability Advocate at Nobl9 and author of “Implementing Service Level Objectives”. In this episode, we discussed a practical guide to implementing SRE and SLOs. Alex started by explaining the basic concepts of service reliability and service truths. He then explained the reliability stack, which includes the well-known SRE concepts of SLIs, SLOs, and error budgets. Alex then shared his insights on how to define a service reliability target, why a higher reliability target is expensive, and the risk of a service being too reliable. Towards the end, Alex shared his tips on building an SRE culture and using the error budget as a communication tool within the organization.
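The error-budget arithmetic behind “100% is out of the question” can be sketched in a few lines. This is a minimal illustration, not taken from the episode or the book: it assumes a time-based SLO measured over a 30-day window, and the helper name `error_budget` is hypothetical. It shows that each additional nine shrinks the tolerated unreliability tenfold, which is one way to see why higher targets get expensive.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return how much unreliable time an SLO target tolerates over a window.

    Hypothetical helper for illustration only: assumes a time-based SLO where
    slo_target is a fraction (e.g. 0.999 for 99.9%).
    """
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be between 0.0 and 1.0")
    # The error budget is whatever the target leaves over: (1 - target) of the window.
    return window * (1.0 - slo_target)


if __name__ == "__main__":
    window = timedelta(days=30)  # assumed 30-day measurement window
    for target in (0.99, 0.999, 0.9999, 1.0):
        budget = error_budget(target, window)
        minutes = budget.total_seconds() / 60
        print(f"{target:.2%} over 30 days -> {minutes:.1f} minutes of error budget")
```

Running it prints roughly 432, 43.2, 4.3, and 0 minutes for the four targets; a 100% target leaves no room for any failure at all.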

Listen out for:


Career Journey - [00:07:19]
Understanding SRE & SLO - [00:14:17]
Service & Reliability - [00:17:30]
Service Truths - [00:21:06]
Reliability Stack - [00:23:45]
Defining Reliability Target - [00:27:11]
Higher Reliability is Expensive - [00:29:27]
SLI - [00:34:26]
Measuring Correctness - [00:37:30]
Critical User Journey - [00:41:49]
Being Too Reliable - [00:47:18]
Communicating with Error Budget - [00:51:02]
Building SRE Culture - [00:54:13]
3 Tech Lead Wisdom - [00:57:57]

_____

Alex Hidalgo’s Bio
Alex Hidalgo is the Principal Reliability Advocate at Nobl9 and author of “Implementing Service Level Objectives”. During his career, he has developed a deep love for sustainable operations, proper observability, and using SLO data to drive discussions and make decisions. Alex’s previous jobs have included IT support, network security, restaurant work, t-shirt design, and hosting game shows at bars. When he is not sharing his passion for technology with others, you can find him scuba diving or watching college basketball. He lives in Brooklyn with his partner Jen and a rescue dog named Taco. Alex has a BA in philosophy from Virginia Commonwealth University.

Follow Alex:


Twitter – @ahidalgosre
Nobl9 – https://www.nobl9.com/
Website – https://www.alex-hidalgo.com/



Our Sponsors

DevTernity 2022 (devternity.com) is the top international software development conference with an emphasis on coding, architecture, and tech leadership skills. The lineup is truly stellar and features many legends of software development, such as Robert "Uncle Bob" Martin, Kent Beck, Scott Hanselman, Venkat Subramaniam, Kevlin Henney, and others! The conference takes place online, and we have a 10% discount code for you: AWSM_TLJ.


Skills Matter is the global community and events platform for software professionals. It offers technologists an easier way to grow their careers by connecting them and their peers with best-in-class tech industry experts and communities. You get on-demand access to their latest content and thought leadership insights, as well as an exciting schedule of tech events running across all time zones.
Head on over to skillsmatter.com to become part of the tech community that matters most to you - it’s free to join and easy to keep up with the latest tech trends.



Like this episode?
Subscribe on your favorite podcast app and submit your feedback.
Follow @techleadjournal on LinkedIn, Twitter, and Instagram.
Pledge your support by becoming a patron.
For more info about the episode (including quotes and transcript), visit techleadjournal.dev/episodes/96.
