59 min
SRE at Google: Planet-scale observability - OpenObservability Talks S2E05
- Technology
Have you ever wondered how services are operated at Google’s scale? Here’s your opportunity to find out. Ramón will share how his SRE team runs Google’s identity services, and the elaborate end-to-end observability they rely on to meet a strict SLA. We’ll also get a glimpse of the birthplace of Kubernetes, OpenCensus, Dapper, Monarch and other cornerstones of today’s cloud-native DevOps and observability.
Ramón Medrano Llamas (@rmedranollamas) is a staff site reliability engineer at Google, focused on user identity and authentication. He concentrates on the reliability aspects of new Google products and new features of existing products, ensuring that they meet the same high bar as every other Google service. Before joining Google in 2013, he worked at CERN developing and designing distributed systems for physics. He holds a master’s degree in computer science and is pursuing a PhD on distributed systems.
The episode was live-streamed on 26 October 2021 and the video is available at https://youtube.com/live/jVTZf1SXZrg
Show Notes:
scale and size of Google Identity services operation
evolution from monitoring to observability
telemetry collection
SRE job description is changing
Google Dapper
Google Census
operating end-to-end observability at scale
flexibility vs. runbook in SRE
how SRE at Google is different
transition from monolith to microservices architecture (MSA)
Linux Foundation launching a DevOps bootcamp
Parca OSS launched
how to introduce SRE culture
Resources:
Dapper paper: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
Borg paper: Large-scale cluster management at Google with Borg
Monarch paper: Monarch: Google’s Planet-Scale In-Memory Time Series Database
SRE books
Systemantics