Most companies are doing SRE wrong.
Hiring SREs doesn’t make you reliable. Metrics dashboards don’t guarantee accountability. And cultural change doesn’t happen because you wrote it on a slide deck.
In this episode, Duncan Mapes and Jason Ehmke push back against these misconceptions. They argue that SRE isn’t a bolt-on team but a systemic shift in how engineering works. Without shared accountability, meaningful metrics, and cultural buy-in, SRE will fail.
And no, copying Google’s model isn’t the answer.
If you think SRE is just a headcount play, this episode will challenge everything you believe. Got a different perspective? Drop us a review, share your comments, and send your toughest SRE questions our way.
Top Takeaways:
- SRE is a complex practice that varies across organizations.
- Defining SRE upfront can prevent chaos later.
- SRE is not just about taking over responsibilities; it's about collaboration.
- The role of SREs is to guide and support application teams.
- Key metrics for SRE success include mean time to detect (MTTD) and mean time to restore (MTTR).
- Cultural transformation is essential for successful SRE implementation.
- Finding early wins can help demonstrate the value of SRE.
- Effective communication is crucial for SREs to succeed.
- SRE teams should focus on toil reduction and automation.
- Building a strong relationship between SREs and app teams is vital.
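For readers who want to see how the detect and restore metrics from the takeaways are typically computed, here is a minimal sketch. It is not taken from the episode: the `Incident` record, its field names, and the sample timestamps are all hypothetical, and it assumes both metrics are measured from the moment the incident started (some teams measure restore time from detection instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    started: datetime   # when the fault began
    detected: datetime  # when monitoring or a human noticed it
    restored: datetime  # when service was back to normal


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average gap between an incident starting and being detected."""
    gaps = [(i.detected - i.started).total_seconds() for i in incidents]
    return timedelta(seconds=mean(gaps))


def mean_time_to_restore(incidents: list[Incident]) -> timedelta:
    """Average gap between an incident starting and service being restored.

    Assumption: measured from incident start, not from detection.
    """
    gaps = [(i.restored - i.started).total_seconds() for i in incidents]
    return timedelta(seconds=mean(gaps))


# Made-up sample data, for illustration only.
incidents = [
    Incident(
        started=datetime(2025, 1, 6, 10, 0),
        detected=datetime(2025, 1, 6, 10, 5),
        restored=datetime(2025, 1, 6, 10, 45),
    ),
    Incident(
        started=datetime(2025, 1, 9, 14, 0),
        detected=datetime(2025, 1, 9, 14, 15),
        restored=datetime(2025, 1, 9, 14, 35),
    ),
]

print("MTTD:", mean_time_to_detect(incidents))
print("MTTR:", mean_time_to_restore(incidents))
```

Tracking both numbers over time, rather than as one-off snapshots, is what makes them useful for showing the early wins mentioned above.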
Mentioned in this Episode:
Site Reliability Engineering: How Google Runs Production Systems - https://www.oreilly.com/library/view/site-reliability-engineering/9781491929117/
Connect with us:
Duncan Mapes
Jason Ehmke
DevGrid.io
DevGrid on LinkedIn
DevGrid on X
Information
- Show
- Frequency: Updated Weekly
- Published: September 15, 2025 at 9:00 AM UTC
- Length: 44 min
- Episode: 19
- Rating: Clean