
47 episodes

Slight Reliability Stephen Townshend
- Technology
5.0 • 2 Ratings
-
Learning SRE, one day at a time.
-
Slight Reliability Episode 47 - Cloud Dependency Reliability with Jeff Martens and Ryan Duffield
In this episode Stephen Townshend discusses our increased dependency on third-party cloud services, and what this means for reliability, with Jeff Martens and Ryan Duffield from https://metrist.io/.
You can find Jeff...
On LinkedIn: https://www.linkedin.com/in/jmartens/
On Twitter: https://twitter.com/Jmartens
You can find Ryan...
On StackOverflow: https://stackoverflow.com/users/2696/ryan-duffield
On GitHub: https://github.com/rduffield
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
-
Slight Reliability Episode 46 - Raw Telemetry
In this episode I propose the use of scatterplots of raw data to better understand how our systems are behaving and what our customers are experiencing. The ideas from this episode come from my time as a performance engineer, working with legends in that space: Richard Leeke (https://www.linkedin.com/in/richard-leeke-450448/) and Neil Davies (https://www.linkedin.com/in/neildaviesnz/).
For some basic examples of scatterplots, and what they show you versus line charts, check out an article I wrote back in 2017 called "Let's Talk About Averages": https://www.linkedin.com/pulse/lets-talk-averages-stephen-townshend/
Another proponent of scatterplots is Stijn Schepers (https://www.linkedin.com/in/stijnschepers/). Here's an article he wrote about it in 2019: https://www.linkedin.com/pulse/performance-testing-act-like-detective-use-raw-data-stijn-schepers/
Neil Davies' article on tornado scatters, "Chasing Tornadoes", can be found here: http://www.performance-workshop.org/wp/wp-content/uploads/2013/12/Chasing_Tornadoes_Davies.pdf
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
-
Slight Reliability Episode 45 - Telemetry Fluency with Paige Cruz
In this episode we discuss uplifting telemetry knowledge within engineering teams to enrich their work (and their lives) with Paige Cruz from Chronosphere. We cover why not to take a chainsaw to your observability in order to cut costs, the dark side of auto-instrumentation, storytelling with live data, and much more.
The book that Paige recommends at the end is "Effective Monitoring and Alerting for Web Operations": https://www.oreilly.com/library/view/effective-monitoring-and/9781449333515/
You can check out Chronosphere here: https://chronosphere.io/
You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
-
Slight Reliability Episode 44 - Cognitive Overload with Paige Cruz
In this episode we discuss cognitive overload in SRE with Paige Cruz from Chronosphere. We cover what cognitive load is, what causes it, and some potential antidotes and preventative measures.
You can check out Chronosphere here: https://chronosphere.io/
You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
-
Slight Reliability Episode 43 - Beyond Observability
In this episode I discuss my "bigger picture" perspective on what observability needs to be, and why it's important that we include the business and the customer in what we monitor in the Digital Era.
The books I highlight in this episode are...
Observability Engineering: https://www.oreilly.com/library/view/observability-engineering/9781492076438/
Sooner, Safer, Happier: https://soonersaferhappier.com/book/
The Phoenix Project: https://www.oreilly.com/library/view/the-phoenix-project/9781457191350/
The Unicorn Project: https://www.oreilly.com/library/view/the-unicorn-project/9781098124175/
Accelerate: https://www.oreilly.com/library/view/accelerate/9781457191435/
You can grab a copy of the 2022 State of DevOps report at: https://cloud.google.com/devops/state-of-devops
The blog I mentioned was The Insight Industrial Complex: https://benn.substack.com/p/insight-industrial-complex
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
-
Slight Reliability Episode 42 - Reliability Insights with José Velez
In this episode we speak to José Velez from Rely about reliability at scale, a top-down approach to SLOs, the potential and limitations of AI and ML in operations, the question of service ownership, using the business criticality of services to guide how we monitor the underlying infrastructure, and much more.
You can check out Rely at https://www.rely.io/
You can find José on LinkedIn: https://www.linkedin.com/in/josevelez-relyio/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre