The brutal truth about digital performance engineering and operations.
Andreas (aka Andi) Grabner and Brian Wilson are veterans of the digital performance world. Combined, they have seen too many applications fail to scale and perform up to expectations. With more rapid deployment models made possible through continuous delivery and a mentality shift sparked by DevOps, they feel it’s time to share their stories. In each episode, they and their guests discuss different topics concerning performance, ranging from common performance problems for specific technology platforms to best practices in developing, testing, deploying and monitoring software performance and user experience. Be prepared to learn a lot about metrics.
Andi & Brian both work at Dynatrace, where they get to witness more real world customer performance issues than they can TPS report at.
The SLO Dilemma: Slight Reliability Discussions with Stephen Townshend
For some out there SLOs (Service Level Objectives) are the silver bullet to building and operating reliable software. But nothing is as shiny on the inside as it looks on the outside.
In this episode we invited Stephen Townshend, a former Performance Engineer now converted to Site (Slight) Reliability. Stephen (@the_kiwi_sre) has experienced the tough side of establishing SLOs within an organization. It’s a constant battle between focusing on reliability and building new features, made harder by a lack of change in culture.
Listen in and learn about the 9 prerequisites for SLOs that Stephen has identified, such as having a certain level of observability, defining clear business objectives, defining ownership and giving autonomy, and establishing a blameless culture.
Stephen on LinkedIn
Stephen on Twitter
Here are the additional resources we brought up during our talk:
Slight Reliability YouTube: https://www.youtube.com/c/SlightReliability
Slight Reliability Podcast: https://www.buzzsprout.com/1698445
Our LinkedIn discussion: https://www.linkedin.com/posts/scottmooreconsulting_7-steps-to-identify-and-implement-effective-activity-6938919857459462144--RI7
The State of Cloud Native Security with Anais Urlichs
Security is everyone’s business. And as everyone seems to be moving to Cloud Native it's important to understand what the security landscape in k8s, containerized apps, serverless, … looks like.
To learn more about this we invited Anais Urlichs (@urlichsanais), Developer Advocate at Aqua Security and CNCF Ambassador of the year 2021. Over the past years Anais has educated thousands of people on cloud native, DevOps and security on her YouTube channel.
Tune in and learn more about the different approaches to security in cloud native, which open source projects are out there, and hear her advice on embedding security in your day-to-day work.
Some additional links we discussed can be found here:
Anais on LinkedIn: https://www.linkedin.com/in/urlichsanais/
Anais on Twitter: https://twitter.com/urlichsanais
Weekly DevOps Newsletter: https://anaisurl.com/
WTFisSRE Talk: https://www.youtube.com/watch?v=0zL61AiOaK0
Anais’s YouTube channel: https://www.youtube.com/c/AnaisUrlichs
Aqua Open Source YouTube Channel: https://www.youtube.com/channel/UCb4mfRT5UWpjoUQRcIE2qOQ
DevOps is 80% culture: But what does this really mean with April Edwards
While this episode started out with a recap of April Edwards’ (@TheAprilEdwards) keynote “Putting the Ops into DevOps”, we quickly got April to talk about the measures Microsoft has put in place to embrace the cultural change needed for its DevOps transformation: giving every service a public health dashboard, putting the customer at the center, making products open source, eating your own dog food, aligning your objectives with the team, …
Besides this great conversation, which finally gave some solid input on what cultural change really looks like, we learned about her background: starting in Ops, moving to Dev, getting into the cloud and now inspiring Ops teams to make their jobs easier through automation. Tune in, learn and get inspired.
We also talked about the late Abel Wang and how Microsoft UK is supporting Girls Who Code.
April on LinkedIn
April on Twitter
Putting the Ops into DevOps keynote
Supporting Girls Who Code in memory of Abel Wang
Introducing OpenFeature – Stepping into the footsteps of OpenTelemetry with Mike Beemer and Todd Baert
Feature flagging has gained a lot of momentum, which we can observe by counting the number of feature flagging solutions. To ensure a good developer experience when implementing feature flags, the CNCF OpenFeature project was launched during KubeCon 2022 in Valencia. It aims to provide a feature flag standard similar to what OpenTelemetry did for observability.
Tune in to this podcast where we have two of the founding members Mike Beemer and Todd Baert explain why it was the right time to initiate the project, which problems it solves and what use cases feature flagging brings to organizations.
If you want to learn more about the project, check out the following resources discussed during the podcast:
ITPro Today Launch Coverage: https://www.itprotoday.com/testing-and-quality-assurance/open-source-openfeature-project-takes-flight-advance-feature-flags
Getting Started with Chaos Engineering through Game Days with Mandi Walls
How do you plan for unplanned work such as fixing systems when they unexpectedly break in production? Just as it is for firefighters, the best approach is to practice those situations so that you are better prepared when they happen.
In this episode we have Mandi Walls, DevOps Advocate at PagerDuty, explain why she loves Game Days, where she is “practicing for the weird things that might happen”. Prior to her current role she worked for Chef and AOL, picking up a lot of the things she is now advocating for. In our conversation Mandi (@lnxchk) gives us insights into how to best prepare and run game days, shares her thoughts on what good chaos scenarios (unreliable backend, slow DNS, …) are, and explains which health metrics (team health, # incidents out of hours, …) to look at in your current incident response to figure out what a good game day scenario actually is.
Mandi on LinkedIn: https://www.linkedin.com/in/mandiwalls/
In our talk we mentioned a couple of resources – here they are:
Mandi’s talk at DevOpsDays Raleigh: https://devopsdays.org/events/2022-raleigh/program/mandi-walls
Ops Guides: https://www.pagerduty.com/ops-guides/
Why SREs are not your new Sys Admins with Hilliary Lipsig
“The most significant body of my SRE work is architectural reviews, disaster and failover planning and help with SLIs and SLOs of applications that would like to become SRE supported.”
This statement comes from Hilliary Lipsig, Principal SRE at Red Hat, as her introduction to what the role of an SRE should be. Hilliary and her teams are helping organizations get their applications cloud-native ready so that the operational aspect of keeping a system up and running and within its error budgets can be handled by an SRE team.
Listen in to this episode and learn about the key advice she has for every organization that wants to build and operate resilient systems. And understand why every suggestion she makes has to be, and will always be, evidence-based!
In the talk we mentioned a couple of tools and practices. Here are the links:
Hilliary on LinkedIn: https://www.linkedin.com/in/hilliary-lipsig-a5935245/
Listen to the talk “Helm and Back again: an SRE Guide to choosing” from DevConf.cz: https://www.youtube.com/watch?v=HQuK6txYS3g
Get your stopwatches ready!
PurePerformance is an essential testing, performance, digital transformation, and application tech podcast. Part good performance reminders, part funny conversations, and part amazing performance tips, this podcast keeps it fun and interesting.