A monthly podcast about Chaos Engineering, presented by Gremlin, Inc. Find us on Twitter at @BTOPpod.
This episode we speak with Kelsey Hightower. Kelsey is a Principal Developer Advocate at Google.
Topics include: Promise Theory, is Kubernetes hard, running databases on Kubernetes, the meat cloud, empathy sessions, how Kubernetes has helped standardize Ops practices, learning from failure at scale at Google, and the importance of the Inclusion part of D&I.
This episode we speak with Kolton Andrus, the CEO and co-founder of Gremlin. Topics include: The role of a Call Leader in incidents, using Chaos Engineering as runtime validation, FIT and application level fault injection, Jesse Robbins and early experiments at Amazon, oncall training, Lineage Driven Fault Injection (LDFI), the value of looking at real traffic instead of synthetic transactions, and the challenges people face when starting to do Chaos Engineering.
This episode we speak with Haley Tucker. Haley is a Senior Software Engineer on the Resilience Engineering team at Netflix. Topics include: Running Chaos Engineering experiments as A/B tests, testing dependencies, fallbacks, testing in production, and why Chaos Monkey is less interesting at Netflix now.
This episode we speak with Matthew Simons. Matthew is a Senior Product Development Manager at Workiva and he leads the Quality Assessment team there. Topics include: Supporting and encouraging reliability at Workiva, why Workiva moved from App Engine to EKS, how to tighten the customer feedback loop, how Chaos Engineering can help folks who are oncall, and fatal optimism and the asteroid that may hit the Earth in the year 2181 (it’s real y’all).
This episode we speak with Subbu Allamaraju. Subbu is a Senior Technologist at the Expedia Group. Topics include: Learning from incidents, changing culture, Why Complex Systems Fail, drifting into failure, forming a hypothesis, showing value from your reliability work, and the importance of understanding how your business makes money.
This episode we speak with Adrian Hornsby, a Senior Tech Evangelist at Amazon Web Services. Topics include: Curiosity and breaking things, the cost of downtime, Jesse Robbins and early failure injection at Amazon, making the case to management for Chaos Engineering, forming a hypothesis, and random experiments vs Game Days.
Customer ReviewsSee All
It's been so great being able to hear from different companies and find out what they are working on and lessons learned.
Excellent listen, thanks for sharing these customer stories.
First Chaos Engineering Podcast
Really neat to see this podcast for the wider engineering community to learn about Chaos Engineering and SRE