This is Fine! A podcast about resilience engineering and software

Colette Alexander and Clint Byrum

A podcast about resilience engineering and software. Ever wondered why things on the internet break? Do you work in software and wish that you could have a Dear-Abby-like call-in show that could answer your deepest questions about how to make your workplace suck less? We're here to help! Write us anonymously at our open question form. Email us at: thisisfine.softwarepodcast@gmail.com Call us and leave a voicemail, or text us at: (401) 592-7574

  1. 2D AGO

    Interviewing for Incident Analysis w/special guest John Allspaw

    The new website is live! thisisfinepod.com
    You can find John Allspaw at Adaptive Capacity Labs: https://www.adaptivecapacitylabs.com
    Mike McGill, the skateboarder: https://en.wikipedia.org/wiki/Mike_McGill
    Annie Duke’s Thinking in Bets, referenced by our question-asker, is a great one: https://bookshop.org/p/books/thinking-in-bets-making-smarter-decisions-when-you-don-t-have-all-the-facts-annie-duke/31466984521c3d8a?ean=9780735216372&next=t
    Naturalistic Decision Making has its own association, which has a ton of resources (and a conference!) - https://naturalisticdecisionmaking.org/
    They also have a podcast! https://naturalisticdecisionmaking.org/new-podcast/
    Gary Klein is the NDM guy - https://bookshop.org/p/books/seeing-what-others-don-t-the-remarkable-ways-we-gain-insights-chief-scientist-gary-klein/c4ae5e017fe005ff?ean=9781610393829&next=t
    We contrast him and his style of approaching cognition and decision making with Kahneman and Tversky.
    Kahneman and Tversky wrote a lot, but Judgment Under Uncertainty is probably the most famous? https://www.science.org/doi/abs/10.1126/science.185.4157.1124
    And Kahneman wrote Thinking, Fast and Slow: https://bookshop.org/p/books/thinking-fast-and-slow-daniel-kahneman-phd/83a544fe6f98df87?ean=9780606275644&next=t
    It has been zero episodes since we’ve mentioned Lisanne Bainbridge’s Ironies of Automation: https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf
    But also she has Verbal Reports as evidence of the process operator’s knowledge: https://www.sciencedirect.com/science/article/abs/pii/S1071581979603075?via%3Dihub
    And the Etsy Debriefing Guide is super great: https://extfiles.etsy.com/DebriefingFacilitationGuide.pdf
    Sidney Dekker and The Field Guide are foundational: https://bookshop.org/p/books/the-field-guide-to-understanding-human-error-sidney-dekker/3a4209dfc8b3a721?ean=9781472439055&next=t
    From Dekker’s field guide (pg 47) there is a list referencing Gary Klein’s questions for an incident investigation:
    Cues: What were you seeing? What were you focusing on? What were you expecting to happen?
    Interpretation: If you had to describe the situation to your colleague at that point, what would you have told?
    Errors: What mistakes (for example in interpretation) were likely at this point?
    Previous experience/knowledge: Were you reminded of any previous experience? Did this situation fit a standard scenario? Were you trained to deal with this situation? Were there any rules that applied clearly here? Did any other sources of knowledge suggest what to do?
    Goals: What were you trying to achieve? Were there multiple goals at the same time? Was there time pressure or other limitations on what you could do?
    Taking action: How did you judge you could influence the course of events? Did you discuss or mentally imagine a number of options or did you know straight away what to do?
    Outcome: Did the outcome fit your expectation? Did you have to update your assessment of the situation?
    John mentioned Uptime Labs, who do staged worlds for software incidents: https://uptimelabs.io/
    Facets of Complexity in Situated Work is here: https://www.researchgate.net/publication/345523195_Facets_of_Complexity_in_Situated_Work
    On the Jamie Zawinski quote: https://regex.info/blog/2006-09-15/247
    If you don’t know the parable of the blind men and the elephant: https://en.wikipedia.org/wiki/Blind_men_and_an_elephant

    1h 1m
  2. MAY 4

    Paper Club: Two Years Before the Mast w/special guest Eric Dobbs

    Mitchell Hashimoto’s post on leaving GitHub: https://mitchellh.com/writing/ghostty-leaving-github
    The Reddit post on GitHub’s availability historically (that we find questionable): https://www.reddit.com/r/github/comments/1rnvhs9/githubs_historic_downtime_scraped_and_plotted/
    A reminder, the Messy 9 are: congestion, cascade, conflict, lag, saturation, friction, tempo, surprise, tangles
    We have sometimes loved his stuff, but Gergely is annoying us with these posts: https://newsletter.pragmaticengineer.com/p/the-pulse-is-github-still-best-for?r=78c7k&utm_medium=email https://x.com/GergelyOrosz/status/2048017382036082706
    You can find the RISF store with Hindsight Bias merch here: https://www.bonfire.com/store/risf/
    You can find a copy of Richard Cook’s Two Years Before the Mast at Lorin’s Blog: https://surfingcomplexity.blog/wp-content/uploads/2026/03/twoyearsbeforethemast.pdf
    A reminder, Richard Cook’s How Complex Systems Fail can be found at http://how.complexsystems.fail
    Some writing on the 1996 Annenberg conference: https://www.researchgate.net/publication/351953417_Coming_Together_The
    The Folk models paper (not by Woods, by Dekker and Hollnagel), which is specifically targeting Situational Awareness as being a folk model: https://link.springer.com/article/10.1007/s10111-003-0136-9
    Some stuff about SNAFU Catchers: https://www.snafucatchers.com/ and https://snafucatchers.github.io/
    Eric referenced our conversation with Beth Long about Building and Revising Adaptive Capacity, which she co-wrote with Richard Cook about New Relic’s real-life example of resilience engineering: https://youtu.be/A_rU4-M61Hk and, for the paper, https://www.sciencedirect.com/science/article/abs/pii/S0003687020301903?via%3Dihub
    Erik Hollnagel’s RAG gets referenced: https://erikhollnagel.com/onewebmedia/RAG%20Outline%20V2.pdf
    Once again, we link you to Lorin’s Law: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/
    Eric is referencing Lund, specifically their Human Factors and Systems Safety program: https://www.humanfactors.lth.se/
    Check out Crisis Engineering! https://crisisengineering.layeraleph.com/crisis-engineering-the-book/
    The upcoming RISF event on Practice of Practice Gamelan: https://resilienceinsoftware.org/events/245030

    45 min
  3. APR 14

    SRECon Americas 2026 Recap

    Colette’s talk at SRECon intro: https://www.usenix.org/conference/srecon26americas/presentation/alexander
    Clint’s talk at SRECon intro: https://www.usenix.org/conference/srecon26americas/presentation/byrum
    Dan Slimmon is an excellent engineer (per Clint’s shoutout) and ALSO an excellent podcast creator/host: https://techblows.net/
    Michelle Brush’s Keynote summary is here: https://www.usenix.org/conference/srecon26americas/presentation/brush
    Jevons Paradox: https://en.wikipedia.org/wiki/Jevons_paradox
    Dr. Nicole Forsgren’s talk summary: https://www.usenix.org/conference/srecon26americas/presentation/forsgren
    DORA is always worth a dive into if you haven’t taken a look yet: https://dora.dev/
    The blog post Colette mentioned comparing AI gold rush to Mao’s Revolution: https://leehanchung.github.io/blogs/2026/04/05/the-ai-great-leap-forward/
    Many people have written about why MTTR is a bad metric to track. You can read a write-up from Adrian Hornsby here: https://newsletter.resiliumlabs.com/p/mttr-problems-better-incident-metrics
    And watch the OG, Courtney Nash, speak about it here: https://www.youtube.com/watch?v=uhCgBOHo8EY
    Beth Long’s SRE Soundbath: https://www.usenix.org/conference/srecon26americas/presentation/long
    Vanessa Huerta-Granda’s talk is summarized here: https://www.usenix.org/conference/srecon26americas/presentation/huerta-granda
    Martin Smith and Abe Hoffman’s talk is summarized here: https://www.usenix.org/conference/srecon26americas/presentation/hoffman
    Some information about Metrist: https://vault42consulting.com/about/portfolio/metrist
    AI Agents Good Bad and Ugly talk: https://www.usenix.org/conference/srecon26americas/presentation/budichenko
    The CAST talk: https://www.usenix.org/conference/srecon26americas/presentation/barroso
    Engineering a Safer World by Nancy Leveson is worth a look: https://bookshop.org/p/books/engineering-a-safer-world-systems-thinking-applied-to-safety-nancy-g-leveson/57b01ef464f9f81b?ean=9780262533690&next=t
    Erik Hollnagel wrote the book on FRAM and it has a lot of support in the safety world across industries: https://functionalresonance.com/ and https://etn-peter.eu/2021/02/11/fram-in-a-nutshell/ are good resources.
    Daria Barteneva’s closing keynote on game theory and SRE was great: https://www.usenix.org/conference/srecon26americas/presentation/barteneva
    Some good stuff on Above the Line/Below the Line, if you’re curious: https://queue.acm.org/detail.cfm?id=3380777 https://www.youtube.com/watch?v=xA5U85LSk0M
    Lorin Hochstein’s closing keynote on storytelling was rad: https://www.usenix.org/conference/srecon26americas/presentation/hochstein
    SRECon EMEA 2026 (in Dublin) has their CFP up: https://www.usenix.org/conference/srecon26emea/call-for-participation
    As always, you can check out the Resilience in Software Foundation at resilienceinsoftware.org

    55 min
  4. FEB 1

    The Messy 9 and Coding with AI - A Panel Discussion

    Special thanks to John Allspaw, Sheeri Cabral, Martin Smith, and David Woods for joining us!
    Ben Affleck’s been making the promo rounds, but the specific convo we reference is recapped here: https://www.moviemaker.com/ben-affleck-ai-explains/
    The Messy 9 are: congestion, cascade, conflict, lag, saturation, friction, tempo, surprise, tangles
    Dave’s been doing a set of videos on Resilience Engineering, some of which have some crossover with the Messy 9 - you can find the first one here: https://resiliencefoundations.github.io/video-1-introduction-pt-1-it's-all-about-viability.html
    Previous TiF episode on the Messy 9: https://www.thisisfinepod.com/the-pod/complex-systems-and-the-messy-nine-wspecial-guests-dave-woods-and-john-allspaw
    Richard Cook on Above the Line/Below the Line: Written - https://dl.acm.org/doi/pdf/10.1145/3379510
    A good excerpt from a talk from John Allspaw on Above the Line/Below the Line: https://www.youtube.com/watch?v=8bxj-FLEi10&list=PLb1aZTnPf3-OMChMkrr6WsokRI6LOnuem
    Colette mentioned the competence knowledge model: https://en.wikipedia.org/wiki/Four_stages_of_competence
    There’s a good argument based on the conversation here that AI makes it harder for Consciously Incompetent people to graduate to Conscious Competence. And, in Martin’s case, it makes Unconsciously Competent folks need to backtrack into Conscious Competence to “teach” it how to do things they don’t always think about.
    We can reset the clock to 0 episodes since we’ve mentioned the Ironies of Automation: https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf
    There is a good blog on Jamie Zawinski’s saying on regular expressions here: https://regex.info/blog/2006-09-15/247
    Alex Gorbachev and The Battle Against Any Guess seems to have become a paper: https://www.researchgate.net/publication/251255185_Battle_Against_Any_Guess
    Dave talks about Robust Yet Fragile as part of Resilience Engineering here: https://www.youtube.com/watch?v=gFotUdLL2zs
    Lorin Hochstein’s blog post that Dave is referencing is https://surfingcomplexity.blog/2026/01/19/amdahl-gustafson-coding-agents-and-you/
    Fred writes a good one on the Law of Stretched Systems: https://ferd.ca/the-law-of-stretched-cognitive-systems.html
    The 1985 paper Dave keeps mentioning could be any number of things he released that year, but I have a hunch it’s this one: https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/511 or this one: https://link.springer.com/chapter/10.1007/978-3-642-50329-0_11
    Dave references a lot of things around the economic sustainability of AI, and Ed Zitron has been writing quite a bit about that for the last year and change. See, among others: https://www.wheresyoured.at/wheres-the-money/ https://www.wheresyoured.at/big-tech-2tr/

    1h 43m
  5. JAN 17

    Going Solid

    If you’re feeling like you need to do more to respond to our moment:
    Lots of places to donate to in the Twin Cities are listed here: https://mspmag.com/arts-and-culture/general-interest/ice-minnesota-support-immigrant-communities-fundraisers-food-drives-trainings/
    You can always find mutual aid networks in your own area, including immigrant aid networks. https://immigrantdefensenetwork.org/ does good work, too.
    The Hometown Holler podcast with Tressie McMillan Cottom was a wonderful discussion: https://www.youtube.com/watch?v=2gr4mW8aR-g
    The Ruth Wilson Gilmore interview that I quoted clumsily is here: https://www.nytimes.com/2019/04/17/magazine/prison-abolition-ruth-wilson-gilmore.html
    The paper itself: https://qualitysafety.bmj.com/content/14/2/130.short
    If you haven’t seen The Pitt, you should, it’s super good: https://en.wikipedia.org/wiki/The_Pitt
    Charles Perrow’s Normal Accidents has more definitions/examples of coupling: https://bookshop.org/p/books/normal-accidents-living-with-high-risk-technologies-updated-edition-professor-charles-perrow/cad38a43fcffa1f8?ean=9780691004129&next=t
    Some stuff on microservices and coupling here: https://microservices.io/post/architecture/2023/03/28/microservice-architecture-essentials-loose-coupling.html
    Colette’s #notanad endorsement for paper organizing is https://paperpile.com/
    Rasmussen’s boundary model comes initially from his paper here: https://www.sciencedirect.com/science/article/abs/pii/S0925753597000520
    And if you want a good writeup explaining Rasmussen’s boundary model, you can always read Lorin’s blog: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
    Dr Cook’s talk at Velocity is a classic, and goes over Rasmussen’s boundary model really well: https://www.youtube.com/watch?v=PGLYEDpNu60
    Fred does a great job writing about the Law of Stretched Systems and how it applies to his own work on his blog: https://ferd.ca/the-law-of-stretched-cognitive-systems.html
    “Plans are nothing, but planning is everything” is a paraphrase of Eisenhower: https://www.presidency.ucsb.edu/documents/remarks-the-national-defense-executive-reserve-conference
    Want to chat about this paper with other folks? Come to the RISF live event for a Paper Party! https://resilienceinsoftware.org/events/157553

    1h 2m

Ratings & Reviews

5
out of 5
4 Ratings

