#critical, #incidents, #war-room, #sos.
Every startup creates one at some point: a channel in which, whenever the fecal matter strikes the atmospheric propulsor, an attempt at coordination takes place. It's one of those ubiquitous inevitabilities of working in the tech scene today.
Our very own Critical Channel, however, aims to highlight some different inevitabilities. From organisational culture in a high-growth situation, to personal mental health and work-life balance. From manipulating Conway's Law to evolve your out-of-control microservices architecture, to managing churn and offboarding.
All hard problems, all anathema to an organisation if they crop up at the wrong time. But there's never been a #critical channel for this stuff.
Well, not until now.
Episode 14: Pedro, Roll a D10!
The Problem: Tempo is really good, you guys.
What is observability? | Grafana Labs
Pingdom - Website Performance and Availability Monitoring
OpenTelemetry - High-quality, ubiquitous, and portable telemetry to enable effective observability
The Istio Service Mesh
Domain Oriented Observability | Martin Fowler
Grafana Tempo
Grafana Loki
Prometheus
Pattern: Distributed Tracing | Microservices.io
Jaeger Tracing
Shift-left testing | Wikipedia
Episode 13: Microservices: Because Yes.
The Problem: There were Legitimate Business Reasons, but we regret everything anyway.
With a title like this one, how can we not make it to the top of Hacker News?
This episode we discuss all the reasons you might not want to go down the Microservices route, and then tell you how to do it anyway. There are a lot of things to think about when you're on this journey, and we've gone ahead and made all the mistakes so you don't have to.
Plus, listen to the end to hear our collective shame about one time we all really buggered things up by trying to Standardise All The Things. Ah, simpler times...
Samson Q2U Dynamic USB/XLR microphone
Native Instruments Komplete Audio 1 USB Audio Interface
Martin Fowler: Microservice definition
Wikipedia: Domain-Driven Design
Martin Fowler: Bounded Contexts
Domain-Driven Design: Tackling Complexity in the Heart of Software (aka The Big Blue Book) by Eric Evans
Building Microservices by Sam Newman
Event Storming
Sam Newman: Demystifying Conway's Law
Agile with Edele: Building an Interaction Map
Building Evolutionary Architectures by Patrick Kua
Episode 12: Gardening Leave
The Problem: Turns out drunk babies aren't funny.
It's time for a break. We've focussed too hard, over the last dozen episodes, on bringing you top quality content related to engineering management, organisational culture, and whatever the third thing is that we say in every intro. Move over, titular easy problems - it's time for a podcast about parenting.
In this episode, we discuss side projects, whether companies should be able to stake a claim to the work you do in your free time, parenting, the peer pressure that comes alongside seeing others' side projects on social media, parenting, how and why you can be incentivised by your employer to work on side projects, parenting, creativity and how productisation can stifle your side projects, take a quick break from that to chat about parenting, before finishing up with a quick chat on trust and parenting.
Incidentally, one of us just became a dad. Coincidence?
Episode 11: FAAAAANG
The Problem: FAAAAAAAAAAAAAANG.
The stars have been gone billions of years now. Black holes burnt out. All but one, where the last dregs of civilization fought over the last dregs of Hawking radiation, before that black hole too ran its course. Now there's only you, floating in the void.
A chime. Unmistakeably early 21st century, even untold millennia later: a push notification. You reach for your phone, pushing the obvious questions away - how has this artefact survived the aeons, the implausibility of it still having power or network - and with hands numbed by entropy, clumsily thumb the sensor to unlock it and read the eleven fateful words of the end of time itself:
"I'd like to add you to my professional network on LinkedIn!"
This week, whether it's your first or your fifteenth job in the tech industry, we try to give some advice on finding it. How do you know which companies are going to be good places to work? When hiring, what stands out to us in a CV, and what's an obvious red flag? And how much is a life of Microsoft™ SharePoint® worth to you?
Why senior engineers hate coding interviews
Routing the technical interview
Episode 10: Senior Headless Chicken (Incidents Part Two)
The Problem: It's never the API gateway (until it is).
Your monitoring is on point, you have a symphony of alerts with appropriate priority levels, runbooks are written and up-to-date, and your services autoscale like an absolute mother hubbard.
But Johnny Stakeholder doesn't give a damn how sophisticated your stack is. Johnny Stakeholder is going to trigger an incident at 4am, with the only details being a blurry photo of an inscrutable 500 error. Yes, Johnny Stakeholder takes pictures of his screen with his phone. Johnny Stakeholder suggests you deal with it. Johnny Stakeholder is going on a cigarette break and when Johnny Stakeholder gets back he expects it to be fixed.
In this second half of our Incident Response two-parter: what should happen when the pager goes off? We dissect a typical incident (at least, from our experience). How do you organise an effective response? What steps should be taken to understand what the underlying issue is? And what if you're not able to fix it in a reasonable time?
Increment: What happens when the pager goes off?
The role of the incident commander
PagerDuty: Incident Roles
Episode 9: The Bug Team (Incidents Part One)
The Problem: You may only know a single tcpdump command, but you're sure as hell going to use it.
We're trying something new this week - it's a two-parter! We decided to live up to our name and talk about critical incident response procedures.
In this first half, we talk about how to craft a sustainable on-call rotation, how to compensate your engineers for living with the stress of on-call, and how to convince management that you need an on-call rotation.
Plus, Warnar definitely does not advocate for drink-driving. Don't do that.
MTBF, MTTR, MTTA, and MTTF
Crafting Sustainable On-Call Rotations
How Monzo do on-call
How Monzo's on-call system evolved
Google SRE: Error Budgets and Maintenance Windows
Fifth Gear: What's Worse, Drink Driving or Driving Tired?