Ship It Weekly - DevOps, SRE, and Platform Engineering News

Teller's Tech - DevOps SRE Podcast

Ship It Weekly is a short, practical recap of what actually matters in DevOps, SRE, and platform engineering. Each episode, your host Brian Teller walks through the latest outages, releases, tools, and incident writeups, then translates them into “here’s what this means for your systems” instead of just reading headlines. Expect a couple of main stories with context, a quick hit of tools or releases worth bookmarking, and the occasional segment on on-call, burnout, or team culture. This isn’t a certification prep show or a lab walkthrough. It’s aimed at people who are already working in the space and want to stay sharp without scrolling status pages and blogs all week. You’ll hear about things like cloud provider incidents, Kubernetes and platform trends, Terraform and infrastructure changes, and real postmortems that are actually worth your time. Most episodes are 10–25 minutes, so you can catch up on the way to work or between meetings. Every now and then there will be a “special” focused on a big outage or a specific theme, but the default format is simple: what happened, why it matters, and what you might want to do about it in your own environment. If you’re the person people DM when something is broken in prod, or you’re building the platform everyone else ships on top of, Ship It Weekly is meant to be in your rotation.

  1. Azure VM Control Plane Outage, GitHub Agent HQ (Claude + Codex), Claude Opus 4.6, Gemini CLI, MCP

    16H AGO

    This week on Ship It Weekly, Brian hits four “control plane + trust boundary” stories where the glue layer becomes the incident. Azure had a platform incident that impacted VM management operations across multiple regions. Your app can be up, but ops is degraded. GitHub is pushing Agent HQ (Claude + Codex in the repo/CI flow), and Actions added a case() function so workflow logic is less brittle. MCP is becoming platform plumbing: Miro launched an MCP server and Kong launched an MCP Registry.

    Links:
    Azure status incident (VM service management issues): https://azure.status.microsoft/en-us/status/history/?trackingId=FNJ8-VQZ
    GitHub Agent HQ (Claude + Codex): https://github.blog/news-insights/company-news/pick-your-agent-use-claude-and-codex-on-agent-hq/
    GitHub Actions update (case() function): https://github.blog/changelog/2026-01-29-github-actions-smarter-editing-clearer-debugging-and-a-new-case-function/
    Claude Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
    How Google SREs use Gemini CLI: https://cloud.google.com/blog/topics/developers-practitioners/how-google-sres-use-gemini-cli-to-solve-real-world-outages
    Miro MCP server announcement: https://www.businesswire.com/news/home/20260202411670/en/Miro-Launches-MCP-Server-to-Connect-Visual-Collaboration-With-AI-Coding-Tools
    Kong MCP Registry announcement: https://konghq.com/company/press-room/press-release/kong-introduces-mcp-registry
    GitHub Actions hosted runners incident thread: https://github.com/orgs/community/discussions/186184
    DockerDash / Ask Gordon research: https://noma.security/blog/dockerdash-two-attack-paths-one-ai-supply-chain-crisis/
    Terraform 1.15 alpha: https://github.com/hashicorp/terraform/releases/tag/v1.15.0-alpha20260204
    Wiz Moltbook write-up: https://www.wiz.io/blog/exposed-moltbook-database-reveals-millions-of-api-keys
    Chainguard “EmeritOSS”: https://www.chainguard.dev/unchained/introducing-chainguard-emeritoss
    More episodes + details: https://shipitweekly.fm

    21 min
  2. CodeBreach in AWS CodeBuild, Bazel TLS Certificate Expiry Breaks Builds, Helm Charts Reliability Audit, and New n8n Sandbox Escape RCE

    JAN 30

    This week on Ship It Weekly, Brian looks at four “glue failures” that can turn into real outages and real security risk. We start with CodeBreach: AWS disclosed a CodeBuild webhook filter misconfig in a small set of AWS-managed repos. The takeaway is simple: CI trigger logic is part of your security boundary now. Next is the Bazel TLS cert expiry incident. Cert failures are a binary cliff, and “auto renew” is only one link in the chain. Third is Helm chart reliability. Prequel reviewed 105 charts and found a lot of demo-friendly defaults that don’t hold up under real load, rollouts, or node drains. Fourth is n8n. Two new high-severity flaws disclosed by JFrog. “Authenticated” still matters because workflow authoring is basically code execution, and these tools sit next to your secrets. Lightning round: Fence, HashiCorp agent-skills, marimo, and a cautionary agent-loop story.

    Links:
    AWS CodeBreach bulletin: https://aws.amazon.com/security/security-bulletins/2026-002-AWS/
    Wiz research: https://www.wiz.io/blog/wiz-research-codebreach-vulnerability-aws-codebuild
    Bazel postmortem: https://blog.bazel.build/2026/01/16/ssl-cert-expiry.html
    Helm report: https://www.prequel.dev/blog-post/the-real-state-of-helm-chart-reliability-2025-hidden-risks-in-100-open-source-charts
    n8n coverage: https://thehackernews.com/2026/01/two-high-severity-n8n-flaws-allow.html
    Fence: https://github.com/Use-Tusk/fence
    agent-skills: https://github.com/hashicorp/agent-skills
    marimo: https://marimo.io/
    Agent loop story: https://www.theregister.com/2026/01/27/ralph_wiggum_claude_loops/
    Related n8n episodes:
    https://www.tellerstech.com/ship-it-weekly/n8n-critical-cve-cve-2026-21858-aws-gpu-capacity-blocks-price-hike-netflix-temporal/
    https://www.tellerstech.com/ship-it-weekly/n8n-auth-rce-cve-2026-21877-github-artifact-permissions-and-aws-devops-agent-lessons/
    More episodes + details: https://shipitweekly.fm
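    On the Bazel story, the practical move is to watch remaining certificate lifetime independently of whatever auto-renew is supposed to be doing. Here is a minimal sketch of that kind of check in Python; the hostname and the 21-day threshold are placeholder assumptions, not something from the episode:

        import socket
        import ssl
        import time

        def days_until_cert_expiry(host: str, port: int = 443) -> float:
            """Return how many days remain before the TLS certificate for host expires."""
            ctx = ssl.create_default_context()
            with socket.create_connection((host, port), timeout=10) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    cert = tls.getpeercert()
            expires_at = ssl.cert_time_to_seconds(cert["notAfter"])
            return (expires_at - time.time()) / 86400

        if __name__ == "__main__":
            # Placeholder host and threshold; wire the result into alerting you already trust.
            remaining = days_until_cert_expiry("example.com")
            if remaining < 21:
                print(f"WARNING: certificate expires in {remaining:.1f} days")
            else:
                print(f"OK: {remaining:.1f} days of certificate lifetime left")

    The point is the independent check, not this exact script: auto-renew, the renewal pipeline, and deployment of the renewed cert form a chain, and you want an alarm on the end state.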

    19 min
  3. Ship It Conversations: AI Automation for SMBs: What to Automate (And What Not To) (with Austin Reed)

    JAN 27

    This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps). In this Ship It: Conversations episode I talk with Austin Reed from horizon.dev about AI and automation for small and mid-sized businesses, and what actually works once you leave the demo world. We get into the most common automation wins he sees (sales and customer service), why a lot of projects fail due to communication and unclear specs more than the tech, and the trap of thinking “AI makes it cheap.” Austin shares how they push teams toward quick wins first, then iterate with prototypes so you don’t spend $10k automating a thing that never even happens. We also talk guardrails: when “human-in-the-loop” makes sense, what he avoids automating (finance-heavy logic, HIPAA/medical, government), and why the goal is usually leverage, not replacing people. On the dev side, we nerd out a bit on the tooling they’re using day to day: GPT and Claude, Cursor, PR review help, CI/CD workflows, and why knowing how to architect and validate output matters way more than people think. If you’re a DevOps/SRE type helping the business “do AI,” or you’re just tired of automation hype that ignores real constraints like credentials, scope creep, and operational risk, this one is very much about the practical middle ground.

    Links from the episode:
    Austin on LinkedIn: https://www.linkedin.com/in/automationsexpert/
    horizon.dev: horizon.dev
    YouTube: https://www.youtube.com/@horizonsoftwaredev
    Skool: https://www.skool.com/automation-masters

    If you found this useful, share it with the person on your team who keeps saying “we should automate that” but hasn’t dealt with the messy parts yet. More information on our website: https://shipitweekly.fm

    25 min
  4. curl Shuts Down Bug Bounties Due to AI Slop, AWS RDS Blue/Green Cuts Switchover Downtime to ~5 Seconds, and Amazon ECR Adds Cross-Repository Layer Sharing

    JAN 24

    This week on Ship It Weekly, Brian looks at three different versions of the same problem: systems are getting faster, but human attention is still the bottleneck. We start with curl shutting down their bug bounty program after getting flooded with low-quality “AI slop” reports. It’s not a “security vs maintainers” story, it’s an incentives and signal-to-noise story. When the cost to generate reports goes to zero, you basically DoS the people doing triage. Next, AWS improved RDS Blue/Green Deployments to cut writer switchover downtime to typically ~5 seconds or less (single-region). That’s a big deal, but “fast switchover” doesn’t automatically mean “safe upgrade.” Your connection pooling, retries, and app behavior still decide whether it’s a blip or a cascade. Third, Amazon ECR added cross-repository layer sharing. Sounds small, but if you’ve got a lot of repos and you’re constantly rebuilding/pushing the same base layers, this can reduce storage duplication and speed up pushes in real fleets. Lightning round covers a practical Kubernetes clientcmd write-up, a solid “robust Helm charts” post, a traceroute-on-steroids style tool, and Docker Kanvas as another signal that vendors are trying to make “local-to-cloud” workflows feel less painful. We wrap with Honeycomb’s interim report on their extended EU outage, and the part that always hits hardest in long incidents: managing engineer energy and coordination over multiple days is a first-class reliability concern.

    Links from this episode:
    curl bug bounties shutdown: https://github.com/curl/curl/pull/20312
    RDS Blue/Green faster switchover: https://aws.amazon.com/about-aws/whats-new/2026/01/amazon-rds-blue-green-deployments-reduces-downtime/
    ECR cross-repo layer sharing: https://aws.amazon.com/about-aws/whats-new/2026/01/amazon-ecr-cross-repository-layer-sharing/
    Kubernetes clientcmd apiserver access: https://kubernetes.io/blog/2026/01/19/clientcmd-apiserver-access/
    Building robust Helm charts: https://www.willmunn.xyz/devops/helm/kubernetes/2026/01/17/building-robust-helm-charts.html
    ttl tool: https://github.com/lance0/ttl
    Docker Kanvas (InfoQ): https://www.infoq.com/news/2026/01/docker-kanvas-cloud-deployment/
    Honeycomb EU interim report: https://status.honeycomb.io/incidents/pjzh0mtqw3vt
    SRE Weekly issue #504: https://sreweekly.com/sre-weekly-issue-504/
    More episodes + details: https://shipitweekly.fm
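    On the RDS story, whether a ~5 second writer switchover is a blip or a cascade mostly comes down to how your clients retry. As a rough sketch of the pattern (not tied to any particular driver; the exception type and the helper in the usage comment are hypothetical), bounded retries with jittered backoff around idempotent writes look something like this in Python:

        import random
        import time

        class TransientDBError(Exception):
            """Stand-in for whatever your driver raises on dropped connections or failover."""

        def with_retries(op, attempts: int = 5, base_delay: float = 0.5, max_delay: float = 4.0):
            """Run op(); on transient errors, back off with jitter so a short writer
            switchover shows up as a blip instead of a synchronized retry storm."""
            for attempt in range(1, attempts + 1):
                try:
                    return op()
                except TransientDBError:
                    if attempt == attempts:
                        raise
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids a thundering herd

        # Usage sketch with a hypothetical helper; only wrap operations that are safe to retry,
        # and make sure the connection pool discards stale sockets after the switchover:
        # with_retries(lambda: upsert_order(pool, order))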

    16 min
  5. n8n Auth RCE (CVE-2026-21877), GitHub Artifact Permissions, and AWS DevOps Agent Lessons

    JAN 16

    This week on Ship It Weekly, the theme is simple: the automation layer has become a control plane, and that changes how you should think about risk. We start with n8n’s latest critical vulnerability, CVE-2026-21877. This one is different from the unauth “Ni8mare” issue we covered in Episode 12. It’s authenticated RCE, which means the real question isn’t only “is it internet exposed,” it’s who can log in, who can create or modify workflows, and what those workflows can reach. Takeaway: treat workflow automation tools like CI systems. They run code, they hold credentials, and they can pivot into real infrastructure. Next is GitHub’s new fine-grained permission for artifact metadata. Small change, big least-privilege implications for Actions workflows. It’s also a good forcing function to clean up permission sprawl across repos. Third is AWS’s DevOps Agent story, and the best part is that it’s not hype. It’s a real look at what it takes to operationalize agents: evaluation, observability into tool calls/decisions, and control loops with brakes and approvals. Prototype is cheap. Reliability is the work. Lightning round: GitHub secret scanning changes that can quietly impact governance, a punchy Claude Code “guardrails aren’t guaranteed” reminder, Block’s Goose as another example of agent workflows getting productized, and OpenCode as an “agent runner” pattern worth watching if you’re experimenting locally.

    Links:
    n8n CVE-2026-21877 (authenticated RCE): https://thehackernews.com/2026/01/n8n-warns-of-cvss-100-rce-vulnerability.html?m=1
    Episode 12 (n8n “Ni8mare” / CVE-2026-21858): https://www.tellerstech.com/ship-it-weekly/n8n-critical-cve-cve-2026-21858-aws-gpu-capacity-blocks-price-hike-netflix-temporal/
    GitHub: fine-grained permission for artifact metadata (GA): https://github.blog/changelog/2026-01-13-new-fine-grained-permission-for-artifact-metadata-is-now-generally-available/
    GitHub secret scanning: extended metadata auto-enabled (Feb 18): https://github.blog/changelog/2026-01-15-secret-scanning-extended-metadata-to-be-automatically-enabled-for-certain-repositories/
    Claude Code issue thread (Bedrock guardrails gap): https://github.com/anthropics/claude-code/issues/17118
    Block Goose (tutorial + sessions/context):
    https://block.github.io/goose/docs/tutorials/rpi
    https://block.github.io/goose/docs/guides/sessions/smart-context-management
    OpenCode: https://opencode.ai
    More episodes + details: https://shipitweekly.fm
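    On the agent side, “control loops with brakes and approvals” is a pattern you can sketch in a few lines, independent of any vendor. This is a minimal illustration only, not AWS’s (or anyone’s) actual implementation; the tool names and the approval flow are assumptions:

        import json
        import logging
        import time

        logging.basicConfig(level=logging.INFO)
        log = logging.getLogger("agent-audit")

        # Hypothetical set of tool names the agent is never allowed to run unattended.
        REQUIRES_APPROVAL = {"restart_service", "scale_down", "delete_resource"}

        def run_tool(name: str, args: dict, tools: dict, approver=input):
            """Execute an agent-requested tool call with an audit trail and a human brake."""
            log.info("tool_call %s", json.dumps({"tool": name, "args": args, "ts": time.time()}))
            if name not in tools:
                raise ValueError(f"agent requested unknown tool: {name}")
            if name in REQUIRES_APPROVAL:
                answer = approver(f"Agent wants to run {name} with {args!r}. Approve? [y/N] ")
                if answer.strip().lower() != "y":
                    log.info("tool_call_denied %s", name)
                    return {"status": "denied"}
            result = tools[name](**args)
            log.info("tool_call_ok %s", name)
            return result

    The audit log is the observability half; the approval set is the brake. Everything else (evaluation, rollbacks) layers on top of the same loop.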

    12 min
  6. Ship It Conversations: Human-in-the-Loop Fixer Bots and AI Guardrails in CI/CD (with Gracious James)

    JAN 12

    This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps). In this Ship It: Conversations episode I talk with Gracious James Eluvathingal about TARS, his “human-in-the-loop” fixer bot wired into CI/CD. We get into why he built it in the first place, how he stitches together n8n, GitHub, SSH, and guardrailed commands, and what it actually looks like when an AI agent helps with incident response without being allowed to nuke prod. We also dig into rollback phases, where humans stay in the loop, and why validating every LLM output before acting on it is the single most important guardrail. If you’re curious about AI agents in pipelines but hate the idea of a fully autonomous “ops bot,” this one is very much about the middle ground: segmenting workflows, limiting blast radius, and using agents to reduce toil instead of replace engineers. Gracious also walks through where he’d like to take TARS next (Terraform, infra-level decisions, more tools) and gives some solid advice for teams who want to experiment with agents in CI/CD without starting with “let’s give it root and see what happens.”

    Links from the episode:
    Gracious on LinkedIn: https://www.linkedin.com/in/gracious-james-eluvathingal
    TARS overview post: https://www.linkedin.com/posts/gracious-james-eluvathingal_aiagents-devops-automation-activity-7391064503892987904-psQ4

    If you found this useful, share it with the person on your team who’s poking at AI automation and worrying about guardrails. More information on our website: https://shipitweekly.fm
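    A concrete version of “validate every LLM output before acting on it”: parse the model’s proposed plan as structured data and check every command against an allowlist before anything executes. This is a generic sketch, not TARS itself; the expected JSON shape and the allowlist contents are assumptions:

        import json
        import re
        import shlex

        # Hypothetical allowlist of remediation commands the bot is allowed to propose.
        ALLOWED_COMMANDS = {"systemctl restart", "kubectl rollout restart deployment"}
        SAFE_ARG = re.compile(r"^[A-Za-z0-9_.-]+$")

        def validate_llm_plan(raw: str) -> list:
            """Parse the model's proposed plan and reject anything outside the guardrails.

            Expects JSON like {"commands": ["systemctl restart nginx"]}. Output that fails
            to parse, uses a verb not on the allowlist, or has a suspicious argument is
            rejected outright; the raw string is never handed to a shell.
            """
            plan = json.loads(raw)  # malformed output fails here, not in prod
            approved = []
            for cmd in plan.get("commands", []):
                parts = shlex.split(cmd)
                verb = " ".join(parts[:-1])
                target = parts[-1] if parts else ""
                if verb not in ALLOWED_COMMANDS or not SAFE_ARG.match(target):
                    raise ValueError(f"rejected command outside guardrails: {cmd!r}")
                approved.append(cmd)
            return approved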

    22 min
  7. n8n Critical CVE (CVE-2026-21858), AWS GPU Capacity Blocks Price Hike, Netflix Temporal

    JAN 9

    This week on Ship It Weekly, Brian’s theme is basically: the “automation layer” is not a side tool anymore. It’s part of your perimeter, part of your reliability story, and sometimes part of your budget problem too. We start with the n8n security issue. A lot of teams use n8n as glue for ops workflows, which means it tends to collect credentials and touch real systems. When something like this drops, the right move is to treat it like production-adjacent infra: patch fast, restrict exposure, and assume anything stored in the tool is high value. Next is AWS quietly raising prices on EC2 Capacity Blocks for ML. Even if you’re not a GPU-heavy shop, it’s a useful signal: scarce compute behaves like a market. If you do rely on scheduled GPU capacity, it’s time to revisit forecasts and make sure your FinOps tripwires catch rate changes before the end-of-month surprise. Third is Netflix’s write-up on using Temporal for reliable cloud operations. The best takeaway is not “go adopt Temporal tomorrow.” It’s the pattern: long-running operational workflows should be resumable, observable, and safe to retry. If your critical ops are still bash scripts and brittle pipelines, you’re one transient failure away from a very dumb day. In the lightning round: Kubernetes Dashboard getting archived and the “ops dependencies die” reality check, Docker pushing hardened images as a safer baseline, and Pipedash.

    Links:
    SRE Weekly issue 504 (source roundup): https://sreweekly.com/sre-weekly-issue-504/
    n8n CVE (NVD): https://nvd.nist.gov/vuln/detail/CVE-2026-21858
    n8n community advisory: https://community.n8n.io/t/security-advisory-security-vulnerability-in-n8n-versions-1-65-1-120-4/247305
    AWS price increase coverage (The Register): https://www.theregister.com/2026/01/05/aws_price_increase/
    Netflix: Temporal powering reliable cloud operations: https://netflixtechblog.com/how-temporal-powers-reliable-cloud-operations-at-netflix-73c69ccb5953
    Kubernetes SIG-UI thread (Dashboard archiving): https://groups.google.com/g/kubernetes-sig-ui/c/vpYIRDMysek/m/wd2iedUKDwAJ
    Kubernetes Dashboard repo (archived): https://github.com/kubernetes/dashboard
    Pipedash: https://github.com/hcavarsan/pipedash
    Docker Hardened Images: https://www.docker.com/blog/docker-hardened-images-for-every-developer/
    More episodes and more details on this episode can be found on our website: https://shipitweekly.fm
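    The Temporal takeaway generalizes: the property you want is that a long-running operational workflow checkpoints its progress and can be rerun safely after a transient failure. Here is a minimal sketch of that pattern in Python (this is not Temporal’s API; the state file and the example steps are placeholders):

        import json
        from pathlib import Path

        STATE_FILE = Path("workflow_state.json")  # stand-in for a durable store (DB, object storage)

        def load_done() -> set:
            return set(json.loads(STATE_FILE.read_text())) if STATE_FILE.exists() else set()

        def mark_done(done: set, step: str) -> None:
            done.add(step)
            STATE_FILE.write_text(json.dumps(sorted(done)))  # checkpoint after every step

        def run_workflow(steps: dict) -> None:
            """Run steps in order, skipping anything already completed.

            If the process dies partway through (deploy, node drain, flaky API), rerunning
            the script resumes from the last checkpoint instead of starting over. Each step
            must be idempotent so retrying a half-finished step is safe.
            """
            done = load_done()
            for name, step in steps.items():
                if name in done:
                    print(f"skip {name} (already done)")
                    continue
                print(f"run {name}")
                step()
                mark_done(done, name)

        if __name__ == "__main__":
            run_workflow({
                "snapshot": lambda: print("  taking snapshot"),
                "migrate": lambda: print("  running migration"),
                "verify": lambda: print("  verifying"),
            })

    Workflow engines add durability, retries, and visibility on top, but if your ops scripts can’t even be rerun safely, no engine will save you.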

    16 min
  8. Ship It Conversations: Backstage vs Internal IDPs, and Why DevEx Muscle Matters (with Danny Teller)

    JAN 6

    This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps). I sat down with Danny Teller, a DevOps Architect and Tech Lead Manager at Tipalti, to talk about internal developer platforms and the reality behind “just set up a developer portal.” We get into Backstage versus internal IDPs, why adoption is the real battle, and why platform/DevEx maturity matters more than whatever tool you pick.

    What we covered:
    Backstage vs internal IDPs: Backstage is a solid starting point for a developer portal, but it doesn’t magically create standards, ownership, or platform maturity. We talk about when Backstage fits, and when teams end up building internal tooling anyway.
    DevEx muscle (the make-or-break): Danny’s take is that the portal UI is the easy part. The hard part is the ongoing work that makes it useful: paved roads, sane defaults, support, and keeping the catalog/data accurate so engineers trust it.
    Where teams get burned: The common failure mode is shipping a portal first, then realizing they don’t have the resourcing, ownership, or workflows behind it. Adoption fades fast if the portal doesn’t remove real friction.
    A build vs buy gut check: We walk through practical signals that push you toward open source Backstage, a managed Backstage offering, or a commercial portal. We also hit the maintenance trap: if you build too much, you’ve created a second product.

    Links and resources:
    Danny Teller's LinkedIn: https://www.linkedin.com/in/danny-teller/
    matlas (one CLI for Atlas and MongoDB): https://github.com/teabranch/matlas-cli
    Backstage: https://backstage.io/
    Roadie (managed Backstage): https://roadie.io/
    Port: https://www.port.io/
    Cortex: https://www.cortex.io/
    OpsLevel: https://www.opslevel.com/
    Atlassian Compass: https://www.atlassian.com/software/compass
    Humanitec Platform Orchestrator: https://humanitec.com/products/platform-orchestrator
    Northflank: https://northflank.com/

    If you enjoyed this episode: Ship It Weekly is still the weekly news recap, and I’m dropping these guest convos in between. Follow/subscribe so you catch both, and if this was useful, share it with a platform/DevEx friend and leave a quick rating or review. It helps more than it should. Visit our website at https://www.shipitweekly.fm

    26 min

Ratings & Reviews: 5 out of 5 (9 Ratings)
