CloudChat

Carl and Brandon

Conversations about building software and designing architecture in the cloud natively.

  1. Whoops, No VM's!!!

    3 NOV

    Whoops, No VM's!!!

    Episode 0027 - Whoops, No VM's!!!

    You've planned for redundancy, scaling, and failover, but what happens when the cloud itself runs out of space? In this episode, Carl and Brandon untangle capacity (what the provider physically or logically has available in a region or zone) versus quota (the soft limit on what you can consume). Mixing the two leads to painful surprises during scale events and failovers.

    We talk through how capacity shortfalls show up in real life (zones that are full, SKUs that vary by location, and limited supply for GPU-heavy instances) and the patterns that help: design for multiple zones and regions, add retry and fallback logic with flexible SKUs, balance spot with on-demand, and hold a baseline with reservations or time-bound commitments.

    We close on the business side: the price of headroom, when commitments make sense, and simple pipeline and monitoring checks so "no capacity" errors fail fast instead of 30 minutes into a deploy.

    Links:
      AWS Auto Scaling allocation strategies
      AWS EC2 Capacity Reservations
      AWS insufficient capacity guidance
      AWS Savings Plans
      AWS Service Quotas
      Azure On-demand Capacity Reservations
      Azure quotas overview
      Azure region pairs
      Azure subscription and service limits
      Azure VM allocation failures
      Azure VM Scale Sets orchestration modes (Flexible)
      GCP Compute Engine Reservations
      GCP quota alerts and monitoring
      GCP Regional Managed Instance Groups
      GCP resource availability errors
      Google Cloud quotas overview

    Visit us at: twitter.com/CloudChatTech, discord.cloudchat.tech, cloudchatpodcast@gmail.com, linkedin.com/company/cloudchat
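    The "retry and fallback logic with flexible SKUs" pattern mentioned in the episode notes can be sketched roughly as below. This is a minimal illustration and not something taken from the episode: the exception names, the Placement type, and the create_vm callback are all hypothetical stand-ins for whatever a real provider SDK exposes. The sketch assumes a capacity error is worth retrying in another zone or SKU, while a quota error should surface immediately.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class InsufficientCapacity(Exception):
    """Hypothetical error: the provider has no room for this SKU in this zone."""


class QuotaExceeded(Exception):
    """Hypothetical error: the account's own limit was hit."""


@dataclass(frozen=True)
class Placement:
    zone: str
    sku: str


def provision_with_fallback(
    create_vm: Callable[[Placement], None],
    candidates: Iterable[Placement],
) -> Optional[Placement]:
    """Try each zone/SKU candidate in order; return the first that succeeds.

    Capacity errors move on to the next candidate. QuotaExceeded is not
    caught, so it propagates to the caller and fails fast.
    """
    for placement in candidates:
        try:
            create_vm(placement)
            return placement
        except InsufficientCapacity:
            continue  # this zone/SKU is full; try the next option
    return None  # every candidate was out of capacity


def fake_create_vm(placement: Placement) -> None:
    """Stand-in for a real SDK call: pretend zone-a is always full."""
    if placement.zone == "zone-a":
        raise InsufficientCapacity(f"{placement.sku} unavailable in {placement.zone}")


if __name__ == "__main__":
    order = [Placement("zone-a", "large"), Placement("zone-b", "large")]
    print(provision_with_fallback(fake_create_vm, order))
```

    Ordering the candidate list by preference (preferred SKU in every zone first, then alternate SKUs) keeps the fallback policy in data rather than in control flow, which is one reasonable design choice among several.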

    51 min
  2. Are Your Cloud Costs Too Damn High???

    6 OCT

    Are Your Cloud Costs Too Damn High???

    Episode 0026 - Are Your Cloud Costs Too Damn High???

    Cloud cost optimization is about designing systems that perform efficiently without wasting money. In this episode, Carl and Brandon break down how AWS, Azure, and Google Cloud help teams rightsize compute, manage storage tiers, and control networking costs. They talk through savings plans, spot instances, lifecycle management, and data transfer strategies that keep performance high and waste low.

    The discussion then moves into monitoring, automation, and FinOps culture, where budgets, policies, and shared accountability make optimization stick. They cover dashboards, tagging, auto-shutdown routines, and partner-led programs that unlock funding and deeper discounts. Real-world stories from enterprises and startups highlight one key truth: cost management is not a cleanup exercise, it is an ongoing habit that keeps cloud architectures both efficient and sustainable.

    Links:
      AWS: Well-Architected Framework – Cost Optimization pillar
      AWS: How to Use AWS Well-Architected with Trusted Advisor for Cost Optimization
      AWS: AWS Savings Plans
      AWS: Amazon EC2 Spot Instances
      Azure: Microsoft Cost Management + Billing (overview)
      Azure: Quickstart: Start using Cost Analysis
      Azure: Common cost analysis uses in Cost Management
      Azure: Control Azure spending and manage bills (learning path)
      GCP: Create, edit, or delete budgets and budget alerts (Cloud Billing)
      GCP: Cloud Billing Budget API overview
      GCP: Committed Use Discounts (Compute)
      GCP: Understand your bill – pricing & billing (Google Developers)

    59 min
  3. The Sound of Security

    8 SEP

    The Sound of Security

    Episode 0025 - The Sound of Security

    Security is more than a feature; it's a pillar of the Well-Architected Framework. In this episode, Carl and Brandon explore how AWS, Azure, and GCP approach security across identity and access, infrastructure defense, data protection, monitoring, governance, and the shared responsibility model. They compare tools and practices like IAM, RBAC, and conditional access; network firewalls, WAFs, and DDoS protection; encryption at rest and in transit; and incident detection and automated remediation.

    The conversation also dives into security testing, drift detection with IaC, compliance posture, and how policy enforcement differs across the big three. The episode closes with a reminder that cloud security is always shared and never finished.

    Links:
      AWS: Well-Architected Framework – Security pillar
      AWS: Identity and Access Management (IAM)
      AWS: AWS Shield and WAF
      AWS: Amazon Macie
      AWS: Amazon GuardDuty
      AWS: AWS Config
      Azure: Azure Well-Architected Framework – Security
      Azure: Microsoft Entra ID (Azure AD)
      Azure: Azure Role-Based Access Control (RBAC)
      Azure: Azure Key Vault
      Azure: Defender for Cloud
      Azure: Microsoft Sentinel
      Google Cloud: Google Cloud Architecture Framework – Security
      Google Cloud: IAM overview
      Google Cloud: Cloud Armor
      Google Cloud: Cloud KMS
      Google Cloud: Data Loss Prevention (DLP) API
      Google Cloud: Security Command Center
      Google Cloud: Assured Workloads

    1 h 7 min
  4. Operating Excellently

    4 AUG

    Operating Excellently

    Episode 0024 - Operating Excellently

    Operational excellence goes beyond uptime; it's about building and operating cloud systems with discipline, automation, and continuous improvement. Carl and Brandon break down what operational excellence really means, drawing a distinction between striving for perfection and building resilient, adaptable systems. They discuss how principles from AWS, Azure, and GCP converge around key practices like repeatable automation, structured change management, and process validation.

    The episode dives into real-world strategies for automation, incident readiness, and observability, including where and how to insert gates, use feature flags, and integrate infrastructure as code across cloud platforms. From avoiding certificate-induced outages to catching misconfigurations early, the key theme is consistency at scale. The discussion also emphasizes the cultural side: why shared ownership, retrospectives, and iterative postmortems matter just as much as tooling.

    Links:
      Ansible: Ansible community documentation
      AWS Docs: Amazon CloudWatch documentation overview
      AWS Docs: Operational Excellence whitepaper
      AWS Docs: Prescriptive Guidance: Operational Excellence
      AWS Docs: Using CloudWatch dashboards and alarms
      AWS Docs: Well-Architected Framework – Operational Excellence pillar
      AWS: Getting started with Amazon CloudWatch
      Google Cloud: Continuously improve and innovate
      Google Cloud: Manage incidents and problems
      Google Cloud: Operational Excellence pillar overview
      Google Cloud: Operational readiness & performance using CloudOps
      HashiCorp Docs: Terraform configuration language reference
      HashiCorp Docs: Terraform documentation
      Microsoft Docs: Automation of tasks with PowerShell in Power Platform
      Microsoft Learn: Azure Automation documentation
      Microsoft Learn: Azure Monitor documentation
      Microsoft Learn: Operational Excellence maturity model
      Microsoft Learn: Operational Excellence overview & quickstart
      Microsoft Learn: Operational Excellence principles (maturity model, practices)
      Microsoft Learn: PowerShell documentation
      PowerShell Universal Docs: PowerShell Universal platform guide
      Red Hat Docs: Ansible Automation Platform guide

    53 min
  5. Turbocharged: Mastering Performance in Cloud Architecture

    7 JUL

    Turbocharged: Mastering Performance in Cloud Architecture

    Episode 0023 - Turbocharged: Mastering Performance in Cloud Architecture

    Cloud performance is one of those terms that everyone agrees matters, but it often means different things depending on who you ask. Is it latency? Is it autoscaling? Is it picking the right SKU size?

    We cover the fundamentals of designing for performance in the cloud: how to select the right compute options, when to scale up or out, and what it takes to reduce latency across global workloads. We explore autoscaling strategies, observability tooling, cost tradeoffs, and real-world tuning stories, and we wrap with a cheat sheet of optimization tools across AWS, Azure, and GCP.

    Performance isn't just about throwing more cores or RAM at a problem. It's a set of design choices you make continuously, and those choices affect cost, scalability, and user experience. Use the principles and tools in your cloud provider to experiment, monitor, and improve.

    Producer's note: we encountered some technical issues during recording, so apologies for the audio quality in some parts. The content is still solid, and we hope you find it valuable!

    Links:
      AWS Trusted Advisor
      AWS Well-Architected Framework
      Azure Advisor
      Azure Well-Architected Framework – Performance
      Cloud Load Balancing (GCP)
      GCP Architecture Framework
      GCP Recommender
      PerfKit Benchmarker
      SLOs and SLIs (Google SRE Workbook)

    52 min
  6. What is Cloud Resiliency, Really?

    2 JUN

    What is Cloud Resiliency, Really?

    Episode 0022 - What is Cloud Resiliency, Really?

    Carl and Brandon break down the core concepts behind cloud resiliency, availability, reliability, and redundancy: how they relate, where they differ, and why understanding those distinctions is critical. Just because a service is "always on" doesn't mean it's resilient. They explore the difference between planned and unplanned outages, how graceful degradation works in practice, and why resiliency is measured by recovery, not just uptime. It's about what breaks, how you recover, and what keeps going when everything else doesn't.

    They also cover the architectural side: distributed systems, zone-aware deployments, chaos testing, and recovery strategies that go beyond documentation. With real-world failure scenarios and practical planning advice, this episode helps cloud teams build for failure before it happens.

    Links:
      AWS | Failover with AWS
      AWS | Well-Architected Framework: Reliability Pillar
      Azure | Reliability design principles
      Azure | Resiliency Overview
      Azure | Well-Architected Framework: Reliability Pillar
      Google Cloud | Architecture Framework: Reliability Pillar
      Google Cloud | Patterns for scalable and resilient apps
      Google Cloud | Site Reliability Engineering (SRE) Book
      principlesofchaos.org | Principles of Chaos Engineering

    56 min

Ratings and Reviews

5
out of 5
5 ratings

About

Conversations about building software and designing architecture in the cloud natively.