Screaming in the Cloud

Corey Quinn

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

  1. Is It Broken Everywhere or Just for Me with Omri Sass

    January 22

    When your website stops working at 3 AM, you need to answer one question fast: is it my code, or is a big cloud provider having problems? Omri Sass from Datadog explains updog.ai, a tool that monitors whether major services like AWS, Cloudflare, and others are actually working. Instead of asking people to report problems like Downdetector does, updog uses real data from thousands of computers to detect when services go down. Omri shares why this took 6 years to build, how they process massive amounts of data with machine learning, and why cloud providers have been strangely upset about these tools existing.

    About Omri: Omri Sass is a Director of Product Management at Datadog, where he leads and supports a team of 25+ product managers driving initiatives across Bits AI SRE, Data Observability, Service Management, and most recently, the launch of updog.ai. Outside of work, Omri is an avid sci-fi reader, a dedicated yoga practitioner, and happily outmatched by his cat.

    Show Highlights:
    (02:12) What is Updog and How Does It Work
    (03:38) Why Knowing If It's a Global Problem Matters
    (04:01) The Problem With Testing Every Endpoint Yourself
    (05:52) How Datadog Discovered EC2 Outages From Their Own Systems
    (10:38) When AWS Regions Go Down and Cascade Failures
    (13:13) What Happens When Services Rebuild Completely
    (16:29) The Most Important Learning During a 3 AM Incident
    (20:11) Why This Took So Long to Build
    (23:40) When Datadog Going Down Isn't Critical Path
    (25:22) How They Picked Which AWS Services to Monitor
    (27:07) What Comes Next for Updog
    (30:11) Where to Find Omri and Updog

    Links:
    Datadog: datadoghq.com
    Omri's LinkedIn: https://www.linkedin.com/in/omri-sass-65632a14/

    Sponsored by: duckbillhq.com

    31 minutes
  2. Solving the 20-Year S3 File System Problem with Hunter Leath

    January 20

    Hunter Leath, CEO of Archil, spent 8 years building Amazon's EFS file storage system, learning exactly why making cloud storage act like a hard drive always fails. Old programs need hard drives, but cloud storage doesn't work like hard drives, a problem that's existed for 20 years. Now Hunter's building Archil, which puts super-fast storage between programs and S3 so they can finally work together. Your programs think they're talking to a regular disk while your data lives safely in the cloud. Hunter explains how they're doing what others couldn't, why it costs less than Amazon's own solutions, and why file systems suddenly matter again in the AI era.

    Show Highlights:
    (01:37) What Archil Does and Why It Exists
    (02:26) Why Mounting S3 as a File System Has Always Failed
    (03:07) What Building EFS Taught Hunter
    (06:55) Using Fast SSDs as a Cache Layer for S3
    (09:45) Attaching Archil to Your Existing S3 Buckets
    (15:08) Why Archil Costs Less Than EBS When You Do the Math
    (17:56) What Happens If Amazon Builds This Feature
    (19:20) Competing With EBS Performance on GP3 Volumes
    (21:43) Raising $6.7 Million Without an AI Pitch
    (23:46) What Customers Get Wrong About Archil
    (28:07) Accessing Data Stored in Glacier Deep Archive
    (29:24) The Plan to Get Into the Linux Kernel
    (30:51) Where to Find Hunter

    About Hunter Leath: Hunter is the founder and CEO of Archil, which transforms S3 buckets into infinite, local file systems that provide instant access to massive data sets. Prior to Archil, Hunter spent ten years in the cloud storage industry, including 8 years building Amazon's Elastic File System product and one year on Netflix's core storage team.

    Links:
    Hunter Leath on LinkedIn: https://www.linkedin.com/in/hleath/
    Hunter Leath on X: https://x.com/jhleath/
    Archil's Website: https://archil.com

    Sponsored by: duckbillhq.com

    32 minutes
  3. Building Systems That Work Even When Everything Breaks with Ben Hartshorne

    January 15

    When AWS has a major outage, what actually happens behind the scenes? Ben Hartshorne, a principal engineer at Honeycomb, joins Corey Quinn to discuss a recent AWS outage and how Honeycomb kept customer data safe even when its systems couldn't fully work. Ben explains why building services that expect things to break is the only way to survive these outages. Ben also shares how Honeycomb used its own tools to cut its AWS Lambda costs in half by tracking five different things in a spreadsheet and making small changes to all of them.

    About Ben Hartshorne: Ben has spent much of his career setting up monitoring systems for startups and now is thrilled to help the industry see a better way. He is always eager to find the right graph to understand a service and will look for every excuse to include a whiteboard in the discussion.

    Show Highlights:
    (02:41) Two Stories About Cost Optimization
    (04:20) Cutting Lambda Costs by 50%
    (08:01) Surviving the AWS Outage
    (09:20) Preserving Customer Data During the Outage
    (13:08) Should You Leave AWS After an Outage?
    (15:09) Multi-Region Costs 10x More
    (18:10) Vendor Dependencies
    (22:06) How LaunchDarkly's SDK Handles Outages
    (24:40) Rate Limiting Yourself
    (29:00) How Much Instrumentation Is Too Much?
    (34:28) Where to Find Ben

    Links:
    LinkedIn: https://www.linkedin.com/in/benhartshorne/
    GitHub: https://github.com/maplebed

    Sponsored by: duckbillhq.com

    36 minutes
  4. Engineering Around Extreme S3 Scale with R. Tyler Croy

    January 13

    R. Tyler Croy, a principal engineer at Scribd, joins Corey Quinn to explain what happens when simple tasks cost $100,000. Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive. Normal solutions don't work anymore. Tyler shares how, with this much data, you can't just throw money at the problem; you have to engineer your way out.

    About R. Tyler: R. Tyler Croy leads infrastructure architecture at Scribd and has been an open source developer for over 14 years. His work spans the FreeBSD, Python, Ruby, Puppet, Jenkins, and Delta Lake communities. Under his leadership, Scribd's Infrastructure Engineering team built Delta Lake for Rust to support a wide variety of high-performance data processing systems. That experience led to Tyler developing the next big iteration of storage architecture to power the large-scale full-text compute challenges facing the organization.

    Show Highlights:
    01:48 Scribd's 18-Year History
    04:00 One Document Becomes Billions of Files
    05:47 When Normal Physics Stop Working
    08:02 Why S3 Metadata Costs Too Much
    10:50 How AI Made Old Documents Valuable
    13:30 From 100 Billion to 100 Million Objects
    15:05 The Curse of Retail Pricing
    19:17 How Data Scientists Create Growth
    21:18 De-Normalizing Data Problems
    25:29 Evolving Old Systems
    27:45 Billions Added Since Summer
    29:29 Underused S3 Features
    31:48 Where to Find Tyler

    Links:
    Scribd: https://tech.scribd.com
    Mastodon: https://hacky.town/@rtyler
    GitHub: https://github.com/rtyler

    Sponsored by: duckbillhq.com

    34 minutes
  5. Avery Pennarun on Tailscale's Evolution: From Mesh VPN to AI Security Gateway

    January 8

    Corey Quinn sits down with Avery Pennarun, co-founder and CEO of Tailscale, for a deep dive into how the company is reinventing networking for the modern era. From finally making VPNs behave the way they should to tackling AI security with zero-click authentication, Avery shares candid insights on building infrastructure people actually love using, and love talking about. They get into everything: surviving 100% year-over-year growth, why running on two tailnets at once is pure chaos, and how Tailscale makes "secure by default" feel effortless. Plus, they dig into why FreeBSD firewalls needed some tough love, the uncomfortable truth behind POCs, and even the surprisingly useful trick of turning your Apple TV into an exit node.

    About Avery: Avery Pennarun is the co-founder and CEO of Tailscale, where he's redefining secure networking with a simple, Zero Trust approach. A veteran software engineer with experience ranging from startups to Google, he's known for turning complex systems into approachable, user-friendly tools. His contributions to projects like wvdial, bup, and sshuttle reflect his belief that great technology should be both powerful and easy to use. With a mix of technical depth and dry humor, Avery shares insights on modern networking, internet evolution, and the realities of scaling a startup.

    Highlights:
    (00:00) Introduction to Tailscale and Security
    (00:52) Sponsorship and Personal Experiences
    (02:07) Technical Deep Dive into Tailscale
    (06:10) Challenges and Future of Tailscale
    (22:45) Building the Tailnet's API
    (23:54) Connecting Cloud Providers with Tailscale
    (25:22) Tailscale as a Security Solution
    (26:44) Innovations and Future of Tailscale

    Sponsored by: duckbillhq.com

    44 minutes
4.7 out of 5 (92 ratings)
