Smooth Scaling: System Design for High Traffic

Queue-it

Smooth Scaling: System Design for High Traffic focuses on all things scalability, reliability, and performance. Tune in for expert advice on how to scale systems, control costs, boost availability, optimize performance, and get the most out of your tech stack. Host José Quaresma is the VP Customer Experience & Solutions at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to the UK Home Office. He’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that remain reliable at scale.

  1. The Rise of Cloud Prem: Data Ownership in the Age of AI with Galileo's Sam Dhar

    2D AGO

    Sam Dhar has spent 14 years building infrastructure at Cisco, Amazon Alexa, and Adobe, and now works as Senior Staff Engineer and AI infrastructure leader at Galileo, the enterprise AI evaluation platform. In this episode of the Smooth Scaling Podcast, Sam walks host José Quaresma through Cloud Prem: deploying your full product stack inside the customer's own cloud environment instead of running it as SaaS. They get into why the model is resurging, and it mostly comes down to data. Enterprises want ownership and control, they carry a heavy compliance load (SOC 2, HIPAA, fully air-gapped government workloads), and they do not want a vendor sitting in the read path of their most sensitive data. Sam is candid about the hard parts: Cloud Prem can be a losing game on margins, deployment is the slowest thing in the pipeline, and every customer environment is different enough to reset the work. The conversation closes on AI: why it makes Cloud Prem urgent, the brutal GPU shortage, and why self-hosting an Opus-class model is still out of reach for most companies. A direct, practitioner-level look at where enterprise AI infrastructure is actually heading.

    (00:00) - Intro
    (01:08) - What Cloud Prem actually is
    (06:05) - Why Cloud Prem is resurging now
    (09:37) - Provider, vendor, customer: who owns what
    (11:10) - "Data is paramount": the compliance driver
    (14:29) - Shipping software into someone else's environment
    (19:57) - When Cloud Prem becomes a losing game
    (26:48) - Quality, and the control plane / data plane split
    (28:50) - Monitoring without seeing the customer's data
    (30:52) - Why Sam moved to AI evals
    (34:56) - Self-hosting LLMs and the GPU bottleneck
    (38:01) - Smaller runtimes, frontier-level intelligence
    (41:46) - Why AI makes Cloud Prem urgent
    (46:59) - Rapid fire: the one book to read
    (49:01) - "Business equals scalability"

    Satyam “Sam” Dhar is a Senior Staff Engineer and AI infrastructure leader at Galileo, where he designs systems that support real-time LLM workflows at enterprise scale. Prior to Galileo, he spent over six years at Adobe, contributing to AI-powered product development, evaluation platforms, and large-scale data systems. Earlier in his career at Amazon, he worked on high-throughput distributed services supporting Alexa’s device orchestration. Based in San Francisco, Sam’s insights and commentary have been featured in Newsweek, CNET, InfoQ, The New Stack, The Deep View, and others. He is also a Senior Member of the Institute of Electrical and Electronics Engineers.

    🔗 Connect
    Sam Dhar: https://www.linkedin.com/in/satyamdhar/
    Host José Quaresma: https://www.linkedin.com/in/jose-quaresma/

    This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it, your virtual waiting room partner. © Queue-it, 2026

    50 min
  2. A Decade of Kubernetes Lessons with Chris Nesbitt-Smith

    APR 28

    Chris Nesbitt-Smith has been running Kubernetes in production since version 0.4, long before pods, before managed services, and before most of today's tooling existed. In this episode of Smooth Scaling, he sits down with José Quaresma to share what a decade of running Kubernetes for UK government citizen-facing services has taught him about scaling critical infrastructure. The conversation covers why Kubernetes was the least bad option (and largely still is), why relying on autoscaling means you've already lost, and how Gregor Hohpe's "guardrails versus lane assist" metaphor changes the way you think about capacity. Chris makes the case for climbing the service stack (SaaS first, then Functions as a Service, then Platform as a Service, and only reluctantly managed Kubernetes) and explains why tech is one of the only industries that builds critical systems without ever pricing the risk of failure. A direct, opinionated look at what scaling really demands when the stakes are real and the budget isn't infinite.

    (00:01) - Intro
    (01:23) - Running Kubernetes since v0.4 in UK government
    (04:56) - Why pod rescheduling went full circle
    (09:07) - "Brave and stupid": running alpha-stage K8s in production
    (14:58) - Helm, DevOps as a job title, and cultural drift
    (16:43) - Climb the service stack (SaaS → FaaS → PaaS → managed K8s)
    (20:48) - Why engineers resist giving up control
    (23:52) - Tech doesn't quantify risk the way every other industry does
    (27:14) - If you're relying on autoscaling, it's already too late
    (28:30) - The KubeCon Black Friday game: dropping requests as strategy
    (33:03) - Graceful degradation up the stack
    (35:34) - "Mostly myths": data sovereignty vs. data residency
    (38:35) - Cloudflare and "deploy to the world" as a different paradigm
    (41:53) - The legacy debt sitting in UK public sector tech
    (46:03) - Rapid-fire: build advice, recommended reading, scalability is...

    Chris Nesbitt-Smith is an independent technology strategist, a Kubernetes instructor at LearnKube, and the architect of the UK Government's National Digital Exchange. Based in London, he works at the intersection of policy, security, and modern infrastructure, advising UK and international government departments, multinational enterprises, and large NGOs on cloud-native transformation and DevSecOps. A regular speaker at KubeCon, DevSecCon, and Open Source Summit, his talks span container security, policy-as-versioned-code, and platform engineering. He also writes regularly on his blog, Cloudy with Chance of Freefall.

    🔗 Connect
    Guest Chris Nesbitt-Smith: https://uk.linkedin.com/in/cnesbittsmith
    Host José Quaresma: https://www.linkedin.com/in/jose-quaresma/

    This podcast is researched by Joseph Thwaites, produced by Perseu Mandillo, and brought to you by Queue-it, your virtual waiting room partner. © Queue-it, 2026

    50 min
  3. Autoscaling in Production: When It Works and When It Doesn't with Zaigham Sarfaraz and Šimon Bučko

    APR 9

    In this episode, José Quaresma sits down with two Queue-it engineers, Zaigham Sarfaraz (Engineering Manager) and Šimon Bučko (Senior Software Engineer), to talk autoscaling in production. They cover the fundamentals of horizontal and vertical scaling, why stateless architecture matters for scaling out, and what happens when the metrics you're scaling on don't match your actual bottleneck. The conversation gets real when Zaigham shares a war story of autoscaling failing during an iPhone launch (one million users in one second) and how that experience reshaped how the team thinks about pre-scaling for extreme traffic. Šimon challenges the temptation to rely on default configurations and explains why the days you most need autoscaling to work are exactly the days it might not.

    (00:00) - Introduction
    (00:46) - What is autoscaling under the hood?
    (03:25) - Why scaling down matters too
    (03:53) - Horizontal vs. vertical scaling
    (05:43) - When vertical scaling is the better choice
    (07:56) - Stateful vs. stateless applications
    (10:42) - Solving state for horizontal scaling
    (12:14) - The role of load balancers
    (14:31) - Choosing the right scaling metrics
    (16:46) - Is serverless the silver bullet?
    (21:34) - The cost paradox of autoscaling
    (23:40) - iPhone launch: when the whole world wants to buy a product
    (25:56) - Why autoscaling isn't enough for non-linear traffic
    (30:37) - The fallacy of the rule of thumb
    (32:48) - Rapid fire questions

    Šimon Bučko is a Senior Software Engineer at Queue-it, working across full-stack development. He is an AWS Certified Solutions Architect Professional with strong experience in software architecture and bridging the gap between business needs and technical execution.

    Zaigham Sarfaraz is an Engineering Manager at Queue-it with over 15 years of experience across frontend, backend, infrastructure, and people leadership. He is an AWS Certified Cloud Practitioner and plays a key role in ensuring stable system operations while contributing to the continuous improvement of Queue-it's backend architecture.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo. © Queue-it, 2026

    36 min
  4. Observability as a Product: Building Platforms Engineers Actually Use with Iris Dyrmishi

    MAR 17

    In this episode, José Quaresma speaks with Iris Dyrmishi, Senior Observability Engineer at Miro, about building an observability platform that hundreds of engineers actually trust and use. Iris explains how her team treats observability as an internal product, walks through Miro's tracing migration from Jaeger and Zipkin to OpenTelemetry with zero disruption, and shares how teams now use traces proactively to find bottlenecks before they become outages. The conversation also covers the honest downsides (alert noise, dashboard sprawl, and the cost of observability), including a recent example using eBPF and Grafana Beyla to uncover hidden networking expenses that transformed Miro's cloud bill.

    (00:00) - Intro
    (00:59) - Building Observability as a Product at Miro
    (04:08) - Migrating to OpenTelemetry
    (09:21) - Industry Maturity and the Business Case
    (12:02) - From Reactive to Proactive Observability
    (14:34) - Logs vs. Tracing Explained
    (18:04) - Team Ownership, AI, and Freedom
    (24:38) - The Downsides and Costs of Observability
    (29:58) - Rapid Fire and Close

    Iris Dyrmishi is a Senior Observability Engineer at Miro, where she builds and maintains the company's observability platform. She started as a backend engineer before moving into SRE roles at Worten Portugal and Farfetch, where she developed her specialty in tracing and drove OpenTelemetry migrations across large engineering organisations without disrupting existing workflows. A CNCF Ambassador, co-organiser of Kubernetes Community Days Porto, and active voice in the observability community, she writes extensively about practical adoption challenges and has spoken at KubeCon EU and on the o11ycast podcast. Her guiding philosophy: observability is a team sport.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo. © Queue-it, 2026

    33 min
  5. Online Traffic in the Age of Agentic AI with Hans Skovgaard

    FEB 24

    In this episode of Smooth Scaling, José Quaresma speaks with Hans Skovgaard, Chief Technology and Product Officer at Queue-it, about a shift that is already underway and accelerating fast: the internet now carries more automated bot traffic than human traffic, and agentic AI is about to make that gap much wider. Hans explains why the old model of "bots versus humans" is fundamentally broken, and why the real question is no longer who is visiting your site, but what their intent is. The conversation covers why autoscaling can no longer protect against the extreme traffic bursts that AI agents will generate, how to make bot attacks economically unviable, and what a future of AI agents buying concert tickets on your behalf actually looks like in practice. Hans also unpacks the evolving landscape of digital identity (from payment certificates to the EU Digital Identity Wallet) and what it means to build systems that can tell a genuine buyer from a scalper running 100,000 simultaneous requests.

    (00:00) - Introduction
    (01:19) - The Internet Just Changed: More Bots Than Humans Online
    (03:51) - The New Threat Isn't Bots vs. Humans. It's Intent.
    (06:06) - Why Autoscaling Can't Save You in the Agentic Age
    (09:00) - Making Attacks Expensive: The Economics of Bot Defence
    (11:02) - What Does the Future Actually Look Like? The AI Agent Buying Your Tickets
    (14:30) - The Next Generation of Challenges: Easy for Humans, Costly for Bots
    (18:53) - The Deeper Problem: Volatility Is Going Out of Control
    (20:24) - Can We Prove You're Human? Identity, Trust & the EU Wallet
    (25:45) - Rapid Fire
    (30:07) - Outro

    Hans J. Skovgaard is Chief Technology and Product Officer at Queue-it, the Copenhagen-founded SaaS company whose virtual waiting room technology helps the world's biggest brands manage traffic surges and prevent bot abuse during high-demand online events. With over two decades of experience leading engineering and product organisations in Nordic software companies, Hans has built a career at the intersection of deep technical expertise and strategic leadership. Before Queue-it, he served as CTPO at Penneo, a Nasdaq Copenhagen-listed RegTech company, and as CTO and VP of R&D at Capture One, where he led the company's spin-off from Phase One, launched its first SaaS product, and shipped Capture One for iPad. Earlier, he held engineering leadership roles at Milestone Systems and Microsoft. He holds an M.Sc. in Artificial Intelligence from the University of Edinburgh and an MBA from IMD, and has published research at AAAI, IEEE, and ACM.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo. © Queue-it, 2026

    31 min
  6. Running High-Traffic Product Launches at Build-A-Bear with Art Huggard

    FEB 3

    In this episode of Smooth Scaling, José Quaresma sits down with Art Huggard, former VP of E-Commerce at Build-A-Bear, who transformed the company's online presence from a crashing website to a $70 million business over eight years. Art shares his unconventional path from chemical engineering to e-commerce leadership at Bass Pro Shops, Hudson's Bay, and Build-A-Bear. He reveals how the company went from website crashes every hour during the 2016 holiday season to successfully managing viral product launches like Baby Yoda, which sold out in four hours. Art discusses Queue-it's virtual waiting room for handling extreme traffic spikes, real-time system tuning during flash sales, and the importance of balancing technical infrastructure with guest experience. The conversation covers cloud scalability challenges, order management bottlenecks in Salesforce Commerce Cloud, and what it takes to handle 300+ orders per minute. The episode illustrates how preparation and cross-industry lessons can turn unpredictable demand into business success.

    (00:00) - Welcome to the Smooth Scaling Podcast
    (01:03) - From Chemical Engineer to Ecommerce Leader
    (05:01) - How Early Ecommerce Got the Experience Wrong
    (07:30) - Walking Into a Website That Was Crashing
    (09:56) - Why Build-A-Bear Isn't Just a Toy Company
    (12:03) - Using AI to Remove Bottlenecks and Ship Faster
    (14:43) - COVID, Baby Yoda, and Sudden Demand Spikes
    (16:05) - What 300 Orders a Minute Really Looks Like
    (22:38) - Finding the Real Bottlenecks in the Stack
    (25:29) - From Ammunition to Baby Yoda: Cross-Industry Lessons
    (27:57) - Book Recommendations and Professional Advice
    (30:32) - What Scalability Really Means

    Art Huggard is a leading expert in digital commerce. He has helped many well-known brands, such as Build-A-Bear, Bass Pro Shops, Tracker Boats, Hudson's Bay, and others, move from chaos to high growth. He has a keen understanding of the entire customer ecosystem, including web, order management, CRM, loyalty, and digital marketing. Known for building high-performance teams, Art has been an excellent mentor to many at the companies where he has worked. Most recently, Art formed Gateway-Commerce (www.gateway-commerce.com), where he provides fractional consulting to companies looking to make significant improvements to how they serve their guests.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo. © Queue-it, 2026

    32 min
  7. Database Scaling at Intercom: Aurora, PlanetScale & Incident Response with Engineering Director Ryan Sherlock

    JAN 13

    In this episode of Smooth Scaling, José Quaresma talks with Ryan Sherlock, Director of Engineering at Intercom, about the realities of scaling databases in a fast-growing SaaS product. Ryan shares Intercom’s journey from a single MySQL database through Aurora, proxies, and per-customer scaling patterns, and what eventually pushed the team toward PlanetScale. The conversation also explores Intercom’s heartbeat-based approach to incident detection and response, focusing on customer impact rather than infrastructure metrics.

    (00:00) - Intro and episode overview
    (01:14) - Early scaling pains: systems going down every day
    (02:56) - Database evolution: MySQL, caching, Aurora, and ProxySQL
    (07:36) - Tens of billions of rows and the table Intercom couldn’t migrate
    (09:07) - Intercom’s multi-region architecture and the EU region
    (10:59) - Why Intercom moved from Aurora to PlanetScale (Vitess)
    (15:12) - PlanetScale in practice: shards, VTGate, and zero-downtime upgrades
    (22:39) - Heartbeat metrics and automated incident response
    (30:03) - AWS outage case study: DynamoDB failure and real-time recovery
    (34:17) - Incident mitigation lessons: “I’m now a web box” and VTGate limits
    (41:40) - Rapid fire questions: books, career advice, and scalability mindset

    Ryan Sherlock is Senior Director of Engineering at Intercom in Dublin, where he leads the core technologies and infrastructure groups that power Intercom’s AI-first customer service platform. Through talks and writing on the Intercom engineering blog, he shares practical playbooks on scaling infrastructure and engineering enablement, running high-leverage incident response, and using heartbeat metrics to tie reliability directly to real customer outcomes rather than just server graphs. Outside Intercom, he serves on the board of the Rails Foundation, helping steward the future of the Ruby on Rails ecosystem. Before moving into tech leadership, Ryan spent several years as a professional cyclist, an experience he wrote about in “Why you should have skin in the engineering game” and that still shapes how he thinks about risk, ownership, and reliability in software.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo. © Queue-it, 2026

    46 min
  8. Infrastructure as the Product: Designing Data-Heavy Systems with Product VP Maria Petrova

    12/16/2025

    Infrastructure is often treated as a backend concern, but in practice it shapes how users experience a product. In this episode of Smooth Scaling, Product VP Maria Petrova explores what it means when infrastructure becomes the product, looking at real-world, data-heavy systems where decisions around compute, data resolution, scheduling, regions, and cost directly impact scalability and user experience. The conversation dives into scaling beyond the MVP, balancing accuracy with performance, and why both engineers and product managers need to think carefully about infrastructure trade-offs when operating at scale.

    (00:00) - Welcome to the Smooth Scaling Podcast
    (01:02) - Infrastructure Is the Product (And Why It Shapes UX)
    (05:34) - Performance, Databases, and Why Compute Matters Again
    (07:27) - How TWAICE Scales Battery Analytics With Sensor Data
    (10:44) - What TWAICE Optimizes (And What It Doesn’t)
    (14:38) - What Product Managers Must Understand About Infrastructure
    (20:32) - Supermetrics: Multi-Cloud, Compliance, and Customer Expectations
    (25:01) - Cutting Compute Costs at TWAICE Without Losing Accuracy
    (32:05) - Principles for Building Scalable Data Products
    (34:48) - Rapid Fire: Books, Advice, and What Scalability Means

    Maria Petrova is a product leader known for scaling data-driven platforms and building high-performing product teams. With over a decade of experience across AdTech, eCommerce, and green tech, she’s led teams at Supermetrics, Zalando, Smartly.io, and now TWAICE, where she’s shaping AI-powered energy intelligence solutions. Maria is also the founder of Value Lab, a consultancy that embeds expert product talent into growing teams. She’s passionate about building products that truly solve customer problems at scale.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo. © Queue-it, 2025

    40 min
