Smooth Scaling: System Design for High Traffic

Queue-it

Smooth Scaling: System Design for High Traffic focuses on all things scalability, reliability, and performance. Tune in for expert advice on how to scale systems, control costs, boost availability, optimize performance, and get the most out of your tech stack. Host José Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to Home Office U.K. He’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that remain reliable at scale.

  1. Hype Event Protection: How Akamai & Queue-it Stop Bots at Scale, with Ilia Bromberg & Martin Larsen

    1 DAY AGO

    In this episode of the Smooth Scaling Podcast, Ilia Bromberg (Akamai) and Martin Larsen (Queue-it) explore the evolution of bots, the growing complexity of detecting them, and the real-world impact on hype events like product drops and ticket sales. They introduce Hype Event Protection, a new joint solution from Queue-it and Akamai, designed to level the playing field for genuine users. The discussion covers technical approaches to bot mitigation, performance optimization, and the importance of layered defenses for high-demand online events.

    (00:00) - Welcome & Guest Introductions
    (01:01) - Why Bots Are a Problem
    (04:06) - Good Bots vs. Bad Bots
    (07:49) - How Bots Have Evolved
    (11:42) - Bots Move Into E-commerce
    (13:10) - Residential IPs and Hidden Networks
    (15:42) - What Is a Hype Event?
    (18:46) - Why Queue-it and Akamai Partnered
    (22:20) - Fairness, Trust & Brand Reputation
    (28:36) - How Hype Event Protection Works
    (35:59) - Preparing for Big Events
    (44:34) - Real Results from Beta Customers
    (46:42) - How to Get Started & Wrap-Up

    Ilia Bromberg is a Principal Solutions Engineer at Akamai Technologies with nearly 30 years of experience helping organizations secure and scale their digital environments. A seasoned leader in web and application security, he has been named Akamai’s Solutions Engineer of the Year and has earned multiple hackathon and innovation awards. He holds CISSP, CCSP, and GWAPT certifications and specializes in WAFs, bot management, API security, DNS, and zero trust technologies.

    Martin Larsen is a Distinguished Product Architect at Queue-it. Starting as a software developer, Martin was one of the company’s first employees. He played an instrumental role in building the foundations of Queue-it and is heavily involved in activities including the design, architecture, testing, and deployment of the virtual waiting room, as well as defining and executing on product vision.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    © Queue-it, 2025

    49 min
  2. Handling 200k Requests Per Second Surges with Zalando SRE Manager Johannes Boumans

    23 SEPT

    In this episode, Johannes Boumans, Engineering Manager in Zalando’s SRE team, shares how Lounge by Zalando handles daily surges of up to 200,000 requests per second. He discusses the shift from monoliths to microservices, the “you build it, you run it” model, SRE champions, and the trade-offs behind reliability, fairness, and cost. From bot defense to chaos engineering, it’s a deep dive into scaling one of Europe’s largest e-commerce platforms.

    Johannes Boumans is an Engineering Manager in the SRE organization at Zalando, where he leads reliability efforts for Zalando Lounge, the company’s off-price shopping destination. Over nearly 10 years at Zalando, Johannes has grown from product support into SRE leadership, where he now supports 25 engineering teams in building resilient, fair, and scalable systems. Johannes is passionate about the “you build it, you run it” philosophy and champions practices like chaos engineering, predictive scaling, and bot defense to keep systems reliable.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    Chapters
    00:00 – Intro
    01:28 – Zalando: Europe's leading fashion destination
    02:42 – The company’s rapid tech evolution since 2008
    03:41 – From one team to 25: Johannes’ journey
    05:48 – How the SRE champions model works
    08:00 – What reliability really means at Zalando
    09:27 – From monolith to full DevOps accountability
    11:32 – What makes Lounge by Zalando unique
    12:50 – Dealing with massive daily traffic spikes
    14:05 – Predictive scaling and real-time cost control
    17:15 – First-come, first-served: fairness at scale
    22:11 – Solving the challenges of limited inventory
    25:09 – Combating bots with layered protections
    27:12 – Trade-offs: performance vs. experience
    29:38 – Why Lounge doesn’t have a search function
    31:17 – Advice for engineering managers facing traffic surges
    34:25 – Chaos testing in production, including turning off zones
    35:53 – Scaling advice for daily vs. seasonal peaks
    37:55 – Evaluating virtual waiting rooms for fairness
    39:30 – Book & mindset recommendations for engineers
    41:43 – Scalability is… balance, cost, and confidence

    © Queue-it, 2025

    43 min
  3. Special Episode: The Digital Experiences that Build & Break Trust, with CMO Jillian Als

    9 SEPT

    In this episode of Smooth Scaling, Jillian Als, CMO at Queue-it, unpacks The Age of Online Trust report. She explores why reliability is the license to operate, how trust is earned in drops but lost in buckets, and what 1,000 consumers revealed about their expectations for fairness, transparency, and resilient digital experiences. For technical leaders, the findings confirm that every percentage point of uptime and performance directly impacts trust, loyalty, and long-term business growth.

    Jillian Als is Chief Marketing Officer at Queue-it, where she leads global marketing efforts to help businesses earn and protect online trust for billions of digital visitors each year. With 15+ years in B2B SaaS marketing, she’s known for her expertise in go-to-market strategy, demand generation, and brand development, as well as her passion for building happy, high-performing teams. A frequent speaker at industry podcasts and events like SaaSiest2025 and Funnel Vision, Jillian brings a deep understanding of consumer behavior and the link between digital performance, transparency, and loyalty.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    (00:00) - Intro: Trust, Scale & a Special Guest
    (01:06) - The Meaning of Reliability
    (03:08) - Exploring Technical and Commercial Views on Reliability
    (04:53) - A Deep Dive Into the “Age of Online Trust” Report
    (07:13) - The Global Survey Methodology
    (08:13) - The Definition of Online Trust
    (10:28) - The Ongoing Importance of Trust Beyond Peak Events
    (12:33) - Key Findings: How Bad Experiences Erode Trust
    (13:54) - Gen Z’s Higher Trust Expectations
    (16:26) - Preference for Smooth Experiences Over Speed
    (18:45) - The Psychology Behind Informed Waiting
    (21:11) - How Trust Fuels Loyalty, Spend, and Advocacy
    (26:31) - Technical Takeaways From the Report
    (29:06) - Rapid Fire Insights on Scalability, Books, and Career Advice

    © Queue-it, 2025

    32 min
  4. Lessons from Supporting Hundreds of Peak Traffic Events with Praveen Thakur

    26 AUG

    In this episode of Smooth Scaling, José is joined by Praveen Thakur, Queue-it’s Head of Technical Engagement, APAC, who explains what it takes to prepare for and succeed during high-traffic online events. From coordinating mission control rooms to navigating bot threats and post-event analysis, Praveen shares lessons learned from years of hands-on experience with retailers, ticketing providers, and government organizations. The discussion offers a behind-the-scenes look at the technical and organizational decisions that shape successful peak traffic events.

    Praveen Thakur is Head of Technical Engagement, APAC at Queue-it, where he works closely with teams across the region on technical integration, performance readiness, and post-event analysis. With over 13 years of experience spanning product engineering, consulting, and in-house IT roles, he brings deep expertise in cloud, DevOps, and distributed systems. He’s particularly focused on aligning technology decisions with business goals and building resilient, outcome-oriented teams.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    (00:00) - Welcome to the Smooth Scaling Podcast
    (01:00) - What is technical engagement at Queue-it?
    (04:03) - How Praveen became head of technical engagement
    (07:09) - Preparing retailers for peak traffic events
    (15:11) - Scheduled events vs. 24/7 peak protection
    (18:21) - Why you might restrict traffic intentionally
    (20:48) - Inside a mission control “war room”
    (26:50) - Post-event evaluation & common mistakes
    (28:14) - Covering the full user journey
    (30:10) - How the bot landscape has changed
    (32:22) - There are no bullet proof solutions against bots
    (34:07) - Rapid-fire questions with Praveen Thakur
    (37:42) - Wrapping up the episode

    © Queue-it, 2025

    38 min
  5. Scaling Ticketing Systems for traffic bursts & bots with Line-Up's Barnaby Clark

    12 AUG

    In this episode, Barnaby Clark, CEO of Line-Up, reveals the engineering practices behind resilient ticketing systems that handle real-world demand. Barnaby explains how Line-Up rebuilt their platform from the ground up to meet the complex needs of live events, from unique inventory structures and API scaling to predictive load handling and third-party integrations. Barnaby dives into the evolving threat of bots, the nuances of asynchronous payments, and how to design for bursts in traffic without breaking the customer experience. It’s a practical look at infrastructure, performance, and the unpredictable nature of ticketing at scale.

    Barnaby Clark is CEO and Co-Founder at Line-Up. He has 12 years of experience designing innovative software products across diverse stacks, scaling and guiding cross-functional teams, building high-growth e-commerce platforms, and overcoming complex software challenges. Line-Up was shortlisted for Best Technology Provider at the British Media Awards, won Seedcamp London and has secured multiple funding rounds from angel investors, institutional backers, and corporate entities. Prior to Line-Up, Barnaby spent 5 years working on Mergers & Acquisitions and private capital fundraising efforts within the technology sector.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    (00:00) - Introduction: Designing Scalable Ticketing Systems
    (00:45) - Barnaby’s Journey to Founding Line-Up
    (02:15) - Unique Scaling Challenges in Ticket Sales
    (05:08) - Breaking Down the Ticket Purchase Journey
    (06:41) - Read vs. Write Operations in Scalability
    (09:53) - Handling Sudden Traffic Spikes
    (12:11) - Predictive Scaling and Early User Signals
    (17:26) - Integrating Third-Party Ticket Sales APIs
    (20:02) - Payment Providers and Asynchronous Challenges
    (26:09) - Disaster Recovery and System Protection
    (30:35) - Tackling Bots and Fraud in Ticketing
    (35:04) - Rapid-Fire Insights & Recommendations

    © Queue-it, 2025

    40 min
  6. Multi-Cloud & Hybrid Cloud Strategies & Considerations with Usman Mir

    29 JUL

    In this episode, Usman Mir, Senior Engineering Manager at Queue-it, shares insights into how to evaluate and implement hybrid and multi-cloud strategies. Usman draws on his 10+ years of experience in automation and cloud infrastructure, diving into real-world definitions, legal and cost considerations, vendor lock-in risks, and the growing need for cloud-native, containerized setups. From hot-hot setups to data sovereignty, Usman breaks down the trade-offs and the practical steps for moving toward a modern cloud setup.

    Usman Mir is an experienced IT leader with 15+ years across software development, DevOps, and cloud architecture. Now Senior Engineering Manager at Queue-it, he’s led teams and delivered solutions in industries from retail to telecom. Usman has built ecommerce platforms, managed hybrid and multi-cloud environments, and advised on automation and governance, always bridging the gap between business and tech.

    Host José Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to Home Office U.K. Each week, he’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that perform at scale.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    (00:00) - Welcome to Smooth Scaling with Usman Mir
    (01:52) - From Commodore 64 to Cloud Architect
    (02:56) - Public, Private, Hybrid & Multi‑Cloud Explained
    (06:37) - The Latest Multi‑Cloud Trends
    (10:29) - Choosing Between Cold & Hot‑Hot Strategies
    (14:03) - Why Portability Matters in Multi‑Cloud
    (19:18) - Do You Need DevOps for Multi‑Cloud?
    (21:49) - How to Plan Your Multi‑Cloud Migration
    (24:32) - Avoiding Vendor Lock‑In in the Cloud
    (27:50) - Migration Stories & Key Takeaways

    © Queue-it, 2025

    39 min
  7. From Chaos to Reliability with Gremlin CEO Kolton Andrus

    1 JUL

    In this episode, Kolton Andrus, Founder and CEO of Gremlin, dives deep into all things chaos engineering and reliability testing. Kolton shares his journey from leading reliability efforts at Amazon and Netflix to founding Gremlin, an enterprise reliability platform. He and José discuss what it really takes to build resilient systems, the cultural shift required to prioritize reliability, and how Gremlin is working to reshape accountability in engineering teams. From testing dependencies to aligning incentives, this conversation is packed with real-world insights into scaling systems (and teams) that don't break under pressure.

    Kolton Andrus is the CEO and founder of Gremlin. Previously, he focused on building and operating reliable systems at Netflix and Amazon. At both companies he operated systems at scale, managed company-wide incidents, and helped build out their respective reliability programs and toolsets.

    Host José Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to Home Office U.K. Each week, he’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that perform at scale.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    (00:00) - Intro & Guest: Kolton Andrus
    (04:20) - Founding Gremlin (2016)
    (08:47) - Rewarding Invisible Reliability Work
    (12:27) - Proving Reliability’s Business Value
    (15:21) - Rethinking the “Chaos Engineering” Label
    (20:18) - Chaos Testing to Reliability Scores
    (24:25) - Spreading Reliability Culture Across Teams
    (28:50) - Safe, Incremental Failure Testing in Prod
    (33:30) - Load + Fault Testing for Peak Traffic
    (36:30) - AI’s Opportunities & Risks for Ops
    (39:30) - Defining Scalability as Elasticity
    (44:18) - Key Takeaways & Farewell

    © Queue-it, 2025

    45 min
  8. The Cost of Scaling for Peak Demand with Head of Engineering Martin Jensen

    17 JUN

    In this episode, Martin Jensen, Head of Engineering, breaks down the true cost of scaling for peak demand. He explains the limits of autoscaling, when pre-scaling makes sense, and how tools like virtual waiting rooms are used to handle sudden spikes in traffic. Martin also discusses system bottlenecks, performance trade-offs, and practical strategies for staying in control during high-demand moments like ticket sales, product drops, and popular registrations.

    This episode’s guest is Martin Jensen. Martin Nørskov Jensen is an experienced engineering leader and Head of Engineering at Queue-it. With 15+ years in software development and 5+ years in leadership, he builds agile, high-performing teams focused on collaboration, trust, and engineering excellence.

    Host José Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to Home Office U.K. Each week, he’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that perform at scale.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.

    (00:00) - Intro
    (00:58) - Meet Guest Martin Jensen
    (02:10) - What exactly *is* peak demand?
    (03:20) - Real-world peak-traffic examples
    (05:39) - Auto- vs pre-scaling strategies
    (07:09) - Scaling limits & hidden costs
    (10:11) - Virtual waiting rooms explained
    (13:33) - How queues + scaling fit together
    (18:45) - CDNs, caches & other toolkits
    (26:08) - Key take-aways & pro tips
    (29:32) - Outro

    © Queue-it, 2025

    30 min
