Alexa's Input (AI)

Alexa Griffith

Alexa’s Input is a podcast about how technology actually moves forward. Hosted by Alexa Griffith, it features conversations with engineers, founders, CEOs, and leaders shaping today’s tech landscape. Each episode digs into the decisions behind the systems: what’s being built, what’s being questioned, and why it matters now. Opinions are my own.

Linktree: https://linktr.ee/alexagriffith
Website: https://alexagriffith.com/
LinkedIn: https://www.linkedin.com/in/alexa-griffith/
X: @lexal0u

  1. 4d ago

    How vLLM and llm-d Changed AI Inference with Rob Shaw

    In this episode of Alexa’s Input (AI), I sat down with Rob Shaw from Red Hat to talk about how AI inference evolved from a simple model serving problem into a large-scale distributed systems problem. We explored the infrastructure shifts behind modern LLM serving, including how vLLM and PagedAttention changed the economics and efficiency of inference, why KV cache management became one of the most important bottlenecks in production AI systems, and how orchestration layers like llm-d are emerging to coordinate distributed inference.

    We also discuss:
    • How LLM inference differs from traditional model serving runtimes
    • KV cache, prefix caching, and cache-aware routing
    • Why throughput and latency became major infrastructure challenges
    • Long-context agents and repeated inference calls
    • Distributed inference on Kubernetes
    • Intelligent routing, flow control, and load balancing
    • Prefill/decode disaggregation
    • Enterprise AI deployment realities

    vLLM has become one of the most important open-source projects in AI infrastructure, and llm-d represents a newer shift toward treating inference as a coordinated distributed system rather than just a single runtime problem. If you want to better understand the systems layer beneath modern AI applications, this episode is a deep dive into where inference infrastructure is heading next.

    General Podcast Links
    Watch: https://www.youtube.com/@alexa_griffith
    Read: https://alexasinput.substack.com/
    Listen: https://creators.spotify.com/pod/profile/alexagriffith/
    More: https://linktr.ee/alexagriffith

    Learn more about the host at:
    Website: https://alexagriffith.com/
    LinkedIn: https://www.linkedin.com/in/alexa-griffith/

    Find out more about the guest at:
    LinkedIn: https://www.linkedin.com/in/robert-shaw-1a01399a/
    Red Hat Articles: https://developers.redhat.com/author/robert-shaw
    GitHub: https://github.com/robertgshaw2-redhat

    Resources
    vLLM Website: https://vllm.ai/
    vLLM GitHub Repository: https://github.com/vllm-project/vllm
    llm-d Website: https://llm-d.ai/
    llm-d GitHub Repository: https://github.com/llm-d/llm-d

    Keywords
    AI inference, vLLM, llm-d, distributed inference, GPU optimization, open source AI, Kubernetes, multi-cluster deployment, AI infrastructure, enterprise AI, model optimization, speculative decoding, mixture of experts, AI deployment, performance tuning, AI systems, neural network scaling

    Key Topics
    • Evolution of vLLM and llm-d
    • Distributed inference and routing
    • GPU utilization and performance optimization
    • Open source AI infrastructure
    • Enterprise deployment challenges and solutions
    • Standardization in Kubernetes for NIC exposure
    • Performance optimizations: quantization and speculative decoding
    • Mixture of experts architecture and parallelism strategies
    • Flow control and request scheduling in AI systems
    • Emerging hardware for AI inference, including the Cerebras processor
    • Reinforcement learning and AI system support
    • Modular architecture of vLLM and ecosystem projects

    1h 43m
  2. May 24

    Intelligence Per Watt with Emilio Andere

    On this episode of Alexa’s Input (AI), I sit down with Emilio Andere, co-founder and CEO of Wafer, to talk about the future of AI infrastructure, inference optimization, and the economics driving the AI compute race.

    We discuss:
    • Why “intelligence per watt” may become one of the defining metrics of the AI era
    • The current GPU and accelerator landscape across NVIDIA, AMD, TPUs, and emerging hardware startups
    • Why software optimization is becoming just as important as hardware itself
    • Inference optimization strategies
    • Why AI infrastructure companies are racing up the stack
    • What it’s actually like building an AI infrastructure startup today
    • And more!

    Emilio also shares lessons from founding Wafer, thoughts on the future of open-source AI infrastructure, and why he believes optimizing intelligence itself could become one of the most important engineering problems.

    Find out more about the guest at:
    LinkedIn: https://www.linkedin.com/in/emi-andere/
    Wafer Website: https://www.wafer.ai/
    Wafer AI / Y Combinator Article: https://www.ycombinator.com/companies/wafer

    Chapters
    00:00 Exploring AI Conversations and Recent Podcasts
    02:14 Intelligence per Watt: A New Metric for AI
    07:35 The Manifesto: Efficiency in Civilization
    12:40 Founding Wafer: The Journey Begins
    18:08 The GPU Hardware Landscape and Market Dynamics
    23:07 AMD's Growing Presence in the GPU Market
    24:07 Emerging Competitors in the AI Hardware Space
    26:04 Comparing TPUs and GPUs
    27:21 Acquisition and Availability of TPUs
    28:33 Navigating the GPU Marketplace
    30:05 Understanding Neo Cloud Economics
    33:30 The AI Bubble Debate
    36:25 Optimizing AI Models for Performance
    44:46 Bottlenecks in AI Model Performance
    48:08 Future Directions in AI Hardware Optimization
    54:39 Balancing Speed and Cost in AI Performance
    56:54 Kernel Arena: Benchmarking AI Performance
    01:03:45 Lessons from Founding: Sales and Emotional Resilience
    01:07:38 The Future of AI: Trends and Predictions
    01:13:03 Outro

    Keywords
    AI hardware, inference optimization, intelligence per watt, GPU market, AI infrastructure, Wafer, AI bubble, TPU, GPU bottleneck, AI efficiency, AI optimization, large language models, quantization, speculative decoding, benchmarking, model training, AI startups

    1h 14m
  3. May 17

    Building Reliable Systems at Bloomberg with Sal Furino

    In this episode of Alexa’s Input (AI), I sit down with Sal Furino to explore the hidden engineering work that keeps modern systems reliable. We break down what Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets actually mean in practice, why reliability is as much a cultural problem as a technical one, and how teams can better measure real user experience instead of just infrastructure health.

    Sal also explains reliability engineering and the challenges of reliability at scale, like:
    • Why latency and correctness become harder to measure with GenAI
    • The difference between a bad incident and a fundamentally bad system
    • How observability and telemetry shape modern engineering organizations
    • Why most teams focus too much on infrastructure metrics and not enough on user happiness
    • Why “the best systems are the ones nobody notices.”

    If you work in AI infrastructure, distributed systems, platform engineering, observability, or SRE, this episode is a must-listen!

    SRECon Talk
    Dashboards & Dragons: Reliability Magic for AI Platforms by Alexa Griffith and Sal Furino: https://youtu.be/aWMB_7ksbkc?si=S49nPyAl_hCUIH7y

    Find out more about the guest at:
    LinkedIn: https://www.linkedin.com/in/salvatore-furino/
    Rootly Interview: https://rootly.com/humans-of-reliability/salvatore-furino
    Reliability at Scale Talk: https://youtu.be/J-VrU5JHPlk?si=8aV8acy57NWX30KA
    Bloomberg Careers: https://bloomberg.avature.net/careers/SearchJobs

    Chapters
    00:00 - Introduction: Reliability in a world reshaped by generative AI
    02:22 - The importance of seamless, background system design
    04:41 - Becoming a Customer Reliability Engineer at Bloomberg
    05:17 - Clarifying the CRE role and its customer focus
    08:02 - The importance of observability and high-scale performance in finance
    09:00 - Balancing technical and cultural aspects of reliability
    10:19 - Coaching teams to be proactive using error budgets and SLIs
    12:21 - The social-technical system: People, processes, and tools
    13:06 - Mediation of differing opinions on reliability practices
    15:06 - The nuanced approach to alerting and incident response
    17:08 - The significance of tiered SLOs and the concept of error budgets
    21:08 - Using signals like latency, correctness, availability, saturation in system measurement
    22:53 - The impact of service level "nines" on system design and resilience
    28:00 - Handling non-determinism and trust in AI responses
    33:01 - Error budgets and their role in managing deployments
    34:10 - The challenge of achieving five nines and data durability considerations
    40:03 - Adapting SLOs for GenAI systems: core principles remain intact
    42:23 - Measuring non-deterministic AI responses and quality proxies
    44:41 - The ongoing importance of reliability even in AI/ML contexts
    47:25 - Reacting to error budget exhaustion and proactive mitigation
    50:42 - The significance of involving cross-functional teams during outages
    55:36 - Advocating reliability investment to leadership
    56:24 - The customer perspective: reliability as a fundamental feature
    58:42 - Connecting with Sal Furino: where to follow his work and learn more about Bloomberg's engineering culture
    59:20 - Final advice: Focus on user happiness to avoid common pitfalls in adopting SLOs

    54 min
  4. May 10

    Laila: Reinventing Dating as a Social Marketplace with Kaan Divitoğlu

    In this episode of Alexa’s Input (AI), I sit down with Kaan Divitoğlu, founder of Laila, a New York-based startup rethinking online dating as a social marketplace centered around real plans instead of endless swiping. We talk about why traditional dating apps struggle to create real-world connection, how marketplace dynamics shape modern dating behavior, and why Kaan believes the future of dating products is less about “matching soulmates” and more about helping people actually get out on first dates. Kaan shares what he’s learned building a product around something emotional, unpredictable, and deeply human: connection.

    We also get into:
    • The metrics behind dating products and user behavior
    • Why most matches never turn into real dates
    • Designing around human psychology and social incentives
    • AI in dating apps: where it helps and where it shouldn’t
    • The process of building Laila
    • Social media growth, creator strategies, and startup distribution
    • Why Kaan thinks apps themselves may eventually disappear

    Find out more about the guest at:
    LinkedIn: https://www.linkedin.com/in/kaan-divitoglu-152779105/
    Laila Website: https://laila.nyc
    Laila Instagram: https://www.instagram.com/laila.social

    Chapters
    00:00 Introduction to Laila and Its Concept
    04:10 The Journey of Building Laila
    08:43 User Feedback and Validation
    13:35 Metrics of Success in Dating Apps
    18:23 Differentiation in the Dating App Market
    22:54 Understanding User Behavior and Expectations
    27:37 Challenges in the Dating Landscape
    29:50 Loneliness and Social Skills in Modern Dating
    30:51 AI's Role in Dating Apps
    34:20 The Future of Dating Apps and User Experience
    38:19 Building Community Through Events and Social Media
    42:54 Navigating Social Media Marketing
    46:00 Rapid Fire Insights on Dating and Relationships
    53:33 Outro

    Keywords
    dating app, AI, product design, real-world connections, marketplace, user engagement, social media, social tech, startup, innovation

    54 min
  5. Mar 19

    The Creative Founder Mindset with Brady Jordan

    In this episode, Alexa Griffith interviews Brady Jordan, a creative director and entrepreneur, who shares his journey from aspiring software engineer to the founder of Clip Play Media and the photo app Y2Cam. Brady discusses the intersection of creativity and technology, the importance of storytelling in video production, and the challenges of self-employment. He emphasizes the need for resilience, adaptability, and a consumer-first approach in product development, while also exploring the significance of networking and community building in achieving success.

    Find out more about the guest at:
    Website: https://www.bradyjordan.com/

    Chapters
    00:00 Introduction to Brady Jordan and His Journey
    06:45 The Birth of Clip Play Media
    14:58 Quality vs. Consistency in Content Creation
    24:51 Y2Cam: A Solution to Frustration
    30:51 Cost and Infrastructure of App Development
    35:30 Navigating the Challenges of Self-Employment
    42:51 Marketing Strategies for App Success
    49:04 The Value-Based Approach to Creation

    59 min
  6. Feb 17

    Securing the Software Supply Chain with Justin Cappos

    Modern software is built on layers and layers of code. So how do we know we can trust it? In this episode of Alexa’s Input (AI), Alexa Griffith sits down with Justin Cappos, professor of computer science at NYU and a leading expert in software supply chain security, to unpack what trust really means in today’s digital infrastructure. From package managers and dependency chains to large-scale outages and AI systems built on inherited code, Justin explains why many security failures aren’t random accidents; they’re predictable consequences of weak process, misaligned incentives, and insecure design.

    They discuss:
    • Why security only becomes visible when something breaks
    • The difference between unavoidable failure and negligence
    • How modern software supply chains amplify small mistakes
    • The role of leadership and culture in preventing breaches
    • Why verification systems like TUF and in-toto matter more than ever

    As AI accelerates development and increases system complexity, the need for verifiable trust only grows. This episode is a practical look at the invisible infrastructure that keeps modern software, and increasingly modern AI, from collapsing under its own complexity.

    Find out more about the guest at:
    Website: https://engineering.nyu.edu/faculty/justin-cappos
    NYU page: https://ssl.engineering.nyu.edu/personalpages/jcappos/
    Wikipedia: https://en.wikipedia.org/wiki/Justin_Cappos

    Chapters
    00:00 Introduction to Justin Cappos and His Work
    01:17 The Importance of Security in Software Systems
    03:50 Understanding Security Breaches: Mistakes vs. System Design Problems
    06:34 Cultural Factors in Security Failures
    09:25 Justin's Journey in Software Security
    12:03 The Role of Academia in Enterprise Security
    14:10 Evaluating Enterprise Security Systems
    16:58 Foundational Projects in Software Security
    19:21 AI Security Concerns and Future Directions
    24:59 The Need for MCP 2.0
    28:57 Security Challenges with LLMs
    32:33 Designing Secure AI Systems
    37:14 Ethical Dilemmas in AI Decision-Making
    40:17 The Role of AI in Open Source
    43:44 Trust and Mindset in AI Security

    49 min
  7. Feb 16

    The Artificial Immune System with Wendy Chin, PureCipher CEO

    As AI systems grow more autonomous, the question is no longer just what they can do, but whether we can trust the data and models behind their decisions. In this episode of Alexa’s Input (AI), Alexa Griffith talks with Wendy Chin, CEO of PureCipher, about building what she calls an artificial immune system for AI, a framework designed to make data, models, and inference tamper-evident across the AI lifecycle.

    They unpack what data poisoning really means (training data, weights and biases, inference inputs), why small amounts of targeted poison can create outsized model misbehavior, and how generative AI lowers the barrier to sophisticated malware. The conversation expands into the security implications of agent-to-agent communication via MCP, digital twins, and why we don’t have the luxury of “shipping now and securing later.” It’s a wide-ranging discussion that moves from practical threat models to the philosophical frontier of what happens as AI becomes more human-like and more autonomous.

    Find out more about the guest at:
    LinkedIn: https://www.linkedin.com/in/wendy-chin-ctg/
    Website: https://www.purecipher.com/

    Chapters
    00:00 Introduction to AI Security
    01:16 Understanding Data Poisoning
    04:38 The Dangers of Malware in AI
    07:46 AI's Moral Dilemmas and Decision Making
    08:45 Building Empathy in AI
    13:07 The Role of Good Data in AI Training
    17:02 PureCipher's Artificial Immune System
    22:34 Digital Twins and Their Implications
    25:22 Nurturing AI Like a Child
    30:53 Data Therapy for AI
    36:13 The Future of AI and Human Interaction
    38:45 The Dark Side of AI: Hacking and Security
    45:03 Global Perspectives on AI Security
    48:11 MCP Agents and Security Concerns
    51:41 Philosophical Implications of AI and Human Connection
    01:00:04 The Sci-Fi Future of AI and Humanity

    1h 6m
  8. Feb 16

    Shipping Agents, Not Vulnerabilities with Ian Webster, PromptFoo CEO

    As LLM apps evolve from simple chatbots to tool-using agents, the attack surface explodes, and the old security playbooks don’t hold. In this episode of Alexa’s Input (AI), Alexa Griffith sits down with Ian Webster, co-founder and CEO of PromptFoo, to break down what AI security actually looks like in practice: automated red teaming, prompt injection and jailbreak testing, evaluation workflows that scale, and why “guardrails alone” is not a security strategy.

    Ian shares how PromptFoo grew from a side project into a widely adopted open-source standard, what it means to raise millions of dollars in a fast-moving market, and how enterprises are approaching the full vulnerability lifecycle, from finding issues to triage, remediation, and validation. Ian also discusses the “lethal trifecta” that makes agents fundamentally risky (untrusted input + sensitive data + exfil path), and why MCP security isn’t just about users and tools; it’s about dangerous tool combinations and rogue servers.

    Find out more about the guest at:
    PromptFoo Website: https://www.promptfoo.dev/
    GitHub: https://github.com/promptfoo/promptfoo
    Ian’s LinkedIn: https://www.linkedin.com/in/ianww/

    Chapters
    00:00 Introduction to AI Security Challenges
    02:06 Funding and Growth of PromptFoo
    06:16 The Genesis of PromptFoo
    11:05 Career Journey and Lessons Learned
    12:53 Understanding AI Red Teaming
    17:36 Recent AI Security Vulnerabilities
    19:46 The Dual Nature of AI in Security
    21:47 Understanding the Lethal Trifecta in AI Security
    24:22 Exploring Model Context Protocol (MCP) and Its Security Implications
    26:22 Common Security Issues in MCP Systems
    28:17 The Role of Identity and Permissions in AI Security
    30:00 Practical Implications of Using PromptFoo for Developers
    31:33 Evaluating Language Models: Challenges and Techniques
    36:34 The Limitations of Guardrails in AI Security
    38:25 Best Practices for Engineers in AI Development
    39:58 Future Trends in AI and Security
    42:28 Everyday Applications of AI and Language Models

    45 min

Ratings & Reviews

5 out of 5 (7 Ratings)

