The Future of Voice AI

Davit Baghdasaryan

In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

voice-ai-newsletter.krisp.ai

  1. 12/04/2025

    Real-world problems with STT | Klemen Simonic (Soniox) & Kwindla Kramer (Daily)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode’s guests are Klemen Simonic, Co-Founder & CEO at Soniox, and Kwindla Hultman Kramer, Co-Founder & CEO at Daily.

    Klemen Simonic is the CEO and Co-Founder of Soniox, where he leads the development of advanced voice AI models built for real-world performance. He brings over 16 years of experience across industry and academia, with a deep focus on artificial intelligence. He has worked on cutting-edge AI systems at Facebook, Google, Stanford University, and the University of Ljubljana. Klemen has been developing AI technologies since his undergraduate years, spanning speech, language, and large-scale knowledge systems.

    Kwin is CEO and co-founder of Daily, a developer platform for real-time audio, video, and AI. He has been interested in large-scale networked systems and real-time video since his graduate student days at the MIT Media Lab. Before Daily, Kwin helped to found Oblong Industries, which built an operating system for spatial, multi-user, multi-screen, multi-device computing.

    Takeaways

    * Voice AI adoption is slow because real-time transcription still breaks on the most basic parts of a customer call.
    * Real growth is happening quietly inside call centers, but teams won’t scale until transcription stops causing cascading errors.
    * Even the top models fail on emails, addresses, and alphanumerics, which are the single points of failure in most B2B workflows.
    * Consumer-grade demos hide the reality that long, multi-turn conversations still fall apart without rigorous context control.
    * POC to production fails not because of LLMs, but because engineering teams underestimate context management.
    * A universal multilingual model can outperform single-language models by transferring entity knowledge across languages.
    * Mixed-language conversations are the norm worldwide, and current systems break the moment a user switches language.
    * Latency, accuracy, and cost must be solved at the same time; optimizing only one kills the use case.
    * Feeding both sides of the conversation into STT gives models more context and improves accuracy.
    * Domain-specific accuracy matters far more than general accuracy, and most models still fail in specialized environments.
    * Industry “context boosting” tricks are hacks that break at scale; native learned context inside STT is the only path forward.
    * Punctuation and intonation directly shape LLM reasoning, and stripping them for speed creates silent failure modes.
    * Voice AI is shifting from speech-to-text to full speech understanding, and models that don’t evolve won’t survive.
    * The future points toward fused audio plus LLM architectures that remove the brittle STT handoff entirely.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai

    40 min
  2. 11/06/2025

    Accent AI’s 85+ NPS Impact in India | James Bednar and Biju Pillai (TTEC)

    In this special edition of the Future of Voice AI series of interviews, we're joined by industry vets to unpack:
    - How clarity became a measurable KPI for CX quality and trust
    - How TTEC identified and solved global voice challenges across regions
    - Real results: customer satisfaction, agent confidence, cost efficiency improvements and more

    This episode’s guests are TTEC’s James Bednar, VP of Innovation and Product, and Biju Pillai, VP of India Operations.

    As voice remains the most human and high-stakes channel, global contact centers like TTEC face a growing challenge: how to deliver effortless understanding across accents, environments, and expectations. In this live session, TTEC leaders and Krisp’s CEO unpack the business case for clarity, sharing how they transformed challenges into measurable wins, including how they turned fragmented communication into a unified standard across global operations. You’ll hear what worked, what didn’t, and how AI-driven voice clarity has become a core pillar of TTEC’s customer and agent experience strategy.

    Takeaways

    1. Clarity drives measurable ROI.
       * After Krisp deployment, noise complaints dropped 76%, sales conversions rose 26%, and CSAT improved 8%.
       * These are not pilot numbers; they came from sustained production environments across thousands of agents.
    2. Accent conversion unlocks new talent pools.
       * By eliminating accent barriers in real time, TTEC could hire for skill, not sound. “We don’t want to hire the right accent, we want to hire the right talent,” James said.
       * This reduced reliance on costly and inconsistent “voice coaching” programs, creating what Pillai called an “always-on coach.”
    3. 80+ NPS from offshore delivery proves the point.
       * An India-based program reached 80+ NPS, with language-barrier reports cut in half (2.6% → 1.2%) and experience scores rising from 90.5% to 95.5%.
       * Each new Accent Conversion model release (v3.5 → v3.7) corresponded to higher NPS, peaking at 85 in September 2025.
    4. Cost efficiency without quality compromise.
       * Offshore voice delivery using Krisp achieved ~70% cost savings versus onshore U.S. teams. “Clients that once said India isn’t where you go for voice are rethinking that.”
    5. Agent wellbeing and empathy improved.
       * Agents reported lower fatigue, faster understanding, and higher confidence. Biju noted, “calls now flow better; agents no longer overcompensate for accent or tone.”
       * That confidence translated into trust and empathy, making every conversation feel more human.
    6. Next frontier: real-time translation and pacing intelligence.
       * With accent conversion now near full maturity, Krisp is launching Accent Conversion v4.0, tackling pacing and accent leakage.
       * Inbound accent conversion and real-time translation will soon close the loop to help both agents and customers understand each other.

    This isn’t just a story about cleaner audio. It’s about turning clarity into confidence, confidence into empathy, and empathy into measurable ROI. As James put it: “These use cases just work. They deliver what’s expected, with almost no effort to deploy.”

    42 min
  3. 10/16/2025

    Voice AI with 100% function-calling accuracy | Will Bodewes (CEO at Phonely.ai)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode’s guest is Will Bodewes, CEO at Phonely.ai.

    Will Bodewes is the Co-founder and CEO of Phonely.ai, a Y Combinator–backed startup building conversational phone support powered by AI. A lifelong competitor and creator, he earned a mechanical engineering degree from UNH and launched his first company, Spoke Sound, soon after. Following AI research and travels across Africa, Asia, and the Pacific, Will combined his technical background and curiosity to take on one of tech’s toughest challenges: making AI sound human.

    Phonely provides AI-powered phone support agents for industries requiring fast, reliable, and human-like AI interactions. Its AI solutions reduce wait times, improve customer experiences, and enable seamless automated conversations.

    Takeaways

    * Voice AI jumped from niche to movement in two years, with young builders driving it.
    * Reliability at scale beats clever prompts; buyers want systems that just work.
    * Time-to-value is the moat; months of coding kills deals.
    * Every AI agent succeeds only if it knows what to say, what to know, and what to do: conversation, context, and action.
    * Integrations are the choke point; the hard work is plumbing messy CRMs and legacy tools.
    * Training BPO teams to build on the platform scales better than flying in engineers.
    * LLMs are the latency bottleneck, so faster tokens = more human conversations.
    * Groq partnership delivered lower latency and beat big names on some Phonely benchmarks.
    * “Did the caller detect it wasn’t human?” is a better quality metric than WER.
    * Phonely claims 100% function-calling accuracy in production, which is what buyers actually feel.
    * Low ASR confidence should trigger human-like behavior (ask to spell names), not clunky links.
    * Capturing names, numbers, and addresses is the last-mile blocker; fix this or nothing else matters.
    * Cascading still wins for business logic; speech-to-speech isn’t reliably deployed in production.
    * Best near-term wins: customer support with tight FAQs, lead qual, and appointment setting.
    * Defined outcomes plus A/B testing lets agents match call-center KPIs at 50–70% lower cost.
    * Enterprise rollout will be gradual (2–3 years) until hallucination fear fades.
    * The next unlock is LLMs that talk like people while staying fast and precise.
    * Expect convergence where “voice-to-voice” and cascading blur, but LLMs keep the reasoning core.

    23 min
  4. 10/02/2025

    The Race to 300M AI Agents | David Yang (Co-Founder at Newo.ai)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode’s guest is David Yang, Co-Founder at Newo.ai.

    David Yang, Ph.D., is a Silicon Valley–based serial entrepreneur and co-founder of Newo.ai. He previously founded ABBYY, a global leader in AI and content intelligence whose technologies serve over 50 million users and thousands of enterprises in 200 countries. Over his career, Dr. Yang has launched more than a dozen companies, contributed to major advances in AI and workplace technology, and has been recognized by the World Economic Forum as one of the top 100 World Technology Pioneers.

    Newo.ai is a San Francisco–based AI technology company building human-like AI Agents that transform how businesses operate. Founded by AI entrepreneurs David Yang, Ph.D., and Luba Ovtsinnikova, the team brings a track record of launching more than 10 successful companies whose products are used by over 50 million people in 200 countries. Newo.ai’s mission is to unleash the superpowers of small and medium businesses by giving every entrepreneur an AI teammate that never sleeps, never gets tired, and helps turn the impossible into the inevitable.

    Takeaways

    * Building agents at scale is the real moat; the need is hundreds to thousands of production-ready agents per month, not one-offs.
    * “Production in minutes” matters more than fancy demos; zero-touch setup wins.
    * Most SMBs will swap IVRs and voicemail for AI receptionists that actually book and drive revenue.
    * Websites will become conversational; voice and chat agents will greet, qualify, and convert visitors.
    * Industry templates (patients vs. guests, dental vs. hotels) let one agent fit ~90–95% of use cases out of the box.
    * Voice is the hardest and most important channel: latency, interruptions, accents, and noise make it 80% of the problem.
    * The real production hurdle for AI agents was latency; agents need to “think and talk” at once to feel human.
    * One bad call in ten kills trust and scalability; parallel “observer” agents that fact-check in real time are needed to prevent hallucinated bookings.
    * Adoption inflects when AI’s “lead success score” approaches human performance; businesses tolerate errors at human-like rates.
    * Omnichannel isn’t optional for SMB reception; phones, SMS, live chat, and social DMs all feed bookings.
    * New industries are lighting up weekly; speed of verticalization is a competitive weapon.
    * The success metric is parity with humans, not perfection; once the lead success score nears human levels, growth takes off.
    * The near future is practical and paid; AI receptionists that cost little and return 50x in booked revenue will win long before sci-fi visions do.
    * Long-term, David sees a chunk of the world’s knowledge work shifting to “non-biological” employees, forcing new ethics and norms.
    * David predicts that 300 million of the world’s 1 billion knowledge workers could be AI-based in the future.
    * Humans and machines are moving toward a hybrid future where biological beings have non-biological implants and vice versa.
    * Early emotional AI like Morpheus was designed with synthetic “oxytocin” and “dopamine” and even used architecture (moving walls) to mirror emotional states.
    * Robotic pets and AI systems living alongside humans foreshadow non-biological members of society becoming normal.
    * AI systems aren’t deterministic, raising the need for new ethical frameworks beyond Asimov’s Three Laws.
    * Morality and shared values will need to be trained into AI, as decisions often fall into gray areas.

    28 min
  5. 09/25/2025

    Fullband 2025: CX Research, AI Agents, and the Frontlines of Voice AI

    In this special edition of the Future of Voice AI series, we welcome leading voices on the state of voice AI in CX:
    - Nicole Kyle of CMP Research on CX market data and shifting priorities
    - Kwindla Hultman Kramer of Daily on building and scaling voice AI agents
    - Brent Stevenson of IntouchCX on AI adoption on the frontlines

    Fullband 2025 brought together research, technology, and frontline leaders to cut through the hype and show where voice AI is actually working today. Here’s the distilled recap.

    The State of Voice AI in CX
    Nicole Kyle, Managing Director & Co-Founder of CMP Research

    Nicole leads groundbreaking research on customer contact and shared why voice remains essential in CX and how priorities are shifting in an AI-driven era.

    3 takeaways:
    * Voice is still the biggest automation prize because it carries the most volume.
    * Interest in self-service is high, but adoption lags due to poor experiences.
    * Leaders are shifting from GenAI hype to use-case deployments and early agentic AI.

    Stat to remember: Only 3% of customers prefer conversational voice AI for self-service today, driven by quality gaps, not lack of interest.

    What you can do today: Pick one high-volume voice use case and lift quality: define success, measure completion rate and CSAT, and iterate until adoption rises.

    The State of Voice AI Agents
    Kwindla Hultman Kramer, CEO & Co-Founder of Daily

    Kwindla is pioneering real-time AI agents for voice and video. He unpacked what it takes to build, scale, and deploy AI agents that actually work.

    3 takeaways:
    * Enterprises moved from curiosity to concrete agent roadmaps in just 12 months.
    * Constrained, vertical agents (esp. outbound) are finding product–market fit faster than broad platforms.
    * 2024 solved plumbing (turn-taking, latency); 2025 is about natural conversations and reliable structured data capture.

    Stat to remember: One in three enterprises is already in production with AI agents.

    What you can do today: Choose one constrained workflow (e.g., outbound confirmation calls). Define latency and handoff goals, then launch and tune before scaling.

    The State of Voice AI in BPOs
    Brent Stevenson, Chief Experience Officer of IntouchCX

    Brent offers a frontline view of CX transformation, showing how BPOs adopt AI by fixing workflows and blending automation with human expertise.

    3 takeaways:
    * Workflow design and governance are the real blockers, not technology.
    * BPOs are acting as “AI administrators,” with QA analysts repurposed into agent trainers and prompt engineers.
    * Agent assist is now table stakes, with translation and accent conversion expanding labor pools and market access.

    Stat to remember: Agent assist at IntouchCX delivered ~10% AHT reduction, 3–5% CSAT lift, and 20% faster agent ramp.

    What you can do today: Stand up an “AI QA” function to own prompts, tuning, and bot governance; manage AI like you manage human agents.

    Q&A

    The event was jam-packed, and we couldn’t get to every question live. Here are the ones we missed.

    1. Nicole, the research shows a lot of shifting priorities. Which do you think will have the biggest long-term impact on how companies invest in voice AI?

       The need to increase customer adoption of self-service will have the biggest long-term impact on how companies invest in voice AI. Everything depends on the quality of the experience. If voice AI delivers a high-quality interaction, voice becomes the channel with the most to gain, from greater customer adoption of automation to significant cost savings through deflection. But if the solution falls short, it risks damaging the customer experience. It’s a classic case of high risk, high reward.

    2. Nicole, where do you see the biggest gaps between executive priorities and the technology that’s actually available today?

       This isn’t a fun answer, but honestly, knowledge base management and governance. Good knowledge is the key ingredient to making an AI (conversational, generative or agentic) function properly. And it’s hard for most customer contact and CX organizations to manage right now. They’re looking for AI solutions that can proactively add to and audit the knowledge base, but there’s a gap in the market right now.

    55 min
  6. 09/18/2025

    End-to-end integrated Voice AI | Neil Hammerton (CEO & Co-Founder, Natterbox)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode’s guest is Neil Hammerton, CEO & Co-Founder at Natterbox.

    Neil Hammerton is CEO of Natterbox. Neil co-founded the UK telecoms disrupter in 2010 with the aim of transforming the business telephony experience of firms and their customers. Today, Natterbox works with over 250 businesses around the world to improve data integration through CRM within Salesforce. Natterbox enables them to put the telephone at the heart of their customer services strategy and guarantee high standards across their customer services experience.

    Natterbox is the AI-powered contact center platform redefining how Salesforce-first businesses connect with customers. Drawing on 15+ years of contact center expertise, we help leading organizations to effortlessly incorporate AI into their contact center operations and seamlessly blend AI with their contact center workforce to deliver optimal customer experiences.

    Takeaways

    * Deep CRM-native integration beats bolt-ons because it keeps full context across every call and channel.
    * Real-time summaries turn each call into structured data the next agent or bot can use on the spot.
    * Recording and transcribing every call is the foundation for smart routing, compliance, and coaching.
    * AI should own the simple, high-volume tasks while humans handle exceptions and emotion.
    * The biggest CX drag is “tell me your story again”; carry context forward and it disappears.
    * Wait times drop fastest when AI does first response and triage before a human ever picks up.
    * Let bots update Salesforce during the call so agents don’t burn time on after-call work.
    * Building your own telephony stack gives control over quality, latency, and feature pace.
    * Measure success by resolution and customer effort, not just bot containment or call deflection.
    * Most customers won’t dig through a website; they call, so meet them with fast, guided answers.
    * AI without a clean handoff path back to humans will frustrate users and spike churn.
    * Automate the top three intents end-to-end first, prove value, then expand the surface area.
    * Use history plus live intent to route to the right bot or human in seconds, not minutes.
    * Keep transcripts and actions inside Salesforce so data is secure, searchable, and actionable.
    * Voice is still the highest-stakes channel; small gains here move CSAT, FCR, and churn in a big way.
    * Offload repetitive calls to AI and agents get happier, faster, and more effective.
    * “AI first, human-in-the-loop” is the practical path for the next 12–24 months, not full automation.
    * The win isn’t flashy AI; it’s consistent outcomes: faster answers, fewer transfers, better follow-through.

    21 min
  7. 09/11/2025

    Beyond Cascades to Speech-to-Speech | Anshul Shrivastava & Kumar Saurav (Co-Founders at Vodex.ai)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode’s guests are Anshul Shrivastava, Co-Founder and CEO, and Kumar Saurav, Co-Founder and CTO, at Vodex.ai.

    Vodex specializes in Generative AI-powered voice agents that facilitate natural, humanlike conversations with customers. These virtual agents manage the initial phases of customer interactions, offering businesses a scalable and efficient way to handle inbound and outbound sales and collections calls. By personalizing conversations and providing real-time insights, Vodex helps businesses improve engagement and streamline processes.

    Anshul Shrivastava is the Founder and CEO of Vodex.ai, with 12+ years in the IT industry and a strong focus on AI innovation. He leads Vodex.ai in building global AI solutions, aiming to drive growth and deliver real impact for clients. Anshul views technology as a catalyst for progress and is passionate about shaping the future of AI.

    Kumar Saurav is the Co-Founder and CTO of Vodex.ai, where he drives the development of generative AI solutions for business. With 13+ years across IT, IoT, Robotics, and AI, he brings both technical depth and business insight to solving client challenges. At Vodex.ai, he focuses on AI-powered outbound call solutions that boost sales, service, and marketing performance, while sharing his expertise through writing and research.

    Takeaways

    * Voice AI still hasn’t had its ChatGPT moment because people hate talking to bots that feel slow or robotic.
    * Latency is the deal breaker: anything slower than 300ms breaks the illusion of real conversation.
    * Cascading pipelines lose tone, emotion, and context, making bots sound flat and unreliable.
    * Speech-to-speech models are the real unlock, combining speed with emotional nuance.
    * Most voice AI agents are stitched together from ASR, LLM, TTS, and telco layers.
    * Vodex positions itself as the “Stripe of voice AI” with simple plug-and-play APIs.
    * Vertical focus matters, and collections is their strongest domain with strict FDCPA compliance.
    * Naturalness moves revenue, with one Arabic deployment lifting recovery from 45% to 81% in seven days.
    * Naturalness is not a “nice to have”; it directly drives revenue and customer trust.
    * The bar is rising fast; in two years robotic-but-functional bots will be unacceptable.
    * Proven sweet spots for voice AI right now: lead qualification, debt collection, healthcare scheduling, and follow-ups.
    * Vodex’s origin story shows the shift from slow custom builds to no-code, plug-and-play bots for non-technical users.
    * Context engineering and AI-on-AI testing are how they handle edge cases and reliability gaps.
    * The future of voice will run on small, task-specific speech models built for speed and accuracy.
    * Gen Z decision makers will push companies to embrace talking to systems instead of clicking around apps.
    * Vodex rejects cold-call spam, betting that contextual, consent-based conversations will define the industry.
    * Soon, every company will be expected to have a natural voice agent the same way every company is expected to have a website.

    32 min
  8. 09/04/2025

    Inside the Data: The State of Voice in CX Unpacked | Peter Ryan (Ryan Strategic Advisory)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode’s guest is Peter Ryan, President and Principal Analyst at Ryan Strategic Advisory.

    Peter Ryan is recognized as one of the world’s leading experts in CX and BPO. Throughout his career, Peter has advised CX outsourcers, contact center clients, national governments, and industry associations on strategic matters like vertical market penetration, service delivery, best practices in technology deployment, and offshore positioning.

    Ryan Strategic Advisory provides market insight, brand development initiatives, and actionable data for organizations in the customer experience services ecosystem. With two decades of experience, Ryan Strategic Advisory supports outsourcing operators, technology providers, industry associations, and economic development agencies.

    Takeaways

    * The hype cycle around AI has made it hard for CX leaders to separate real progress from inflated promises.
    * Adoption of voice AI is moving from concept to mainstream, driven by accuracy, latency improvements, and reliability.
    * Customers care most about issue resolution, not whether the agent sounds robotic or perfectly human.
    * One bad phone experience, often caused by language or accent misunderstandings, can permanently lose a customer.
    * Nearly half of surveyed enterprises are already using AI-powered voice translation, showing trust in its growing value.
    * About a quarter are experimenting with or adopting AI accent conversion, a big leap from just a few years ago.
    * Accent technology is not just for customers; it reduces agent stress and helps retain frontline workers.
    * Better agent retention directly lowers costs tied to recruiting, training, and high attrition.
    * Frontline agents are often more enthusiastic about accent technology than executives, because it eases real pain in daily calls.
    * CX leaders see accent and translation tools as a way to improve loyalty by making communication effortless across borders.
    * Latency in AI responses is no longer the barrier it once was; customers tolerate small delays if accuracy is high.
    * The biggest risk with AI in CX is overpromising; pragmatic, real-world use cases drive adoption faster than hype.
    * Failed AI deployments are often rolled back, especially with voice bots that don’t meet expectations.
    * Real-world case studies are becoming essential for buyers to justify investments in a tight economic climate.
    * CX voice AI adoption has followed a clear path: noise cancellation first, then accent tools, now translation at scale.
    * The next wave of adoption depends on showing measurable business outcomes rather than futuristic demos.
    * AI in CX today is compared to Pentium processors in the 90s: a turning point that accelerates everything once it matures.
    * Companies that promise realistically and deliver consistently will win long-term trust in a crowded AI market.
    * The real test of AI in CX isn’t novelty; it’s whether it helps customers resolve issues faster, cheaper, and with less friction.

    Check out last week’s article to dive deeper into the data discussed in this episode.

    23 min
