The Future of Voice AI

Davit Baghdasaryan

In the Future of Voice AI series of interviews, I ask three questions to my guests:
- What problems do you currently see in Enterprise Voice AI?
- How does your company solve these problems?
- What solutions do you envision in the next 5 years?

voice-ai-newsletter.krisp.ai

  1. 1 DAY AGO

    The Race to 300M AI Agents | David Yang (Co-Founder at Newo.ai)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode's guest is David Yang, Co-Founder of Newo.ai. David Yang, Ph.D., is a Silicon Valley–based serial entrepreneur and co-founder of Newo.ai. He previously founded ABBYY, a global leader in AI and content intelligence whose technologies serve over 50 million users and thousands of enterprises in 200 countries. Over his career, Dr. Yang has launched more than a dozen companies, contributed to major advances in AI and workplace technology, and has been recognized by the World Economic Forum as one of the top 100 World Technology Pioneers.

    Newo.ai is a San Francisco–based AI technology company building human-like AI Agents that transform how businesses operate. Founded by AI entrepreneurs David Yang, Ph.D., and Luba Ovtsinnikova, the team brings a track record of launching more than 10 successful companies whose products are used by over 50 million people in 200 countries. Newo.ai's mission is to unleash the superpowers of small and medium businesses by giving every entrepreneur an AI teammate that never sleeps, never gets tired, and helps turn the impossible into the inevitable.

    Recap Video

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    Takeaways
    * Building agents at scale is the real moat; the need is hundreds to thousands of production-ready agents per month, not one-offs.
    * "Production in minutes" matters more than fancy demos; zero-touch setup wins.
    * Most SMBs will swap IVRs and voicemail for AI receptionists that actually book and drive revenue.
    * Websites will become conversational; voice and chat agents will greet, qualify, and convert visitors.
    * Industry templates (patients vs. guests, dental vs. hotels) let one agent fit ~90–95% of use cases out of the box.
    * Voice is the hardest and most important channel: latency, interruptions, accents, and noise make it 80% of the problem.
    * The real production hurdle for AI agents was latency; agents need to "think and talk" at once to feel human.
    * One bad call in ten kills trust and scalability; parallel "observer" agents that fact-check in real time are needed to prevent hallucinated bookings (see the sketch after this recap).
    * Adoption inflects when AI's "lead success score" approaches human performance; businesses tolerate errors at human-like rates.
    * Omnichannel isn't optional for SMB reception; phones, SMS, live chat, and social DMs all feed bookings.
    * New industries are lighting up weekly; speed of verticalization is a competitive weapon.
    * The success metric is parity with humans, not perfection; once the lead success score nears human levels, growth takes off.
    * The near future is practical and paid; AI receptionists that cost little and return 50x in booked revenue will win long before sci-fi visions do.
    * Long-term, David sees a chunk of the world's knowledge work shifting to "non-biological" employees, forcing new ethics and norms.
    * David predicts that 300 million of the world's 1 billion knowledge workers could be AI-based in the future.
    * Humans and machines are moving toward a hybrid future where biological beings have non-biological implants and vice versa.
    * Early emotional AI like Morpheus was designed with synthetic "oxytocin" and "dopamine" and even used architecture (moving walls) to mirror emotional states.
    * Robotic pets and AI systems living alongside humans foreshadow non-biological members of society becoming normal.
    * AI systems aren't deterministic, raising the need for new ethical frameworks beyond Asimov's Three Laws.
    * Morality and shared values will need to be trained into AI, as decisions often fall into gray areas.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
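    The "observer agent" takeaway describes an architectural pattern: run a second, parallel checker that verifies what the primary agent is about to do before any side effect (such as a booking) is committed. Below is a minimal, hypothetical Python sketch of that pattern; the booking schema, the call_llm stub, and the CONFIRMED/REJECTED protocol are illustrative assumptions, not Newo.ai's implementation.

```python
# Hypothetical sketch of a parallel "observer" agent that fact-checks a
# primary voice agent's proposed action before it is committed.
# The call_llm stub stands in for whatever model API you actually use.
import asyncio
from dataclasses import dataclass

@dataclass
class ProposedBooking:
    customer: str
    service: str
    time_slot: str

async def call_llm(prompt: str) -> str:
    """Stub for a real LLM call; replace with your provider's client."""
    await asyncio.sleep(0.05)          # simulate network latency
    return "CONFIRMED"                 # placeholder verdict

async def primary_agent(transcript: str) -> ProposedBooking:
    # In a real system the conversational agent would extract this itself.
    return ProposedBooking(customer="Alex", service="cleaning", time_slot="Tue 10:00")

async def observer_agent(transcript: str, booking: ProposedBooking) -> bool:
    # Ask a second model to verify the booking against the raw transcript.
    verdict = await call_llm(
        f"Transcript:\n{transcript}\n\nDoes the transcript support booking "
        f"{booking.service} for {booking.customer} at {booking.time_slot}? "
        "Answer CONFIRMED or REJECTED."
    )
    return verdict.strip().upper().startswith("CONFIRMED")

async def handle_call(transcript: str) -> None:
    booking = await primary_agent(transcript)
    # The observer runs before anything is written to the calendar or CRM,
    # so a hallucinated booking never reaches production systems.
    if await observer_agent(transcript, booking):
        print("Booking committed:", booking)
    else:
        print("Booking rejected; escalating to a human.")

if __name__ == "__main__":
    asyncio.run(handle_call("Caller asked for a cleaning on Tuesday at 10am."))
```

    The design point is that the observer gates the side effect itself, so a hallucinated booking is caught before it reaches a calendar or CRM rather than being flagged afterward.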

    28 min
  2. SEP 25

    Fullband 2025: CX Research, AI Agents, and the Frontlines of Voice AI

    In this special edition of the Future of Voice AI series, we welcome leading voices on the state of voice AI in CX:
    - Nicole Kyle of CMP Research on CX market data and shifting priorities
    - Kwindla Hultman Kramer of Daily on building and scaling voice AI agents
    - Brent Stevenson of IntouchCX on AI adoption on the frontlines

    Fullband 2025 brought together research, technology, and frontline leaders to cut through the hype and show where voice AI is actually working today. Here's the distilled recap.

    The State of Voice AI in CX
    Nicole Kyle, Managing Director & Co-Founder of CMP Research
    Nicole leads groundbreaking research on customer contact and shared why voice remains essential in CX and how priorities are shifting in an AI-driven era.
    3 takeaways:
    * Voice is still the biggest automation prize because it carries the most volume.
    * Interest in self-service is high, but adoption lags due to poor experiences.
    * Leaders are shifting from GenAI hype to use-case deployments and early agentic AI.
    Stat to remember: Only 3% of customers prefer conversational voice AI for self-service today, driven by quality gaps, not lack of interest.
    What you can do today: Pick one high-volume voice use case and lift quality: define success, measure completion rate and CSAT, and iterate until adoption rises (see the metrics sketch after this recap).

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    The State of Voice AI Agents
    Kwindla Hultman Kramer, CEO & Co-Founder of Daily
    Kwindla is pioneering real-time AI agents for voice and video. He unpacked what it takes to build, scale, and deploy AI agents that actually work.
    3 takeaways:
    * Enterprises moved from curiosity to concrete agent roadmaps in just 12 months.
    * Constrained, vertical agents (esp. outbound) are finding product–market fit faster than broad platforms.
    * 2024 solved plumbing (turn-taking, latency); 2025 is about natural conversations and reliable structured data capture.
    Stat to remember: One in three enterprises is already in production with AI agents.
    What you can do today: Choose one constrained workflow (e.g., outbound confirmation calls). Define latency and handoff goals, then launch and tune before scaling.

    The State of Voice AI in BPOs
    Brent Stevenson, Chief Experience Officer of IntouchCX
    Brent offers a frontline view of CX transformation, showing how BPOs adopt AI by fixing workflows and blending automation with human expertise.
    3 takeaways:
    * Workflow design and governance are the real blockers, not technology.
    * BPOs are acting as "AI administrators," with QA analysts repurposed into agent trainers and prompt engineers.
    * Agent assist is now table stakes, with translation and accent conversion expanding labor pools and market access.
    Stat to remember: Agent assist at IntouchCX delivered ~10% AHT reduction, 3–5% CSAT lift, and 20% faster agent ramp.
    What you can do today: Stand up an "AI QA" function to own prompts, tuning, and bot governance: manage AI like you manage human agents.

    Q&A
    The event was jam-packed, and we couldn't get to every question live. Here are the ones we missed.

    1. Nicole, the research shows a lot of shifting priorities. Which do you think will have the biggest long-term impact on how companies invest in voice AI?
    The need to increase customer adoption of self-service will have the biggest long-term impact on how companies invest in voice AI. Everything depends on the quality of the experience. If voice AI delivers a high-quality interaction, voice becomes the channel with the most to gain, from greater customer adoption of automation to significant cost savings through deflection. But if the solution falls short, it risks damaging the customer experience. It's a classic case of high risk, high reward.

    2. Nicole, where do you see the biggest gaps between executive priorities and the technology that's actually available today?
    This isn't a fun answer, but honestly: knowledge base management and governance. Good knowledge is the key ingredient to making an AI (conversational, generative, or agentic) function properly. And it's hard for most customer contact and CX organizations to manage right now. They're looking for AI solutions that can proactively add to and audit the knowledge base, but there's a gap in the market right now.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
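    As a concrete starting point for Nicole's advice to "define success, measure completion rate and CSAT, and iterate," here is a small illustrative Python sketch for tracking those two metrics on a single voice use case. The record fields, sample data, and use-case name are assumptions for demonstration, not figures from the event.

```python
# Illustrative sketch: track completion rate and CSAT for one voice AI use case.
# The data structure, sample records, and use-case name are assumptions only.
from dataclasses import dataclass
from statistics import mean

@dataclass
class CallRecord:
    use_case: str        # e.g. "order_status"
    completed: bool      # did the caller finish in self-service?
    csat: int | None     # 1-5 survey score, None if the survey was skipped

def summarize(records: list[CallRecord], use_case: str) -> dict:
    calls = [r for r in records if r.use_case == use_case]
    scores = [r.csat for r in calls if r.csat is not None]
    return {
        "calls": len(calls),
        "completion_rate": sum(r.completed for r in calls) / len(calls),
        "avg_csat": mean(scores) if scores else None,
    }

if __name__ == "__main__":
    sample = [
        CallRecord("order_status", True, 5),
        CallRecord("order_status", False, 2),
        CallRecord("order_status", True, 4),
    ]
    print(summarize(sample, "order_status"))
```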

    55 min
  3. SEP 18

    End-to-end integrated Voice AI | Neil Hammerton (CEO & Co-Founder, Natterbox)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode's guest is Neil Hammerton, CEO & Co-Founder at Natterbox. Neil Hammerton is CEO of Natterbox. Neil co-founded the UK telecoms disrupter in 2010 with the aim of transforming the business telephony experience of firms and their customers. Today, Natterbox works with over 250 businesses around the world to improve data integration through CRM within Salesforce. Natterbox enables them to put the telephone at the heart of their customer services strategy and guarantee high standards across their customer services experience.

    Natterbox is the AI-powered contact center platform redefining how Salesforce-first businesses connect with customers. Drawing on 15+ years of contact center expertise, we help leading organizations to effortlessly incorporate AI into their contact center operations and seamlessly blend AI with their contact center workforce to deliver optimal customer experiences.

    Recap Video

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    Takeaways
    * Deep CRM-native integration beats bolt-ons because it keeps full context across every call and channel.
    * Real-time summaries turn each call into structured data the next agent or bot can use on the spot (see the sketch after this list).
    * Recording and transcribing every call is the foundation for smart routing, compliance, and coaching.
    * AI should own the simple, high-volume tasks while humans handle exceptions and emotion.
    * The biggest CX drag is "tell me your story again"; carry context forward and it disappears.
    * Wait times drop fastest when AI does first response and triage before a human ever picks up.
    * Let bots update Salesforce during the call so agents don't burn time on after-call work.
    * Building your own telephony stack gives control over quality, latency, and feature pace.
    * Measure success by resolution and customer effort, not just bot containment or call deflection.
    * Most customers won't dig through a website; they call. Meet them with fast, guided answers.
    * AI without a clean handoff path back to humans will frustrate users and spike churn.
    * Automate the top three intents end-to-end first, prove value, then expand the surface area.
    * Use history plus live intent to route to the right bot or human in seconds, not minutes.
    * Keep transcripts and actions inside Salesforce so data is secure, searchable, and actionable.
    * Voice is still the highest-stakes channel; small gains here move CSAT, FCR, and churn in a big way.
    * Offload repetitive calls to AI and agents get happier, faster, and more effective.
    * "AI first, human-in-the-loop" is the practical path for the next 12–24 months, not full automation.
    * The win isn't flashy AI; it's consistent outcomes: faster answers, fewer transfers, better follow-through.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
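    To make the "real-time summaries as structured data" takeaway concrete, here is a small, hypothetical Python sketch: a call transcript is reduced to a typed summary object that a downstream bot or CRM update can consume during the call. The schema, the placeholder extraction step, and the update_crm stub are illustrative assumptions, not Natterbox's API.

```python
# Hypothetical sketch: turn a call transcript into a structured summary that a
# CRM record or the next agent can use immediately.
# The schema and the update_crm stub are assumptions, not a vendor API.
import json
from dataclasses import dataclass, asdict

@dataclass
class CallSummary:
    intent: str            # e.g. "billing_dispute"
    sentiment: str         # "positive" | "neutral" | "negative"
    action_items: list[str]
    needs_human: bool

def extract_summary(transcript: str) -> CallSummary:
    """Placeholder for a model call that returns the summary as JSON."""
    # In production you would prompt a model to emit this JSON and validate it.
    raw = json.dumps({
        "intent": "billing_dispute",
        "sentiment": "negative",
        "action_items": ["issue credit", "send confirmation email"],
        "needs_human": True,
    })
    return CallSummary(**json.loads(raw))

def update_crm(case_id: str, summary: CallSummary) -> None:
    """Stub for writing the summary to a CRM record during the call."""
    print(f"CRM case {case_id} updated:", asdict(summary))

if __name__ == "__main__":
    summary = extract_summary("Customer disputes a duplicate charge on their last invoice...")
    update_crm(case_id="00123", summary=summary)
```

    In practice the extraction step would be a model call prompted to return exactly this JSON shape, validated before anything is written to the CRM.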

    21 min
  4. SEP 11

    Beyond Cascades to Speech-to-Speech | Anshul Shrivastava & Kumar Saurav (Co-Founders at Vodex.ai)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode's guests are Anshul Shrivastava, Co-Founder and CEO, and Kumar Saurav, Co-Founder and CTO, at Vodex.ai. Vodex specializes in Generative AI-powered voice agents that facilitate natural, humanlike conversations with customers. These virtual agents manage the initial phases of customer interactions, offering businesses a scalable and efficient way to handle inbound and outbound sales and collections calls. By personalizing conversations and providing real-time insights, Vodex helps businesses improve engagement and streamline processes.

    Anshul Shrivastava is the Founder and CEO of Vodex.ai, with 12+ years in the IT industry and a strong focus on AI innovation. He leads Vodex.ai in building global AI solutions, aiming to drive growth and deliver real impact for clients. Anshul views technology as a catalyst for progress and is passionate about shaping the future of AI.

    Kumar Saurav is the Co-Founder and CTO of Vodex.ai, where he drives the development of generative AI solutions for business. With 13+ years across IT, IoT, Robotics, and AI, he brings both technical depth and business insight to solving client challenges. At Vodex.ai, he focuses on AI-powered outbound call solutions that boost sales, service, and marketing performance, while sharing his expertise through writing and research.

    Recap Video

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    Takeaways
    * Voice AI still hasn't had its ChatGPT moment because people hate talking to bots that feel slow or robotic.
    * Latency is the deal breaker: anything slower than 300ms breaks the illusion of real conversation.
    * Cascading pipelines lose tone, emotion, and context, making bots sound flat and unreliable.
    * Speech-to-speech models are the real unlock, combining speed with emotional nuance.
    * Most voice AI agents are stitched together from ASR, LLM, TTS, and telco layers (see the latency-budget sketch after this list).
    * Vodex positions itself as the "Stripe of voice AI" with simple plug-and-play APIs.
    * Vertical focus matters, and collections is their strongest domain with strict FDCPA compliance.
    * Naturalness moves revenue, with one Arabic deployment lifting recovery from 45% to 81% in seven days.
    * Naturalness is not a "nice to have"; it directly drives revenue and customer trust.
    * The bar is rising fast; in two years robotic-but-functional bots will be unacceptable.
    * Proven sweet spots for voice AI right now: lead qualification, debt collection, healthcare scheduling, and follow-ups.
    * Vodex's origin story shows the shift from slow custom builds to no-code, plug-and-play bots for non-technical users.
    * Context engineering and AI-on-AI testing are how they handle edge cases and reliability gaps.
    * The future of voice will run on small, task-specific speech models built for speed and accuracy.
    * Gen Z decision makers will push companies to embrace talking to systems instead of clicking around apps.
    * Vodex rejects cold-call spam, betting that contextual, consent-based conversations will define the industry.
    * Soon, every company will be expected to have a natural voice agent the same way every company is expected to have a website.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
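    Since the takeaways contrast cascaded pipelines with speech-to-speech models, here is a minimal Python sketch of the cascaded shape (ASR, then LLM, then TTS) measured against the roughly 300ms per-turn budget mentioned above. The stage functions and timings are stubs invented for illustration, not measurements of Vodex's stack; the point is simply that every stage eats into one shared budget.

```python
# Illustrative sketch of a cascaded voice pipeline (ASR -> LLM -> TTS) with a
# per-turn latency budget. The 300 ms figure comes from the takeaways above;
# the stage functions and sleep times are invented stand-ins.
import time

LATENCY_BUDGET_MS = 300  # target end-to-end response time per turn

def asr(audio: bytes) -> str:
    time.sleep(0.08)                  # stand-in for speech-to-text latency
    return "I want to reschedule my appointment"

def llm(text: str) -> str:
    time.sleep(0.12)                  # stand-in for language-model latency
    return "Sure, what day works best for you?"

def tts(text: str) -> bytes:
    time.sleep(0.07)                  # stand-in for text-to-speech latency
    return b"<synthesized audio>"

def handle_turn(audio: bytes) -> bytes:
    start = time.perf_counter()
    reply_audio = tts(llm(asr(audio)))
    elapsed_ms = (time.perf_counter() - start) * 1000
    status = "OK" if elapsed_ms <= LATENCY_BUDGET_MS else "OVER BUDGET"
    print(f"turn latency: {elapsed_ms:.0f} ms ({status})")
    return reply_audio

if __name__ == "__main__":
    handle_turn(b"<caller audio>")
```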

    32 min
  5. SEP 4

    Inside the Data: The State of Voice in CX Unpacked | Peter Ryan (Ryan Strategic Advisory)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode's guest is Peter Ryan, President and Principal Analyst at Ryan Strategic Advisory. Peter Ryan is recognized as one of the world's leading experts in CX and BPO. Throughout his career, Peter has advised CX outsourcers, contact center clients, national governments, and industry associations on strategic matters like vertical market penetration, service delivery, best practices in technology deployment, and offshore positioning.

    Ryan Strategic Advisory provides market insight, brand development initiatives, and actionable data for organizations in the customer experience services ecosystem. With two decades of experience, Ryan Strategic Advisory supports outsourcing operators, technology providers, industry associations, and economic development agencies.

    Recap Video

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    Takeaways
    * The hype cycle around AI has made it hard for CX leaders to separate real progress from inflated promises.
    * Adoption of voice AI is moving from concept to mainstream, driven by accuracy, latency improvements, and reliability.
    * Customers care most about issue resolution, not whether the agent sounds robotic or perfectly human.
    * One bad phone experience, often caused by language or accent misunderstandings, can permanently lose a customer.
    * Nearly half of surveyed enterprises are already using AI-powered voice translation, showing trust in its growing value.
    * About a quarter are experimenting with or adopting AI accent conversion, a big leap from just a few years ago.
    * Accent technology is not just for customers; it reduces agent stress and helps retain frontline workers.
    * Better agent retention directly lowers costs tied to recruiting, training, and high attrition.
    * Frontline agents are often more enthusiastic about accent technology than executives, because it eases real pain in daily calls.
    * CX leaders see accent and translation tools as a way to improve loyalty by making communication effortless across borders.
    * Latency in AI responses is no longer the barrier it once was; customers tolerate small delays if accuracy is high.
    * The biggest risk with AI in CX is overpromising; pragmatic, real-world use cases drive adoption faster than hype.
    * Failed AI deployments are often rolled back, especially with voice bots that don't meet expectations.
    * Real-world case studies are becoming essential for buyers to justify investments in a tight economic climate.
    * CX voice AI adoption has followed a clear path: noise cancellation first, then accent tools, now translation at scale.
    * The next wave of adoption depends on showing measurable business outcomes rather than futuristic demos.
    * AI in CX today is compared to Pentium processors in the 90s: a turning point that accelerates everything once it matures.
    * Companies that promise realistically and deliver consistently will win long-term trust in a crowded AI market.
    * The real test of AI in CX isn't novelty: it's whether it helps customers resolve issues faster, cheaper, and with less friction.

    Check out last week's article to dive deeper into the data discussed in this episode.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai

    23 min
  6. AUG 21

    Voice AI for Frontline Workers | Assaf Asbag (Chief Product & Technology Officer at aiOla)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode's guest is Assaf Asbag, Chief Technology and Product Officer at aiOla. Assaf Asbag is the CPTO at aiOla, leading AI-driven product innovation and enterprise solutions. He previously served as VP of AI at Playtika, where he built the AI division into a key growth engine. Assaf's background includes advanced algorithm work at Applied Materials and leadership across engineering and data science teams. He holds B.Sc. and M.Sc. degrees in Electrical and Computer Engineering with a focus on machine learning from Ben-Gurion University, making him a recognized expert in AI and technology strategy.

    aiOla's patented models and technology support over 100 languages and discern jargon, abbreviations, and acronyms, demonstrating a low error rate even in noisy environments. aiOla's purpose-built technology converts manual processes in critical industries into data-driven, paperless, AI-powered workflows through cutting-edge speech recognition.

    Recap Video

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    Takeaways
    * Turning spoken language into structured data in noisy, multilingual, and jargon-heavy environments is the real differentiator for enterprise voice AI.
    * Standard ASR models fail in frontline industries due to heavy accents, domain-specific vocabulary, and constant background noise.
    * Zero-shot keyword spotting from large jargon lists without fine-tuning can drastically cut setup time for specialized speech recognition.
    * Building proprietary, noise-heavy training datasets is essential for robust ASR performance in the real world.
    * Synthetic data generation that blends realistic noise with text-to-speech can cheaply scale model adaptation for niche environments (see the sketch after this list).
    * Real-time processing is critical to making voice the primary human–technology interface, especially for operational workflows.
    * Voice AI has massive untapped potential among the world's billion-plus frontline workers, far beyond current call center focus.
    * Incomplete or missing documentation is a hidden cost that voice-first tools can solve by capturing richer, structured information on the spot.
    * Effective enterprise AI solutions often require both a core product and flexible integration layers (SDK, API, or full app).
    * Trustworthy AI for voice will require guardrails, watermarking, bias detection, and context-aware filtering.
    * The next leap in conversational AI will be personalized, real-time adaptive systems rather than today's generic emotion mimicking.
    * Designing for multimodal interaction (voice, text, UI) will be as important as model accuracy for user adoption.
    * AI revolutions historically create more jobs than they displace, but require new roles in monitoring, reliability, and context engineering.
    * Future speech AI should emulate human listening: diagnosing issues, correcting in real-time, and adapting based on cues like pace, volume, and accent.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
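    A minimal sketch of the synthetic-data idea in the takeaways: blend clean (for example, TTS-generated) speech with environment noise at a chosen signal-to-noise ratio to produce training audio for noisy settings. It uses NumPy only; the sine-wave "utterance," the random noise, and the 5 dB SNR are placeholders, and a real pipeline would substitute actual TTS output and recorded field noise.

```python
# Illustrative sketch of blending clean (e.g., TTS) speech with noise at a
# target SNR to create training audio for noisy environments. File I/O and
# the choice of TTS engine are deliberately left out.
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return clean speech mixed with noise scaled to the requested SNR (dB)."""
    # Loop or trim the noise so it matches the speech length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    speech_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 1, sr, endpoint=False)
    clean_utterance = 0.3 * np.sin(2 * np.pi * 220 * t)       # stand-in for TTS audio
    factory_noise = np.random.default_rng(0).normal(0, 0.1, sr)
    noisy = mix_at_snr(clean_utterance, factory_noise, snr_db=5.0)
    print("peak amplitude of augmented sample:", float(np.max(np.abs(noisy))))
```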

    22 min
  7. AUG 7

    What to expect in 2025 | Jack Piunti (GTM Lead for Communications at ElevenLabs)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode's guest is Jack Piunti, GTM Lead for Communications at ElevenLabs. Jack Piunti is the GTM lead for Communications at ElevenLabs, where he oversees go-to-market strategy across CPaaS, CCaaS, UCaaS, and customer experience. With a strong background in consultative technology partnerships and startup growth, Jack brings deep expertise in AI-driven communications. Prior to ElevenLabs, he spent six years at Twilio, helping shape enterprise adoption of real-time voice technologies. He is passionate about the future of connected applications and the role of AI in transforming how we communicate.

    ElevenLabs is a voice AI company offering ultra-realistic text-to-speech, speech-to-text, voice cloning, multilingual dubbing, and conversational AI tools. Founded in 2022, it enables creators and developers to build voice apps and generate lifelike, emotionally rich speech in 70+ languages. Its latest models support expressive cues and multi-speaker dialogue.

    Recap Video

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    Takeaways
    * Most AI failures in conversation don't come from the language model, but from inaccurate speech-to-text at the start.
    * Bad transcription of critical details like names or codes breaks the entire user experience and can't easily be recovered.
    * Accurate speech-to-text is now a make-or-break factor for building reliable AI agents (see the validation sketch after this list).
    * Voice will soon replace typing as the main way humans interact with machines because it's more natural and efficient.
    * Enterprises don't want to stitch together multiple AI vendors; they want end-to-end platforms that simplify the stack and reduce latency.
    * Demos often look impressive, but very few companies can scale real-time voice tech reliably in production environments.
    * AI voice agents that sound expressive aren't enough; turn-taking and accuracy are still bigger challenges.
    * Most companies ignore accessibility in AI, but modeling things like stuttering actually improves agent behavior.
    * Streaming speech and voice models will unlock more lifelike, responsive AI agents, and they're coming fast.
    * Audio AI needs deep expertise beyond AI, including sound engineering and context-aware modeling of human speech.
    * There's a growing trend of AI companies going beyond voice to control the full audio experience, including music and sound effects.
    * The way voice models are trained is fundamentally different from language models and requires much cleaner training data.
    * Many agentic AI builders today are forced to cobble together solutions from different vendors, which creates delay and complexity.
    * True real-time voice AI must handle language switching, emotional cues, and speech disfluencies automatically to feel natural.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
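    To illustrate why transcription of critical details like names or codes is make-or-break, here is a small hypothetical sketch that validates a transcribed confirmation code and re-prompts the caller when the recognizer's confidence is low or the format looks wrong. The code format, confidence threshold, and normalization rule are assumptions for demonstration, not ElevenLabs behavior.

```python
# Illustrative sketch: guard critical details (e.g., confirmation codes) coming
# out of speech-to-text by validating the value and re-prompting on doubt.
# The code format and confidence threshold are invented for this example.
import re

CODE_PATTERN = re.compile(r"^[A-Z]{2}\d{4}$")   # e.g. "AB1234"
MIN_CONFIDENCE = 0.85

def normalize_code(transcript: str) -> str:
    # Collapse spoken forms like "a b 1 2 3 4" into "AB1234".
    return re.sub(r"[\s\-]", "", transcript).upper()

def accept_code(transcript: str, confidence: float) -> tuple[bool, str]:
    code = normalize_code(transcript)
    if confidence < MIN_CONFIDENCE or not CODE_PATTERN.match(code):
        return False, "Sorry, could you repeat your confirmation code?"
    return True, f"Got it, your code is {code}."

if __name__ == "__main__":
    print(accept_code("a b 1 2 3 4", confidence=0.93))    # accepted
    print(accept_code("eighty twelve", confidence=0.91))  # re-prompt: bad format
    print(accept_code("AB1234", confidence=0.60))         # re-prompt: low confidence
```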

    26 min
  8. JUL 31

    Solve First, Then Automate | Bryce Cressy (VP of Strategic Solutions at Nutun)

    In the Future of Voice AI series of interviews, I ask three questions to my guests:
    - What problems do you currently see in Enterprise Voice AI?
    - How does your company solve these problems?
    - What solutions do you envision in the next 5 years?

    This episode's guest is Bryce Cressy, VP of Strategic Solutions at Nutun. Bryce Cressy is the VP of Strategic Solutions at Nutun, where he leads innovation, AI integration, and process optimization across global CX and collections programs. With deep expertise in partnerships and outsourcing, he helps clients futureproof their contact center operations by combining human talent with transformative technology. Based in South Africa, Bryce is a vocal advocate for the region's rise as a high-skill BPO hub, and works closely with enterprise leaders in the US and UK to design tailored, tech-forward customer experiences.

    Nutun is a global BPO headquartered in South Africa, specializing in customer experience and debt collection services for clients in the US, UK, Australia, and beyond. With 30 years of industry experience and a strong foundation in collections, Nutun blends skilled human talent with cutting-edge AI to deliver high-impact, scalable solutions. Nutun is redefining offshore CX by combining local expertise, robust infrastructure, and a commitment to continuous innovation.

    Recap Video

    Thanks for reading Voice AI Newsletter! Subscribe for free to receive weekly updates.

    Takeaways
    * AI only works when solving specific, targeted problems; using it as a blanket solution guarantees failure.
    * The term "agentic AI" is being overused without a shared definition, creating more confusion than clarity.
    * South Africa's time zone, infrastructure, educated talent pool, and English fluency give it a global CX advantage.
    * Contact center jobs are now aspirational in South Africa, offering career paths from agent to executive.
    * Voice still dominates support channels, but without Voice AI, BPOs risk becoming obsolete.
    * Escalation design is the most critical aspect of Voice AI adoption; bad handoffs will break customer trust (see the escalation sketch after this list).
    * Voice bots should never trap customers in AI-only loops without access to a human.
    * Companies afraid of AI hallucinations start with agent-assist tools, not bots; it's a low-risk entry point.
    * Clear audio is make-or-break for AI accuracy, especially in noisy environments like collections.
    * IVR menus are outdated; conversational routing with AI voice agents is the new standard.
    * Smart BPOs are flipping the model, letting humans hand off to bots for routine tasks, not the other way around.
    * Voice AI isn't just a cost play; it's a CX differentiator that drives loyalty and efficiency.
    * Many vendors sound the same; what matters is whether their tech solves a real, measurable problem.
    * AI voice agents won't kill human support; they will triage it, handling volume while preserving empathy.
    * Customers need to know a human is always available or they'll lose confidence in the brand.
    * The future of BPOs lies in combining process consulting with selective, surgical AI integration.

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit voice-ai-newsletter.krisp.ai
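    A minimal sketch of the escalation-design point above: a simple policy that hands the call to a human when the caller asks for one, sentiment drops, confidence is low, or the bot keeps failing. The signals and thresholds are assumptions for illustration, not Nutun's production rules.

```python
# Illustrative sketch of an escalation policy for a voice bot: hand off to a
# human when the caller asks for one, sentiment turns negative, or the bot's
# confidence drops. Thresholds and field names are assumptions only.
from dataclasses import dataclass

@dataclass
class TurnState:
    transcript: str
    intent_confidence: float   # 0.0-1.0 from the NLU/LLM
    sentiment: float           # -1.0 (angry) to 1.0 (happy)
    failed_turns: int          # consecutive turns the bot could not resolve

HUMAN_KEYWORDS = ("human", "agent", "representative", "person")

def should_escalate(state: TurnState) -> bool:
    asked_for_human = any(k in state.transcript.lower() for k in HUMAN_KEYWORDS)
    return (
        asked_for_human
        or state.intent_confidence < 0.6
        or state.sentiment < -0.4
        or state.failed_turns >= 2
    )

if __name__ == "__main__":
    turn = TurnState("I just want to speak to a person", 0.9, 0.1, 0)
    print("escalate:", should_escalate(turn))   # True: the caller asked for a human
```

    The important property is a guaranteed exit: asking for a human always escalates, so the caller is never trapped in an AI-only loop.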

    15 min

