Two Voice Devs

Mark and Allen

Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.

  1. Episode 253 - The Future of Voice? Exploring Gemini 2.5's TTS Model

    AUG 29

    In this episode of Two Voice Devs, Mark and Allen dive into the new experimental Text-to-Speech (TTS) model in Google's Gemini 2.5. They explore its capabilities, from single-speaker to multi-speaker audio generation, and discuss how it's a significant leap from the old days of SSML. They also touch on how this new technology can be integrated with LangChainJS to create more dynamic and natural-sounding voice applications. Is this the return of voice as the primary interface for AI? (A rough single-speaker code sketch follows this episode's listing.)

    [00:00:00] Introduction
    [00:00:45] Google's new experimental TTS model for Gemini
    [00:01:55] Demo of single-speaker TTS in Google's AI Studio
    [00:03:05] Code walkthrough for single-speaker TTS
    [00:04:30] Lack of fine-grained control compared to SSML
    [00:05:15] Using text cues to shape the TTS output
    [00:06:20] Demo of multi-speaker TTS with a script
    [00:09:50] Code walkthrough for multi-speaker TTS
    [00:11:30] The model is tuned for TTS, not general conversation
    [00:12:10] Using a separate LLM to generate a script for the TTS model
    [00:13:30] Code walkthrough of the two-function approach with LangChainJS
    [00:16:15] LangChainJS integration details
    [00:19:00] Is Speech Markdown still relevant?
    [00:21:20] Latency issues with the current TTS model
    [00:22:00] Caching strategies for TTS
    [00:23:30] Voice as the natural UI for AI
    [00:25:30] Outro

    #Gemini #TTS #VoiceAI #VoiceFirst #AI #Google #LangChainJS #LLM #Developer #Podcast

    26 min
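As a companion to the TTS discussion above, here is a minimal sketch of a single-speaker call against the Gemini API, assuming the `@google/genai` Node SDK. The model name, voice name, and response handling follow the public documentation rather than the episode's own code, so treat them as assumptions.

```typescript
// Minimal single-speaker TTS sketch (assumes the @google/genai Node SDK).
// The model id and voice name are illustrative and may change while the feature is experimental.
import { GoogleGenAI } from "@google/genai";
import { writeFileSync } from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY ?? "" });

async function speak(text: string): Promise<void> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-preview-tts", // experimental TTS model
    contents: text,
    config: {
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } }, // one of the prebuilt voices
      },
    },
  });

  // The audio comes back as base64-encoded PCM in the first candidate part.
  const data = response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
  if (data) writeFileSync("episode-intro.pcm", Buffer.from(data, "base64"));
}

// Plain-text cues (not SSML) shape the delivery, as discussed in the episode.
speak("Say cheerfully: Welcome to another episode of Two Voice Devs!").catch(console.error);
```

Multi-speaker generation, as demoed in the episode, uses the same call shape with a multi-speaker voice configuration and a script that labels each speaker's lines.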
  2. Episode 252 - GPT-5 First Look: Evolution, Not Revolution

    AUG 15

    Join Allen and Mark as they take a first look at the newly released GPT-5 from OpenAI. They dive into the details of what's new, what's changed, and what's missing, frequently comparing it to other models like Google's Gemini. From the new mini and nano models to the pricing wars with competitors, they cover the landscape of the latest LLM offerings. They also discuss the new features for developers, including verbosity settings and constrained outputs with context-free grammars, and what this means for the future of AI development. Is GPT-5 the leap forward everyone was expecting, or a sign that the rapid pace of AI evolution is starting to plateau? Tune in to find out! (A sketch of schema-constrained output follows this episode's listing.)

    [00:00:00] Introduction and the hype around GPT-5
    [00:01:00] Overview of GPT-5, mini, and nano models
    [00:02:00] The new "thinking" model and smart routing
    [00:03:00] Simplifying models for developers
    [00:04:00] Reasoning levels vs. Gemini's "thinking budget"
    [00:06:00] Pricing wars and new models
    [00:07:00] OpenAI's new open source models
    [00:08:00] New verbosity setting for developers
    [00:09:00] Constrained outputs and context-free grammars
    [00:12:00] Using LLMs to translate to well-defined data structures
    [00:14:00] Reducing hallucinations and medical applications
    [00:16:00] Knowledge cutoff dates for the new models
    [00:18:00] Coding with GPT-5 and IDE integration
    [00:19:00] More natural conversations with ChatGPT
    [00:21:00] Missing audio and image modalities vs. Gemini
    [00:22:00] Community reaction to the GPT-5 release
    [00:24:00] The future of LLMs: Maturing and plateauing
    [00:26:00] The need for better developer tools and agentic computing

    #GPT5 #OpenAI #LLM #AI #ArtificialIntelligence #Developer #TechTalk #Podcast #AIDevelopment #MachineLearning #FutureOfAI #AGI #GoogleGemini #TwoVoiceDevs

    28 min
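To make the constrained-output discussion above concrete, here is a rough sketch of asking a model for a well-defined data structure using the openai Node SDK's JSON Schema response format. The context-free-grammar feature mentioned in the episode is a newer generalization of this idea with its own API and is not shown here; the model id and schema below are illustrative.

```typescript
// Sketch: constraining model output to a well-defined data structure with a
// JSON Schema response format (openai Node SDK). Model id and schema are illustrative.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function extractAppointment(note: string): Promise<unknown> {
  const completion = await client.chat.completions.create({
    model: "gpt-5-mini", // assumed model id; use whatever model is available to you
    messages: [
      { role: "system", content: "Extract the appointment details from the user's note." },
      { role: "user", content: note },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "appointment",
        strict: true,
        schema: {
          type: "object",
          properties: {
            patient: { type: "string" },
            date: { type: "string", description: "ISO 8601 date" },
            reason: { type: "string" },
          },
          required: ["patient", "date", "reason"],
          additionalProperties: false,
        },
      },
    },
  });

  // With strict mode, the returned content is guaranteed to match the schema.
  return JSON.parse(completion.choices[0].message.content ?? "{}");
}

extractAppointment("Maria needs a follow-up on March 3rd for her refill review.")
  .then(console.log)
  .catch(console.error);
```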
  3. Episode 251 - AI Agents: Frameworks and Concepts

    AUG 12

    Join Mark and Allen in this episode of Two Voice Devs as they explore the fascinating world of AI agents. They break down what agents are, how they work, and what sets them apart from earlier AI technologies. The discussion covers key concepts like "context engineering" and the essential components of an agentic system, including prompts, RAG, memory, tools, and structured outputs. Using a practical example of a prescription management chatbot for veterans, they demonstrate how agents can handle complex tasks. They compare various frameworks for building agents, specifically focusing on OpenAI's Agent SDK (for TypeScript) and Microsoft's Semantic Kernel (for C#). They also touch on other popular frameworks like LangGraph and Google's Agent Development Kit. Tune in for a detailed comparison of how OpenAI's Agent SDK and Microsoft's Semantic Kernel handle state, tools, and the overall agent lifecycle, and learn what the future holds for these intelligent systems. (A small Agents SDK sketch follows this episode's listing.)

    [00:00:00] - Introduction
    [00:01:02] - What is an AI Agent?
    [00:03:12] - Context Engineering and its components
    [00:06:02] - The role of the Agent Controller
    [00:08:01] - Agent Mode vs. Agent AI
    [00:09:36] - Use Case: Prescription Management Chatbot
    [00:13:42] - Handling Large Lists of Data
    [00:16:15] - Tools and State Management
    [00:21:05] - Filtering and Searching with Tools
    [00:27:08] - Displaying Information and Iterating through lists
    [00:30:10] - The power of LLMs in Agentic Systems
    [00:35:18] - Sub-agents and the future of agentic systems
    [00:38:25] - Comparing different Agent Frameworks
    [00:39:00] - Wrap up

    #AIAgents #TwoVoiceDevs #ContextEngineering #OpenAIAgentSDK #SemanticKernel #LangGraph #GoogleADK #LLMs #GenAI #AI #Developer #Podcast #TypeScript #CSharp

    39 min
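As a rough illustration of the prompts-plus-tools pattern discussed above, here is a sketch using the OpenAI Agents SDK for TypeScript (the `@openai/agents` package). The prescription tool is a made-up stand-in for the episode's veterans' prescription chatbot, and the exports shown are written from memory of the SDK's documentation, so verify them against the current package.

```typescript
// Sketch: a tiny agent with one tool, in the spirit of the prescription chatbot
// discussed in the episode. Assumes the @openai/agents package; the data is made up.
import { Agent, run, tool } from "@openai/agents";
import { z } from "zod";

// Stand-in tool: a real system would query the prescription backend here.
const listPrescriptions = tool({
  name: "list_prescriptions",
  description: "List the active prescriptions for a patient by id.",
  parameters: z.object({ patientId: z.string() }),
  execute: async ({ patientId }) =>
    JSON.stringify([{ patientId, drug: "lisinopril", doseMg: 10, refillsLeft: 2 }]),
});

const agent = new Agent({
  name: "Prescription Assistant",
  instructions:
    "Help the user review their prescriptions. Use tools to look up data; never guess.",
  tools: [listPrescriptions],
});

async function main(): Promise<void> {
  const result = await run(agent, "Which of my prescriptions still have refills?");
  console.log(result.finalOutput);
}

main().catch(console.error);
```

The episode's comparison with Semantic Kernel centers on how each framework wires up the same pieces: instructions, tools, and state.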
  4. Episode 249 - Cracking Copilot and the Mysteries of Microsoft 365

    JUL 24

    In this episode, guest host Andrew Connell, a Microsoft MVP of 21 years, joins Allen to unravel the complexities of Microsoft's AI strategy, particularly within the enterprise. They explore the world of Microsoft 365 Copilot, distinguishing it from the broader AI landscape and consumer tools like ChatGPT. Andrew provides an insider's look at how Copilot functions within a secure, private "enclave," leveraging a "Semantic Index" of your organization's data to provide relevant, contextual answers. The conversation then shifts to the developer experience. Discover the different ways developers can extend and customize Copilot, from low-code solutions in Copilot Studio to creating powerful "declarative agents" with JSON and even building "custom engine agents" where you can bring your own models and infrastructure. If you've ever wondered what Microsoft's AI story is for businesses and internal developers, this episode provides a comprehensive and honest overview.

    Timestamps:
    [00:00:01] - Introducing guest host Andrew Connell
    [00:00:54] - What is a Microsoft 365 developer?
    [00:01:40] - Andrew's journey into the Microsoft ecosystem
    [00:05:00] - 21 years as a Microsoft MVP
    [00:06:15] - Enterprise Cloud vs. Developer Cloud
    [00:08:06] - Microsoft's AI focus for the enterprise
    [00:10:57] - What is Microsoft 365 Copilot?
    [00:13:07] - How Copilot ensures data privacy with a "secure enclave"
    [00:14:58] - Understanding the Semantic Index
    [00:16:31] - Is Copilot a Retrieval Augmented Generation (RAG) system?
    [00:17:23] - Responsible AI in the Copilot stack
    [00:19:19] - The developer story for extending Copilot
    [00:22:43] - Building declarative agents with JSON and YAML
    [00:25:05] - Using actions and tools with agents
    [00:27:00] - How agents are deployed via Microsoft Teams
    [00:32:48] - Where does Copilot actually run?
    [00:36:20] - Key takeaways from Microsoft Build
    [00:41:20] - The spectrum of development: low-code to full-code
    [00:43:00] - Full control with Custom Engine Agents
    [00:49:30] - Where to find Andrew Connell online

    Hashtags:
    #Microsoft #AI #Copilot #Microsoft365 #Azure #SharePoint #MicrosoftTeams #MVP #Developer #Podcast #Tech #EnterpriseSoftware #CloudComputing #ArtificialIntelligence #Agents #LowCode #NoCode #RAG

    52 min
  5. Episode 248 - AI Showdown: Gemini CLI vs. Claude Code CLI

    JUL 17

    Join Allen Firstenberg and guest host Isaac Johnson, a Google Developer Expert with a deep background in DevOps and SRE, as they dive into the world of command-line AI assistants. In this episode, they compare and contrast two powerful tools: Anthropic's Claude Code CLI and Google's Gemini CLI. Isaac shares his journey from coding with Fortran in the 90s to becoming a GDE, and explains why he often prefers the focused, context-aware power of a CLI tool over crowded IDE integrations. They discuss the pros and cons of each approach, from ease of use and learning curves to the critical importance of using version control as a safety net. The conversation then gets practical with a live demo where both Claude and Gemini are tasked with generating system architecture diagrams for a real-world project. Discover the differences in speed, cost, output, and user experience. Plus, learn how to customize Gemini's behavior with `GEMINI.md` files and explore fascinating use cases beyond just writing code, including podcast production, image generation, and more. (An illustrative `GEMINI.md` example follows this episode's listing.)

    [00:00:30] - Introducing the topic: AI assistants in the command line.
    [00:01:00] - Guest Isaac Johnson's extensive background in tech.
    [00:03:00] - Why use a CLI tool instead of an IDE plugin?
    [00:07:30] - Pro Tip: Always use Git with AI coding tools!
    [00:09:30] - The cost of AI: Comparing Claude's and Gemini's pricing.
    [00:12:15] - The benefits of Gemini CLI being open source.
    [00:17:30] - Live Demo: Claude Code CLI generates a system diagram.
    [00:21:30] - Live Demo: Gemini CLI tackles the same task.
    [00:27:30] - Customizing your AI with system prompts (`GEMINI.md`).
    [00:31:30] - Beyond Code: Using CLI tools for podcasting and media generation.
    [00:40:30] - Where to find and connect with Isaac Johnson.

    #AI #DeveloperTools #CLI #Gemini #Claude #GoogleCloud #Anthropic #TwoVoiceDevs #TechPodcast #SoftwareDevelopment #DevOps #SRE #AIassistant #Coding #Programming #FirebaseStudio #Imagen #Veo

    42 min
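Building on the `GEMINI.md` customization mentioned above, here is a small example of the kind of project-level context file the Gemini CLI reads. The contents are entirely illustrative; only the file name and the idea of per-project instructions come from the episode.

```markdown
# GEMINI.md: illustrative project context for the Gemini CLI

## About this repository
- A Node.js service with infrastructure definitions under `infra/`.

## Conventions for the assistant
- Prefer TypeScript with strict mode; avoid adding new runtime dependencies.
- When asked for architecture diagrams, emit Mermaid so they can be committed with the docs.
- Propose a plan before editing files, and keep changes small enough to review.
```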
  6. Episode 247 - Apple's AI Gets Serious

    JUL 10

    John Gillilan, our official Apple correspondent, returns to Two Voice Devs to unpack the major announcements from Apple's latest Worldwide Developer Conference (WWDC). After failing to ship the ambitious "Apple Intelligence" features promised last year, how did Apple address the elephant in the room? We dive deep into the new "Foundation Models Framework," which gives developers unprecedented access to on-device LLMs. We explore how features like structured data output with the "Generable" macro, "Tools" for app integration, and trainable "Adapters" are changing the game for developers. We also touch on the revamped speech-to-text, "Visual Intelligence," "Swift Assist" in Xcode, and the mysterious "Private Cloud Compute." Join us as we analyze Apple's AI strategy, the internal reorgs shaping their product future, and the competitive landscape with Google and OpenAI.

    [00:00:00] Welcome back, John Gillilan!
    [00:01:00] What was WWDC like from an insider's perspective?
    [00:06:00] Apple's big miss: What happened to last year's AI promises?
    [00:12:00] The new Foundation Models Framework
    [00:16:00] Structured data output with the "Generable" macro
    [00:19:00] Extending the LLM with "Tools"
    [00:22:00] Fine-tuning with trainable "Adapters"
    [00:28:00] Modernized on-device Speech-to-Text
    [00:29:00] "Visual Intelligence" and app integration
    [00:32:00] The powerful "call model" block in Shortcuts
    [00:36:00] Swift Assist and BYO-Model in Xcode
    [00:39:00] Inside Apple's big AI reorg
    [00:42:00] The Jony Ive / OpenAI hardware mystery
    [00:45:00] How Apple, Google, and OpenAI will compete and collaborate

    #Apple #WWDC #AI #AppleIntelligence #FoundationModels #LLM #OnDeviceAI #Swift #iOSDev #Developer #TechPodcast #TwoVoiceDevs #Siri #SwiftAssist #OpenAI #GoogleGemini #GoogleAndroid

    49 min
