Two Voice Devs

Mark and Allen

Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.

  1. June 25

    Set the scene with Gemini TTS

    Roll tape and prompt! In this episode of Two Voice Devs, Allen and Mark explore how Google’s new advanced prompting guidelines turn developers into voice directors for Gemini Text-to-Speech. Instead of coding rigid SSML tags, you can now establish a scene, write stage directions, and give "director's notes" to shape a base voice's gender, accent, style, and pacing. Allen showcases a web app where he directs a single base voice to play two entirely different characters: a rough Brooklyn cab driver and a classic Southern belle. The hosts discuss using natural language audio tags as cues for laughter, sighs, gasps, and more, and how these theatrical controls are coming alive in real time with Gemini Live and Gemini 3.1 Flash TTS.

    Learn more:
    * https://ai.google.dev/gemini-api/docs/speech-generation

    Timestamps:
    [00:00:05] Welcome to Two Voice Devs
    [00:00:27] Intro to Gemini Text-to-Speech and Advanced Prompting
    [00:01:57] Moving Beyond SSML to Flexible Base Voices
    [00:03:07] Prompting Genders and Accents (The Storytelling Analogy)
    [00:04:40] Web App Demo: Zephyr as a Brooklyn Cab Driver vs. Southern Belle
    [00:06:50] Building Multi-Voice Conversations with Stage Directions
    [00:08:41] Using Natural Language Audio Tags for Expressive Cues
    [00:11:02] Gemini Live Integration and Dynamic Tone Selection
    [00:12:27] Model Details: Gemini 3.1 Flash TTS Preview and Release Info
    [00:13:53] Wrap-up and Call for Feedback

    Hashtags: #GeminiTTS #TextToSpeech #GenerativeAI #GoogleDeepMind #GeminiLive #GeminiFlash #AIStudio #DeveloperTools #SpeechSynthesis #VoiceFirst #AdvancedPrompting

    Episode 275

    15 min
  2. June 11

    Project Solara: Welcome to Agent-First Hardware

    After months of conferences and busy schedules, Mark Tucker and Allen Firstenberg return to discuss Microsoft’s surprising Build conference announcement: Project Solara. Moving from the legacy voice-first consumer world of Amazon Alexa and Google Assistant, Microsoft is pioneering a secure, business-focused "Agent-first" platform. In this episode, we unpack Microsoft's two new concept devices, a desktop smart display and a wearable camera-equipped badge, and explore the Android Open Source Project (AOSP)-based platform behind them: the Microsoft Device Ecosystem Platform (MDEP). We discuss how Project Solara integrates enterprise security standards like Intune, Windows Hello for Business, and Entra ID to allow agents to act on behalf of authenticated users. We also dive into the future-proof promise of "Just In Time UI" (Generative UI), which dynamically adapts interfaces to any form factor, and explore how these agentic tools could liberate deskless workers from being "slaves to a slab of glass."

    More Info:
    * https://commandline.microsoft.com/project-solara-build-2026/

    Timestamps:
    [00:00:00] Intro & Catching Up
    [00:00:49] Transitioning from Voice-First (Alexa/Assistant) to Agent-First
    [00:01:35] Designing for Echo Show and Google Assistant vs. GenAI
    [00:02:37] Project Solara: Custom Agentic Devices for Business
    [00:03:09] Google Glass & the Early Spark for Enterprise Use Cases
    [00:04:30] Smart Displays and Wearable Badge Concept Hardware
    [00:05:12] Built on Android (AOSP) vs. Google's Android XR
    [00:05:46] Security: Microsoft MDEP, Intune, and Alexa for Business
    [00:07:10] Bring Your Own Agent (BYOA) on Azure
    [00:08:41] Just-In-Time UI & Generative UI
    [00:12:09] Developer Availability and Future Outlook
    [00:13:26] Rethinking Computers: Lessons from Google Glass & Assistant
    [00:14:32] Wrap Up and Future Form Factors (Watches, Rings, Glasses)

    #ProjectSolara #MicrosoftBuild #AgentFirst #VoiceFirst #MDEP #GenerativeUI #GenUI #AOSP #BYOA #EnterpriseTech #TwoVoiceDevs

    Episode 274

    16 min
  3. Episode 270 - Beyond the Big Three: Open Models, Agents, & the Future of Devs

    March 5

    In part two of this insightful conversation, Allen and Sam Witteveen dive deep into the rapidly expanding world of AI models beyond the "big three." They explore the impact of open-weight and Chinese models like DeepSeek, Mistral, and Qwen, discussing their impressive efficiency and coding capabilities. The conversation shifts to the rise of agentic workflows and how tools like Claude Code are fundamentally changing the day-to-day lives of developers. They also tackle the tough questions: Are junior developers being replaced? Is AI just the next level of abstraction in programming? Finally, they cover the enterprise side of AI, from on-premise deployments to the evolving landscape of prompt engineering and observability frameworks like LangChain.

    Timestamps:
    [00:00:00] Introduction
    [00:00:49] Exploring Open Weights and Chinese Models
    [00:03:41] The Value of "Thinking" Models and Distillation
    [00:06:41] Running Models Locally
    [00:08:34] The Shift Towards Agentic Workflows
    [00:12:17] How AI is Changing the Role of Developers
    [00:29:04] AI as the Next Level of Abstraction
    [00:35:00] Best Models for Tool Calling and Coding
    [00:39:04] On-Premise Models and Enterprise Solutions
    [00:44:49] The Future of Prompt Engineering and LangChain
    [00:48:37] Outro and Where to Find Sam

    Hashtags: #TwoVoiceDevs #AI #OpenWeights #DeepSeek #Mistral #Qwen #ClaudeCode #Gemini #LangChain #SoftwareEngineering #AgenticAI #MachineLearning

    49 min
  4. Episode 269 - The "Big Three" AI Models and Training Evolution

    March 3

    In Part 1 of a two-part series, guest host Sam Witteveen joins Allen to catch up and dive deep into the rapidly evolving world of AI models. Sam shares his fascinating journey from being a successful pop songwriter to becoming a Machine Learning Google Developer Expert (GDE) and running the massive Machine Learning Singapore meetup. The conversation shifts to the latest AI developments, exploring the "Big Three" model builders: Anthropic, OpenAI, and Google. Sam and Allen discuss the frenetic pace of new model releases, changes to the Gemini 3 API, and how developers navigate the trade-offs between intelligence, latency, and cost. Finally, they pull back the curtain on how these models are actually trained today. Discover why models are no longer trying to be "fact machines" and how post-training breakthroughs, code execution sandboxes, and Reinforcement Learning (RL) environments are dramatically improving AI capabilities. Stay tuned for the end of the episode, where they hint at what's coming in Part 2!

    Timestamps:
    [00:00:00] Introduction and catching up
    [00:01:33] Sam's fascinating journey from pop music to machine learning
    [00:05:23] Running the massive Machine Learning Singapore meetup
    [00:07:42] Stumbling into YouTube and teaching AI with Google Colab
    [00:12:38] Analyzing the "Big Three" AI models and rapid release cycles
    [00:17:52] Gemini 3 API updates, Flash models, and thinking levels
    [00:22:00] Tool use, knowledge cutoffs, and why LLMs aren't fact machines
    [00:26:00] How post-training and code sandboxes revolutionized AI
    [00:32:00] Scaling Reinforcement Learning (RL) environments for design
    [00:34:04] Structured outputs and the return to predictable rules
    [00:36:43] Tune in next time for more! And where to find Sam online

    Hashtags: #TwoVoiceDevs #AI #MachineLearning #DeepLearning #LLM #GoogleGemini #Gemini #OpenAI #ChatGPT #Anthropic #Claude #ReinforcementLearning #RAG #Developers #SamWitteveen

    38 min

Ratings and Reviews

3 out of 5
2 ratings
