Two Voice Devs

Mark and Allen

3.0 (2)
Technology
Weekly

Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.

June 25

Set the scene with Gemini TTS

Roll tape and prompt! In this episode of Two Voice Devs, Allen and Mark explore how Google’s new advanced prompting guidelines turn developers into voice directors for Gemini Text-to-Speech. Instead of coding rigid SSML tags, you can now establish a scene, write stage directions, and give "director's notes" to shape a base voice's gender, accent, style, and pacing. Allen showcases a web app where he directs a single base voice to play two entirely different characters: a rough Brooklyn cab driver and a classic Southern belle. The hosts discuss using natural language audio tags as cues for laughter, sighs, gasps, and more, and how these theatrical controls are coming alive in real time with Gemini Live and Gemini 3.1 Flash TTS.

Learn more:
* https://ai.google.dev/gemini-api/docs/speech-generation

[00:00:05] Welcome to Two Voice Devs
[00:00:27] Intro to Gemini Text-to-Speech and Advanced Prompting
[00:01:57] Moving Beyond SSML to Flexible Base Voices
[00:03:07] Prompting Genders and Accents (The Storytelling Analogy)
[00:04:40] Web App Demo: Zephyr as a Brooklyn Cab Driver vs. Southern Belle
[00:06:50] Building Multi-Voice Conversations with Stage Directions
[00:08:41] Using Natural Language Audio Tags for Expressive Cues
[00:11:02] Gemini Live Integration and Dynamic Tone Selection
[00:12:27] Model Details: Gemini 3.1 Flash TTS Preview and Release Info
[00:13:53] Wrap-up and Call for Feedback

Hashtags: #GeminiTTS #TextToSpeech #GenerativeAI #GoogleDeepMind #GeminiLive #GeminiFlash #AIStudio #DeveloperTools #SpeechSynthesis #VoiceFirst #AdvancedPrompting

Episode 275

15 min
June 11

Project Solara: Welcome to Agent-First Hardware

After months of conferences and busy schedules, Mark Tucker and Allen Firstenberg return to discuss Microsoft’s surprising Build conference announcement: Project Solara. Moving from the legacy voice-first consumer world of Amazon Alexa and Google Assistant, Microsoft is pioneering a secure, business-focused "Agent-first" platform. In this episode, we unpack Microsoft's two new concept devices, a desktop smart display and a wearable camera-equipped badge, and explore the Android Open Source Project (AOSP)-based platform behind them: the Microsoft Device Ecosystem Platform (MDEP). We discuss how Project Solara integrates enterprise security standards like Intune, Windows Hello for Business, and Entra ID to allow agents to act on behalf of authenticated users. We also dive into the future-proof promise of "Just In Time UI" (Generative UI) which dynamically adapts interfaces to any form factor, and explore how these agentic tools could liberate deskless workers from being "slaves to a slab of glass."

More Info:
* https://commandline.microsoft.com/project-solara-build-2026/

Timestamps:
[00:00:00] Intro & Catching Up
[00:00:49] Transitioning from Voice-First (Alexa/Assistant) to Agent-First
[00:01:35] Designing for Echo Show and Google Assistant vs. GenAI
[00:02:37] Project Solara: Custom Agentic Devices for Business
[00:03:09] Google Glass & the Early Spark for Enterprise Use Cases
[00:04:30] Smart Displays and Wearable Badge Concept Hardware
[00:05:12] Built on Android (AOSP) vs. Google's Android XR
[00:05:46] Security: Microsoft MDEP, Intune, and Alexa for Business
[00:07:10] Bring Your Own Agent (BYOA) on Azure
[00:08:41] Just-In-Time UI & Generative UI
[00:12:09] Developer Availability and Future Outlook
[00:13:26] Rethinking Computers: Lessons from Google Glass & Assistant
[00:14:32] Wrap Up and Future Form Factors (Watches, Rings, Glasses)

#ProjectSolara #MicrosoftBuild #AgentFirst #VoiceFirst #MDEP #GenerativeUI #GenUI #AOSP #BYOA #EnterpriseTech #TwoVoiceDevs

Episode 274

16 min
June 4

New Horizons for Android: XR, MCP, and Agents

Allen and Mike record live from Google I/O in the Builders podcast space. They discuss their impressions of this year's conference, the evolution of I/O over the years, and the big announcements from the keynote. Key topics include Gemini's "any output from any input" vision, how the new NanoBanana and Omni models differ from Imagen and Veo, the state of Android XR development, and the introduction of App Functions (Android MCP) for better AI agent integration. They also share their thoughts on the new Gemini app UI and what they hope to see in the world of wearables by next year.

More info:
* Android XR Developer Program: https://developer.android.com/develop/xr/catalyst

[00:00:11] Live from Google I/O Builders Podcast Space
[00:00:37] Reflections on I/O over the years
[00:02:01] Gemini's "Any Input to Any Output" Vision
[00:02:54] What's the big deal with NanoBanana and Omni?
[00:03:41] Android XR and the future of intelligent eyewear
[00:06:02] New Android developer tools and AI coding agents
[00:08:29] App Functions and Android MCP
[00:13:08] Spark, Halo, and AI agents on Android
[00:15:07] The new Gemini app UI and design feedback
[00:17:26] Looking ahead: Hopes for I/O 2027 and wearables

#GoogleIO #GeminiAI #AndroidXR #AndroidMCP #AppFunctions #GoogleGlass #TwoVoiceDevs #AIAgents #AndroidDev #Wearables #NanoBanana #GeminiOmni

19 min
May 28

Google I/O 2026: GenUI, Glass, and Android XR

Allen and Noble are live from Google I/O! This episode breaks down the biggest keynote news: agentic coding in Search, the power of Generative UI, and the future of "intelligent eyewear." They share what these changes mean for the venerable Google Search, what works (and what doesn't) with the new Google Glass, and how Android XR fits into the picture. From wearable AI to interactive search, find out what's here and what's coming this fall.

More info:
* Agentic Coding in Search: https://blog.google/products-and-platforms/products/search/search-io-2026/#agentic-coding
* Android XR Developer Program: https://developer.android.com/develop/xr/catalyst

[00:00:00] Introduction from Google I/O
[00:01:32] Agentic Coding in Google Search
[00:04:00] Generative UI: Beyond the Chatbot
[00:08:19] The Three Pillars: Models, Coding, and Agents
[00:11:00] Intelligent Eyewear and the Return of Glass
[00:13:09] Hands-on with the AI Sandbox
[00:15:44] The Human Impact of Real-Time Translation
[00:16:47] Android XR and the Developer Experience
[00:18:36] Developer Opportunities and Early Access

#GoogleIO #IO26 #AndroidXR #GeminiAI #GenerativeUI #GoogleGlass #IntelligentEyewear #GoogleSearch #AgenticAI #TechPodcast #TwoVoiceDevs #AI #IOCreatorStudio #GoogleForDevelopers

Episode 272

20 min
May 14

Live from Next 2026: The Year of the Agent

Allen and Alice are on the ground at Google Cloud Next, breaking down the biggest shifts in the AI landscape. This episode explores the transition from focusing on models to building agents with the launch of the Gemini Enterprise Agent Platform. They discuss the new TPU v8 hardware, the power of the Model Context Protocol (MCP) for Workspace integration, and how tools like Workspace Studio are making agent development accessible to everyone. Plus, a look at the incredible AI-powered Wizard of Oz experience at the Sphere!

Timestamps:
[00:00:12] Live from Day Two of Google Cloud Next
[00:01:13] New Hardware: TPU v8 for Training and Inference
[00:02:53] Gemini's Current State and Future Models
[00:04:27] Vertex AI Rebrands as Gemini Enterprise Agent Platform
[00:06:14] Building Reliable Agents: Identity, Registry, and Observability
[00:07:18] Powering Agents with Model Context Protocol (MCP)
[00:11:06] Workspace Studio: Automation for Everyone
[00:15:00] Immersive Experiences at the Sphere
[00:17:12] Final Thoughts and Where to Follow

Hashtags: #GoogleCloudNext #Gemini #AIAgents #VertexAI #TPU #MCP #WorkspaceStudio #TwoVoiceDevs #GenAI #QueenOfSpreadsheets

18 min
March 5

Episode 270 - Beyond the Big Three: Open Models, Agents, & the Future of Devs

In part two of this insightful conversation, Allen and Sam Witteveen dive deep into the rapidly expanding world of AI models beyond the "big three." They explore the impact of open-weight and Chinese models like DeepSeek, Mistral, and Qwen, discussing their impressive efficiency and coding capabilities. The conversation shifts to the rise of agentic workflows and how tools like Claude Code are fundamentally changing the day-to-day lives of developers. They also tackle the tough questions: Are junior developers being replaced? Is AI just the next level of abstraction in programming? Finally, they cover the enterprise side of AI, from on-premise deployments to the evolving landscape of prompt engineering and observability frameworks like LangChain.

Timestamps:
[00:00:00] Introduction
[00:00:49] Exploring Open Weights and Chinese Models
[00:03:41] The Value of "Thinking" Models and Distillation
[00:06:41] Running Models Locally
[00:08:34] The Shift Towards Agentic Workflows
[00:12:17] How AI is Changing the Role of Developers
[00:29:04] AI as the Next Level of Abstraction
[00:35:00] Best Models for Tool Calling and Coding
[00:39:04] On-Premise Models and Enterprise Solutions
[00:44:49] The Future of Prompt Engineering and LangChain
[00:48:37] Outro and Where to Find Sam

Hashtags: #TwoVoiceDevs #AI #OpenWeights #DeepSeek #Mistral #Qwen #ClaudeCode #Gemini #LangChain #SoftwareEngineering #AgenticAI #MachineLearning

49 min
March 3

Episode 269 - The "Big Three" AI Models and Training Evolution

In Part 1 of a two-part series, guest host Sam Witteveen joins Allen to catch up and dive deep into the rapidly evolving world of AI models. Sam shares his fascinating journey from being a successful pop songwriter to becoming a Machine Learning Google Developer Expert (GDE) and running the massive Machine Learning Singapore meetup. The conversation shifts to the latest AI developments, exploring the "Big Three" model builders: Anthropic, OpenAI, and Google. Sam and Allen discuss the frenetic pace of new model releases, changes to the Gemini 3 API, and how developers navigate the trade-offs between intelligence, latency, and cost. Finally, they pull back the curtain on how these models are actually trained today. Discover why models are no longer trying to be "fact machines" and how post-training breakthroughs, code execution sandboxes, and Reinforcement Learning (RL) environments are dramatically improving AI capabilities. Stay tuned for the end of the episode, where they hint at what's coming in Part 2!

Timestamps:
[00:00:00] Introduction and catching up
[00:01:33] Sam's fascinating journey from pop music to machine learning
[00:05:23] Running the massive Machine Learning Singapore meetup
[00:07:42] Stumbling into YouTube and teaching AI with Google Colab
[00:12:38] Analyzing the "Big Three" AI models and rapid release cycles
[00:17:52] Gemini 3 API updates, Flash models, and thinking levels
[00:22:00] Tool use, knowledge cutoffs, and why LLMs aren't fact machines
[00:26:00] How post-training and code sandboxes revolutionized AI
[00:32:00] Scaling Reinforcement Learning (RL) environments for design
[00:34:04] Structured outputs and the return to predictable rules
[00:36:43] Tune in next time for more! And where to find Sam online

Hashtags: #TwoVoiceDevs #AI #MachineLearning #DeepLearning #LLM #GoogleGemini #Gemini #OpenAI #ChatGPT #Anthropic #Claude #ReinforcementLearning #RAG #Developers #SamWitteveen

38 min
February 19

Episode 268 - The New @langchain/google Package

Allen has been busy! This week, he unveils the new `@langchain/google` package for LangChain JS. This major update consolidates five previous libraries into a single, standardized, and powerful tool for developers working with Gemini and Vertex AI. Allen walks Mark through the motivation behind the change, the focus on backward compatibility, and the exciting new features like simplified multimodal input/output and text-to-speech support. If you're building with Google AI and JavaScript, this is the update you've been waiting for.

[00:00:57] The confusion of previous packages
[00:02:52] Creating a unified package
[00:03:45] Introducing @langchain/google
[00:04:35] Backward compatibility
[00:06:48] Multimodal inputs
[00:07:54] Standardizing output and image generation
[00:08:58] Text-to-Speech support
[00:11:29] Simplifying parameters and reasoning
[00:14:55] Future roadmap

#LangChain #Gemini #NanoBanana #TextToSpeech #GoogleAI #JavaScript #TypeScript #VertexAI #OpenSource #AI #WebDevelopment #TwoVoiceDevs

18 min


Trailer

Episode 10 Teaser

We have something special planned for our 10th episode! Curious what it might be?

S1, E10 • 29 sec

3.0 out of 5

2 ratings

No langchain

April 10, 2023

soslooooooooow

Reading the readme may be better use of your time if you want to learn about langchain


Creator: Mark and Allen
Years active: 2020 - 2026
Episodes: 276
Content rating: All audiences
Show website: Two Voice Devs
