Fresh From the Labs

Pioneer Square Labs

Fresh From the Labs is your front-row seat to the future of AI — straight from the builders shaping it. Hosted by the product team at Pioneer Square Labs, a Seattle-based venture studio, each episode dives into the week's most exciting AI breakthroughs, tools, and trends. No hype, just hands-on insight from the people actually prototyping, experimenting, and pushing boundaries with the latest tech. Whether you're building with AI or just trying to keep up, this podcast is your lab-tested shortcut to what matters most.

  1. AUG 19

    Rust, Rollouts & Reality Checks: GPT-5’s Bumpy Debut, Agentic Browsers, and 95% Pilot Flops

    This week on Fresh From the Labs, Shilpa, Kevin, and Jared kick things off with a conversion: Kevin has officially joined the Rust cult (beard pending). From there, we dive into three big themes shaping how builders actually ship with AI right now.

    GPT-5 in the wild. A launch that felt…bumpy. We unpack autorouter misfires, sudden model deprecations, and why prompting matters more than ever with the new thinking and verbosity “knobs.” Kevin compares day-to-day coding performance against Anthropic’s Opus 4.1 and Claude Code: great for devs, less magical for non-technical workflows.

    Agentic browsing vs. reality. Perplexity’s eyebrow-raising $35B Chrome bid sparks a broader debate: does the future lie in the browser or the OS? Jared’s two-week test drive of Comet delivered slick automations (cart-filling errands) but clashed with classic “just let me Google docs” moments, plus awkward multi-account gaps. We talk cryptographic request signing for agents, potential micropayments to publishers, and why an “MCP upgrade” path could beat brittle click automation.

    Enterprise truth serum. A new MIT study claims ~95% of corporate GenAI pilots fail. We break down why: chained-probability error rates, tool-calling flakiness, procurement drag, and shadow usage of public chatbots. The near-term win? Multiplicative copilots inside the tools people already live in (think Microsoft Copilot), rather than moonshot agents.

    Plus: glamping on Orcas Island, off-grid backpacking, and squeezing in those late-summer bike rides. Tune in for hard-earned takes on what’s hype, what’s here, and what’s next.

    39 min
  2. AUG 7

    GPT-5 Emergency Podcast! GPT-5 First Impressions, Opus 4.1’s Reality Check, and Windsurf’s Culture Clash

    (created with GPT-5...) We hit record three hours after OpenAI’s GPT-5 livestream, with a shoutout to former PSL engineer Brian Fioca on stage, and dive straight into the big takeaway: daily-driver frontier performance at a non-frontier price. Benchmarks look great, but vibes and real-world performance steal the show: agentic and MMLU jumps, faster tokens, a huge context window that keeps long tasks on track, and noticeably fewer hallucinations. The headline isn’t just capability; it’s capability per dollar. If ChatGPT’s ~700M users wake up to a default upgrade that’s cheaper than prior flagships, the experience gap for “most people” collapses overnight.

    We go hands-on with coding flows that actually changed how we work this week. In Cursor, GPT-5 felt stateful without much prompt fuss (scratchpads, self-tracking, and fewer guardrail tangles). Kevin details a zero-to-working-product sprint: queue a night’s worth of tasks, have the model generate an implementation guide, a to-do plan, and a smoke-test checklist per epic, then wake up to a massive PR across a Rust backend and React desktop app. The new loop is “send, sleep, review, retry,” and now it’s fast and cheap enough to run repeatedly.

    We also run a pragmatic face-off. Opus 4.1 is impressive, but slow and pricey. GPT-5 felt snappier, delivered stronger results on the same tasks, and hit a price point that makes “try again” affordable. That flips the calculus: unless a model is meaningfully better, cost drags down utility, and “intelligence per dollar” becomes the metric that matters. For the first time in a while, a frontier model is both top-tier and broadly economical.

    Beyond OpenAI, the release flood was real: a fresh wave of open-source options you can run locally (pay once for hardware, and inference becomes electricity), Google’s Jules CLI assistant hitting GA, and rumblings of a Cursor CLI. Plus a wow moment from Google’s Genie 3: real-time, prompt-to-play worlds with minute-scale coherence and persistence. Paint stays on the wall when you return, water splashes look eerily physical, and the “inception” demo hints at interfaces well beyond chat.

    Then the Windsurf saga’s latest twist. After the OpenAI deal reportedly cratered over Microsoft data-access sensitivities, Google scooped up leadership and IP, and Cognition acquired the remainder and accelerated equity vesting. Cognition then publicly set an “80 hours, six days” cultural bar, offering nine months’ severance to those opting out. We debate radical candor versus needless PR self-owns, and what this roller coaster signals for employee expectations, acqui-hire risk, option value, and term-sheet protections in a tightening talent market.

    31 min
  3. JUL 23

    Windsurf Whiplash: Google’s Talent Grab, Cognition’s Clean-Up, and AI’s Math Gold

    In this episode, Shilpa, Kevin, and Jared unpack the turbulent sequence of events around Windsurf: first the reported collapse of an anticipated OpenAI deal amid Microsoft data-access sensitivities, then Google’s selective leadership hiring and IP licensing move, and finally Cognition’s rapid-fire acquisition of the remaining team. We explore what this “strategic decapitation” style of transaction signals for employee expectations, option value, and emerging term-sheet protections, while carefully distinguishing between reported facts and interpretation.

    That segues into a broader look at the tightening AI talent market: how corporate constraints, antitrust scrutiny, and a narrowed IPO window reshape exit math; why Section 174’s restoration (allowing immediate expensing of U.S. R&D again) may advantage deep-pocketed incumbents; and the downstream implications for early-stage hiring, compensation, and founder signaling.

    We then analyze OpenAI’s and Google’s models reaching gold-medal-level performance at the International Mathematical Olympiad: what’s genuinely new (end-to-end natural-language reasoning, multi-hour coherence, emergent refusal when out of their depth) and why selective abstention may matter as much as raw accuracy for reducing hallucinations. The team examines long-horizon “think + tool” workflows, shrinking core model footprints via externalized fact retrieval, and how product interfaces must evolve beyond chat metaphors as reasoning spans hours.

    Finally, we reflect on what Olympiad-grade reasoning might mean for education, assessment integrity, and the division of cognitive labor, plus a lighter close: Millennium Force at Cedar Point, a reclaimed planer from a Rotary sale, and a standout Seattle donut run. A dense week of AI strategy, economics, and capability inflection, packaged for builders calibrating what’s signal versus noise. Tune in and tell us where you think the next structural shift lands.

    41 min
  4. JUN 16

    AI Editor Wars: Windsurf vs. Cursor, Anthropic's Power Play, and The "Gladys" Experiment

    This week on Fresh From the Labs, Shilpa, Kevin, and Jared dive into the swirling drama and strategic plays shaping the AI landscape. First up, the Windsurf and Anthropic saga: with rumors of an OpenAI acquisition of Windsurf, Anthropic has reportedly given Windsurf just five days to remove all Anthropic models. We unpack the implications of this aggressive move, from competitive intelligence concerns (is OpenAI training on Anthropic's models via Windsurf?) to a potentially fragmented AI editor market where model access becomes a key differentiator, which could hurt developers and open source.

    The conversation then shifts to the evolving user experience (UX) for AI products. Kevin shares his experiences building a voice-first email agent, highlighting the fine line between a magical, futuristic experience and a frustrating one, and the surprisingly good performance of OpenAI's latest real-time voice API. Jared introduces his "Gladys pattern" experiment: building AI systems where agents think they are emailing each other to manage tasks. This explores a UX beyond chat, aiming for "ambient agents" that work implicitly, and we discuss the fascinating (and flowery) system prompts that bring these agentic personalities to life, plus the challenges of evaluating such behavioral systems.

    Finally, we touch on OpenAI's new meeting recording and summarization functionality built directly into ChatGPT. While convenient for users, it underscores OpenAI's relentless push into product territory, raising questions about the future of standalone SaaS tools as AI platforms consolidate interfaces and subsume niche functionality.

    51 min

Ratings & Reviews

5 out of 5 (4 ratings)
