How AI Is Built

#055 Embedding Intelligence: AI's Move to the Edge

Nicolay here,

while everyone races to cloud-scale LLMs, Pete Warden is solving AI problems by going completely offline. No network connectivity required.

Today I have the chance to talk to Pete Warden, CEO of Useful Sensors and author of the TinyML book.

His philosophy: if you can't explain to users exactly what happens to their data, your privacy model is broken.

Key Insight: The Real World Action Gap

LLMs excel at text-to-text transformations but fail catastrophically at connecting language to physical actions. There's nothing in the web corpus that teaches a model how "turn on the light" maps to sending a pin high on a microcontroller.

This explains why every AI agent demo focuses on booking flights and API calls - those actions are documented in text. The moment you step off the web into real-world device control, even simple commands become impossible without custom training on action-to-outcome data.

Pete's company builds speech-to-intent systems that skip text entirely, going directly from audio to device actions using embeddings trained on limited action sets.

💡 Core Concepts

Speech-to-Intent: Direct audio-to-action mapping that bypasses text conversion, preserving ambiguity until final classification

ML Sensors: Self-contained circuit boards processing sensitive data locally, outputting only simple signals without exposing raw video/audio

Embedding-Based Action Matching: Vector representations mapping natural language variations to canonical device actions within constrained domains
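The episode describes the idea at a high level rather than showing code, but embedding-based action matching within a constrained domain can be sketched roughly like this. The action names, example phrasings, and the bag-of-words `embed()` function are all illustrative stand-ins; a real system like Pete's would use a neural encoder over audio, not word counts over text.

```python
import math
from collections import Counter

# A small, fixed action set with a few example phrasings per action.
# Constraining the domain like this is the point: the system only has
# to discriminate among a handful of canonical actions.
ACTIONS = {
    "light_on": ["turn on the light", "lights on", "switch the lamp on"],
    "light_off": ["turn off the light", "lights off", "kill the lights"],
    "volume_up": ["turn it up", "louder please", "increase the volume"],
}

def embed(text: str) -> Counter:
    """Toy stand-in for an encoder: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(utterance: str, threshold: float = 0.3):
    """Map an utterance to the closest canonical action, or None if nothing is close."""
    query = embed(utterance)
    best_action, best_score = None, 0.0
    for action, examples in ACTIONS.items():
        for example in examples:
            score = cosine(query, embed(example))
            if score > best_score:
                best_action, best_score = action, score
    return best_action if best_score >= threshold else None

print(classify("please turn the light on"))  # -> light_on
print(classify("what's the weather like"))   # -> None (outside the action set)
```

The rejection threshold is what makes the limited-action-set strategy workable: anything that doesn't land near a known action is refused rather than guessed at.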

⏱ Important Moments

Real World Action Problem: [06:27] LLMs discuss turning on lights but lack training data connecting text commands to device control

Apple Intelligence Challenges: [04:07] Design-led culture clashes with AI accuracy limitations

Speech-to-Intent vs Speech-to-Text: [12:01] Breaking audio into text loses critical ambiguity information

Limited Action Set Strategy: [15:30] Smart speakers succeed by constraining to ~3 functions rather than infinite commands

8-Bit Quantization: [33:12] Still the deployment sweet spot - processor instruction support matters more than raw compression ratio

On-Device Privacy: [47:00] Fully local processing gives users an explainable guarantee, unlike confusing hybrid cloud/device systems
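The quantization discussion stays at the level of trade-offs, but the core mechanic is simple enough to sketch. This is a generic affine (scale plus zero-point) int8 scheme, not Pete's or any specific framework's implementation; the function names and the toy weight vector are illustrative.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Affine 8-bit quantization: map floats onto the int8 range via a scale and zero point."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against constant tensors
    zero_point = round(-w_min / scale) - 128
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
# Reconstruction error is bounded by half a quantization step (scale / 2).
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

The episode's point is that the win here isn't just 4x smaller weights versus float32: int8 matrix math maps directly onto instructions most processors already have, which more exotic bit widths don't.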

🛠 Tools & Tech

Whisper: github.com/openai/whisper

Moonshine: github.com/usefulsensors/moonshine

TinyML Book: oreilly.com/library/view/tinyml/9781492052036

Stanford Edge ML: github.com/petewarden/stanford-edge-ml

📚 Resources

Looking to Listen Paper: looking-to-listen.github.io

Lottery Ticket Hypothesis: arxiv.org/abs/1803.03635

Connect: pete@usefulsensors.com | petewarden.com | usefulsensors.com

Beta Opportunity: Moonshine browser implementation for client-side speech processing in JavaScript