1 hr 8 min

Suhail Doshi: The Future of Computer Vision
The Gradient: Perspectives on AI

Episode 123
I spoke with Suhail Doshi about:
* Why benchmarks aren’t prepared for tomorrow’s AI models
* How he thinks about artists in a world with advanced AI tools
* Building a unified computer vision model that can generate, edit, and understand pixels
Suhail is a software engineer and entrepreneur known for founding Mixpanel, Mighty Computing, and Playground AI (they’re hiring!).
Reach me at editor@thegradient.pub with feedback, ideas, or guest suggestions.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (00:54) Ad read — MLOps conference
* (01:30) Suhail is *not* in pivot hell but he *is* all-in on 50% AI-generated music
* (03:45) AI and music, similarities to Playground
* (07:50) Skill vs. creative capacity in art
* (12:43) What we look for in music and art
* (15:30) Enabling creative expression
* (18:22) Building a unified computer vision model, underinvestment in computer vision
* (23:14) Enhancing the aesthetic quality of images: color and contrast, benchmarks vs user desires
* (29:05) “Benchmarks are not prepared for how powerful these models will become”
* (31:56) Personalized models and personalized benchmarks
* (36:39) Engaging users and benchmark development
* (39:27) What a foundation model for graphics requires
* (45:33) Text-to-image is insufficient
* (46:38) DALL-E 2 and Imagen comparisons, FID
* (49:40) Compositionality
* (50:37) Why Playground focuses on images vs. 3D, video, etc.
* (54:11) Open source and Playground’s strategy
* (57:18) When to stop open-sourcing?
* (1:03:38) Suhail’s thoughts on AGI discourse
* (1:07:56) Outro
Links:
* Playground homepage
* Suhail on Twitter

Get full access to The Gradient at thegradientpub.substack.com/subscribe