Talk Python To Me

Parallel Python at Anyscale with Ray

When OpenAI trained GPT-3, they didn't roll their own orchestration layer. They used Ray, an open source Python framework born out of the same Berkeley research lab lineage that gave us Apache Spark. And here's the twist: Ray was originally built for reinforcement learning research, and then interest in RL quietly faded as the field hit a wall. Until ChatGPT showed up. Suddenly reinforcement learning was back, as the post-training step that turns a raw language model into something genuinely useful.

Edward Oakes and Richard Liaw, two founding engineers behind Ray and Anyscale, join me on Talk Python to tell that story. We'll trace Ray from its RISELab origins at UC Berkeley to powering some of the largest training runs in the world. We'll talk about what Ray actually is (a distributed execution engine for AI workloads) and how a few lines of Python become work running across hundreds of GPUs. We'll cover Ray Data for multimodal pipelines, the dashboard, the VS Code remote debugger, KubeRay for Kubernetes, and where Ray fits alongside Dask, multiprocessing, and asyncio.

If you've ever stared at a single-machine Python script and thought, "there has to be a better way to scale this," this one's for you.

Episode sponsors

Sentry Error Monitoring, code talkpython26
AgentField AI
Talk Python Courses

Links from the show

Guests
Richard Liaw: github.com
Edward Oakes: github.com

Ray: www.ray.io
Example code (we used for walk-through): docs.ray.io
Getting Started with Ray: docs.ray.io
Ray Libraries: docs.ray.io
kuberay: github.com

Watch this episode on YouTube: youtube.com
Episode #547 deep-dive: talkpython.fm/547
Episode transcripts: talkpython.fm

Theme Song: Developer Rap
🥁 Served in a Flask 🎸: talkpython.fm/flasksong

---== Don't be a stranger ==---
YouTube: youtube.com/@talkpython

Bluesky: @talkpython.fm
Mastodon: @talkpython@fosstodon.org
X.com: @talkpython

Michael on Bluesky: @mkennedy.codes
Michael on Mastodon: @mkennedy@fosstodon.org
Michael on X.com: @mkennedy