Distributed Computing In Python Made Easy With Ray - The Python Podcast.__init__

Summary
Distributed computing is a powerful tool for increasing the speed and performance of your applications, but it is also a complex and difficult undertaking. While performing research for his PhD, Robert Nishihara ran up against this reality. Rather than cobbling together another single-purpose system, he built what ultimately became Ray to make scaling Python projects to multiple cores and across machines easy. In this episode he explains how Ray allows you to scale your code easily, how to use it in your own projects, and his ambitions to power the next wave of distributed systems at Anyscale. If you are running into scaling limitations in your Python projects for machine learning, scientific computing, or anything else, then give this a listen and then try it out!
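
To make the scaling model concrete, here is a minimal sketch of Ray’s core task API (an illustration based on Ray’s documented @ray.remote decorator, not code taken from the episode): an ordinary function becomes a remote task, and calls to it run in parallel across the available cores or machines.

    import ray

    ray.init()  # start a local Ray runtime; in production this connects to a cluster

    @ray.remote
    def square(x):
        # an ordinary Python function, now schedulable as a distributed task
        return x * x

    # .remote() returns object references (futures) immediately,
    # so the four tasks execute in parallel
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # blocks until the results arrive: [0, 1, 4, 9]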

Announcements

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, fast object storage, and a brand new managed Kubernetes platform, all controlled by a convenient API, you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
Your host as usual is Tobias Macey and today I’m interviewing Robert Nishihara about Ray, a framework for building and running distributed applications and machine learning workloads.

Interview

Introductions
How did you get introduced to Python?
Can you start by describing what Ray is and how the project got started?
  How did the environment of the RISE lab factor into the early design and development of Ray?
What are some of the main use cases that you were initially targeting with Ray?
  Now that it has been publicly available for some time, what are some of the ways that it is being used which you didn’t originally anticipate?
What are the limitations for the types of workloads that can be run with Ray, or any edge cases that developers should be aware of?
For someone who is building on top of Ray, what is involved in either converting an existing application to take advantage of Ray’s parallelism, or creating a greenfield project with it? (A minimal sketch follows this list.)
Can you describe how Ray itself is implemented and how it has evolved since you first began working on it?
How does the clustering and task distribution mechanism in Ray work?
How does the increased parallelism that Ray offers help with machine learning workloads?
  Are there any types of ML/AI that are easier to do in this context?
What are some of the additional layers or libraries that have been built on top of the functionality of Ray?
What are some of the most interesting, challenging, or complex aspects of building and maintaining Ray?
You and your co-founders recently announced the formation of Anyscale to support the future development of Ray. What is your business model and how are you approaching the governance of Ray and its ecosystem?
What are some of the most interesting or unexpected projects that you have seen built with Ray?
What are some cases where Ray is the wrong choice?
What do you have planned for the future of Ray and Anyscale?
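
As referenced in the question above about converting an existing application, here is a hedged sketch (using only Ray’s documented public API) of the other half of the programming model: the same decorator that parallelizes functions turns a stateful class into an actor, a worker process that holds its own state and serves method calls from anywhere in the cluster.

    import ray

    ray.init()

    # The @ray.remote decorator on a class creates an actor:
    # a stateful, remotely addressable worker process.
    @ray.remote
    class Counter:
        def __init__(self):
            self.value = 0

        def increment(self):
            self.value += 1
            return self.value

    counter = Counter.remote()  # instantiate the actor on some node in the cluster
    refs = [counter.increment.remote() for _ in range(3)]
    print(ray.get(refs))  # method calls run in order on the actor: [1, 2, 3]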

Keep In Touch

Website
@robertnishihara on Twitter
robertnishihara on GitHub

Picks

Tobias

D&D Castle Ravenloft board game
One Deck Dungeon


Robert

The Everything Store



Closing Announcements

Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it!
