Shopify’s Journey to Planet-Scale Observability - OpenObservability Talks S5E09

Shopify operates at massive scale, running thousands of services and processing billions of events per second. To tackle the challenges of observability at this scale, they built Observe, an in-house observability stack built on open-source tools and specifications. In doing so, they replaced an older vendor-based system in an awe-inspiring migration project. But why build their own stack? Which open-source tools did they use? How did they shape the user experience to their needs?
Joining us to unpack Shopify’s journey is Elijah McPherson, an engineering leader with deep expertise in observability and distributed systems. Elijah led the complete rebuild of Shopify’s observability stack and now also oversees jobs, caching, search, and ClickHouse infrastructure. Tune in to hear firsthand insights from one of the most innovative purpose-built observability implementations in production today!
The episode was live-streamed on 11 February 2025 and the video is available at https://www.youtube.com/watch?v=rBfTjlXKJW0
OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube.
We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and chime in with your comments and questions on the live chat.
https://www.youtube.com/@openobservabilitytalks
https://www.twitch.tv/openobservability
Show Notes:
00:46 - Episode and guest intro
03:43 - Why rebuild the observability stack in house
05:47 - Cost and vendor lock-in
07:09 - Tailoring observability for the organizational processes
10:27 - How to build a team to build in-house observability
13:37 - The importance of product sense in internal platforms
18:05 - The functionality of Shopify’s observability platform
25:15 - The Open Source stack used at Shopify observability
29:50 - Extending open source Grafana to Shopify’s needs
36:23 - Adopting open standards
42:26 - Observability into business health
45:16 - How to run a migration project for a live production platform
53:15 - Final tips and best practices
56:41 - Which organizations should develop in-house observability
Resources:
Episode: Scaling Platform Engineering: Shopify’s Blueprint: https://medium.com/p/f18e97140681
Shopify Observe - lectures: https://www.linkedin.com/posts/elijahmcpherson_observe-activity-7258195493657223168-mOGS/
Socials:
Twitter: https://twitter.com/OpenObserv
YouTube: https://www.youtube.com/@openobservabilitytalks
Dotan Horovits
============
Twitter: @horovits
LinkedIn: www.linkedin.com/in/horovits
Mastodon: @horovits@fosstodon
BlueSky: @horovits.bsky.social
Elijah McPherson
===============
Twitter: https://twitter.com/ElijahMcPherson
LinkedIn: https://www.linkedin.com/in/elijahmcpherson/
Information
- Show: OpenObservability Talks
- Frequency: Monthly
- Published: 27 February 2025 at 06:00 UTC
- Length: 1 hr
- Season: 5
- Episode: 9
- Rating: Clean