Shopify’s Journey to Planet-Scale Observability - OpenObservability Talks S5E09

OpenObservability Talks

Shopify operates at massive scale, running thousands of services and processing billions of events per second. To tackle the challenges of observability at this scale, they built Observe, an in-house observability stack that makes use of open-source tools and specifications. In doing so, they replaced an older vendor-based system in an awe-inspiring migration project. But why build their own stack? Which open-source tools did they use? How did they shape the user experience to their needs?

Joining us to unpack Shopify’s journey is Elijah McPherson, an engineering leader with deep expertise in observability and distributed systems. Elijah led the complete rebuild of Shopify’s observability stack and now also oversees jobs, caching, search, and ClickHouse infrastructure. Tune in to hear firsthand insights from one of the most innovative purpose-built observability implementations in production today!

The episode was live-streamed on 11 February 2025 and the video is available at https://www.youtube.com/watch?v=rBfTjlXKJW0

OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available on your favorite podcast app and on YouTube.

We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and chime in with your comments and questions on the live chat.

https://www.youtube.com/@openobservabilitytalks

https://www.twitch.tv/openobservability

Show Notes:

00:46 - Episode and guest intro

03:43 - Why rebuild the observability stack in house

05:47 - Cost and vendor lock-in

07:09 - Tailoring observability for the organizational processes

10:27 - How to build a team to build in-house observability

13:37 - The importance of product sense in internal platforms

18:05 - The functionality of Shopify’s observability platform

25:15 - The Open Source stack used at Shopify observability

29:50 - Extending open source Grafana to Shopify’s needs

36:23 - Adopting open standards

42:26 - Observability into business health

45:16 - How to run a migration project for a live production platform

53:15 - Final tips and best practices

56:41 - Which organizations should develop in-house observability

Resources:

  • Episode: Scaling Platform Engineering: Shopify’s Blueprint: https://medium.com/p/f18e97140681 

  • Shopify Observe - lectures: https://www.linkedin.com/posts/elijahmcpherson_observe-activity-7258195493657223168-mOGS/ 

Socials:

Twitter: https://twitter.com/OpenObserv

YouTube: https://www.youtube.com/@openobservabilitytalks

Dotan Horovits

============

Twitter: @horovits

LinkedIn: www.linkedin.com/in/horovits

Mastodon: @horovits@fosstodon

BlueSky: @horovits.bsky.social


Elijah McPherson

===============

Twitter: https://twitter.com/ElijahMcPherson

LinkedIn: https://www.linkedin.com/in/elijahmcpherson/
