Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach

In this episode of The Data Engineering Show, the bros sit down with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance across all data sources.
What You’ll Learn:
- How Apache Gravitino differs from catalogs like Unity Catalog and Polaris by supporting multiple catalog systems.
- What the “Push-Down Permission Management” security model is and how to implement it across different data systems.
- How to maintain consistent governance across various query engines like Spark, Trino, and Flink.
- Why interoperability, flexibility, and an open source ecosystem are becoming more important to data infrastructure than performance benchmarking.
- How to evaluate new data tools based on their real-world adoption rather than social media hype.
If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts. Instructions on how to do this are here [insert link].
Lisa Cao is a Product Manager at DataStrato, specializing in AI/ML product partnerships and developer relations. With deep expertise in data catalog technologies and open-source ecosystems, she plays a key role in developing Apache Gravitino, an ASF incubating project that provides a unified governance and security layer for diverse data systems. Her work in developing extensible catalog frameworks has helped organizations manage complex data environments across multiple platforms.
Episode Highlights:
- What is Apache Gravitino? (01:24)
- Unifying AI/ML and Big Data Stack (03:15)
- Simplifying Data Governance (10:49)
- Gravitino’s Query Engine Solution (21:34)
- Navigating the Fast-Paced World of Data Engineering (24:41)
Lisa talks about how fast the data engineering space is moving and shares some advice on keeping up:
- Don’t try to learn everything at once.
- Don’t get too deep into every tool.
- Look for real-world adoption.
She warns that social media hype can amplify the messaging around new tools, making it seem as if everyone is using them when, in reality, actual adoption is hard to see.
Episode Resources:
- Apache Gravitino website
The Data Engineering Show is handcrafted by our friends over at: fame.so
Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of The Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.
Check out our three most downloaded episodes:
- Zach Wilson on What Makes a Great Data Engineer
- Joe Reis and Matt Housley on The Fundamentals of Data Engineering
- Bill Inmon, The Godfather of Data Warehousing
Information
- Show: The Data Engineering Show
- Frequency: Monthly
- Published: 8 April 2025 at 10:00 UTC
- Length: 24 min
- Episode: 41
- Rating: Clean