50 min

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Data Engineering Podcast

    • Technology

Summary

Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers, it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects. In this episode, Toby Mao explains how it works, the importance of automatic column-level lineage tracking, and how you can start using it today.


Announcements


Hello and welcome to the Data Engineering Podcast, the show about modern data management.
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack-
Your host is Tobias Macey, and today I'm interviewing Toby Mao about SQLMesh, an open source DataOps framework designed to scale data transformations with ease of collaboration and validation built in.


Interview


Introduction
How did you get involved in the area of data management?
Can you describe what SQLMesh is and the story behind it?


DataOps is a term that has been co-opted and overloaded. What are the concepts that you are trying to convey with that term in the context of SQLMesh?

What are the rough edges in existing toolchains/workflows that you are trying to address with SQLMesh?


How do those rough edges impact the productivity and effectiveness of teams using those toolchains and workflows?

Can you describe how SQLMesh is implemented?


How have the design and goals evolved since you first started working on it?

What are the lessons that you have learned from dbt which have informed the design and functionality of SQLMesh?
For teams who have already invested in dbt, what is the migration path from or integration with dbt?
You have some built-in integration with/awareness of orchestrators (currently Airflow). What are the benefits of making the transformation tool aware of the orchestrator?
What do you see as the potential benefits of integration with e.g. data-diff?
What are the second-order benefits of using a tool such as SQLMesh that addresses the more mechanical aspects of managing transformation workflows and the associated dependency chains?
What are the most interesting, innovative, or unexpected ways that you have seen SQLMesh used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on SQLMesh?
When is SQLMesh the wrong choice?
What do you have planned for the future of SQLMesh?


Contact Info


tobymao on GitHub
@captaintobs on Twitter
Website


Parting Question


From your perspective, what is the biggest gap in the tooling or technology for data management today?


Closing Announcements


Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show, then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.


Links


SQLMesh
Tobiko Data
SAS
AirBnB Minerva
SQLGlot
Cron
AST == Abstract Syntax Tree
Pandas
Terraform
dbt


Podcast Episode

SQLFluff


Podcast.__init__ Episode



The intro and outro music is from The Hug by The Freak Fandango
