259 episodes

This show goes behind the scenes of the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Data Engineering Podcast
Tobias Macey

    • Technology
    • 4.7 • 91 Ratings

    The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam

    Data platforms are characterized by a complex set of connections that are subject to constantly evolving requirements. To make this a tractable problem, it is necessary to define boundaries for communication between concerns, which in turn requires interface contracts for communicating across those boundaries. The recent move toward the data mesh as a formalized architecture that builds on this design gives data teams the language they need to make this a more organized effort. In this episode Abhi Sivasailam shares his experience designing and implementing a data mesh solution with his team at Flexport, and explains the importance of defining and enforcing data contracts at those domain boundaries.

    • 56 min
    Building And Managing Data Teams And Data Platforms In Large Organizations With Ashish Mrig

    Data engineering is a relatively young and rapidly expanding field, with practitioners having a wide array of experiences as they navigate their careers. Ashish Mrig currently leads the data analytics platform for Wayfair, as well as running a local data engineering meetup. In this episode he shares his career journey, the challenges related to management of data professionals, and the platform design that he and his team have built to power analytics at a large company. He also provides some excellent insights into the factors that play into the build vs. buy decision at different organizational sizes.

    • 52 min
    Automated Data Quality Management Through Machine Learning With Anomalo

    Data quality control is a requirement for being able to trust the reports and machine learning models that rely on the information that you curate. Rules-based systems are useful for validating known requirements, but given the scale and complexity of data in modern organizations it is impractical, and often impossible, to manually create rules for every potential error. The team at Anomalo is building a machine-learning-powered platform that identifies and alerts on anomalous and invalid changes in your data so that you aren't flying blind. In this episode founders Elliot Shmukler and Jeremy Stanley explain how they have architected the system to work with your data warehouse and surface the critical issues hiding in your data without overwhelming you with alerts.

    • 1 hr 2 min
    An Introduction To Data And Analytics Engineering For Non-Programmers

    Applications of data have grown well beyond the venerable business intelligence dashboards that organizations have relied on for decades. Data is now used to power consumer-facing services, influence organizational behaviors, and build sophisticated machine learning systems. Given this increased importance, it has become necessary for everyone in the business to treat data as a product, in the same way that software applications drove business in the early 2000s. In this episode Brian McMillan shares his work on the book "Building Data Products" and how he is working to educate business users and data professionals about the technical, economic, and business considerations that must be blended for these projects to succeed.

    • 50 min
    Open Source Reverse ETL For Everyone With Grouparoo

    Reverse ETL is a product category that evolved from the landscape of customer data platforms, with a number of companies offering their own implementations. While struggling to automate data integration workflows with marketing, sales, and support tools, Brian Leonard discovered this need himself by accident and turned it into the open source framework Grouparoo. In this episode he explains why he decided to turn these efforts into an open-core business, how the platform is implemented, and the benefits of having an open source contender in the landscape of operational analytics products.

    • 44 min
    Data Observability Out Of The Box With Metaplane

    Data observability is a set of technical and organizational capabilities for understanding how your data is being processed and used, so that you can proactively identify and fix errors in your workflows. In this episode Metaplane founder Kevin Hu shares his working definition of the term and explains the work that he and his team are doing to shorten the time to adoption for this new set of practices. He discusses the factors that influenced his decision to start with the data warehouse, the potential shortcomings of that approach, and where he plans to go from there. This is a great exploration of what it means to treat your data platform as a living system and apply state-of-the-art engineering to it.

    • 50 min

Customer Reviews

4.7 out of 5
91 Ratings

ASobering

Such a wealth of knowledge! 🧠

Got a question about anything “data engineering?”

Tobias has got you covered. 😎

Whether you’re well established as an engineer, or just getting started in your career, this is a must-listen podcast for you! Tobias does an incredible job leading engaging conversations with industry leaders who’ve actually experienced success themselves, and every. single. episode. is jam-packed with helpful takeaways. Highly recommend listening and subscribing!

lixja

Niche

This podcast makes me feel sane when I’m working late running Spark SQL queries like a monkey

N Thalanki

Sets the standard for ALL data podcasts

I have been hooked on this podcast. Tobias is phenomenal. Most data professionals, such as company founders, are astoundingly poor at explaining what they do. Tobias skillfully draws this out by asking great questions.

His ability to ask great questions and probe deep is what sets him apart.

I have become disillusioned with podcasts like DM Radio or the Architect Show. The hosts are lazy and don’t probe. They are happy to rehash marketing tropes about zettabytes of data, how data is the new oil, etc., but leave us with no new insights.

Top Podcasts In Technology

Lex Fridman
Jason Calacanis
Tristan Harris and Aza Raskin, The Center for Humane Technology
NPR
Jack Rhysider
Jason Calacanis

You Might Also Like

Tobias Macey
Software Engineering Daily
Michael Kennedy (@mkennedy)
Michael Kennedy and Brian Okken
Real Python
se-radio@computer.org