This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
The Importance Of Data Contracts As The Interface For Data Integration With Abhi Sivasailam
Data platforms are exemplified by a complex set of connections that are subject to a set of constantly evolving requirements. In order to make this a tractable problem it is necessary to define boundaries for communication between concerns, which brings with it the need to establish interface contracts for communicating across those boundaries. The recent move toward the data mesh as a formalized architecture that builds on this design provides the language that data teams need to make this a more organized effort. In this episode Abhi Sivasailam shares his experience designing and implementing a data mesh solution with his team at Flexport, and the importance of defining and enforcing data contracts that are implemented at those domain boundaries.
Building And Managing Data Teams And Data Platforms In Large Organizations With Ashish Mrig
Data engineering is a relatively young and rapidly expanding field, with practitioners having a wide array of experiences as they navigate their careers. Ashish Mrig currently leads the data analytics platform for Wayfair, as well as running a local data engineering meetup. In this episode he shares his career journey, the challenges related to management of data professionals, and the platform design that he and his team have built to power analytics at a large company. He also provides some excellent insights into the factors that play into the build vs. buy decision at different organizational sizes.
Automated Data Quality Management Through Machine Learning With Anomalo
Data quality control is a requirement for being able to trust the various reports and machine learning models that are relying on the information that you curate. Rules based systems are useful for validating known requirements, but with the scale and complexity of data in modern organizations it is impractical, and often impossible, to manually create rules for all potential errors. The team at Anomalo are building a machine learning powered platform for identifying and alerting on anomalous and invalid changes in your data so that you aren't flying blind. In this episode founders Elliot Shmukler and Jeremy Stanley explain how they have architected the system to work with your data warehouse and let you know about the critical issues hiding in your data without overwhelming you with alerts.
An Introduction To Data And Analytics Engineering For Non-Programmers
Applications of data have grown well beyond the venerable business intelligence dashboards that organizations have relied on for decades. Now it is being used to power consumer facing services, influence organizational behaviors, and build sophisticated machine learning systems. Given this increased level of importance it has become necessary for everyone in the business to treat data as a product in the same way that software applications have driven the early 2000s. In this episode Brian McMillan shares his work on the book "Building Data Products" and how he is working to educate business users and data professionals about the combination of technical, economical, and business considerations that need to be blended for these projects to succeed.
Open Source Reverse ETL For Everyone With Grouparoo
Reverse ETL is a product category that evolved from the landscape of customer data platforms with a number of companies offering their own implementation of it. While struggling with the work of automating data integration workflows with marketing, sales, and support tools Brian Leonard accidentally discovered this need himself and turned it into the open source framework Grouparoo. In this episode he explains why he decided to turn these efforts into an open core business, how the platform is implemented, and the benefits of having an open source contender in the landscape of operational analytics products.
Data Observability Out Of The Box With Metaplane
Data observability is a set of technical and organizational capabilities related to understanding how your data is being processed and used so that you can proactively identify and fix errors in your workflows. In this episode Metaplane founder Kevin Hu shares his working definition of the term and explains the work that he and his team are doing to cut down on the time to adoption for this new set of practices. He discusses the factors that influenced his decision to start with the data warehouse, the potential shortcomings of that approach, and where he plans to go from there. This is a great exploration of what it means to treat your data platform as a living system and apply state of the art engineering to it.