118 episodes

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Data Engineering Podcast Tobias Macey

    • Technology

    Pay Down Technical Debt In Your Data Pipeline With Great Expectations - Episode 117

    Data pipelines are complicated, business-critical pieces of technical infrastructure. Unfortunately, they are also difficult to test, leading to a significant amount of technical debt that slows iteration cycles. In this episode James Campbell describes how he helped create the Great Expectations framework to help you gain control and confidence in your data delivery workflows, the challenges of validating and monitoring the quality and accuracy of your data, and how you can use it in your own environments to improve your ability to move fast.

    • 46 min
    Replatforming Production Dataflows - Episode 116

    Building a reliable data platform is a never-ending task. Even if you have a process that works for you and your business, there can be unexpected events that require a change in your platform architecture. In this episode the head of data for Mayvenn shares their experience migrating an existing set of streaming workflows onto the Ascend platform after their previous vendor was acquired and changed their offering. This is an interesting discussion about the ongoing maintenance and decision making required to keep your business data up to date and accurate.

    • 39 min
    Planet Scale SQL For The New Generation Of Applications - Episode 115

    The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of deployment. This has led to a class of applications that can quickly scale to serve users worldwide. This requires a new class of data storage which can accommodate that demand without having to rearchitect your system at each level of growth. YugabyteDB is an open source database designed to support planet scale workloads with high data density and full ACID compliance. In this episode Karthik Ranganathan explains how Yugabyte is architected, their motivations for being fully open source, and how they simplify the process of scaling your application from greenfield to global.

    • 1 hr 1 min
    Change Data Capture For All Of Your Databases With Debezium - Episode 114

    Databases are useful for inspecting the current state of your application, but inspecting the history of that data can get messy without a way to track changes as they happen. Debezium is an open source platform for reliable change data capture that you can use to build supplemental systems for everything from maintaining audit trails to real-time updates of your data warehouse. In this episode Gunnar Morling and Randall Hauch explain why it got started, how it works, and some of the myriad ways that you can use it. If you have ever struggled with implementing your own change data capture pipeline, or with understanding when it would be useful, then this episode is for you.

    • 53 min
    Building The DataDog Platform For Processing Timeseries Data At Massive Scale - Episode 113

    DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to support their customers, they need to capture, process, and analyze massive amounts of timeseries data with a high degree of uptime and reliability. Vadim Semenov works on their data engineering team and joins the podcast in this episode to discuss the challenges that he works through, the systems that DataDog has built to power their business, and how their teams are organized to allow for rapid growth and massive scale. Getting an inside look at the companies behind the services we use is always useful, and this conversation was no exception.

    • 45 min
    Building The Materialize Engine For Interactive Streaming Analytics In SQL - Episode 112

    Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of records. Data warehouses are optimized for batched writes and complex analytical queries. Between those use cases there are varying levels of support for fast reads on quickly changing data. To address that need more completely the team at Materialize has created an engine that allows for building queryable views of your data as it is continually updated from the stream of changes being generated by your applications. In this episode Frank McSherry, chief scientist of Materialize, explains why it was created, what use cases it enables, and how it works to provide fast queries on continually updated data.

    • 48 min

Customer Reviews

Anantv

Interesting topics

In depth discussion on data engineering topics. I liked the server-less pipelines podcast a lot. Very insightful.

Minnow61

Great useful info

Tobias has so many great guests talking about relevant topics. I lead a data engineering team, and have used principles from this podcast many times in my everyday work.

Dmitry212

Great topics and vendors

Great in depth interviews in the data and analytics space!
