156 episodes

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Data Engineering Podcast, by Tobias Macey

    • Technology
    • 4.3 • 6 Ratings

    Better Data Quality Through Observability With Monte Carlo

    In order for analytics and machine learning projects to be useful, they require a high degree of data quality. To ensure that your pipelines are healthy you need a way to make them observable. In this episode Barr Moses and Lior Gavish, co-founders of Monte Carlo, share the leading causes of what they refer to as data downtime and how it manifests. They also discuss methods for gaining visibility into the flow of data through your infrastructure, how to diagnose and prevent potential problems, and what they are building at Monte Carlo to help you maintain your data's uptime.

    • 55 min
    Rapid Delivery Of Business Intelligence Using Power BI

    Business intelligence efforts are only as useful as the outcomes that they inform. Power BI aims to reduce the time and effort required to go from information to action by providing an interface that encourages rapid iteration. In this episode Rob Collie shares his enthusiasm for the Power BI platform and how it stands out from other options. He explains how he helped to build the platform during his time at Microsoft, and how he continues to support users through his work at Power Pivot Pro. Rob shares some useful insights gained through his consulting work, and why he considers Power BI to be the best option on the market today for business analytics.

    • 1 hr 2 min
    Self Service Real Time Data Integration Without The Headaches With Meroxa

    Analytical workloads require a well engineered and well maintained data integration process to ensure that your information is reliable and up to date. Building a real-time pipeline for your data lakes and data warehouses is a non-trivial effort, requiring a substantial investment of time and energy. Meroxa is a new platform that aims to automate the heavy lifting of change data capture, monitoring, and data loading. In this episode founders DeVaris Brown and Ali Hamidi explain how their tenure at Heroku informed their approach to making data integration self service, how the platform is architected, and how they have designed their system to adapt to the continued evolution of the data ecosystem.

    • 1 hr
    Speed Up And Simplify Your Streaming Data Workloads With Red Panda

    Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To make the benefits of the Kafka ecosystem more accessible and reduce the operational burden, Alexander Gallego and his team at Vectorized created the Red Panda engine. In this episode he explains how they engineered a drop-in replacement for Kafka, replicating the numerous APIs, that can scale more easily and deliver consistently low latencies with a much lower hardware footprint. He also shares some of the areas of innovation that they have found to help foster the next wave of streaming applications while working within the constraints of the existing Kafka interfaces. This was a fascinating conversation with an energetic and enthusiastic engineer and founder about the challenges and opportunities in the realm of streaming data.

    • 59 min
    Cutting Through The Noise And Focusing On The Fundamentals Of Data Engineering With The Data Janitor

    Data engineering is a constantly growing and evolving discipline. There are always new tools, systems, and design patterns to learn, which leads to a great deal of confusion for newcomers. Daniel Molnar has dedicated his time to helping data professionals get back to basics through presentations at conferences and meetups, and with his most recent endeavor of building the Pipeline Data Engineering Academy. In this episode he shares advice on how to cut through the noise, which principles are foundational to building a successful career as a data engineer, and his approach to educating the next generation of data practitioners. This was a useful conversation for anyone working with data who has found themselves spending too much time chasing the latest trends and wishes to develop a more focused approach to their work.

    Distributed In Memory Processing And Streaming With Hazelcast

    In memory computing provides significant performance benefits, but brings along challenges for managing failures and scaling up. Hazelcast is a platform for managing stateful in-memory storage and computation across a distributed cluster of commodity hardware. On top of this foundation, the Hazelcast team has also built a streaming platform for reliable high throughput data transmission. In this episode Dale Kim shares how Hazelcast is implemented, the use cases that it enables, and how it complements on-disk data management systems.

    • 44 min

Customer Reviews

4.3 out of 5
6 Ratings

GreatStuff123

The missing Data Engineering Podcast!

I just found out about this podcast while browsing Twitter and seeing that the host of another of my favourite podcasts (Tobias Macey from Podcast.__Init__) had a new podcast on data engineering.

With the demise of several older Hadoop podcasts and O'Reilly's more business-focused data podcast, a new series like this one was sorely needed for discussions of current data architectures and pipelines.

Thanks and keep up the good work Tobias, I've already learned so much after binging the first several podcasts! Looking forward to the next interviews.
