121 Folgen

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Data Engineering Podcast Tobias Macey

    • Technologie

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

    Data Infrastructure Automation For Private SaaS At Snowplow - Episode 120

    Data Infrastructure Automation For Private SaaS At Snowplow - Episode 120

    One of the biggest challenges in building reliable platforms for processing event pipelines is managing the underlying infrastructure. At Snowplow Analytics the complexity is compounded by the need to manage multiple instances of their platform across customer environments. In this episode Josh Beemster, the technical operations lead at Snowplow, explains how they manage automation, deployment, monitoring, scaling, and maintenance of their streaming analytics pipeline for event data. He also shares the challenges they face in supporting multiple cloud environments and the need to integrate with existing customer systems. If you are daunted by the needs of your data infrastructure then it's worth listening to how Josh and his team are approaching the problem.

    • 49 Min.
    Data Modeling That Evolves With Your Business Using Data Vault - Episode 119

    Data Modeling That Evolves With Your Business Using Data Vault - Episode 119

    Designing the structure for your data warehouse is a complex and challenging process. As businesses deal with a growing number of sources and types of information that they need to integrate, they need a data modeling strategy that provides them with flexibility and speed. Data Vault is an approach that allows for evolving a data model in place without requiring destructive transformations and massive up front design to answer valuable questions. In this episode Kent Graziano shares his journey with data vault, explains how it allows for an agile approach to data warehousing, and explains the core principles of how to use it. If you're struggling with unwieldy dimensional models, slow moving projects, or challenges integrating new data sources then listen in on this conversation and then give data vault a try for yourself.

    • 1 Std. 6 Min.
    The Benefits And Challenges Of Building A Data Trust - Episode 118

    The Benefits And Challenges Of Building A Data Trust - Episode 118

    Every business collects data in some fashion, but sometimes the true value of the collected information only comes when it is combined with other data sources. Data trusts are a legal framework for allowing businesses to collaboratively pool their data. This allows the members of the trust to increase the value of their individual repositories and gain new insights which would otherwise require substantial effort in duplicating the data owned by their peers. In this episode Tom Plagge and Greg Mundy explain how the BrightHive platform serves to establish and maintain data trusts, the technical and organizational challenges they face, and the outcomes that they have witnessed. If you are curious about data sharing strategies or data collaboratives, then listen now to learn more!

    • 56 Min.
    Pay Down Technical Debt In Your Data Pipeline With Great Expectations - Episode 117

    Pay Down Technical Debt In Your Data Pipeline With Great Expectations - Episode 117

    Data pipelines are complicated and business critical pieces of technical infrastructure. Unfortunately they are also complex and difficult to test, leading to a significant amount of technical debt which contributes to slower iteration cycles. In this episode James Campbell describes how he helped create the Great Expectations framework to help you gain control and confidence in your data delivery workflows, the challenges of validating and monitoring the quality and accuracy of your data, and how you can use it in your own environments to improve your ability to move fast.

    • 46 Min.
    Replatforming Production Dataflows - Episode 116

    Replatforming Production Dataflows - Episode 116

    Building a reliable data platform is a neverending task. Even if you have a process that works for you and your business there can be unexpected events that require a change in your platform architecture. In this episode the head of data for Mayvenn shares their experience migrating an existing set of streaming workflows onto the Ascend platform after their previous vendor was acquired and changed their offering. This is an interesting discussion about the ongoing maintenance and decision making required to keep your business data up to date and accurate.

    • 39 Min.
    Planet Scale SQL For The New Generation Of Applications - Episode 115

    Planet Scale SQL For The New Generation Of Applications - Episode 115

    The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of deployment. This has led to a class of applications that can quickly scale to serve users worldwide. This requires a new class of data storage which can accomodate that demand without having to rearchitect your system at each level of growth. YugabyteDB is an open source database designed to support planet scale workloads with high data density and full ACID compliance. In this episode Karthik Ranganathan explains how Yugabyte is architected, their motivations for being fully open source, and how they simplify the process of scaling your application from greenfield to global.

    • 1 Std. 1 Min.

Kundenrezensionen

dätä ,

Interesting topics

Good source for everything data engineering related. The interviews provide helpful insights into the current tool landscape as well as best practices.

Top‑Podcasts in Technologie

Zuhörer haben auch Folgendes abonniert: