27 min

DEW #131: dbt model contract, Instacart ads modularization in LakeHouse Architecture, Jira to automate Glue tables, Server-Side Tracking
Data Engineering Weekly

Welcome to another episode of Data Engineering Weekly. Aswin and I select 3 to 4 articles from each edition of Data Engineering Weekly and discuss them from the author’s and our perspectives.

For DEW #131, we selected the following articles:



Ramon Marrero: DBT Model Contracts - Importance and Pitfalls

dbt introduced model contracts with the 1.5 release. The implementation drew some criticism, such as The False Promise of dbt Contracts. I found the argument made in that piece surprising, especially the comment below.

"As a model owner, if I change the columns or types in the SQL, it's usually intentional." My immediate reaction was: hmm, not really.

However, as with any first iteration of a system, the dbt model contract implementation has pros and cons. I’m sure it will evolve as adoption increases. The author did an amazing job writing a balanced view of dbt model contracts.
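
To make the idea concrete: a model contract declares the column names and data types a model promises to return, and the build fails when the model's actual output drifts from that declaration. The sketch below is not dbt's implementation; it is a minimal Python illustration of that kind of check, and the check_contract helper, the column names, and the type strings are all hypothetical.

```python
def check_contract(declared, actual):
    """Return a list of human-readable contract violations.

    Both arguments map column name -> data type string.
    """
    problems = []
    # Every declared column must exist with the declared type.
    for column, expected_type in declared.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            problems.append(
                f"type mismatch on {column}: "
                f"expected {expected_type}, got {actual[column]}"
            )
    # Columns the contract does not know about are also flagged.
    for column in actual:
        if column not in declared:
            problems.append(f"unexpected column: {column}")
    return problems


# Hypothetical contract for an orders model.
declared = {"order_id": "int", "amount": "float", "ordered_at": "timestamp"}

# The model's SQL was edited: amount became a string and a column was renamed.
actual = {"order_id": "int", "amount": "string", "order_ts": "timestamp"}

violations = check_contract(declared, actual)
for violation in violations:
    print(violation)
```

A rename like the one above is exactly the kind of change that may not be intentional, which is why a check that fails loudly before downstream consumers break can be useful.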

https://medium.com/geekculture/dbt-model-contracts-importance-and-pitfalls-20b113358ad7



Instacart: How Instacart Ads Modularized Data Pipelines With Lakehouse Architecture and Spark

Instacart writes about its journey of building its ads measurement platform. A few things stand out for me in the blog.


The event store is moving from S3/Parquet storage to Delta Lake storage, a sign of lakehouse format adoption across the board.


Instacart's adoption of the Databricks ecosystem alongside Snowflake.


The move to rewrite SQL into a composable Spark SQL pipeline for better readability and testing.
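
The last point is easier to see with a small example. The sketch below is not Instacart's code and does not use Spark; it is a plain-Python illustration, with hypothetical stage names and data, of what "composable" means here: each stage is a small function that can be tested alone, and the pipeline is just the stages applied in order. In PySpark, the stages would typically take and return a DataFrame, and one way to chain them is DataFrame.transform.

```python
from functools import reduce


def keep_clicks(events):
    """Stage 1: keep only click events."""
    return [e for e in events if e["type"] == "click"]


def add_cost(events):
    """Stage 2: attach a cost to each event (flat hypothetical rate)."""
    return [{**e, "cost": 0.25} for e in events]


def total_cost_by_campaign(events):
    """Stage 3: aggregate cost per campaign."""
    totals = {}
    for e in events:
        totals[e["campaign"]] = totals.get(e["campaign"], 0.0) + e["cost"]
    return totals


def run_pipeline(data, stages):
    """Apply each stage in order, feeding its output to the next."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical input events.
events = [
    {"type": "click", "campaign": "a"},
    {"type": "impression", "campaign": "a"},
    {"type": "click", "campaign": "b"},
    {"type": "click", "campaign": "a"},
]

result = run_pipeline(events, [keep_clicks, add_cost, total_cost_by_campaign])
print(result)
```

Because each stage is an ordinary function, a unit test can feed it a handful of rows and assert on the output without running the whole pipeline, which is the readability and testing benefit compared with one long SQL statement.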



https://tech.instacart.com/how-instacart-ads-modularized-data-pipelines-with-lakehouse-architecture-and-spark-e9863e28488d



Timo Dechau: The extensive guide for Server-Side Tracking

The blog is an excellent overview of server-side event tracking. The author highlights how event tracking tends to sit closer to the UI flow than to the business flow, and walks through the many things that can go wrong with frontend event tracking. A must-read article if you’re passionate about event tracking like me.
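
As a rough illustration of the difference, here is a minimal Python sketch of an event emitted from the backend at the moment the business action happens, rather than from a button click in the browser. It is not taken from the article; the track and place_order functions, the event name, and the in-memory sink are all hypothetical stand-ins.

```python
import json
import time


def track(event_name, properties, sink):
    """Append a tracking event to the sink (a stand-in for a queue or event API)."""
    sink.append({"event": event_name, "ts": time.time(), "properties": properties})


def place_order(order_id, amount, orders, sink):
    """Record the order, then emit the event from the backend.

    The event fires because the order was actually stored, not because a
    button was clicked in the browser.
    """
    orders[order_id] = {"amount": amount, "status": "placed"}
    track("order_placed", {"order_id": order_id, "amount": amount}, sink)


# Hypothetical in-memory order store and event sink.
orders, sink = {}, []
place_order("o-1", 42.0, orders, sink)
print(json.dumps(sink[0]["properties"]))
```

The event here is tied to the business flow (an order exists), so it is unaffected by ad blockers, UI redesigns, or a click that never resulted in an order.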

https://hipsterdatastack.substack.com/p/the-extensive-guide-for-server-side



This Schema change could’ve been a JIRA ticket!!!

I found the article an excellent example of workflow automation built on top of a familiar ticketing system, JIRA. The blog narrates the challenges with Glue Crawler and how selectively applying database change management through JIRA helped the team overcome the technical debt of a custom crawler that ran for 6+ hours.

https://medium.com/credit-saison-india/using-jira-to-automate-updations-and-additions-of-glue-tables-58d39adf9940

