Running Airflow at the scale of a national retailer means more than just scheduling. It means giving non-engineers a path to ship DAGs, and classifying thousands of runs to know which ones need attention. In this episode, Mateus Ferreira, Senior Data Engineer at Luiza Labs (the technology arm of Magazine Luiza, one of Brazil's largest retailers), joins Marc to talk about the patterns his team uses to run 2,000+ Airflow pipelines across more than four petabytes of data.

Key Takeaways:

00:00 Introduction
01:11 Mateus introduces himself and Luiza Labs, the technology arm of Magazine Luiza (Magalu), one of Brazil's largest retailers (founded 1957). 1,000+ physical stores, multi-region operations, and a data team that has to handle the variability that comes with all of it.
04:33 Lu Brain, Magalu's AI initiative built around their character Lu, and how AI fits into the data work.
06:47 The data reliability engineering channel where AI summarizes Airflow errors with confidence scores and posts a suggested fix in chat.
08:30 How Airflow became the heart of orchestration. Coming from Control-M in banking, then GCP, then consolidating on Cloud Composer to centralize roughly 2,000 pipelines.
14:23 The YAML wrapper that lets non-engineers ship DAGs. Reads namespace, tables, and Spark options. Handles CDC, JDBC full, and JDBC incremental collection types with checkpoints. All changes go through data reliability engineering.
17:20 Why metadata is the most valuable asset in the AI era, and how the wrapper makes data lineage observable across 2,000 pipelines.
18:26 The Data Reliability Engineering team. A 10-person group that is the window to the company, handling maintenance, validation, corrections, and optimization for the business unit pipelines.
20:09 Operating at four petabytes of data.
21:24 Why they built custom Spark operators. Cost drove the move off the DataprocOperator. The custom operator exposes Spark driver and executor sizing as Airflow parameters and generates the Kubernetes manifest.
24:36 The monitoring dashboard built on the Airflow metadata DB. A timeline view that shows how many DAGs run each hour, used to spread scheduling across the day.
26:37 Classifying DAGs by their last five runs: success, partially correct, intermittent, total failure. A reusable observability pattern.
29:57 How to reach Mateus, and a closing thought in Portuguese on appreciating the good old times while you are living them.

Resources Mentioned:

Apache Airflow (airflow.apache.org)
Magalu Cloud / MGC
Luiza Labs (luizalabs.com) and Magazine Luiza / Magalu
Astro Observe (https://www.astronomer.io/product)
Mateus Ferreira on LinkedIn (linkedin.com/in/mateusmferreira)

Thanks for listening to "The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI." If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow
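
For listeners who want to try the run-classification idea from the 26:37 segment, here is a minimal Python sketch. The four labels come from the episode, but the thresholds that separate them are assumptions made for illustration, not the rules Mateus's team uses. The function name `classify_dag`, the extra "no runs" label, and the plain list of state strings (as they might be read from the Airflow metadata DB) are also hypothetical.

```python
from typing import Sequence

WINDOW = 5  # the episode describes looking at the last five runs


def classify_dag(run_states: Sequence[str], window: int = WINDOW) -> str:
    """Classify a DAG from its run states, ordered oldest to newest.

    Assumed rules (not taken from the episode):
      - no failures in the window          -> "success"
      - every run in the window failed     -> "total failure"
      - exactly one failure in the window  -> "partially correct"
      - any other mix                      -> "intermittent"
    Any state other than "failed" (for example "running") counts as a non-failure.
    """
    recent = list(run_states)[-window:]  # keep only the most recent runs
    if not recent:
        return "no runs"  # hypothetical label for DAGs without history

    failures = sum(1 for state in recent if state == "failed")
    if failures == 0:
        return "success"
    if failures == len(recent):
        return "total failure"
    if failures == 1:
        return "partially correct"
    return "intermittent"


if __name__ == "__main__":
    history = ["success", "failed", "success", "failed", "success"]
    print(classify_dag(history))
```

Because the function only needs an ordered list of states, the same logic could sit behind a dashboard query or a scheduled report; the thresholds are the part to adapt to your own definition of each category.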