The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Astronomer
Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with the insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast webpage: https://www.astronomer.io/podcast/

  1. The Future of Airflow Telemetry at Metyis with Bolke de Bruin

    4 DAYS AGO

    Telemetry has the potential to guide the future of Airflow, but only if it’s implemented transparently and with community trust. In this episode, we’re joined by Bolke de Bruin, Director at Metyis and a long-time Airflow PMC member. Bolke discusses how telemetry has been handled in the past, why it matters now and what it will take to get it right.

    Key Takeaways:
    (03:20) The role of foundations in establishing credibility and sustainability.
    (04:52) Why data collection is critical to open-source project direction.
    (07:24) Lessons learned from previous approaches to user data collection.
    (10:23) The current state of telemetry in the project.
    (10:53) Community trust as a prerequisite for technical implementation.
    (12:54) The importance of managing sensitive data within trusted ecosystems.
    (16:37) Ethical considerations in balancing participation and access.
    (18:45) Forward-looking ideas for improving workflow design and usability.

    Resources Mentioned:
    Bolke de Bruin: https://www.linkedin.com/in/bolke/
    Metyis | LinkedIn: https://www.linkedin.com/company/metyis/
    Metyis | Website: http://www.metyis.com
    Apache Airflow: https://airflow.apache.org/
    Airflow Summit: https://airflowsummit.org/
    Airflow Dev List: https://lists.apache.org/list.html?dev@airflow.apache.org
    https://www.astronomer.io/events/roadshow/london/
    https://www.astronomer.io/events/roadshow/new-york/
    https://www.astronomer.io/events/roadshow/sydney/
    https://www.astronomer.io/events/roadshow/san-francisco/
    https://www.astronomer.io/events/roadshow/chicago/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
    #AI #Automation #Airflow #MachineLearning

    22 min
  2. Transforming the Airflow UI for Cloudera’s Users with Shubham Raj

    10 JUL

    Contributing to open-source projects can be daunting, but it can also unlock unexpected innovation. This episode showcases how one engineer’s journey with Apache Airflow led to impactful UI enhancements and infrastructure solutions at scale. Shubham Raj, Software Engineer II at Cloudera, shares how his team built a drag-and-drop DAG editor for non-coders and made contributions that helped shape the Airflow 3.0 UI and introduced features such as external XCom control and bulk APIs.

    Key Takeaways:
    (02:30) Day-to-day responsibilities building platforms that simplify orchestration.
    (05:27) Factors that make onboarding into large open-source projects accessible.
    (07:35) The value of improved user interfaces for task state visibility and control.
    (09:49) Enabling faster debugging by exposing internal data through APIs.
    (13:00) Balancing frontend design goals with backend functionality.
    (14:19) Creating workflow editors that lower the barrier to entry.
    (16:54) Supporting a variety of task types within a visual DAG builder.
    (19:32) Common infrastructure challenges faced by orchestration users.
    (20:37) Addressing dependency management across distributed environments.

    Resources Mentioned:
    Shubham Raj: https://www.linkedin.com/in/shubhamrajofficial/
    Cloudera | LinkedIn: https://www.linkedin.com/company/cloudera/
    Cloudera | Website: https://www.cloudera.com/
    Apache Airflow: https://airflow.apache.org/
    2023 Airflow Summit: https://airflowsummit.org/

    22 min
  3. Streamlining Thousands of Data Pipelines at Lyft with Yunhao Qing

    7 JUL

    Managing data pipelines at scale is not just a technical challenge. It is also an organizational one. At Lyft, success means empowering dozens of teams to build with autonomy while enforcing governance and best practices across thousands of workflows. In this episode, we speak with Yunhao Qing, Software Engineer at Lyft, about building a governed data-engineering platform powered by Airflow that balances flexibility, standardization and scale.

    Key Takeaways:
    (03:17) Supporting internal teams with a centralized orchestration platform.
    (04:54) Migrating to a managed service to reduce infrastructure overhead.
    (06:04) Embedding platform-level governance into custom components.
    (08:02) Consolidating and regulating the creation of custom code.
    (09:48) Identifying and correcting inefficient workflow patterns.
    (11:17) Replacing manual workarounds with native platform features.
    (14:32) Preparing teams for major version upgrades.
    (16:03) Leveraging asset-based scheduling for smarter triggers.
    (18:13) Envisioning GenAI and semantic search for future productivity.

    Resources Mentioned:
    Yunhao Qing: https://www.linkedin.com/in/yunhao-qing
    Lyft | LinkedIn: https://www.linkedin.com/company/lyft/
    Lyft | Website: https://www.lyft.com/
    Apache Airflow: https://airflow.apache.org/
    Astronomer: https://www.astronomer.io/
    Kubernetes: https://kubernetes.io/

    20 min
  4. Transforming Customer Education in Data Engineering at Astronomer with Marc Lamberti

    26 JUN

    Understanding the complexities of Apache Airflow can be daunting for newcomers and seasoned data engineers alike. But with the right guidance, mastering the tool becomes an achievable milestone. In this episode, Marc Lamberti, Head of Customer Education at Astronomer, joins us to share his journey from Udemy instructor to leading education at Astronomer, and how he is helping more than 100,000 learners demystify Airflow.

    Key Takeaways:
    (02:36) Early exposure to Airflow while addressing inefficiencies in data workflows.
    (04:10) Common barriers to implementing open-source tools in enterprise settings.
    (06:18) The shift from part-time teaching to a full-time focus on Airflow education.
    (07:53) A modular, guided approach to structuring educational content.
    (09:57) The value of highlighting underused Airflow features for broader adoption.
    (12:35) Certifications as a method to assess readiness and uncover knowledge gaps.
    (13:25) Coverage of essential Airflow concepts in the Fundamentals exam.
    (16:07) The DAG Authoring exam’s emphasis on practical, advanced features.
    (20:08) A call for more visible integration of Airflow with AI workflows.

    Resources Mentioned:
    Marc Lamberti: https://www.linkedin.com/in/marclamberti/
    Astronomer | LinkedIn: https://www.linkedin.com/company/astronomer/
    Astronomer Academy: https://academy.astronomer.io/
    Airflow Fundamentals Certification: https://www.astronomer.io/certification/
    DAG Authoring Certification: https://academy.astronomer.io/plan/astronomer-certification-dag-authoring-for-apache-airflow-exam
    The Complete Hands-On Introduction to Airflow: https://www.udemy.com/course/the-complete-hands-on-course-to-master-apache-airflow/

    22 min
  5. Embracing Data Mesh and SQL Sensors for Scalable Workflows at lastminute.com with Alberto Crespi

    20 JUN

    The flexibility of Airflow plays a pivotal role in enabling decentralized data architectures and empowering cross-functional teams. In this episode, we speak with Alberto Crespi, Data Architect at lastminute.com, who shares how his team scales Airflow across 12 teams while supporting both vertical and horizontal team structures under a data mesh approach.

    Key Takeaways:
    (02:17) Defining responsibilities within data architecture teams.
    (04:15) Consolidating multiple orchestrators into a single solution.
    (07:00) Scaling Airflow environments with shared infrastructure and DevOps practices.
    (10:59) Managing dependencies and readiness using SQL sensors.
    (14:23) Enhancing visibility and response through Slack-integrated monitoring.
    (19:28) Extending Airflow’s flexibility to run legacy systems.
    (22:28) Integrating transformation tools into orchestrated pipelines.
    (25:54) Enabling non-engineers to contribute to pipeline development.
    (27:33) Fostering adoption through collaboration and communication.

    Resources Mentioned:
    Alberto Crespi: https://www.linkedin.com/in/crespialberto/
    lastminute.com | Website: https://lastminute.com
    Apache Airflow: https://airflow.apache.org/
    dbt Labs: https://www.getdbt.com/
    Astronomer Cosmos: https://github.com/astronomer/astronomer-cosmos
    GitLab
    Slack: https://slack.com/
    Kubernetes: https://kubernetes.io/
    Confluence: https://www.atlassian.com/software/confluence

    30 min
  6. The AI-Ready Pipeline: Reimagining Airflow at Veyer® Logistics with Anu Pabla

    12 JUN

    Innovation in orchestration is redefining how engineers approach both traditional ETL pipelines and emerging AI workloads. Understanding how to harness Airflow’s flexibility and observability is essential for teams navigating today’s evolving data landscape. In this episode, Anu Pabla, Principal Engineer at The ODP Corporation, joins us to discuss her journey from legacy orchestration patterns to AI-native pipelines and why she sees Airflow as the future of AI workload orchestration.

    Key Takeaways:
    (03:43) Engaging with external technology communities fosters innovation.
    (05:05) Mentoring early-career engineers builds confidence in a complex tech landscape.
    (07:51) Orchestration patterns continue to evolve with modern data needs.
    (08:41) Managing AI workflows requires structured and flexible orchestration.
    (10:35) High-quality, meaningful data remains foundational across use cases.
    (15:08) Community-driven open-source tools offer lasting value.
    (16:59) Self-healing systems support both legacy and AI pipelines.
    (20:20) Orchestration platforms can drive future AI-native workloads.

    Resources Mentioned:
    Anu Pabla: https://www.linkedin.com/in/atomicap/
    The ODP Corporation | LinkedIn: https://www.linkedin.com/company/the-odp-corporation/
    The ODP Corporation | Website: https://www.theodpcorp.com/homepage
    Apache Airflow: https://airflow.apache.org/
    LlamaIndex: https://www.llamaindex.ai/

    23 min
  7. Streamlining AI and ML Operations at IBM with BJ Adesoji and Ryan Yackel

    5 JUN

    The orchestration layer is foundational to building robust AI- and ML-powered data pipelines, especially in complex hybrid enterprise environments. IBM’s partnership with Astronomer reflects a strategic alignment to simplify and scale Airflow-based workflows across industries. In this episode, we’re joined by two guests from IBM: Senior Product Manager BJ Adesoji and GTM PM and Growth Leader Ryan Yackel. We discuss how IBM customers are using Airflow in production, the challenges they face at scale and what the new IBM–Astronomer collaboration unlocks.

    Key Takeaways:
    (03:09) The growing importance of orchestration tools in enterprise environments.
    (04:48) How organizations are expanding orchestration beyond traditional use cases.
    (05:24) Common patterns across industries adopting orchestration platforms.
    (07:16) Why orchestration is essential for supporting business-critical workloads.
    (10:00) The role of orchestration in compliance and regulatory processes.
    (13:02) Challenges enterprises face when managing orchestration infrastructure.
    (14:58) Opportunities to simplify and centralize orchestration at scale.
    (19:11) The value of integrating orchestration with broader data toolchains.
    (20:54) How AI is shaping the future of orchestrated data workflows.

    Resources Mentioned:
    BJ Adesoji: https://www.linkedin.com/in/bj-soji/
    Ryan Yackel: https://www.linkedin.com/in/ryanyackel/
    IBM | LinkedIn: https://www.linkedin.com/company/databand-ai/
    IBM Databand: https://www.ibm.com/products/databand
    IBM DataStage: https://www.ibm.com/products/datastage
    IBM watsonx.governance: https://www.ibm.com/products/watsonx-governance
    IBM Knowledge Catalog: https://www.ibm.com/products/knowledge-catalog
    Apache Airflow: https://airflow.apache.org/
    watsonx Orchestrate: https://www.ibm.com/products/watsonx-orchestrate
    Domino: https://domino.ai/
    Astronomer: https://www.astronomer.io/
    Snowflake: https://www.snowflake.com/en/
    dbt Labs: https://www.getdbt.com/
    Amazon SageMaker: https://aws.amazon.com/sagemaker/
    Cloudera: https://www.cloudera.com/
    MongoDB: https://www.mongodb.com/

    25 min
  8. Inside the Custom Framework for Managing Airflow Code at Wix with Gil Reich

    29 MAY

    Efficient orchestration and maintainability are crucial for data engineering at scale. Gil Reich, Data Developer for Data Science at Wix, shares how his team reduced code duplication, standardized pipelines and improved Airflow task orchestration using a Python-based framework built within the data science team. In this episode, Gil explains how this internal framework simplifies DAG creation, improves documentation accuracy and enables consistent task generation for machine learning pipelines. He also shares lessons learned from optimizing complex DAGs and keeping code testable.

    Key Takeaways:
    (03:23) Code duplication creates long-term problems.
    (08:16) Frameworks bring order to complex pipelines.
    (09:41) Shared functions cut down repetitive code.
    (17:18) Auto-generated docs stay accurate by design.
    (22:40) On-demand DAGs support real-time workflows.
    (25:08) Task-level sensors improve run efficiency.
    (27:40) Combine local runs with automated tests.
    (30:09) Clean code helps teams scale faster.

    Resources Mentioned:
    Gil Reich: https://www.linkedin.com/in/gilreich/
    Wix | LinkedIn: https://www.linkedin.com/company/wix-com/
    Wix | Website: https://www.wix.com/
    DS DAG Framework: https://airflowsummit.org/slides/2024/92-refactoring-dags.pdf
    Apache Airflow: https://airflow.apache.org/

    31 min
