The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Astronomer

Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with the insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/

  1. Uphold’s Approach to Orchestrating Modern Data Workflows with Jaime Oliveira

    1 DAY AGO

    Uphold’s Approach to Orchestrating Modern Data Workflows with Jaime Oliveira

    A strong data-driven mindset underpins how fintech teams scale analytics, infrastructure and decision-making across the business. In this episode, Jaime Oliveira, Lead Data Engineer at Uphold, joins us to discuss how Uphold structures its data organization and orchestration strategy. Jaime shares how the team uses Airflow and dbt to support analytics, reporting and data activation while evolving their approach as the stack grows.

    Key Takeaways:
    00:00 Introduction.
    01:23 A data-driven mindset supports product development and business decisions.
    02:55 Diverse ingestion pipelines enable scalable analytics.
    04:18 A single orchestration platform simplifies analytics workflows.
    05:17 Early experience with orchestration tools shapes engineering practices.
    08:16 Analytics orchestration works best when aligned with transformation workflows.
    09:25 Infrastructure choices involve tradeoffs in testing, visibility and overhead.
    16:39 More collaborative workflow tools could improve accessibility and autonomy.

    Resources Mentioned:
    Jaime Oliveira: https://www.linkedin.com/in/jaime-oliveira-b075855a/
    Uphold | LinkedIn: https://www.linkedin.com/company/upholdinc/
    Uphold | Website: https://uphold.com
    Apache Airflow: https://airflow.apache.org
    dbt: https://www.getdbt.com
    Snowflake: https://www.snowflake.com
    Kubernetes: https://kubernetes.io
    Astronomer Cosmos: https://astronomer.github.io/astronomer-cosmos
    Cosmos e-book: https://www.astronomer.io/ebooks/orchestrating-dbt-with-airflow-using-cosmos/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    19 min
  2. Modern Airflow Best Practices for Scalable Data Pipelines with Bhavani Ravi

    29 JAN

    Modern Airflow Best Practices for Scalable Data Pipelines with Bhavani Ravi

    Building reliable data pipelines at scale requires more than writing code. It depends on thoughtful design, infrastructure trade-offs and an understanding of how orchestration platforms evolve over time. In this episode, we examine Airflow best practices shaped by real-world implementation. Bhavani Ravi, Independent Software Consultant and Apache Airflow Champion, shares lessons on pipeline design, architectural decisions and the evolution of the Airflow ecosystem in modern data environments.

    Key Takeaways:
    00:00 Introduction.
    01:30 Independent consulting supports effective Airflow adoption.
    02:38 Early challenges shaped modern Airflow practices.
    03:21 Airflow setup has become significantly simpler.
    04:30 New features expanded workflow capabilities.
    06:03 Frequent releases support long-term sustainability.
    07:34 Community and providers strengthen the ecosystem.
    10:03 Pipeline design should come before coding.
    10:55 Decoupling logic requires careful trade-offs.
    13:30 Plugins extend Airflow into new use cases.

    Resources Mentioned:
    Bhavani Ravi: https://www.linkedin.com/in/bhavanicodes/
    Apache Airflow: https://airflow.apache.org/
    Kubernetes: https://kubernetes.io/
    Microsoft Fabric: https://learn.microsoft.com/en-us/fabric/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    17 min
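A recurring theme in this episode is that pipeline design should come before coding. That idea is orchestrator-agnostic and can be sketched without installing Airflow at all: declare the task graph as data first, then check it for cycles and derive a valid run order. The sketch below uses only Python's standard-library `graphlib`; the task names are hypothetical illustrations, not from the episode.

```python
from graphlib import TopologicalSorter, CycleError

# Declare the pipeline as data first: task -> set of upstream dependencies.
# These task names are illustrative placeholders, not a real deployment.
pipeline = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "publish_report": {"transform"},
    "refresh_dashboard": {"transform"},
}

def execution_order(graph):
    """Return one valid run order, or raise if the design is cyclic."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"pipeline design is cyclic: {exc.args[1]}") from exc

order = execution_order(pipeline)
print(order)  # upstream tasks always appear before their dependents
```

Reviewing a graph like this with stakeholders before writing operator code makes dependency mistakes visible while they are still cheap to fix.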
  3. Inside Conviva’s Decision To Power Its Data Platform With Airflow with Han Zhang

    22 JAN

    Inside Conviva’s Decision To Power Its Data Platform With Airflow with Han Zhang

    Conviva operates at a massive scale, delivering outcome-based intelligence for digital businesses through real-time and batch data processing. As new use cases emerged, the team needed a way to extend a streaming-first architecture without rebuilding core systems. In this episode, Han Zhang joins us to explain how Conviva uses Apache Airflow as the orchestration backbone for its batch workloads, how the control plane is designed and what trade-offs shaped their platform decisions.

    Key Takeaways:
    00:00 Introduction.
    01:17 Large-scale data platforms require low-latency processing capabilities.
    02:08 Batch workloads can complement streaming pipelines for additional use cases.
    03:45 An orchestration framework can act as the core coordination layer.
    06:12 Batch processing enables workloads that streaming alone cannot support.
    08:50 Ecosystem maturity and observability are key orchestration considerations.
    10:15 Built-in run history and logs make failures easier to diagnose.
    14:20 Platform users can monitor workflows without managing orchestration logic.
    17:08 Identity, secrets and scheduling present ongoing optimization challenges.
    19:59 Configuration history and change visibility improve operational reliability.

    Resources Mentioned:
    Han Zhang: https://www.linkedin.com/in/zhanghan177
    Conviva | Website: http://www.conviva.com
    Apache Airflow: https://airflow.apache.org/
    Celery: https://docs.celeryq.dev/
    Temporal: https://temporal.io/
    Kubernetes: https://kubernetes.io/
    LDAP: https://ldap.com/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    22 min
  4. Why Airflow Became the Scheduling Backbone at Condé Nast Technology Lab with Arun Karthik

    15 JAN

    Why Airflow Became the Scheduling Backbone at Condé Nast Technology Lab with Arun Karthik

    Data platforms are moving from batch-first pipelines to near real-time systems where orchestration, observability, scalability and governance all have to work together. In this episode, Arun Karthik, Director, Data Solutions Engineering at Condé Nast Technology Lab, joins us to share how data engineering evolves from relational databases and ETL into distributed processing, modern orchestration with Apache Airflow and managed Airflow with Astronomer.

    Key Takeaways:
    00:00 Introduction.
    02:13 Early data systems rely heavily on relational databases and batch-oriented processing models.
    07:01 Scheduling requirements evolve beyond fixed time windows as dependencies increase.
    10:14 Ease of use and developer experience influence adoption of orchestration frameworks.
    13:22 Operating open source orchestration tools requires ongoing engineering effort.
    14:45 Managed services help teams reduce infrastructure and maintenance responsibilities.
    17:27 Observability improves confidence in pipeline execution and system health.
    19:12 Governance considerations grow in importance as data platforms mature.
    20:46 Building data systems requires balancing speed, reliability and long-term sustainability.

    Resources Mentioned:
    Arun Karthik: https://www.linkedin.com/in/earunkarthik/
    Condé Nast Technology Lab | LinkedIn: https://www.linkedin.com/company/conde-nast-technology-lab/
    Condé Nast Technology Lab | Website: https://www.condenast.com/
    Apache Airflow: https://airflow.apache.org/
    Astronomer: https://www.astronomer.io/
    Apache Spark: https://spark.apache.org/
    Apache Hadoop: https://hadoop.apache.org/
    Jenkins: https://www.jenkins.io/
    dbt Labs: https://www.getdbt.com/product/what-is-dbt
    Amazon Web Services: https://aws.amazon.com/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    24 min
  5. The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff

    11 DEC 2025

    The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff

    The integration of data orchestration and machine learning is critical to operational efficiency in healthcare tech. Vivian Health leverages Airflow to power both its ETL pipelines and ML workflows while maintaining strict compliance standards. Max Calehuff, Lead Data Engineer at Vivian Health, joins us to discuss how his team uses Airflow for ML ops, regulatory compliance and large-scale data orchestration. He also shares insights into upgrading to Airflow 3 and the importance of balancing flexibility with security in a healthcare environment.

    Key Takeaways:
    00:00 Introduction.
    04:21 The role of Airflow in managing ETL pipelines and ML retraining.
    06:23 Using AWS SageMaker for ML training and deployment.
    07:47 Why Airflow’s versatility makes it ideal for MLOps.
    10:50 The importance of documentation and best practices for engineering teams.
    13:44 Automating anonymization of user data for compliance.
    15:30 The benefits of remote execution in Airflow 3 for regulated industries.
    18:16 Quality-of-life improvements and desired features in future Airflow versions.

    Resources Mentioned:
    Max Calehuff: https://www.linkedin.com/in/maxwell-calehuff/
    Vivian Health | LinkedIn: https://www.linkedin.com/company/vivianhealth/
    Vivian Health | Website: https://www.vivian.com
    Apache Airflow: https://airflow.apache.org/
    Astronomer: https://www.astronomer.io/
    AWS SageMaker: https://aws.amazon.com/sagemaker/
    dbt Labs: https://www.getdbt.com/
    Cosmos: https://github.com/astronomer/astronomer-cosmos
    Split: https://www.split.io/
    Snowflake: https://www.snowflake.com/en/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    20 min
  6. Scaling Airflow to 11,000 DAGs Across Three Regions at Intercom with András Gombosi and Paul Vickers

    4 DEC 2025

    Scaling Airflow to 11,000 DAGs Across Three Regions at Intercom with András Gombosi and Paul Vickers

    The evolution of Intercom’s data infrastructure reveals how a well-built orchestration system can scale to serve global needs. With thousands of DAGs powering analytics, AI and customer operations, the team’s approach combines technical depth with organizational insight. In this episode, András Gombosi, Senior Engineering Manager of Data Infra and Analytics Engineering, and Paul Vickers, Principal Engineer, both at Intercom, share how they built one of the largest Airflow deployments in production and enabled self-serve data platforms across teams.

    Key Takeaways:
    00:00 Introduction.
    04:24 Community input encourages confident adoption of a common platform.
    08:50 Self-serve workflows require consistent guardrails and review.
    09:25 Internal infrastructure support accelerates scalable deployments.
    13:26 Batch LLM processing benefits from a configuration-driven design.
    15:20 Standardized development environments enable effective AI-assisted work.
    19:58 Applied AI enhances internal analysis and operational enablement.
    27:27 Strong test coverage and staged upgrades protect stability.
    30:36 Proactive observability and on-call ownership improve outcomes.

    Resources Mentioned:
    András Gombosi: https://www.linkedin.com/in/andrasgombosi/
    Paul Vickers: https://www.linkedin.com/in/paul-vickers-a22b76a3/
    Intercom | LinkedIn: https://www.linkedin.com/company/intercom/
    Intercom | Website: https://www.intercom.com
    Apache Airflow: https://airflow.apache.org/
    dbt Labs: https://www.getdbt.com/
    Snowflake Cortex AI: https://www.snowflake.com/en/product/features/cortex/
    Datadog: https://www.datadoghq.com/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    34 min
  7. How Covestro Turns Airflow Into a Simulation Toolbox with Anja Mackenzie

    20 NOV 2025

    How Covestro Turns Airflow Into a Simulation Toolbox with Anja Mackenzie

    Building scalable, reproducible workflows for scientific computing often requires bridging the gap between research flexibility and enterprise reliability. In this episode, Anja Mackenzie, Expert for Cheminformatics at Covestro, explains how her team uses Airflow and Kubernetes to create a shared, self-service platform for computational chemistry.

    Key Takeaways:
    00:00 Introduction.
    06:19 Custom scripts made sharing and reuse difficult.
    09:29 Workflows are manually triggered with user traceability.
    10:38 Customization supports varied compute requirements.
    12:48 Persistent volumes allow tasks to share large amounts of data.
    14:25 Custom operators separate logic from infrastructure.
    16:43 Modified triggers connect dependent workflows.
    18:36 UI plugins enable file uploads and secure access.

    Resources Mentioned:
    Anja Mackenzie: https://www.linkedin.com/in/anja-mackenzie/
    Covestro | LinkedIn: https://www.linkedin.com/company/covestro/
    Covestro | Website: https://www.covestro.com
    Apache Airflow: https://airflow.apache.org/
    Kubernetes: https://kubernetes.io/
    Airflow KubernetesPodOperator: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
    Astronomer: https://www.astronomer.io/
    Airflow Academy by Marc Lamberti: https://www.udemy.com/user/lockgfg/
    Airflow Documentation: https://airflow.apache.org/docs/
    Airflow Plugins: https://airflow.apache.org/docs/apache-airflow/1.10.9/plugins.html

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    23 min
  8. Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin

    13 NOV 2025

    Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin

    The use of Apache Airflow in financial services demands a balance between innovation and compliance. AgileEngine’s approach to orchestration showcases how secure, auditable workflows can scale even within the constraints of regulatory environments. In this episode, Valentyn Druzhynin, Senior Data Engineer at AgileEngine, discusses how his team leverages Airflow for ETF calculations, data validation and workflow reliability within tightly controlled release cycles.

    Key Takeaways:
    00:00 Introduction.
    03:24 The orchestrator ensures secure and auditable workflows.
    05:13 Validations before and after computation prevent errors.
    08:24 Release freezes shape prioritization and delivery plans.
    11:14 Migration plans must respect managed service constraints.
    13:04 Versioning, backfills and event triggers increase reliability.
    15:08 UI and integration improvements simplify operations.
    18:05 New contributors should start small and seek help.

    Resources Mentioned:
    Valentyn Druzhynin: https://www.linkedin.com/in/valentyn-druzhynin/
    AgileEngine | LinkedIn: https://www.linkedin.com/company/agileengine/
    AgileEngine | Website: https://agileengine.com/
    Apache Airflow: https://airflow.apache.org/
    Astronomer: https://www.astronomer.io/
    AWS Managed Airflow: https://aws.amazon.com/managed-workflows-for-apache-airflow/
    Google Cloud Composer (Managed Airflow): https://cloud.google.com/composer
    Airflow Summit: https://airflowsummit.org/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow #MachineLearning

    21 min
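The "validations before and after computation" pattern discussed in this episode is orchestrator-agnostic and can be sketched in a few lines of plain Python. The helper and the ETF-style example below are hypothetical illustrations of the pattern, not AgileEngine's actual logic: failed checks abort the run with a descriptive error rather than letting a bad result flow downstream.

```python
def run_with_validation(compute, inputs, pre_checks, post_checks):
    """Run pre-checks on the inputs, the computation, then post-checks on the result.

    compute: callable taking the validated inputs.
    pre_checks / post_checks: lists of (name, predicate) pairs; any failed
    predicate raises instead of publishing a bad result downstream.
    """
    for name, check in pre_checks:
        if not check(inputs):
            raise ValueError(f"pre-check failed: {name}")
    result = compute(inputs)
    for name, check in post_checks:
        if not check(result):
            raise ValueError(f"post-check failed: {name}")
    return result

# Illustrative use: a weighted-sum valuation with sanity checks on both sides.
# Tickers, prices and weights are made up for the example.
prices = {"AAA": 101.5, "BBB": 49.2}
weights = {"AAA": 0.6, "BBB": 0.4}

value = run_with_validation(
    compute=lambda p: sum(p[t] * weights[t] for t in weights),
    inputs=prices,
    pre_checks=[
        ("all prices positive", lambda p: all(v > 0 for v in p.values())),
        ("no missing tickers", lambda p: set(weights) <= set(p)),
    ],
    post_checks=[("value is positive", lambda v: v > 0)],
)
print(round(value, 2))  # prints 80.58
```

In an orchestrated setting the same idea is often split across tasks, so that a failed pre-check stops the pipeline before any expensive computation runs.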
