The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Astronomer

Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with the insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/

  1. The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff

    11H AGO

    The integration of data orchestration and machine learning is critical to operational efficiency in healthcare tech. Vivian Health leverages Airflow to power both its ETL pipelines and ML workflows while maintaining strict compliance standards. Max Calehuff, Lead Data Engineer at Vivian Health, joins us to discuss how his team uses Airflow for MLOps, regulatory compliance and large-scale data orchestration. He also shares insights into upgrading to Airflow 3 and the importance of balancing flexibility with security in a healthcare environment.

    Key Takeaways:
    00:00 Introduction.
    04:21 The role of Airflow in managing ETL pipelines and ML retraining.
    06:23 Using AWS SageMaker for ML training and deployment.
    07:47 Why Airflow’s versatility makes it ideal for MLOps.
    10:50 The importance of documentation and best practices for engineering teams.
    13:44 Automating anonymization of user data for compliance.
    15:30 The benefits of remote execution in Airflow 3 for regulated industries.
    18:16 Quality-of-life improvements and desired features in future Airflow versions.

    Resources Mentioned:
    Max Calehuff https://www.linkedin.com/in/maxwell-calehuff/
    Vivian Health | LinkedIn https://www.linkedin.com/company/vivianhealth/
    Vivian Health | Website https://www.vivian.com
    Apache Airflow https://airflow.apache.org/
    Astronomer https://www.astronomer.io/
    AWS SageMaker https://aws.amazon.com/sagemaker/
    dbt Labs https://www.getdbt.com/
    Cosmos https://github.com/astronomer/astronomer-cosmos
    Split https://www.split.io/
    Snowflake https://www.snowflake.com/en/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow

    20 min
  2. Scaling Airflow to 11,000 DAGs Across Three Regions at Intercom with András Gombosi and Paul Vickers

    DEC 4

    The evolution of Intercom’s data infrastructure reveals how a well-built orchestration system can scale to serve global needs. With thousands of DAGs powering analytics, AI and customer operations, the team’s approach combines technical depth with organizational insight. In this episode, András Gombosi, Senior Engineering Manager of Data Infra and Analytics Engineering, and Paul Vickers, Principal Engineer, both at Intercom, share how they built one of the largest Airflow deployments in production and enabled self-serve data platforms across teams.

    Key Takeaways:
    00:00 Introduction.
    04:24 Community input encourages confident adoption of a common platform.
    08:50 Self-serve workflows require consistent guardrails and review.
    09:25 Internal infrastructure support accelerates scalable deployments.
    13:26 Batch LLM processing benefits from a configuration-driven design.
    15:20 Standardized development environments enable effective AI-assisted work.
    19:58 Applied AI enhances internal analysis and operational enablement.
    27:27 Strong test coverage and staged upgrades protect stability.
    30:36 Proactive observability and on-call ownership improve outcomes.

    Resources Mentioned:
    András Gombosi https://www.linkedin.com/in/andrasgombosi/
    Paul Vickers https://www.linkedin.com/in/paul-vickers-a22b76a3/
    Intercom | LinkedIn https://www.linkedin.com/company/intercom/
    Intercom | Website https://www.intercom.com
    Apache Airflow https://airflow.apache.org/
    dbt Labs https://www.getdbt.com/
    Snowflake Cortex AI https://www.snowflake.com/en/product/features/cortex/
    Datadog https://www.datadoghq.com/

    #AI #Automation #Airflow

    34 min
  3. How Covestro Turns Airflow Into a Simulation Toolbox with Anja MacKenzie

    NOV 20

    Building scalable, reproducible workflows for scientific computing often requires bridging the gap between research flexibility and enterprise reliability. In this episode, Anja MacKenzie, Expert for Cheminformatics at Covestro, explains how her team uses Airflow and Kubernetes to create a shared, self-service platform for computational chemistry.

    Key Takeaways:
    00:00 Introduction.
    06:19 Custom scripts made sharing and reuse difficult.
    09:29 Workflows are manually triggered with user traceability.
    10:38 Customization supports varied compute requirements.
    12:48 Persistent volumes allow tasks to share large amounts of data.
    14:25 Custom operators separate logic from infrastructure.
    16:43 Modified triggers connect dependent workflows.
    18:36 UI plugins enable file uploads and secure access.

    Resources Mentioned:
    Anja MacKenzie https://www.linkedin.com/in/anja-mackenzie/
    Covestro | LinkedIn https://www.linkedin.com/company/covestro/
    Covestro | Website https://www.covestro.com
    Apache Airflow https://airflow.apache.org/
    Kubernetes https://kubernetes.io/
    Airflow KubernetesPodOperator https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
    Astronomer https://www.astronomer.io/
    Airflow Academy by Marc Lamberti https://www.udemy.com/user/lockgfg/
    Airflow Documentation https://airflow.apache.org/docs/
    Airflow Plugins https://airflow.apache.org/docs/apache-airflow/1.10.9/plugins.html

    #AI #Automation #Airflow

    23 min
  4. Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin

    NOV 13

    The use of Apache Airflow in financial services demands a balance between innovation and compliance. AgileEngine’s approach to orchestration showcases how secure, auditable workflows can scale even within the constraints of regulatory environments. In this episode, Valentyn Druzhynin, Senior Data Engineer at AgileEngine, discusses how his team leverages Airflow for ETF calculations, data validation and workflow reliability within tightly controlled release cycles.

    Key Takeaways:
    00:00 Introduction.
    03:24 The orchestrator ensures secure and auditable workflows.
    05:13 Validations before and after computation prevent errors.
    08:24 Release freezes shape prioritization and delivery plans.
    11:14 Migration plans must respect managed service constraints.
    13:04 Versioning, backfills and event triggers increase reliability.
    15:08 UI and integration improvements simplify operations.
    18:05 New contributors should start small and seek help.

    Resources Mentioned:
    Valentyn Druzhynin https://www.linkedin.com/in/valentyn-druzhynin/
    AgileEngine | LinkedIn https://www.linkedin.com/company/agileengine/
    AgileEngine | Website https://agileengine.com/
    Apache Airflow https://airflow.apache.org/
    Astronomer https://www.astronomer.io/
    AWS Managed Airflow https://aws.amazon.com/managed-workflows-for-apache-airflow/
    Google Cloud Composer (Managed Airflow) https://cloud.google.com/composer
    Airflow Summit https://airflowsummit.org/

    #AI #Automation #Airflow #MachineLearning

    21 min
  5. How Redica Transformed Their Data With Airflow and Snowflake with Shankar Mahindar

    NOV 6

    The life sciences industry relies on data accuracy, regulatory insight and quality intelligence. Building a unified system that keeps these elements aligned is no small feat. In this episode, we welcome Shankar Mahindar, Senior Data Engineer II at Redica Systems, to discuss how his team is restructuring its data platform with Airflow to strengthen governance, reduce compliance risk and improve customer experience.

    Key Takeaways:
    00:00 Introduction.
    01:53 A focused analytics platform reduces compliance risk in life sciences.
    07:31 A centralized warehouse orchestrated by Airflow strengthens governance.
    09:12 Managed orchestration keeps attention on analytics and outcomes.
    10:32 A modern transformation stack enables scalable modeling and operations.
    11:51 Event-driven pipelines improve data freshness and responsiveness.
    14:13 Asset-oriented scheduling and versioning enhance reliability and change control.
    16:53 Observability and SLAs build confidence in data quality and freshness.
    21:04 Priorities include partitioned assets and streamlined developer tooling.

    Resources Mentioned:
    Shankar Mahindar https://www.linkedin.com/in/shankar-mahindar-83a61b137/
    Redica Systems | LinkedIn https://www.linkedin.com/company/redicasystems/
    Redica Systems | Website https://redica.com
    Apache Airflow https://airflow.apache.org/
    Astronomer https://www.astronomer.io/
    Snowflake https://www.snowflake.com/
    AWS https://aws.amazon.com/

    #AI #Automation #Airflow #MachineLearning

    24 min
  6. How Airflow and AI Power Investigative Journalism at the Financial Times with Zdravko Hvarlingov

    OCT 30

    The Financial Times leverages Airflow and AI to uncover powerful stories hidden within vast, unstructured data. In this episode, Zdravko Hvarlingov, Senior Software Engineer at the Financial Times, discusses building multi-tenant Airflow systems and AI-driven pipelines that surface stories that might otherwise be missed. Zdravko walks through entity extraction and fuzzy matching, linking the UK Register of Members’ Financial Interests with Companies House, and how this work cuts weeks of manual analysis to minutes.

    Key Takeaways:
    00:00 Introduction.
    02:12 What computational journalism means for day-to-day newsroom work.
    05:22 Why a shared orchestration platform supports consistent, scalable workflows.
    08:30 Tradeoffs of one centralized platform versus many separate instances.
    11:52 Using pipelines to structure messy sources for faster analysis.
    14:14 Turning recurring disclosures into usable data for investigations.
    16:03 Applying lightweight ML and matching to reveal entities and links.
    18:46 How automation reduces manual effort and shortens time to insight.
    20:41 Practical improvements that make backfilling and reliability easier.

    Resources Mentioned:
    Zdravko Hvarlingov https://www.linkedin.com/in/zdravko-hvarlingov-3aa36016b/
    Financial Times | LinkedIn https://www.linkedin.com/company/financial-times/
    Financial Times | Website https://www.ft.com/
    Apache Airflow https://airflow.apache.org/
    UK Register of Members’ Financial Interests https://www.parliament.uk/mps-lords-and-offices/standards-and-financial-interests/parliamentary-commissioner-for-standards/registers-of-interests/register-of-members-financial-interests/
    UK Companies House https://www.gov.uk/government/organisations/companies-house
    Doppler https://www.doppler.com/
    Kubernetes https://kubernetes.io/
    Airflow Kubernetes Executor https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html
    GitHub https://github.com/

    #AI #Automation #Airflow #MachineLearning

    24 min
  7. Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo

    OCT 23

    The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines. In this episode, Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.

    Key Takeaways:
    00:00 Introduction.
    05:28 Challenges of decentralization.
    06:45 YAML-based generator standardizes pipelines and dependencies.
    12:28 Declarative assets and sensors align cross-DAG dependencies.
    17:29 Task-level callbacks enable auto-recovery and clear ownership.
    21:39 Standardized building blocks simplify upgrades and maintenance.
    24:52 Platform focus frees domain work.
    26:49 Container-only standardization prevents sprawl.

    Resources Mentioned:
    Oscar Ligthart https://www.linkedin.com/in/oscar-ligthart/
    Rodrigo Loredo https://www.linkedin.com/in/rodrigo-loredo-410a16134/
    Vinted | LinkedIn https://www.linkedin.com/company/vinted/
    Vinted | Website https://www.vinted.com/
    Apache Airflow https://airflow.apache.org/
    Kubernetes https://kubernetes.io/
    dbt https://www.getdbt.com/
    Google Cloud Vertex AI https://cloud.google.com/vertex-ai
    Airflow Datasets & Assets (concepts) https://www.astronomer.io/docs/learn/airflow-datasets
    Airflow Summit https://airflowsummit.org/

    #AI #Automation #Airflow #MachineLearning

    30 min
  8. Transforming Data Pipelines at XENA Intelligence with Naseem Shah

    OCT 16

    The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities. In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed Amazon review analysis with LLMs.

    Key Takeaways:
    00:00 Introduction.
    03:28 The importance of building initial products that support growth and investment.
    06:16 The process of adopting new tools to improve reliability and efficiency.
    09:29 Approaches to learning complex technologies through practice and fundamentals.
    13:57 Trade-offs small teams face when balancing performance and costs.
    18:40 Using AI-driven approaches to generate insights from large datasets.
    22:38 How unstructured data can be transformed into actionable information.
    25:55 Moving from manual tasks to fully automated workflows.
    28:05 Orchestration as a foundation for scaling advanced use cases.

    Resources Mentioned:
    Naseem Shah https://www.linkedin.com/in/naseemshah/
    Xena Intelligence | LinkedIn https://www.linkedin.com/company/xena-intelligence/
    Xena Intelligence | Website https://xenaintelligence.com/
    Apache Airflow https://airflow.apache.org/
    Google Cloud Composer https://cloud.google.com/composer
    Techstars https://www.techstars.com/
    Docker https://www.docker.com/
    AWS SQS https://aws.amazon.com/sqs/
    PostgreSQL https://www.postgresql.org/

    #AI #Automation #Airflow #MachineLearning

    29 min
5 out of 5, 20 Ratings
