The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Astronomer

Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with the insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/

  1. Embracing Data Mesh and SQL Sensors for Scalable Workflows at lastminute.com with Alberto Crespi

    10 hr ago

    The flexibility of Airflow plays a pivotal role in enabling decentralized data architectures and empowering cross-functional teams. In this episode, we speak with Alberto Crespi, Data Architect at lastminute.com, who shares how his team scales Airflow across 12 teams while supporting both vertical and horizontal structures under a data mesh approach.

    Key Takeaways:
    (02:17) Defining responsibilities within data architecture teams.
    (04:15) Consolidating multiple orchestrators into a single solution.
    (07:00) Scaling Airflow environments with shared infrastructure and DevOps practices.
    (10:59) Managing dependencies and readiness using SQL sensors.
    (14:23) Enhancing visibility and response through Slack-integrated monitoring.
    (19:28) Extending Airflow’s flexibility to run legacy systems.
    (22:28) Integrating transformation tools into orchestrated pipelines.
    (25:54) Enabling non-engineers to contribute to pipeline development.
    (27:33) Fostering adoption through collaboration and communication.

    Resources Mentioned:
    Alberto Crespi https://www.linkedin.com/in/crespialberto/
    lastminute.com | Website https://lastminute.com
    Apache Airflow https://airflow.apache.org/
    dbt Labs https://www.getdbt.com/
    Astronomer Cosmos https://github.com/astronomer/astronomer-cosmos
    GitLab
    Slack https://slack.com/
    Kubernetes https://kubernetes.io/
    Confluence https://www.atlassian.com/software/confluence
    https://www.astronomer.io/events/roadshow/london/
    https://www.astronomer.io/events/roadshow/new-york/
    https://www.astronomer.io/events/roadshow/sydney/
    https://www.astronomer.io/events/roadshow/san-francisco/
    https://www.astronomer.io/events/roadshow/chicago/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

    #AI #Automation #Airflow #MachineLearning

    30 min
  2. The AI-Ready Pipeline: Reimagining Airflow at Veyer® Logistics with Anu Pabla

    Jun 12

    Innovation in orchestration is redefining how engineers approach both traditional ETL pipelines and emerging AI workloads. Understanding how to harness Airflow’s flexibility and observability is essential for teams navigating today’s evolving data landscape. In this episode, Anu Pabla, Principal Engineer at The ODP Corporation, joins us to discuss her journey from legacy orchestration patterns to AI-native pipelines and why she sees Airflow as the future of AI workload orchestration.

    Key Takeaways:
    (03:43) Engaging with external technology communities fosters innovation.
    (05:05) Mentoring early-career engineers builds confidence in a complex tech landscape.
    (07:51) Orchestration patterns continue to evolve with modern data needs.
    (08:41) Managing AI workflows requires structured and flexible orchestration.
    (10:35) High-quality, meaningful data remains foundational across use cases.
    (15:08) Community-driven open source tools offer lasting value.
    (16:59) Self-healing systems support both legacy and AI pipelines.
    (20:20) Orchestration platforms can drive future AI-native workloads.

    Resources Mentioned:
    Anu Pabla https://www.linkedin.com/in/atomicap/
    The ODP Corporation https://www.linkedin.com/company/the-odp-corporation/
    The ODP Corporation | Website https://www.theodpcorp.com/homepage
    Apache Airflow https://airflow.apache.org/
    LlamaIndex https://www.llamaindex.ai/

    23 min
  3. Streamlining AI and ML Operations at IBM with BJ Adesoji and Ryan Yackel

    Jun 5

    The orchestration layer is foundational to building robust AI- and ML-powered data pipelines, especially in complex hybrid enterprise environments. IBM’s partnership with Astronomer reflects a strategic alignment to simplify and scale Airflow-based workflows across industries. In this episode, we’re joined by IBM’s Senior Product Manager, BJ Adesoji, and GTM PM and Growth Leader, Ryan Yackel. We discuss how IBM customers are using Airflow in production, the challenges they face at scale and what the new IBM–Astronomer collaboration unlocks.

    Key Takeaways:
    (03:09) The growing importance of orchestration tools in enterprise environments.
    (04:48) How organizations are expanding orchestration beyond traditional use cases.
    (05:24) Common patterns across industries adopting orchestration platforms.
    (07:16) Why orchestration is essential for supporting business-critical workloads.
    (10:00) The role of orchestration in compliance and regulatory processes.
    (13:02) Challenges enterprises face when managing orchestration infrastructure.
    (14:58) Opportunities to simplify and centralize orchestration at scale.
    (19:11) The value of integrating orchestration with broader data toolchains.
    (20:54) How AI is shaping the future of orchestrated data workflows.

    Resources Mentioned:
    BJ Adesoji https://www.linkedin.com/in/bj-soji/
    Ryan Yackel https://www.linkedin.com/in/ryanyackel/
    IBM | LinkedIn https://www.linkedin.com/company/databand-ai/
    IBM Databand https://www.ibm.com/products/databand
    IBM DataStage https://www.ibm.com/products/datastage
    IBM watsonx.governance https://www.ibm.com/products/watsonx-governance
    IBM Knowledge Catalog https://www.ibm.com/products/knowledge-catalog
    Apache Airflow https://airflow.apache.org/
    watsonx Orchestrate https://www.ibm.com/products/watsonx-orchestrate
    Domino https://domino.ai/
    Astronomer https://www.astronomer.io/
    Snowflake https://www.snowflake.com/en/
    dbt Labs https://www.getdbt.com/
    Amazon SageMaker https://aws.amazon.com/sagemaker/
    Cloudera https://www.cloudera.com/
    MongoDB https://www.mongodb.com/

    25 min
  4. Inside the Custom Framework for Managing Airflow Code at Wix with Gil Reich

    May 29

    Efficient orchestration and maintainability are crucial for data engineering at scale. Gil Reich, Data Developer for Data Science at Wix, shares how his team reduced code duplication, standardized pipelines, and improved Airflow task orchestration using a Python-based framework built within the data science team. In this episode, Gil explains how this internal framework simplifies DAG creation, improves documentation accuracy, and enables consistent task generation for machine learning pipelines. He also shares lessons from complex DAG optimization and maintaining testable code.

    Key Takeaways:
    (03:23) Code duplication creates long-term problems.
    (08:16) Frameworks bring order to complex pipelines.
    (09:41) Shared functions cut down repetitive code.
    (17:18) Auto-generated docs stay accurate by design.
    (22:40) On-demand DAGs support real-time workflows.
    (25:08) Task-level sensors improve run efficiency.
    (27:40) Combine local runs with automated tests.
    (30:09) Clean code helps teams scale faster.

    Resources Mentioned:
    Gil Reich https://www.linkedin.com/in/gilreich/
    Wix | LinkedIn https://www.linkedin.com/company/wix-com/
    Wix | Website https://www.wix.com/
    DS DAG Framework https://airflowsummit.org/slides/2024/92-refactoring-dags.pdf
    Apache Airflow https://airflow.apache.org/

    31 min
  5. Modernizing Legacy Data Systems With Airflow at Procter & Gamble with Adonis Castillo Cordero

    May 22

    Legacy architecture and AI workloads pose unique challenges at scale, especially in a global enterprise with complex data systems. In this episode, we explore strategies to proactively monitor and optimize pipelines while minimizing downstream failures. Adonis Castillo Cordero, Senior Automation Manager at Procter & Gamble, joins us to share actionable best practices for dependency mapping, anomaly detection and architecture simplification using Apache Airflow.

    Key Takeaways:
    (03:13) Integrating legacy data systems into modern architecture.
    (05:51) Designing workflows for real-time data processing.
    (07:57) Mapping dependencies early to avoid pipeline failures.
    (09:02) Building automated monitoring into orchestration frameworks.
    (12:09) Detecting anomalies to prevent performance bottlenecks.
    (15:24) Monitoring data quality to catch silent failures.
    (17:02) Prioritizing responses based on impact severity.
    (18:55) Simplifying dashboards to highlight critical metrics.

    Resources Mentioned:
    Adonis Castillo Cordero https://www.linkedin.com/in/adoniscc/
    Procter & Gamble | LinkedIn https://www.linkedin.com/company/procter-and-gamble/
    Procter & Gamble | Website http://www.pg.com
    Apache Airflow https://airflow.apache.org/
    OpenLineage https://openlineage.io/
    Azure Monitor https://azure.microsoft.com/en-us/products/monitor/
    AWS Lookout for Metrics https://aws.amazon.com/lookout-for-metrics/
    Monte Carlo https://www.montecarlodata.com/
    Great Expectations https://greatexpectations.io/

    22 min
  6. Building an End-to-End Data Observability System at Netflix with Joseph Machado

    May 15

    Building reliable data pipelines starts with maintaining strong data quality standards and creating efficient systems for auditing, publishing and monitoring. In this episode, we explore the real-world patterns and best practices for ensuring data pipelines stay accurate, scalable and trustworthy. Joseph Machado, Senior Data Engineer at Netflix, joins us to share practical insights gleaned from supporting Netflix’s Ads business as well as over a decade of experience in the data engineering space. He discusses implementing audit publish patterns, building observability dashboards, defining in-band and separate data quality checks, and optimizing data validation across large-scale systems.

    Key Takeaways:
    (03:14) Supporting data privacy and engineering efficiency within data systems.
    (10:41) Validating outputs with reconciliation checks to catch transformation issues.
    (16:06) Applying standardized patterns for auditing, validating and publishing data.
    (19:28) Capturing historical check results to monitor system health and improvements.
    (21:29) Treating data quality and availability as separate monitoring concerns.
    (26:26) Using containerization strategies to streamline pipeline executions.
    (29:47) Leveraging orchestration platforms for better visibility and retry capability.
    (31:59) Managing business pressure without sacrificing data quality practices.
    (35:46) Starting simple with quality checks and evolving toward more complex frameworks.

    Resources Mentioned:
    Joseph Machado https://www.linkedin.com/in/josephmachado1991/
    Netflix | LinkedIn https://www.linkedin.com/company/netflix/
    Netflix | Website https://www.netflix.com/browse
    Start Data Engineering https://www.startdataengineering.com/
    Apache Airflow https://airflow.apache.org/
    dbt Labs https://www.getdbt.com/
    Great Expectations https://greatexpectations.io/

    39 min
  7. Why Developer Experience Shapes Data Pipeline Standards at Next Insurance with Snir Israeli

    May 8

    Creating consistency across data pipelines is critical for scaling engineering teams and ensuring long-term maintainability. In this episode, Snir Israeli, Senior Data Engineer at Next Insurance, shares how enforcing coding standards and investing in developer experience transformed their approach to data engineering. He explains how implementing automated code checks, clear documentation practices and a scoring system helped drive alignment across teams, improve collaboration and reduce technical debt in a fast-growing data environment.

    Key Takeaways:
    (02:59) Inconsistencies in code style create challenges for collaboration and maintenance.
    (04:22) Programmatically enforcing rules helps teams scale their best practices.
    (08:55) Performance improvements in data pipelines lead to infrastructure cost savings.
    (13:22) Developer experience is essential for driving adoption of internal tools.
    (19:44) Dashboards can operationalize standards enforcement and track progress over time.
    (22:49) Standardization accelerates onboarding and reduces friction in code reviews.
    (25:39) Linting rules require ongoing maintenance as tools and platforms evolve.
    (27:47) Starting small and involving the team leads to better adoption and long-term success.

    Resources Mentioned:
    Snir Israeli https://www.linkedin.com/in/snir-israeli/
    Next Insurance | LinkedIn https://www.linkedin.com/company/nextinsurance/
    Next Insurance | Website https://www.nextinsurance.com/
    Apache Airflow https://airflow.apache.org/

    30 min
  8. Data Quality and Observability at Tekmetric with Ipsa Trivedi

    May 1

    Airflow’s adaptability is driving Tekmetric’s ability to unify complex data workflows, deliver accurate insights and support both internal operations and customer-facing services, all within a rapidly growing startup environment. In this episode, Ipsa Trivedi, Lead Data Engineer at Tekmetric, shares how her team is standardizing pipelines while supporting unique customer needs. She explains how Airflow enables end-to-end data services, simplifies orchestration across varied sources and supports scalable customization. Ipsa also highlights early wins with Airflow, its intuitive UI and the team's roadmap toward data quality, observability and a future self-serve data platform.

    Key Takeaways:
    (02:26) Powering auto shops nationwide with a unified platform.
    (05:17) A new data team was formed to centralize and scale insights.
    (07:23) Flexible, open source and made to fit: Airflow wins.
    (10:42) Pipelines handle anything from email to AWS.
    (12:15) Custom DAGs fit every team’s unique needs.
    (17:01) Data quality checks are built into the plan.
    (18:17) Self-serve data mesh is the end goal.
    (19:59) Airflow now fits so well, there's nothing left on the wishlist.

    Resources Mentioned:
    Ipsa Trivedi https://www.linkedin.com/in/ipsatrivedi/
    Tekmetric | LinkedIn https://www.linkedin.com/company/tekmetric/
    Tekmetric | Website https://www.tekmetric.com/
    Apache Airflow https://airflow.apache.org/
    AWS RDS https://aws.amazon.com/free/database/?trk=fc551e06-56b0-418c-9ddd-5c9dba18569b&sc_channel=ps&ef_id=CjwKCAjwzMi_BhACEiwAX4YZULS4jV2Xpnpcac_Q3eS9BAg-klKUDyCt6XSdOul8BLHkmWzFFh4NXRoCGhQQAvD_BwE:G:s&s_kwcid=AL!4422!3!548989592596!e!!g!!amazon%20sql%20database!11543056228!112002958549&gclid=CjwKCAjwzMi_BhACEiwAX4YZULS4jV2Xpnpcac_Q3eS9BAg-klKUDyCt6XSdOul8BLHkmWzFFh4NXRoCGhQQAvD_BwE
    Astro by Astronomer https://www.astronomer.io/product/

    23 min
5 out of 5 (20 ratings)
