The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Astronomer

Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/

  1. Building a Data-Driven Beauty and Wellness Marketplace at StyleSeat with Paschal Onuorah

    13 hours ago

    StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base. In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the company leverages Airflow, dbt, and Cosmos to drive marketplace intelligence, improve client connections and deliver measurable growth for professionals. Key Takeaways: 00:00 Introduction. 05:44 The role of the data engineering team in driving business success. 08:52 Leveraging technology for real-time business intelligence. 10:52 Data-driven strategies for improving marketing outcomes. 13:05 How adopting the right tools can increase revenue growth. 14:25 Advantages of simplifying and integrating technical workflows. 18:45 Benefits of multi-environment configurations for development and production. 20:17 Foundational skills and best practices for learning Airflow effectively. 22:33 Opportunities for deeper tool integration and improved data visualization. Resources Mentioned: Paschal Onuorah https://www.linkedin.com/in/onuorah-paschal/ StyleSeat | LinkedIn https://www.linkedin.com/company/styleseat/ StyleSeat | Website https://www.styleseat.com Apache Airflow https://airflow.apache.org/ dbt https://www.getdbt.com/ Astronomer Cosmos https://www.astronomer.io/cosmos/ Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations. #AI #Automation #Airflow #MachineLearning

    23 min
  2. Building the Future of Airflow Execution at Astronomer with Ian Buss and Piotr Chomiak

    August 28

    The evolution of orchestration in Airflow continues with innovations that address both scalability and security. From improving executor reliability to enabling remote execution, these advancements reshape how organizations manage data pipelines. In this episode, we’re joined by Ian Buss, Principal Software Engineer at Astronomer, and Piotr Chomiak, Principal Product Manager at Astronomer, who share insights into the Astro Executor and remote execution. Key Takeaways: 00:00 Introduction. 04:13 How product leadership drives scalability for enterprise needs. 08:23 Architectural changes that improve reliability and remove bottlenecks. 10:15 Metrics that enhance visibility into system performance. 12:54 The role of remote execution in addressing security requirements. 15:56 Differences between open-source solutions and managed offerings. 19:04 Broad industry adoption and applicability of remote execution. 20:39 Future advancements in language support and multi-tenancy. Resources Mentioned: Ian Buss https://www.linkedin.com/in/ian-buss/ Piotr Chomiak https://www.linkedin.com/in/piotr-chomiak-b1955624/ Astronomer | Website https://www.astronomer.io Apache Airflow https://airflow.apache.org/ Airflow Slack Community https://airflow.apache.org/community/ Beyond Analytics conference https://astronomer.io/beyond/dataflowcast

    22 min
  3. Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille

    August 21

    Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable. In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features. Key Takeaways: 00:00 Introduction. 02:13 Overview of the company’s operations and global presence. 04:00 The tech stack and structure of the data engineering team. 04:24 Running nearly 2,000 DAGs in production using Airflow. 05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot. 07:05 Details on the Kubernetes-based Airflow setup using Helm charts. 09:31 Transition from GitSync to NFS for DAG syncing due to performance issues. 14:11 Making every team member Airflow-literate through local installation. 17:56 Using custom libraries and plugins to extend Airflow functionality. Resources Mentioned: Sébastien Crocquevieille https://www.linkedin.com/in/scroc/ Numberly | LinkedIn https://www.linkedin.com/company/numberly/ Numberly | Website https://numberly.com/ Apache Airflow https://airflow.apache.org/ Grafana https://grafana.com/ Apache Kafka https://kafka.apache.org/ Helm Chart for Apache Airflow https://airflow.apache.org/docs/helm-chart/stable/index.html Kubernetes https://kubernetes.io/ GitLab https://about.gitlab.com/ KubernetesPodOperator – Airflow https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html Beyond Analytics Conference https://astronomer.io/beyond/dataflowcast

    24 min
  4. How Moniepoint Group Uses Airflow for Exposure Monitoring with Adeolu Adegboye

    August 14

    Managing financial data at scale requires precise orchestration and proactive monitoring to maintain operational efficiency. In this episode, we are joined by Adeolu Adegboye, Data Engineer at Moniepoint Group, who shares how his team uses data pipelines and workflow automation to manage high volumes of transactions, ensure timely alerts and support diverse stakeholders across the business. Key Takeaways: (00:00) Introduction. (02:48) The role of data engineering in supporting all business operations. (04:17) Leveraging workflow orchestration to manage daily processes. (05:20) Proactively monitoring for anomalies to prevent potential issues. (08:12) Simplifying complex insights for non-technical teams. (13:01) Improving efficiency through dynamic and parallel workflows. (14:19) Optimizing system performance to handle large-scale operations. (17:19) Exploring creative and innovative uses for workflow automation. Resources Mentioned: Adeolu Adegboye https://www.linkedin.com/in/adeolu-adegboye/ Moniepoint Group | LinkedIn https://www.linkedin.com/company/moniepoint-inc/ Moniepoint Group | Website https://www.moniepoint.com Apache Airflow https://airflow.apache.org/ ClickHouse https://clickhouse.com/ Grafana https://grafana.com/ Beyond Analytics Conference https://astronomer.io/beyond/dataflowcast

    22 min
  5. Inside Bosch’s Airflow 3 Revolution: Remote Execution with Jens Scheffler

    August 7

    The evolution of Airflow has reached a milestone with the introduction of remote execution in Airflow 3, enabling flexible orchestration across distributed environments. In this episode, Jens Scheffler, Test Execution Cluster Technical Architect at Bosch, shares insights on how his team’s need for large-scale, cross-environment testing influenced the development of the Edge Executor and shaped this major release. Key Takeaways: (02:39) The role of remote execution in supporting large-scale testing needs. (04:44) How community support contributed to the Edge Executor’s development. (08:41) Navigating network and infrastructure limitations within secure environments. (13:25) Transitioning from database-heavy processes to an API-driven model. (14:16) How the new task SDK in Airflow 3 improves distributed task execution. (16:54) What is required to set up and configure the Edge Executor. (19:36) Managing multiple queues to optimize tasks across different environments. (23:30) Examples of extreme distance use cases for edge execution. Resources Mentioned: Jens Scheffler https://www.linkedin.com/in/jens-scheffler/ Bosch | LinkedIn https://www.linkedin.com/company/bosch/ Bosch | Website https://www.bosch.com/ Apache Airflow https://airflow.apache.org/ Edge Executor (Edge3 Provider Package) https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html Astronomer’s Astro Executor https://www.astronomer.io/docs/astro/astro-executor/ Beyond Analytics Conference https://astronomer.io/beyond/dataflowcast

    28 min
  6. Inside Modern Data Infrastructure at Massdriver with Cory O’Daniel and Jake Ferriero

    July 31

    Managing modern data platforms means navigating a web of complex infrastructure, competing team needs and evolving security standards. For data teams to truly thrive, infrastructure must become both accessible and compliant without sacrificing velocity or reliability. In this episode, we’re joined by Cory O’Daniel, CEO and Co-Founder at Massdriver, and Jacob Ferriero, Senior Software Engineer at Astronomer, to unpack what it takes to make data platform engineering scalable, sustainable and secure. They share lessons from years of experience working with DevOps, ML teams and platform engineers and discuss how Airflow fits into the orchestration layer of today’s data stacks. Key Takeaways: (03:27) Making infrastructure accessible without deep ops knowledge. (07:23) Distinct personas and responsibilities across data teams. (09:53) Infrastructure hurdles specific to ML workloads. (11:13) Compliance and governance shaping platform design. (13:27) Tooling mismatches between teams cause friction. (15:13) Airflow’s orchestration role within broader system architecture. (22:10) Creating reusable infrastructure patterns for consistency. (24:13) Enabling secure access without slowing down development. (26:55) Opportunities to improve Airflow with event-driven and reliability tooling. Resources Mentioned: Cory O’Daniel https://www.linkedin.com/in/coryodaniel/ Massdriver | LinkedIn https://www.linkedin.com/company/massdriver/ Massdriver | Website https://www.massdriver.cloud/ Jacob Ferriero https://www.linkedin.com/in/jacob-ferriero/ Astronomer https://www.linkedin.com/company/astronomer/ Apache Airflow https://airflow.apache.org/ Prequel https://www.prequel.co/

    31 min
  7. The Future of Airflow Telemetry with Bolke de Bruin

    July 17

    Telemetry has the potential to guide the future of Airflow, but only if it’s implemented transparently and with community trust. In this episode, we’re joined by Bolke de Bruin, Director at Metyis and a long-time Airflow PMC member. Bolke discusses how telemetry has been handled in the past, why it matters now and what it will take to get it right. Key Takeaways: (03:20) The role of foundations in establishing credibility and sustainability. (04:52) Why data collection is critical to open-source project direction. (07:24) Lessons learned from previous approaches to user data collection. (10:23) The current state of telemetry in the project. (10:53) Community trust as a prerequisite for technical implementation. (12:54) The importance of managing sensitive data within trusted ecosystems. (16:37) Ethical considerations in balancing participation and access. (18:45) Forward-looking ideas for improving workflow design and usability. Resources Mentioned: Bolke de Bruin https://www.linkedin.com/in/bolke/ Metyis | LinkedIn https://www.linkedin.com/company/metyis/ Metyis | Website http://www.metyis.com Apache Airflow https://airflow.apache.org/ Airflow Summit https://airflowsummit.org/ Airflow Dev List https://lists.apache.org/list.html?dev@airflow.apache.org https://www.astronomer.io/events/roadshow/london/ https://www.astronomer.io/events/roadshow/new-york/ https://www.astronomer.io/events/roadshow/sydney/ https://www.astronomer.io/events/roadshow/san-francisco/ https://www.astronomer.io/events/roadshow/chicago/

    22 min
  8. Transforming the Airflow UI for Cloudera’s Users with Shubham Raj

    July 10

    Contributing to open-source projects can be daunting, but it can also unlock unexpected innovation. This episode showcases how one engineer’s journey with Apache Airflow led to impactful UI enhancements and infrastructure solutions at scale. Shubham Raj, Software Engineer II at Cloudera, shares how his team built a drag-and-drop DAG editor for non-coders, and how his contributions helped shape the Airflow 3.0 UI and introduced features like external XCom control and bulk APIs. Key Takeaways: (02:30) Day-to-day responsibilities building platforms that simplify orchestration. (05:27) Factors that make onboarding into large open-source projects accessible. (07:35) The value of improved user interfaces for task state visibility and control. (09:49) Enabling faster debugging by exposing internal data through APIs. (13:00) Balancing frontend design goals with backend functionality. (14:19) Creating workflow editors that lower the barrier to entry. (16:54) Supporting a variety of task types within a visual DAG builder. (19:32) Common infrastructure challenges faced by orchestration users. (20:37) Addressing dependency management across distributed environments. Resources Mentioned: Shubham Raj https://www.linkedin.com/in/shubhamrajofficial/ Cloudera | LinkedIn https://www.linkedin.com/company/cloudera/ Cloudera | Website https://www.cloudera.com/ Apache Airflow https://airflow.apache.org/ 2023 Airflow Summit https://airflowsummit.org/ https://www.astronomer.io/events/roadshow/london/ https://www.astronomer.io/events/roadshow/new-york/ https://www.astronomer.io/events/roadshow/sydney/ https://www.astronomer.io/events/roadshow/san-francisco/ https://www.astronomer.io/events/roadshow/chicago/

    22 min
Rating: 5 out of 5 (20 ratings)
