The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Astronomer

Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with the insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/

  1. Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo

    2 DAYS AGO

    The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines. In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.

    Key Takeaways:
    00:00 Introduction.
    05:28 Challenges of decentralization.
    06:45 YAML-based generator standardizes pipelines and dependencies.
    12:28 Declarative assets and sensors align cross-DAG dependencies.
    17:29 Task-level callbacks enable auto-recovery and clear ownership.
    21:39 Standardized building blocks simplify upgrades and maintenance.
    24:52 Platform focus frees domain work.
    26:49 Container-only standardization prevents sprawl.

    Resources Mentioned:
    Oscar Ligthart https://www.linkedin.com/in/oscar-ligthart/
    Rodrigo Loredo https://www.linkedin.com/in/rodrigo-loredo-410a16134/
    Vinted | LinkedIn https://www.linkedin.com/company/vinted/
    Vinted | Website https://www.vinted.com/?srsltid=AfmBOor87MGR_eLOauCO93V9A-aLDaAhGYx9cnu_oN8s1SAXMlCRuhW7
    Apache Airflow https://airflow.apache.org/
    Kubernetes https://kubernetes.io/
    dbt https://www.getdbt.com/
    Google Cloud Vertex AI https://cloud.google.com/vertex-ai
    Airflow Datasets & Assets (concepts) https://www.astronomer.io/docs/learn/airflow-datasets
    Airflow Summit https://airflowsummit.org/

    Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations. #AI #Automation #Airflow #MachineLearning

    30 min
  2. Transforming Data Pipelines at XENA Intelligence with Naseem Shah

    16 OCT

    The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities. In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed Amazon review analysis with LLMs.

    Key Takeaways:
    00:00 Introduction.
    03:28 The importance of building initial products that support growth and investment.
    06:16 The process of adopting new tools to improve reliability and efficiency.
    09:29 Approaches to learning complex technologies through practice and fundamentals.
    13:57 Trade-offs small teams face when balancing performance and costs.
    18:40 Using AI-driven approaches to generate insights from large datasets.
    22:38 How unstructured data can be transformed into actionable information.
    25:55 Moving from manual tasks to fully automated workflows.
    28:05 Orchestration as a foundation for scaling advanced use cases.

    Resources Mentioned:
    Naseem Shah https://www.linkedin.com/in/naseemshah/
    Xena Intelligence | LinkedIn https://www.linkedin.com/company/xena-intelligence/
    Xena Intelligence | Website https://xenaintelligence.com/
    Apache Airflow https://airflow.apache.org/
    Google Cloud Composer https://cloud.google.com/composer
    Techstars https://www.techstars.com/
    Docker https://www.docker.com/
    AWS SQS https://aws.amazon.com/sqs/
    PostgreSQL https://www.postgresql.org/

    29 min
  3. Scaling Geospatial Workflows With Airflow at Overture Maps Foundation and Wherobots with Alex Iannicelli and Daniel Smith

    9 OCT

    Using Airflow to orchestrate geospatial data pipelines unlocks powerful efficiencies for data teams. The combination of scalable processing and visual observability streamlines workflows, reduces costs and improves iteration speed. In this episode, Alex Iannicelli, Staff Software Engineer at Overture Maps Foundation, and Daniel Smith, Senior Solutions Architect at Wherobots, join us to discuss leveraging Apache Airflow and Apache Sedona to process massive geospatial datasets, build reproducible pipelines and orchestrate complex workflows across platforms.

    Key Takeaways:
    00:00 Introduction.
    03:22 How merging multiple data sources supports comprehensive datasets.
    04:20 The value of flexible configurations for running pipelines on different platforms.
    06:35 Why orchestration tools are essential for handling continuous data streams.
    09:45 The importance of observability for monitoring progress and troubleshooting issues.
    11:30 Strategies for processing large, complex datasets efficiently.
    13:27 Expanding orchestration beyond core pipelines to automate frequent tasks.
    17:02 Advantages of using open-source operators to simplify integration and deployment.
    20:32 Desired improvements in orchestration tools for usability and workflow management.

    Resources Mentioned:
    Alex Iannicelli https://www.linkedin.com/in/atiannicelli/
    Overture Maps Foundation | LinkedIn https://www.linkedin.com/company/overture-maps-foundation/
    Overture Maps Foundation | Website https://overturemaps.org
    Daniel Smith https://www.linkedin.com/in/daniel-smith-analyst/
    Wherobots | LinkedIn https://www.linkedin.com/company/wherobots
    Wherobots | Website https://www.wherobots.com
    Apache Airflow https://airflow.apache.org/
    Apache Sedona https://sedona.apache.org/
    GitHub repo https://github.com/wherobots/airflow-providers-wherobots

    24 min
  4. Scaling Airflow for Enterprise Data Platforms at PepsiCo with Kunal Bhattacharya

    2 OCT

    PepsiCo’s data platform drives insights across finance, marketing and data science. Delivering stability, scalability and developer delight is central to its success, and engineering leadership plays a key role in making this possible. In this episode, Kunal Bhattacharya, Senior Manager of Data Platform Engineering at PepsiCo, shares how his team manages Airflow at scale while ensuring security, performance and cost efficiency.

    Key Takeaways:
    00:00 Introduction.
    02:31 Enabling developer delight by extending platform capabilities.
    03:56 Role of Snowflake, dbt and Airflow in PepsiCo’s data stack.
    06:10 Local developer environments built using official Airflow Helm charts.
    07:13 Pre-staging and PR environments as testing playgrounds.
    08:08 Automating labeling and resource allocation via DAG factories.
    12:16 Cost optimization through pod labeling and Datadog insights.
    14:01 Isolating dbt engines to improve performance across teams.
    16:12 Wishlist for Airflow 3: Improved role-based grants and database modeling.

    Resources Mentioned:
    Kunal Bhattacharya https://www.linkedin.com/in/kunaljubce/
    PepsiCo | LinkedIn https://www.linkedin.com/company/pepsico/
    PepsiCo | Website https://www.pepsico.com
    Apache Airflow https://airflow.apache.org/
    Snowflake https://www.snowflake.com
    dbt https://www.getdbt.com
    Kubernetes https://kubernetes.io
    Great Expectations https://greatexpectations.io
    Monte Carlo https://www.montecarlodata.com

    19 min
  5. Building a Unified Data Platform at Pattern with William Graham

    25 SEPT

    The orchestration of data workflows at scale requires both flexibility and security. At Pattern, decoupling scheduling from orchestration has reshaped how data teams manage large-scale pipelines. In this episode, we are joined by William Graham, Senior Data Engineer at Pattern, who explains how his team leverages Apache Airflow alongside their open-source tool Heimdall to streamline scheduling, orchestration and access management.

    Key Takeaways:
    00:00 Introduction.
    02:44 Structure of Pattern’s data teams across acquisition, engineering and platform.
    04:27 How Airflow became the central scheduler for batch jobs.
    08:57 Credential management challenges that led to decoupling scheduling and orchestration.
    12:21 Heimdall simplifies multi-application access through a unified interface.
    13:15 Standardized operators in Airflow using Heimdall integration.
    17:13 Open-source contributions and early adoption of Heimdall within Pattern.
    21:01 Community support for Airflow and satisfaction with scheduling flexibility.

    Resources Mentioned:
    William Graham https://www.linkedin.com/in/willgraham2/
    Pattern | LinkedIn https://www.linkedin.com/company/pattern-hq/
    Pattern | Website https://pattern.com
    Apache Airflow https://airflow.apache.org
    Heimdall on GitHub https://github.com/Rev4N1/Heimdall
    Netflix Genie https://netflix.github.io/genie/

    24 min
  6. How Astronomer Turns Proactive Monitoring Into Customer Success with Collin McNulty

    18 SEPT

    The evolution of Airflow continues to shape data orchestration and monitoring strategies. Leveraging it beyond traditional ETL use cases opens powerful new possibilities for proactive support and internal operations. In this episode, we are joined by Collin McNulty, Sr. Director of Global Support at Astronomer, who shares insights from his journey into data engineering and the lessons learned from leading Astronomer’s Customer Reliability Engineering (CRE) team.

    Key Takeaways:
    00:00 Introduction.
    03:07 Lessons learned in adapting to major platform transitions.
    05:18 How proactive monitoring improves reliability and customer experience.
    08:10 Using automation to enhance internal support processes.
    12:09 Why keeping systems current helps avoid unnecessary issues.
    15:14 Approaches that strengthen system reliability and efficiency.
    18:46 Best practices for simplifying complex orchestration dependencies.
    23:24 Anticipated innovations that expand orchestration capabilities.

    Resources Mentioned:
    Collin McNulty https://www.linkedin.com/in/collin-mcnulty/
    Astronomer | LinkedIn https://www.linkedin.com/company/astronomer/
    Astronomer | Website https://www.astronomer.io
    Apache Airflow https://airflow.apache.org/
    Prometheus https://prometheus.io/
    Splunk https://www.splunk.com/

    26 min
  7. Overcoming Data Engineering Challenges at Daiichi Sankyo Europe GmbH with Evgenii Prusov

    11 SEPT

    The shift to a unified data platform is reshaping how pharmaceutical companies manage and orchestrate data. Establishing standards across regions and teams ensures scalability and efficiency in handling large-scale analytics. In this episode, Evgenii Prusov, Senior Data Platform Engineer at Daiichi Sankyo Europe GmbH, joins us to discuss building and scaling a centralized data platform with Airflow and Astronomer.

    Key Takeaways:
    00:00 Introduction.
    02:49 Building a centralized data platform for 15 European countries.
    05:19 Adopting SaaS to manage Airflow from day one.
    07:01 Leveraging Airflow for data orchestration across products.
    08:16 Teaching non-Python users how to work with Airflow is challenging.
    12:25 Creating a global data community across Europe, the US and Japan.
    14:04 Monthly calls help share knowledge and align regional teams.
    15:47 Contributing to the open-source Airflow project as a way to deepen expertise.
    16:32 Desire for more guidelines, debugging tutorials and testing best practices in Airflow.

    Resources Mentioned:
    Evgenii Prusov https://www.linkedin.com/in/prusov/
    Daiichi Sankyo Europe GmbH | LinkedIn https://www.linkedin.com/company/daiichi-sankyo-europe-gmbh/
    Daiichi Sankyo Europe GmbH | Website https://www.daiichi-sankyo.eu
    Apache Airflow https://airflow.apache.org/
    Astronomer https://www.astronomer.io/
    Snowflake https://www.snowflake.com/

    19 min
  8. Building a Data-Driven Beauty and Wellness Marketplace at StyleSeat with Paschal Onuorah

    4 SEPT

    StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base. In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the company leverages Airflow, dbt, and Cosmos to drive marketplace intelligence, improve client connections and deliver measurable growth for professionals.

    Key Takeaways:
    00:00 Introduction.
    05:44 The role of the data engineering team in driving business success.
    08:52 Leveraging technology for real-time business intelligence.
    10:52 Data-driven strategies for improving marketing outcomes.
    13:05 How adopting the right tools can increase revenue growth.
    14:25 Advantages of simplifying and integrating technical workflows.
    18:45 Benefits of multi-environment configurations for development and production.
    20:17 Foundational skills and best practices for learning Airflow effectively.
    22:33 Opportunities for deeper tool integration and improved data visualization.

    Resources Mentioned:
    Paschal Onuorah https://www.linkedin.com/in/onuorah-paschal/
    StyleSeat | LinkedIn https://www.linkedin.com/company/styleseat/
    StyleSeat | Website https://www.styleseat.com
    Apache Airflow https://airflow.apache.org/
    dbt https://www.getdbt.com/
    Astronomer Cosmos https://www.astronomer.io/cosmos/

    23 min
