18 episodes

Welcome to The Data Flowcast: Mastering Airflow for Data Engineering & AI — the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward.

Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems.

The Data Flowcast: Mastering Airflow for Data Engineering & AI Astronomer

    • Technology


    The Future of AI in Data Engineering With Astronomer’s Julian LaNeve and David Xue


    The world of data orchestration and machine learning is rapidly evolving, and tools like Apache Airflow are at the forefront of these changes. Understanding how to effectively utilize these tools can significantly enhance data processing and AI model deployment.
    This episode features Julian LaNeve, CTO at Astronomer, and David Xue, Machine Learning Engineer at Astronomer. They delve into the intricacies of data orchestration, generative AI and the practical applications of these technologies in modern data workflows.
    Key Takeaways:
    (01:51) The pressure to engage in the generative AI space.
    (02:02) Generative AI can elevate data utilization to the next level.
    (02:43) The transparency issues with commercial AI models.
    (04:27) High-quality data is crucial to model performance.
    (06:40) Running new models on smaller devices, like phones.
    (12:19) Fine-tuning LLMs to handle millions of task failures.
    (16:54) Teaching AI to understand specific logs, not general passages, is a goal.
    (21:56) Using Airflow as a general-purpose orchestration tool.
    (22:00) Airflow is adaptable for various use cases, including ETL and ML systems.


    Resources Mentioned:

    Julian LaNeve - https://www.linkedin.com/in/julianlaneve/
    Astronomer - https://www.linkedin.com/company/astronomer/
    David Xue - https://www.linkedin.com/in/david-xue-uva/
    Apache Airflow - https://airflow.apache.org/
    Meta’s Open Source Llama 3 model - https://ai.meta.com/blog/meta-llama-3/
    Microsoft’s Phi-3 model - https://www.microsoft.com/en-us/research/publication/phi-3-technical-report-a-highly-capable-language-model-locally-on-your-phone/
    GPT-4 - https://www.openai.com/research/gpt-4




    Thanks for listening to The Data Flowcast: Mastering Airflow for Data Engineering & AI. If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.


    #ai #automation #airflow #machinelearning

    • 23 min
    The Power of Airflow in Modern Data Environments at Wynn Las Vegas with Siva Krishna Yetukuri


    Understanding the critical role of data integration and management is essential for driving business success, particularly in a dynamic environment like a luxury casino resort.

    In this episode, we sit down with Siva Krishna Yetukuri, Cloud Data Architect at Wynn Las Vegas, to explore how Airflow and other tools are transforming data workflows and customer experiences at Wynn Las Vegas.

    Key Takeaways:

    (02:00) Siva designs and builds cutting-edge data pipelines and architectures.
    (02:54) Wynn is building a data platform to drive surveys and marketing strategies.
    (05:00) Airflow is the backbone of data ingestion, curation and integration.
    (07:00) Custom operators in Airflow enhance monitoring and reporting.
    (08:32) A metadata database drives Airflow workflows and captures metrics.
    (09:00) Excitement surrounds the use of Airflow 2.9 and its new features.
    (12:31) Understanding Airflow fundamentals in layman’s terms simplifies complexity.
    (16:33) Transitioning from Control-M to Airflow eases building complex workflows.
    (20:15) DAGs are often auto-generated, simplifying the process for engineers.
    (24:06) ML models for volume and freshness anomalies improve data quality.


    Resources Mentioned:

    Apache Airflow -
    https://airflow.apache.org/
    Snowflake -
    https://www.snowflake.com/
    Databricks -
    https://databricks.com/
    Great Expectations -
    https://greatexpectations.io/



    • 24 min
    Powering the Texas Rangers World Series Win With AI on Airflow with Alexander Booth


    The integration of data and AI in sports is transforming how teams strategize and perform. Understanding how to harness this technology is key to staying competitive in the rapidly evolving landscape of baseball.

    In this episode, we sit down with Alexander Booth, Assistant Director of Research and Development at Texas Rangers Baseball Club, to explore the intersection of big data, AI and baseball strategy.

    Key Takeaways:

    (03:00) Alexander Booth's role and responsibilities at the Texas Rangers.
    (03:33) The implementation of multiple cameras and pose tracking in stadiums.
    (06:16) The importance of Airflow in organizing data orchestrations.
    (06:22) The demand for faster data among modern baseball players.
    (11:01) The necessity of scalable solutions for handling large data sets.
    (15:00) How weather data influences game strategy.
    (15:46) The impact of advanced technology on decision-making in baseball.
    (18:00) The role of AI and machine learning in player and game analysis.
    (22:26) The use of dynamic tasks in Airflow for better data management.


    Resources Mentioned:

    Apache Airflow -
    https://airflow.apache.org/
    Statcast -
    https://www.mlb.com/statcast
    Google BigQuery -
    https://cloud.google.com/bigquery/
    Databricks -
    https://databricks.com/



    • 23 min
    Expanding the Data Engineering Toolkit at Reddit


    Welcome back to the Airflow Podcast.

    This week, we met up with Ben Wisegarver, a staff data scientist at Reddit who runs their data warehousing and data engineering functions.

    Reddit users generate petabytes of data every day that needs to be processed, stored, and analyzed by a wide breadth of backend services. Our conversation with Ben touches on everything from Airflow as a tool for career mobility across the data stack to scaling out a self-service data architecture across many teams.

    For folks interested, our team at Astronomer is growing rapidly and we're on the hunt for new folks to join in a variety of different roles. If you're passionate about Airflow and interested in building the future of data engineering, please get in touch. You can check our current job postings at careers.astronomer.io, but we're constantly updating our listings to accommodate new hiring needs. Please feel free to email me directly at pete@astronomer.io if you're passionate about what we're doing and think you'd be a good addition to the team.

    Mentioned Resources:

    Careers: https://careers.astronomer.io

    Guest Profile:

    Ben Wisegarver: https://www.linkedin.com/in/ben-wisegarver-54566576

    • 45 min
    GDPR, Self-Service Data, and Infrastructure Automation with Typeform


    Welcome back to the Airflow Podcast.

    This week, we met up with Albert Franzi and Carlos Escura from Typeform. Typeform is a tool that allows you to build beautiful interactive forms that you can use for a wide variety of use cases, including customer surveys, employee engagement, product feedback, and market research to name a few. In our conversation, we discussed Airflow as a tool for GDPR compliance, the concept of self-service data and how it allows your data operations team to function as a data platform team, and some of the more specialized infrastructure tooling that the Typeform team has built out to support their internal teams.

    For folks interested, our team at Astronomer is growing rapidly and we're on the hunt for new folks to join in a variety of different roles. If you're passionate about Airflow and interested in building the future of data engineering, please get in touch. You can check our current job postings at careers.astronomer.io, but we're constantly updating our listings to accommodate new hiring needs. Please feel free to email me directly at pete@astronomer.io if you're passionate about what we're doing and think you'd be a good addition to the team.

    Mentioned Resources:
    Dag Factory: https://github.com/ajbosco/dag-factory
    Astronomer Careers: https://careers.astronomer.io

    Guest Profiles:
    Albert Franzi: https://www.linkedin.com/in/albertfranzi/?originalSubdomain=es
    Carlos Escura: https://www.linkedin.com/in/carlosescura/en-us/

    • 31 min
    Adopting Airflow at Netlify


    After a bit of a break, we're back with the third official episode bundle of The Airflow Podcast. In this batch, we'll get a little bit deeper with current Airflow users and maintainers on core fundamental concepts in data engineering, architectures for operating modern data platforms at scale, and the process of maintaining and operating Airflow, specifically as we go through the release process of Airflow 2.0.

    This week, we met up with Brian de la Motte and Florian Hines at Netlify. Netlify provides an extremely popular toolset for building and deploying JAMstack sites. They provide hosting services, CI, DNS, authentication, and managed backend tools that help users run and operate static sites at scale. The team over there recently adopted Airflow to help decouple orchestration logic from a complex collection of Spark jobs and is currently in the process of expanding its Airflow footprint to accommodate a broader group of interesting use cases.

    Disclaimer: we get a bit of a surprise about halfway through the episode when Brian tells us that they had recently signed up for Astronomer. We promise that it wasn't a planted ad :).

    Please contact pete@astronomer.io if you'd like to get in touch regarding future episodes. Hope you enjoy!

    Guest Profiles:
    Brian de la Motte: https://www.linkedin.com/in/brian-de-la-motte/
    Florian Hines: https://www.linkedin.com/in/florianhines/

    • 28 min
