27 episodes

This is a podcast where we talk all about real-life experiences of dealing with data and machine learning tools, techniques, and personalities. We cover not just the technical aspects but also the "life" aspects of working in the field.

Note: Opinions expressed are my own and do not reflect the views or opinions of my employer.
Support this podcast: https://podcasters.spotify.com/pod/show/the-data-life-podcast/support

The Data Life Podcast
Sanket Gupta

    • Technology
    • 5.0 • 5 Ratings


    27: Building Open Source Data Startup with Airbyte CEO, Michel Tricot

    We talk with Michel Tricot, founder and CEO of Airbyte, an open-source data integration startup from Y Combinator. Airbyte has raised over $30M and has been growing fast. It was a great conversation, and I think you will enjoy it too. 🎉

    We cover a lot in this episode, including:

    1. Technical aspects of what Airbyte does, where it sits in the ETL/ELT landscape, and how it differs from other tools such as Fivetran and Stitch.

    2. Data warehouses as the canonical source of data, and how Airbyte helps bring data into the warehouse.

    3. How Airbyte works as an open-source data tool.

    4. Life aspects of running a fast-growing startup, including raising capital, hiring, etc.



    Links to the tools/services mentioned:

    1. Airbyte: airbyte.io

    2. Airbyte Slack where you can talk with the team: slack.airbyte.io 

    3. dbt for the transformation step in ELT: getdbt.com

    4. Airflow, a data orchestration tool: https://airflow.apache.org/

    5. Astronomer, which can host Airflow: https://astronomer.io/



    Pay-as-you-go data warehouses:

    6. Snowflake Data Warehouse: https://www.snowflake.com/

    7. BigQuery Data Warehouse: https://cloud.google.com/bigquery 

    Set up your own infrastructure: 

    8. Redshift Data Warehouse: https://aws.amazon.com/redshift/ 


    ---

    Send in a voice message: https://podcasters.spotify.com/pod/show/the-data-life-podcast/message
    Support this podcast: https://podcasters.spotify.com/pod/show/the-data-life-podcast/support

    • 44 min
    26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)

    Imagine you are hanging out at a beach, watching the waves come and go and the shells scattered along the sand. Then you get an idea: why not collect these shells and make necklaces to sell? How would you go about it? Maybe you'd collect a few shells, make a small necklace, and show it to a friend. This is where we begin our journey into data engineering pipelines.

    Using the example of running a shell-necklace business, we learn about the following data engineering concepts:

    1. ETL (Extract, Transform, Load) vs. ELT (Extract, Load, Transform), and why data warehouses are great for analytics.

    2. Spark for large-scale data processing, and options for hosting and running it

    3. Data orchestration using Airflow



    My blog post on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb

    A great book for learning Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20


    Tools covered in the episode: 

    dbt: https://www.getdbt.com/ 

    Databricks: https://databricks.com/

    EMR: https://aws.amazon.com/emr/

    AWS Redshift: https://aws.amazon.com/redshift/

    Snowflake: https://www.snowflake.com/

    Delta Lake: https://databricks.com/product/delta-lake-on-databricks 



    • 39 min
    25: Talking Data Privacy with Jeff Bermant

    In this episode, I'm excited to be talking with Jeff Bermant, founder and CEO of the Cocoon Mydata Rewards browser, a Chrome-based browser that pays people to use it! ✨

    We talk about data ethics and privacy, and why Jeff believes users should be paid for their data. We also discuss GDPR and similar laws in the US, the future of data privacy, and more!

    Go to https://getcocoon.com to download and use Cocoon Rewards Browser. 

    ~Thanks for listening~



    • 28 min
    24: Promoting Women in Tech - With Rupal Gupta

    In this episode, we are talking about women in tech with Rupal Gupta. Rupal, a recent graduate of Georgia Tech's Online MS in CS program, is a data engineer who is passionate about promoting women in tech. She also has some great tips and resources for anyone trying to break into data science and tech!

    We discuss ways to promote women in tech, women-in-tech conferences such as Grace Hopper, job hunting, resources for interview preparation, and more.

    If you want to reach out to Rupal for help or to collaborate on her project womenmentors.co, here is her LinkedIn: https://www.linkedin.com/in/rupalgupta15/

    FREE Women in Tech Conference by Manning Publications on Oct 13th at 12pm ET on Twitch: https://freecontent.manning.com/livemanning-conferences-women-in-tech/ 🎉 There will be women in tech speakers from Dropbox, Microsoft, Warby Parker and more.

    🌟 Programs and conferences covered in the episode:
    OMSCS program at Georgia Tech: https://omscs.gatech.edu/
    Grace Hopper conference: https://ghc.anitab.org/
    Anita Borg Institute: https://anitab.org/

    🌟 Interviewing resources:
    1. Pramp: https://www.pramp.com/#/
    2. Interviewing.io: https://interviewing.io/
    3. Educative "Grokking the System Design Interview": https://www.educative.io/courses/grokking-the-system-design-interview
    4. AWS Certifications: https://aws.amazon.com/certification/

    Disclaimer: All opinions on this podcast are our own and not the views of our employers or organizations.

    ~Thanks for listening~



    • 15 min
    23: Let’s Talk AWS SageMaker for ML Model Deployment

    In this episode, we talk about Amazon SageMaker and how it can help across ML model development: building, training, and deployment. We cover three advantages in each of these three areas.
    We cover points such as:
    1. Hosting ML endpoints to serve models to thousands or millions of users.
    2. Saving costs on model training with SageMaker.
    3. Using CloudWatch logs with SageMaker endpoints to debug ML models.
    4. Using preconfigured environments or models provided by AWS.
    5. Automatically saving model artifacts to AWS S3 as you train in SageMaker.
    6. Using version control with GitHub for SageMaker notebooks.
    and more…
    Please rate, subscribe, and share this episode with anyone who might find SageMaker useful in their work. I think SageMaker is a great tool and want to share it with other data scientists.
    For comments/feedback/questions or if you think I have missed something in the episode, please reach out to me at LinkedIn: https://www.linkedin.com/in/sanketgupta107/


    • 19 min
    22: Transfer Learning for NLP - With Paul Azunre

    In this episode, we are talking with Paul Azunre. Paul is one of the world's leading experts in transfer learning for NLP and the author of the upcoming book Transfer Learning for NLP, published by Manning Publications. In this episode we talk about things such as:

    1) Paul's background, and how his training in maths and optimization, along with his work on fake news detection, got him started in transfer learning for NLP.
    2) How Paul got started on the book, his writing process, and tips for listeners who want to write a technical book.
    3) A high-level summary of transfer learning in both computer vision and NLP, and why this is NLP's "ImageNet moment".
    4) Why ML and NLP practitioners today should be excited about transfer learning (such as how students in Ghana are able to build their own Google Translate using transfer learning)
    5) How BERT, ELMo, and ALBERT work at a high level, and how they differ from traditional techniques like Word2Vec or FastText.
    6) Differences between BERT, ELMo and ALBERT.
    7) What makes Paul’s new book a must-read for anyone interested in this field. 

    ✨Paul's Info👇

    Paul’s Website: azunre.com (with all social media handles)
    Please reach out to Paul if you have any questions about transfer learning in NLP or the book.

    ✨A chance to win one of 2 free copies of Transfer Learning for NLP 🎉

    Share this episode on Twitter and tag my handle "sanket107" for a chance to win one of 2 free copies of Paul's book! My Twitter: https://twitter.com/sanket107

    ✨Discount Code for all Manning Publications books! 🎊🤩

    Special link for an extra discount on Paul's book:
    https://www.manning.com/books/transfer-learning-for-natural-language-processing?a_aid=Omnilence&a_bid=d53fed17
    As a listener of The Data Life Podcast, you can also use this link http://www.manning.com/?a_aid=Omnilence to get 40% off any Manning book with the code: poddlife20

    Using these links also helps support the show, which is much appreciated.

    Thank you to Manning Publications, Paul, and our sponsors for making this show possible.

    ~Thanks for listening~



    • 46 min

