26 episodes

Data Journeys is a podcast for aspiring Data Scientists by AJ Goldstein, where he interviews world-class Data Scientists about their learning journeys. The focus is on how they’ve bridged the gap between acquiring technical skills and creating real-world impact. In each episode, the goal is to equip up-and-comers with the strategies, tactics, and tools that the best in the world have used to get to where they are today.

Data Journeys AJ Goldstein

    • Technology
    • 5.0 • 13 Ratings

    #25: Laura Noren: The Ethics of Data Science

    Laura Noren is a data science ethicist and researcher currently working in cybersecurity at Obsidian Security in Newport Beach. She holds undergraduate degrees from MIT and a PhD from NYU, where she recently completed a postdoc in the Center for Data Science. Her work has been covered in The New York Times, Canada's Globe and Mail, American Public Media's Marketplace program, and in numerous academic journals and international conferences. Dr. Norén is a champion of open source software and those who write it.
     
    Enjoy the show!
     
    Show Notes:
     
    [3:55] Laura explains how she produces the Data Science Community Newsletter, covering things like how the Department of Defense just got billions in funding to do AI research. How do you incorporate humor into such rigorous coverage?
    [10:22] How can you distinguish signal from noise in choosing a news source?
    [12:13] When and how to control your biases in your work in the heat of the moment.
    [14:05] Laura’s interest in data science began as an undergraduate at MIT, surrounded by people who build.
    [16:10] Sociology in the context of people who build, since people are the *actual* most complicated systems.
    [18:00] What important things define a profession?
    [19:30] What’s the difference between ethics and morals?
    [22:04] How ethics affects the field of data science, specifically.
    [25:35] The data science ethicist as a person who is a creator, and not just there to put up stop signs.
    [31:40] How can companies strike a balance between hard stops in a product and more negotiated unique messaging for customers to address ethical employees?
    [38:53] How can smaller companies who can’t afford a Chief Ethics Officer monitor and address ethical issues?
    [48:30] Techniques that can be used by individuals and organizations to identify and address ethical issues in a company.
    [50:00] How data scientists can navigate non-black-and-white ethical issues in their own work.
    [55:15] Laura’s recommendations for ethics 101: Data and Society, AI Now Institute, and OpenAI.
    [1:00:00] Laura ends with a call to action to start conversations on ethics with your colleagues.
     
    If you enjoyed this episode of Data Journeys, the best way to support the show is by leaving a review on iTunes and sharing on social media using the hashtag #datajourneys.
     
    Laura’s Twitter: https://twitter.com/digitalflaneuse?lang=en

    • 55 min
    #24: Brian McFee: Music and Data Science

    Dr. Brian McFee develops machine learning tools to analyze multimedia data. This includes recommender systems, image and audio analysis, similarity learning, cross-modal feature integration, and automatic annotation. As of fall 2014, he is a data science fellow at the Center for Data Science at New York University. Previously, he was a postdoctoral research scholar in the Center for Jazz Studies and LabROSA at Columbia University.
     
    My conversation with Brian today focused on his research in music informatics and its many facets and applications. He talks about some of the methods he used during his dissertation, and I ask him for insight on how to get a recommender system to recommend music that you actually like.
     
    Here are some of the highlights of the show:
     
    [3:17] What came first for Brian, the data science or the music?
    [5:19] Of all the things he could have chosen to study, why did Brian choose music?
    [7:35] What is it like to be in a branch of data science that has become so closely tied with industry and well understood by the public?
    [9:37] How has Brian’s work expanded his own taste in music, and given him an appreciation of jazz?
    [12:00] Brian gives a brief history of the field of music informatics.
    [14:48] Where was the field when Brian wrote his dissertation, “More like this: Machine learning approaches to music similarity”?
    [17:15] How have the characteristics and features for making predictions about music evolved since then?
    [21:06] Why does the concept of genre generally irritate Brian, and what is the “David Bowie problem”?
    [26:20] How do you address the problem of subjectivity in the field when conducting research?
    [31:21] Is there a dilemma in trying to take a subjective art like music and trying to quantify it as a science?
    [35:24] How can a recommender system accurately predict what kind of music a listener is looking for?
    [38:43] What can you do to train your Spotify recommendations?
    [42:33] How do you make the career decision whether to stay in academia or go into industry?
    [46:31] What kind of problems is Brian currently interested in solving?
    [49:00] What major life lessons can be taken away from work in machine learning?
    [50:00] Rapid fire questions.
     
    Enjoy the show!
     
    Find more at www.ajgoldstein.com/podcast/ep24

    • 59 min
    #23: Wes McKinney - The Creator of Pandas

    Wes McKinney is the creator and "Benevolent Dictator for Life" (BDFL) of the open-source pandas package for data analysis in Python, and has also authored two editions of the reference book Python for Data Analysis. Wes is also one of the co-creators of the Apache Arrow project, which is currently his main focus. Most recently, he is the founder of Ursa Labs, a not-for-profit open source development group in partnership with RStudio.
     
    He describes himself as a problem-solver, and is particularly interested in improving the usability of data tools for programmers, accelerating data access and in-memory data processing performance, and improving data system interoperability.
     
    In my conversation with Wes today, we focused on getting to know him on a more personal level, discussing his background and interests to get some insight into the living legend of open source he has become.
     
    [3:48] How did coming from four generations of newspapermen impact Wes’s upbringing?
    [6:00] What kind of hobbies was he interested in growing up, and what is the origin of his interest in computers?
    [11:08] How did he come to run a Goldeneye 007 world record website, and update and maintain it by hand?
    [16:10] Wes’s high school career as a mathlete, and how an early interest in math contributed to his approach to programming.
    [18:15] How Wes brings the rigor he learned in mathematics to software engineering.
    [19:50] How languages and math scratch the same itch for composition.
    [21:00] About learning enough German to complete a PHP programming internship in Munich.
    [23:00] How Wes’s experience using data in his first year working post-undergrad set him down the path to Pandas.
    [25:00] What went into his decision to take leave from grad school to build Pandas?
    [27:00] The legendary tweet where Wes expressed his sense of purpose and motivation in building Pandas.
    [29:52] Why Wes’s work is motivated by the desire to free up people’s time to realize their full potential.
    [30:51] Zero to One - Peter Thiel
    [31:40] Why is solving basic efficiency problems, like reading CSV files, so important?
    [34:12] How community management has played such a huge role in making Pandas so successful compared to other tools.
    [39:00] The importance of seeing peers in an open source project as people with good intentions and more than just a GitHub profile.
    [46:00] How do the incentives of an open source project influence prioritization in a project?
    [51:45] How Wes’s newest project, Ursa Labs, is tackling the problem of funding in open source software development.
    [56:20] Wes’s goals for Ursa Labs over the next five years.
     
    AJ’s Twitter: https://twitter.com/ajgoldstein393
    Wes’s Twitter: https://twitter.com/wesmckinn
    Wes’s personal website: http://wesmckinney.com
    Wes’s LinkedIn: https://www.linkedin.com/in/wesmckinn/

    • 1 hr
    #22: Mike Tamir: Identifying Fake News with the Head of Data Science at Uber ATG

    Mike Tamir is the Head of Data Science at Uber ATG. He is a leader in data science, specializing in deep learning and distributed scalable machine learning, and he’s also a faculty member at UC Berkeley.
     
    Mike has led several teams of Data Scientists in the San Francisco Bay Area as Chief Data Scientist for InterTrust and Formation, Director of Data Sciences for MetaScale, and Chief Science Officer for Galvanize, where he oversaw all data science product development. He also created an MS degree program in Data Science in partnership with UNH.
     
    Mike began his career in academia serving as a mathematics teaching fellow for Columbia University and graduate student at the University of Pittsburgh. His early research focused on developing the epsilon-anchor methodology for resolving both an inconsistency he highlighted in the dynamics of Einstein’s general relativity theory and the convergence of “large N” Monte Carlo simulations in Statistical Mechanics’ universality models of criticality phenomena.
     
    The focus of today’s conversation was his fake news detection AI project, FakerFact.
     
    Show notes:
     
    0:00 First, a life update from AJ. Read about his new opportunity in Portland here on his blog.
    5:28 What is the evolutionary explanation for why a human’s capacity for careful, rational thought often takes a back seat to emotion? Explained in a comic on the project website.
    6:17 Emotions often win over rational thought, and as a result, it can be difficult to think clearly on issues we’re passionate about.
    7:05 Why people should be aware of their emotional biases, even though it’s not our fault that we have them.
    7:50 Why Facebook deleted over a billion fake accounts recently, and why fake accounts, clickbait, blatantly false content, and other forms of fake news are everywhere on social media.
    9:10 What mechanisms can we put in place to counterbalance the parts of our nature that compel us to create and engage with content on an emotional level?
    9:51 Since a majority of our information is second-hand, how do we distinguish what’s really true?
    11:44 How did Mike become motivated to pursue this problem, on top of his full-time job at Uber ATG?
    12:45 How can we tackle “fake news” without censorship?
    16:40 Post-Walter Cronkite era, how do we create a sense of credibility and neutrality in our information?
    21:00 Why would it be a mistake if the algorithm learned to only classify right or left wing content as fake news?
    22:19 The algorithm only looks at the title and words on a page, not the URL.
    23:15 How Walt (the FakerFact AI) classifies different types of content: satire, journalism, etc.
    26:46 How do you strike the balance of entertainment and informativeness in content?
    31:10 What features and characteristics define each different category of content that Walt identifies?
    36:16 What is Walt’s ideal use case?
    36:55 You can use the FakerFact Chrome extension to view the “nutrition facts” of the page you’re reading.
    37:42 How does research on run-on sentences and other grammatical choices help Walt understand and score an article?
    40:34 What techniques were used to train the Walt AI?
    42:41 A discussion on the use of wisdom of the crowds in algorithms.
    45:30 What makes it difficult to use the wisdom of the crowds when answers are too closely correlated (because of political affiliations or the news cycle)?
    46:47 Visit Humanetech.com for tips on regulating your daily notifications and escaping the “24-hour news cycle” to prevent media from controlling your emotions.
    50:15 Rapid fire questions!
    52:27 Mike’s advice to his 20-year-old self.
    52:40 What was his best investment in himself?
    53:18 The Deep Learning Book as a starting point for basic literacy in data science.
    53:20 Mike, like lots of guests on this show, makes a distinction between things he believes but couldn’t

    • 55 min
    #21: Frank Diana: The Future of AI - Predicting, Preparing, & Thriving in Our Changing Future

    Frank Diana is a recognized futurist, thought leader and frequent keynote speaker. He has served in various executive roles throughout his career and has over 30 years of leadership experience. He is focused on leadership dialog in the context of our emerging future and its implications. He blends a futurist perspective with a pragmatic, actionable approach, leveraging horizon scanning and storytelling to see possible futures and drive foresight into leadership deliberation.

    • 1 hr 7 min
    #20: Kyle Polich: Skepticism and Simplifying Complex Topics with the Host of the Data Skeptic Podcast

    Kyle Polich is the co-host of the incredibly popular Data Skeptic podcast. His general interests range from areas like statistics, machine learning, data viz, and optimization to data provenance, data governance, econometrics, and metrology.

    In this episode of the Data Journeys Podcast, I pick Kyle’s brain for the patterns he has noticed and the lessons he has learned through interviewing and teaching his way through nearly 400 episodes of the Data Skeptic Podcast.

    • 1 hr 6 min

Customer Reviews

5.0 out of 5
13 Ratings

tylerstephen1234

At least 1 awesome takeaway every episode!

This podcast is great for a high-level perspective of data science as a career. I’m a self-taught data scientist (relatively new to the field), and get at least 1 super helpful takeaway from every episode. I consider this a major win, and really enjoy the podcast!

NateCBe

Great listen

As a physician and someone whose primary orientation is not engineering or data science, I find the language and presentation of AJ’s podcast useful and understandable. I have been so inspired that I’m going back for some formal training in data science.
