Linear Digressions
Udacity • Technology • 250 episodes

In each episode, your hosts explore machine learning and data science through interesting (and often very unusual) applications.

    Running experiments when there are network effects

    Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not a safe assumption, especially when there are network effects (like in almost any social context, for instance!). SUTVA, or the stable unit treatment value assumption, is a big phrase for this assumption, and violations of SUTVA make for some pretty interesting experiment designs. From news feeds at LinkedIn to disentangling herd immunity from individual immunity in vaccine studies, indirect (i.e. network) effects in experiments can be just as big as, or even bigger than, direct (i.e. individual) effects. And this is what we talk about this week on the podcast.
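
    One standard mitigation in this literature is cluster-randomized assignment: randomize whole network clusters rather than individual users, so that most of a person’s neighbors share their treatment arm. Here’s a minimal sketch in Python (our illustration, not code from the episode or the papers; the graph is synthetic and assumes networkx is installed):

    # Cluster-randomized assignment to reduce interference (SUTVA violations).
    import random
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    random.seed(0)
    graph = nx.relaxed_caveman_graph(20, 10, p=0.05, seed=0)  # synthetic social graph

    # Randomize treatment at the community level, not the user level, so that
    # most of each user's neighbors share that user's assignment.
    assignment = {}
    for community in greedy_modularity_communities(graph):
        arm = random.choice(["treatment", "control"])
        for node in community:
            assignment[node] = arm

    # Cross-arm edges are where interference leaks between groups -- fewer is better.
    crossing = sum(assignment[u] != assignment[v] for u, v in graph.edges())
    print(f"{crossing / graph.number_of_edges():.1%} of edges cross arms")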

    Relevant links:
    http://hanj.cs.illinois.edu/pdf/www15_hgui.pdf
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2600548/pdf/nihms-73860.pdf

    • 24 min
    Zeroing in on what makes adversarial examples possible

    Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets, or random noise labeled as pandas, or any one of an infinite number of mistakes in labeling data that humans would never make but computers make with joyous abandon. What gives? A compelling new argument makes the case that it’s not the algorithms so much as the features in the datasets that hold the clue. This week’s episode goes through several papers pushing our collective understanding of adversarial examples, and giving us clues to what makes these counterintuitive cases possible.
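
    If you want to construct an adversarial example yourself, the fast gradient sign method (FGSM) is the simplest place to start: take one gradient step on the input, not the weights, in the direction that increases the loss. A minimal PyTorch sketch (our illustration; the model and input are random stand-ins, not the setups from the papers above):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = torch.nn.Sequential(  # stand-in for a trained image classifier
        torch.nn.Flatten(),
        torch.nn.Linear(28 * 28, 10),
    )
    image = torch.rand(1, 1, 28, 28)  # stand-in for a real image
    label = torch.tensor([3])         # its true class
    image.requires_grad_(True)

    # One gradient step on the *input*, clipped to a small epsilon so the
    # perturbation stays imperceptible to a human.
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = (image + 0.1 * image.grad.sign()).clamp(0, 1).detach()

    print("original prediction:   ", model(image).argmax(dim=1).item())
    print("adversarial prediction:", model(adversarial).argmax(dim=1).item())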

    Relevant links:
    https://arxiv.org/pdf/1905.02175.pdf
    https://arxiv.org/pdf/1805.12152.pdf
    https://distill.pub/2019/advex-bugs-discussion/
    https://arxiv.org/pdf/1911.02508.pdf

    • 22 min
    Unsupervised Dimensionality Reduction: UMAP vs t-SNE

    Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluster, etc. It’s similar to t-SNE but has some advantages. This episode gives a quick recap of t-SNE, especially the connection it shares with information theory, then gets into how UMAP is different (many say better).

    Between the time we recorded and released this episode, an interesting argument made the rounds on the internet that UMAP’s advantages largely stem from good initialization, not from anything inherent in the algorithm. We don’t cover that argument here, of course, because it hadn’t been published when we recorded, but you can find a link to the paper below.
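
    If you want to compare the two algorithms yourself, here’s a minimal sketch on scikit-learn’s digits dataset (our illustration; assumes the umap-learn package is installed):

    import umap
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    digits = load_digits()  # 1797 handwritten digits, 64 dimensions each

    # Both embed into 2D for visualization; UMAP is typically faster and
    # preserves more global structure, t-SNE more local neighborhoods.
    tsne_xy = TSNE(n_components=2, random_state=0).fit_transform(digits.data)
    umap_xy = umap.UMAP(n_components=2, random_state=0).fit_transform(digits.data)
    print(tsne_xy.shape, umap_xy.shape)  # (1797, 2) each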

    Relevant links:
    https://pair-code.github.io/understanding-umap/
    https://www.biorxiv.org/content/10.1101/2019.12.19.877522v1

    • 29 min
    Data scientists: beware of simple metrics

    Picking a metric for a problem means defining how you’ll measure success in solving that problem. That sounds important because it is, but new data scientists often get experience with only a few kinds of metrics while they’re learning, and those metrics have real shortcomings in what they tell you (or don’t) about how well you’re really solving the underlying problem. This episode takes a step back and asks: which metrics are popular with data scientists, why are they popular, and what are their shortcomings when it comes to the real world? There’s been a lot of great thinking and writing on this topic recently, and we cover much of that discussion along with some perspective of our own.
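
    A tiny illustration of the kind of shortcoming we mean (ours, not an example from the episode): on imbalanced data, accuracy rewards a model that never finds the positive class, while recall exposes it.

    from sklearn.metrics import accuracy_score, recall_score

    # 1000 examples, only 10 positives -- think fraud, rare disease, churn.
    y_true = [1] * 10 + [0] * 990
    y_pred = [0] * 1000  # a "model" that always predicts the majority class

    print("accuracy:", accuracy_score(y_true, y_pred))  # 0.99 -- looks great
    print("recall:  ", recall_score(y_true, y_pred))    # 0.0 -- finds nothing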

    Relevant links:
    https://www.fast.ai/2019/09/24/metrics/
    https://arxiv.org/abs/1909.12475
    https://medium.com/shoprunner/evaluating-classification-models-1-ff0730801f17
    https://hbr.org/2019/09/dont-let-metrics-undermine-your-business

    • 24 min
    Communicating data science, from academia to industry

    For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but also extremely, well, multifaceted and ill-defined. That doesn’t bother our guest today, Prof. Xiao-Li Meng of the Harvard statistics department, who is leading an effort to start an open-access Data Science Review journal on the model of the Harvard Business Review or Law Review. This episode features Xiao-Li talking about the need he sees for a central gathering place for data scientists in academia, industry, and government to come together to learn from (and teach!) each other.

    Relevant links:
    https://hdsr.mitpress.mit.edu/

    • 26 min
    Optimizing for the short-term vs. the long-term

    When data scientists run experiments, like A/B tests, it’s easy to plan on a period of a few days to a few weeks for collecting data. The thing is, the change being evaluated might have effects that last much longer than that. Having a big sale might increase sales this week, but doing it repeatedly teaches customers to wait for the next sale and never buy anything at full price, which could ultimately drive down revenue in the long term. Increasing the volume of ads on a website might get more clicks in the short term, but in the long run people learn to visually block the ads out and ignore them. These long-term effects aren’t apparent from a short-term experiment, so this week we’re talking about a paper from Google research that confronts the short-term vs. long-term tradeoff, and how to measure long-term effects from short-term experiments.
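
    A toy simulation of the ad-blindness example (ours, not from the paper) shows why the measurement window matters: the lift a one-week test would record has mostly evaporated by day 90. All the numbers here are made up for illustration.

    base_rate = 0.05     # click probability per ad before habituation
    ad_load = 10         # ads shown per user per day in the treatment arm
    habituation = 0.02   # daily decay in attention from the heavier ad load

    rate = base_rate
    for day in range(1, 91):
        clicks = ad_load * rate
        if day in (1, 7, 90):
            print(f"day {day:2d}: {clicks:.3f} expected clicks/user/day")
        rate *= 1 - habituation  # users learn to tune the ads out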

    Relevant links:
    https://research.google/pubs/pub43887/

    • 19 min
