Data Skeptic

Kyle Polich

4.4 (479)
TECHNOLOGY
UPDATED BIWEEKLY

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

MAR 10

Disentanglement and Interpretability in Recommender Systems

Ervin Dervishaj, a PhD student at the University of Copenhagen, discusses his research on disentangled representation learning in recommender systems, finding that while disentanglement strongly correlates with interpretability, it doesn't consistently improve recommendation performance. The conversation explores how disentanglement acts as a regularizer that can enhance user trust and interpretability at the potential cost of some accuracy, and touches on the future of large language models in denoising user interaction data.

31 min
FEB 27

Collective Altruism in Recommender Systems

Ekaterina (Kat) Fedorova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your algorithm.

55 min
FEB 18

Niche vs Mainstream

Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.

34 min
FEB 2

Healthy Friction in Job Recommender Systems

In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders: job seekers, recruiters, HR professionals, and companies.

The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over more technical visualizations. Roan shares insights from his "healthy friction" study, which tested whether users could distinguish between real AI-generated explanations and randomly generated ones, revealing that participants often used explanations as information sources rather than decision-making tools.

The discussion delves into the technical architecture behind these systems, including the use of knowledge graphs built from tabular data, inference rules, and large language models to generate human-friendly explanations. Roan explains how his research aims to open the black box of recommender systems, making them more transparent and trustworthy for non-technical users.

Looking forward, he discusses ongoing work on automated knowledge graph construction from resumes and job listings, research into fairness considerations around gender and location, and plans for real-world testing with actual job seekers. The episode concludes with Roan's vision for the future: AI systems that support rather than replace human recruiters, making the job search process less grueling while maintaining the essential human judgment that recruitment requires.

27 min
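
The pipeline described in this episode (tabular data turned into a knowledge graph, inference rules, then a human-friendly explanation) can be illustrated with a toy Python sketch. Everything in it is hypothetical: the column names `has_skill` and `requires_skill`, the single matching rule, and the fixed sentence template, which stands in for the large language model step. It is a sketch under those assumptions, not the system Roan built.

```python
def rows_to_triples(rows, key):
    """Turn long-format table rows into (subject, predicate, object) triples.
    The `key` column becomes the subject; every other column becomes a predicate."""
    triples = set()
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.add((subject, column, value))
    return triples


def infer_matches(triples):
    """Hypothetical inference rule: a candidate matches a vacancy when the
    candidate has a skill that the vacancy requires. Returns evidence tuples."""
    has_skill = {(s, o) for s, p, o in triples if p == "has_skill"}
    requires = {(s, o) for s, p, o in triples if p == "requires_skill"}
    return sorted(
        (candidate, vacancy, skill)
        for candidate, skill in has_skill
        for vacancy, required in requires
        if required == skill
    )


def explain(candidate, vacancy, evidence):
    """Template-based textual explanation (a hypothetical stand-in for an LLM rewrite)."""
    skills = sorted({s for c, v, s in evidence if c == candidate and v == vacancy})
    if not skills:
        return f"No matching skills link {candidate} to {vacancy}."
    return f"{vacancy} is recommended to {candidate} because it requires skills {candidate} has: {', '.join(skills)}."


# Hypothetical toy tables in long format (one skill per row).
candidates = [
    {"id": "Ana", "has_skill": "Python"},
    {"id": "Ana", "has_skill": "SQL"},
    {"id": "Ben", "has_skill": "Java"},
]
vacancies = [
    {"id": "Data Analyst", "requires_skill": "SQL"},
    {"id": "Data Analyst", "requires_skill": "Python"},
]
graph = rows_to_triples(candidates, "id") | rows_to_triples(vacancies, "id")
evidence = infer_matches(graph)
print(explain("Ana", "Data Analyst", evidence))
```

Keeping the evidence tuples separate from the sentence template is what lets the same inferred facts feed either a plain textual explanation or a chart, which mirrors the explanation formats compared in the episode.
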
JAN 26

Fairness in PCA-Based Recommenders

In this episode, we explore recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dig into his work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users equally. David introduces the concept of "power niche users": highly active users with specialized interests who generate valuable data that can benefit the entire platform. We discuss his paper "When Collaborative Filtering Is Not Collaborative," which shows how PCA can over-specialize on popular content, neglecting niche items and even failing to recommend popular artists to potential new fans. David presents solutions through item-weighted PCA and data upweighting strategies that can improve fairness and performance simultaneously, challenging the common assumption that these goals must be in tension. The conversation spans theoretical insights and practical applications at companies like Meta, offering a comprehensive look at the future of personalized recommendations.

50 min
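
To make "item-weighted PCA" concrete, here is a minimal NumPy sketch of one way to weight items in PCA-style collaborative filtering: scale each item column by the square root of a weight, take a truncated SVD (uncentered PCA), and undo the scaling on the reconstructed scores. The function name and the inverse-popularity weights are assumptions for illustration; this is not necessarily the formulation in David's paper.

```python
import numpy as np


def weighted_pca_scores(R, item_weights, k):
    """Rank-k reconstruction of a users x items matrix with per-item weights.

    Scaling column j by sqrt(w_j) makes the squared reconstruction error of
    item j count w_j times, so heavily weighted items are fit more closely."""
    root_w = np.sqrt(np.asarray(item_weights, dtype=float))
    if np.any(root_w <= 0):
        raise ValueError("item weights must be positive")
    Rw = R * root_w  # broadcasting scales each item column
    _, _, Vt = np.linalg.svd(Rw, full_matrices=False)
    V = Vt[:k].T  # top-k principal item directions
    scores_w = Rw @ V @ V.T  # project users onto those directions
    return scores_w / root_w  # map back to the original item scale


# Hypothetical toy implicit-feedback matrix: rows are users, columns are items.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

popularity = R.sum(axis=0)  # number of users who interacted with each item
uniform = weighted_pca_scores(R, np.ones(4), k=1)
inverse_pop = weighted_pca_scores(R, 1.0 / popularity, k=1)  # hypothetical weighting choice
```

With uniform weights this reduces to ordinary truncated SVD; upweighting rarely consumed items shifts the low-rank fit toward them, which is the kind of lever the episode describes for serving niche users.
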
12/26/2025

Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator in streaming television who also has 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation, in which algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial role of homepage curation, and how human curators help by contextualizing content, cleaning data, and identifying positive feedback loops that algorithms might miss. The conversation covers practical challenges like measuring "surprise and delight," the content deluge created by democratized creation tools, and why trust in tech companies is essential for better personalization. Cory emphasizes that discovery is "a good type of friction" and explains how the CODE framework (Capture, Organize, Distill, Express, plus Analysis) guides professional curation work. Looking to the future, they discuss the need for systems thinking that creates narrative connections between pieces of content, the potential for conversational AI to help users articulate preferences, and why diverse perspectives beyond engineering are crucial for building effective discovery systems. Resources mentioned include the newsletter "Top Information Retrieval Papers of the Week" and NotebookLM for synthesizing research.

38 min
12/18/2025

Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us into eye tracking and its applications in recommender systems. As a researcher at the Kempelen Institute and Brno University, Santiago explains the mechanics of eye tracking technology: how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix's. Through collaboration between psychologists and AI researchers, Santiago's work demonstrates how eye tracking can uncover insights about positional bias and user engagement that traditional click data misses. Beyond the technical aspects, Santiago addresses the ethical considerations surrounding eye tracking data, particularly concerning pupil data and privacy. He emphasizes the importance of questioning assumptions in recommender systems and shares practical advice for improving recommendation algorithms by understanding actual user behavior rather than relying solely on click patterns. Looking forward, Santiago discusses future directions, including simulating user behavior using eye tracking data, addressing the cold start problem, and translating these findings to e-commerce applications. This conversation challenges researchers and practitioners to think more deeply about de-biasing clicks and leveraging eye tracking as a tool to enhance user experience in recommendation systems.

52 min
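
A common textbook way to turn raw gaze samples into fixations and saccades is velocity-threshold identification (I-VT): movement between consecutive samples slower than a threshold is treated as part of a fixation, faster movement as a saccade. The Python sketch below assumes gaze positions already converted to degrees of visual angle and a 30°/s threshold, a frequently used default; the function names are hypothetical, and this illustrates the general technique rather than the processing pipeline used for RecGaze.

```python
import math


def classify_ivt(samples, hz, threshold_deg_per_s=30.0):
    """Label each step between consecutive gaze samples.

    samples: list of (x, y) gaze positions in degrees of visual angle (assumed).
    hz: sampling rate in samples per second.
    Returns one label per step: "fixation" or "saccade"."""
    labels = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) * hz  # degrees per second
        labels.append("fixation" if velocity < threshold_deg_per_s else "saccade")
    return labels


def group_fixations(samples, labels):
    """Merge runs of fixation steps into fixations: (centroid_x, centroid_y, n_samples)."""
    fixations, current = [], []

    def close_run():
        if current:
            xs, ys = zip(*current)
            fixations.append((sum(xs) / len(xs), sum(ys) / len(ys), len(current)))
            current.clear()

    for i, label in enumerate(labels):
        if label == "fixation":
            if not current:
                current.append(samples[i])  # first sample of a new run
            current.append(samples[i + 1])
        else:
            close_run()  # a saccade step ends the current fixation
    close_run()
    return fixations


# Hypothetical 100 Hz recording: a short fixation, one jump, another fixation.
gaze = [(0.0, 0.0), (0.01, 0.0), (0.02, 0.0), (5.0, 0.0), (5.01, 0.0), (5.02, 0.0)]
labels = classify_ivt(gaze, hz=100)
fixations = group_fixations(gaze, labels)
```

Production filters typically also merge fixations that are close in time and space and discard very short ones; those steps are omitted here to keep the core velocity test visible.
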
12/08/2025

Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insights from her research on how recommender systems impact both consumers and content creators across e-commerce and social media platforms. We explore critical challenges like the cold start problem—how to make good recommendations for brand new users—and discuss how her approach uses demographic information to create informative priors that accelerate learning. The conversation also touches on algorithmic fairness, revealing how her method reduces bias between majority and minority (niche preference) users by incorporating active learning through bandit algorithms. Whether you're interested in the mathematics of recommendation engines or the broader implications for digital platforms, this episode offers a comprehensive look at the state-of-the-art in recommender system design.

40 min
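
The combination of informative priors and bandit learning can be illustrated with Beta-Bernoulli Thompson sampling, where a new user's prior click-through rate for each item comes from their demographic segment. This Python sketch is an illustration of that general idea under stated assumptions, not Boya's method: it omits the collaborative filtering and embedding components entirely, and the segment click-through rates, item names, and the prior `strength` parameter are all hypothetical.

```python
import random


def segment_prior(segment_ctr, strength=10.0):
    """Convert a segment-level click-through rate into Beta(alpha, beta) pseudo-counts.
    `strength` (hypothetical) is how many virtual observations the prior is worth."""
    return segment_ctr * strength + 1.0, (1.0 - segment_ctr) * strength + 1.0


class ThompsonRecommender:
    """Beta-Bernoulli Thompson sampling over a fixed item set for one cold-start user."""

    def __init__(self, segment_ctrs, strength=10.0, seed=0):
        self.params = {item: list(segment_prior(ctr, strength)) for item, ctr in segment_ctrs.items()}
        self.rng = random.Random(seed)

    def recommend(self):
        # Draw a plausible CTR per item from its posterior and show the best draw;
        # uncertain items sometimes win the draw, which is where exploration comes from.
        draws = {item: self.rng.betavariate(a, b) for item, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, item, clicked):
        # A click adds to alpha, a skip adds to beta.
        self.params[item][0 if clicked else 1] += 1

    def posterior_mean(self, item):
        a, b = self.params[item]
        return a / (a + b)


# Hypothetical cold-start user whose true tastes differ from their segment's average.
true_ctr = {"documentary": 0.8, "sitcom": 0.2}
rec = ThompsonRecommender({"documentary": 0.4, "sitcom": 0.6})
user_rng = random.Random(1)
shown = []
for _ in range(500):
    item = rec.recommend()
    rec.update(item, user_rng.random() < true_ctr[item])
    shown.append(item)
```

A stronger prior makes early recommendations lean harder on the demographic segment, while a weaker one lets the user's own clicks take over sooner; that trade-off is the sense in which informative priors can accelerate learning for new users.
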

Kyle Polich

Host

4.4 out of 5 (479 Ratings)

LOVE THE SHOW

Feb 12

Cdascientist

I just absolutely love the show and I’m just wondering if maybe you can cover as a topic sub polynomial compute for graph networks?

Great resource

07/01/2023

calzone.onsets

A colleague introduced me to Data Skeptic last year and I’ve been enjoying the episodes. Kyle’s good at covering topics from many levels of data science understanding—his mini series with a non-data scientist are a great way to learn the basics!

Data science

12/09/2022

joey...1989

Lots of interviews

Amazing

09/27/2022

Steveo CO

I have been following for a few months now. If you’re looking for a wide perspective on big data and AI, this is the place. Kyle has a wide knowledge base, he’s not focused on a simple thing, rather, he explores multiple models, to help with research. His guests are amazing. I can’t wait to be invited to appear on the show to discuss standard operations for an average guy doing signs and lighting.

Creator: Kyle Polich
Years Active: 2014 - 2026
Episodes: 598
Rating: Clean
Show Website: Data Skeptic

Technology

Technology

Updated Weekly
Science

Science

Updated Biweekly
Technology

Technology

Updated Weekly
Technology

Technology

Updated Daily
Technology

Technology

Updated Semiweekly
Management

Management

Updated Semiweekly
Technology

Technology

Updated Weekly

Data Skeptic

Disentanglement and Interpretability in Recommender Systems

Collective Altruism in Recommender Systems

Niche vs Mainstream

Healthy Friction in Job Recommender Systems

Fairness in PCA-Based Recommenders

Video Recommendations in Industry

Eye Tracking in Recommender Systems

Cracking the Cold Start Problem

Hosts & Guests

Kyle Polich

LOVE THE SHOW

Great resource

Data science

Amazing

About

Information

You Might Also Like

Data Skeptic

Episodes

Disentanglement and Interpretability in Recommender Systems

Collective Altruism in Recommender Systems

Niche vs Mainstream

Healthy Friction in Job Recommender Systems

Fairness in PCA-Based Recommenders

Video Recommendations in Industry

Eye Tracking in Recommender Systems

Cracking the Cold Start Problem

Hosts & Guests

Ratings & Reviews

About

Information

You Might Also Like