53 min

Large-Scale Entity Resolution - Sonal Goyal DataTalks.Club

    • Technology

We talked about:



Sonal’s background
How the idea for Zingg came about
What Zingg is
The difference between entity resolution and identity resolution
How duplicate detection relates to entity resolution
How Sonal decided to start working on Zingg
How Zingg works
What Zingg runs on
Switching from consultancy to working on a new open source solution
Why Zingg is open source
Open source licensing
Working on Zingg initially vs now
Zingg’s current and future team
Sonal’s biggest current challenge
Avoiding problems with entity/identity resolution through database design
Identity resolution vs basic joins, data fusions, and fuzzy joins
Deterministic matching vs probabilistic machine learning
Identity and entity resolution applications for fraud detection
Graph algorithms vs classic ML in entity resolution
Identity resolution success stories
What Sonal would do differently given the chance to start over with Zingg
Advice for those seeking to realize their own solution to a data problem
Reading suggestion from Sonal
Conclusion



Links:


Open-Source Spotlight demo "Zingg": https://www.youtube.com/watch?v=zOabyZxN9b0
Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs book: https://www.amazon.com/Creative-Selection-Inside-Apples-Process/dp/1250194466



ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

