1h 26 min

Evgeniya Sukhodolskaya - Data Advocate, Toloka - Data at the core of all the cool ML
Vector Podcast


Toloka’s support for Academia: grants and educator partnerships
https://toloka.ai/collaboration-with-educators-form
https://toloka.ai/research-grants-form
Pages that lead to these forms:
https://toloka.ai/academy/education-partnerships
https://toloka.ai/grants
Topics:
00:00 Intro
01:25 Jenny’s path from graduating in ML to a Data Advocate role
07:50 What goes into the labeling process with Toloka
11:27 How to prepare data for labeling and design tasks
16:01 Jenny’s take on why Relevancy needs more data in addition to clicks in Search
18:23 Dmitry plays the Devil’s Advocate for a moment
22:41 Implicit signals vs user behavior and offline A/B testing
26:54 Dmitry goes back to advocating for good search practices
27:42 Flower search as a concrete example of labeling for relevancy
39:12 NDCG, ERR as ranking quality metrics
44:27 Cross-annotator agreement, perfect list for NDCG and Aggregations
47:17 On measuring and ensuring the quality of annotators with honeypots
54:48 Deep-dive into aggregations
59:55 Bias in data, SERP, labeling and A/B tests
1:16:10 Is unbiased data attainable?
1:23:20 Announcements
This episode on YouTube: https://youtu.be/Xsw9vPFqGf4
Podcast design: Saurabh Rai: https://twitter.com/srvbhr

