DataTalks.Club


DataTalks.Club - the place to talk about data!

  1. December 13

    Career advice, learning, and featuring women in ML and AI - Isabella Bicalho

    In this podcast episode, we talked with Isabella Bicalho about career advice, learning, and featuring women in ML and AI.

    About the speaker: Isabella is a Machine Learning Engineer and Data Scientist with three years of hands-on AI development experience. She draws upon her early computational research expertise to develop ML solutions. While contributing to open-source projects, she runs a newsletter dedicated to showcasing women's accomplishments in data science.

    During this event, the guest discussed her transition into machine learning, her freelance work in AI, and the growing AI scene in France. She shared insights on freelancing versus full-time work, the value of open-source contributions, and developing both technical and soft skills. The conversation also covered career advice, mentorship, and her Substack series on women in data science, emphasizing leadership, motivation, and career opportunities in tech.

    0:00 Introduction
    1:23 Background of Isabella Bicalho
    2:02 Transition to machine learning
    4:03 Study and work experience
    5:00 Living in France and language learning
    6:03 Internship experience
    8:45 Focus areas of Inria
    9:37 AI development in France
    10:37 Current freelance work
    11:03 Freelancing in machine learning
    13:31 Moving from research to freelancing
    14:03 Freelance vs. full-time data science
    17:00 Finding first freelance client
    18:00 Involvement in open-source projects
    20:17 Passion for open-source and teamwork
    23:52 Starting new projects
    25:03 Community project experience
    26:02 Teaching and learning
    29:04 Contributing to open-source projects
    32:05 Open-source tools vs. projects
    33:32 Importance of community-driven projects
    34:03 Learning resources
    36:07 Green space segmentation project
    39:02 Developing technical and soft skills
    40:31 Gaining insights from industry experts
    41:15 Understanding data science roles
    41:31 Project challenges and team dynamics
    42:05 Turnover in open-source projects
    43:05 Managing expectations in open-source work
    44:50 Mentorship in projects
    46:17 Role of AI tools in learning
    47:59 Overcoming learning challenges
    48:52 Discussion on Substack
    49:01 Interview series on women in data
    50:15 Insights from women in data science
    51:20 Impactful stories from Substack
    53:01 Leadership challenges in projects
    54:19 Career advice and opportunities
    56:07 Motivating others to step out of comfort zone
    57:06 Contacting for Substack story sharing
    58:00 Closing remarks and connections

    🔗 CONNECT WITH ISABELLA BICALHO
    GitHub: https://github.com/bellabf
    LinkedIn: / isabella-frazeto

    🔗 CONNECT WITH DataTalksClub
    Join DataTalks.Club: https://datatalks.club/slack.html
    Our events: https://datatalks.club/events.html
    Datalike Substack - https://datalike.substack.com/
    LinkedIn: / datatalks-club

    55 min
  2. December 6

    AI in Industry: Trust, Return on Investment and Future - Maria Sukhareva

    Reflection on an Almost Two-Year Journey of Generative AI in Industry – Maria Sukhareva

    About the speaker: Maria Sukhareva is a principal key expert in Artificial Intelligence at Siemens with over 15 years of experience at the forefront of generative AI technologies. Known for her keen eye for technological innovation, Maria excels at transforming cutting-edge AI research into practical, value-driven tools that address real-world needs. Her approach is both hands-on and results-focused, with a commitment to creating scalable, long-term solutions that improve communication, streamline complex processes, and empower smarter decision-making. Maria's work reflects a balanced vision, where the power of innovation is met with ethical responsibility, ensuring that her AI projects deliver impactful and production-ready outcomes.

    We talked about:
    00:00 DataTalks.Club intro
    02:13 Career journey: From linguistics to AI
    08:02 The evolution of AI expertise and its future
    13:10 AI vulnerabilities: Bypassing bot restrictions
    17:00 Non-LLM classifiers as a more robust solution
    22:56 Risks of chatbot deployment: Reputational and financial
    27:13 The role of AI as a tool, not a replacement for human workers
    31:41 The role of human translators in the age of AI
    34:49 Evolution of English and its Germanic roots
    38:44 Beowulf and Old English
    39:43 Impact of the Norman occupation on English grammar
    42:34 Identifying mushrooms with AI apps and safety precautions
    45:08 Decoding ancient languages like Sumerian
    49:43 The evolution of machine translation and multilingual models
    53:01 Challenges with low-resource languages and inconsistent orthography
    57:28 Transition from academia to industry in AI

    Join our Slack: https://datatalks.club/slack.html
    Our events: https://datatalks.club/events.html

    53 min
  3. November 1

    Using Data to Create Liveable Cities - Rachel Lim

    We talked about:
    00:00 DataTalks.Club intro
    01:56 Using data to create livable cities
    02:52 Rachel's career journey: from geography to urban data science
    04:20 What does a transport scientist do?
    05:34 Short-term and long-term transportation planning
    06:14 Data sources for transportation planning in Singapore
    08:38 Rachel's motivation for combining geography and data science
    10:19 Urban design and its connection to geography
    13:12 Defining a livable city
    15:30 Livability of Singapore and urban planning
    18:24 Role of data science in urban and transportation planning
    20:31 Predicting travel patterns for future transportation needs
    22:02 Data collection and processing in transportation systems
    24:02 Use of real-time data for traffic management
    27:06 Incorporating generative AI into data engineering
    30:09 Data analysis for transportation policies
    33:19 Technologies used in text-to-SQL projects
    36:12 Handling large datasets and transportation data in Singapore
    42:17 Generative AI applications beyond text-to-SQL
    45:26 Publishing public data and maintaining privacy
    45:52 Recommended datasets and projects for data engineering beginners
    49:16 Recommended resources for learning urban data science

    About the speaker: Rachel is an urban data scientist dedicated to creating liveable cities through the innovative use of data. With a background in geography and a master's in urban data science, she blends qualitative and quantitative analysis to tackle urban challenges. Her aim is to integrate data-driven techniques with urban design to foster sustainable and equitable urban environments.

    Links:
    - https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html

    Join our Slack: https://datatalks.club/slack.html

    46 min
  4. October 10

    Human-Centered AI for Disordered Speech Recognition - Katarzyna Foremniak

    We talked about:
    00:00 DataTalks.Club intro
    08:06 Background and career journey of Katarzyna
    09:06 Transition from linguistics to computational linguistics
    11:38 Merging linguistics and computer science
    15:25 Understanding phonetics and morpho-syntax
    17:28 Exploring morpho-syntax and its relation to grammar
    20:33 Connection between phonetics and speech disorders
    24:41 Improvement of voice recognition systems
    27:31 Overview of speech recognition technology
    30:24 Challenges of ASR systems with atypical speech
    30:53 Strategies for improving recognition of disordered speech
    37:07 Data augmentation for training models
    40:17 Transfer learning in speech recognition
    42:18 Challenges of collecting data for various speech disorders
    44:31 Stammering and its connection to fluency issues
    45:16 Polish consonant combinations and pronunciation challenges
    46:17 Use of Amazon Transcribe for generating podcast transcripts
    47:28 Role of language models in speech recognition
    49:19 Contextual understanding in speech recognition
    51:27 How voice recognition systems analyze utterances
    54:05 Personalization of ASR models for individuals
    56:25 Language disorders and their impact on communication
    58:00 Applications of speech recognition technology
    1:00:34 Challenges of personalized and universal models
    1:01:23 Voice recognition in automotive applications
    1:03:27 Humorous voice recognition failures in cars
    1:04:13 Closing remarks and reflections on the discussion

    About the speaker: Katarzyna is a computational linguist with over 10 years of experience in NLP and speech recognition. She has developed language models for automotive brands like Audi and Porsche and specializes in phonetics, morpho-syntax, and sentiment analysis. Kasia also teaches at the University of Warsaw and is passionate about human-centered AI and multilingual NLP.

    Join our Slack: https://datatalks.club/slack.html

    48 min
  5. August 15

    DataOps, Observability, and The Cure for Data Team Blues - Christopher Bergh

    Hi everyone, welcome to our event. This event is brought to you by DataTalks.Club, which is a community of people who love data. We have weekly events, and today's is one of such events. I guess we are also a community of people who like to wake up early, if you're from the States, right, Christopher? Or maybe not so much, because this is the time we usually have our events. For our guests and presenters from the States we usually do it in the evening, Berlin time, but yes, unfortunately it kind of slipped my mind. Anyway, we have a lot of events; you can check them in the description, there's a link. I don't think there are a lot of them on that link right now, but we will be adding more and more. I think we have five or six interviews scheduled, so keep an eye on that. Do not forget to subscribe to our YouTube channel; this way you will get notified about all our future streams, which will be as awesome as the one today. And of course, very important, do not forget to join our community, where you can hang out with other data enthusiasts. During today's interview you can ask any question. There's a pinned link in the live chat, so click on that link, ask your question, and we will be covering these questions during the interview. Now I will stop sharing my screen. And there's a message, and Christopher, it's from you. We actually have this on YouTube, so they have not seen what you wrote, but there is a message to anyone who's watching this right now from Christopher saying hello, everyone. Can I call you Chris, or...?

    Okay, I should look on YouTube then.

    Okay, yeah, but anyway, you don't need to. You will need to focus on answering questions, and I'll be keeping an eye on all the questions. So, yeah, if you're ready, we can start.

    I'm ready.

    Yeah, and you prefer Christopher, not Chris, right?

    Chris is fine.

    Chris is fine, it's a bit shorter. Okay, so this week we'll talk about DataOps again. Maybe it's a tradition that we talk about DataOps once per year, but we actually skipped one year, because we haven't had Chris for some time. So today we have a very special guest, Christopher. Christopher is the co-founder, CEO, and head chef, or head cook, at DataKitchen, with 25 years of experience. Maybe this is outdated, because probably now you have more, and maybe you stopped counting, I don't know, but with tons of years of experience in analytics and software engineering. Christopher is known as the co-author of the DataOps Cookbook and the DataOps Manifesto. It's not the first time we have Christopher here on the podcast: we interviewed him two years ago, also about DataOps. This one will be about DataOps as well, so we'll catch up and see what actually changed in these two years. So welcome to the interview.

    Well, thank you for having me. I'm happy to be here, talking all things related to DataOps and why bother with DataOps, and happy to talk about the company or what's changed. Excited.

    Yeah, so let's dive in. The questions for today's interview are prepared by Johanna Bayer, as always. Thanks, Johanna, for your help. So before we start with our main topic for today, DataOps, let's start with your background. Can you tell us about your career journey so far? For those who have not listened to the previous podcast, maybe you can talk about yourself, and for those who did listen to the previous one, you can maybe give a summary of what has changed in the last two years.

    Will do. Yeah, so my name is Chris. I guess I'm sort of an engineer. I spent about the first 15 years of my career in software, working on and building some AI systems and some non-AI systems at NASA and MIT Lincoln Lab, then some startups, and then Microsoft. Then, about 2005, I got the data bug. My kids were small, and I thought, oh, this data thing is easy, and I'd be able to go home for dinner at 5 and life would be fine, because I was a big...

    You started your own company, right?

    And it didn't work out that way. What was interesting for me is that the problem wasn't doing the data. We had smart people who did data science and data engineering, the act of creating things. It was the systems around the data that were hard. It was really hard to not have errors in production. I would be driving to work, and I had a BlackBerry at the time, and I would not look at my BlackBerry all morning. I had this long drive to work, and I'd sit in the parking lot, take a deep breath, look at my BlackBerry, and go, uh-oh, are there going to be any problems today? If there weren't, I'd walk in very happy, and if there were, I'd have to brace myself. And then the second problem is that the team I worked for just couldn't go fast enough. The customers were super demanding. They didn't care; they always thought things should be faster, and we were always behind. So how do you live in that world, where things are breaking left and right and you're terrified of making errors, and second, you just can't go fast enough?

    And it's the pre-Hadoop era, right? It's before all this big data tech.

    Yeah, before this. We were using SQL Server, and we had smart people, so we built an engine in SQL Server that made SQL Server a columnar database. So we built a columnar database inside of SQL Server in order to make certain things fast. And, yeah, it's not bad. I mean, the principles are the same, right? Before Hadoop, it's still a database, there are still indexes, there are still queries, things like that. At the time you would use OLAP engines. We didn't use those, but those reports, you know, are for models. It's not that different. We had a rack of servers instead of the cloud. So what I took from that was that it's just hard to run a team of people to do data and analytics. I took it from a manager perspective. I started to read Deming and think about the work that we do as a factory, a factory that produces insight and not automobiles. So how do you run that factory so it produces things that are of good quality? And then second, since I had come from software, I've been very influenced by the DevOps movement: how you automate deployment, how you run in an agile way, how you change things quickly, and how you innovate. And so those two things, running a really good, solid production line that has very low errors, and then changing that production line very, very often, are kind of opposite, right? So how do you, as a manager, how do you technically approach that? And then, 10 years ago, when we started DataKitchen... we've always been a profitable company, so we started off with some customers, we started building some software, and we realized that we couldn't work any other way, and that the way we work wasn't understood by a lot of people. So we had to write a book and a manifesto to share our methods. So, yeah, we've been in business now a little over 10 years.

    Oh, that's cool. So let's talk about DataOps. You mentioned DevOps and how you were inspired by that. By the way, do you remember roughly when DevOps as a thing started to appear? When did people start calling these principles, and the tools around them, DevOps?

    Yeah, so, the Agile Manifesto... well, first of all, I had a boss in 1990 at NASA who had this idea: build a little, test a little, learn a lot. That was his mantra, which made a lot of sense. And then the agile software manifesto came out in 2001, which is very similar. And then the first real DevOps was a guy at Twitter who started to do automated deployment, you know, push a button, and that was like 2009-ish. And so I think the first DevOps meetup was around then. So it's been 15 years, I guess.

    I was trying to... so I started my career in 2010. My first job was as a Java developer, and I remember that for some things we would just SFTP to the machine, put the JAR archive there, and then keep our fingers crossed that it doesn't break. It was not really... I wouldn't call it this way, right?

    You were deploying. You had a deploy process, I'd put it.

    Yeah, right.

    Was that documented too? It was like: put the JAR on production, cross your fingers.

    I think there was a page on some internal wiki that described, with passwords, what you should do.

    Yeah. And I think what's interesting is why that changed, right? We laugh at it now, but why didn't you invest in automating deployment, or a whole bunch of automated regression tests that would run? Because I think in software now it would be rare that people wouldn't use CI/CD, that they wouldn't have some automated tests, you know, functional regression tests. That would be the

    54 min

Ratings and Reviews

5
out of 5
7 ratings

About

DataTalks.Club - the place to talk about data!
