22 episodes

Welcome to the Data Science Conversations Podcast, hosted by Damien Deighan and Dr Philipp Diesinger. We bring you interesting conversations with the world's leading academics working on cutting-edge topics with potential for real-world impact.

We explore how their latest research in Data Science and AI could scale into broader industry applications, so you can expand your knowledge and grow your career.

Every 4 or 5 episodes we will feature an industry trailblazer from a strong academic background who has applied research effectively in the real world.




Podcast Website: www.datascienceconversations.com

Data Science Conversations Damien Deighan and Philipp Diesinger

    • Technology
    • 5.0 • 3 Ratings


    Using Open Source LLMs in Language for Grammatical Error Correction (GEC)

    At LanguageTool, Bartmoss St Clair (Head of AI) is pioneering the use of Large Language Models (LLMs) for grammatical error correction (GEC), moving away from the tool's initial non-AI approach to create a system capable of catching and correcting errors across multiple languages.
    LanguageTool supports over 30 languages, has several million users, and over 4 million installations of its browser add-on, benefiting from a diverse team of employees from around the world.
    Episode Summary -
    - LanguageTool decided against using existing LLMs like GPT-3 or GPT-4 because of the cost, speed, and accuracy benefits of developing their own models, focusing on a balance between performance, speed, and cost.
    - The tool is designed to work with low latency for real-time applications, catering to a wide range of users including academics and businesses, aiming for accurate grammar correction without being intrusive.
    - Bartmoss discussed the nuanced approach to grammar correction, acknowledging that language evolves and user preferences vary, necessitating a balance between strict grammatical rules and user acceptability.
    - The company employs a mix of decoder and encoder-decoder models depending on the task, with a focus on contextual understanding and the challenge of preserving the original meaning of text while correcting grammar.
    - A hybrid system combining rule-based algorithms with machine learning provides nuanced grammar corrections and explanations for those corrections, enhancing user understanding and trust.
    - LanguageTool is developing a generalized GEC system, incorporating legacy rules and machine learning for comprehensive error correction across various types of text.
    - Training models involves a mix of user data, expert-annotated data, and synthetic data, aiming to reflect real user error patterns for effective correction.
    - The company has built tools to benchmark GEC tasks, focusing on precision, recall, and user feedback to guide quality improvements.
    - The introduction of LLMs has expanded LanguageTool's capabilities, including rewriting and rephrasing, and improved error detection beyond simple grammatical rules.
    - Despite the higher costs associated with LLMs and hosting infrastructure, the investment is seen as worthwhile for improving user experience and conversion rates for premium products.
    - Bartmoss speculates on the future impact of LLMs on language evolution, noting their current influence and the importance of adapting to changes in language use over time.
    - LanguageTool prioritizes privacy and data security, avoiding external APIs for grammatical error correction and developing their systems in-house with open-source models.
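    The episode mentions benchmarking GEC tasks on precision and recall against expert annotations. A minimal sketch of how such scoring typically works is below; the edit representation and function name are hypothetical (LanguageTool's internal tooling is not described in the episode), and the F0.5 weighting reflects the common GEC convention of favouring precision over recall.

    ```python
    # Hypothetical sketch: scoring a GEC system's proposed edits against
    # expert-annotated gold edits. An "edit" here is a tuple of
    # (start_token, end_token, replacement) on the source sentence.

    def score_edits(system_edits, gold_edits, beta=0.5):
        """Return (precision, recall, F-beta) over sets of edits.

        GEC evaluation conventionally uses beta=0.5, weighting precision
        higher than recall: a wrong correction is worse than a miss.
        """
        system, gold = set(system_edits), set(gold_edits)
        tp = len(system & gold)                      # edits matching gold exactly
        precision = tp / len(system) if system else 1.0
        recall = tp / len(gold) if gold else 1.0
        if precision + recall == 0:
            return precision, recall, 0.0
        f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
        return precision, recall, f

    # Example: the system proposes two edits; one matches the single gold edit.
    gold = [(3, 4, "has")]
    proposed = [(3, 4, "has"), (7, 8, "an")]
    p, r, f = score_edits(proposed, gold)  # p = 0.5, r = 1.0
    ```

    Exact-match comparison like this is deliberately strict; real GEC scorers also align edit spans so that equivalent corrections phrased differently still count as matches.
    
    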

    • 50 min
    The Path to Responsible AI with Julia Stoyanovich of NYU

    In this enlightening episode, Dr. Julia Stoyanovich delves into the world of responsible AI, exploring the ethical, societal, and technological implications of AI systems. She underscores the importance of global regulations, human-centric decision-making, and the proactive management of biases and risks associated with AI deployment. Through her expert lens, Dr. Stoyanovich advocates for a future where AI is not only innovative but also equitable, transparent, and aligned with human values.
    Julia is an Institute Associate Professor at NYU in both the Tandon School of Engineering and the Center for Data Science. In addition, she is Director of the Center for Responsible AI, also at NYU. Her research focuses on responsible data management, fairness, diversity, transparency, and data protection in all stages of the data science lifecycle.
    Episode Summary -
    - The definition of Responsible AI
    - An example of ethical AI in the medical world: fast MRI technology
    - Fairness and diversity in AI
    - The role of regulation: what it can and can't do
    - Transparency, bias in AI models, and data protection
    - The dangers of Gen AI hype and problematic AI narratives from the tech industry
    - The importance of humans in ensuring ethical development
    - Why "Responsible AI" is actually a bit of a misleading term
    - What Data & AI leaders can do to practise Responsible AI

    • 48 min
    Transforming Freight Logistics with AI and Machine Learning

    Luis Moreira-Matias is Senior Director of Artificial Intelligence at sennder, Europe’s leading digital freight forwarder. At sennder, Luis founded sennAI: sennder’s organization that oversees the creation (from R&D to real-world productization) of proprietary AI technology for the road logistics industry.

    During his 15-year career, Luis has led 50+ FTEs across 4+ organisations to develop award-winning ML solutions addressing real-world problems in fields such as e-commerce, travel, logistics, and finance.

    Luis holds a Ph.D. in Machine Learning from the U. Porto, Portugal. He has a world-class academic track record, with high-impact publications at top-tier venues in ML/AI fundamentals, 5 patents, and multiple keynotes worldwide, ranging from Brisbane (Australia) to Las Palmas (Spain).

    • 1 hr 1 min
    The future of LLMs, ELMs and the semantic layer

    In this episode, Tarush Aggarwal, formerly of Salesforce and WeWork, is back on the podcast to discuss the evolution of the semantic layer and how it can help practitioners get results from LLMs. We also discuss how smaller ELMs (expert language models) might be the future for consistent, reliable outputs from generative AI, and the impact of all of this on traditional BI tools.

    • 34 min
    Data Strategy Evolved: How the Biological Model fuels enterprise data performance

    In this episode, Patrick McQuillan shares his innovative Biological Model, a concept you can use to enhance data outcomes in large enterprises. The model is built on the idea that the best way to design a data strategy is to align it closely with a biological system.
    He discusses the power of centralized information, the importance of data governance, and the necessity of a common performance narrative across an organization.
    Episode Summary -
    - Biological Model Concept
    - Centralized vs. Decentralized Data
    - Data Collection and Maturity
    - Horizontal translation layer 
    - Partnership with vertical leaders
    - Curated data layers
    - Data dictionary for consistency
    - Focusing on vital metrics
    - Data Flow in Organizations
    - Biological Model Governance
    - Overcoming Inconsistency and Inaccuracy

    • 56 min
    Mapping forests: Verifying carbon offsetting with machine learning

    In this episode Heidi Hurst returns to talk to us about how in her current role at Pachama she is using the power of machine learning to fight climate change.  She discusses her work in measuring the capacity of existing forests and reforestation projects using satellite imagery.
    Episode Summary
    1. The importance of carbon credits verification in mitigating climate change
    2. How Pachama is using machine learning and satellite imagery to verify carbon projects
    3. Three types of carbon projects: avoided deforestation, reforestation, and improved forest management
    4. Challenges in using satellite imagery to measure the capacity of existing forests
    5. The role of multispectral imaging in measuring density of forests
    6. Challenges in collecting data from dense rainforests and weather obstructions
    7. The impact of machine learning on scaling up carbon verification
    8. Advancements in the field of satellite imaging, particularly in small satellite constellations
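    Point 5 above mentions multispectral imaging for measuring forest density. A standard illustration of the idea is a vegetation index computed from two spectral bands; the sketch below uses NDVI (normalized difference vegetation index) with illustrative reflectance values — the episode description does not detail Pachama's actual indices or models, so treat this purely as background.

    ```python
    # Illustrative sketch: NDVI from multispectral satellite bands.
    # Dense, healthy vegetation reflects strongly in near-infrared (NIR)
    # and absorbs red light, pushing NDVI toward 1.0; bare soil or water
    # yields values near or below 0.

    def ndvi(nir, red):
        """NDVI = (NIR - Red) / (NIR + Red), bounded in [-1, 1]."""
        if nir + red == 0:
            return 0.0
        return (nir - red) / (nir + red)

    # Per-pixel reflectances (illustrative numbers, not real scene data):
    dense_forest = ndvi(nir=0.50, red=0.05)  # high NDVI: dense canopy
    bare_soil = ndvi(nir=0.30, red=0.25)     # low NDVI: sparse vegetation
    ```

    In practice such indices are computed per pixel over whole scenes and fed, alongside other bands and field measurements, into models that estimate biomass — which is where the machine learning discussed in the episode comes in.
    
    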

    • 25 min

Customer Reviews

5.0 out of 5
3 Ratings

Khushal Dhande ,

Insightful..

Insightful Machine learning conversation.

Shilpa@ch ,

Great Data science conversations

One of the best data science conversation,heard so far..

Varun1210 ,

Best Neural Network Podcast

Undoubtedly one of the best Neural Network Podcast. Quite insightful, would love to see more coming!!
