AI lab by information labs


AI lab podcast, "decrypting" expert analysis to understand Artificial Intelligence from a policy making point of view.

  1. AI lab TL;DR | Carys J. Craig - The Copyright Trap and AI Policy

    JAN 27 · BONUS

    AI lab TL;DR | Carys J. Craig - The Copyright Trap and AI Policy

🔍 In this TL;DR episode, Carys J. Craig (Osgoode Professional Development) explains the "copyright trap" in AI regulation, where relying on copyright favors corporate interests over creativity. She challenges misconceptions about copying and property rights, showing how this approach harms innovation and access. Carys offers alternative ways to protect human creativity without falling into this trap.

📌 TL;DR Highlights
⏲️ [00:00] Intro
⏲️ [00:46] Q1 - What is the "Copyright Trap," and why could it harm AI and creativity?
⏲️ [10:05] Q2 - Can you explain the three routes that lead into the copyright trap and their relevance to AI?
⏲️ [22:08] Q3 - What alternatives should policymakers consider to protect creators and manage AI?
⏲️ [28:45] Wrap-up & Outro

💭 Q1 - What is the "Copyright Trap," and why could it harm AI and creativity?
🗣️ "To turn to copyright law is to turn to really a false friend. The idea that copyright is going to be our friend, is going to help us in this situation (...) it's likely to do more harm than good."
🗣️ "We are imagining increasingly in these policy debates that copyright and protection of copyright owners will be a kind of counterweight to corporate power and to the sort of extractive logics of Big Tech and AI development. I think that that is misguided. And in fact, we're playing into the interests of both the entertainment industries and big tech."
🗣️ "When we run into the copyright trap, this sort of conviction that copyright is going to be the right regulatory tool, we are sort of defining how this technology is going to evolve in a way that I think will backfire and will actually undermine the political objectives of those who are pointing to the inequities and the unfairness behind the technology and the way that it's being developed."
🗣️ "AI industry, big tech industry and the creative industry stakeholders are all, I think, perfectly happy to approach these larger policy questions through the sort of logic of copyright, sort of proprietary logic of ownership, control, exchange in the free market, licensing structures that we're already seeing taking hold."
🗣️ "What we're going to see, I think, if we run into the copyright trap is that certainly smaller developers, but really everyone, will be training the technology on incomplete data sets, the data sets that reflect the sort of big packaged data products that have been exchanged for value between the main market actors. So that's going to lessen the quality really of what's going in generally by making it more exclusive and less inclusive."

💭 Q2 - Can you explain the three routes that lead into the copyright trap and their relevance to AI?
🗣️ "The first route that I identify is what's sometimes called the if-value-then-right fallacy. So that's the assumption that if something has value, then there should be or must be some right over it."
🗣️ "Because something has value, whether economic or social, doesn't mean we should turn it into property that can be owned and controlled through these exclusive rights that we find in copyright law."
🗣️ "The second route that I identify is a sort of obsession with copying and the idea that copying is inherently just a wrongful activity. (...) The reality is that there's nothing inherently wrongful about copying. And in fact, this is how we learn. This is how we create."
🗣️ "One of the clearest routes into the copyright trap is saying, well, you know, you have to make copies of texts in order to train AI. So of course, copyright is implicated. And of course, we have to prevent that from happening without permission. (...) But our obsession with the individuated sort of discrete copies of works behind the scenes is now an anachronism that we really need to let go."
🗣️ "Using the figure of the artist as a reason to expand copyright control, and assuming that that's going to magically turn into lining the pockets of artists and creators, seems to me to be a fallacy and a route into the copyright trap."

💭 Q3 - What alternatives should policymakers consider to protect creators and manage AI?
🗣️ "The health of our cultural environment (...) [should be] the biggest concern and not simply or only protecting creators as a separate class of sort of professional actors."
🗣️ "I think what we could do is shift our copyright policy focus to protecting and encouraging human authorship by refusing to protect AI-generated outputs."
🗣️ "If the outputs of generative AI are substantially similar to works on which the AI was trained, then those are infringing outputs and copyright law will apply to them, such that to distribute those infringing copies would produce liability under the system as it currently exists."
🗣️ "There are privacy experts who might be much better placed to say how should we curate or ensure that we regulate the data on which the machines are trained, and I would be supportive of those kinds of interventions at the input stage."
🗣️ "Copyright seems like a tempting way to do it, but that's not what it does. And so maybe rather than some of the big collective licensing solutions that are being imagined in this context, we'd be better off thinking about tax solutions, where we properly tax big tech and then we use that tax in a way that actually supports the things that we as a society care about, including funding culture and the arts."

📌 About Our Guest
🎙️ Carys J. Craig | Osgoode Hall Law School
🌐 Article | The AI-Copyright Trap https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4905118
🌐 Carys J. Craig https://www.osgoode.yorku.ca/faculty-and-staff/craig-carys-j/
Carys is the Academic Director of the Osgoode Professional Development LLM Program in Intellectual Property Law, and recently served as Osgoode's Associate Dean. A recipient of multiple teaching awards, Carys researches and publishes widely on intellectual property law and policy, with an emphasis on authorship, users' rights and the public domain.
#AI #ArtificialIntelligence #GenerativeAI

    29 min
  2. AI lab TL;DR | Ariadna Matas - Should Institutions Enable or Prevent Cultural Data Mining?

    JAN 13 · BONUS

    AI lab TL;DR | Ariadna Matas - Should Institutions Enable or Prevent Cultural Data Mining?

🔍 In this TL;DR episode, Ariadna Matas (Europeana Foundation) discusses how the 2019 Copyright Directive has influenced text and data mining practices in cultural heritage institutions, highlighting the tension between public interest missions and restrictive approaches, and explores the broader implications of opt-outs on access, research, and the role of AI in the cultural sector.

📌 TL;DR Highlights
⏲️ [00:00] Intro
⏲️ [00:53] Q1 - How did the 2019 Copyright Directive change the landscape for cultural heritage institutions in terms of text and data mining?
⏲️ [05:07] Q2 - Why are some cultural heritage institutions choosing to opt out of text and data mining, and what are the challenges involved?
⏲️ [11:27] Q3 - What are the broader implications of these opt-outs for research, smaller institutions, and open access to cultural content?
⏲️ [14:53] Wrap-up & Outro

💭 Q1 - How did the 2019 Copyright Directive change the landscape for cultural heritage institutions in terms of text and data mining?
🗣️ "The 2019 Directive was expected to open up possibilities for cultural heritage institutions to continue their public interest mission with the help of technology."
🗣️ "The first big use of text and data mining techniques is to facilitate cultural heritage institutions' day-to-day work. It's rare to see cultural heritage institutions preparing datasets for public text and data mining activities."
🗣️ "More institutions are leaning toward putting barriers on data use rather than encouraging it. Instead of embracing possibilities, there's unnecessary caution in the cultural heritage sector around AI."

💭 Q2 - Why are some cultural heritage institutions choosing to opt out of text and data mining, and what are the challenges involved?
🗣️ "The only fully legitimate reason for opting out is when the rights holder explicitly requests it."
🗣️ "Cultural heritage institutions rarely own the copyright for the materials they hold, making enforcement of opt-outs challenging."
🗣️ "Confusion about the legal framework leads some institutions to fear they must protect data from misuse."
🗣️ "By opting out, institutions risk missing out on positive uses of their data due to fear of negative outcomes."
🗣️ "Cultural heritage institutions have a public interest mission to safeguard access and encourage the use of their information."

💭 Q3 - What are the broader implications of these opt-outs for research, smaller institutions, and open access to cultural content?
🗣️ "Some organisations block access to avoid supporting big players in activities perceived as unethical."
🗣️ "Opting out doesn't weaken monopolistic practices but harms smaller players who can't access the data."
🗣️ "Institutions must balance the implications of their decisions on access with the potential for positive uses."
🗣️ "Aggressive crawling that disrupts public services may justify access restrictions in certain cases."
🗣️ "Overly broad decisions could limit the positive applications of text and data mining techniques on cultural heritage data."

📌 About Our Guest
🎙️ Ariadna Matas | Europeana Foundation
🌐 Article | AI 'opt-outs': should cultural heritage institutions (dis)allow the mining of cultural heritage data?
🌐 Ariadna Matas https://pro.europeana.eu/person/ariadna-matas
Ariadna is Policy Advisor at the Europeana Foundation, an independent, non-profit organisation that stewards the common European data space for cultural heritage and contributes to other digital initiatives that put cultural heritage to good use in the world.
#AI #ArtificialIntelligence #GenerativeAI

    16 min
  3. AI lab TL;DR | Martin Senftleben - How Copyright Challenges AI Innovation and Creativity

    12/16/2024 · BONUS

    AI lab TL;DR | Martin Senftleben - How Copyright Challenges AI Innovation and Creativity

🔍 In this TL;DR episode, Martin Senftleben (Institute for Information Law (IViR) & University of Amsterdam) discusses how EU regulations, including the AI Act and copyright frameworks, impose heavy burdens on AI training and development. The discussion highlights concerns about bias, quality, and fairness due to opt-outs and complex rights management systems, questioning whether these rules truly benefit individual creators. A proposal is made to focus regulatory efforts on the market exploitation phase of AI systems, ensuring compensation flows back to creative industries and authors through well-managed redistribution mechanisms.

📌 TL;DR Highlights
⏲️ [00:00] Intro
⏲️ [01:04] Q1 - How does the EU's current approach to AI training and copyright hinder innovation and fair pay for authors?
⏲️ [04:53] Q2 - What's your alternative to balance author compensation and AI development?
⏲️ [06:50] Q3 - Why is output-based remuneration better for creators, AI developers, and society?
⏲️ [09:23] Wrap-up & Outro

💭 Q1 - How does the EU's current approach to AI training and copyright hinder innovation and fair pay for authors?
🗣️ "What policymakers try to do in this space where we try to reconcile AI innovation with traditional copyright goals is: first of all, of course we want the best AI systems and we want the least biased AI systems."
🗣️ "The regulation puts a heavy, heavy burden on AI training by requiring to take into account rights reservations, the so-called opt-outs."
🗣️ "You might not get the best AI systems if you put all these burdens on the AI training process."
🗣️ "The moment you allow rights holders to opt out and to remove certain resources from training, then of course you no longer know whether you get the least biased AI systems."
🗣️ "The simple fact that a big creative industry right holder receives some extra money doesn't mean that this money is passed on to the individual authors really doing the creative work."

💭 Q2 - What's your alternative to balance author compensation and AI development?
🗣️ "If we imagine a regulation that leaves this development phase totally unencumbered by copyright burdens, you give lots of freedom for AI developers to use all the resources they think are necessary."
🗣️ "Once we have these fully developed, high-potential AI systems and these systems are brought to the market, (...) you place a tax, a burden on the AI systems, not at the development stage, but at the moment where they are exploited in the marketplace."
🗣️ "The money finally flows back in the form of compensation to the creative industry and individual authors."

💭 Q3 - Why is output-based remuneration better for creators, AI developers, and society?
🗣️ "From a European perspective, it's quite easy to propose that this should be collecting societies because we have a very well developed system of collective rights management in the area of copyright."
🗣️ "In the case of AI output, you can also use data from the systems itself: to which extent is a certain style, a certain genre prominent in prompts that users enter? What type of AI output is generated and to which extent does it resemble certain pre-existing human works and creations? What is the market share on the more general market for literary, artistic expression and so on?"
🗣️ "Traditionally, repartitioning schemes have a split between money that is directly given to individual authors and money that is given to the industry. We have a guarantee that a certain percentage of the money will directly reach the individual authors and performers, and will not stay at industry level exclusively."

📌 About Our Guest
🎙️ Martin Senftleben | Institute for Information Law (IViR) and University of Amsterdam
🌐 Article | Win-win: How to Remove Copyright Obstacles to AI Training While Ensuring Author Remuneration (and Why the European AI Act Fails to Do the Magic)
🌐 Martin Senftleben linkedin.com/in/martin-senftleben-2430aa5b
Martin Senftleben is Professor of Intellectual Property Law and Director of the Institute for Information Law (IViR), University of Amsterdam. His activities focus on the reconciliation of private intellectual property rights with competing public interests of a social, cultural or economic nature. He publishes extensively on these topics and lectures across the globe.
#AI #ArtificialIntelligence #GenerativeAI

    10 min
  4. AI lab TL;DR | Mark Lemley - How Generative AI Disrupts Traditional Copyright Law

    11/25/2024 · BONUS

    AI lab TL;DR | Mark Lemley - How Generative AI Disrupts Traditional Copyright Law

🔍 In this TL;DR episode, Mark Lemley (Stanford Law School) discusses how generative AI challenges traditional copyright doctrines, such as the idea-expression dichotomy and the substantial similarity test, and explores the evolving role of human creativity in the age of AI.

📌 TL;DR Highlights
⏲️ [00:00] Intro
⏲️ [00:54] Q1 - How does genAI challenge traditional copyright doctrines, and will this lead to an evolution of copyright?
⏲️ [03:58] Q2 - Can we expect new forms of legal recognition or protection for prompts?
⏲️ [06:13] Q3 - Are current copyright rules able to address authorship in genAI works, or do we need new legal categories?
⏲️ [08:00] Wrap-up & Outro

💭 Q1 - How does genAI challenge traditional copyright doctrines, and will this lead to an evolution of copyright?
🗣️ "Copyright law has always tried to protect creative expression but is careful not to protect the idea behind a work."
🗣️ "Generative AI changes the normal economics and dynamics of creation by doing the hard work for us, like making the painting or doing the actual brushstrokes."
🗣️ "If copyright law doesn't protect the expression created by AI rather than by a person, the question is, what, if anything, is there to copyright?"
🗣️ "Generative AI blows up the substantial similarity test because it's unclear whether two similar works came from the same prompt or if the AI just made the same thing."
🗣️ "I might copy your prompt, input it into generative AI, and get a different output—making similarity no longer the evident marker of copying."

💭 Q2 - Can we expect new forms of legal recognition or protection for prompts?
🗣️ "We're still litigating whether the material generated by AI can be copyrighted, but we may ultimately say yes, as with photography 150 years ago."
🗣️ "Courts may get comfortable with the idea that structuring the prompt and iterating it is a form of creativity that leads to the final output."
🗣️ "In early photography, we gave copyright protection even though the machine made the image, because human judgment helped determine the outcome."
🗣️ "Prompt engineering could become more sophisticated, leading courts to see creativity in how prompts are structured and refined."
🗣️ "Sometimes I just ask a very simple question, and if that's all I contribute, I'm not sure there's any protection."

💭 Q3 - Are current copyright rules able to address authorship in genAI works, or do we need new legal categories?
🗣️ "There may be something around the creativity of prompts that will matter, but we're not there yet in terms of case law."
🗣️ "The assumption that 'I made a movie, I wrote text, so I get copyright in that work' is going to be called into question in the generative AI context."
🗣️ "Movie studios or video game companies that use AI to save money might be shocked when other people are free to copy AI-generated backgrounds."
🗣️ "Even if we get copyright protection for AI outputs, it will occupy a weird middle ground that feels different from what we're used to."
🗣️ "There's going to be pressure to change the law to make it align more with what copyright industries have been comfortable with, but it won't be easy."

📌 About Our Guest
🎙️ Mark Lemley | Stanford Law School
🌐 Article | How Generative AI Turns Copyright Upside Down https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4517702
🌐 Mark Lemley https://law.stanford.edu/mark-a-lemley/
Mark is William H. Neukom Professor of Law at Stanford Law School and the Director of the Stanford Program in Law, Science and Technology. He teaches intellectual property, patent law, trademark law, antitrust, the law of robotics and AI, video game law, and remedies, and he is the author of 11 books and 218 articles.

    9 min
  5. AI lab TL;DR | Jacob Mchangama - Are AI Chatbot Restrictions Threatening Free Speech?

    11/04/2024 · BONUS

    AI lab TL;DR | Jacob Mchangama - Are AI Chatbot Restrictions Threatening Free Speech?

🔍 In this TL;DR episode, Jacob Mchangama (The Future of Free Speech & Vanderbilt University) discusses the high rate of AI chatbot refusals to generate content for controversial prompts, examining how this may conflict with the principles of free speech and access to diverse information.

📌 TL;DR Highlights
⏲️ [00:00] Intro
⏲️ [00:51] Q1 - How does the high rate of refusal by chatbots to generate content conflict with the principles of free speech and access to information?
⏲️ [06:53] Q2 - Could AI chatbot self-censorship conflict with the systemic risk provisions of the Digital Services Act (DSA)?
⏲️ [10:20] Q3 - What changes would you recommend to better align chatbot moderation policies with free speech protections?
⏲️ [15:18] Wrap-up & Outro

💭 Q1 - How does the high rate of refusal by chatbots to generate content conflict with the principles of free speech and access to information?
🗣️ "This is the first time in human history that new communications technology does not solely depend on human input, like the printing press or radio."
🗣️ "Limiting or restricting the output and even the ability to make prompts will necessarily affect the underlying capability to reinforce free speech, and especially access to information."
🗣️ "If I interact with an AI chatbot, it's me and the AI system, so it seems counterintuitive that the restrictions on AI chatbots are more wide-ranging than those on social media."
🗣️ "Would it be acceptable to ordinary users to say, you're writing a document on blasphemy, and then Word says, 'I can't complete that sentence because it violates our policies'?"
🗣️ "The boundary between freedom of speech being in danger and freedom of thought being affected is a very narrow one."
🗣️ "Under international human rights law, freedom of thought is absolute, but algorithmic restrictions risk subtly interfering with that freedom. (...) These restrictions risk being tentacles into freedom of thought, subtly guiding us in ways we might not even notice."

💭 Q2 - Could AI chatbot self-censorship conflict with the systemic risk provisions of the Digital Services Act (DSA)?
🗣️ "The AI Act includes an obligation to assess and mitigate systemic risk, which could be relevant here regarding generative AI's impact on free expression."
🗣️ "The AI Act defines systemic risk as a risk that is specific to the high-impact capabilities of general-purpose AI models that could affect public health, safety, or fundamental rights."
🗣️ "The question is whether the interpretation under the AI Act would lean more in a speech-protective or a speech-restrictive manner."
🗣️ "Overly broad restrictions could undermine freedom of expression in the Charter of Fundamental Rights, which is part of EU law."
🗣️ "My instinct is that the AI Act would likely lean in a more speech-restrictive way, but it's too early to say for certain."

💭 Q3 - What changes would you recommend to better align chatbot moderation policies with free speech protections?
🗣️ "Let's use international human rights law as a benchmark—something most major social media platforms commit to on paper but don't live up to in practice."
🗣️ "We showed that major social media platforms' hate speech policies have undergone extensive scope creep over the past decade, which does not align with international human rights standards."
🗣️ "It's conceptually more difficult to apply international human rights standards to an AI chatbot because my interaction is private, unlike public speech."
🗣️ "We should avoid adopting a 'harm-oriented' principle to AI chatbots, especially when dealing with disinformation and misinformation, which is often protected under freedom of expression."
🗣️ "It's important to maintain an iterative process with AI systems, where humans remain responsible for how we use and share information, rather than placing all the responsibility on the chatbot."

📌 About Our Guest
🎙️ Jacob Mchangama | The Future of Free Speech & Vanderbilt University
𝕏 https://x.com/@JMchangama
🌐 Article | AI chatbots refuse to produce 'controversial' output − why that's a free speech problem https://theconversation.com/ai-chatbots-refuse-to-produce-controversial-output-why-thats-a-free-speech-problem-226596
🌐 The Future of Free Speech https://futurefreespeech.org
🌐 Jacob Mchangama http://jacobmchangama.com
Jacob Mchangama is the Executive Director of The Future of Free Speech and a Research Professor at Vanderbilt University. He is also a Senior Fellow at The Foundation for Individual Rights and Expression (FIRE) and author of "Free Speech: A History From Socrates to Social Media".

    16 min
  6. AI lab TL;DR | Jurgen Gravestein - The Intelligence Paradox

    10/21/2024 · BONUS

    AI lab TL;DR | Jurgen Gravestein - The Intelligence Paradox

🔍 In this TL;DR episode, Jurgen Gravestein (Conversation Design Institute) discusses his Substack blog post on the 'Intelligence Paradox' with the AI lab.

📌 TL;DR Highlights
⏲️ [00:00] Intro
⏲️ [01:08] Q1 - The 'Intelligence Paradox': How does the language used to describe AI lead to misconceptions and the so-called 'Intelligence Paradox'?
⏲️ [05:36] Q2 - 'Conceptual Borrowing': What is 'conceptual borrowing' and how does it impact public perception and understanding of AI?
⏲️ [10:04] Q3 - Human vs AI 'Learning': Why is it misleading to use the term 'learning' for AI processes, and what does this mean for the future of AI development?
⏲️ [14:11] Wrap-up & Outro

💭 Q1 - The 'Intelligence Paradox'
🗣️ "What's really interesting about chatbots and AI is that for the first time in human history, we have technology talking back at us, and that's doing a lot of interesting things to our brains."
🗣️ "In the 1960s, there was an experiment with the chatbot Eliza, which was a very simple, pre-programmed chatbot. (...) And it showed that when people are talking to technology, and technology talks back, we're quite easily fooled by that technology. And that has to do with language fluency and how we perceive language."
🗣️ "Language is a very powerful tool (...) there's a correlation between perceived intelligence and language fluency (...) a social phenomenon that I like to call the 'Intelligence Paradox'. (...) people perceive you as less smart, just because you are less fluent in how you're able to express yourself."
🗣️ "That also works the other way around with AI and chatbots (...). We saw that chatbots can now respond in extremely fluent language very flexibly. (...) And as a result of that, we perceive them as pretty smart. Smarter than they actually are, in fact."
🗣️ "We tend to overestimate the capabilities of [AI] systems because of their language fluency, and we perceive them as smarter than they really are, and it leads to confusion (...) about how the technology actually works."

💭 Q2 - 'Conceptual Borrowing'
🗣️ "A research article (...) from two professors, Luciano Floridi and Anna Nobre, (...) explaining (...) conceptual borrowing [states]: 'through extensive conceptual borrowing, AI has ended up describing computers anthropomorphically, as computational brains with psychological properties, while brain and cognitive sciences have ended up describing brains and minds computationally and informationally, as biological computers.'"
🗣️ "Similar to the Intelligence Paradox, it can lead to confusion (...) about whether we underestimate or overestimate the impact of a certain technology. And that, in turn, informs how we make policies or regulate certain technologies now or in the future."
🗣️ "A small example of conceptual borrowing would be the term 'hallucinations'. (...) a common term to describe when systems like ChatGPT say something that sounds very authoritative and sounds very correct and precise, but is actually made up, or partly confabulated. (...) this actually has nothing to do with real hallucinations [but] with statistical patterns that don't match up with the question that's being asked."

💭 Q3 - Human vs AI 'Learning'
🗣️ "If you talk about conceptual borrowing, 'machine learning' is a great example of that, too. (...) there's a very (...) big discrepancy between what learning is in the psychological terms and the biological terms when we talk about learning, and then when it comes to these systems."
🗣️ "So if you actually start to be convinced that LLMs are as smart and learn as quickly as people or children (...) you could be over-attributing qualities to these systems."
🗣️ [ARC-AGI challenge:] "A $1 million USD prize pool for the first person that can build an AI to solve a new benchmark that (...) consists of very simple puzzles that a five-year-old (...) could basically solve. (...) it hasn't been solved yet."
🗣️ "That's, again, an interesting way to look at learning, and especially where these systems fall short. [AI] can reason based on (...) the data that they've seen, but as soon as it (...) goes out of (...) what they've seen in their data set, they will struggle with whatever task they are being asked to perform."

📌 About Our Guest
🎙️ Jurgen Gravestein | Sr Conversation Designer, Conversation Design Institute (CDI)
𝕏 https://x.com/@gravestein1989
🌐 Blog Post | The Intelligence Paradox https://jurgengravestein.substack.com/p/the-intelligence-paradox
🌐 Newsletter https://jurgengravestein.substack.com
🌐 CDI https://www.conversationdesigninstitute.com
🌐 Profs. Floridi & Nobre's article http://dx.doi.org/10.2139/ssrn.4738331
🌐 Jurgen Gravestein https://www.linkedin.com/in/jurgen-gravestein
Jurgen Gravestein is a writer, conversation designer and AI consultant. He works at the CDI, the world's leading training and certification institute in conversational AI. He also runs a successful Substack newsletter, "Teaching computers how to talk".

    15 min
  7. AI lab TL;DR | Stefaan G. Verhulst - Are we entering a Data Winter?

    09/30/2024 · BONUS

    AI lab TL;DR | Stefaan G. Verhulst - Are we entering a Data Winter?

🔍 In this TL;DR episode, Dr. Stefaan G. Verhulst (The GovLab & The Data Tank) discusses with the AI lab his Frontiers Policy Labs contribution on the urgent need to preserve data access for the public interest.

📌 TL;DR Highlights
⏲️ [00:00] Intro
⏲️ [01:13] Q1 - 'Data Winter': Can you provide a brief overview of your concept of a 'Data Winter' and why you believe we are on the brink of entering one?
⏲️ [05:05] Q2 - Generative AI-nxiety: What are some of the most significant challenges currently hindering public access to social media and climate data, and the effects of Generative AI-nxiety?
⏲️ [07:49] Q3 - 'Decade for Data': Could you outline what the 'Decade for Data' initiative entails and how it could transform data stewardship and collaboration?
⏲️ [12:25] Wrap-up & Outro

💭 Q1 - 'Data Winter'
🗣️ "At the time of an AI summer, when everyone suddenly is excited about the potential of generative AI (...) for public interest purposes, (...) we are actually entering a data winter."
🗣️ "What I've witnessed the last few months, and that's mainly as a result of advances in artificial intelligence, is that we actually see a backtracking of the progress that we've made in society as it relates to opening up data for public interest purposes."
🗣️ "Social media platforms such as X, but also Facebook, have closed down access to some of their data for research and for data journalism purposes as well."
🗣️ "Science data, such as climate science data, which was typically open science, has now become commercialised and is becoming proprietary data enclosed for many in society."
🗣️ "The initial data that was available for training data has now also become much harder to access, a result of concerns that some of that data has been extracted without a return to the data holder."

💭 Q2 - Generative AI-nxiety
🗣️ "Some of the data that typically was available through APIs has now been closed off, and so some are calling this the post-API environment that we're currently in, where data that was easily available through an API is now actually much harder to access unless one pays for it."
🗣️ "New licensing is being used to actually shield off the data for public interest purposes as well. So there are a whole range of vehicles that exist to enclose data that actually makes it much harder to access it for reuse."
🗣️ "We see a decline in access to Wikipedia, a decline in people accessing Wikipedia, and a decline in people contributing to Wikipedia, mainly because they fear that whatever they contribute will be used as training fodder for generative AI purposes."
🗣️ "Initiatives like Wikipedia, which are to a large extent the main source of a lot of the training data of generative AI services, are currently also suffering from AI extraction because they are dependent on voluntary contributions by the audience and the participants."
🗣️ "As a result, we are entering a data winter, which if we are not careful (...) may actually affect the AI summer that we currently have as well."

💭 Q3 - 'Decade for Data'
🗣️ "I've been calling for, together with others, such as the United Nations University, a Decade for Data, which is a typical way the United Nations often operates, to feature a problem and then have a well-defined strategy to address that problem."
🗣️ "A Decade for Data would have multiple components, one being advancing data collaboration, where you actually have new models of data being shared, including data commons, which can be updated in the current AI environment."
🗣️ "We need a new, reimagined profession of data stewards: individuals or teams who have the sophistication and competencies to provide access to data in a systematic, sustainable, and responsible manner."
🗣️ "A Decade for Data would also involve rethinking data governance and embedding digital self-determination in data governance to go beyond the current paradox of consent, facilitating access in a way that aligns with perceptions, expectations, and preferences of communities."
🗣️ "Establishing a social license for reuse is key, where you understand the preferences and expectations of communities and individuals, translating that into a social license so that data can be reused in a way that is trusted and aligned with community expectations."

📌 About Our Guest
🎙️ Dr. Stefaan G. Verhulst | Co-Founder, The GovLab & The Data Tank
🌐 Frontiers Policy Labs | Are We Entering a Data Winter? https://policylabs.frontiersin.org/content/commentary-are-we-entering-a-data-winter
🌐 The Data Tank https://datatank.org
🌐 GovLab https://thegovlab.org
🌐 Dr. Stefaan G. Verhulst https://www.linkedin.com/in/stefaan-verhulst
Dr. Stefaan G. Verhulst co-founded several research organisations, including the GovLab (New York) and The Data Tank (Brussels). He focuses on using advances in science and technology, including data and AI, to improve decision-making and problem-solving, and has been recognized as one of the 10 Most Influential Academics in Digital Government globally.

    13 min
