Exeter Postgraduate Researcher Podcast

A PGR podcast, by PGRs for PGRs. Our PGR presenters interview a range of guests to explore the topics other PGRs want to know about. The episodes mix personal perspectives with practical advice. These interviews were conducted over MS Teams to enable more people to participate. We welcome all constructive feedback; if you would like to leave feedback on our 2025 series, please fill out our quick survey.

Episodes

  1. 04/15/2024

    Episode 3- Open Research- Prof Sabina Leonelli (Professor of Philosophy and History of Science)

    Prof. Sabina Leonelli (Professor of Philosophy and History of Science) talks to Dr Chris Tibbs, Research Data Officer at University of Exeter about open research, the use of Artificial Intelligence in research, and the importance of understanding the diversity of research environments when implementing open research practices. Podcast transcript Chris Tibbs: Hello and welcome. I'm Dr Chris Tibbs and I'm the University Research Data Officer, part of the open research team based in the library here at the University of Exeter. My role involves providing support for researchers across the university as they work with and manage their research data, and today I have the pleasure to be joined by Professor Sabina Leonelli, a Professor of Philosophy and History of Science at the University of Exeter. So welcome, Sabina. Just to start, would you like to tell us a little bit about the research area that you work in? Sabina Leonelli: Thank you and hello everyone. So, I'm interested in the dynamics of research and research processes. Why is it that people who work in science use the methods that they use, handle data in particular ways, decide to publish in particular ways? Why do they choose certain research goals, and how does that occur historically but also conceptually? And what are the social implications of those choices? Chris Tibbs: That's very interesting. So you're really looking at these sort of different approaches and the different methodologies that different researchers are taking and that's very interesting because obviously different research areas will have different approaches and methodologies that they use. 
Now one thing that I noticed that you're very interested in, based on your web profile, is obviously open science and openness in research, and the European Commission and the United Nations, among others, all use this term of open science and just so that everyone listening is clear, open science is the approach to research based on openness and co-operative working, and it really emphasises the sharing of knowledge, results and the tools as widely as possible. But I just wanted to point out also that obviously these approaches can apply to all research disciplines, not just science. And so, for example, we are the open research team. And so, I tend to regard open research and open science as synonymous. So, I just wanted to get your take on this, Sabina. Do you see these as separate terms, or do you use them interchangeably? Sabina Leonelli: I also tend to use them interchangeably, but I think it is very unfortunate that it’s the term open science that has gotten so much mileage in the English language because in the English language we are aware of the fact that it does tend to be taken to refer to the natural sciences, more rarely to the social sciences, and never to the humanities and the arts. And this is different for lots of other languages. I mean, most, I guess famously the term wissenschaft in German tends to encompass all of the research domains, including humanities and the arts. I’m very partial to that, partly because I think that we're in a moment where research is so interdisciplinary and the boundaries between domains are so blurred, that actually making strict distinctions between what counts as a humanist approach, and what counts as a natural science approach, or a mathematical approach is becoming more and more difficult. As of course in history it has been very difficult throughout. 
So yeah, so I'm very partial to the use of the idea of open research in English, but of course we tend to use the term open science a lot too, because this is, as you were saying, very well recognised by policymakers and by funding bodies and a lot of people working in academia more generally. Chris Tibbs: OK. Well, thank you for explaining that. And again, the reason I just wanted to confirm this is because I want to ensure that everyone listening can be clear about what we mean by the term open science and that they don't feel that this doesn't apply to them, maybe because they don't see themselves as a scientist. So that's what I just wanted to clear up: these practices do apply to all disciplines. Now moving on. Sabina, you hold many different roles and one in particular that I would like to mention is that you are the theme lead for the data governance, openness and ethics strand of the Exeter Institute for Data Science and Artificial Intelligence. So, given this particular role, I'd really be interested to hear your thoughts on how you feel artificial intelligence can play a role in the research process, and particularly around openness and open research. Sabina Leonelli: Yes, thank you. So, I guess openness lies at the heart of what it means to do research, no matter how you look at it, right. I mean, doing research basically means trying to answer a certain question, trying to solve a problem that you may have encountered in your everyday life and within the more scientific landscape, it means doing it in a way that's a bit more systematic, that is susceptible to scrutiny and can be evaluated by others. So, research and science are public enterprises pretty much by definition. If it was only something that, you know, the one individual does in their own room, then it wouldn't really be something that we count as being research or science. 
And in that sense, openness, the availability of the outputs of research, being able to discuss the methods one uses, the procedure one uses and make them available for scrutiny really are what defines the very idea of research. So, given this, of course, the fact that we now have an emergence of more and more artificial intelligence tools that can be pretty much directly applied to the research process affects the ways in which we think about openness, because this means accelerating the research process to some extent. It means the potential to automate some parts of it, or at the very least work together with machines so that some of the, if you want, hopefully at least more tedious tasks associated with research, the more repetitive ones, the ones that are more easily standardised, can actually be delegated to machines and then can be iteratively dealt with in collaboration with humans. So in that respect there is a strong temptation to think that bringing AI into research processes will almost automatically improve the openness of research because it will allow, and in fact it will be almost an incentive for, people to make their methods ever more transparent, to make their data more available, and to be more careful in noting down the procedures that they’re using and make them available to others, because all of those strategies make it easier to adapt research work to AI and to make it machine readable, if you want, so that machines can actually take over some parts of that work. The problem in assuming that AI automatically enhances openness comes at several levels, however. First of all, there is this problem, which I'm sure many people have heard about, of opacity in AI. 
The fact is that a lot of the reasoning that machines go through to produce certain outputs, so the type of algorithms that are used, particularly machine learning, tends to become less and less transparent and more and more opaque as time goes by, precisely because the machine is doing operations that humans wouldn't quite do or wouldn't be able to follow in the same way, and we can't quite track every single step that the machine is making in that sense. Then actually the system that is AI powered becomes by definition less open, because it's less obvious how we read the system, how we make it more transparent, more scrutinizable and more open for review, given that there are all these parts of the system which are not necessarily intelligible to humans. So that's one issue that is happening in the area. Another very big issue is the fact that many of the providers of artificial intelligence technologies and particularly tools that are then applied in research, we can only think about large language modelling and tools like ChatGPT, which is produced by a company which, contrary to its name, is not publicly funded but is a private company, OpenAI. Many of those tools are privately funded. The way in which they operate is even less transparent because a lot of the algorithms that are used are actually trademarked and are not available for public scrutiny. A lot of the training data that is used to refine those algorithms is also not necessarily transparent; in some cases it is not even entirely clear that the data are, you know, in fact right to use. They tend to have been data scraped off the internet in a variety of ways that may or may not be ethically acceptable. 
And so, we are in a situation where whenever we pick a tool from the internet thinking, oh, great, this is going to help me to do my bibliography; this is going to help me to write my essays; this is going to help me to search for, you know, literary sources on a particular topic, we are immediately given data and relying on tools that are not openly accessible, that have been developed in a way which is not immediately scrutinizable. And in fact, it may be using some of our own information in a way that is not open. So, let's just say that there are quite a few questions that are open around whether the use of AI in research in fact is favouring openness, and whether that's going to happen more and more in the future or in fact it’s going to have the opposite effect. Chris Tibbs: Wow, that's really interesting. I really love that sort of yes, you might think that it's going to help make research more open, but actually in fact it's not so clear and so that is really interesting and I think, the one thing I just wanted to add is about obviously any data that's being used by AI systems, the data need to be well maintained, well documented. You need good data going into training t

    29 min
  2. 11/16/2023

    Episode 2- Open Research- Dr Gavin Buckingham (Associate Professor in Public Health and Sport Sciences)

    Dr Gavin Buckingham (Associate Professor in Public Health and Sport Sciences) talks to Dr Chris Tibbs, Research Data Officer at University of Exeter about the different types of research data he works with and best practices for managing research data during your project. Podcast transcript Chris Tibbs: Hello and welcome. I'm Dr Chris Tibbs and I'm the University Research Data Officer, part of the open research team based in the library here at the University of Exeter. My role involves supporting researchers across the university as they work with and manage their research data, and so this episode is going to be all about research data and how best to look after it and manage it during your project. And to discuss all of this, today I have the pleasure to be joined by Dr Gavin Buckingham, an Associate Professor in Public Health and Sport Sciences here at the University of Exeter. So just to start with Gavin, would you like to tell us a little bit about your research and the different types of data that you work with? Gavin Buckingham: Hi, there, Chris. Yeah, I'm a cognitive psychologist by training, and I'm interested in human perception and human motor control. And I've been looking at this in the context of measuring the movements and forces people apply to pick objects up, and more recently I've been looking at this in the context of immersive virtual reality as well. Now, most of this data takes the form of pretty simple time streams, time series of data, so numbers representing forces or positions of things in multiple dimensions, and their expression over time. So many thousands of lines of data potentially that we then take maybe the largest value or the value at some critical other time points and that reflects some aspect of human behaviour. So that pretty simply is really what it is that we deal with here. 
Chris Tibbs: So thinking about all those types of data that you're working with, I mean you mentioned, like numerical time series data. I just want to point out that, you know, data can also mean a wide variety of other types of data and many people might not think that they work with data. But generally, when I refer to data, you know, I'm thinking about any sort of information, evidence, materials that are being collected and used for that research. So I’d just like to hear your thoughts on, so when you're thinking about your data and why it's important that you look after your data and you manage your data in terms of helping your research and also then potentially making that data available. Gavin Buckingham: Yeah, it's a really interesting question because the pipeline that goes from the stuff that comes out of the apparatus that I used to capture people's data to the things that are subsequently reported in the paper, that's a pretty lengthy pipeline that has many different steps. And those steps can be fairly clearly articulated, but being able to show the consequences of each of those steps, I think is a really key part in terms of people being able to eventually understand your data and make sense of it and use it in other sorts of ways and I really feel that's the narrative I feel most passionately about in many ways. I'm perhaps, slightly selfishly, I'm not so interested in other people finding mistakes that are present in my data, God forbid, but I'm more interested in this resource that was collected that could potentially be a useful thing for other people in ways that I cannot even really imagine. That for me is the really big value I see in my dataset and I work with clinical populations. 
I work with children, with older adults, typically developing university aged people, all of whom have interesting ways that they interact with the world around them that you know could feed into hitherto unforeseen mechanisms or rehabilitation or technological advances and, you know, I really see sort of the value of data just sitting there waiting for someone to be able to harvest in that way. Chris Tibbs: Yeah, all of this sort of potential that's in that data, that you know, doing analysis that are just completely irrelevant, that are completely separate from your research. So when did you sort of first start thinking about making, like managing your data, to make it available so that others could have it, and be able to analyze it? Was this sort of something that you had a discussion with, maybe your supervisor as a PhD student? Was this something that, you know, you sort of just picked up on sort of later during your career? Gavin Buckingham: Yeah. When I was a PhD student and postdoc, this wasn't really part of the narrative at all. There was no real sense that this is what you would do, but it was actually more to do with the experimental and analytical code: the MATLAB files in my case that I fairly vividly remember asking someone if I could use the MATLAB files to run an experiment of my own, and they're like, well, these were developed in collaboration with my colleagues and it cost money to get these developed, so probably not. And I was sort of thinking to myself, that's a bit of a disappointing perspective given that this doesn't directly earn anyone any money and gate keeping it from me isn't stopping you getting the benefit from them. 
So when I got my first lectureship, I was given, as part of my start-up contract, a research system to help develop the code that would underpin the data collection in my lab and I was sort of very clear in my head that data will be available to everyone and I started creating a wiki from my lab webpage and you know a lot of this is lucky me to have the resources and the skilled person available to do this and set this up from the beginning. But really that was kind of the key, the key step as far as I was concerned. You know, once all of this MATLAB code to control the data acquisition unit in the force transducers that underpinned all of my research at the time was up online. That was number one, a really nice way for me to stay on top of something that someone else had written for me, which was a new experience for me anyway. But also to share with the world and you know, I mean sort of going forward since then I've had seven or eight people set up their labs with that code and it's a pretty niche research field, but it feels really nice to know that that code has been used in this way for this particular purpose. And then from then the sharing of data kind of felt like a pretty natural step once that became part of the narrative on social media in particular; seeing people talk about this on Twitter has been a really formative part of my education in this area. Chris Tibbs: That's really interesting and just picking up on something. So you mentioned this wiki for your lab. So, this is obviously something that you discussed with your team and with the PhD students that you supervise. So I mean, you made a concerted effort that this would be part of this. Obviously, they're learning when you're helping them to develop as researchers on their own. You made a concerted effort that this would be part of that process? Gavin Buckingham: Yes, although maybe perhaps not as aggressively as one might imagine. 
I certainly don't mandate things like data sharing or sharing of code, because at the end of the day, particularly if your future life is not likely to be outside of academia and you have potential intellectual property issues, or you want to display your own evidence of your expertise, that's done in very different ways in very different fields. So I encourage and I support my trainees to provide basically everything as open as it possibly could be, but I'm not that interested in mandating it to them. As it stands, they've been even more enthusiastic in their uptake of this than I have and you know, certainly some of my PhD students have improved my own nascent processes quite substantially and taught me things and do stuff a lot better than I'm able to do as it stands. Chris Tibbs: So do you have any tools or techniques that you could share in terms of, so some of these examples of where you and students from your lab are sort of building-in these sort of best practices? You mentioned the wiki, you mentioned about data sharing. So is there any, you know, like sort of examples of like a tool or you know something that you could sort of just share, some sort of, this is one thing that we have done in our lab? Gavin Buckingham: Every project that gets up and running in my lab, there's an Open Science Framework (OSF) page created for it. That Open Science Framework page might exist as nothing other than a place to put a preprint of the paper at the point of publication. So I know that everyone has access to at least the version of the scientific outputs, which I feel very, very strongly about. That seems like a complete no brainer, zero effort thing to happen. Oftentimes that's accompanied by a pre-registration document, be it a version of the introduction that we'd sort of hashed out together, me and the trainee, or a template from As Predicted or something like that. 
Eventually, this is also often populated with individual participant data and then the summary statistics that would have been used to calculate the F ratios and P values and things like that, and the statistical analysis and the supplementary materials that would go alongside the paper as well. So it becomes just this wonderful, convenient storage place to segregate everything to do with that particular research project, which as I've progressed through my career and I am working concurrently on what feels like 1000 different things at the same time, it's incredibly, I would say essential. An essential part of my practice, because otherwise I'd be like relying on my, uh, incoherent filing system to keep track of everything, whereas now I can look in my OSF page and all the things that are shared with me and capture a huge amou

    25 min
  3. 08/15/2023

    Episode 1- Open Research- Dr Eilis Hannon (Senior Research Fellow in the Complex Disease Epigenetics Group at the University of Exeter Medical School)

    Dr Chris Tibbs, Research Data Officer at University of Exeter, discusses research data and how best to manage that data during your project with Dr Eilis Hannon, Senior Research Fellow in the Complex Disease Epigenetics Group at the University of Exeter Medical School. Podcast transcript Chris Tibbs: Hello and welcome. I'm Dr Chris Tibbs and I'm the University Research Data Officer, part of the Open Research team based in the library here at the University of Exeter. So my role involves supporting researchers across the university as they work with and manage their research data, and so this episode is going to be all about research data and how best to manage that data during your project. And to discuss all of this today, I have the pleasure to be joined by Dr Eilis Hannon, a senior research fellow in Clinical and Biomedical Sciences here at the University of Exeter. So Eilis, would you like to tell us a little bit about your research, what it involves and the different types of data that you work with? Eilis Hannon: Yes. Well, thank you very much for inviting me along today. So I'm based in the complex disease epigenetics group and we have a group of mixed modalities. We've got wet lab scientists and dry lab scientists, like myself. So we generate and analyze quite a lot of genomic data. So we're primarily interested in the brain and modelling gene regulation in the brain and we're in a really exciting time where there are so many different technologies and experiments that we can take advantage of, that the quantity of data we've started to generate has just kind of exploded. So from one single sample, we can have, you know, 4, 5, 6 different experiments and kind of layers of data. And so what I'm quite interested in doing is trying to integrate those different layers together. 
So a lot of what I'm working with is experimental data, but because a lot of these technologies are quite new, we're often developing new methods to analyze them in parallel. And so what we also do sometimes is simulate data where we kind of know what it looks like. We know what the outcome should be to kind of test and develop methods. So it's quite a broad spectrum of different data types. Chris Tibbs: Yeah. So you mentioned it there, right? So you maybe have simulated data, you've experimental data, and so I just wanted to pick up on the point here when we're talking about data and this obviously might mean different things to different people. And so if you're listening to this discussion and thinking, oh well, I don't work with data or this doesn't apply to me, then I just want to really make clear that when I refer to data or research data, it really means all of the information or the evidence or the materials that are generated or collected or being used for the research, and so that we're clear about data and what it refers to. Why is it so important to manage this data effectively? I mean, you talked about you're producing a large quantity of data, so I'm guessing that's one of the reasons why it's important to look after it. Eilis Hannon: Yes. So from my point of view, efficiency in terms of processing that data in, I mean you know if it wasn't organised in a kind of sensible or a kind of pre-planned format, then it would be incredibly challenging to work with, so from you know, we take advantage of the high performance computing available at the University and so to do that efficiently, we need to kind of have some pre-described format for the data. But there's also ethical implications. So, you know, we're working on data generated ultimately from a piece of human tissue. So we have requirements in terms of how we look after that data, what we do with it. Who uses it and how? 
So we need to make sure that you know our data is organized such that those requirements can be met. But also, you know, one of the really nice things about what we do is from one experiment you can answer lots and lots of different research questions. So different people within the research group will be taking advantage of the same dataset. And to, you know, to really maximize that utility, we need to, you know, organize it in a way that we can find it. We know what's what. And we can really reap the benefit of that initial kind of financial investment. Chris Tibbs: Yeah. So it's obviously clear, especially if multiple people are working, doing different analysis on the same data. It's obviously important to know what the data are and make sure that they're obviously described and who's doing what on the data, and version control, I imagine is something that's very important for you. Like, it's clear that the data are fundamental for the research, right, and it doesn't matter if you have, you know, the most sophisticated methodology to analyse the data, if the data are not described or the data are inaccurate then your results are not going to be good. They’re going to be inaccurate. They’re not going to be clear. So this is something obviously that you're doing at the minute, you're managing your data. When did you really first start thinking about the idea of, you know, managing your data, particularly with the aim of potentially making it available to others to validate or to build upon your research? Was this something that your supervisor discussed with you as a PhD student or was this something that you sort of picked up later on in your career? 
Eilis Hannon: So during my PhD, which I did in Cardiff, I was using publicly available data and so I had quite a naïve, I guess, view of kind of experimental work and when I came to Exeter and joined a team where we generated the data we analyzed, you suddenly start to realize that, you know, of course, experiments aren't perfect. Of course they don't work as expected all the time. And you know, I gained a real insight at that point because obviously questions about how we use the data, how we process the data and how we ultimately share the data became a lot more relevant to my work. But I also gained a huge insight, you know, being much more aware of the whole kind of research process from kind of study design, generating data, analysing the data and publishing it, you know, kind of what the requirements were and also the kind of challenges with data generation, and so that was, you know, I strongly recommend it to anybody who sees themself more as an analyst, that actually the insights you gain from working closely with the people that generate the data are just unfathomable really. It really opens your eyes and gives you a much, I guess I think much more holistic view of research. Chris Tibbs: It's really interesting how your perspective changed from someone who's just analyzing the data to someone who actually is experimenting and generating the data, right? That's a really interesting view. When you start to be someone who's producing data and potentially sharing it, then it's a lot more important to think about all of these processes. So talking about these processes of looking after the data, I mean what sort of tools or techniques would you recommend to someone who's interested in, you know, making sure that they manage their data effectively or looking after their data? Eilis Hannon: So I think, forward planning where you can. 
Thinking about where you're ultimately trying to get to in terms of you know what format do you need the data in to do the analysis that you want to do? But also thinking about kind of, particularly working with large datasets like we tend to, we can't store kind of multiple iterations. We need to be quite practical about what are the core stages that we need to save, and actually if you sit down and think about it, for us the most important parts of the data are the raw data and then our analysis scripts, because from there we can recreate anything that we've kind of done after that point, if we were to lose it in some kind of, you know freak event or something. It's very tempting to hoard these kind of intermediate datasets, but often they actually make your life much harder because you can't actually remember at what stage each file relates to. And so the kind of more streamlined you can be, in terms of what you save and what you keep, does actually make management of these data much easier and, you know, clear records in terms of having scripts can also help you navigate that process. And as you become kind of more ingrained in your project, you do start to realize what the kind of critical points are that you want to kind of save and keep a record of. Chris Tibbs: So you mentioned two points there that I'd like to pick up on. First of all forward planning, which I completely agree is very important. And so I just wanted to at this point highlight sort of the importance of a data management plan to do exactly that. And so this is the plan that you develop sort of at the beginning of the project and thinking about all of those things that you talked about in terms of what it is you want to do with the data and trying to think about them from the beginning so that you can identify potential obstacles or issues and then try and plan around them to mitigate them. So that's definitely very important. 
And then the other thing you picked up on was about code and reproducibility. So would you, would you say that for someone who's, you know, working in a similar area or with similar types of data that it's really a requirement to learn a programming language, so Python or R, to really ensure that not only is their analysis reproducible for someone else, but also for them. So like you mentioned, then you just need essentially the raw data and the code, and you can reproduce the analysis. So would you say that’s sort of a requirement? Eilis Hannon: I would strong

    26 min
