FLOSS for Science

FLOSSforScience

Interviews with scientists who are using or developing free and libre open source software

  1. 04/27/2021

    EP031 GNU licenses

    Note: This interview was recorded in the summer of 2020. However, due to the pandemic, we could not release the episode in a timely manner. Therefore, the current status of the FSF and recent events are not discussed in this episode.

    In episode 31, we interviewed Craig Topham from the Licensing and Compliance Team of the GNU Project about GNU software licenses. We started by discussing his involvement in the compliance team at the Free Software Foundation (FSF) and what got him interested in the topic of free software. The next portion of the interview centered on the GNU Project, with an emphasis on the GNU software licenses. We went through the GPL, LGPL, AGPL and GFDL licenses to explain some of their differences and why you may want to use one instead of another. We asked questions about the specifics of licensing your code in the context of scientific software and the issue of license proliferation. Some of the differences between the versions of the GPL were presented later in the discussion to show the improvements brought by version 3 with regard to the compliance and patent sections. We asked him about his take on the philosophical differences between GNU-style licenses and the MIT/BSD licenses with regard to the debate between user and developer freedom. We followed by going through some myths surrounding the GNU licenses and a general discussion about freedom and privacy. We finished the interview with our usual quick questions.

    49 min
  2. 09/03/2020

    EP030 Spack: a package manager for supercomputers

    In episode 30, we interviewed Todd Gamblin from the Lawrence Livermore National Laboratory about the Spack project. We discussed his current research project along with his involvement in Spack. We discussed at length the philosophy of Spack, some usage patterns, its capabilities for package management on HPC clusters as well as standalone computers, and which operating systems it currently supports. Todd shared with us his opinion on the trend toward containerized workloads to achieve reproducible science and why it may not be the goal we need to set. He highlighted for us the similarities and differences between EasyBuild and Spack as well as the origin of those differences. We finished the interview with our usual quick questions.

    00:00:00 Intro music
    00:00:17 Introduction
    00:00:36 Introducing Todd Gamblin
    00:00:58 His current research topics
    00:01:23 Spack as official duties
    00:01:43 Spack usage at Lawrence Livermore National Laboratory
    00:02:01 Other research projects
    00:02:47 Profiling in HPC
    00:04:24 His role as leader of software packaging technology for the exascale computing project
    00:04:58 One-minute elevator pitch for Spack
    00:05:34 Spack's usage philosophy compared to other package managers
    00:06:59 Installation from source code or binary?
    00:07:28 Spack's usage in the TOP500 supercomputers
    00:07:49 Geographical distribution of users
    00:08:18 Number of packages in the repo and some examples
    00:09:05 Managing computer clusters with Spack's automation capabilities
    00:11:04 Module files in Spack
    00:12:32 Syntax of a Spack package file
    00:13:43 Configuration of compiler flags
    00:15:00 Importing Python libraries in the Spack files
    00:15:48 The procedure to submit a package
    00:16:27 Review process for new packages
    00:17:34 Reasons for rejection of Spack packages
    00:18:01 Operating systems supported by Spack
    00:18:23 WSL and Spack
    00:18:58 Restricting packages to certain hardware and software configurations
    00:20:04 Build testing and nightly builds
    00:21:28 Working with containers in a Spack environment
    00:22:25 Deploying prebuilt containers
    00:23:05 About the "universality" of containers
    00:24:16 His opinion on containerized applications for reproducible science
    00:26:17 Spack's log file to document reproducibility
    00:27:13 Reproducing older results
    00:28:10 Specifying requirements on compilers
    00:30:39 Post-installation verification test
    00:31:10 Using Spack on a standalone computer instead of HPC systems
    00:32:56 Differences between EasyBuild and Spack
    00:34:24 EasyBuild in the TOP500
    00:34:49 Transitioning between EasyBuild and Spack
    00:35:38 Other alternatives
    00:36:23 Using EasyBuild and Spack on the same system
    00:38:36 When did the project start?
    00:39:53 External contributions to Spack
    00:40:53 How many core developers?
    00:41:30 Organization of the community and governance model
    00:43:06 Who decides which package is accepted in the repo?
    00:44:38 Spack's choice of software license
    00:47:09 Todd's vision about the importance of FLOSS for the openness of science
    00:48:13 Possible negative impacts of FLOSS
    00:48:58 Most notable recent scientific discovery
    00:49:14 Favourite text processing tool
    00:49:25 A topic in science about which he recently changed his mind
    00:49:58 Anything else we forgot to ask?
    00:50:09 How to contact Todd
    00:50:46 Conclusion

    52 min
  3. 07/01/2020

    EP029 Distributing Python packages with setuptools

    In episode 29, we interviewed Jason R. Coombs from the setuptools project. We started with a discussion about his background and his interest in Python and other programming languages. Following that, we had a thorough discussion about setuptools. We covered topics such as how he got involved in the project, the nature and composition of a Python package, why packaging your code can be important even for small projects, the hidden complexity of binary packages in the Python Package Index and how to maintain compatibility between Python versions. We also had a brief segment about the security aspects of Python packages. He told us how to start contributing to the project and where to discuss Python packaging. We then followed with a general discussion about FLOSS in science and the problem of long-term maintenance in academia. We concluded the interview with our usual quick questions.

    00:00:00 Intro
    00:00:23 Introducing Jason R. Coombs
    00:01:28 The first programming languages he learned and how he got into Python
    00:03:46 New interesting programming languages
    00:05:07 His favourite past Python projects
    00:06:53 His one-minute elevator pitch for setuptools
    00:08:00 The relation between setuptools, pip and Anaconda
    00:10:43 How he got involved with the setuptools project
    00:14:43 What is a Python package?
    00:16:07 What can be included in a package?
    00:16:36 At which point is it beneficial to create a package?
    00:18:04 Managing compatibility with multiple versions of Python
    00:20:33 Advantages of packages for small projects
    00:22:46 How much work is required to create a package?
    00:25:05 Files required to create a Python package
    00:27:45 Licenses and readme for Python packages
    00:30:51 The nature of distribution archives
    00:31:27 Compatibility of binary archives
    00:32:39 Eggs and wheel files
    00:34:32 Dealing with non-portable packages in the Python Package Index across multiple operating systems
    00:37:49 Uploading packages to the Python Package Index
    00:39:12 Review for broken or malicious code
    00:40:08 Vulnerability from package removal in the Python Package Index
    00:43:24 Package name collisions
    00:45:13 How many packages are in the Python Package Index
    00:45:25 Alternatives to the main Python Package Index
    00:46:35 Other packaging tools
    00:47:39 How many developers are involved in the project
    00:48:31 Communication channels and discussions about Python packaging
    00:49:53 Openings for new contributors
    00:50:59 Skills required to contribute
    00:52:24 The challenge of long-term maintenance of packages in academia
    00:55:43 His vision about the importance of FLOSS for the openness of science
    00:59:18 Disadvantage of using FLOSS
    01:01:24 The most notable scientific discovery in recent years
    01:02:13 Favourite text processing tool
    01:03:23 A topic in science about which he recently changed his mind
    01:04:50 Contact information
    01:05:23 Conclusion

    1h 7m
  4. 05/06/2020

    EP028 NumFocus: A Nonprofit Supporting Open Source

    In episode 28, we interviewed Leah Silen from the NumFocus organization. She introduced us to the goals and the mission of the organization. We then had a discussion about the different levels of support provided by the organization to its member projects. She told us about the legal, financial, technological and logistical support that NumFocus can provide. Following that, we asked her about the revenue sources of the organization as well as the possible influence of corporate sponsors over the decisions and governance of the organization. We also discussed the requirements to become part of NumFocus, including details about the application process. We had a brief discussion about the history of the project and the evolution of the scope of projects that are part of the organization. After discussing the governance of the organization, we concluded the interview with our usual questions.

    00:00:00 Intro
    00:00:18 Introducing Leah Silen
    00:02:28 Goals and mission of NumFocus
    00:03:06 Examples of supported projects
    00:03:39 Status of sponsored and affiliated projects
    00:05:04 Advantages of one status over the other
    00:05:48 Legal challenges for open source scientific projects
    00:07:19 Financial support for scientific open source projects
    00:10:13 Assistance to apply for external grants
    00:11:01 Paying developers from outside of the US?
    00:11:43 Revenue sources for NumFocus
    00:12:21 Levels of corporate sponsorships
    00:13:14 The influence of corporate sponsors
    00:13:56 Motivations of corporate sponsors
    00:15:02 Some of the sponsors of the NumFocus organization
    00:16:05 Technological support for projects
    00:17:03 Events previously supported by the organization
    00:18:09 The kind of support that can be provided for events
    00:19:22 Requirements for new projects
    00:21:10 Clarification about the meaning of being a scientifically oriented project
    00:23:11 Requirements about the team size and strong governance within projects
    00:24:58 The application process
    00:26:18 Duration of support
    00:26:44 Timeframe to receive a response for an application
    00:28:30 Feedback in the case of rejection
    00:29:12 Are there downsides of becoming a part of NumFocus?
    00:30:32 Additional administrative overhead?
    00:32:01 Location of NumFocus staff members
    00:33:00 Foundation of NumFocus and initial projects
    00:34:05 Opening to projects outside of the Python ecosystem
    00:34:54 Favourite project?
    00:35:46 Initial role in NumFocus
    00:36:53 Term duration for positions at the board of directors
    00:37:24 Selection process for the board of directors
    00:38:31 Leah's vision about FLOSS and its importance for the openness of science
    00:39:04 Negative impacts of FLOSS
    00:39:33 Most notable scientific discovery in recent years
    00:39:59 Favourite text processing tool
    00:40:37 A topic in science about which she changed her mind
    00:41:36 Anything else we forgot to ask about?
    00:42:54 How to contact Leah Silen
    00:43:14 Outro

    45 min
  5. 04/08/2020

    EP027 Scientific Computing with SciPy and NumPy

    In episode 27, we interviewed Ralf Gommers from the NumPy and SciPy projects. We started our discussion by talking about his past research experience as a physicist and his transition to open source software and programming. This led him to get involved in projects such as PyWavelets, NumPy and SciPy. Following that, we had a great discussion about NumPy, its many features, its target audience and its performance. We learned why NumPy is not included in Python's standard library and how it overlaps with SciPy. We also compared Matlab to the combination of NumPy and Python and discussed how users could transition to this open source solution. We then had a brief discussion about SciPy and the features it provides. Ralf told us about the positive results from the projects' previous participations in Google's Summer of Code and Season of Docs. We discussed how to reach the project and the many kinds of contributions that they are looking for. We talked about the importance of FLOSS for science and the attribution of research output. We finished the interview with our classic quick questions and a reflection from Ralf about the need for more sustainability in open source software development, as volunteer effort may not be sufficient in the future.

    00:00:00 Intro
    00:00:18 Introduction
    00:00:33 Introducing Ralf Gommers
    00:02:05 Research during his PhD and postdoc
    00:03:20 When he started to use open source tools
    00:03:52 Learning to code
    00:04:39 PyWavelets, another side project he likes
    00:05:55 His elevator pitch for NumPy
    00:06:55 Vector arrays in Python before NumPy
    00:07:49 How he got involved in the NumPy project
    00:10:13 Target users for NumPy
    00:11:36 NumPy as part of the standard library?
    00:13:24 Features provided by NumPy
    00:14:22 Major differences between Python's built-in list and NumPy's array
    00:16:01 Structured data
    00:16:45 Why appending a row to an array is made hard
    00:18:09 Multithreaded code with NumPy
    00:19:48 Distributed array processing
    00:20:50 GPU computation with Python and NumPy
    00:22:16 Linear algebra functions in NumPy
    00:23:25 Overlap between SciPy and NumPy for linear algebra
    00:23:55 Python speed as an interpreted language
    00:25:43 Python with NumPy compared to Matlab
    00:28:07 How easy is the transition between Matlab and Python with NumPy
    00:29:26 Performance difference between Matlab and Python
    00:31:00 Commercial applications of NumPy
    00:32:15 Contributions from the industry and incentives to contribute
    00:34:10 Elevator pitch for SciPy
    00:35:37 Overview of some of the submodules in SciPy
    00:38:11 The size of the communities
    00:39:33 Participation in Google Summer of Code
    00:40:24 Participation in Google Season of Docs
    00:41:48 Communication channels in the project
    00:43:25 Where to ask for support?
    00:44:48 Possible contributions
    00:46:25 Skills useful to contribute to the NumPy project
    00:48:12 Identifying possible contributions
    00:48:52 The importance of FLOSS for science
    00:52:02 Possible negative impact of FLOSS on science
    00:52:49 Crediting contributions in science
    00:53:42 Most notable scientific discovery in recent years
    00:54:49 His favourite text processing tool
    00:55:30 Volunteer effort may not be sufficient anymore
    00:56:58 Contact information for Ralf Gommers
    00:57:27 Outro

    59 min
  6. 03/04/2020

    EP026 Data Analysis with pandas

    In episode 26, we interviewed Bhavani Ravi about the Python data analysis library pandas. After a brief introduction about her use of machine learning models for pharmaceutical research, we talked extensively about pandas. She told us how important pandas is for her everyday tasks and about the strict quality standards of the project. We talked about the features provided by pandas and its compatibility with other Python libraries. We then discussed the importance of FLOSS in her industry and how companies are contributing back to important projects. She shared with us her experience as a first-time contributor to pandas and how newcomers can find good first issues. We finished the interview with our usual quick questions.

    00:00:17 Introduction
    00:00:26 Introducing Bhavani Ravi
    00:00:49 Using machine learning models for pharmaceutical research
    00:02:46 How she got involved in the pandas project
    00:04:29 Her elevator pitch for pandas
    00:04:43 How she uses pandas in her everyday job
    00:05:24 What does pandas bring that is lacking in basic Python
    00:06:53 Preparing data for machine learning algorithms
    00:08:12 The performance of pandas
    00:09:21 Data formats supported by pandas
    00:11:03 Data structures provided by pandas
    00:11:42 Data analysis tools provided by pandas
    00:12:32 Using pandas data structures with scikit-learn
    00:12:55 Plotting data from pandas
    00:13:39 Transition to Python version 2
    00:14:51 Commercial usage of pandas
    00:15:16 Companies contributing back to pandas
    00:16:02 Exposure of students to pandas
    00:16:42 Tutorials to start with pandas
    00:18:26 Python library dependencies of pandas
    00:18:55 Main communication channels
    00:19:44 Her experience contributing to pandas
    00:21:14 Skills to contribute to the project
    00:21:49 List of good first issues
    00:22:21 Tasks for non-programmers
    00:23:12 FLOSS and the industry
    00:24:16 The most notable scientific discovery in recent years
    00:24:33 Her favourite text processing tool
    00:25:06 Anything else?
    00:25:38 How to contact Bhavani
    00:25:57 Outro

    28 min
  7. 02/05/2020

    EP025 FreeCAD, a 3D Parametric Modeler

    In episode 25, we interviewed Kurt Kremitzki about the parametric 3D modelling tool FreeCAD. After discussing his previous experiences with CAD software and how he got involved in the FreeCAD project, we asked him about the current development status of the project before digging deeper into a few of the workbenches offered by FreeCAD. We also compared FreeCAD to LibreCAD and QCAD for applications requiring only 2D drawing instead of parametric 3D models, and we discussed compatibility with commercial CAD systems and standard exchange file formats. We were pleased to learn about the development status of a stable topological naming engine, which paves the way for the integration of an official assembly workbench in future releases. We then discussed the spread of FreeCAD in companies and universities as well as ways to contribute to the FreeCAD project. We finished the interview with our usual quick questions and with a mention of their recent presentations at FOSDEM 2020.

    00:00:18 Introducing Kurt Kremitzki
    00:02:16 How he got involved with FreeCAD
    00:03:22 His previous CAD experience before working on FreeCAD
    00:04:35 One-minute elevator pitch for FreeCAD
    00:05:50 Current general development status of FreeCAD
    00:07:12 BIM with FreeCAD
    00:09:24 What are workbenches in FreeCAD?
    00:10:46 Core FreeCAD workbenches
    00:11:40 Technical drawing with FreeCAD
    00:13:44 FEM libraries integrated within FreeCAD
    00:16:15 Multiphysics simulations
    00:18:16 Model update recalculations
    00:19:04 Technical drawings and annotations
    00:19:49 FreeCAD for 2D CAD drawing vs other FLOSS alternatives
    00:21:08 Compatibility with commercial CAD systems and standard exchange file formats
    00:23:41 Performance of STEP file conversion
    00:24:54 FreeCAD's native file format
    00:25:44 Version control with FreeCAD
    00:27:01 File formats that are supported by FreeCAD
    00:29:16 Integration of Python in FreeCAD
    00:30:56 Assemblies with FreeCAD
    00:33:29 Stable topological naming
    00:35:10 Manual approach for static assemblies
    00:36:30 When to expect a stable assembly workbench
    00:37:20 How to test assemblies right now
    00:38:33 FreeCAD's software license
    00:39:16 Companies using FreeCAD
    00:39:42 Universities using FreeCAD
    00:40:16 FreeCAD's use in science and citations of FreeCAD
    00:42:29 How many people are involved in the project
    00:43:04 Main communication channels
    00:43:54 How to contribute to FreeCAD
    00:45:45 Kurt's vision of FLOSS and its importance for the openness of science
    00:47:13 Most notable scientific discovery in recent years
    00:47:47 Favourite text processing tool
    00:48:11 A topic he changed his mind about in science
    00:48:28 FreeCAD at FOSDEM 2020
    00:49:33 How to contact Kurt
    00:49:54 Outro

    52 min
  8. 12/04/2019

    EP024 UK RSE and Software Sustainability

    In episode 24, we interviewed Simon Hettrick, Professor at the University of Southampton in the UK. We started the discussion by asking him about his transition from developing high-power lasers to founding the research software engineers (RSE) association and how his experiences led him to his current position. We then discussed the roles of RSEs in research and how funding for RSEs has evolved over time. The discussion moved on to the RSE association, its growth over time, branches in other countries and local events. We discussed how the relation between FLOSS and more sustainable research software is not always clear and how more work is needed in that area. After talking with him about the insufficient preparation that students receive during their undergraduate studies with regard to the tools needed to tackle research software development, we finished with our usual quick questions.

    00:00:00 Intro
    00:00:18 Introduction and Simon Hettrick's presentation
    00:00:56 His academic status at the University of Southampton
    00:01:53 His transition from developing high-power compact lasers to RSE
    00:03:31 About his PhD and general comments about PhD defenses
    00:04:21 Any relation between lasers and his current research area?
    00:07:57 One-minute elevator pitch for UK RSE
    00:08:32 The growing importance of software and the effect on funding
    00:14:03 Defining what an RSE is
    00:17:35 How many RSEs in the UK?
    00:18:20 The state of preparation of the research community for Brexit
    00:20:05 When was the RSE association founded?
    00:20:16 How to become a member of RSE UK and the growth rate of the association
    00:22:44 Other RSE branches
    00:24:42 Relations between RSE associations
    00:25:40 Regional RSE organizations and RSE groups
    00:27:09 Local meetup groups
    00:28:00 Crediting research software development
    00:31:39 Is FLOSS the norm or the exception for RSEs?
    00:35:09 Does FLOSS help provide better and more sustainable research software?
    00:38:55 Curriculum for new researchers
    00:43:50 The state of research software licensing
    00:46:13 Most notable recent scientific discovery
    00:48:13 His favourite text processing tool
    00:48:58 A topic in science he changed his mind about
    00:50:13 How to contact Simon
    00:50:37 Conclusion

    52 min
