FLOSS for Science
- Technology
Interviews with scientists who are using or developing free and libre open source software
EP031 GNU licenses
Note: This interview was recorded in the summer of 2020. However, due to the pandemic, we could not release the episode in a timely manner. Therefore, the current status of the FSF and recent events are not discussed in this episode.
In episode 31, we interviewed Craig Topham from the Licensing and Compliance Team of the GNU Project about GNU software licenses. We started by discussing his involvement in the compliance team at the Free Software Foundation (FSF) and what got him interested in the topic of free software. The main portion of the interview centered on the GNU Project, with an emphasis on the GNU software licenses. We went through the GPL, LGPL, AGPL and GFDL licenses to explain some of their differences and why you may want to use one instead of another. We asked questions about the specifics of licensing your code in the context of scientific software and the issue of license proliferation. Some of the differences between the versions of the GPL were presented later in the discussion to show the improvements brought by version 3 with regard to the compliance and patent sections. We asked him about his take on the philosophical differences between GNU-style licenses and the MIT/BSD licenses with regard to the debate between user and developer freedom. We followed by going through some myths surrounding the GNU licenses and a general discussion about freedom and privacy. We finished the interview with our usual quick questions.
EP030 Spack: a package manager for supercomputers
In episode 30, we interviewed Todd Gamblin from the Lawrence Livermore National Laboratory about the Spack project. We discussed his current research project along with his involvement in Spack. We discussed at length the philosophy of Spack, some usage patterns, its capabilities for package management on HPC clusters as well as standalone computers, and which operating systems it supports at the moment. Todd shared with us his opinion on the trend toward containerized workloads to achieve reproducible science and why it may not be the goal we need to set. He highlighted for us the similarities and differences between EasyBuild and Spack as well as the origin of those differences. We finished the interview with our usual quick questions.
00:00:00 Intro music
00:00:17 Introduction
00:00:36 Introducing Todd Gamblin
00:00:58 His current research topics
00:01:23 Spack as official duties
00:01:43 Spack usage at Lawrence Livermore National Laboratory
00:02:01 Other research projects
00:02:47 Profiling in HPC
00:04:24 His role as leader of software packaging technology for the Exascale Computing Project
00:04:58 One-minute elevator pitch for Spack
00:05:34 Spack's usage philosophy compared to other package managers
00:06:59 Installation from source code or binary?
00:07:28 Spack's usage in the TOP500 supercomputers
00:07:49 Geographical distribution of users
00:08:18 Number of packages in the repo and some examples
00:09:05 Managing computer clusters with Spack's automation capabilities
00:11:04 Module files in Spack
00:12:32 Syntax of a Spack package file
00:13:43 Configuration of compiler flags
00:15:00 Importing Python libraries in the Spack files
00:15:48 The procedure to submit a package
00:16:27 Review process for new packages
00:17:34 Reasons for rejection of Spack packages
00:18:01 Operating systems supported by Spack
00:18:23 WSL and Spack
00:18:58 Restricting packages to certain hardware and software configurations
00:20:04 Build testing and nightly builds
00:21:28 Working with containers in a Spack environment
00:22:25 Deploying prebuilt containers
00:23:05 About the "universality" of containers
00:24:16 His opinion on containerized applications for reproducible science
00:26:17 Spack's log file to document reproducibility
00:27:13 Reproducing older results
00:28:10 Specifying requirements on compilers
00:30:39 Post-installation verification test
00:31:10 Using Spack on a standalone computer instead of HPC systems
00:32:56 Differences between EasyBuild and Spack
00:34:24 EasyBuild in the TOP500
00:34:49 Transitioning between EasyBuild and Spack
00:35:38 Other alternatives
00:36:23 Using EasyBuild and Spack on the same system
00:38:36 When did the project start?
00:39:53 External contributions to Spack
00:40:53 How many core developers?
00:41:30 Organization of the community and governance model
00:43:06 Who decides which package is accepted in the repo?
00:44:38 Spack's choice of software license
00:47:09 Todd's vision about the importance of FLOSS for the openness of science
00:48:13 Possible negative impacts of FLOSS
00:48:58 Most notable recent scientific discovery
00:49:14 Favourite text processing tool
00:49:25 A topic in science about which he recently changed his mind
00:49:58 Anything else we forgot to ask?
00:50:09 How to contact Todd
00:50:46 Conclusion
EP029 Distributing Python packages with setuptools
In episode 29, we interviewed Jason R. Coombs from the setuptools project. We started with a discussion about his background and his interest in Python and other programming languages. Following that, we had a thorough discussion about setuptools. We covered topics such as how he got involved in the project, the nature and composition of a Python package, why packaging your code can be important even for small projects, the hidden complexity of binary packages in the Python Package Index and how to maintain compatibility between Python versions. We also had a brief segment about the security aspects of Python packages. He informed us about how you could start contributing to the project and where to discuss Python packaging. We then followed with a general discussion about FLOSS in science and the problem of long-term maintenance in academia. We concluded the interview with our usual quick questions.
00:00:00 Intro
00:00:23 Introducing Jason R. Coombs
00:01:28 The first programming languages he learned and how he got into Python
00:03:46 New interesting programming languages
00:05:07 His favourite past Python projects
00:06:53 His one-minute elevator pitch for setuptools
00:08:00 The relation between setuptools, pip and Anaconda
00:10:43 How he got involved with the setuptools project
00:14:43 What is a Python package?
00:16:07 What can be included in a package?
00:16:36 At which point is it beneficial to create a package?
00:18:04 Managing compatibility with multiple versions of Python
00:20:33 Advantages of packages for small projects
00:22:46 How much work is required to create a package?
00:25:05 Files required to create a Python package
00:27:45 Licenses and readme for Python packages
00:30:51 The nature of distribution archives
00:31:27 Compatibility of binary archives
00:32:39 Eggs and wheel files
00:34:32 Dealing with non-portable packages in the Python Package Index across multiple operating systems
00:37:49 Uploading packages to the Python Package Index
00:39:12 Review for broken or malicious code
00:40:08 Vulnerability from package removal in the Python Package Index
00:43:24 Package name collisions
00:45:13 How many packages are in the Python Package Index
00:45:25 Alternatives to the main Python Package Index
00:46:35 Other packaging tools
00:47:39 How many developers are involved in the project
00:48:31 Communication channels and discussions about Python packaging
00:49:53 Openings for new contributors
00:50:59 Skills required to contribute
00:52:24 The challenge of long-term maintenance of packages in academia
00:55:43 His vision about the importance of FLOSS for the openness of science
00:59:18 Disadvantage of using FLOSS
01:01:24 The most notable scientific discovery in recent years
01:02:13 Favourite text processing tool
01:03:23 A topic in science about which he recently changed his mind
01:04:50 Contact information
01:05:23 Conclusion
EP028 NumFocus: A Nonprofit Supporting Open Source
In episode 28, we interviewed Leah Silen from the NumFocus organization. She introduced us to the goals and the mission of the organization. We then had a discussion about the different levels of support provided by the organization to its member projects. She informed us about the legal, financial, technological and logistical support that can be provided by NumFocus. Following that, we asked her about the revenue sources of the organization as well as the possible influence of corporate sponsors over the decisions and governance of the organization. We also discussed the requirements to become part of NumFocus, including details about the application process. We had a brief discussion about the history of the project and the evolution of the scope of projects that are part of the organization. After discussing the governance of the organization, we concluded the interview with our usual questions.
00:00:00 Intro
00:00:18 Introducing Leah Silen
00:02:28 Goals and mission of NumFocus
00:03:06 Examples of supported projects
00:03:39 Status of sponsored and affiliated projects
00:05:04 Advantages of one status over the other
00:05:48 Legal challenges for open source scientific projects
00:07:19 Financial support for scientific open source projects
00:10:13 Assistance to apply for external grants
00:11:01 Paying developers from outside of the US?
00:11:43 Revenue sources for NumFocus
00:12:21 Levels of corporate sponsorships
00:13:14 The influence of corporate sponsors
00:13:56 Motivations of corporate sponsors
00:15:02 Some of the sponsors of the NumFocus organization
00:16:05 Technological support for projects
00:17:03 Events previously supported by the organization
00:18:09 The kind of support that can be provided for events
00:19:22 Requirements for new projects
00:21:10 Clarification about the meaning of being a scientifically oriented project
00:23:11 Requirements about the team size and strong governance within projects
00:24:58 The application process
00:26:18 Duration of support
00:26:44 Timeframe to receive a response for an application
00:28:30 Feedback in the case of rejection
00:29:12 Are there downsides of becoming a part of NumFocus?
00:30:32 Additional administrative overhead?
00:32:01 Location of NumFocus staff members
00:33:00 Foundation of NumFocus and initial projects
00:34:05 Opening to projects outside of the Python ecosystem
00:34:54 Favourite project?
00:35:46 Initial role in NumFocus
00:36:53 Term duration for positions at the board of directors
00:37:24 Selection process for the board of directors
00:38:31 Leah's vision about FLOSS and its importance for the openness of science
00:39:04 Negative impacts of FLOSS
00:39:33 Most notable scientific discovery in recent years
00:39:59 Favourite text processing tool
00:40:37 A topic in science about which she changed her mind
00:41:36 Anything else we forgot to ask about?
00:42:54 How to contact Leah Silen
00:43:14 Outro
EP027 Scientific Computing with SciPy and NumPy
In episode 27, we interviewed Ralf Gommers from the NumPy and SciPy projects. We started our discussion by talking about his past research experience as a physicist and his transition to open source software and programming. This led him to get involved in projects such as PyWavelets, NumPy and SciPy. Following that, we had a great discussion about NumPy, its many features, its target audience and its performance. We learned why NumPy is not included in Python's standard library and how it overlaps with SciPy. We also compared Matlab to the combination of NumPy and Python and discussed how users could transition to this open source solution. We then had a brief discussion about SciPy and the features it provides. Ralf informed us of the positive results from the projects' previous participations in Google's Summer of Code and Season of Docs. We discussed how to reach the project and the many kinds of contributions that they are looking for. We talked about the importance of FLOSS for science and the attribution of research output. We finished the interview with our classic quick questions and a reflection from Ralf about the need for more sustainability in open source software development, as volunteer effort may not be sufficient in the future.
00:00:00 Intro
00:00:18 Introduction
00:00:33 Introducing Ralf Gommers
00:02:05 Research during his PhD and postdoc
00:03:20 When he started to use open source tools
00:03:52 Learning to code
00:04:39 PyWavelets, another side project he likes
00:05:55 His elevator pitch for NumPy
00:06:55 Vector arrays in Python before NumPy
00:07:49 How he got involved in the NumPy project
00:10:13 Target users for NumPy
00:11:36 NumPy as part of the standard library?
00:13:24 Features provided by NumPy
00:14:22 Major differences between Python's built-in list and NumPy's array
00:16:01 Structured data
00:16:45 Why appending a row to an array is made hard
00:18:09 Multithreaded code with NumPy
00:19:48 Distributed array processing
00:20:50 GPU computation with Python and NumPy
00:22:16 Linear algebra functions in NumPy
00:23:25 Overlap between SciPy and NumPy for linear algebra
00:23:55 Python speed as an interpreted language
00:25:43 Python with NumPy compared to Matlab
00:28:07 How easy is the transition from Matlab to Python with NumPy?
00:29:26 Performance difference between Matlab and Python
00:31:00 Commercial applications of NumPy
00:32:15 Contributions from the industry and incentives to contribute
00:34:10 Elevator pitch for SciPy
00:35:37 Overview of some of the submodules in SciPy
00:38:11 The size of the communities
00:39:33 Participation in Google Summer of Code
00:40:24 Participation in Google Season of Docs
00:41:48 Communication channels in the project
00:43:25 Where to ask for support?
00:44:48 Possible contributions
00:46:25 Skills useful to contribute to the NumPy project
00:48:12 Identifying possible contributions
00:48:52 The importance of FLOSS for science
00:52:02 Possible negative impact of FLOSS on science
00:52:49 Crediting contributions in science
00:53:42 Most notable scientific discovery in recent years
00:54:49 His favourite text processing tool
00:55:30 Volunteer effort may not be sufficient anymore
00:56:58 Contact information for Ralf Gommers
00:57:27 Outro
EP026 Data Analysis with pandas
In episode 26, we interviewed Bhavani Ravi about the Python data analysis library pandas. After a brief introduction about her use of machine learning models for pharmaceutical research, we talked extensively about pandas. She told us how important pandas is for her everyday tasks and about the strict quality standards of the project. We talked about the features provided by pandas and its compatibility with other Python libraries. We then discussed the importance of FLOSS in her industry and how companies are contributing back to important projects. She shared with us her experience as a first-time contributor to pandas and how newcomers can find good first issues. We finished the interview with our usual quick questions.
00:00:17 Introduction
00:00:26 Introducing Bhavani Ravi
00:00:49 Using machine learning models for pharmaceutical research
00:02:46 How she got involved in the pandas project
00:04:29 Her elevator pitch for pandas
00:04:43 How she uses pandas in her everyday job
00:05:24 What does pandas bring that is lacking in basic Python
00:06:53 Preparing data for machine learning algorithms
00:08:12 The performance of pandas
00:09:21 Data formats supported by pandas
00:11:03 Data structures provided by pandas
00:11:42 Data analysis tools provided by pandas
00:12:32 Using pandas data structures with scikit-learn
00:12:55 Plotting data from pandas
00:13:39 Transition to Python version 2
00:14:51 Commercial usage of pandas
00:15:16 Companies contributing back to pandas
00:16:02 Exposure of students to pandas
00:16:42 Tutorials to start with pandas
00:18:26 Python library dependencies of pandas
00:18:55 Main communication channels
00:19:44 Her experience contributing to pandas
00:21:14 Skills to contribute to the project
00:21:49 List of good first issues
00:22:21 Tasks for non-programmers
00:23:12 FLOSS and the industry
00:24:16 The most notable scientific discovery in recent years
00:24:33 Her favourite text processing tool
00:25:06 Anything else?
00:25:38 How to contact Bhavani
00:25:57 Outro