31 episodes

Interviews with scientists who are using or developing free and libre open source software

FLOSS for Science FLOSSforScience

    • Technology

    EP031 GNU licenses

    Note: This interview was recorded in the summer of 2020. However, due to the pandemic, we could not release the episode in a timely manner. Therefore, the current status of the FSF and recent events are not discussed in this episode.

    In episode 31, we interviewed Craig Topham from the Licensing and Compliance Team of the GNU Project about GNU software licenses. We started by discussing his involvement in the compliance team at the Free Software Foundation (FSF) and what got him interested in the topic of free software. The main portion of the interview centered on the GNU project, with an emphasis on the GNU software licenses. We went through the GPL, LGPL, AGPL and GFDL licenses to explain some of their differences and why you may want to use one instead of another. We asked questions about the specifics of licensing your code in the context of scientific software and the issue of license proliferation. Some of the differences between the versions of the GPL were presented later in the discussion to show the improvements introduced in version 3 with regard to the compliance and patent sections. We asked him about his take on the philosophical differences between GNU-style licenses and the MIT/BSD licenses with regard to the debate between user and developer freedom. We followed by going through some myths surrounding the GNU licenses and a general discussion about freedom and privacy. We finished the interview with our usual quick questions.

    • 48 min
    EP030 Spack: a package manager for supercomputers

    In episode 30, we interviewed Todd Gamblin from the Lawrence Livermore National Laboratory about the Spack project. We discussed his current research project along with his involvement in Spack. We discussed at length the philosophy of Spack, some usage patterns, its capabilities for package management on HPC clusters as well as standalone computers, and which operating systems it currently supports. Todd shared with us his opinion on the trend toward containerized workloads to achieve reproducible science and why it may not be the goal we need to set. He highlighted for us the similarities and differences between EasyBuild and Spack as well as the origin of those differences. We finished the interview with our usual quick questions.

    00:00:00 Intro music
    00:00:17 Introduction
    00:00:36 Introducing Todd Gamblin
    00:00:58 His current research topics
    00:01:23 Spack as official duties
    00:01:43 Spack usage at Lawrence Livermore National Laboratory
    00:02:01 Other research projects
    00:02:47 Profiling in HPC
    00:04:24 His role as leader of software packaging technology for the Exascale Computing Project
    00:04:58 One-minute elevator pitch for Spack
    00:05:34 Spack's usage philosophy compared to other package managers
    00:06:59 Installation from source code or binary?
    00:07:28 Spack's usage in the TOP500 supercomputers
    00:07:49 Geographical distribution of users
    00:08:18 Number of packages in the repo and some examples
    00:09:05 Managing computer clusters with Spack's automation capabilities
    00:11:04 Module files in Spack
    00:12:32 Syntax of a Spack package file
    00:13:43 Configuration of compiler flags
    00:15:00 Importing Python libraries in the Spack files
    00:15:48 The procedure to submit a package
    00:16:27 Review process for new packages
    00:17:34 Reasons for rejection of Spack packages
    00:18:01 Operating systems supported by Spack
    00:18:23 WSL and Spack
    00:18:58 Restricting packages to certain hardware and software configurations
    00:20:04 Build testing and nightly builds
    00:21:28 Working with containers in a Spack environment
    00:22:25 Deploying prebuilt containers
    00:23:05 About the "universality" of containers
    00:24:16 His opinion on containerized applications for reproducible science
    00:26:17 Spack's log file to document reproducibility
    00:27:13 Reproducing older results
    00:28:10 Specifying requirements on compilers
    00:30:39 Post-installation verification test
    00:31:10 Using Spack on a standalone computer instead of HPC systems
    00:32:56 Differences between EasyBuild and Spack
    00:34:24 EasyBuild in the TOP500
    00:34:49 Transitioning between EasyBuild and Spack
    00:35:38 Other alternatives
    00:36:23 Using EasyBuild and Spack on the same system
    00:38:36 When did the project start?
    00:39:53 External contributions to Spack
    00:40:53 How many core developers?
    00:41:30 Organization of the community and governance model
    00:43:06 Who decides which package is accepted in the repo?
    00:44:38 Spack's choice of software license
    00:47:09 Todd's vision about the importance of FLOSS for the openness of science
    00:48:13 Possible negative impacts of FLOSS
    00:48:58 Most notable recent scientific discovery
    00:49:14 Favourite text processing tool
    00:49:25 A topic in science about which he recently changed his mind
    00:49:58 Anything else we forgot to ask?
    00:50:09 How to contact Todd
    00:50:46 Conclusion

    • 52 min
    EP029 Distributing Python packages with setuptools

    In episode 29, we interviewed Jason R. Coombs from the setuptools project. We started with a discussion about his background and his interest in Python and other programming languages. Following that, we had a thorough discussion about setuptools. We covered topics such as how he got involved in the project, the nature and composition of a Python package, why packaging your code can be important even for small projects, the hidden complexity of binary packages in the Python Package Index and how to maintain compatibility between Python versions. We also had a brief segment about the security aspects of Python packages. He told us how to start contributing to the project and where to discuss Python packaging. We then followed with a general discussion about FLOSS in science and the problem of long-term maintenance in academia. We concluded the interview with our usual quick questions.

    00:00:00 Intro
    00:00:23 Introducing Jason R. Coombs
    00:01:28 The first programming languages he learned and how he got into Python
    00:03:46 New interesting programming languages
    00:05:07 His favourite past Python projects
    00:06:53 His one-minute elevator pitch for setuptools
    00:08:00 The relation between setuptools, pip and Anaconda
    00:10:43 How he got involved with the setuptools project
    00:14:43 What is a Python package?
    00:16:07 What can be included in a package?
    00:16:36 At which point is it beneficial to create a package?
    00:18:04 Managing compatibility with multiple versions of Python
    00:20:33 Advantages of packages for small projects
    00:22:46 How much work is required to create a package?
    00:25:05 Files required to create a Python package
    00:27:45 Licenses and readme for Python packages
    00:30:51 The nature of distribution archives
    00:31:27 Compatibility of binary archives
    00:32:39 Eggs and wheel files
    00:34:32 Dealing with non-portable packages in the Python Package Index across multiple operating systems
    00:37:49 Uploading packages to the Python Package Index
    00:39:12 Review for broken or malicious code
    00:40:08 Vulnerability from package removal in the Python Package Index
    00:43:24 Package name collisions
    00:45:13 How many packages are in the Python Package Index
    00:45:25 Alternatives to the main Python Package Index
    00:46:35 Other packaging tools
    00:47:39 How many developers are involved in the project
    00:48:31 Communication channels and discussions about Python packaging
    00:49:53 Openings for new contributors
    00:50:59 Skills required to contribute
    00:52:24 The challenge of long-term maintenance of packages in academia
    00:55:43 His vision about the importance of FLOSS for the openness of science
    00:59:18 Disadvantage of using FLOSS
    01:01:24 The most notable scientific discovery in recent years
    01:02:13 Favourite text processing tool
    01:03:23 A topic in science about which he recently changed his mind
    01:04:50 Contact information
    01:05:23 Conclusion

    • 1 hr 7 min
    EP028 NumFocus: A Nonprofit Supporting Open Source

    In episode 28, we interviewed Leah Silen from the NumFocus organization. She introduced us to the goals and the mission of the organization. We then had a discussion about the different levels of support provided by the organization to its member projects. She told us about the legal, financial, technological and logistical support that NumFocus can provide. Following that, we asked her about the revenue sources of the organization as well as the possible influence of corporate sponsors over the decisions and governance of the organization. We also discussed the requirements to become part of NumFocus, including details about the application process. We had a brief discussion about the history of the project and the evolution of the scope of projects that are part of the organization. After discussing the governance of the organization, we concluded the interview with our usual questions.

    00:00:00 Intro
    00:00:18 Introducing Leah Silen
    00:02:28 Goals and mission of NumFocus
    00:03:06 Examples of supported projects
    00:03:39 Status of sponsored and affiliated projects
    00:05:04 Advantages of one status over the other
    00:05:48 Legal challenges for open source scientific projects
    00:07:19 Financial support for scientific open source projects
    00:10:13 Assistance to apply for external grants
    00:11:01 Paying developers from outside of the US?
    00:11:43 Revenue sources for NumFocus
    00:12:21 Levels of corporate sponsorships
    00:13:14 The influence of corporate sponsors
    00:13:56 Motivations of corporate sponsors
    00:15:02 Some of the sponsors of the NumFocus organization
    00:16:05 Technological support for projects
    00:17:03 Events previously supported by the organization
    00:18:09 The kind of support that can be provided for events
    00:19:22 Requirements for new projects
    00:21:10 Clarification about the meaning of being a scientifically oriented project
    00:23:11 Requirements about the team size and strong governance within projects
    00:24:58 The application process
    00:26:18 Duration of support
    00:26:44 Timeframe to receive a response for an application
    00:28:30 Feedback in the case of rejection
    00:29:12 Are there downsides of becoming a part of NumFocus?
    00:30:32 Additional administrative overhead?
    00:32:01 Location of NumFocus staff members
    00:33:00 Foundation of NumFocus and initial projects
    00:34:05 Opening to projects outside of the Python ecosystem
    00:34:54 Favourite project?
    00:35:46 Initial role in NumFocus
    00:36:53 Term duration for positions at the board of directors
    00:37:24 Selection process for the board of directors
    00:38:31 Leah's vision about FLOSS and its importance for the openness of science
    00:39:04 Negative impacts of FLOSS
    00:39:33 Most notable scientific discovery in recent years
    00:39:59 Favourite text processing tool
    00:40:37 A topic in science about which she changed her mind
    00:41:36 Anything else we forgot to ask about?
    00:42:54 How to contact Leah Silen
    00:43:14 Outro

    • 45 min
    EP027 Scientific Computing with SciPy and NumPy

    In episode 27, we interviewed Ralf Gommers from the NumPy and SciPy projects. We started our discussion by talking about his past research experience as a physicist and his transition to open source software and programming. This led him to get involved in projects such as PyWavelets, NumPy and SciPy. Following that, we had a great discussion about NumPy, its many features, its target audience and its performance. We learned why NumPy is not included in Python's standard library and how it overlaps with SciPy. We also compared Matlab to the combination of NumPy and Python and discussed how users could transition to this open source solution. We then had a brief discussion about SciPy and the features it provides. Ralf told us about the positive results from the projects' previous participation in Google's Summer of Code and Season of Docs. We discussed how to reach the project and the many kinds of contributions that they are looking for. We talked about the importance of FLOSS for science and the attribution of research output. We finished the interview with our classic quick questions and a reflection from Ralf about the need for more sustainability in open source software development, as volunteer effort may not be sufficient in the future.

    00:00:00 Intro
    00:00:18 Introduction
    00:00:33 Introducing Ralf Gommers
    00:02:05 Research during his PhD and postdoc
    00:03:20 When he started to use open source tools
    00:03:52 Learning to code
    00:04:39 PyWavelets, another side project he likes
    00:05:55 His elevator pitch for NumPy
    00:06:55 Vector arrays in Python before NumPy
    00:07:49 How he got involved in the NumPy project
    00:10:13 Target users for NumPy
    00:11:36 NumPy as part of the standard library?
    00:13:24 Features provided by NumPy
    00:14:22 Major differences between Python's built-in list and NumPy's array
    00:16:01 Structured data
    00:16:45 Why appending a row to an array is made hard
    00:18:09 Multithreaded code with NumPy
    00:19:48 Distributed array processing
    00:20:50 GPU computation with Python and NumPy
    00:22:16 Linear algebra functions in NumPy
    00:23:25 Overlap between SciPy and NumPy for linear algebra
    00:23:55 Python speed as an interpreted language
    00:25:43 Python with NumPy compared to Matlab
    00:28:07 How easy is the transition between Matlab and Python with NumPy
    00:29:26 Performance difference between Matlab and Python
    00:31:00 Commercial applications of NumPy
    00:32:15 Contributions from the industry and incentives to contribute
    00:34:10 Elevator pitch for SciPy
    00:35:37 Overview of some of the submodules in SciPy
    00:38:11 The size of the communities
    00:39:33 Participation in Google Summer of Code
    00:40:24 Participation in Google Season of Docs
    00:41:48 Communication channels in the project
    00:43:25 Where to ask for support?
    00:44:48 Possible contributions
    00:46:25 Skills useful to contribute to the NumPy project
    00:48:12 Identifying possible contributions
    00:48:52 The importance of FLOSS for science
    00:52:02 Possible negative impact of FLOSS on science
    00:52:49 Crediting contributions in science
    00:53:42 Most notable scientific discovery in recent years
    00:54:49 His favourite text processing tool
    00:55:30 Volunteer effort may not be sufficient anymore
    00:56:58 Contact information for Ralf Gommers
    00:57:27 Outro

    • 59 min
    EP026 Data Analysis with pandas

    In episode 26, we interviewed Bhavani Ravi about the Python data analysis library pandas. After a brief introduction about her use of machine learning models for pharmaceutical research, we talked extensively about pandas. She told us how important pandas is for her everyday tasks and about the strict quality standards of the project. We talked about the features provided by pandas and its compatibility with other Python libraries. We then discussed the importance of FLOSS in her industry and how companies are contributing back to important projects. She shared with us her experience as a first-time contributor to pandas and how newcomers can find good first issues. We finished the interview with our usual quick questions.

    00:00:17 Introduction
    00:00:26 Introducing Bhavani Ravi
    00:00:49 Using machine learning models for pharmaceutical research
    00:02:46 How she got involved in the pandas project
    00:04:29 Her elevator pitch for pandas
    00:04:43 How she uses pandas in her everyday job
    00:05:24 What does pandas bring that is lacking in basic Python
    00:06:53 Preparing data for machine learning algorithms
    00:08:12 The performance of pandas
    00:09:21 Data formats supported by pandas
    00:11:03 Data structures provided by pandas
    00:11:42 Data analysis tools provided by pandas
    00:12:32 Using pandas data structures with scikit-learn
    00:12:55 Plotting data from pandas
    00:13:39 Transition to Python version 2
    00:14:51 Commercial usage of pandas
    00:15:16 Companies contributing back to pandas
    00:16:02 Exposure of students to pandas
    00:16:42 Tutorials to start with pandas
    00:18:26 Python library dependencies of pandas
    00:18:55 Main communication channels
    00:19:44 Her experience contributing to pandas
    00:21:14 Skills to contribute to the project
    00:21:49 List of good first issues
    00:22:21 Tasks for non-programmers
    00:23:12 FLOSS and the industry
    00:24:16 The most notable scientific discovery in recent years
    00:24:33 Her favourite text processing tool
    00:25:06 Anything else?
    00:25:38 How to contact Bhavani
    00:25:57 Outro

    • 27 min
