The mission of the SciPy Proceedings Committee (Proccom) is to celebrate and promote the work of the members of the SciPy community. We take this community in the broad sense: the authors and maintainers of core libraries; the scientists and engineers who use these libraries to advance global knowledge; and new community members who are learning to use these tools for the first time. We aim to elevate their work and their voices by reviewing and publishing full scientific papers, by archiving the slides and posters presented at the annual SciPy conference, and by sharing all of the above on social media channels.
For the last several years, there has been an open question of who precisely within this community is being promoted and celebrated. Until this year, Proccom had only anecdotal data about who our authors are, and how that compares to our perception of the community as a whole. This year, for the first time, we surveyed the authors of full papers, and the reviewers of those papers, with an eye towards two kinds of comparisons. First, are our authors drawn from the same distribution as conference attendees? And second, who does the labor of reviewing papers, and how does that compare to those whose papers are reviewed?
In terms of interpreting the results below, there are four important caveats:
- The population attending SciPy does not necessarily reflect the global community; and,
- The survey is for the authors of full papers, which is a subset of all abstracts accepted to the conference; and,
- SciPy does not currently report demographic data for accepted abstracts, so we cannot identify differences between who presents at the conference and who publishes here; and,
- The population we are surveying is very small (< 100 individuals combined between authors and reviewers), so some survey responses have been combined into "Other" to protect individual privacy.
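The grouping described in the last caveat can be sketched in a few lines. This is a minimal illustration, not the procedure Proccom actually used: the function name `group_small_cells`, the threshold of 5 responses, and the sample answers are all assumptions made for the example.

```python
from collections import Counter

def group_small_cells(responses, min_count=5, other_label="Other"):
    """Fold every category with fewer than min_count responses into one bucket.

    min_count=5 is an illustrative threshold, not a value taken from the survey.
    """
    counts = Counter(responses)
    grouped = Counter()
    for category, count in counts.items():
        key = category if count >= min_count else other_label
        grouped[key] += count
    return dict(grouped)

# Hypothetical responses, invented purely to show the behaviour.
answers = ["University"] * 12 + ["Private firm"] * 7 + ["Government"] * 2 + ["Self-employed"]
print(group_small_cells(answers))
```

Note that if only one category falls below the threshold, the combined bucket is no larger than that category, so a real analysis might need to merge in an additional category or suppress the cell entirely.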
Now let's look at the numbers.
How is gender distributed across authors?
In the results from the SciPy 2020 conference attendee survey [1], 22% of respondents identify as women.
Among our authors and reviewers, only 10% of respondents identified as women:
How is age distributed across authors?
For the conference as a whole, the plurality of participants are under the age of 34, and this has been relatively stable across time. The SciPy Diversity Committee notes that we have been increasing participation in the 51+ age group year over year.
Among our authors, the under-34 age group is underrepresented, and the 35-50 age group appears to be overrepresented.
When it comes to age and gender, the key disparity seems to lie in whom we solicit both kinds of contributions (papers and reviews) from, relative to the community as a whole, and not between those whose papers get published and those who do the work of publishing for others. As we'll see below, this will not be the case for other dimensions of inclusion.
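With a respondent pool this small, it is worth asking whether a gap such as 10% of respondents versus 22% of attendees identifying as women could arise from sampling noise alone. The sketch below computes a one-sided exact binomial tail probability. The sample size of 50 is hypothetical (the survey only reports fewer than 100 respondents), and treating the attendee survey's 22% as the true population rate is also an assumption.

```python
from math import comb

def binom_tail_le(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the one-sided lower-tail probability."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical sample size: the post only says fewer than 100 people responded.
n_respondents = 50
n_women = 5          # 10% of the hypothetical 50 respondents
p_attendees = 0.22   # share of women in the SciPy 2020 attendee survey

p_value = binom_tail_le(n_women, n_respondents, p_attendees)
print(f"P(at most {n_women} of {n_respondents} | p = {p_attendees}) = {p_value:.3f}")
```

A small tail probability would suggest the gap is unlikely to be an artifact of sample size alone, though it says nothing about survey non-response, which could bias either figure.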
How is ethnicity distributed across authors?
In 2020, the SciPy conference reported that less than half of all respondents identify as white. In order, the next two most frequent choices were Asian / Native Hawaiian / Pacific Islander and Hispanic.
In contrast, respondents self-identifying as white were both overrepresented among full-paper authors (~60%), and underrepresented among our paper reviewers (~40%), when compared to the community of conference attendees. Both Asian / Native Hawaiian / Pacific Islanders and Hispanic / Latinx respondents were overrepresented as reviewers, compared to what we would expect if they were drawn randomly from the population of conference attendees.
This suggests that any career benefits of publishing papers at SciPy may accrue to white conference attendees, while the work of producing the proceedings falls to individuals identifying as something other than white.
These categories are derived from the U.S. Census options, and as such are defined in a way that is particular to North American demographics and not global demographics. This is especially problematic for the reviewer pool, about 40% of whom report a current residence outside of the United States of America. A significant fraction of respondents (~10%) felt that no category accurately represented them, and wrote in a more appropriate designation. These have been grouped under
How is employment distributed across authors?
The SciPy Diversity Committee has not reported data about employment or career stage, but anecdotally, the conference attendees are split fairly evenly between academia and industry.
We find this to be true for participants in the proceedings process in aggregate, but not when broken down between authors and reviewers. University employees are overrepresented among paper authors (~60%), but underrepresented among reviewers (~30%). Employees of private firms (both for-profit and non-profit) are represented roughly in proportion as reviewers (~45%) but are underrepresented as authors (~25%). This effect seems to be driven by the small number of papers submitted by employees of private firms, possibly because publishing a paper is less attractive to non-academic employees as a means of career progression.
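One way to make "overrepresented" and "underrepresented" concrete is a representation ratio: a group's share among authors or reviewers divided by its share among attendees. The sketch below uses the approximate percentages quoted above. The 50% baseline for each sector is an assumption based only on the anecdotal "split fairly evenly" description, not on measured data, and the helper name `representation_ratio` is hypothetical.

```python
def representation_ratio(group_share, baseline_share):
    """Ratio > 1 means overrepresented relative to the baseline, < 1 underrepresented."""
    return group_share / baseline_share

# Assumed baseline: attendees split roughly evenly between academia and industry.
baseline = 0.50

# Approximate shares quoted in the text above.
shares = {
    ("University", "authors"): 0.60,
    ("University", "reviewers"): 0.30,
    ("Private firm", "authors"): 0.25,
    ("Private firm", "reviewers"): 0.45,
}

ratios = {key: representation_ratio(share, baseline) for key, share in shares.items()}
for (employer, role), ratio in ratios.items():
    print(f"{employer:12s} {role:9s} ratio = {ratio:.2f}")
```

Because both the shares and the baseline are rough, these ratios are best read as indications of direction rather than precise estimates.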
As a side note, roughly 5% of all people who volunteered to review full papers for the conference are employees of Capital One Financial.
How is career stage distributed across authors?
This was not an easy question to frame, and several respondents struggled to decide which of the enumerated options applied to them. In the academy, there are several guidelines for defining who counts as an early-career researcher, which are often given as a number of years since your PhD was awarded, since beginning your first ladder position, or since your first large (e.g. R01) research grant. None of these apply easily to industry roles, so we instead tried to reframe the question in terms of primary job role. That is:
- Are you still in training?
- Is your primary role to produce individual contributions?
- Is your primary role to act as a technical expert or mentor to others?
- Is your primary role to govern the business strategy for a group of others?
One respondent commented that it was odd we had not included "Professor" as an option, whereas we had thought it would be covered by "technical expert or mentor". In the future, Proccom will want to revisit this question, but we present the following responses as they are:
These distributions are roughly similar, but we do note that roles we would include in "early career" (students and individual contributors) seem less likely to be authors than reviewers.
How are roles in the library ecosystem distributed across authors?
We don't have data from the conference or the community as a whole to compare against, but about 35% of authors identified as a maintainer or core developer of a popular scientific Python library. About 10% of authors report that they do not regularly use scientific Python libraries.
How are relationships to the conference distributed across authors?
Of the respondents who reviewed papers for SciPy 2020, about 45% had an abstract accepted to a SciPy conference, either in 2020 or some time in the past. 10% of reviewers report never having presented a paper or poster at SciPy before.
Interestingly, 30% of reviewers report that they have never been to a SciPy conference. Even more interestingly, about 15% of paper authors report that they have never had an abstract accepted at SciPy. It's possible that the question options were not clear enough in communicating that we were including this conference, SciPy 2020, and that a more accurate interpretation might be that 2020 was their first conference, or first accepted abstract. In either case, it feels inequitable to have so much of the work of reviewing papers performed by newcomers to the conference.