Published on 01 January 2024

Trends in gender homophily in scientific publications (data)

Torre, Margarita; Prieto-Alonso, Jesús Álvaro; Ucar, Iñaki

Description

This dataset contains records of research articles extracted from the Web of Science (WoS) from 1980 to 2019: in total, 15,642 journals, 28,241,100 articles and 111,980,858 authorships across 153 research areas.

The main dataset (author_address_article_gend_v3.parquet), in Parquet format, contains all the authorships, where an authorship is defined as an (article, author) pair. Each authorship (row) has the following variables:

ut: unique article identifier.
daisng_id: unique author identifier.
author_no: author number, as listed in the article.
country: author country (two-letter ISO code).
date: publication date.
gender: gender of the author ("male" or "female"), as provided by the Genderize.io API.
probability: probability of the gender attribute, as provided by the Genderize.io API.
count: number of entries for the author's first name, as provided by the Genderize.io API.
jsc: journal subject category.
field: field of research.
research_area: area of research.
n_aut: number of authors in this publication.
journal: journal name.
alphabetical: whether the author list for this article is in alphabetical order.

Using this dataset, a resampler was applied to generate null homophily values for each year. There are 4 datasets in R Data Serialization (RDS) format:

null_field.rds: null homophily values per country, year and field of research.
null_field_comp.rds: null homophily values per year and field of research (only for complete authorships).
null_research.rds: null homophily values per year and area of research.
null_research_comp.rds: null homophily values per year and area of research (only for complete authorships).

All these datasets have the same structure:

country: country (two-letter ISO code).
year: year.
variable: either field or research area name.
m: average homophily.
s: standard error of homophily.

Finally, some supplementary files were used in the descriptive analysis and methods:

null_research_l2019.rds: an example of the output from the resampling algorithm for year 2019.
wos_category_to_field.csv: a mapping from WoS categories to more general fields.
jcr_if_2020.csv: the percentiles of the journal impact factor for the JCR 2020.
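As a rough illustration of how the authorship variables fit together, the sketch below groups authorships by article (ut) and counts gender labels, keeping only labels whose Genderize.io probability meets a threshold. The in-memory rows, the 0.9 threshold, the function name and the definition of a "complete" article (every one of its n_aut authors confidently labelled) are hypothetical assumptions for illustration, not taken from the dataset documentation. The real Parquet file would need a Parquet reader (for example pandas.read_parquet) instead of the hard-coded rows.

```python
from collections import defaultdict

# Hypothetical rows mimicking a few columns of author_address_article_gend_v3.parquet.
ROWS = [
    {"ut": "A1", "daisng_id": 1, "gender": "female", "probability": 0.98, "n_aut": 3},
    {"ut": "A1", "daisng_id": 2, "gender": "male", "probability": 0.99, "n_aut": 3},
    {"ut": "A1", "daisng_id": 3, "gender": "female", "probability": 0.60, "n_aut": 3},
    {"ut": "A2", "daisng_id": 2, "gender": "male", "probability": 0.99, "n_aut": 2},
    {"ut": "A2", "daisng_id": 4, "gender": "male", "probability": 0.95, "n_aut": 2},
]

MIN_PROBABILITY = 0.9  # hypothetical confidence threshold for a Genderize.io label


def gender_counts_by_article(rows, min_probability=MIN_PROBABILITY):
    """Group authorships by article id `ut` and count confidently labelled authors.

    ASSUMPTION: an article is treated as "complete" when all of its n_aut
    authors received a label at or above `min_probability`.
    """
    counts = defaultdict(lambda: {"female": 0, "male": 0, "n_aut": 0})
    for row in rows:
        article = counts[row["ut"]]
        article["n_aut"] = row["n_aut"]
        gender = row.get("gender")
        # Missing genders or probabilities are simply not counted.
        if gender in ("female", "male") and (row.get("probability") or 0.0) >= min_probability:
            article[gender] += 1
    result = {}
    for ut, c in counts.items():
        labelled = c["female"] + c["male"]
        result[ut] = {"female": c["female"], "male": c["male"], "complete": labelled == c["n_aut"]}
    return result


if __name__ == "__main__":
    for ut, summary in gender_counts_by_article(ROWS).items():
        print(ut, summary)
```

On these sample rows, article A1 counts one female and one male author and is not complete, because its third author's label falls below the threshold; A2 counts two male authors and is complete.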

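The description does not spell out the homophily measure or how the resampler works. Below is a minimal sketch of one common approach, a permutation null model, under explicitly stated assumptions. Homophily is taken to be the share of same-gender co-author pairs pooled over articles. Gender labels are shuffled across all authorships within a group (for instance one year and field) while each article keeps its size. The columns m and s are read as the mean and standard deviation of the resampled values. The function names and parameters are hypothetical and do not reproduce the authors' code.

```python
import random
import statistics
from itertools import combinations


def same_gender_share(articles):
    """ASSUMED homophily measure: fraction of co-author pairs, pooled over
    all articles, in which both authors carry the same gender label."""
    same = total = 0
    for genders in articles:
        for a, b in combinations(genders, 2):
            total += 1
            same += a == b
    return same / total if total else float("nan")


def null_homophily(articles, n_resamples=1000, seed=0):
    """Permutation null: shuffle gender labels across all authorships in the
    group, keep each article's size fixed, and recompute the measure.

    Returns (m, s): mean and standard deviation of the resampled values
    (reading s as the dataset's "standard error" is an assumption).
    n_resamples must be at least 2 for statistics.stdev.
    """
    rng = random.Random(seed)
    labels = [g for genders in articles for g in genders]  # local copy, safe to shuffle
    sizes = [len(genders) for genders in articles]
    values = []
    for _ in range(n_resamples):
        rng.shuffle(labels)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(labels[start:start + size])
            start += size
        values.append(same_gender_share(shuffled))
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Toy group of three articles (gender labels only).
    articles = [["female", "female"], ["male", "male"], ["male", "female", "male"]]
    observed = same_gender_share(articles)  # 3 same-gender pairs out of 5
    m, s = null_homophily(articles)
    print(f"observed={observed:.2f}, null m={m:.3f}, s={s:.3f}")
```

Shuffling labels within a group preserves that group's overall gender composition and article sizes, so the null isolates how authors are sorted into articles. An observed value could then be compared with the null, for example as (observed - m) / s.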
Citations (0)

Mentions (0)

Metrics

Dataset Index

1.8

FAIR Score

73%

Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield

Artificial Intelligence

Field

Computer Science

Domain

Physical Sciences

Confidence Score

49%

Source

Scholar Data Model

Keywords

academia; occupational segregation; homophilic behavior; gender equality; Web of Science; scientific research

Normalization Factors

FT

13.46

CTw

1.00

MTw

1.00