Published on 01 January 2024
Trends in gender homophily in scientific publications (data)
Description
This dataset contains records of research articles extracted from the Web of Science (WoS) from 1980 to 2019: in total, 15,642 journals, 28,241,100 articles and 111,980,858 authorships across 153 research areas.

The main dataset (author_address_article_gend_v3.parquet), in Parquet format, contains all the authorships, where an authorship is defined as an article-author tuple. There are 14 variables per authorship (row):
ut: unique article identifier.
daisng_id: unique author identifier.
author_no: author number, as listed in the article.
country: author country (two-letter ISO code).
date: publication date.
gender: gender of the author ("male" or "female"), as provided by the Genderize.io API.
probability: probability of the gender attribute, as provided by the Genderize.io API.
count: number of entries for the author's first name, as provided by the Genderize.io API.
jsc: journal subject category.
field: field of research.
research_area: area of research.
n_aut: number of authors in the publication.
journal: journal name.
alphabetical: whether the author list of the article is in alphabetical order.

A resampler was then applied to the main dataset to generate null homophily values for each year. There are 4 datasets in R Data Serialization (RDS) format:
null_field.rds: null homophily values per country, year and field of research.
null_field_comp.rds: null homophily values per year and field of research (complete authorships only).
null_research.rds: null homophily values per year and area of research.
null_research_comp.rds: null homophily values per year and area of research (complete authorships only).

All these datasets have the same structure:
country: country (two-letter ISO code).
year: year.
variable: either the field or the research area name.
m: average homophily.
s: homophily standard error.

Finally, the following supplementary files were used in the descriptive analysis and methods:
null_research_l2019.rds: an example of the output of the resampling algorithm for the year 2019.
wos_category_to_field.csv: a mapping from WoS categories to more general fields.
jcr_if_2020.csv: the percentiles of the journal impact factor in the JCR 2020.
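The description does not specify which homophily measure or which resampling scheme was used, so the following is only a hypothetical sketch (Python, standard library only) of how null values m and s could be produced from authorship rows. It assumes, as a stand-in, that homophily is the share of same-gender co-author pairs, and that the null is obtained by shuffling gender labels across all authorships of a (year, field) group while keeping each article's author count fixed. Neither assumption is confirmed by the source, and the names `Authorship`, `same_gender_share`, `null_homophily` and `null_by_year_and_field` are illustrative, not part of the dataset.

```python
import random
import statistics
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Authorship:
    """A few columns of one authorship row (article-author tuple)."""
    ut: str          # unique article identifier
    daisng_id: str   # unique author identifier
    year: int        # publication year (the real table stores a full `date`)
    field: str       # field of research
    gender: str      # "male" or "female"


def same_gender_share(rows):
    """ASSUMPTION: stand-in homophily score, the share of co-author pairs
    (within the same article) whose genders are equal."""
    by_article = defaultdict(list)
    for r in rows:
        by_article[r.ut].append(r.gender)
    same = total = 0
    for genders in by_article.values():
        for a, b in combinations(genders, 2):
            total += 1
            same += a == b
    return same / total if total else float("nan")


def null_homophily(rows, n_iter=200, seed=0):
    """ASSUMPTION: permutation null that shuffles genders across authorships
    while keeping article sizes fixed.

    Returns (m, s): mean and standard deviation of the score over resamples.
    """
    rng = random.Random(seed)
    genders = [r.gender for r in rows]
    scores = []
    for _ in range(n_iter):
        rng.shuffle(genders)  # gender totals are preserved, assignments are not
        resampled = [
            Authorship(r.ut, r.daisng_id, r.year, r.field, g)
            for r, g in zip(rows, genders)
        ]
        scores.append(same_gender_share(resampled))
    return statistics.mean(scores), statistics.stdev(scores)


def null_by_year_and_field(rows, **kwargs):
    """Group authorships by (year, field) and compute (m, s) for each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r.year, r.field)].append(r)
    return {key: null_homophily(group, **kwargs) for key, group in groups.items()}
```

For example, for one article by two men and a woman plus one article by two women, `same_gender_share` returns 0.5, since two of the four co-author pairs are same-gender. Shuffling within (year, field) groups mirrors the per-year, per-field layout of null_field.rds; a per-country variant would simply add `country` to the grouping key.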
Publication Details
Subfield: Artificial Intelligence
Field: Computer Science
Domain: Physical Sciences
Confidence Score: 49%
Source: Scholar Data Model