Zenodo Open Metadata Snapshot. Training Dataset For Records Classifier Building

Nowak, Krzysztof

Description

This dataset contains the metadata of Zenodo's published open access records as of 6 March 2017.

It's composed of:
A ZIP archive zenodo_open_metadata_06_03_2017.zip containing the full dataset: zenodo_open_metadata_06_03_2017.json (425MB, MD5: 22b30564e94d85373fa87fbfb77b57d3)
A JSON file zenodo_open_metadata_06_03_2017_sample.json containing a small sample of the full dataset.

Full dataset contains:
Metadata of 171674 Open Access Zenodo records.
Metadata of 5067 previously Open Access but since removed records which were classified as SPAM records by Zenodo staff.

Dataset contains only already publicly available metadata of all of the records.

In two cases, the metadata has been altered:
One title from a SPAM-labelled record has been altered as it contained an e-mail address.
One SPAM-labelled record has been removed from the full dataset.

Data format description:
Dataset is a JSON file, containing a single list of 176741 key-value dictionaries.
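As a minimal sketch of how the file described above could be checked and loaded, the snippet below computes an MD5 digest in chunks and parses the JSON list. It assumes the published checksum refers to the extracted JSON file rather than the ZIP archive (the description is ambiguous on this point) and that the file is UTF-8 encoded; the function names `md5_of_file` and `load_records` are illustrative and not part of the dataset.

```python
import hashlib
import json

# Values quoted from the dataset description.
EXPECTED_MD5 = "22b30564e94d85373fa87fbfb77b57d3"
EXPECTED_RECORDS = 176741


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_records(json_path, expected_md5=None):
    """Load the dataset as a list of dictionaries.

    If expected_md5 is given, the file is verified first and a ValueError
    is raised on mismatch. UTF-8 encoding is an assumption.
    """
    if expected_md5 is not None:
        actual = md5_of_file(json_path)
        if actual != expected_md5:
            raise ValueError(f"MD5 mismatch: got {actual}, expected {expected_md5}")
    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise TypeError("expected the top-level JSON value to be a list")
    return records
```

With the full file extracted, `load_records("zenodo_open_metadata_06_03_2017.json", EXPECTED_MD5)` should return a list whose length can be compared against `EXPECTED_RECORDS`. Parsing a 425MB JSON file this way holds the whole list in memory, so the sample file is a better starting point for experiments.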

Each dictionary contains the terms:
part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

These terms correspond to the fields of the same name in Zenodo's record JSON Schema v1.0.0: https://github.com/zenodo/zenodo/blob/master/zenodo/modules/records/jsonschemas/records/record-v1.0.0.json

In addition, some terms have been altered:

The term files contains a list of dictionaries, each holding only filetype, size and filename.
The term license contains the short Zenodo ID of the license (e.g. "cc-by").
The term spam contains a boolean value indicating whether the record was marked as a SPAM record by Zenodo staff.
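The three terms above are the ones most directly useful for classifier building. The following sketch shows one possible way to turn records into labelled text pairs and to tally labels and file types. It assumes that title and description hold strings when present, and the helper names (`label_counts`, `to_training_pairs`, `file_type_counts`) are hypothetical, chosen only for illustration.

```python
from collections import Counter


def label_counts(records):
    """Count records per spam label (True = marked as SPAM by Zenodo staff)."""
    return Counter(bool(record.get("spam")) for record in records)


def to_training_pairs(records):
    """Build (text, is_spam) pairs from title and description.

    Null or missing values are treated as empty strings.
    """
    pairs = []
    for record in records:
        title = record.get("title") or ""
        description = record.get("description") or ""
        text = f"{title}\n{description}".strip()
        pairs.append((text, bool(record.get("spam"))))
    return pairs


def file_type_counts(records):
    """Count the filetype values found in the files term of all records."""
    counts = Counter()
    for record in records:
        for entry in record.get("files") or []:
            counts[entry.get("filetype")] += 1
    return counts
```

Going by the figures in the description, `label_counts` on the full dataset would be expected to report 171674 non-spam and 5067 spam records. That is a strong class imbalance, which is worth accounting for when splitting the data or choosing an evaluation metric.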

Top-level terms that were missing from a record's metadata may have a null value.
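Because any top-level term may be null, it helps to measure how sparse each term is before relying on it as a feature. The sketch below counts, per term, the records whose value is null. The term names are copied from the description; the constant and function names are illustrative, and a key that is absent altogether is counted the same as a null value.

```python
from collections import Counter

# Terms taken from Zenodo's record schema, as listed in the description.
SCHEMA_TERMS = (
    "part_of", "thesis", "description", "doi", "meeting", "imprint",
    "references", "recid", "alternate_identifiers", "resource_type",
    "journal", "related_identifiers", "title", "subjects", "notes",
    "creators", "communities", "access_right", "keywords",
    "contributors", "publication_date",
)
# Terms the description lists as altered for this dataset.
EXTRA_TERMS = ("files", "license", "spam")
ALL_TERMS = SCHEMA_TERMS + EXTRA_TERMS


def null_counts(records):
    """Return, per term, how many records have a null (or absent) value."""
    counts = Counter()
    for record in records:
        for term in ALL_TERMS:
            if record.get(term) is None:
                counts[term] += 1
    return counts
```

Dividing each count by `len(records)` gives the fraction of records lacking that term, which can guide whether a term is dense enough to use as a classifier input.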

Metrics

Dataset Index

1.9

FAIR Score

77%

Citations

0

Mentions

0

Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield

Information Systems

Field

Computer Science

Domain

Physical Sciences

Confidence Score

69%

Source

OpenAlex

Keywords

metadata, zenodo, open access, machine learning, classification, data analysis, data mining

Normalization Factors

FT

13.46

CTw

1.00

MTw

1.00