Zenodo Open Metadata Snapshot. Training Dataset For Records Classifier Building
Description
This dataset contains the metadata of Zenodo's published Open Access records as of 6 March 2017.
It consists of:
A ZIP archive, zenodo_open_metadata_06_03_2017.zip, containing the full dataset: zenodo_open_metadata_06_03_2017.json (425 MB, MD5: 22b30564e94d85373fa87fbfb77b57d3).
A JSON file, zenodo_open_metadata_06_03_2017_sample.json, containing a small sample of the full dataset.
The full dataset contains:
Metadata of 171674 Open Access Zenodo records.
Metadata of 5067 records that were previously Open Access but have since been removed after Zenodo staff classified them as SPAM.
The dataset contains only metadata that was already publicly available for all of the records. In two cases, the metadata has been altered:
The title of one SPAM-labelled record has been altered because it contained an e-mail address.
One SPAM-labelled record has been removed from the full dataset.
Data format description:
The dataset is a JSON file containing a single list of 176741 key-value dictionaries.
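A minimal sketch of checking the published MD5 before parsing, using only the Python standard library. It assumes the archive has been downloaded to the working directory and that the JSON file sits at the archive root under the name given above; both are assumptions, as are the helper names. The MD5 is computed in 1 MiB chunks so hashing does not need the whole 425 MB file in memory (parsing it with `json.load` still does).

```python
import hashlib
import json
import zipfile

# File names and checksum as published above; the archive location is assumed.
ARCHIVE = "zenodo_open_metadata_06_03_2017.zip"
MEMBER = "zenodo_open_metadata_06_03_2017.json"  # assumed to sit at the archive root
EXPECTED_MD5 = "22b30564e94d85373fa87fbfb77b57d3"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read chunk_size bytes at a time."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def load_records(archive=ARCHIVE, member=MEMBER, expected_md5=EXPECTED_MD5):
    """Check the member's MD5, then parse it into a list of record dicts."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:
            actual = md5_of_stream(fh)
        if actual != expected_md5:
            raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
        with zf.open(member) as fh:
            # json.load accepts a binary file and detects UTF-8/16/32 itself.
            return json.load(fh)


def count_spam(records):
    """Return (total, spam) counts using each record's boolean 'spam' term."""
    spam = sum(1 for record in records if record.get("spam"))
    return len(records), spam
```

For example, `count_spam(load_records())` returns the total and SPAM-labelled counts, which can then be compared with the 171674 + 5067 = 176741 figures stated above.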
Each dictionary contains the terms:
part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date
which correspond to the fields of the same name in Zenodo's record JSON Schema v1.0.0: https://github.com/zenodo/zenodo/blob/master/zenodo/modules/records/jsonschemas/records/record-v1.0.0.json
In addition, some terms have been altered:
The term files contains a list of dictionaries, each containing only filetype, size and filename.
The term license contains the short Zenodo ID of the license (e.g. "cc-by").
The term spam contains a boolean value indicating whether the given record was marked as a SPAM record by Zenodo staff.
Top-level terms whose values were missing from the metadata may contain a null value.
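The record layout described above lends itself to a few small helpers when preparing a SPAM classifier. This is a sketch under stated assumptions: `records` is the parsed list of dictionaries, and the function names, the use of title plus description as classifier input, and the 1/0 label encoding are illustrative choices, not part of the dataset. Because any top-level term may be null, every helper treats None as empty.

```python
from collections import Counter

# Terms listed in the description as present in every record dictionary.
EXPECTED_TERMS = frozenset({
    "part_of", "thesis", "description", "doi", "meeting", "imprint",
    "references", "recid", "alternate_identifiers", "resource_type",
    "journal", "related_identifiers", "title", "subjects", "notes",
    "creators", "communities", "access_right", "keywords", "contributors",
    "publication_date",
})


def missing_terms(records):
    """Map list index -> sorted expected terms absent from that record.

    A term present with a null (None) value counts as present; records
    containing every expected term are left out of the result.
    """
    report = {}
    for index, record in enumerate(records):
        absent = EXPECTED_TERMS.difference(record)  # iterating a dict yields its keys
        if absent:
            report[index] = sorted(absent)
    return report


def training_pairs(records):
    """Yield (text, label): title and description joined, label 1 if spam else 0.

    The label encoding is an assumption; null titles/descriptions become "".
    """
    for record in records:
        title = record.get("title") or ""
        description = record.get("description") or ""
        label = 1 if record.get("spam") else 0
        yield f"{title}\n{description}".strip(), label


def total_file_size(record):
    """Sum 'size' over the 'files' entries; a null list or size counts as 0.

    The unit of 'size' is not stated in the dataset description.
    """
    return sum((entry.get("size") or 0) for entry in (record.get("files") or []))


def license_counts(records):
    """Count records per short license ID (e.g. "cc-by"); null IDs count under None."""
    return Counter(record.get("license") for record in records)
```

Keeping the label as a plain integer leaves the choice of text features and model open; the pairs can be fed to any classifier that accepts (text, label) input.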
Publication Details
Subfield: Information Systems
Field: Computer Science
Domain: Physical Sciences
Confidence Score: 69%
Source: OpenAlex