Wiki-Disease-Benchmark

Alberto, Gonzalez; García-Barragán, Álvaro

Description

This benchmark consists of 255 randomly selected disease descriptions, as of February 2024. Each disease description was labeled by two data annotators who reviewed each other's annotations to ensure accuracy and consistency across the dataset.

The descriptions were collected, parsed, and extracted from Wikipedia using a software routine that interfaces with an API (https://pypi.org/project/Wikipedia-API/) to systematically retrieve and collate information related to a predefined disease. Specifically, the routine searches for the pages on a given disease and, within those pages, extracts the "Signs and Symptoms" section. This process has two steps:

1. Retrieve the labels (rdfs:label) of all triples in DBpedia (https://dbpedia.org/) that describe a disease (rdf:type dbo:Disease).
2. With these labels, go to each corresponding Wikipedia page and scrape the "Signs and Symptoms" section.

After the text was extracted from Wikipedia, the phenotypical entities were annotated.
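
The two-step retrieval described above could be sketched roughly as follows. This is a minimal illustration, not the authors' actual routine: the public DBpedia SPARQL endpoint URL, the English-label filter, the case-insensitive section match, the user-agent string, and all function names (build_label_query, parse_labels, fetch_disease_labels, find_section, fetch_symptom_text, main) are assumptions introduced for the example. Only the Wikipedia-API package and the rdfs:label / rdf:type dbo:Disease pattern come from the description.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint is used for step 1.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"
# Section name as given in the description; matched case-insensitively below.
SECTION_TITLE = "Signs and Symptoms"


def build_label_query(limit=100, lang="en"):
    """Build a SPARQL query for the rdfs:label of every dbo:Disease resource."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?label WHERE {\n"
        "  ?disease rdf:type dbo:Disease ;\n"
        "           rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'  # language filter is an assumption
        "}\n"
        f"LIMIT {int(limit)}"
    )


def parse_labels(sparql_json):
    """Pull the label strings out of a SPARQL JSON results document."""
    return [row["label"]["value"] for row in sparql_json["results"]["bindings"]]


def fetch_disease_labels(limit=100):
    """Step 1: query DBpedia for disease labels (requires network access)."""
    params = urllib.parse.urlencode({
        "query": build_label_query(limit),
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(f"{DBPEDIA_SPARQL}?{params}", timeout=30) as resp:
        return parse_labels(json.load(resp))


def find_section(sections, title):
    """Depth-first, case-insensitive search over objects with .title and .sections."""
    for section in sections:
        if section.title.strip().lower() == title.strip().lower():
            return section
        found = find_section(section.sections, title)
        if found is not None:
            return found
    return None


def fetch_symptom_text(label, wiki):
    """Step 2: return the symptom section text for one label, or None if absent."""
    page = wiki.page(label)
    if not page.exists():
        return None
    section = find_section(page.sections, SECTION_TITLE)
    return section.text if section is not None else None


def main():
    """Run both steps for a handful of labels (requires network access)."""
    import wikipediaapi  # pip install Wikipedia-API

    wiki = wikipediaapi.Wikipedia(
        user_agent="wiki-disease-sketch/0.1 (hypothetical contact)",
        language="en",
    )
    for label in fetch_disease_labels(limit=5):
        text = fetch_symptom_text(label, wiki)
        print(label, "->", (text or "<no section found>")[:80])
```

Calling main() runs both steps against the live services. The query builder, result parser, and section finder are kept free of network access so they can be checked offline, and pages without a matching section simply yield None rather than raising.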

Metrics

Dataset Index: 1.9
FAIR Score: 77%
Citations: 0
Mentions: 0

Publication Details

DOI

Publisher: Zenodo

Assigned Domain

Subfield: Artificial Intelligence
Field: Computer Science
Domain: Physical Sciences
Confidence Score: 43%
Source: Scholar Data Model

Normalization Factors

FT: 13.46
CTw: 1.00
MTw: 1.00