Historical Newspapers Ground Truth

Neudecker, Clemens

Description

This dataset contains 50 pages of ground truth data for digitized historical newspapers from the Berlin State Library. It was produced in the context of the EU ICT-PSP project Europeana Newspapers (http://www.europeana-newspapers.eu/) for training and evaluating OCR/OLR systems.

The dataset comprises the following resources:

gt_page.zip: Ground truth files in PAGE-XML format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML)
img_full.zip: Full-resolution scanned images in TIF format
img_bin.zip: Binarized images (using the Gatos method) in TIF format
ocr_full.zip: OCR results (FineReaderEngine11) for the full-resolution images
ocr_bin.zip: OCR results (FineReaderEngine11) for the binarized images
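As a rough illustration of how the PAGE-XML ground truth might be consumed, the sketch below reads region-level text from the XML files in an archive such as gt_page.zip. It rests on assumptions that are not stated in the description: that the archive holds files ending in .xml, and that each TextRegion carries its transcription in a direct TextEquiv/Unicode child. The helper names region_texts and read_ground_truth are hypothetical. Tags are matched by local name, so the PAGE schema version does not need to be known in advance.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def region_texts(xml_bytes):
    """Return the region-level transcription of every TextRegion in one page.

    Only TextEquiv elements that are direct children of a TextRegion are
    read, so line-level and word-level transcriptions are not duplicated.
    """
    root = ET.fromstring(xml_bytes)
    texts = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            for unicode_elem in child:
                if local_name(unicode_elem.tag) == "Unicode":
                    texts.append(unicode_elem.text or "")
    return texts


def read_ground_truth(zip_source):
    """Map each .xml member of the archive to its list of region texts.

    zip_source may be a file path (assumed here: gt_page.zip) or a
    file-like object.
    """
    pages = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                pages[name] = region_texts(archive.read(name))
    return pages
```

Calling read_ground_truth("gt_page.zip") would then give one list of region transcriptions per page, which could be compared against the corresponding OCR output when evaluating a system.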

Metrics

Dataset Index

1.7

FAIR Score

69%

Citations

0

Mentions

0

Publication Details

DOI

Publisher

Zenodo

Assigned Domain

Subfield

Artificial Intelligence

Field

Computer Science

Domain

Physical Sciences

Confidence Score

36%

Source

Scholar Data Model

Keywords

OCR, Ground Truth, Newspaper, Europeana

Normalization Factors

FT

13.46

CTw

1.00

MTw

1.00