Description
This dataset contains 50 pages of ground truth data for digitized historical newspapers from the Berlin State Library, intended for training and evaluating OCR/OLR systems. It was produced in the context of the EU ICT-PSP project Europeana Newspapers (http://www.europeana-newspapers.eu/).

The dataset comprises the following resources:

gt_page.zip: Ground truth files in PAGE-XML format (cf. https://github.com/PRImA-Research-Lab/PAGE-XML)
img_full.zip: Full-resolution scanned images in TIF format
img_bin.zip: Binarized images (using the Gatos method) in TIF format
ocr_full.zip: OCR results (FineReaderEngine11) for the full-resolution images
ocr_bin.zip: OCR results (FineReaderEngine11) for the binarized images
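As an illustration of how the ground truth in gt_page.zip might be consumed, the following is a minimal Python sketch that collects the region-level transcriptions from every PAGE-XML file in an archive. It rests on assumptions that the description above does not confirm: that the archive members end in .xml, and that each TextRegion carries its transcription in a direct TextEquiv/Unicode child, as the PAGE-XML schema allows. The function names local_name, region_texts and read_ground_truth are hypothetical helpers, not part of the dataset or of any PAGE-XML tooling.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}TextRegion' -> 'TextRegion'."""
    return tag.rsplit("}", 1)[-1]


def region_texts(xml_bytes):
    """Return the region-level transcription of each TextRegion in one PAGE-XML file."""
    root = ET.fromstring(xml_bytes)
    texts = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        # Look only at direct TextEquiv children of the region, so that
        # line- and word-level transcriptions are not collected twice.
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            for sub in child:
                if local_name(sub.tag) == "Unicode":
                    texts.append(sub.text or "")
    return texts


def read_ground_truth(zip_source):
    """Map each .xml member of the archive (assumed layout) to its region texts."""
    result = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                result[name] = region_texts(archive.read(name))
    return result
```

Calling read_ground_truth("gt_page.zip") would then return a dictionary keyed by member name. Matching on local element names rather than a fixed namespace is a deliberate choice here, since the PAGE-XML namespace URI differs between schema versions and the version used in this dataset is not stated.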
Publication Details
Subfield: Artificial Intelligence
Field: Computer Science
Domain: Physical Sciences
Confidence Score: 36%
Source: Scholar Data Model