Published on 01 January 2026

Version 1.0.0

JNS-OMR Dataset: Scanned and Rendered Jingju Jianpu Score Images with Annotations

Anonymous Authors

Description

JNS-OMR Dataset

A dataset for optical music recognition (OMR) of jingju (Beijing opera) jianpu notation, derived from the Jingju Music Scores Collection (JMSC).

Dataset overview

The dataset covers 108 jingju scores. These span four role types (sheng, dan, jing, chou), two principal melodic modes (erhuang and xipi), and a range of rhythmic patterns (banshi).

- Scores: 108
- J-IR files: 108
- Scanned scores: 107
- Scanned pages: 429
- Rendered scores: 106
- Rendered pages (clean): 524
- Rendered images (all augmentations): 2,620 (524 pages × 5 augmentation presets)
- Bounding-box instances (clean pages): 1,123,780
- Symbol classes: 26

Contents

```
dataset/
├── README.md
├── classes.txt
├── metadata/
│   ├── dataset_manifest.csv
│   ├── kfold_splits.json
│   ├── kfold_results.json
│   ├── all_folds_summary.json
│   ├── merged_comparison.json
│   └── clean_stems.csv
├── ir/
│   └── *.json
├── rendered/
│   ├── images/
│   ├── labels/
│   └── previews/
├── scans/
│   └── {stem}/
│       ├── *.png
│       └── resolution.txt
└── checkpoints/
    ├── best_fold01.pt
    ├── best_fold02.pt
    ├── best_fold03.pt
    ├── best_fold04.pt
    └── best_fold05.pt
```

File naming convention

All files share the same stem name as the source MusicXML file in JMSC.
- ir/{stem}_ir.json
- rendered/images/{stem}p{N:02d}{aug}.png
- rendered/labels/{stem}p{N:02d}{aug}.txt
- rendered/previews/{stem}_p{N:02d}_boxes.png
- scans/{stem}/{stem}_p{N:04d}.png

Augmentation presets ({aug}): clean, scan_light, scan_medium, scan_dark, photo_tilt.

Metadata files

- dataset_manifest.csv: one row per score. Each row links a stem to its source publication, its fold assignment, and file availability flags.
- kfold_splits.json: five-fold cross-validation split at the piece-family level.
- kfold_results.json: per-fold symbol detection results (mAP, precision, recall).
- all_folds_summary.json: per-fold pitch sequence reconstruction results.
- merged_comparison.json: scanned vs. rendered comparison on the 31 evaluation scores.
- clean_stems.csv: the 31 scores used in the scanned-image evaluation.

Model checkpoints

Five YOLOv8n checkpoints (best_fold01.pt to best_fold05.pt), one per cross-validation fold. Each was trained on a T4 GPU at image size 1280, with batch size 2, for 100 epochs.

Source corpus

All symbolic data are derived from the Jingju Music Scores Collection (JMSC). Scanned pages are sourced from the printed jingju jianpu publications listed in metadata/dataset_manifest.csv.

License: Creative Commons Attribution 4.0 International (CC BY 4.0)
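The file naming convention above can be turned into small path helpers. The following is a minimal sketch that assumes the archive has been extracted to a local dataset/ directory. DATASET_ROOT and the helper names are hypothetical. The sketch covers only the ir/, scans/ and rendered/previews/ patterns.

```python
from pathlib import Path

# Hypothetical extraction location of the archive; adjust as needed.
DATASET_ROOT = Path("dataset")


def ir_path(stem: str, root: Path = DATASET_ROOT) -> Path:
    """J-IR file of one score: ir/{stem}_ir.json."""
    return root / "ir" / f"{stem}_ir.json"


def scan_page_path(stem: str, page: int, root: Path = DATASET_ROOT) -> Path:
    """One scanned page: scans/{stem}/{stem}_p{N:04d}.png."""
    return root / "scans" / stem / f"{stem}_p{page:04d}.png"


def preview_path(stem: str, page: int, root: Path = DATASET_ROOT) -> Path:
    """Box-overlay preview of a rendered page: rendered/previews/{stem}_p{N:02d}_boxes.png."""
    return root / "rendered" / "previews" / f"{stem}_p{page:02d}_boxes.png"


def list_scan_pages(stem: str, root: Path = DATASET_ROOT) -> list[Path]:
    """All scanned pages of one score in page order.

    The 4-digit zero padding makes lexical order equal numeric order.
    """
    return sorted((root / "scans" / stem).glob(f"{stem}_p*.png"))
```

Because every file is keyed by the JMSC stem, a score's J-IR file, scans and previews can be joined without consulting the manifest.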
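The description does not spell out the label format. The checkpoints are YOLOv8n models, so a reasonable assumption is that rendered/labels/*.txt use the standard YOLO detection format. In that format, each line holds one box: the class index, then centre x, centre y, width and height, all normalised by image size. A second assumption is that classes.txt lists the 26 class names one per line, in index order. Under those assumptions, a minimal reader could look like this. Box, read_labels and the other names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Box:
    cls: int   # class index into classes.txt (assumed ordering)
    cx: float  # box centre x, normalised to [0, 1] (assumed YOLO convention)
    cy: float  # box centre y, normalised to [0, 1]
    w: float   # box width, normalised
    h: float   # box height, normalised


def read_classes(path: Path) -> list[str]:
    """Assumes one class name per line; blank lines are ignored."""
    return [ln.strip() for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]


def read_labels(path: Path) -> list[Box]:
    """Parse one label file with 'cls cx cy w h' on each line."""
    boxes = []
    for ln in path.read_text(encoding="utf-8").splitlines():
        parts = ln.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        cls, *coords = parts
        boxes.append(Box(int(cls), *map(float, coords)))
    return boxes


def to_pixels(box: Box, img_w: int, img_h: int) -> tuple[float, float, float, float]:
    """Convert a normalised box to pixel (x_min, y_min, x_max, y_max)."""
    x0 = (box.cx - box.w / 2) * img_w
    y0 = (box.cy - box.h / 2) * img_h
    return x0, y0, x0 + box.w * img_w, y0 + box.h * img_h


def class_histogram(label_files, class_names: list[str]) -> Counter:
    """Count box instances per class name across the given label files."""
    counts: Counter = Counter()
    for f in label_files:
        for b in read_labels(Path(f)):
            counts[class_names[b.cls]] += 1
    return counts
```

Calling class_histogram over every file in rendered/labels counts boxes from all augmentation presets. To compare with the 1,123,780 clean-page instances, pass only the label files of the clean preset.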
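The published split is already stored in metadata/kfold_splits.json, so nothing needs to be recomputed. The sketch below only illustrates what a split "at the piece-family level" means: every score of a piece family lands in the same fold, so no family appears in both training and validation. The stem_to_family mapping and the greedy size-balancing rule are assumptions for illustration. They do not describe how the published split was actually produced.

```python
from collections import defaultdict


def group_kfold(stem_to_family: dict[str, str], k: int = 5) -> list[list[str]]:
    """Assign stems to k folds without splitting any piece family.

    Greedy balancing (an illustrative choice): families are visited largest
    first, and each goes to the fold that currently holds the fewest stems.
    """
    families: dict[str, list[str]] = defaultdict(list)
    for stem, fam in stem_to_family.items():
        families[fam].append(stem)
    folds: list[list[str]] = [[] for _ in range(k)]
    # Sort by family size (descending), then by name, so the result is deterministic.
    for fam in sorted(families, key=lambda f: (-len(families[f]), f)):
        target = min(range(k), key=lambda i: len(folds[i]))
        folds[target].extend(sorted(families[fam]))
    return folds


def train_val_for_fold(folds: list[list[str]], i: int) -> tuple[list[str], list[str]]:
    """Hold out fold i for validation; the remaining folds form the training set."""
    val = folds[i]
    train = [s for j, f in enumerate(folds) if j != i for s in f]
    return train, val
```

Grouping by family keeps closely related pieces from leaking across the split, which would otherwise inflate the per-fold detection scores.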
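The Ultralytics package can be used to retrain one fold with the reported settings or to run a published checkpoint. This sketch makes three assumptions: ultralytics is installed, the labels follow the YOLO format, and training starts from the stock yolov8n.pt weights. The description states only the model size, image size, batch size and epoch count. The per-fold config layout (foldNN.yaml plus train/val image lists) and all helper names are hypothetical. Ultralytics locates each label by replacing the images directory with labels in the image path, which matches the rendered/images and rendered/labels layout.

```python
import json
from pathlib import Path


def write_fold_config(root: Path, fold: int, train_imgs: list[Path], val_imgs: list[Path],
                      class_names: list[str], out_dir: Path) -> Path:
    """Write train/val image lists and a data config for one fold; return the config path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lists = {}
    for split, imgs in (("train", train_imgs), ("val", val_imgs)):
        txt = out_dir / f"fold{fold:02d}_{split}.txt"
        # One absolute image path per line.
        txt.write_text("".join(f"{p.resolve()}\n" for p in imgs), encoding="utf-8")
        lists[split] = str(txt.resolve())
    cfg = {"path": str(root.resolve()), "train": lists["train"],
           "val": lists["val"], "names": class_names}
    cfg_path = out_dir / f"fold{fold:02d}.yaml"
    # JSON is valid YAML flow syntax, so the stdlib json module is enough here.
    cfg_path.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
    return cfg_path


def train_fold(cfg_path: Path):
    """Retrain with the settings reported for the published checkpoints."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`
    model = YOLO("yolov8n.pt")  # assumed starting weights
    return model.train(data=str(cfg_path), imgsz=1280, batch=2, epochs=100)


def detect(page_image: Path, checkpoint: Path):
    """Run a published checkpoint (e.g. checkpoints/best_fold01.pt) on one page image."""
    from ultralytics import YOLO
    model = YOLO(str(checkpoint))
    return model.predict(source=str(page_image), imgsz=1280)
```

Keeping the Ultralytics import inside the functions lets the config helper run without the package installed.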

Publication Details

DOI

Publisher

Zenodo

Keywords

optical music recognition, Jianpu, Jingju, Beijing Opera, music notation, object detection