Published on 01 January 2026
JNS-OMR Dataset: Scanned and Rendered Jingju Jianpu Score Images with Annotations
JNS-OMR Dataset

A dataset for optical music recognition (OMR) of jingju (Beijing opera) jianpu notation, derived from the Jingju Music Scores Collection (JMSC).

Dataset overview

The dataset covers 108 jingju scores spanning four role types (sheng, dan, jing, chou), two principal melodic modes (erhuang and xipi), and a range of rhythmic patterns (banshi).

- Scores: 108
- J-IR files: 108
- Scanned scores: 107
- Scanned pages: 429
- Rendered scores: 106
- Rendered pages (clean): 524
- Rendered images (all augmentations): 2,620
- Bounding-box instances (clean pages): 1,123,780
- Symbol classes: 26

Contents

    dataset/
    ├── README.md
    ├── classes.txt
    ├── metadata/
    │   ├── dataset_manifest.csv
    │   ├── kfold_splits.json
    │   ├── kfold_results.json
    │   ├── all_folds_summary.json
    │   ├── merged_comparison.json
    │   └── clean_stems.csv
    ├── ir/
    │   └── *.json
    ├── rendered/
    │   ├── images/
    │   ├── labels/
    │   └── previews/
    ├── scans/
    │   └── {stem}/
    │       ├── *.png
    │       └── resolution.txt
    └── checkpoints/
        ├── best_fold01.pt
        ├── best_fold02.pt
        ├── best_fold03.pt
        ├── best_fold04.pt
        └── best_fold05.pt

File naming convention

All files share the same stem name as the source MusicXML file in JMSC.

    ir/{stem}_ir.json
    rendered/images/{stem}p{N:02d}{aug}.png
    rendered/labels/{stem}p{N:02d}{aug}.txt
    rendered/previews/{stem}_p{N:02d}_boxes.png
    scans/{stem}/{stem}_p{N:04d}.png

Augmentation presets: clean, scan_light, scan_medium, scan_dark, photo_tilt.

Metadata files

- dataset_manifest.csv: one row per score, links each stem to its source publication, fold assignment, and file availability flags.
- kfold_splits.json: five-fold cross-validation split at piece-family level.
- kfold_results.json: per-fold symbol detection results (mAP, precision, recall).
- all_folds_summary.json: per-fold pitch sequence reconstruction results.
- merged_comparison.json: scan vs rendered comparison on 31 evaluation scores.
- clean_stems.csv: the 31 scores used in the scanned-image evaluation.

Model checkpoints

Five YOLOv8n checkpoints (best_fold01.pt – best_fold05.pt), trained at image size 1280, batch size 2, 100 epochs on a T4 GPU.

Source corpus

All symbolic data are derived from the Jingju Music Scores Collection (JMSC). Scanned pages are sourced from the printed jingju jianpu publications listed in metadata/dataset_manifest.csv.

License: Creative Commons Attribution 4.0 International (CC BY 4.0)
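
Working with the naming convention

The following is a minimal sketch, not part of the dataset tooling, of how the file naming convention above might be handled in Python. The names `parse_rendered`, `ir_path`, and `scan_page_path` are hypothetical helpers, and `demo` is a placeholder stem. The rendered image and label patterns are printed above without separators between stem, page, and preset, while the preview and scan patterns use underscores, so the regular expression treats those underscores as optional; that tolerance is an assumption rather than a documented rule.

```python
import re
from typing import NamedTuple, Optional

# The five augmentation presets listed in this README.
PRESETS = ("clean", "scan_light", "scan_medium", "scan_dark", "photo_tilt")


class RenderedName(NamedTuple):
    stem: str
    page: int
    preset: str


# Assumption: the underscores before "p" and before the preset may or may not
# be present in the actual filenames, so both are optional here.
_RENDERED = re.compile(
    r"^(?P<stem>.+?)_?p(?P<page>\d{2})_?(?P<preset>"
    + "|".join(PRESETS)
    + r")\.(?:png|txt)$"
)


def parse_rendered(filename: str) -> Optional[RenderedName]:
    """Split a rendered image or label filename into stem, page, and preset.

    Returns None for names that do not end in a known preset (for example
    the *_boxes.png previews).
    """
    m = _RENDERED.match(filename)
    if m is None:
        return None
    return RenderedName(m["stem"], int(m["page"]), m["preset"])


def ir_path(stem: str) -> str:
    """Relative path of the J-IR file for a score stem."""
    return f"ir/{stem}_ir.json"


def scan_page_path(stem: str, page: int) -> str:
    """Relative path of a scanned page (four-digit page number)."""
    return f"scans/{stem}/{stem}_p{page:04d}.png"


if __name__ == "__main__":
    # "demo" is a placeholder stem, not a file in the dataset.
    print(parse_rendered("demo_p03_scan_light.png"))
    print(ir_path("demo"))
    print(scan_page_path("demo", 7))
```

The preset list also accounts for the overview counts: 524 clean rendered pages times 5 presets gives the 2,620 rendered images. Scanned and rendered page numbers should not be assumed to correspond, since the dataset has 429 scanned pages and 524 rendered pages.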