Published on 10 July 2025

DIY Repair Videos: A Multimodal YouTube Dataset for Instructional Content Analysis

Sahu, Aditya; Gunjal, Suraj; Thorve, Piyush; Kawar, Suhas; Kadam, Rutuja

Description

Overview:
This dataset contains 6,015 YouTube DIY-repair tutorial videos, each enriched with structured metadata, transcripts, viewer comments, channel details, and a rigorous, multi-round manual annotation of instructional content.

Key Components:

Metadata & Engagement:
  Fields: Video_ID, Title, Description, Duration (ISO 8601 + seconds), View_Count, Like_Count, Comment_Count, Published_At, Thumbnail_URL
  Metric: Engagement_Ratio = Like_Count / (View_Count + 1)

Transcripts:
  Source: YouTube auto-captions (empty if unavailable)
  Fields: Transcript (raw text)
  Manual Rounds:
    TR_A1, TR_A2, TR_A3: three independent transcript reviews (correcting major errors, marking non-verbal segments)
    TR_Final: consolidated transcript after consensus

DIY Category Annotation:
  Manual Rounds:
    DIY_A1, DIY_A2, DIY_A3: three independent category assignments using the annotation guide
    DIY_Final: consensus category after adjudication
  Coverage: 16 DIY sub-domains (e.g., "home repair," "plumbing," "woodworking," "other")
  Reliability: Inter-annotator agreement (Fleiss's κ = 0.76)

Comments:
  Fields: Comments (JSON array of up to 50 top-level comments), Has_Comments (true if ≥ 20 total words)

Channel Context:
  Fields: Channel_ID, Channel_Title, Channel_Thumbnail_URL

Annotation Methodology:
1. Stratified Subset Selection: A subset of 180 videos was sampled to represent all DIY categories proportionally.
2. Annotation Guide: A concise manual defined each DIY category and outlined transcription conventions.
3. Independent Annotations: Three team members performed Rounds 1–3 (DIY_A1–3 and TR_A1–3) without access to others' labels.
4. Consensus Adjudication: For each video, a fourth pass produced DIY_Final and TR_Final, the agreed-upon labels and corrected transcript.

DIY-Repair-Youtube-Dataset/
│
├── data/
│   ├── video_metadata.csv     # Main dataset file (6,015 rows × 19 columns)
│   └── data_dictionary.csv    # Definitions of each column/field
│
├── CITATION.cff
├── LICENSE
├── README.md
└── requirements.txt

Note: The dataset was annotated manually, and the transcripts were manually reviewed.
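The two derived fields in the description, Engagement_Ratio and Has_Comments, can be recomputed from the raw columns. The following is a minimal sketch rather than the authors' code: the function names are hypothetical, and it assumes that Has_Comments counts whitespace-separated words summed over all stored comments and that each element of the Comments JSON array is a plain string. The description does not confirm either detail.

```python
import json


def engagement_ratio(like_count: int, view_count: int) -> float:
    """Engagement_Ratio = Like_Count / (View_Count + 1), as given in the description."""
    # The +1 in the denominator keeps the ratio defined for videos with zero views.
    return like_count / (view_count + 1)


def has_comments(comments_json: str, min_words: int = 20) -> bool:
    """Assumed rule: True when the stored comments total at least `min_words` words."""
    # An empty cell is treated as "no comments" (assumption).
    comments = json.loads(comments_json) if comments_json else []
    # Assumption: each array element is a plain string; str() guards other scalars.
    total_words = sum(len(str(comment).split()) for comment in comments)
    return total_words >= min_words
```

If the Comments array actually stores objects (for example with author and text fields), the word count would need to read the text field instead; data_dictionary.csv should settle this.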
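The reported reliability figure (Fleiss's κ = 0.76) could in principle be checked against the three independent category columns DIY_A1, DIY_A2 and DIY_A3. The sketch below implements the standard Fleiss's kappa formula in plain Python. The function name and input layout are assumptions made for illustration, and the description does not state which rows the authors used for their calculation, so a recomputed value may differ.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss's kappa for `ratings`, a list of per-item label lists.

    Every item must carry the same number of labels (one per annotator),
    e.g. [[DIY_A1, DIY_A2, DIY_A3], ...] for each annotated video.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    # How many annotators chose each category, per item.
    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: mean over items of the pairwise agreement P_i.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1:
        # Only one category was ever used; kappa is undefined in that case.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)
```

With the dataset loaded into rows, a call might look like `fleiss_kappa([[r["DIY_A1"], r["DIY_A2"], r["DIY_A3"]] for r in rows])`, restricted to rows where all three labels are present.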
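The Duration field is described as "ISO 8601 + seconds". As an illustration of how the two forms relate, the sketch below converts a duration string such as PT4M13S into seconds. It assumes the strings use only the day, hour, minute and second designators with whole-number values (weeks, months, years and fractional values are not handled), and the function name is hypothetical.

```python
import re

# Matches durations like "PT4M13S", "PT1H2M3S" or "P1DT2H" (whole numbers only).
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_duration_to_seconds(value: str) -> int:
    """Convert a simple ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(value)
    if match is None:
        raise ValueError(f"unsupported duration format: {value!r}")
    # Missing components come back as None and count as zero.
    parts = {name: int(text) if text else 0 for name, text in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )
```

Comparing the result with the seconds value stored alongside each ISO 8601 string is a quick consistency check on the metadata.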


Metrics

Dataset Index: 1.6
FAIR Score: 65%
Citations: 0
Mentions: 0


Publication Details

DOI

Publisher: Mendeley Data

Assigned Domain

Subfield: Artificial Intelligence
Field: Computer Science
Domain: Physical Sciences
Confidence Score: 41%
Source: Scholar Data Model

Keywords

Computer Vision, Educational Technology, Data Science, Natural Language Processing, Speech Recognition, Machine Learning, Human-Computer Interaction, Multimodality, Meta Dataset

Normalization Factors

FT: 13.46
CTw: 1.00
MTw: 1.00