Automated Organization Profile

Linnaeus University

Current S-Index

274.0

Sum of Dataset Indices for all datasets

Average Dataset Index per Dataset

1.4

Average Dataset Index per dataset

Total Datasets

197

Total datasets in this organization

Average FAIR Score

55.5%

Average FAIR Score per dataset

Total Citations

89

Total citations to the organization's datasets

Total Mentions

10

Total mentions of the organization's datasets
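
The summary figures above can be cross-checked against one another. The sketch below assumes the average Dataset Index is simply the S-Index (the sum of Dataset Indices) divided by the number of datasets; the function name is illustrative and not part of the profile.

```python
# Values copied from the summary above.
S_INDEX = 274.0        # sum of Dataset Indices for all datasets
TOTAL_DATASETS = 197   # total datasets in this organization


def average_dataset_index(s_index: float, n_datasets: int) -> float:
    """Return the mean Dataset Index, assuming S-Index is a plain sum."""
    if n_datasets <= 0:
        raise ValueError("n_datasets must be positive")
    return s_index / n_datasets


average = average_dataset_index(S_INDEX, TOTAL_DATASETS)
print(round(average, 1))  # rounds to the 1.4 shown in the profile
```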

S-Index Interpretation

[Charts: S-Index Over Time; Cumulative Citations Over Time; Cumulative Mentions Over Time]

Datasets

Uranium incorporation and reduction during Fe(II)-promoted transformations of U(VI)-sorbed iron oxyhydroxysulfates: Insights from U HERFD-XA

The study investigates the U-Fe interaction, crucial for understanding uranium's incorporation into iron (hydr)oxide minerals, impacting long-term soil retention. Using U HERFD-XANES at the U M4 edge, the proposal aims for molecular-level insights into U valence and the local environment. Experiments involved U(VI)-sorbed schwertmannite and jarosite, reacted with varying Fe(II) levels over 1 hour to 2 weeks in an anaerobic chamber (O2<0.5 ppm). Recent Fe EXAFS data show schwertmannite transformation to goethite and jarosite to lepidocrocite/goethite based on Fe(II) levels. U XANES spectra suggest U(V) predominance. U HERFD-XANES will clarify U incorporation in newly formed Fe(III) phases and assess U(IV) reduction in high Fe(II) samples. Insights into kinetics and mechanisms will enhance understanding of U(IV) reduction and repartitioning during schwertmannite and jarosite transformation.

Authors

  • Andrikopoulos, Christos
  • Hedberg, Marcus
  • Kononova, Liubov
  • Yu, Changxun
0 Citations · 0 Mentions · 15% FAIR · 0.4 Dataset Index
10.15151/esrf-es-1830822069 · January 2027

Data from: Climate warming disrupts zooplankton phenology and overwintering strategies

Data files and R code for the manuscript "Climate warming disrupts zooplankton phenology and overwintering strategies", published in Limnology and Oceanography.

Content overview

R code:
  • analyses_20250605.R - R script used to create Figures 2-4 and Figures S1-S3, and to run all statistical analyses.

Data files:
  • Field_data_final.csv - Data from the field sampling. Includes zooplankton densities (number per L of water), chlorophyll (μg per L) and temperature.
  • Expt1_fixed.csv - Data from Experiment 1, the fixed temperature experiment. Includes zooplankton densities (number per L of water and number per mL of sediment) and chlorophyll (μg per L of water and μg per mL of sediment).
  • Expt2_gradient.csv - Data from Experiment 2, the temperature gradient experiment. Includes zooplankton densities (number per L of water and number per mL of sediment) and chlorophyll (μg per L of water and μg per mL of sediment).

Authors

  • Svendsen, Ida K.
  • Forsman, Anders
  • Dopson, Mark
  • Nilsson, Emelie
  • Sunde, Johanna
  • Håkansson, Sofia
  • Ketzer, Marcelo
  • Hylander, Samuel
  • Salis, Romana
0 Citations · 0 Mentions · 13% FAIR · 0.3 Dataset Index
10.5281/zenodo.12674222 · December 2025

Data from: Climate warming disrupts zooplankton phenology and overwintering strategies

Data files and R code for the manuscript "Climate warming disrupts zooplankton phenology and overwintering strategies", published in Limnology and Oceanography.

Content overview

R code:
  • analyses_20250605.R - R script used to create Figures 2-4 and Figures S1-S3, and to run all statistical analyses.

Data files:
  • Field_data_final.csv - Data from the field sampling. Includes zooplankton densities (number per L of water), chlorophyll (μg per L) and temperature.
  • Expt1_fixed.csv - Data from Experiment 1, the fixed temperature experiment. Includes zooplankton densities (number per L of water and number per mL of sediment) and chlorophyll (μg per L of water and μg per mL of sediment).
  • Expt2_gradient.csv - Data from Experiment 2, the temperature gradient experiment. Includes zooplankton densities (number per L of water and number per mL of sediment) and chlorophyll (μg per L of water and μg per mL of sediment).

Authors

  • Svendsen, Ida K.
  • Forsman, Anders
  • Dopson, Mark
  • Nilsson, Emelie
  • Sunde, Johanna
  • Håkansson, Sofia
  • Ketzer, Marcelo
  • Hylander, Samuel
  • Salis, Romana
0 Citations · 0 Mentions · 13% FAIR · 0.3 Dataset Index
10.5281/zenodo.12674223 · December 2025

TranscriboQuest2025 Medieval Vernacular Religious Texts

Team Medieval Vernacular Religious Texts

Description
This dataset was created by a collaborative working group with the aim of transcribing medieval vernacular religious texts across a range of European languages. To reflect the linguistic expertise of the group members, the project included Old and Middle French, Old and Middle Irish, Old Castilian, Old Swedish, and Early New High German (Bavarian). Religious texts were chosen as a common thread because of their wide diffusion in the European vernacular tradition, their high survival rate in manuscripts, and their relevance for the study of medieval cultural and textual practices. The dataset is based on manuscripts preserved in France, Spain, Sweden, Germany, and Ireland, dating from the 11th to the 15th centuries, with a particular concentration in the 15th century. All manuscripts belong to the category of medium to highly decorated literary manuscripts. They are written in clearly identifiable scripts, predominantly in one or two columns; two manuscripts also include marginal texts.

Text
The transcribed texts represent a wide spectrum of medieval vernacular religious writing across Europe. They include hagiographic works such as the Legend om S. Barbara in Old Swedish and the Legenda aurea in Old Castilian, devotional poetry like the Amra Colum Cille in Old/Middle Irish, as well as historiographical and moralizing texts such as Simon de Hesdin’s and Nicolas de Gonesse’s French translation of Valerius Maximus (Les Faits et Dits Mémorables). Other contributions comprise Marian devotional literature (La conception Nostre Dame), vernacular adaptations of Latin translations (Décadas de Tito Livio in Old Castilian), and the widespread compendium Der Heiligen Leben in Early New High German (Bavarian), again based on the Legenda aurea.

Script
The manuscripts reflect the diversity of medieval book hands used in different European regions between the 11th and 15th centuries.
They include early Insular Carolingian minuscule, as well as various Gothic book and cursive hands characteristic of high and late medieval manuscript culture.

Our dataset is composed of the following texts:

| Manuscript | Date | Language | Text | Scripta | Number of transcribed lines | Transcriber |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Paris, BnF, fr. 282 | 1401 | Middle French | Les Faits et Dits Mémorables de Valère le Grant, by Simon de Hesdin and Nicolas de Gonesse | Gothic cursiva | 192 | Roxane Bougrelle |
| Paris, BnF fr. 818 | 1251-1300 | Old French | La conception Nostre Dame | Gothic | 143 | Catherine Dehaut |
| Dublin, RIA 23 E 25, pp. 5-8 | 11-12th c (MS). 8/9th c (Text) | Old/Middle Irish and Latin | Amra Colum Cille | Insular Carolingian minuscule | 19 | Ciaran McDonough |
| Stockholm, KB A 110, ff. [250r] | 1385-1400 | Old Swedish | Legend om S. Barbara, from Codex Oxenstiernianus (Järtteckensboken) | Cursiva gothica recentior | 110 | Pontus Henningsson |
| Escorial, ms. h-I-14 | 13-14th c. | Old Castilian | Legenda aurea | Gothic cursiva | 141 | María Florencia Vieira |
| Munich, University Library, 2° Cod. ms. 314 | 15th c. | Bavarian, Early Modern | Der Heiligen Leben (Sommerteil) | Bastarda | 287 | Barbara Denicolò |
| BNE ms. 12732 | 1433 | Old Castilian | Décadas de Tito-Livio (Pero López de Ayala from Bersuire’s translation) | Gothic Cursiva | 141 | Irene Salvo García |

The following metrics describe our corpus, which consists of the manuscripts listed in the table above: total number of documents (18); number of regions (74); total number of lines (1279); total number of words (11365); total number of characters (59696).

Dataset structure
The data folder contains each manuscript and its transcriptions. For each manuscript, the transcription is provided as an ALTO XML file, along with a METS XML file and images of the manuscript pages that were transcribed.
These are provided in one folder per manuscript, as subfolders of the data folder.

Guidelines
We followed the guidelines of the CATMuS project (1.6):
Pinche, A., Clérice, T., Vlachou-Efstathiou, M., Chagué, A., Camps, J.-B., Gille Levenson, M., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., & Ferrante, G. (2025). CATMuS Medieval (1.6.0). Zenodo. https://doi.org/10.5281/zenodo.15030337

We also used the SegmOnto controlled vocabulary:
Simon Gabay, Ariane Pinche, Kelly Christensen, Jean-Baptiste Camps, Nicola Carboni, SegmOnto, A Controlled Vocabulary to Describe the Layout of Pages, version 0.9, Genève/Lyon/Paris, 2023, https://segmonto.github.io/

For our collective project, we decided to follow some common guidelines.

Segmentation guidelines
  • MainZone: see SegmOnto. MainZone contains the principal text, as a single block, without any paratext. If there are several columns, we made separate blocks. Interlinear glosses are part of the MainZone.
  • NumberingZone: see SegmOnto. Contains page numbers or folio numbers.
  • GraphicZone: images or decorations.
  • MarginTextZone: marginal glosses or additions.
  • DropCapitalZone: see SegmOnto. Any type of capital letter, without any subtype. Do not transcribe it.
  • RunningTitleZone: see SegmOnto. Zone containing a running title.
  • DefaultLine / default (Kraken result): standard text line.
  • HeadingLine: any heading, regardless of its level.
  • InterlinearLine: any line between standard baselines.

Verse initials (appearing at the start of each line of a versified text, similar in size to the remaining text of the verse) should be included in the line of the corresponding verse. They should thus be segmented in the MainZone as normal text. Some initial letters (not illuminated) that hang slightly below the line are not included in the mask, meaning that they have to be added to the transcription later.
Transcription guidelines
We also followed the CATMuS guidelines when transcribing our documents. Given the diversity of our corpus, we chose a graphemic transcription system. Allographic variants (‘u’/‘v’ and ‘i’/‘j’) were normalised. As for ‘z’/‘s’, every allograph of ‘s’ was transcribed as ‘s’, and we distinguished between ‘s’ and ‘z’. We did not separate agglutinated words, and verse initials were kept separate (using a space) from the rest of the word. For punctuation, we used four main signs: “.” for single dots, “:” for more than a single dot, “/” for diastoles and signs that look like virgulas, and “¶” for section markers. We did not normalise capital letters. We also transcribed hyphenations and diastoles. If a character does not exist in MUFI but carries linguistic importance and occurs commonly in a manuscript, a new character can be created by combining MUFI characters and symbols. Abbreviations were not expanded; we transcribed them following MUFI. In other words, we transcribe what we see, so that the dataset works for every vernacular language of our corpus. Every character is described in the CATMuS guidelines: https://catmus-guidelines.github.io/html/guidelines/en/character_table.html

For the transcription of the Old Swedish manuscript specifically, a new combination of characters, “a̶” (small antiphon), was created, since it occurs commonly in Old Swedish manuscripts and carries linguistic importance. The character “a̶” is pronounced similarly to “ä” and “æ”, but since these letters appear in other languages with other pronunciations or meanings and look different from the small antiphon, a new character was created. A large antiphon already exists in MUFI (https://mufi.info/q.php?p=mufi/chars/unichar/59610), but the small one does not. We hope that the small antiphon, “a̶”, will be added to MUFI in the future.
Creators of the dataset
  • Ciaran McDonough, Aarhus University, 0000-0002-5198-9205
  • Pontus Henningsson, Linnaeus University, 0009-0002-8956-7312
  • Barbara Denicolò, Paris Lodron University of Salzburg, 0000-0001-7155-9790
  • Roxane Bougrelle, Université Lumière Lyon 2, 0009-0002-6343-2305
  • Catherine Dehaut, Université de Montréal, 0009-0000-5161-7906
  • María Florencia Vieira, ENS de Lyon, 0009-0001-8222-1178
  • Irene Salvo García, Universidad Autónoma de Madrid, 0000-0003-0155-1886
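
As a rough illustration of how the ALTO XML transcriptions described under "Dataset structure" might be read, the sketch below extracts the text of each line from an ALTO document. The sample document is hypothetical and much simpler than a real export; it relies only on the standard ALTO elements TextLine and String (with its CONTENT attribute), and the function names are assumptions made for this example. Files in the dataset may carry additional elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal ALTO document (not taken from the dataset).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1"><String CONTENT="first line"/></TextLine>
          <TextLine ID="line_2">
            <String CONTENT="second"/><SP/><String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any ALTO version is accepted."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return the transcribed text of every TextLine, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            # Join the CONTENT of the String children of this line.
            parts = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(parts))
    return lines


for text in alto_lines(SAMPLE_ALTO):
    print(text)
```

Matching on the local element name rather than a fixed namespace keeps the sketch independent of the ALTO schema version used by a given export.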

Authors

  • McDonough, Ciaran
  • Denicolò, Barbara
  • Bougrelle, Roxane
  • Vieira, Maria Florencia
  • Garcia, Irene Salvo
  • Dehaut, Catherine
  • Henningsson, Pontus
0 Citations · 0 Mentions · 13% FAIR · 0.3 Dataset Index
10.5281/zenodo.17062363 · September 2025

TranscriboQuest2025 Medieval Vernacular Religious Texts

Team Medieval Vernacular Religious Texts

Description
This dataset was created by a collaborative working group with the aim of transcribing medieval vernacular religious texts across a range of European languages. To reflect the linguistic expertise of the group members, the project included Old and Middle French, Old and Middle Irish, Old Castilian, Old Swedish, and Early New High German (Bavarian). Religious texts were chosen as a common thread because of their wide diffusion in the European vernacular tradition, their high survival rate in manuscripts, and their relevance for the study of medieval cultural and textual practices. The dataset is based on manuscripts preserved in France, Spain, Sweden, Germany, and Ireland, dating from the 11th to the 15th centuries, with a particular concentration in the 15th century. All manuscripts belong to the category of medium to highly decorated literary manuscripts. They are written in clearly identifiable scripts, predominantly in one or two columns; two manuscripts also include marginal texts.

Text
The transcribed texts represent a wide spectrum of medieval vernacular religious writing across Europe. They include hagiographic works such as the Legend om S. Barbara in Old Swedish and the Legenda aurea in Old Castilian, devotional poetry like the Amra Colum Cille in Old/Middle Irish, as well as historiographical and moralizing texts such as Simon de Hesdin’s and Nicolas de Gonesse’s French translation of Valerius Maximus (Les Faits et Dits Mémorables). Other contributions comprise Marian devotional literature (La conception Nostre Dame), vernacular adaptations of Latin translations (Décadas de Tito Livio in Old Castilian), and the widespread compendium Der Heiligen Leben in Early New High German (Bavarian), again based on the Legenda aurea.

Script
The manuscripts reflect the diversity of medieval book hands used in different European regions between the 11th and 15th centuries.
They include early Insular Carolingian minuscule, as well as various Gothic book and cursive hands characteristic of high and late medieval manuscript culture.

Our dataset is composed of the following texts:

| Manuscript | Date | Language | Text | Scripta | Number of transcribed lines | Transcriber |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Paris, BnF, fr. 282 | 1401 | Middle French | Les Faits et Dits Mémorables de Valère le Grant, by Simon de Hesdin and Nicolas de Gonesse | Gothic cursiva | 192 | Roxane Bougrelle |
| Paris, BnF fr. 818 | 1251-1300 | Old French | La conception Nostre Dame | Gothic | 143 | Catherine Dehaut |
| Dublin, RIA 23 E 25, pp. 5-8 | 11-12th c (MS). 8/9th c (Text) | Old/Middle Irish and Latin | Amra Colum Cille | Insular Carolingian minuscule | 19 | Ciaran McDonough |
| Stockholm, KB A 110, ff. [250r] | 1385-1400 | Old Swedish | Legend om S. Barbara, from Codex Oxenstiernianus (Järtteckensboken) | Cursiva gothica recentior | 110 | Pontus Henningsson |
| Escorial, ms. h-I-14 | 13-14th c. | Old Castilian | Legenda aurea | Gothic cursiva | 141 | María Florencia Vieira |
| Munich, University Library, 2° Cod. ms. 314 | 15th c. | Bavarian, Early Modern | Der Heiligen Leben (Sommerteil) | Bastarda | 287 | Barbara Denicolò |
| BNE ms. 12732 | 1433 | Old Castilian | Décadas de Tito-Livio (Pero López de Ayala from Bersuire’s translation) | Gothic Cursiva | 141 | Irene Salvo García |

The following metrics describe our corpus, which consists of the manuscripts listed in the table above: total number of documents (18); number of regions (74); total number of lines (1279); total number of words (11365); total number of characters (59696).

Dataset structure
The data folder contains each manuscript and its transcriptions. For each manuscript, the transcription is provided as an ALTO XML file, along with a METS XML file and images of the manuscript pages that were transcribed.
These are provided in one folder per manuscript, as subfolders of the data folder.

Guidelines
We followed the guidelines of the CATMuS project (1.6):
Pinche, A., Clérice, T., Vlachou-Efstathiou, M., Chagué, A., Camps, J.-B., Gille Levenson, M., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., & Ferrante, G. (2025). CATMuS Medieval (1.6.0). Zenodo. https://doi.org/10.5281/zenodo.15030337

We also used the SegmOnto controlled vocabulary:
Simon Gabay, Ariane Pinche, Kelly Christensen, Jean-Baptiste Camps, Nicola Carboni, SegmOnto, A Controlled Vocabulary to Describe the Layout of Pages, version 0.9, Genève/Lyon/Paris, 2023, https://segmonto.github.io/

For our collective project, we decided to follow some common guidelines.

Segmentation guidelines
  • MainZone: see SegmOnto. MainZone contains the principal text, as a single block, without any paratext. If there are several columns, we made separate blocks. Interlinear glosses are part of the MainZone.
  • NumberingZone: see SegmOnto. Contains page numbers or folio numbers.
  • GraphicZone: images or decorations.
  • MarginTextZone: marginal glosses or additions.
  • DropCapitalZone: see SegmOnto. Any type of capital letter, without any subtype. Do not transcribe it.
  • RunningTitleZone: see SegmOnto. Zone containing a running title.
  • DefaultLine / default (Kraken result): standard text line.
  • HeadingLine: any heading, regardless of its level.
  • InterlinearLine: any line between standard baselines.

Verse initials (appearing at the start of each line of a versified text, similar in size to the remaining text of the verse) should be included in the line of the corresponding verse. They should thus be segmented in the MainZone as normal text. Some initial letters (not illuminated) that hang slightly below the line are not included in the mask, meaning that they have to be added to the transcription later.
Transcription guidelines
We also followed the CATMuS guidelines when transcribing our documents. Given the diversity of our corpus, we chose a graphemic transcription system. Allographic variants (‘u’/‘v’ and ‘i’/‘j’) were normalised. As for ‘z’/‘s’, every allograph of ‘s’ was transcribed as ‘s’, and we distinguished between ‘s’ and ‘z’. We did not separate agglutinated words, and verse initials were kept separate (using a space) from the rest of the word. For punctuation, we used four main signs: “.” for single dots, “:” for more than a single dot, “/” for diastoles and signs that look like virgulas, and “¶” for section markers. We did not normalise capital letters. We also transcribed hyphenations and diastoles. If a character does not exist in MUFI but carries linguistic importance and occurs commonly in a manuscript, a new character can be created by combining MUFI characters and symbols. Abbreviations were not expanded; we transcribed them following MUFI. In other words, we transcribe what we see, so that the dataset works for every vernacular language of our corpus. Every character is described in the CATMuS guidelines: https://catmus-guidelines.github.io/html/guidelines/en/character_table.html

For the transcription of the Old Swedish manuscript specifically, a new combination of characters, “a̶” (small antiphon), was created, since it occurs commonly in Old Swedish manuscripts and carries linguistic importance. The character “a̶” is pronounced similarly to “ä” and “æ”, but since these letters appear in other languages with other pronunciations or meanings and look different from the small antiphon, a new character was created. A large antiphon already exists in MUFI (https://mufi.info/q.php?p=mufi/chars/unichar/59610), but the small one does not. We hope that the small antiphon, “a̶”, will be added to MUFI in the future.
Creators of the dataset
  • Ciaran McDonough, Aarhus University, 0000-0002-5198-9205
  • Pontus Henningsson, Linnaeus University, 0009-0002-8956-7312
  • Barbara Denicolò, Paris Lodron University of Salzburg, 0000-0001-7155-9790
  • Roxane Bougrelle, Université Lumière Lyon 2, 0009-0002-6343-2305
  • Catherine Dehaut, Université de Montréal, 0009-0000-5161-7906
  • María Florencia Vieira, ENS de Lyon, 0009-0001-8222-1178
  • Irene Salvo García, Universidad Autónoma de Madrid, 0000-0003-0155-1886

Authors

  • McDonough, Ciaran
  • Denicolò, Barbara
  • Bougrelle, Roxane
  • Vieira, Maria Florencia
  • Garcia, Irene Salvo
  • Dehaut, Catherine
  • Henningsson, Pontus
0 Citations · 0 Mentions · 13% FAIR · 0.3 Dataset Index
10.5281/zenodo.17062144 · September 2025

TranscriboQuest 2025 Medieval Vernacular Religious Texts

Team Medieval Vernacular Religious Texts

Description
This dataset was created by a collaborative working group with the aim of transcribing medieval vernacular religious texts across a range of European languages. To reflect the linguistic expertise of the group members, the project included Old and Middle French, Old and Middle Irish, Old Castilian, Old Swedish, and Early New High German (Bavarian). Religious texts were chosen as a common thread because of their wide diffusion in the European vernacular tradition, their high survival rate in manuscripts, and their relevance for the study of medieval cultural and textual practices. The dataset is based on manuscripts preserved in France, Spain, Sweden, Germany, and Ireland, dating from the 11th to the 15th centuries, with a particular concentration in the 15th century. All manuscripts belong to the category of medium to highly decorated literary manuscripts. They are written in clearly identifiable scripts, predominantly in one or two columns; two manuscripts also include marginal texts.

Text
The transcribed texts represent a wide spectrum of medieval vernacular religious writing across Europe. They include hagiographic works such as the Legend om S. Barbara in Old Swedish and the Legenda aurea in Old Castilian, devotional poetry like the Amra Colum Cille in Old/Middle Irish, as well as historiographical and moralizing texts such as Simon de Hesdin’s and Nicolas de Gonesse’s French translation of Valerius Maximus (Les Faits et Dits Mémorables). Other contributions comprise Marian devotional literature (La conception Nostre Dame), vernacular adaptations of Latin translations (Décadas de Tito Livio in Old Castilian), and the widespread compendium Der Heiligen Leben in Early New High German (Bavarian), again based on the Legenda aurea.

Script
The manuscripts reflect the diversity of medieval book hands used in different European regions between the 11th and 15th centuries.
They include early Insular Carolingian minuscule, as well as various Gothic book and cursive hands characteristic of high and late medieval manuscript culture.

Our dataset is composed of the following texts:

| Manuscript | Date | Language | Text | Scripta | Number of transcribed lines | Transcriber |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Paris, BnF, fr. 282 | 1401 | Middle French | Les Faits et Dits Mémorables de Valère le Grant, by Simon de Hesdin and Nicolas de Gonesse | Gothic cursiva | 192 | Roxane Bougrelle |
| Paris, BnF fr. 818 | 1251-1300 | Old French | La conception Nostre Dame | Gothic | 143 | Catherine Dehaut |
| Dublin, RIA 23 E 25, pp. 5-8 | 11-12th c (MS). 8/9th c (Text) | Old/Middle Irish and Latin | Amra Colum Cille | Insular Carolingian minuscule | 19 | Ciaran McDonough |
| Stockholm, KB A 110, ff. [250r] | 1385-1400 | Old Swedish | Legend om S. Barbara, from Codex Oxenstiernianus (Järtteckensboken) | Cursiva gothica recentior | 110 | Pontus Henningsson |
| Escorial, ms. h-I-14 | 13-14th c. | Old Castilian | Legenda aurea | Gothic cursiva | 141 | María Florencia Vieira |
| Munich, University Library, 2° Cod. ms. 314 | 15th c. | Bavarian, Early Modern | Der Heiligen Leben (Sommerteil) | Bastarda | 287 | Barbara Denicolò |
| BNE ms. 12732 | 1433 | Old Castilian | Décadas de Tito-Livio (Pero López de Ayala from Bersuire’s translation) | Gothic Cursiva | 141 | Irene Salvo García |

The following metrics describe our corpus, which consists of the manuscripts listed in the table above: total number of documents (18); number of regions (74); total number of lines (1279); total number of words (11365); total number of characters (59696).

Dataset structure
The data folder contains each manuscript and its transcriptions. For each manuscript, the transcription is provided as an ALTO XML file, along with a METS XML file and images of the manuscript pages that were transcribed.
These are provided in one folder per manuscript, as subfolders of the data folder.

Guidelines
We followed the guidelines of the CATMuS project (1.6):
Pinche, A., Clérice, T., Vlachou-Efstathiou, M., Chagué, A., Camps, J.-B., Gille Levenson, M., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., & Ferrante, G. (2025). CATMuS Medieval (1.6.0). Zenodo. https://doi.org/10.5281/zenodo.15030337

We also used the SegmOnto controlled vocabulary:
Simon Gabay, Ariane Pinche, Kelly Christensen, Jean-Baptiste Camps, Nicola Carboni, SegmOnto, A Controlled Vocabulary to Describe the Layout of Pages, version 0.9, Genève/Lyon/Paris, 2023, https://segmonto.github.io/

For our collective project, we decided to follow some common guidelines.

Segmentation guidelines
  • MainZone: see SegmOnto. MainZone contains the principal text, as a single block, without any paratext. If there are several columns, we made separate blocks. Interlinear glosses are part of the MainZone.
  • NumberingZone: see SegmOnto. Contains page numbers or folio numbers.
  • GraphicZone: images or decorations.
  • MarginTextZone: marginal glosses or additions.
  • DropCapitalZone: see SegmOnto. Any type of capital letter, without any subtype. Do not transcribe it.
  • RunningTitleZone: see SegmOnto. Zone containing a running title.
  • DefaultLine / default (Kraken result): standard text line.
  • HeadingLine: any heading, regardless of its level.
  • InterlinearLine: any line between standard baselines.

Verse initials (appearing at the start of each line of a versified text, similar in size to the remaining text of the verse) should be included in the line of the corresponding verse. They should thus be segmented in the MainZone as normal text. Some initial letters (not illuminated) that hang slightly below the line are not included in the mask, meaning that they have to be added to the transcription later.
Transcription guidelines
We also followed the CATMuS guidelines when transcribing our documents. Given the diversity of our corpus, we chose a graphemic transcription system. Allographic variants (‘u’/‘v’ and ‘i’/‘j’) were normalised. As for ‘z’/‘s’, every allograph of ‘s’ was transcribed as ‘s’, and we distinguished between ‘s’ and ‘z’. We did not separate agglutinated words, and verse initials were kept separate (using a space) from the rest of the word. For punctuation, we used four main signs: “.” for single dots, “:” for more than a single dot, “/” for diastoles and signs that look like virgulas, and “¶” for section markers. We did not normalise capital letters. We also transcribed hyphenations and diastoles. If a character does not exist in MUFI but carries linguistic importance and occurs commonly in a manuscript, a new character can be created by combining MUFI characters and symbols. Abbreviations were not expanded; we transcribed them following MUFI. In other words, we transcribe what we see, so that the dataset works for every vernacular language of our corpus. Every character is described in the CATMuS guidelines: https://catmus-guidelines.github.io/html/guidelines/en/character_table.html

For the transcription of the Old Swedish manuscript specifically, a new combination of characters, “a̶” (small antiphon), was created, since it occurs commonly in Old Swedish manuscripts and carries linguistic importance. The character “a̶” is pronounced similarly to “ä” and “æ”, but since these letters appear in other languages with other pronunciations or meanings and look different from the small antiphon, a new character was created. A large antiphon already exists in MUFI (https://mufi.info/q.php?p=mufi/chars/unichar/59610), but the small one does not. We hope that the small antiphon, “a̶”, will be added to MUFI in the future.
Creators of the dataset
  • Ciaran McDonough, Aarhus University, 0000-0002-5198-9205
  • Pontus Henningsson, Linnaeus University, 0009-0002-8956-7312
  • Barbara Denicolò, Paris Lodron University of Salzburg, 0000-0001-7155-9790
  • Roxane Bougrelle, Université Lumière Lyon 2, 0009-0002-6343-2305
  • Catherine Dehaut, Université de Montréal, 0009-0000-5161-7906
  • María Florencia Vieira, ENS de Lyon, 0009-0001-8222-1178
  • Irene Salvo García, Universidad Autónoma de Madrid, 0000-0003-0155-1886

Authors

  • McDonough, Ciaran
  • Henningsson, Pontus
  • Denicolò, Barbara
  • Bougrelle, Roxane
  • Vieira, Maria Florencia
  • Salvo Garcia, Irene
  • Dehaut, Catherine
  • Pinche, Ariane
0 Citations · 0 Mentions · 13% FAIR · 0.3 Dataset Index
10.5281/zenodo.17062143 · September 2025

TranscriboQuest 2025 Medieval Vernacular Religious Texts

Team Medieval Vernacular Religious Texts

Description
This dataset was created by a collaborative working group with the aim of transcribing medieval vernacular religious texts across a range of European languages. To reflect the linguistic expertise of the group members, the project included Old and Middle French, Old and Middle Irish, Old Castilian, Old Swedish, and Early New High German (Bavarian). Religious texts were chosen as a common thread because of their wide diffusion in the European vernacular tradition, their high survival rate in manuscripts, and their relevance for the study of medieval cultural and textual practices. The dataset is based on manuscripts preserved in France, Spain, Sweden, Germany, and Ireland, dating from the 11th to the 15th centuries, with a particular concentration in the 15th century. All manuscripts belong to the category of medium to highly decorated literary manuscripts. They are written in clearly identifiable scripts, predominantly in one or two columns; two manuscripts also include marginal texts.

Text
The transcribed texts represent a wide spectrum of medieval vernacular religious writing across Europe. They include hagiographic works such as the Legend om S. Barbara in Old Swedish and the Legenda aurea in Old Castilian, devotional poetry like the Amra Colum Cille in Old/Middle Irish, as well as historiographical and moralizing texts such as Simon de Hesdin’s and Nicolas de Gonesse’s French translation of Valerius Maximus (Les Faits et Dits Mémorables). Other contributions comprise Marian devotional literature (La conception Nostre Dame), vernacular adaptations of Latin translations (Décadas de Tito Livio in Old Castilian), and the widespread compendium Der Heiligen Leben in Early New High German (Bavarian), again based on the Legenda aurea.

Script
The manuscripts reflect the diversity of medieval book hands used in different European regions between the 11th and 15th centuries.
They include early Insular Carolingian minuscule, as well as various Gothic book and cursive hands characteristic of high and late medieval manuscript culture.

Our dataset is composed of the following texts:

| Manuscript | Date | Language | Text | Scripta | Number of transcribed lines | Transcriber |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Paris, BnF, fr. 282 | 1401 | Middle French | Les Faits et Dits Mémorables de Valère le Grant, by Simon de Hesdin and Nicolas de Gonesse | Gothic cursiva | 192 | Roxane Bougrelle |
| Paris, BnF fr. 818 | 1251-1300 | Old French | La conception Nostre Dame | Gothic | 143 | Catherine Dehaut |
| Dublin, RIA 23 E 25, pp. 5-8 | 11-12th c (MS). 8/9th c (Text) | Old/Middle Irish and Latin | Amra Colum Cille | Insular Carolingian minuscule | 19 | Ciaran McDonough |
| Stockholm, KB A 110, ff. [250r] | 1385-1400 | Old Swedish | Legend om S. Barbara, from Codex Oxenstiernianus (Järtteckensboken) | Cursiva gothica recentior | 110 | Pontus Henningsson |
| Escorial, ms. h-I-14 | 13-14th c. | Old Castilian | Legenda aurea | Gothic cursiva | 141 | María Florencia Vieira |
| Munich, University Library, 2° Cod. ms. 314 | 15th c. | Bavarian, Early Modern | Der Heiligen Leben (Sommerteil) | Bastarda | 287 | Barbara Denicolò |
| BNE ms. 12732 | 1433 | Old Castilian | Décadas de Tito-Livio (Pero López de Ayala from Bersuire’s translation) | Gothic Cursiva | 141 | Irene Salvo García |

The following metrics describe our corpus, which consists of the manuscripts listed in the table above: total number of documents (18); number of regions (74); total number of lines (1279); total number of words (11365); total number of characters (59696).

Dataset structure
The data folder contains each manuscript and its transcriptions. For each manuscript, the transcription is provided as an ALTO XML file, along with a METS XML file and images of the manuscript pages that were transcribed.
These are provided in one folder per manuscript as a subfolder to the data folder. GuidelinesWe followed the guidelines of the project CatMus (1.6) :Pinche, A., Clérice, T., Vlachou-Efstathiou, M., Chagué, A., Camps, J.-B., Gille Levenson, M., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., & Ferrante, G. (2025). CATMuS Medieval (1.6.0). Zenodo. https://doi.org/10.5281/zenodo.15030337 And used the SegmOnto controlled vocabulary :Simon Gabay, Ariane Pinche, Kelly Christensen, Jean-Baptiste Camps, Nicola Carboni, SegmOnto, A Controlled Vocabulary to Describe the Layout of Pages, version 0.9, Genève/Lyon/Paris, 2023, https://segmonto.github.io/ For our collective project, we decided to follow some common guidelines : Segmentation guidelines MainZone : see SegmOnto. MainZone contains principle text, as a single block, without any paratext. If there are many columns, we made different blocks. Interlinear glosses are part of the MainZone.NumberingZone : see SegmOnto. Contains page numbers or folio numbers.GraphicZone : images or decorations.MarginTextZone : marginal glosses or additions.DropCapitalZone : see SegmOnto. Any type of capital letters, without any subtype. Do not transcribe it.RunningTitleZone : see SegmOnto. Zone containing a running title. DefaultLine / default (Kraken Result) : standard textline.HeadingLine : Any heading, without consideration for the level.InterlinearLine : Any line between standard baseline. Verse initials (appearing at the start of each line of a versified text, similar in size to the remaining text of the verse) should be included in the line of the corresponding verse. They should thus be segmented in the MainZone as normal text. Some initial letters (not illuminated), which hang slightly below the line are not included in the mask, meaning that they have to be later added to the transcription. 
Transcription guidelinesWe also followed the CATMus guidelines when transcribing our documents. Given the diversity of our corpus, we chose a graphemic transcription system. Allographic variants (‘u/v’ and ‘i’/’j’) were normalised. When it comes to ‘z’/’s’, every allograph of ‘s’ was transcribed by ‘s’, and we made a difference between ‘s’ and ‘z’. We didn’t separate agglutinated words and verse initials were to be kept separated (using a space) from the rest of the word. When it comes to the choice of signs, we used four main signs: “.” for single dots, “:” for more than single dots, “/” for diastoles and signs that look like virgulas and “¶” for section markers. We didn’t normalize capital letters. We also transcribed hyphenations and diastoles. Also, if a character does not exist in MUFI but carries linguistic importance and is a common occurring character in a manuscript, a new character can be created by combining MUFI characters and symbols. Abbreviations were not developed and we transcribed them adhering to MUFI. In other words, we transcribe what we see, so that the dataset works for every vernacular language of our corpus. Every character is described in the CatMus guidelines:https://catmus-guidelines.github.io/html/guidelines/en/character_table.html For the transcription of the manuscript in Old Swedish, specifically, a new combination of characters, “a̶ ” (small antiphon), was created since it is a common occurrence in Old Swedish manuscripts and carries linguistic importance. The character “a̶ ” has similar pronunciation as “ä” and “æ”, but since these letters appear in other languages with other pronunciation/meaning and look different than the small antiphon, a new character was created. A large antiphon already exists in MUFI,https://mufi.info/q.php?p=mufi/chars/unichar/59610, but the small one does not. In the future, hopefully the small antihpon, “a̶ “, will be added to MUFI. 
Creators of the datasetCiaran McDonough, Aarhus University 0000-0002-5198-9205Pontus Henningsson, Linneaus University, 0009-0002-8956-7312Barbara Denicolò, Paris Lodron University of Salzburg, 0000-0001-7155-9790Roxane Bougrelle, Université Lumière Lyon 2, 0009-0002-6343-2305Catherine Dehaut, Université de Montréal, 0009-0000-5161-7906María Florencia Vieira, ENS de Lyon, 0009-0001-8222-1178Irene Salvo García, Universidad Autónoma de Madrid 0000-0003-0155-1886
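
As a rough illustration of the dataset structure described above, the transcribed lines of one ALTO XML file could be read as in the sketch below. It assumes the usual ALTO layout in which each TextLine element holds String elements with a CONTENT attribute; the embedded fragment, its IDs and its text are hand-written placeholders, not material from the dataset, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment for illustration only;
# it is NOT taken from the dataset.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1"><String CONTENT="first example line"/></TextLine>
          <TextLine ID="line_2"><String CONTENT="second example line"/></TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works for any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(alto_xml):
    """Return the text of every TextLine, in document order."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Join the CONTENT of all String children of this line.
        parts = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    for line in extract_lines(SAMPLE_ALTO):
        print(line)
```

For a real file, the same function could be applied to the text read from one of the per-manuscript subfolders; the exact file names in the deposit are not stated in the description.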

Authors

  • McDonough, Ciaran ;
  • Henningsson, Pontus ;
  • Denicolò, Barbara ;
  • Bougrelle, Roxane ;
  • Vieira, Maria Florencia ;
  • Salvo Garcia, Irene ;
  • Dehaut, Catherine ;
  • Pinche, Ariane
0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index
10.5281/zenodo.17062830, September 2025

TranscriboQuest 2025 Medieval Vernacular Religious Texts

Authors

  • McDonough, Ciaran ;
  • Henningsson, Pontus ;
  • Denicolò, Barbara ;
  • Bougrelle, Roxane ;
  • Vieira, Maria Florencia ;
  • Salvo Garcia, Irene ;
  • Dehaut, Catherine ;
  • Pinche, Ariane
0 Citations, 0 Mentions, 77% FAIR, 1.9 Dataset Index
10.5281/zenodo.17062963, September 2025

TranscriboQuest 2025 Medieval Vernacular Religious Texts

Authors

  • McDonough, Ciaran ;
  • Henningsson, Pontus ;
  • Denicolò, Barbara ;
  • Bougrelle, Roxane ;
  • Vieira, Maria Florencia ;
  • Salvo Garcia, Irene ;
  • Dehaut, Catherine ;
  • Pinche, Ariane
0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index
10.5281/zenodo.17062954, September 2025

User Comments from a SNL 'Fast Fashion Ad' Sketch Combined with RoBERTa and BERTopic Outputs

Context
This dataset was created for a Master's thesis in Digital Humanities by Ka Yee Suvini Lai (see Related Works for the thesis paper titled: Emotion Classification, Topic Modelling, and Discourse Evaluation of Audience Responses to SNL's Fast Fashion Sketch on Social Media: Leveraging RoBERTa, BERTopic and Discourse Analysis). The dataset consists of user comments on an SNL sketch titled 'Fast Fashion Ad', extracted from YouTube, Instagram and TikTok (n=4028). The dataset also contains emotion classification and topic modelling outputs from RoBERTa and BERTopic.

Technical details
The dataset consists of the following columns (with explanations in brackets):
comment_text (the user comments on the SNL sketch from YouTube, Instagram and TikTok)
top_emotion (RoBERTa's output of the highest-scoring emotion for the comment)
emotion_scores (RoBERTa's output of all the emotions and their scores for the comment)
topic (BERTopic's output of the topic number for the comment)
topic_label (BERTopic's output of the topic number and topic label for the comment)
probability (BERTopic's output of the probability of the topic for the comment)
This dataset is a .csv file and is interoperable across many digital tools. It contains the aggregated results from the RoBERTa and BERTopic Python pipelines (see Related Works for the source code).

Further details
To gain access to the dataset, please reach out to the author via email: [email protected]
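
As a minimal sketch of how a file with the columns listed above might be inspected, the example below checks that the six documented columns are present and tallies the top_emotion values. The column names come from the description; the sample rows, emotion labels, topic labels and the function name are hypothetical placeholders, since the real file is only available on request and the exact format of emotion_scores is not stated.

```python
import csv
import io
from collections import Counter

# Column names as documented in the dataset description.
COLUMNS = [
    "comment_text",
    "top_emotion",
    "emotion_scores",
    "topic",
    "topic_label",
    "probability",
]

# Hypothetical placeholder rows for illustration only; NOT real data.
SAMPLE_CSV = (
    "comment_text,top_emotion,emotion_scores,topic,topic_label,probability\n"
    "placeholder comment A,joy,placeholder,0,0_placeholder,0.9\n"
    "placeholder comment B,anger,placeholder,1,1_placeholder,0.7\n"
    "placeholder comment C,joy,placeholder,0,0_placeholder,0.8\n"
)


def count_top_emotions(csv_text):
    """Validate the header and count how often each top_emotion occurs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return Counter(row["top_emotion"] for row in reader)


if __name__ == "__main__":
    print(count_top_emotions(SAMPLE_CSV))
```

The emotion_scores column is kept as an opaque string here because its serialisation is not described; parsing it would require inspecting the actual file.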

Authors

  • Lai, Ka Yee Suvini
0 Citations, 0 Mentions, 88% FAIR, 2.2 Dataset Index
10.48436/c3j49-2pv45, August 2025