Automated Organization Profile
Linnaeus University
Current S-Index
Sum of Dataset Indices for all datasets
Average Dataset Index per Dataset
Average Dataset Index per dataset
Total Datasets
Total datasets in this organization
Average FAIR Score
Average FAIR Score per dataset
Total Citations
Total citations to the organization's datasets
Total Mentions
Total mentions of the organization's datasets
S-Index Interpretation
The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.
What it means:
- A higher S-index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
- The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
- It provides a single number to track your research data impact over time
Current S-Index: 274.0 (sum of the Dataset Index scores of 197 datasets)
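The calculation described above can be illustrated with a minimal sketch. The function name `s_index` and the three scores are hypothetical and are not taken from this profile; the only thing the sketch relies on is the stated definition, that the S-Index is the sum of Dataset Index scores.

```python
from statistics import mean


def s_index(dataset_indices):
    """Return the S-Index: the sum of the Dataset Index scores."""
    return sum(dataset_indices)


# Hypothetical Dataset Index scores for three claimed datasets.
scores = [1.5, 0.5, 2.0]

total = s_index(scores)
average = mean(scores)

print(f"S-Index: {total:.1f}")
print(f"Average Dataset Index per dataset: {average:.2f}")
```

Applied to the figures on this profile, an S-Index of 274.0 over 197 datasets would correspond to an average of roughly 1.39 per dataset, assuming the reported average is a plain arithmetic mean.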
S-Index Over Time
Cumulative Citations Over Time
Cumulative Mentions Over Time
Datasets
The study investigates the U-Fe interaction, crucial for understanding uranium's incorporation into iron (hydr)oxide minerals, impacting long-term soil retention. Using U HERFD-XANES at the U M4 edge, the proposal aims for molecular-level insights into U valence and the local environment. Experiments involved U(VI)-sorbed schwertmannite and jarosite, reacted with varying Fe(II) levels over 1 hour to 2 weeks in an anaerobic chamber (O2<0.5 ppm). Recent Fe EXAFS data show schwertmannite transformation to goethite and jarosite to lepidocrocite/goethite based on Fe(II) levels. U XANES spectra suggest U(V) predominance. U HERFD-XANES will clarify U incorporation in newly formed Fe(III) phases and assess U(IV) reduction in high Fe(II) samples. Insights into kinetics and mechanisms will enhance understanding of U(IV) reduction and repartitioning during schwertmannite and jarosite transformation.
Authors
- Andrikopoulos, Christos
- Hedberg, Marcus
- Kononova, Liubov
- Yu, Changxun
Data files and R code for the manuscript: Climate warming disrupts zooplankton phenology and overwintering strategies. Published in Limnology and Oceanography.
Content Overview
R code:
- analyses_20250605.R - R script used to create Figures 2-4 and Figures S1-S3, and to run all statistical analyses.
Data files:
- Field_data_final.csv - Data from the field sampling. Includes zooplankton densities (number per L water), chlorophyll (μg per L) and temperature.
- Expt1_fixed.csv - Data from Experiment 1, the fixed temperature experiment. Includes zooplankton densities (number per L water and number per mL sediment) and chlorophyll (μg per L of water and μg per mL sediment).
- Expt2_gradient.csv - Data from Experiment 2, the temperature gradient experiment. Includes zooplankton densities (number per L water and number per mL sediment) and chlorophyll (μg per L of water and μg per mL sediment).
Authors
- Svendsen, Ida K.
- Forsman, Anders
- Dopson, Mark
- Nilsson, Emelie
- Sunde, Johanna
- Håkansson, Sofia
- Ketzer, Marcelo
- Hylander, Samuel
- Salis, Romana
Team Medieval Vernacular Religious Texts
Description
This dataset was created by a collaborative working group with the aim of transcribing medieval vernacular religious texts across a range of European languages. To reflect the linguistic expertise of the group members, the project included Old and Middle French, Old and Middle Irish, Old Castilian, Old Swedish, and Early New High German (Bavarian). Religious texts were chosen as a common thread because of their wide diffusion in the European vernacular tradition, their high survival rate in manuscripts, and their relevance for the study of medieval cultural and textual practices. The dataset is based on manuscripts preserved in France, Spain, Sweden, Germany, and Ireland, dating from the 11th to the 15th centuries, with a particular concentration in the 15th century. All manuscripts belong to the category of medium to highly decorated literary manuscripts. They are written in clearly identifiable scripts, predominantly in one or two columns; two manuscripts also include marginal texts.
Text
The transcribed texts represent a wide spectrum of medieval vernacular religious writing across Europe. They include hagiographic works such as the Legend om S. Barbara in Old Swedish and the Legenda aurea in Old Castilian, devotional poetry like the Amra Colum Cille in Old/Middle Irish, as well as historiographical and moralizing texts such as Simon de Hesdin’s and Nicolas de Gonesse’s French translation of Valerius Maximus (Les Faits et Dits Mémorables). Other contributions comprise Marian devotional literature (La conception Nostre Dame), vernacular adaptations of Latin translations (Décadas de Tito Livio in Old Castilian), and the widespread compendium Der Heiligen Leben in Early New High German (Bavarian), again based on the Legenda aurea.
Script
The manuscripts reflect the diversity of medieval book hands used in different European regions between the 11th and 15th centuries. They include early Insular Carolingian minuscule, as well as various Gothic book and cursive hands characteristic of high and late medieval manuscript culture.
Our dataset is composed of the following texts:

| Manuscript | Date | Language | Text | Script | Number of transcribed lines | Transcriber |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Paris, BnF, fr. 282 | 1401 | Middle French | Les Faits et Dits Mémorables de Valère le Grant, by Simon de Hesdin and Nicolas de Gonesse | Gothic cursiva | 192 | Roxane Bougrelle |
| Paris, BnF fr. 818 | 1251-1300 | Old French | La conception Nostre Dame | Gothic | 143 | Catherine Dehaut |
| Dublin, RIA 23 E 25, pp. 5-8 | 11th-12th c. (MS); 8th/9th c. (text) | Old/Middle Irish and Latin | Amra Colum Cille | Insular Carolingian minuscule | 19 | Ciaran McDonough |
| Stockholm, KB A 110, ff. [250r] | 1385-1400 | Old Swedish | Legend om S. Barbara, from Codex Oxenstiernianus (Järtteckensboken) | Cursiva gothica recentior | 110 | Pontus Henningsson |
| Escorial, ms. h-I-14 | 13th-14th c. | Old Castilian | Legenda aurea | Gothic cursiva | 141 | María Florencia Vieira |
| Munich, University Library, 2° Cod. ms. 314 | 15th c. | Bavarian, Early Modern | Der Heiligen Leben (Sommerteil) | Bastarda | 287 | Barbara Denicolò |
| BNE ms. 12732 | 1433 | Old Castilian | Décadas de Tito-Livio (Pero López de Ayala from Bersuire’s translation) | Gothic cursiva | 141 | Irene Salvo García |

The following metrics describe our corpus, which consists of the manuscripts listed in the table above: total number of documents (18); number of regions (74); total number of lines (1279); total number of words (11365); total number of characters (59696).
Dataset structure
The data folder contains each manuscript and its transcriptions. For each manuscript, the transcription is provided as an ALTO XML file, along with a METS XML file and images of the manuscript pages that were transcribed. These are provided in one folder per manuscript, each a subfolder of the data folder.
Guidelines
We followed the guidelines of the CATMuS project (1.6):
Pinche, A., Clérice, T., Vlachou-Efstathiou, M., Chagué, A., Camps, J.-B., Gille Levenson, M., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., & Ferrante, G. (2025). CATMuS Medieval (1.6.0). Zenodo. https://doi.org/10.5281/zenodo.15030337
We also used the SegmOnto controlled vocabulary:
Simon Gabay, Ariane Pinche, Kelly Christensen, Jean-Baptiste Camps, Nicola Carboni, SegmOnto, A Controlled Vocabulary to Describe the Layout of Pages, version 0.9, Genève/Lyon/Paris, 2023, https://segmonto.github.io/
For our collective project, we agreed on the following common guidelines.
Segmentation guidelines
- MainZone: see SegmOnto. Contains the principal text, as a single block, without any paratext. Where there are several columns, each column is a separate block. Interlinear glosses are part of the MainZone.
- NumberingZone: see SegmOnto. Contains page numbers or folio numbers.
- GraphicZone: images or decorations.
- MarginTextZone: marginal glosses or additions.
- DropCapitalZone: see SegmOnto. Any type of capital letter, without any subtype. It is not transcribed.
- RunningTitleZone: see SegmOnto. Zone containing a running title.
- DefaultLine / default (Kraken result): standard text line.
- HeadingLine: any heading, regardless of its level.
- InterlinearLine: any line between standard baselines.
Verse initials (appearing at the start of each line of a versified text, similar in size to the remaining text of the verse) should be included in the line of the corresponding verse. They should thus be segmented in the MainZone as normal text. Some non-illuminated initial letters that hang slightly below the line are not included in the mask, meaning that they have to be added to the transcription afterwards.
Transcription guidelines
We also followed the CATMuS guidelines when transcribing our documents. Given the diversity of our corpus, we chose a graphemic transcription system. Allographic variants (‘u’/‘v’ and ‘i’/‘j’) were normalised. As for ‘z’/‘s’, every allograph of ‘s’ was transcribed as ‘s’, and ‘s’ and ‘z’ were kept distinct. We did not separate agglutinated words; verse initials were kept separate (with a space) from the rest of the word. For punctuation, we used four main signs: “.” for single dots, “:” for groups of more than one dot, “/” for diastoles and signs that look like virgulae, and “¶” for section markers. We did not normalise capital letters. We also transcribed hyphenations and diastoles. If a character does not exist in MUFI but carries linguistic importance and occurs commonly in a manuscript, a new character can be created by combining MUFI characters and symbols. Abbreviations were not expanded, and we transcribed them in accordance with MUFI. In other words, we transcribe what we see, so that the dataset works for every vernacular language in our corpus. Every character is described in the CATMuS guidelines: https://catmus-guidelines.github.io/html/guidelines/en/character_table.html
For the transcription of the Old Swedish manuscript specifically, a new combination of characters, “a̶” (small antiphon), was created, since it occurs commonly in Old Swedish manuscripts and carries linguistic importance. The character “a̶” has a pronunciation similar to “ä” and “æ”, but since these letters appear in other languages with a different pronunciation or meaning and look different from the small antiphon, a new character was created. A large antiphon already exists in MUFI, https://mufi.info/q.php?p=mufi/chars/unichar/59610, but the small one does not. We hope that the small antiphon, “a̶”, will be added to MUFI in the future.
Creators of the dataset
- Ciaran McDonough, Aarhus University, 0000-0002-5198-9205
- Pontus Henningsson, Linnaeus University, 0009-0002-8956-7312
- Barbara Denicolò, Paris Lodron University of Salzburg, 0000-0001-7155-9790
- Roxane Bougrelle, Université Lumière Lyon 2, 0009-0002-6343-2305
- Catherine Dehaut, Université de Montréal, 0009-0000-5161-7906
- María Florencia Vieira, ENS de Lyon, 0009-0001-8222-1178
- Irene Salvo García, Universidad Autónoma de Madrid, 0000-0003-0155-1886
Authors
- McDonough, Ciaran
- Denicolò, Barbara
- Bougrelle, Roxane
- VIEIRA, Maria Florencia
- Garcia, Irene Salvo
- Dehaut, Catherine
- Henningsson, Pontus
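Since the transcriptions of this dataset are distributed as ALTO XML, the following minimal sketch shows one way the text lines of such a file could be read with the Python standard library. It rests on several assumptions: the ALTO v4 namespace, line text stored in the `CONTENT` attribute of `String` elements (as the ALTO schema defines), and an invented inline sample rather than a real file from the dataset. The function name `read_alto_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace; an assumption, since other ALTO versions use a different URI.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}


def read_alto_lines(xml_text):
    """Return the transcribed text of each TextLine in an ALTO document."""
    root = ET.fromstring(xml_text)
    lines = []
    for text_line in root.iterfind(".//alto:TextLine", ALTO_NS):
        # A line may hold one String (whole line) or several (one per word).
        parts = [s.get("CONTENT", "") for s in text_line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(parts))
    return lines


# Invented sample for illustration; not taken from the dataset.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="block_1">
      <TextLine ID="line_1"><String CONTENT="first line"/></TextLine>
      <TextLine ID="line_2"><String CONTENT="second"/><SP/><String CONTENT="line"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

lines = read_alto_lines(SAMPLE)
word_count = sum(len(line.split()) for line in lines)
print(len(lines), "lines,", word_count, "words")
```

Counting lines and whitespace-separated words this way is only a rough check against the corpus metrics quoted in the description; the dataset's own counts may have been produced with different tokenisation rules.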
Team Medieval Vernacular Religious Texts
Authors
- McDonough, Ciaran
- Henningsson, Pontus
- Denicolò, Barbara
- Bougrelle, Roxane
- Vieira, Maria Florencia
- Salvo Garcia, Irene
- Dehaut, Catherine
- Pinche, Ariane
Team Medieval Vernacular Religious Texts DescriptionThis dataset was created by a collaborative working group with the aim of transcribing medieval vernacular religious texts across a range of European languages. To reflect the linguistic expertise of the group members, the project included Old and Middle French, Old and Middle Irish, Old Castilian, Old Swedish, and Early New High German (Bavarian). Religious texts were chosen as a common thread because of their wide diffusion in the European vernacular tradition, their high survival rate in manuscripts, and their relevance for the study of medieval cultural and textual practices. The dataset is based on manuscripts preserved in France, Spain, Sweden, Germany, and Ireland, dating from the 11th to the 15th centuries, with a particular concentration in the 15th century. All manuscripts belong to the category of medium to highly decorated literary manuscripts. They are written in clearly identifiable scripts, predominantly in one or two columns; two manuscripts also include marginal texts. TextThe transcribed texts represent a wide spectrum of medieval vernacular religious writing across Europe. They include hagiographic works such as the Legend om S. Barbara in Old Swedish and the Legenda aurea in Old Castilian, devotional poetry like the Amra Colum Cille in Old/Middle Irish, as well as historiographical and moralizing texts such as Simon de Hesdin’s and Nicolas de Gonesse’s French translation of Valerius Maximus (Les Faits et Dits Mémorables). Other contributions comprise Marian devotional literature (La conception Nostre Dame), vernacular adaptations of Latin translations (Décadas de Tito Livio in Old Castilian), and the widespread compendium Der Heiligen Leben in Early New High German (Bavarian) again based on the Legenda Aurea. ScriptThe manuscripts reflect the diversity of medieval book hands used in different European regions between the 11th and 15th centuries. 
They include early Insular Carolingian minuscule, as well as various Gothic books and cursive hands characteristic of high and late medieval manuscript culture. Our dataset is composed of following texts : | Manuscript | Date | Language | Text | Scripta | Number of transcribed lines | Name of the transcriptor || :---- | :---- | :---- | :---- | :---- | :---- | :---- || Paris, BnF, fr. 282 | 1401 | Middle French | Les Faits et Dits Mémorables de Valère le Grant, by Simon de Hesdin and Nicolas de Gonesse | Gothic cursiva | 192 | Roxane Bougrelle || Paris, BnF fr. 818 | 1251-1300 | Old French | La conception Nostre Dame | Gothic | 143 | Catherine Dehaut || Dublin, RIA 23 E 25, pp. 5-8 | 11-12th c (MS). 8/9th c (Text) | Old/Middle Irish and Latin | Amra Colum Cille | Insular Carolingian miniscule | 19 | Ciaran McDonough || Stockholm, KB A 110, ff. [250r] | 1385-1400 | Old Swedish | Legend om S. Barbara, from Codex Oxenstiernianus (Järtteckensboken) | Cursiva gothica recentior | 110 | Pontus Henningsson || Escorial, ms. h-I-14 | 13-14th c. | Old Castillan | Legenda aurea | Gothic cursiva | 141 | María Florencia Vieira || Munich, University Library, 2° Cod. ms. 314 | 15th c. | Bavarian, Early Modern | Der Heiligen Leben (Sommerteil) | Bastarda | 287 | Barbara Denicolò || BNE ms. 12732 | 1433 | Old Castilian | Décadas de Tito-Livio (Pero López de Ayala from Bersuire’s translation) | Gothic Cursiva | 141 | Irene Salvo García | The following metrics displayed below our corpus consisting of the manuscripts listed in the table above: in terms of its total number of documents (18); number of regions (74); total number of lines (1279); total number of words (11365); total number of characters (59696): Dataset structureThe data folder contains each manuscript and its transcriptions. For each manuscript, the transcription is provided in an ALTO XML, along with a METS XML file, along with images of the manuscript pages that were transcribed. 
These are provided in one folder per manuscript, as subfolders of the data folder.
Guidelines
We followed the guidelines of the CATMuS project (1.6):
Pinche, A., Clérice, T., Vlachou-Efstathiou, M., Chagué, A., Camps, J.-B., Gille Levenson, M., Brisville-Fertin, O., Boschetti, F., Fischer, F., Gervers, M., Boutreux, A., Manton, A., Gabay, S., & Ferrante, G. (2025). CATMuS Medieval (1.6.0). Zenodo. https://doi.org/10.5281/zenodo.15030337
We also used the SegmOnto controlled vocabulary:
Simon Gabay, Ariane Pinche, Kelly Christensen, Jean-Baptiste Camps, Nicola Carboni, SegmOnto, A Controlled Vocabulary to Describe the Layout of Pages, version 0.9, Genève/Lyon/Paris, 2023, https://segmonto.github.io/
For our collective project, we decided to follow some common guidelines.
Segmentation guidelines
- MainZone: see SegmOnto. MainZone contains the principal text, as a single block, without any paratext. If there are several columns, each column is a separate block. Interlinear glosses are part of the MainZone.
- NumberingZone: see SegmOnto. Contains page numbers or folio numbers.
- GraphicZone: images or decorations.
- MarginTextZone: marginal glosses or additions.
- DropCapitalZone: see SegmOnto. Any type of capital letter, without any subtype. It is not transcribed.
- RunningTitleZone: see SegmOnto. Zone containing a running title.
- DefaultLine / default (Kraken result): standard text line.
- HeadingLine: any heading, regardless of its level.
- InterlinearLine: any line between standard baselines.
Verse initials (appearing at the start of each line of a versified text, similar in size to the remaining text of the verse) should be included in the line of the corresponding verse. They should thus be segmented in the MainZone as normal text. Some initial letters (not illuminated) that hang slightly below the line are not included in the mask, meaning that they have to be added to the transcription afterwards.
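As a rough illustration of how the ALTO files and the zone types listed above might be used together, the following sketch counts transcribed lines per zone label. It is not taken from the dataset: the sample XML is invented, and it assumes the files use the ALTO v4 namespace with zone labels declared as `OtherTag` elements and referenced from each `TextBlock` through `TAGREFS`. The actual files may be organised differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the files use the ALTO v4 namespace.
ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}

# Invented sample for illustration only; not copied from the dataset.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Tags>
    <OtherTag ID="BT1" LABEL="MainZone"/>
    <OtherTag ID="BT2" LABEL="MarginTextZone"/>
    <OtherTag ID="LT1" LABEL="DefaultLine"/>
  </Tags>
  <Layout>
    <Page ID="p1">
      <PrintSpace>
        <TextBlock ID="b1" TAGREFS="BT1">
          <TextLine ID="l1" TAGREFS="LT1"><String CONTENT="first line"/></TextLine>
          <TextLine ID="l2" TAGREFS="LT1"><String CONTENT="second line"/></TextLine>
        </TextBlock>
        <TextBlock ID="b2" TAGREFS="BT2">
          <TextLine ID="l3" TAGREFS="LT1"><String CONTENT="gloss"/></TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def lines_per_zone(alto_xml: str) -> Counter:
    """Count TextLine elements per zone label (e.g. MainZone)."""
    root = ET.fromstring(alto_xml)
    # Map tag IDs such as "BT1" to their human-readable labels.
    labels = {
        tag.get("ID"): tag.get("LABEL")
        for tag in root.findall(".//a:OtherTag", ALTO_NS)
    }
    counts = Counter()
    for block in root.findall(".//a:TextBlock", ALTO_NS):
        label = labels.get(block.get("TAGREFS"), "unlabelled")
        counts[label] += len(block.findall("a:TextLine", ALTO_NS))
    return counts


counts = lines_per_zone(SAMPLE)
for zone, n in sorted(counts.items()):
    print(zone, n)
```

To run this on a real file, the content of one ALTO file from a manuscript subfolder can be read as text and passed to `lines_per_zone`.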
Transcription guidelines
We also followed the CATMuS guidelines when transcribing our documents. Given the diversity of our corpus, we chose a graphemic transcription system. Allographic variants (‘u’/‘v’ and ‘i’/‘j’) were normalised. As for ‘z’/‘s’, every allograph of ‘s’ was transcribed as ‘s’, and ‘s’ and ‘z’ were kept distinct. We did not separate agglutinated words, and verse initials were kept separate (using a space) from the rest of the word. For punctuation, we used four main signs: “.” for single dots, “:” for more than a single dot, “/” for diastoles and signs that look like virgulae, and “¶” for section markers. We did not normalise capital letters. We also transcribed hyphenations and diastoles. If a character does not exist in MUFI but carries linguistic importance and occurs commonly in a manuscript, a new character can be created by combining MUFI characters and symbols. Abbreviations were not expanded, and we transcribed them adhering to MUFI. In other words, we transcribe what we see, so that the dataset works for every vernacular language of our corpus. Every character is described in the CATMuS guidelines: https://catmus-guidelines.github.io/html/guidelines/en/character_table.html
For the transcription of the manuscript in Old Swedish specifically, a new combination of characters, “a̶” (small antiphon), was created, since it occurs commonly in Old Swedish manuscripts and carries linguistic importance. The character “a̶” is pronounced similarly to “ä” and “æ”, but because these letters appear in other languages with other pronunciations or meanings and look different from the small antiphon, a new character was created. A large antiphon already exists in MUFI, https://mufi.info/q.php?p=mufi/chars/unichar/59610, but the small one does not. We hope that the small antiphon, “a̶”, will be added to MUFI in the future.
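Because the small antiphon is a combination rather than a single character, tools that count or normalise characters will treat it as two code points. The sketch below illustrates this with Python's standard `unicodedata` module. It assumes the combination is the letter "a" followed by U+0336 (COMBINING LONG STROKE OVERLAY); the exact code points used in the transcriptions should be checked against the files themselves.

```python
import unicodedata

# Assumption: the small antiphon is encoded as "a" + U+0336.
SMALL_ANTIPHON = "a\u0336"


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


for code_point, name in describe(SMALL_ANTIPHON):
    print(code_point, name)

# No precomposed form exists, so NFC normalisation keeps both code points.
nfc = unicodedata.normalize("NFC", SMALL_ANTIPHON)
print(len(nfc))
```

One practical consequence is that character counts computed per code point will count each small antiphon twice, and searches should match the two-code-point sequence rather than “ä” or “æ”.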
Creators of the dataset
- Ciaran McDonough, Aarhus University, 0000-0002-5198-9205
- Pontus Henningsson, Linnaeus University, 0009-0002-8956-7312
- Barbara Denicolò, Paris Lodron University of Salzburg, 0000-0001-7155-9790
- Roxane Bougrelle, Université Lumière Lyon 2, 0009-0002-6343-2305
- Catherine Dehaut, Université de Montréal, 0009-0000-5161-7906
- María Florencia Vieira, ENS de Lyon, 0009-0001-8222-1178
- Irene Salvo García, Universidad Autónoma de Madrid, 0000-0003-0155-1886
Authors
- McDonough, Ciaran ;
- Henningsson, Pontus ;
- Denicolò, Barbara ;
- Bougrelle, Roxane ;
- Vieira, Maria Florencia ;
- Salvo Garcia, Irene ;
- Dehaut, Catherine ;
- Pinche, Ariane
Context
This dataset was created for a Master's thesis in Digital Humanities by Ka Yee Suvini Lai (see Related Works for the thesis, titled: Emotion Classification, Topic Modelling, and Discourse Evaluation of Audience Responses to SNL's Fast Fashion Sketch on Social Media: Leveraging RoBERTa, BERTopic and Discourse Analysis). The dataset consists of user comments on an SNL sketch titled 'Fast Fashion Ad', extracted from YouTube, Instagram and TikTok (n=4028). The dataset also contains emotion classification and topic modelling outputs from RoBERTa and BERTopic.
Technical details
The dataset consists of the following columns (with explanations in brackets):
- comment_text (the user comments on the SNL sketch from YouTube, Instagram and TikTok)
- top_emotion (RoBERTa's output: the emotion with the highest score for the comment)
- emotion_scores (RoBERTa's output: all emotions and their scores for the comment)
- topic (BERTopic's output: the topic number for the comment)
- topic_label (BERTopic's output: the topic number and topic label for the comment)
- probability (BERTopic's output: the probability of the topic for the comment)
The dataset is a .csv file and is interoperable across many digital tools. It contains the aggregated results of the RoBERTa and BERTopic Python pipelines (see Related Works for the source code).
Further details
To gain access to the dataset, please reach out to the author via email: [email protected]
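The column layout described under Technical details lends itself to simple aggregation. The sketch below is a minimal example using only Python's standard library: it tallies `top_emotion` values and averages `probability` per `topic_label`. The rows, emotion names, and topic labels in the sample are invented placeholders, not data from the dataset, and the format of `emotion_scores` is not assumed, so that column is left unparsed.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows; column names follow the description above.
SAMPLE_CSV = """comment_text,top_emotion,emotion_scores,topic,topic_label,probability
placeholder comment A,joy,,0,0_placeholder_topic,0.5
placeholder comment B,anger,,1,1_other_topic,0.25
placeholder comment C,joy,,0,0_placeholder_topic,1.0
"""


def summarise(csv_file):
    """Count top emotions and average the topic probability per topic label."""
    emotions = Counter()
    probabilities = {}
    for row in csv.DictReader(csv_file):
        emotions[row["top_emotion"]] += 1
        probabilities.setdefault(row["topic_label"], []).append(
            float(row["probability"])
        )
    mean_probability = {
        label: sum(values) / len(values)
        for label, values in probabilities.items()
    }
    return emotions, mean_probability


emotions, mean_probability = summarise(io.StringIO(SAMPLE_CSV))
print(emotions.most_common())
print(mean_probability)
```

With the real file, the in-memory sample would be replaced by a file handle opened with `newline=""` and an explicit encoding, as the `csv` module documentation recommends.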
Authors
- Lai, Ka Yee Suvini