Scholar Data

Protosemitic Root Derivations (Version: 1)

The dataset provided here culminated in a directed network graph (GEXF). It is the result of a guided research seminar on Semitic philology by Adam Anderson on behalf of Jason Moser, who was a Ph.D. candidate at UC Berkeley at the time, in the spring of 2019. The primary focus of the resulting network graph was to show potential pathways for comparative linguistics within the Semitic languages, a branch of the Afro-Asiatic language family, based on commonly shared roots and their etymologies. When we concluded this study in 2019, the results were by no means finalized or conclusive. Many of the proposed root derivations in the graph are therefore only working hypotheses, and inaccuracies are likely; these need to be corrected under the supervision of specialists in each of these languages. We are moving ahead with a novel dataset publication in order to put the results of this work into Wikidata lexemes, which in turn will allow scholarly curation and emendation to take place with attributions.

Our use of the term Protosemitic is more in line with Common Semitic: it expresses a working hypothesis for the original Protosemitic roots and lemmas and the most likely related roots in different Semitic languages. We do not claim to know whether a given root is inherited from Semitic, borrowed from another language, or a Wanderwort or loanword. That said, we are fairly certain that the majority of the data shared in this version are based on true roots, which can be substantiated in the listed sources. While there is a longstanding series of dictionaries for Akkadian verbal roots (e.g. AHw, CDA, CAD), the nominal roots have yet to receive the same attention. We therefore consulted a list of sources and datasets, both published and unpublished, including John Huehnergard’s Semitic Philology 140 syllabus from 2008, which we include below.

The next version of this dataset will include Q-item ids and references for the etymology claims in Wikidata.

In preparation for the publication of this dataset in Wikidata, we worked closely with Timo Homburg, who helped formulate the claims and RDF triple statements and also uploaded the dataset. We would like to thank Timo for extending his time and skills to help make this data more accessible.

Datasets:
Network graph file: protosemitic_3996.gexf
Semitic Root Derivations: SemiticRootDerivations.csv

Network Properties for the Nodes (statistics generated in Gephi):
Id = Hexadecimal with the first three letters corresponding to the source language (e.g. GEZ000018 is the 18th root in Ge’ez in our dataset)
Label
Lang_source
In-Degree
Out-Degree
Degree
Weighted In-Degree
Weighted Out-Degree
Eccentricity
Closeness Centrality
Harmonic Closeness Centrality
Betweenness Centrality
Bridging Coefficient
Bridging Centrality
Authority
Hub
Modularity Class
PageRank
Component ID
Strongly-Connected ID

Network Properties for the Edges:
Source = source ID
Target = target ID
Type = directed edges
Id = simple integer count
Label = blank
Interval = blank
Weight = 1 (if unmerged), or 2 (if merged)
Lang_source = (proto-)Semitic
Entry_source = lemma for (proto-)Semitic language (e.g. format: lemma[gloss]POS)
Lang_target = Semitic language (e.g. geez = Ge’ez)
Entry_target = lexeme for Semitic language (e.g. format: lemma[gloss]POS)

Semitic Root Derivations:
lang_from = source language (e.g. (proto-)Semitic)
entry_from_id = unique ID (hexadecimal) for source language lemma
entry_from = lemma for the source language
PS-root = root derived from Protosemitic
PS_Lid = Wikidata lexeme ID for the Protosemitic root (when applicable)
lang_to = target language (e.g. geez)
entry_to_id = unique ID (hexadecimal) for target language lemma
entry_to = lemma for the target language
Akk_root = most likely Akkadian root
Akk_Lid = Wikidata lexeme ID for the Akkadian root (when applicable)

List of sources consulted:
Bennett, P. R. Comparative Semitic Linguistics: A Manual, 1998.
Bergsträsser, G. Introduction to the Semitic Languages, 2nd ed., 1983.
Brockelmann, C. Review of Essai comparatif sur le vocabulaire et la phonétique du Chamito-Sémitique, by M. Cohen (BiOr 7, 58-61), 1950.
Cohen, E. Sentential Complementation in Akkadian (JAOS 122, 803-806), 2000.
Cunchillos, J.-L. A Concordance of Ugaritic Words. Versión española, 2003.
Cunchillos, J.-L. The Texts of the Ugaritic Data Bank, 2003.
del Olmo Lete, G. & Sanmartín, J. A Dictionary of the Ugaritic Language (HdO 112), 2015.
Demeke, G. Origin of Amharic (Afroasiatic Studies 3), 2014.
Deutscher, G. & Kouwenberg, N.J.C. The Akkadian Language in its Semitic Context (PIHANS 106), 2006.
Fox, J. Semitic Noun Patterns (HVSS 52), 2013.
Goetze, A. Sequence of Two Short Syllables in Akkadian (OrNS 16, 233-238), 1946.
Gordon, C. H. Ugaritic Textbook (OA 38), 1965.
Grébaut, S. Les Pluriels Brisés des Formations Trilittères Qetl, Qatl, Qatal. Cahiers pour l'Enseignement, 1947.
Hospers, J. H. (ed.). A Basic Bibliography for the Study of the Semitic Languages, Vol. 1, 1973.
Huehnergard, J. (North-)West Semitic Dialectology, 2008.
Huehnergard, J. & Woods, C. Akkadian and Eblaite (CEWAL 218-280), 2004.
Huehnergard, J. Afro-Asiatic (CEWAL 138-159), 2004.
Huehnergard, J. Analogical Change Handout, 2008.
Huehnergard, J. Semitic Languages (CANE 4, 2117-34), 1995.
Huehnergard, J. Semitic Philology 140 (Unpublished Course Outline), 2008.
Huehnergard, J. Trees and Waves: On the Classification of the Semitic Languages, 2008 (preprint).
Knudsen, E. Stress in Akkadian (JCS 32, 3-16), 1980.
Kogan, L. & Krebernik, M. (eds.). Etymological Dictionary of Akkadian, 2000.
Kouwenberg, N.J.C. Reflections on the Gt-Stem in Akkadian (ZA 95, 77-103), 2005.
Krebernik, M. & Streck, M.P. Der Irrealis im Altbabylonischen (JBVO 5, 51-78), 2001.
Lambdin, T. O. Introduction to Classical Ethiopic (Ge'ez) (HSS 24), 1978.
Leslau's Ge'ez dictionary also has comparative notes for etymological purposes.
Lipinski, E. Glossary of Selected Linguistic Terms (OLA 80, 575-592), 1997.
Mankowski, P. V. Akkadian Loanwords in Biblical Hebrew (HVSS 47), 2000.
Meyer, W. R. Zum Terminativ-Adverbialis im Akkadischen, die Modaladverbien auf -ish (OrNS 64 3, 161-186), 1995.
Militarev, A. & Kogan, L. Semitic Etymological Dictionary, Vol. I: Anatomy of Man and Animals (AOAT 278/1), 2000.
Militarev, A. & Kogan, L. Semitic Etymological Dictionary, Vol. II: Animal Names (AOAT 278/2), 2005.
Moscati, S. et al. An Introduction to the Comparative Grammar of the Semitic Languages (PLO NS 6), 1964.
Pennacchietti, F. Un articolo prepositivo in neosudarabico (RSO 44, 285-293), 1969.
Schniedewind, W.M. & Hunt, J.H. A Primer on Ugaritic, 2007.
von Soden, W. Untersuchungen zur babylonischen Metrik, Teil 1 (ZA 71, 161-204), 1982; Teil 2 (ZA 74, 213-284), 1984.
von Soden, W. Akkadisches Handwörterbuch (AHw). This dictionary has comparative notes, and an English version can be found in the Concise Dictionary of Akkadian (CDA).
Weninger, S. (ed.). The Semitic Languages: An International Handbook (HSK 36), 2011.
Whiting, R. M. The Dual Personal Pronouns in Akkadian (JNES 31, 331-337), 1972.
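
The column layout described above for SemiticRootDerivations.csv, together with the lemma[gloss]POS entry format, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample rows, the XXX and AKK id prefixes, the lang values and the helper names parse_entry and build_adjacency are hypothetical and not taken from the published files; only the column names follow the description.

```python
import csv
import io
import re
from collections import defaultdict

# Placeholder rows invented for illustration; they are NOT taken from
# SemiticRootDerivations.csv. Only the column names follow the description.
SAMPLE_CSV = """lang_from,entry_from_id,entry_from,lang_to,entry_to_id,entry_to
semitic,XXX000001,kalb[dog]N,geez,GEZ000001,kalb[dog]N
semitic,XXX000001,kalb[dog]N,akkadian,AKK000001,kalbu[dog]N
"""

# Assumed shape of an entry: lemma, then a gloss in square brackets, then POS.
ENTRY_PATTERN = re.compile(r"^(?P<lemma>[^\[\]]+)\[(?P<gloss>[^\]]*)\](?P<pos>.*)$")


def parse_entry(entry):
    """Split 'lemma[gloss]POS' into its three parts, or return None."""
    match = ENTRY_PATTERN.match(entry.strip())
    if match is None:
        return None
    return match.group("lemma"), match.group("gloss"), match.group("pos")


def build_adjacency(csv_text):
    """Map each source lemma id to the list of target lemma ids it points to."""
    adjacency = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        adjacency[row["entry_from_id"]].append(row["entry_to_id"])
    return dict(adjacency)


adjacency = build_adjacency(SAMPLE_CSV)
# Out-degree of a node is the number of directed edges leaving it.
out_degree = {node: len(targets) for node, targets in adjacency.items()}
```

The adjacency mapping mirrors the directed edges of the graph file (Source to Target), so a count of outgoing edges per source id corresponds to the Out-Degree node property listed above.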

Authors

Anderson, Adam; Moser, Jason

0 Citations, 0 Mentions, 77% FAIR, 1.7 Dataset Index

10.5281/zenodo.14160275, November 2024

Protosemitic Root Derivations (Version: 1)

Authors

Anderson, Adam; Moser, Jason

0 Citations, 0 Mentions, 77% FAIR, 1.7 Dataset Index

10.5281/zenodo.14160276, November 2024

Wikidata Lemmatization Dataset (Version: 1)

The Wikidata Lemmatization Dataset was collected using the following SPARQL query: https://w.wiki/9TwH

Languages included in the dataset:
Akkadian: AKK (Q35518)
Arabic: AR (Q13955)
Czech: CS (Q9056)
German: DE (Q188)
English: EN (Q1860)
French: FR (Q150)
Hebrew: HE (Q9288)
Hittite: HIT (Q35668)
Italian: IT (Q652)
Russian: RU (Q7737)
Sumerian: SUX (Q36790)
Turkish: TR (Q256)

The choice of languages reflects a collection of primary and secondary source documents which we have digitized (OCR) and are using as references for the FactGrid Cuneiform project. The resulting lexemes for each language are shared as CSV files whose names reference the language, its Wikidata Q-id, the number of lexemes at that date, and the date of access (MM_YYYY).

The format of each CSV includes the following fields:
lexeme: the Wikidata lexeme id (L-id)
lexemeLabel: the label assigned to the lexeme in Wikidata
lexical_category: the Wikidata Q-item for the part of speech
lexical_categoryLabel: the label assigned to the lexical category (e.g. noun, verb, adjective, etc.)

This dataset will be updated periodically using standard version control.
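
As a rough illustration of the four-field CSV format described above, the following Python sketch counts lexemes per lexical category. It makes several assumptions: the sample rows and lexeme ids are placeholders, not data from the published files, and the helper names entity_id and count_by_category are hypothetical. Because a SPARQL export may return full entity URIs instead of bare ids, the sketch also shows one way to reduce either form to the bare id.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; the lexeme ids and labels are made up.
# Q1084 (noun) and Q24905 (verb) are the Wikidata items for those categories.
SAMPLE_CSV = """lexeme,lexemeLabel,lexical_category,lexical_categoryLabel
L0000001,example-a,Q1084,noun
L0000002,example-b,Q24905,verb
L0000003,example-c,Q1084,noun
"""


def entity_id(value):
    """Return the bare id from either 'L123' or a full entity URI ending in it."""
    return value.rsplit("/", 1)[-1]


def count_by_category(csv_text):
    """Count how many lexemes carry each lexical category label."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["lexical_categoryLabel"]] += 1
    return counts


counts = count_by_category(SAMPLE_CSV)
```

For a real file, the same function could be fed the file's text content; the file naming scheme (language, Q-id, lexeme count, access date) is not parsed here.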

Authors

Anderson, Adam

0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index

10.5281/zenodo.10819305, March 2024

Wikidata Lemmatization Dataset (Version: 1)

Authors

Anderson, Adam

0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index

10.5281/zenodo.10819306, March 2024

ORACC Dataset Format to LOD FactGrid Cuneiform Wikibase (Oracc2LOD) (Version: 1)

The ORACC 2 FactGrid data repository serves as a version-controlled intermediary between the updates taking place on the ORACC website (https://oracc.museum.upenn.edu/) and the FactGrid Cuneiform Project.

Within this repository we maintain versions of the following files:
1. ORACC_Projects.csv
2. all_merged.csv
3. factgrid_df.csv
4. finaldf.csv
5. RulerCatalogue.csv

Data Descriptions:
1. ORACC_Projects.csv is a curated list of projects found on the ORACC Project website (https://oracc.museum.upenn.edu/projectlist.html) [accessed March 8, 2024]. It contains the links and license information for each of the projects currently available on ORACC.
2. all_merged.csv is the resulting data frame from the Metacatalogue Jupyter Notebook (https://github.com/ancient-world-citation-analysis/Oracc2LoD/blob/main/7.1%20-%20Metacatalogue.ipynb). The construction and format of this data are described in the notebook, and the fields are derived from the CDLI (x) and ORACC (y) databases, respectively.
3. factgrid_df.csv is the resulting data frame from the FactGrid QuickStatement Processing Jupyter Notebook (https://github.com/ancient-world-citation-analysis/Oracc2LoD/blob/main/7.3%20-%20Factgrid%20Quickstatement%20Processing.ipynb). The construction and format of this data are described in the notebook, and the fields are derived from the Metacatalogue (above) along with the properties in FactGrid; see the FactGrid Directory of Properties for more details (https://database.factgrid.de/wiki/FactGrid:Directory_of_Properties).
4. finaldf.csv is the resulting data frame from the Large DataFrame of Unique Words Jupyter Notebook (https://github.com/ancient-world-citation-analysis/Oracc2LoD/blob/main/3-Large%20DF%20of%20Unique%20Words.ipynb). The construction and format of this data are described in the notebook.
5. RulerCatalogue.csv is the resulting data frame from the RulerCatalogue Jupyter Notebook (https://github.com/ancient-world-citation-analysis/Oracc2LoD/blob/main/6-%20RulerCatalogue.ipynb). The construction and format of this data are described in the notebook.
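
The (x) and (y) labels mentioned for all_merged.csv look like the default _x and _y suffixes that pandas attaches to overlapping column names when two data frames are merged. That reading is an assumption, not something the description confirms. The sketch below reproduces the suffixing behaviour with plain Python dictionaries instead of pandas; the function merge_with_suffixes, the column names and the sample values are all hypothetical placeholders.

```python
def merge_with_suffixes(left_rows, right_rows, key, suffixes=("_x", "_y")):
    """Inner-join two lists of dicts on `key`, suffixing overlapping columns.

    Columns present on both sides (other than the key) get suffixes[0] for the
    left value and suffixes[1] for the right value; other columns keep their name.
    """
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for left in left_rows:
        right = right_by_key.get(left[key])
        if right is None:
            continue  # inner join: skip rows without a partner
        overlap = (set(left) & set(right)) - {key}
        out = {key: left[key]}
        for col, val in left.items():
            if col != key:
                out[col + suffixes[0] if col in overlap else col] = val
        for col, val in right.items():
            if col != key:
                out[col + suffixes[1] if col in overlap else col] = val
        merged.append(out)
    return merged


# Hypothetical catalogue rows; column names and values are placeholders.
cdli_rows = [{"id_text": "P000001", "period": "placeholder A", "museum_no": "placeholder"}]
oracc_rows = [{"id_text": "P000001", "period": "placeholder B", "project": "placeholder"}]

merged = merge_with_suffixes(cdli_rows, oracc_rows, key="id_text")
```

Under this reading, a column such as period_x would hold the CDLI value and period_y the ORACC value for the same text; the notebook linked above remains the authoritative description of how the file was actually built.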

Authors

Anderson, Adam; Melinee, Her

0 Citations, 0 Mentions, 73% FAIR, 1.8 Dataset Index

10.5281/zenodo.10794625, March 2024

ORACC Dataset Format to LOD FactGrid Cuneiform Wikibase (Oracc2LOD) (Version: 1)

Authors

Anderson, Adam; Melinee, Her

0 Citations, 0 Mentions, 13% FAIR, 0.3 Dataset Index

10.5281/zenodo.10794626, March 2024

Automated Organization Profile
FactGrid
