Automated Author Profile
Siciliani, Lucia
University of Bari
ORCID: 0000-0002-1438-280x
Current S-Index
Sum of Dataset Indices for all datasets
Average Dataset Index per Dataset
Average Dataset Index per dataset
Total Datasets
Total datasets for this author
Average FAIR Score
Average FAIR Score per dataset
Total Citations
Total citations to the author's datasets
Total Mentions
Total mentions of the author's datasets
S-Index Interpretation
The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.
What it means:
- A higher S-index indicates greater overall impact of your datasets relative to typical datasets in their fields of research
- The S-Index grows as you add more datasets or as existing datasets gain more citations and mentions
- It provides a single number to track your research data impact over time
Current S-Index: 17.2 (sum of the Dataset Index scores of 11 datasets)
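As a minimal sketch of the calculation described above, the S-Index can be computed as a plain sum of per-dataset scores. The function name `s_index` and the three scores below are hypothetical and are not taken from this profile.

```python
from statistics import mean


def s_index(dataset_indices):
    """Return the S-Index: the sum of the per-dataset Dataset Index scores."""
    return sum(dataset_indices)


# Hypothetical Dataset Index scores for three claimed datasets (not real profile data).
scores = [2.5, 1.0, 0.5]

total = s_index(scores)   # cumulative impact across all datasets
average = mean(scores)    # average Dataset Index per dataset

print(f"S-Index: {total:.1f}, average per dataset: {average:.2f}")
```

By the same arithmetic, an S-Index of 17.2 over 11 datasets would correspond to an average Dataset Index of about 1.56 per dataset.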
S-Index Over Time
Cumulative Citations Over Time
Cumulative Mentions Over Time
Datasets
This benchmark extends XL-WSD. Starting from XL-WSD, we build a set of prompts for evaluating Large Language Models (LLMs) in two settings. The first is a multiple-choice task, and the second is a generative task in which we assess the quality of the generated definition.
The benchmark consists of three compressed archives. Two archives contain training and test data for each task and language, while another is dedicated to the output of several LLMs that we evaluate. Each dataset includes data split into two folders: FT and TT. FT contains data without machine translation, while TT contains data where missing glosses are automatically translated.
More details are available in the pre-print article "Exploring the Word Sense Disambiguation Capabilities of Large Language Models," published on arXiv.org.
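To illustrate the multiple-choice setting, here is a minimal sketch of how one word sense disambiguation instance might be turned into a prompt and scored. The prompt template, the function names, and the example sentence and glosses are all assumptions made for illustration. They are not the benchmark's actual prompts or data.

```python
def build_multiple_choice_prompt(sentence, target, glosses):
    """Format a WSD instance as a multiple-choice question (hypothetical template)."""
    options = "\n".join(f"{i}) {gloss}" for i, gloss in enumerate(glosses, start=1))
    return (
        f'Sentence: "{sentence}"\n'
        f'Which sense of the word "{target}" is used in the sentence?\n'
        f"{options}\n"
        "Answer with the number of the correct option."
    )


def is_correct(model_answer, gold_index):
    """Check a reply such as '2' or '2) ...' against the gold option number."""
    head = model_answer.strip().split(")")[0]
    digits = "".join(ch for ch in head if ch.isdigit())
    return digits == str(gold_index)


# Hypothetical instance: candidate glosses for the target word "bank".
prompt = build_multiple_choice_prompt(
    "She sat on the bank of the river.",
    "bank",
    ["sloping land beside a body of water", "a financial institution"],
)
print(prompt)
```

In the generative setting the model would instead be asked to write a definition, which needs a text-quality measure rather than the exact-match check sketched here.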
Authors
- Basile, Pierpaolo ;
- Siciliani, Lucia ;
- Musacchio, Elio
Tenders are a powerful means of investing public funds and represent a strategic development resource.
Despite the efforts made so far by governments at national and international levels to digitalise documents related to the Public Administration sector, most of the information is still available in an unstructured format only.
With the aim of bridging this gap, we present OIE4PA, our latest study on extracting and classifying relations from tenders of the Public Administration.
Our work focuses on the Italian language, where the availability of linguistic resources to perform Natural Language Processing tasks is considerably limited.
For evaluation purposes, we built a dataset composed of 2,000 triples extracted from Italian tenders, which have been manually annotated by two human experts. The dataset, compressed in a single zip file, is composed of:
- The corpus of 6,262 texts extracted from Italian public tenders (corpus_tenders)
- The training set of 1,600 annotated triples (training_set)
- The test set of 400 annotated triples (test_set)
- The set U of 14,096 triples used for the self-training (u_triples_dd)
- A compressed archive that contains both the extracted triples and the index for each supervised approach (extraction)
Authors
- Siciliani, Lucia ;
- Ghizzota, Eleonora ;
- Basile, Pierpaolo ;
- Lops, Pasquale
This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework.
WikiOIE is based on UDPipe and the Universal Dependencies project for text processing.
It makes it easy to customize the information extraction (IE) approach used to automatically extract triples (subject, predicate, object).
This dataset contains relations extracted by two unsupervised IE methods. The former (simple) is based only on PoS-tag patterns; the latter (simpledep) also uses syntactic dependencies.
The extraction process is provided in JSON format. More information and the Java code are available at https://github.com/pippokill/WikiOIE
Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.
Authors
- Basile, Pierpaolo ;
- Siciliani, Lucia ;
- Cassotti, Pierluigi ;
- De Gemmis, Marco ;
- Lops, Pasquale
Question Answering (QA) over Knowledge Graphs (KGs) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia and Wikidata.
QA systems need to translate the user's natural-language question into a query, written in a specific data query language, that is compliant with the underlying KG.
This translation is already non-trivial for simple questions that involve a single triple pattern, and it becomes even harder for questions that require modifiers in the final query, e.g. aggregate functions or query forms.
Attention to this last aspect is growing, but it has never been thoroughly addressed in the existing literature.
Starting from the latest advances in this field, we want to take a further step in this direction.
The aim of this work is to provide a publicly available dataset designed for evaluating how well a QA system translates articulated questions into a specific data query language.
This dataset has also been used to evaluate three state-of-the-art QA systems.
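To make the notion of a modifier concrete, the sketch below pairs a question with a SPARQL query that needs an aggregate function, then detects which kinds of modifier a query contains. The question, the query, the `find_modifiers` helper, and the grouping of keywords are illustrative assumptions. None of them is taken from the dataset, and SPARQL is assumed as the query language only because DBpedia and Wikidata expose SPARQL endpoints.

```python
import re

# Hypothetical question/query pair using DBpedia-style vocabulary (illustrative only).
QUESTION = "How many films did Stanley Kubrick direct?"
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT (COUNT(DISTINCT ?film) AS ?n) WHERE {
  ?film dbo:director dbr:Stanley_Kubrick .
}
"""

# Keywords grouped by kind of modifier (an assumed, non-exhaustive grouping).
MODIFIERS = {
    "aggregate": ["COUNT", "SUM", "AVG", "MIN", "MAX"],
    "query form": ["ASK"],
    "solution modifier": ["ORDER BY", "LIMIT", "OFFSET", "GROUP BY", "HAVING"],
    "filter": ["FILTER"],
}


def find_modifiers(query):
    """Return the modifier keywords found in a SPARQL query, grouped by kind."""
    found = {}
    for kind, keywords in MODIFIERS.items():
        hits = []
        for keyword in keywords:
            # Allow any whitespace between the words of multi-word keywords.
            pattern = r"\b" + keyword.replace(" ", r"\s+") + r"\b"
            if re.search(pattern, query, re.IGNORECASE):
                hits.append(keyword)
        if hits:
            found[kind] = hits
    return found


print(QUESTION)
print(find_modifiers(QUERY))
```

A question answerable with a single triple pattern would yield an empty result here. The keyword matching is deliberately naive: a keyword appearing inside a string literal or an IRI would be reported as a false positive.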
Authors
- Siciliani, Lucia ;
- Basile, Pierpaolo ;
- Lops, Pasquale ;
- Semeraro, Giovanni
Question Answering (QA) over Knowledge Graphs (KGs) aims to develop systems capable of answering users' questions using information from one or more Knowledge Graphs, such as DBpedia and Wikidata.
This kind of system needs to translate the user's natural-language question into a query, written in a data query language, that is compliant with the underlying KG.
The translation is already non-trivial for simple questions that involve a single triple pattern, and it becomes harder still for questions that require modifiers in the final query, e.g. aggregate functions or query forms.
Attention to this aspect is growing, but it has never been thoroughly addressed in the existing literature.
Starting from the latest advances in this field, we want to take a further step in this direction by giving a comprehensive description of this topic and the main issues revolving around it, and by making publicly available a dataset designed to evaluate how well a QA system translates such articulated questions into a specific data query language.
This dataset has also been used to evaluate the best state-of-the-art QA systems.
Authors
- Siciliani, Lucia ;
- Basile, Pierpaolo ;
- Lops, Pasquale ;
- Semeraro, Giovanni