Automated Author Profile

Siciliani, Lucia

University of Bari
0000-0002-1438-280X

Current S-Index

17.2

Sum of Dataset Indices for all datasets

Average Dataset Index per Dataset

1.6

Average Dataset Index per dataset

Total Datasets

11

Total datasets for this author

Average FAIR Score

69.1%

Average FAIR Score per dataset

Total Citations

0

Total citations to the author's datasets

Total Mentions

0

Total mentions of the author's datasets
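
The summary figures above are described as simple aggregates: the S-Index is the sum of the Dataset Indices, and the average is that sum divided by the number of datasets. The following is a minimal sketch of that arithmetic, not the service's actual implementation; the function names are hypothetical. It uses the ten Dataset Index values listed on this page. The profile reports 11 datasets and an S-Index of 17.2, so the ten listed records do not account for the full total.

```python
from statistics import mean


def s_index(dataset_indices):
    """Sum of per-dataset indices, following the page's description of the S-Index."""
    return sum(dataset_indices)


def average_dataset_index(dataset_indices):
    """Mean Dataset Index per dataset; 0.0 for an empty list."""
    return mean(dataset_indices) if dataset_indices else 0.0


# Dataset Index values of the ten records listed on this page
# (the profile itself reports 11 datasets, so this is a partial view).
listed_indices = [1.5, 1.5, 1.7, 0.3, 1.6, 1.6, 1.7, 1.7, 1.9, 1.9]

partial_sum = round(s_index(listed_indices), 1)
partial_avg = round(average_dataset_index(listed_indices), 2)
print(partial_sum, partial_avg)
```

Rounding is applied only for display, since summing decimal fractions in floating point can leave small residues.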

[Charts not reproduced: S-Index Interpretation; S-Index Over Time; Cumulative Citations Over Time; Cumulative Mentions Over Time]

Datasets

XL-WSD-LLM: Extending XL-WSD to evaluate Large Language Models (Version: 1.0)

This benchmark extends XL-WSD. Starting from XL-WSD, we build a set of prompts for evaluating Large Language Models (LLMs) in two settings. The first is a multiple-choice task, and the second is a generative task in which we assess the quality of the generated definition.
The benchmark consists of three compressed archives. Two archives contain training and test data for each task and language, while another is dedicated to the output of several LLMs that we evaluate. Each dataset includes data split into two folders: FT and TT. FT contains data without machine translation, while TT contains data where missing glosses are automatically translated.
More details are available in the pre-print article "Exploring the Word Sense Disambiguation Capabilities of Large Language Models," published on arXiv.org.

Authors

  • Basile, Pierpaolo ;
  • Siciliani, Lucia ;
  • Musacchio, Elio
0 Citations | 0 Mentions | 69% FAIR | 1.5 Dataset Index
10.5281/zenodo.15007563 | March 2025

XL-WSD-LLM: Extending XL-WSD to evaluate Large Language Models (Version: 1.0)

This benchmark extends XL-WSD. Starting from XL-WSD, we build a set of prompts for evaluating Large Language Models (LLMs) in two settings. The first is a multiple-choice task, and the second is a generative task in which we assess the quality of the generated definition.
The benchmark consists of three compressed archives. Two archives contain training and test data for each task and language, while another is dedicated to the output of several LLMs that we evaluate. Each dataset includes data split into two folders: FT and TT. FT contains data without machine translation, while TT contains data where missing glosses are automatically translated.
More details are available in the pre-print article "Exploring the Word Sense Disambiguation Capabilities of Large Language Models," published on arXiv.org.

Authors

  • Basile, Pierpaolo ;
  • Siciliani, Lucia ;
  • Musacchio, Elio
0 Citations | 0 Mentions | 69% FAIR | 1.5 Dataset Index
10.5281/zenodo.15007562 | March 2025

OIE4PA: Open Information Extraction for the Public Administration (Version: 1.0)

Tenders are powerful means of investment of public funds and represent a strategic development resource.
Despite the efforts made so far by governments at national and international levels to digitalise documents related to the Public Administration sector, most of the information is still available in an unstructured format only.
With the aim of bridging this gap, we present OIE4PA, our latest study on extracting and classifying relations from tenders of the Public Administration.
Our work focuses on the Italian language, where the availability of linguistic resources to perform Natural Language Processing tasks is considerably limited.
For evaluation purposes, we built a dataset composed of 2,000 triples extracted from Italian tenders, which have been manually annotated by two human experts. The dataset, compressed in a single zip file, is composed of:
  • The corpus of 6,262 texts extracted from Italian public tenders (corpus_tenders)
  • The training set of 1,600 annotated triples (training_set)
  • The test set of 400 annotated triples (test_set)
  • The set U of 14,096 triples used for the self-training (u_triples_dd)
  • A compressed archive that contains both the extracted triples and the index for each supervised approach (extraction)

Authors

  • Siciliani, Lucia ;
  • Ghizzota, Eleonora ;
  • Basile, Pierpaolo ;
  • Lops, Pasquale
0 Citations | 0 Mentions | 77% FAIR | 1.7 Dataset Index
10.5281/zenodo.8331106 | September 2023

OIE4PA: Open Information Extraction for the Public Administration (Version: 1.0)

Tenders are powerful means of investment of public funds and represent a strategic development resource.
Despite the efforts made so far by governments at national and international levels to digitalise documents related to the Public Administration sector, most of the information is still available in an unstructured format only.
With the aim of bridging this gap, we present OIE4PA, our latest study on extracting and classifying relations from tenders of the Public Administration.
Our work focuses on the Italian language, where the availability of linguistic resources to perform Natural Language Processing tasks is considerably limited.
For evaluation purposes, we built a dataset composed of 2,000 triples extracted from Italian tenders, which have been manually annotated by two human experts. The dataset, compressed in a single zip file, is composed of:
  • The corpus of 6,262 texts extracted from Italian public tenders (corpus_tenders)
  • The training set of 1,600 annotated triples (training_set)
  • The test set of 400 annotated triples (test_set)
  • The set U of 14,096 triples used for the self-training (u_triples_dd)
  • A compressed archive that contains both the extracted triples and the index for each supervised approach (extraction)

Authors

  • Siciliani, Lucia ;
  • Ghizzota, Eleonora ;
  • Basile, Pierpaolo ;
  • Lops, Pasquale
0 Citations | 0 Mentions | 13% FAIR | 0.3 Dataset Index
10.5281/zenodo.8331105 | September 2023

Relations from Italian Wikipedia using Unsupervised Information Extraction (Version: 1.00)

This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework.
WikiOIE is based on UDPipe and the Universal Dependencies project for text processing.
It easily allows customizing the information extraction (IE) approach to automatically extract triples (subject, predicate, object).
This dataset contains relations extracted by two unsupervised IE methods. The former (simple) is based only on PoS-tag patterns; the latter (simpledep) also uses syntactic dependencies.
The extraction process is provided in JSON format. More information and the Java code are available here: https://github.com/pippokill/WikiOIE
Reference: Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.

Authors

  • Basile, Pierpaolo ;
  • Siciliani, Lucia ;
  • Cassotti, Pierluigi ;
  • De Gemmis, Marco ;
  • Lops, Pasquale
0 Citations | 0 Mentions | 73% FAIR | 1.6 Dataset Index
10.5281/zenodo.5498034 | September 2021

Relations from Italian Wikipedia using Unsupervised Information Extraction (Version: 1.00)

This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework.
WikiOIE is based on UDPipe and the Universal Dependencies project for text processing.
It easily allows customizing the information extraction (IE) approach to automatically extract triples (subject, predicate, object).
This dataset contains relations extracted by two unsupervised IE methods. The former (simple) is based only on PoS-tag patterns; the latter (simpledep) also uses syntactic dependencies.
The extraction process is provided in JSON format. More information and the Java code are available here: https://github.com/pippokill/WikiOIE
Reference: Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.

Authors

  • Basile, Pierpaolo ;
  • Siciliani, Lucia ;
  • Cassotti, Pierluigi ;
  • De Gemmis, Marco ;
  • Lops, Pasquale
0 Citations | 0 Mentions | 73% FAIR | 1.6 Dataset Index
10.5281/zenodo.5498033 | September 2021

MQALD (Version: 4.0)

Question Answering (QA) over Knowledge Graphs (KG) has the aim of developing a system that is capable of answering users' questions using the information coming from one or multiple Knowledge Graphs, like DBpedia, Wikidata and so on.
Question Answering systems need to translate the question of the user, written using natural language, into a query formulated through a specific data query language that is compliant with the underlying KG.
This translation process is already non-trivial for simple questions that involve a single triple pattern, and it becomes even more troublesome for questions that require modifiers in the final query, such as aggregate functions and query forms.
Attention to this last aspect is growing, but the existing literature has never addressed it thoroughly.
Starting from the latest advances in this field, we want to take a further step in this direction.
The aim of this work is to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language.
This dataset has also been used to evaluate three state-of-the-art QA systems.
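
To make the notion of a query "modifier" concrete, the sketch below contrasts a single-triple-pattern query with one that needs an aggregate function, and flags which modifier keywords a query contains. It is only an illustration: the two queries and the example.org IRIs are invented, they are not taken from MQALD, and the keyword list is an assumption based on standard SPARQL 1.1 keywords rather than the dataset's own taxonomy.

```python
import re

# Assumed keyword list (standard SPARQL 1.1 keywords), not MQALD's own categories.
MODIFIER_KEYWORDS = [
    "COUNT", "SUM", "AVG", "MIN", "MAX",
    "GROUP BY", "HAVING", "ORDER BY", "LIMIT", "OFFSET", "FILTER", "ASK",
]


def find_modifiers(query):
    """Return the modifier keywords that occur as whole words in a query string."""
    found = []
    for keyword in MODIFIER_KEYWORDS:
        # Allow any whitespace between the words of two-word keywords.
        pattern = r"\b" + keyword.replace(" ", r"\s+") + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            found.append(keyword)
    return found


# Hypothetical question: "Who wrote this book?" (single triple pattern)
simple_query = """
SELECT ?author
WHERE { <http://example.org/book/1> <http://example.org/author> ?author }
"""

# Hypothetical question: "How many books did this person write?" (needs COUNT)
modifier_query = """
SELECT (COUNT(DISTINCT ?book) AS ?n)
WHERE { ?book <http://example.org/author> <http://example.org/person/1> }
"""

print(find_modifiers(simple_query))
print(find_modifiers(modifier_query))
```

A keyword scan like this is deliberately crude; a real evaluation would parse the query rather than match strings.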

Authors

  • Siciliani, Lucia ;
  • Basile, Pierpaolo ;
  • Lops, Pasquale ;
  • Semeraro, Giovanni
0 Citations | 0 Mentions | 77% FAIR | 1.7 Dataset Index
10.5281/zenodo.3746634 | April 2021

MQALD (Version: 4.0)

Question Answering (QA) over Knowledge Graphs (KG) has the aim of developing a system that is capable of answering users' questions using the information coming from one or multiple Knowledge Graphs, like DBpedia, Wikidata and so on.
Question Answering systems need to translate the question of the user, written using natural language, into a query formulated through a specific data query language that is compliant with the underlying KG.
This translation process is already non-trivial for simple questions that involve a single triple pattern, and it becomes even more troublesome for questions that require modifiers in the final query, such as aggregate functions and query forms.
Attention to this last aspect is growing, but the existing literature has never addressed it thoroughly.
Starting from the latest advances in this field, we want to take a further step in this direction.
The aim of this work is to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language.
This dataset has also been used to evaluate three state-of-the-art QA systems.

Authors

  • Siciliani, Lucia ;
  • Basile, Pierpaolo ;
  • Lops, Pasquale ;
  • Semeraro, Giovanni
0 Citations | 0 Mentions | 77% FAIR | 1.7 Dataset Index
10.5281/zenodo.4657496 | April 2021

MQALD (Version: 3.0)

Question Answering (QA) over Knowledge Graphs (KG) has the aim of developing a system that is capable of answering users' questions using the information coming from one or multiple Knowledge Graphs, like DBpedia, Wikidata and so on.
Question Answering systems need to translate the question of the user, written using natural language, into a query formulated through a specific data query language that is compliant with the underlying KG.
This translation process is already non-trivial for simple questions that involve a single triple pattern, and it becomes even more troublesome for questions that require modifiers in the final query, such as aggregate functions and query forms.
Attention to this last aspect is growing, but the existing literature has never addressed it thoroughly.
Starting from the latest advances in this field, we want to take a further step in this direction.
The aim of this work is to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language.
This dataset has also been used to evaluate three state-of-the-art QA systems.

Authors

  • Siciliani, Lucia ;
  • Basile, Pierpaolo ;
  • Lops, Pasquale ;
  • Semeraro, Giovanni
0 Citations | 0 Mentions | 77% FAIR | 1.9 Dataset Index
10.5281/zenodo.4479876 | May 2020

MQALD (Version: 2.0)

Question Answering (QA) over Knowledge Graphs (KG) has the aim of developing a system that is capable of answering users' questions using the information coming from one or multiple Knowledge Graphs, like DBpedia, Wikidata and so on.
This kind of system needs to translate the question of the user, written using natural language, into a query formulated through a data query language that is compliant with the underlying KG.
The translation process is already non-trivial for simple questions that involve a single triple pattern, and it becomes troublesome for questions that require modifiers in the final query, such as aggregate functions and query forms.
Attention to this aspect is growing, but the existing literature has never addressed it thoroughly.
Starting from the latest advances in this field, we want to take a further step in this direction by giving a comprehensive description of this topic and the main issues revolving around it, and by making publicly available a dataset designed to evaluate the performance of a QA system in translating such articulated questions into a specific data query language.
This dataset has also been used to evaluate the best state-of-the-art QA systems.

Authors

  • Siciliani, Lucia ;
  • Basile, Pierpaolo ;
  • Lops, Pasquale ;
  • Semeraro, Giovanni
0 Citations | 0 Mentions | 77% FAIR | 1.9 Dataset Index
10.5281/zenodo.4050353 | May 2020