Automated Author Profile
Griffitt, Kira

Current S-Index: 21.7 (sum of the Dataset Index scores for all datasets)

Average Dataset Index per Dataset: 3.6

Total Datasets: 6

Average FAIR Score: 37.2% (average FAIR score per dataset)

Total Citations: 38 (citations to the author's datasets)

Total Mentions: 0 (mentions of the author's datasets)

S-Index Interpretation

The S-Index (Sharing Index) is a comprehensive metric that represents the cumulative impact of all your datasets. It is calculated as the sum of Dataset Index scores across all your claimed datasets.

What it means:

A higher S-Index indicates greater overall impact of your datasets relative to typical datasets in their research fields.
The S-Index grows as you add datasets or as your existing datasets gain citations and mentions.
It provides a single number for tracking the impact of your research data over time.

Current S-Index: 21.7 (sum of the Dataset Index scores of 6 datasets)
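The S-Index definition above is plain arithmetic. The Python sketch below applies it to the per-dataset Dataset Index values shown with each dataset on this page. Because those displayed values are rounded to one decimal place, their sum (21.9) need not equal the reported 21.7; the assumption here is that the profile sums unrounded scores, which the page does not state. Dividing the reported 21.7 by 6 datasets gives about 3.6, matching the average shown above. The dictionary and function names are illustrative only.

```python
# Dataset Index values as displayed on this profile (already rounded to one decimal).
# Assumption: the profile computes its S-Index from unrounded values.
DISPLAYED_DATASET_INDEX = {
    "Machine Reading Phase 1 IC Training Data": 0.8,
    "AMR Annotation Release 3.0": 17.5,
    "Machine Reading Phase 1 NFL Scoring Training Data": 0.9,
    "BOLT IR Comprehensive Training and Evaluation": 0.9,
    "AMR Annotation Release 2.0": 0.9,
    "AMR Annotation Release 1.0": 0.9,
}


def s_index(dataset_indices):
    """S-Index: the sum of Dataset Index scores across all claimed datasets."""
    return sum(dataset_indices)


def average_dataset_index(dataset_indices):
    """Average Dataset Index per dataset; 0.0 when there are no datasets."""
    values = list(dataset_indices)
    return s_index(values) / len(values) if values else 0.0


if __name__ == "__main__":
    values = list(DISPLAYED_DATASET_INDEX.values())
    print(f"Datasets: {len(values)}")
    print(f"S-Index from displayed (rounded) values: {s_index(values):.1f}")
    print(f"Average from reported S-Index: {21.7 / len(values):.1f}")
```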


Charts (not reproduced): S-Index Over Time; Cumulative Citations Over Time; Cumulative Mentions Over Time

Datasets

Machine Reading Phase 1 IC Training Data

Introduction

Machine Reading Phase 1 IC Training Data was developed by the Linguistic Data Consortium and contains 248 English source documents and 116 standoff annotation files used in the DARPA (Defense Advanced Research Projects Agency) Machine Reading program.

The Machine Reading (MR) program aimed to develop automated reading systems to bridge the gap between knowledge contained in natural language texts and knowledge accessible to formal reasoning systems. The reading systems designed by program participants were required to extract and reason about facts from text in multiple domains.

The data in this release constitutes the training data for the IC (Core Domain) task. The IC Use Cases tested the core domain by extracting information about Entities (people, organizations, and geopolitical entities or "GPEs") and their involvement in four types of Relations: Attack Relations (e.g. bombings), Biographical Relations (e.g. being a citizen of a country), Affiliation Relations (e.g. being a leader of an organization), and Family Relations (e.g. having a spouse) as described in newswire text. This information was then aligned with an IC Use Cases ontology that would allow automated reasoning about the extracted Entities and Relations.

Data

This release contains 248 source documents (108,960 words) from English newswire stories in English Gigaword Fourth Edition (LDC2009T13). Roughly half of those documents (116) were annotated for IC/Core Use Cases. Annotation was non-exhaustive, but an attempt was made to provide instances of all relations and their arguments where explicitly stated in a single sentence, as well as some non-explicit relations, which were marked with an "Inferred" tag by the annotator.
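The entity and relation categories described above map naturally onto a small data model. The following is a minimal, hypothetical Python sketch: the class and field names are illustrative and are not the element names used in the release's GUI XML or RDF XML files. The `inferred` flag stands in for the annotator's "Inferred" tag.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    PERSON = "person"
    ORGANIZATION = "organization"
    GPE = "geopolitical entity"


class RelationType(Enum):
    ATTACK = "attack"              # e.g. bombings
    BIOGRAPHICAL = "biographical"  # e.g. being a citizen of a country
    AFFILIATION = "affiliation"    # e.g. being a leader of an organization
    FAMILY = "family"              # e.g. having a spouse


@dataclass(frozen=True)
class Entity:
    mention: str
    kind: EntityType


@dataclass(frozen=True)
class Relation:
    kind: RelationType
    arguments: tuple        # the Entity objects taking part in the relation
    inferred: bool = False  # True for relations the annotator tagged "Inferred"


def explicit_relations(relations):
    """Keep only relations stated explicitly (not tagged as inferred)."""
    return [r for r in relations if not r.inferred]


# Usage with invented example entities (not taken from the corpus).
person = Entity("Example Person", EntityType.PERSON)
org = Entity("Example Organization", EntityType.ORGANIZATION)
country = Entity("Example Country", EntityType.GPE)
relations = [
    Relation(RelationType.AFFILIATION, (person, org)),
    Relation(RelationType.BIOGRAPHICAL, (person, country), inferred=True),
]
```

Keeping the inferred flag on the relation itself, rather than in a separate list, lets explicit-only and all-relations views be derived from the same collection.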

Annotations are in GUI XML (traditional annotation) and RDF XML (formal knowledge representation) formats. A second set of GUI XML is provided with additional, unofficial annotations. All source and annotation files are presented as UTF-8 encoded XML files with associated DTDs, schemas or ontologies.

Acknowledgments

The Linguistic Data Consortium gratefully acknowledges the support of the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09 C-xxxx. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, or the US government.

Samples

Please view the following samples:

Source

RDF XML

GUI XML

GUI XML Extended

Updates

None at this time.

Portions © 1994-1997, 2001-2006 Agence France Presse, © 2002 An Nahar, © 1995-1998, 2000-2001, 2005-2006 The Associated Press, © 1996-1998, 2004, 2006 Los Angeles Times-Washington Post News Service, Inc., © 1994-2002, 2004-2006 New York Times, © 1994 Reuters America, Inc., © 1995-2006 Xinhua News Agency, © 2009, 2020 Trustees of the University of Pennsylvania

Authors

Simpson, Heather ;
Strassel, Stephanie ;
Wright, Jonathan ;
Griffitt, Kira

0 Citations, 0 Mentions, 35% FAIR, 0.8 Dataset Index

10.35111/tj3x-ce20 (2020)

Abstract Meaning Representation (AMR) Annotation Release 3.0

Introduction

Abstract Meaning Representation (AMR) Annotation Release 3.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of 59,255 English natural language sentences from broadcast conversations, newswire, weblogs, web discussion forums, fiction and web text. This release adds new data to, and updates material contained in, Abstract Meaning Representation 2.0 (LDC2017T10), specifically: more annotations on new and prior data, new or improved PropBank-style frames, enhanced quality control, and multi-sentence annotations.

AMR captures "who is doing what to whom" in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.
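AMRs are conventionally written in PENMAN notation, where the bracketing looks like a tree but a repeated variable (a reentrancy) turns the structure into a graph. The minimal Python sketch below parses only a simple subset of the notation (no comment lines, alignments, or inversion of `-of` roles) and uses the illustrative sentence "The boy wants to go", not an AMR taken from this release. For real work, a complete parser such as the `penman` package on PyPI is a better choice.

```python
import re
from collections import Counter

# Tokens: parentheses, the "/" separator, roles such as ":ARG0",
# quoted strings, and bare symbols (variables, concepts, constants).
TOKEN = re.compile(r'\(|\)|/|:[^\s()]+|"[^"]*"|[^\s()/:"]+')


def parse_penman(text):
    """Parse a simple PENMAN string into (top variable, list of triples)."""
    tokens = TOKEN.findall(text)
    triples = []
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(" or tokens[pos + 2] != "/":
            raise ValueError(f"malformed node at token {pos}")
        variable, concept = tokens[pos + 1], tokens[pos + 3]
        triples.append((variable, ":instance", concept))
        pos += 4
        while tokens[pos] != ")":
            role = tokens[pos]
            pos += 1
            if tokens[pos] == "(":
                target = parse_node()   # nested node
            else:
                target = tokens[pos]    # variable reference or constant
                pos += 1
            triples.append((variable, role, target))
        pos += 1                        # consume ")"
        return variable

    return parse_node(), triples


def reentrant_variables(triples):
    """Variables reached by more than one edge, i.e. where the graph is not a tree."""
    variables = {s for s, role, _ in triples if role == ":instance"}
    incoming = Counter(t for _, role, t in triples
                       if role != ":instance" and t in variables)
    return {v for v, n in incoming.items() if n > 1}


# "The boy wants to go": the boy is the ARG0 of both want-01 and go-02.
top, triples = parse_penman("(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))")
```

Here `reentrant_variables(triples)` reports `b`, the single node shared by both predicates, which is exactly the kind of within-sentence coreference AMR records.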

LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12), and Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).

Data

The source data includes discussion forums collected for the DARPA BOLT and DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming from China Central TV, Wall Street Journal text, translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. New source data to AMR 3.0 includes sentences from Aesop's Fables, parallel text and the situation frame data set developed by LDC for the DARPA LORELEI program, and lead sentences from Wikipedia articles about named entities.

The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset	Training	Dev	Test	Totals
BOLT DF MT	1061	133	133	1327
Broadcast conversation	214	0	0	214
Weblog and WSJ	0	100	100	200
BOLT DF English	7379	210	229	7818
DEFT DF English	32915	0	0	32915
Aesop fables	49	0	0	49
Guidelines AMRs	970	0	0	970
LORELEI	4441	354	527	5322
2009 Open MT	204	0	0	204
Proxy reports	6603	826	823	8252
Weblog	866	0	0	866
Wikipedia	192	0	0	192
Xinhua MT	741	99	86	926
Totals	55635	1722	1898	59255

Data in the "split" directory contains 59,255 AMRs split roughly 93.9%/2.9%/3.2% into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The "unsplit" directory contains the same 59,255 AMRs with no train/dev/test partition.
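The 93.9%/2.9%/3.2% figures follow directly from the table's column totals. This short Python check recomputes them from the per-dataset counts and also confirms that each row's training, dev and test counts add up to its listed total. The names `AMR3_TABLE`, `row_mismatches`, `partition_totals` and `partition_shares` are illustrative.

```python
# (training, dev, test, total) per dataset, copied from the AMR 3.0 table above.
AMR3_TABLE = {
    "BOLT DF MT": (1061, 133, 133, 1327),
    "Broadcast conversation": (214, 0, 0, 214),
    "Weblog and WSJ": (0, 100, 100, 200),
    "BOLT DF English": (7379, 210, 229, 7818),
    "DEFT DF English": (32915, 0, 0, 32915),
    "Aesop fables": (49, 0, 0, 49),
    "Guidelines AMRs": (970, 0, 0, 970),
    "LORELEI": (4441, 354, 527, 5322),
    "2009 Open MT": (204, 0, 0, 204),
    "Proxy reports": (6603, 826, 823, 8252),
    "Weblog": (866, 0, 0, 866),
    "Wikipedia": (192, 0, 0, 192),
    "Xinhua MT": (741, 99, 86, 926),
}


def row_mismatches(table):
    """Datasets whose training + dev + test counts differ from the listed total."""
    return [name for name, (train, dev, test, total) in table.items()
            if train + dev + test != total]


def partition_totals(table):
    """Column sums for the training, dev and test partitions."""
    return tuple(sum(row[i] for row in table.values()) for i in range(3))


def partition_shares(table):
    """Percentage of all AMRs in each partition, rounded to one decimal place."""
    totals = partition_totals(table)
    grand_total = sum(totals)
    return tuple(round(100 * t / grand_total, 1) for t in totals)


print(row_mismatches(AMR3_TABLE), partition_totals(AMR3_TABLE), partition_shares(AMR3_TABLE))
```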

Samples

Please view this AMR sample.

Updates

None at this time.

Acknowledgements

From University of Colorado

We gratefully acknowledge the support of the National Science Foundation Grant NSF: 0910992 IIS:RI: Large: Collaborative Research: Richer Representations for Machine Translation and the support of DARPA BOLT (HR0011-11-C-0145) and DEFT (FA-8750-13-2-0045) via a subcontract from LDC. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, DARPA or the US government.

From Information Sciences Institute (ISI)

Thanks to NSF (IIS-0908532) for funding the initial design of AMR, and to DARPA MRP (FA-8750-09-C-0179) for supporting a group to construct consensus annotations and the AMR Editor. The initial AMR bank was built under DARPA DEFT FA-8750-13-2-0045 (PI: Stephanie Strassel; co-PIs: Kevin Knight, Daniel Marcu, and Martha Palmer) and DARPA BOLT HR0011-12-C-0014 (PI: Kevin Knight).

From Linguistic Data Consortium (LDC)

This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.

We gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0184 Subcontract 4400165821. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, or the US government.

From Language Weaver (SDL)

This work was partially sponsored by DARPA contract HR0011-11-C-0150 to Language Weaver, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or the US government.

Portions © 1994-1996, 2002-2010 Agence France Presse, © 2007 Al-Ahram, © 2007 Al Hayat, © 2007 Al-Quds Al-Arabi, © 2000 American Broadcasting Company, © 2007 An Nahar, © 2007 Asharq Al-Awsat, © 2007 Assabah, © 2002-2008, 2010 The Associated Press, © 2000 Cable News Network LP, LLLP, © 2003-2004, 2007-2008 Central News Agency (Taiwan), © 1997, 2004-2007 China Central TV, © 2007 China Military Online, © 2007 Chinanews.com, © 1987-1989 Dow Jones & Company, Inc., © 2007 Guangming Daily, © 1995, 2003, 2005, 2007-2008 Los Angeles Times-Washington Post News Service, Inc., © 2000 National Broadcasting Company, Inc., © 1999, 2002, 2004-2008, 2010 New York Times, © 2000 Public Radio International, © 1994-1998, 2001-2008 Xinhua News Agency, © 2020 Trustees of the University of Pennsylvania

Authors

Knight, Kevin ;
Badarau, Bianca ;
Baranescu, Laura ;
Bonial, Claire ;
Griffitt, Kira ;
Hermjakob, Ulf ;
Marcu, Daniel ;
O'Gorman, Tim ;
Palmer, Martha ;
Schneider, Nathan ;
Bardocz, Madalina

38 Citations, 0 Mentions, 50% FAIR, 17.5 Dataset Index

10.35111/44cy-bp51 (2020)

Machine Reading Phase 1 NFL Scoring Training Data

Introduction

Machine Reading Phase 1 NFL Scoring Training Data was developed by the Linguistic Data Consortium (LDC) and contains 110 US NFL (National Football League) scoring source documents and 110 standoff annotation files used in the DARPA (Defense Advanced Research Projects Agency) Machine Reading program.

The Machine Reading program aimed to develop automated reading systems to bridge the gap between knowledge contained in natural language texts and knowledge accessible to formal reasoning systems. The reading systems designed by program participants were required to extract and reason about facts from text in multiple domains.

The data in this release constitutes the training data for the NFL Scoring Use Cases evaluation. The NFL Scoring Use Cases tested the sports domain by extracting information about scoring events and outcomes of US NFL games and by aligning that information with an NFL Scoring ontology.
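To make "scoring events and outcomes" concrete, here is a hypothetical Python sketch that derives each team's points and the winner from a list of scoring events, using standard NFL point values. The event labels and the (team, event) pair structure are illustrative assumptions; they are not the classes of the NFL Scoring ontology used in this release.

```python
from collections import defaultdict

# Standard NFL point values; the labels are illustrative, not ontology terms.
POINTS = {
    "touchdown": 6,
    "field_goal": 3,
    "safety": 2,
    "extra_point": 1,
    "two_point_conversion": 2,
}


def final_score(events):
    """Total points per team from (team, event_type) pairs."""
    score = defaultdict(int)
    for team, event_type in events:
        score[team] += POINTS[event_type]
    return dict(score)


def winner(score):
    """Team with the most points, or None for a tie or an empty score."""
    ranked = sorted(score.items(), key=lambda item: item[1], reverse=True)
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


# Invented example game (team names are placeholders).
events = [("Home", "touchdown"), ("Home", "extra_point"),
          ("Away", "field_goal"), ("Away", "safety")]
```

A reported outcome extracted from text can then be checked for consistency against the outcome implied by the extracted scoring events.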

Data

This release contains 110 source documents (70,233 words) from English newswire stories. The files were manually annotated for instances of NFL Scoring annotation categories defined with respect to the NFL Scoring ontology.

Annotations are in GUI XML (traditional annotation) and RDF XML (formal knowledge representation) formats. All source and annotation files are presented as UTF-8 encoded XML files with associated DTDs.

Acknowledgments

Samples

Please view the following samples:

Source Sample

GUI XML Sample

RDF XML Sample

Updates

None at this time.

Portions © 1995-1996, 2002-2005 Agence France Presse, © 1998, 2000-2001 The Associated Press, © 1994, 1996, 1998, 2005 New York Times, © 2003, 2005, 2007, 2009, 2011, 2019 Trustees of the University of Pennsylvania

Authors

Simpson, Heather ;
Strassel, Stephanie ;
Wright, Jonathan ;
Griffitt, Kira

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/8pye-2w87 (2019)

BOLT Information Retrieval Comprehensive Training and Evaluation

Introduction

BOLT Information Retrieval Comprehensive Training and Evaluation was developed by the Linguistic Data Consortium (LDC) and consists of all data produced in support of the Information Retrieval (IR) task within the DARPA Broad Operational Language Translation (BOLT) Program, including annotations, source documents and scoring software.

The BOLT program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported BOLT by collecting informal data sources (discussion forums, text messaging and chat) in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.

The material in this release relates to the IR task, which sought to support development of systems that could: (1) take as input a natural language English query sentence; (2) return relevant responses to that query from a large corpus of informal documents in the three BOLT languages (Arabic, Chinese, and English); and (3) translate responses from non-English documents into English.

Data

BOLT Information Retrieval Comprehensive Training and Evaluation contains the pilot, dry run, and evaluation data developed for each phase of the BOLT IR task, including: (1) natural-language IR queries, system responses to queries, and manually-generated assessment judgments for system responses; (2) discussion forum source documents in Arabic, Chinese and English; (3) scoring software for each evaluation phase; and (4) experimental data developed in Phase 2.

Source data is presented as a series of zip archives containing XML files. Query and response data are also presented as XML, and judgments are provided as tab-delimited files.
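A minimal Python sketch for loading data laid out this way, using only the standard library. It assumes nothing about the XML element names or the judgment columns: XML members are returned as parsed root elements and each judgment line as a list of strings. The function names and any paths passed to them are placeholders, not names from the release.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET


def iter_xml_from_zip(zip_path):
    """Yield (member name, parsed root element) for each .xml file in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                with archive.open(name) as handle:
                    yield name, ET.parse(handle).getroot()


def read_tab_delimited(path):
    """Read a tab-delimited file into a list of rows (lists of strings).

    QUOTE_NONE keeps any double quotes in a field as literal characters
    instead of treating them as CSV quoting.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
```

Iterating over archive members avoids extracting the zip files to disk first; whether the judgment files carry a header row is left to the caller to check.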

Acknowledgement

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

Samples

Please view the following samples:

Source Data

Query

Assessment

Response Assessment

Updates

None at this time.

Authors

Griffitt, Kira ;
Strassel, Stephanie

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/sd50-6m36 (2018)

Abstract Meaning Representation (AMR) Annotation Release 2.0

Introduction

Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of 39,260 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums.

AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.

LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12).

Data

The source data includes discussion forums collected for the DARPA BOLT and DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming from China Central TV, Wall Street Journal text, translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset	Training	Dev	Test	Totals
BOLT DF MT	1061	133	133	1327
Broadcast conversation	214	0	0	214
Weblog and WSJ	0	100	100	200
BOLT DF English	6455	210	229	6894
DEFT DF English	19558	0	0	19558
Guidelines AMRs	819	0	0	819
2009 Open MT	204	0	0	204
Proxy reports	6603	826	823	8252
Weblog	866	0	0	866
Xinhua MT	741	99	86	926
Totals	36521	1368	1371	39260

For those interested in utilizing a standard/community partition for AMR research (for instance in development of semantic parsers), data in the "split" directory contains 39,260 AMRs split roughly 93%/3.5%/3.5% into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The "unsplit" directory contains the same 39,260 AMRs with no train/dev/test partition.
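LDC's actual partition assignment is not described beyond the two rules above, so the following Python sketch is only a generic illustration of a split that observes document boundaries. It assumes every AMR carries a document ID and hashes that ID to one partition, so sentences from one document never straddle training and test. The 3.5% dev and test defaults mirror the proportions quoted above, but because whole documents are assigned, the per-AMR shares will only approximate them. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_partition(doc_id, dev_pct=3.5, test_pct=3.5):
    """Deterministically map a document ID to 'training', 'dev' or 'test'."""
    digest = int(hashlib.sha256(doc_id.encode("utf-8")).hexdigest(), 16)
    bucket = (digest % 1000) / 10  # 0.0 to 99.9
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + dev_pct:
        return "dev"
    return "training"


def split_by_document(amrs):
    """Group (doc_id, amr) pairs so that no document straddles two partitions."""
    parts = defaultdict(list)
    for doc_id, amr in amrs:
        parts[assign_partition(doc_id)].append(amr)
    return dict(parts)
```

Hashing the ID rather than drawing random numbers keeps the assignment stable across runs and machines without storing a seed.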

Samples

Please view this sample.

Updates

None at this time.

Acknowledgements

From University of Colorado

From Information Sciences Institute (ISI)

From Linguistic Data Consortium (LDC)

From Language Weaver (SDL)

Portions © 2002-2005, 2007-2008 Agence France Presse, © 2007 Al Ahram, © 2007 Al Hayat, © 2007 Al-Quds Al-Arabi, © 2007 Asharq Al-Awsat, © 2007 An Nahar, © 2007 Assabah, © 2002-2008 The Associated Press, © 2003-2004, 2007-2008 Central News Agency (Taiwan), © 1997, 2004-2007 China Central TV, © 2007 China Military Online, © 2007 Chinanews.com, © 1987-1989 Dow Jones & Company, Inc., © 2007 Guangming Daily, © 1995, 2003, 2007-2008 Los Angeles Times-Washington Post News Service, Inc., © 2002, 2004-2005, 2007-2008 New York Times, © 1994-1998, 2001-2008 Xinhua News Agency, © 2014, 2017 Language Weaver, Inc., © 2014, 2017 University of Colorado, © 2014, 2017 University of Southern California, © 2003, 2005, 2006, 2007, 2009, 2011, 2013, 2014, 2017 Trustees of the University of Pennsylvania

Authors

Knight, Kevin ;
Badarau, Bianca ;
Baranescu, Laura ;
Bonial, Claire ;
Bardocz, Madalina ;
Griffitt, Kira ;
Hermjakob, Ulf ;
Marcu, Daniel ;
Palmer, Martha ;
O'Gorman, Tim ;
Schneider, Nathan

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/s444-np87 (2017)

Abstract Meaning Representation (AMR) Annotation Release 1.0

Introduction

Abstract Meaning Representation (AMR) Annotation Release 1.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 13,000 English natural language sentences from newswire, weblogs and web discussion forums.

LDC also released Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).

Data

The source data includes discussion forums collected for the DARPA BOLT program, Wall Street Journal and translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset	Training	Dev	Test	Totals
BOLT DF MT	1061	133	133	1327
Weblog and WSJ	0	100	100	200
BOLT DF English	1703	210	229	2142
2009 Open MT	204	0	0	204
Xinhua MT	741	99	86	926
Totals	3709	542	548	4799

For those interested in utilizing a standard/community partition for AMR research (for instance in development of semantic parsers), data in the "split" directory contains 13,051 AMRs divided roughly 80/10/10 into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The "unsplit" directory contains the same 13,051 AMRs with no train/dev/test partition.

Samples

Please view this sample.

Updates

None at this time.

Acknowledgements

From University of Colorado

From Information Sciences Institute

From Linguistic Data Consortium

From Language Weaver (SDL)

Portions © 2007 Agence France Presse, Al-Ahram, Al Hayat, Al-Quds Al-Arabi, Asharq Al-Awsat, An Nahar, Assabah, China Military Online, Chinanews.com, Guangming Daily, © 1987-1989 Dow Jones & Company, Inc., © 1994-1998, 2007 Xinhua News Agency, © 2014 Language Weaver, Inc., © 2014 University of Colorado, © 2014 University of Southern California, © 2004, 2007, 2013, 2014 Trustees of the University of Pennsylvania

Authors

Palmer, Martha ;
Marcu, Daniel ;
Griffitt, Kira ;
Knight, Kevin ;
Baranescu, Laura ;
Bonial, Claire ;
Georgescu, Madalina ;
Hermjakob, Ulf ;
Schneider, Nathan

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/0ync-7404 (2014)
