Automated Author Profile

Schneider, Nathan

Current S-Index

19.3

Sum of Dataset Indices for all datasets

Average Dataset Index per Dataset

6.4

Average Dataset Index per dataset

Total Datasets

3

Total datasets for this author

Average FAIR Score

39.7%

Average FAIR Score per dataset

Total Citations

38

Total citations to the author's datasets

Total Mentions

0

Total mentions of the author's datasets
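
As a worked check of the summary figures above, the following sketch recomputes the S-Index and the average Dataset Index from the per-dataset values shown in the dataset entries on this page (17.5, 0.9 and 0.9). The variable names are illustrative, and rounding to one decimal place is an assumption about how the page displays its values.

```python
# Dataset Index values as displayed on this page for the three AMR releases.
dataset_indices = {
    "AMR 3.0": 17.5,
    "AMR 2.0": 0.9,
    "AMR 1.0": 0.9,
}

# S-Index: sum of Dataset Indices for all datasets.
s_index = sum(dataset_indices.values())

# Average Dataset Index per dataset.
average_index = s_index / len(dataset_indices)

print(round(s_index, 1))        # 19.3
print(round(average_index, 1))  # 6.4
```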

S-Index Interpretation

S-Index Over Time

Cumulative Citations Over Time

Cumulative Mentions Over Time

Datasets

Abstract Meaning Representation (AMR) Annotation Release 3.0

Introduction


Abstract Meaning Representation (AMR) Annotation Release 3.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of 59,255 English natural language sentences from broadcast conversations, newswire, weblogs, web discussion forums, fiction and web text. This release adds new data to, and updates material contained in, Abstract Meaning Representation 2.0 (LDC2017T10), specifically: more annotations on new and prior data, new or improved PropBank-style frames, enhanced quality control, and multi-sentence annotations.


AMR captures "who is doing what to whom" in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.
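
To make the graph idea concrete, here is a minimal sketch that stores one AMR as a list of triples and finds its reentrant nodes, that is, nodes reached by more than one relation, which is how within-sentence coreference appears in the graph. The sentence "The boy wants to go." is a commonly used illustration and is not taken from this release; the triple layout and the helper name `reentrant_nodes` are assumptions made for this example.

```python
from collections import Counter

# Illustrative AMR for "The boy wants to go." as (source, role, target) triples.
# Variables (w, b, g) name graph nodes; ":instance" links a node to its concept.
TRIPLES = [
    ("w", ":instance", "want-01"),
    ("b", ":instance", "boy"),
    ("g", ":instance", "go-01"),
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # the going is what is wanted
    ("g", ":ARG0", "b"),  # the same boy is the one who goes
]


def reentrant_nodes(triples):
    """Return variables that are the target of more than one relation edge."""
    variables = {src for src, role, _ in triples if role == ":instance"}
    incoming = Counter(
        tgt for _, role, tgt in triples
        if role != ":instance" and tgt in variables
    )
    return sorted(var for var, count in incoming.items() if count > 1)


print(reentrant_nodes(TRIPLES))  # ['b']
```

The node `b` has two incoming edges, so "boy" is both the wanter and the goer even though the sentence mentions him only once.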


LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12), and Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).


Data


The source data includes discussion forums collected for the DARPA BOLT and DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming from China Central TV, Wall Street Journal text, translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. Source data new to AMR 3.0 includes sentences from Aesop's Fables, parallel text and the situation frame data set developed by LDC for the DARPA LORELEI program, and lead sentences from Wikipedia articles about named entities.


The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset                 Training    Dev   Test  Totals
BOLT DF MT                  1061    133    133    1327
Broadcast conversation       214      0      0     214
Weblog and WSJ                 0    100    100     200
BOLT DF English             7379    210    229    7818
DEFT DF English            32915      0      0   32915
Aesop fables                  49      0      0      49
Guidelines AMRs              970      0      0     970
LORELEI                     4441    354    527    5322
2009 Open MT                 204      0      0     204
Proxy reports               6603    826    823    8252
Weblog                       866      0      0     866
Wikipedia                    192      0      0     192
Xinhua MT                    741     99     86     926
Totals                     55635   1722   1898   59255

Data in the "split" directory contains 59,255 AMRs split roughly 93.9%/2.9%/3.2% into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The "unsplit" directory contains the same 59,255 AMRs with no train/dev/test partition.
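
The stated split proportions can be reproduced from the "Totals" row of the table above. The sketch below is only an arithmetic check; the dictionary name and the rounding to one decimal place are choices made for this example.

```python
# Partition totals from the "Totals" row of the AMR 3.0 table.
partition_totals = {"training": 55635, "dev": 1722, "test": 1898}

total = sum(partition_totals.values())

# Share of each partition as a percentage, rounded to one decimal place.
shares = {
    name: round(100 * count / total, 1)
    for name, count in partition_totals.items()
}

print(total)   # 59255
print(shares)  # {'training': 93.9, 'dev': 2.9, 'test': 3.2}
```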


Samples


Please view this AMR sample.


Updates


None at this time.


Acknowledgements


From University of Colorado


We gratefully acknowledge the support of the National Science Foundation Grant NSF: 0910992 IIS:RI: Large: Collaborative Research: Richer Representations for Machine Translation and the support of DARPA BOLT - HR0011-11-C-0145 and DEFT - FA-8750-13-2-0045 via a subcontract from LDC. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, DARPA or the US government.


From Information Sciences Institute (ISI)


Thanks to NSF (IIS-0908532) for funding the initial design of AMR, and to DARPA MRP (FA-8750-09-C-0179) for supporting a group to construct consensus annotations and the AMR Editor. The initial AMR bank was built under DARPA DEFT FA-8750-13-2-0045 (PI: Stephanie Strassel; co-PIs: Kevin Knight, Daniel Marcu, and Martha Palmer) and DARPA BOLT HR0011-12-C-0014 (PI: Kevin Knight).


From Linguistic Data Consortium (LDC)


This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.


We gratefully acknowledge the support of Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0184 Subcontract 4400165821. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, or the US government.


From Language Weaver (SDL)


This work was partially sponsored by DARPA contract HR0011-11-C-0150 to LanguageWeaver Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or the US government.


Portions © 1994-1996, 2002-2010 Agence France Presse, © 2007 Al-Ahram, © 2007 Al Hayat, © 2007 Al-Quds Al-Arabi, © 2000 American Broadcasting Company, © 2007 An Nahar, © 2007 Asharq Al-Awsat, © 2007 Assabah, © 2002-2008, 2010 The Associated Press, © 2000 Cable News Network LP, LLLP, © 2003-2004, 2007-2008 Central News Agency (Taiwan), © 1997, 2004-2007 China Central TV, © 2007 China Military Online, © 2007 Chinanews.com, © 1987-1989 Dow Jones & Company, Inc., © 2007 Guangming Daily, © 1995, 2003, 2005, 2007-2008 Los Angeles Times-Washington Post News Service, Inc., © 2000 National Broadcasting Company, Inc., © 1999, 2002, 2004-2008, 2010 New York Times, © 2000 Public Radio International, © 1994-1998, 2001-2008 Xinhua News Agency, © 2020 Trustees of the University of Pennsylvania

Authors

  • Knight, Kevin
  • Badarau, Bianca
  • Baranescu, Laura
  • Bonial, Claire
  • Griffitt, Kira
  • Hermjakob, Ulf
  • Marcu, Daniel
  • O'Gorman, Tim
  • Palmer, Martha
  • Schneider, Nathan
  • Bardocz, Madalina
38 Citations · 0 Mentions · 50% FAIR · 17.5 Dataset Index
10.35111/44cy-bp51 · January 2020

Abstract Meaning Representation (AMR) Annotation Release 2.0

Introduction


Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of 39,260 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums.


AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.


LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12).


Data


The source data includes discussion forums collected for the DARPA BOLT and DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming from China Central TV, Wall Street Journal text, translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset                 Training    Dev   Test  Totals
BOLT DF MT                  1061    133    133    1327
Broadcast conversation       214      0      0     214
Weblog and WSJ                 0    100    100     200
BOLT DF English             6455    210    229    6894
DEFT DF English            19558      0      0   19558
Guidelines AMRs              819      0      0     819
2009 Open MT                 204      0      0     204
Proxy reports               6603    826    823    8252
Weblog                       866      0      0     866
Xinhua MT                    741     99     86     926
Totals                     36521   1368   1371   39260


For those interested in utilizing a standard/community partition for AMR research (for instance in development of semantic parsers), data in the "split" directory contains 39,260 AMRs split roughly 93%/3.5%/3.5% into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The "unsplit" directory contains the same 39,260 AMRs with no train/dev/test partition.
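
Anyone transcribing the table above may want to confirm that it is internally consistent before relying on it. The sketch below checks that each row adds up to its own total and that the column sums match the "Totals" row; the function name `check_table` and the tuple layout are assumptions for this example, not part of the release.

```python
# Rows of the AMR 2.0 table: dataset -> (training, dev, test, row total).
ROWS = {
    "BOLT DF MT": (1061, 133, 133, 1327),
    "Broadcast conversation": (214, 0, 0, 214),
    "Weblog and WSJ": (0, 100, 100, 200),
    "BOLT DF English": (6455, 210, 229, 6894),
    "DEFT DF English": (19558, 0, 0, 19558),
    "Guidelines AMRs": (819, 0, 0, 819),
    "2009 Open MT": (204, 0, 0, 204),
    "Proxy reports": (6603, 826, 823, 8252),
    "Weblog": (866, 0, 0, 866),
    "Xinhua MT": (741, 99, 86, 926),
}

# The "Totals" row as printed in the table.
REPORTED_TOTALS = (36521, 1368, 1371, 39260)


def check_table(rows, reported):
    """Return a list of inconsistencies; an empty list means the table adds up."""
    problems = []
    for name, (train, dev, test, row_total) in rows.items():
        if train + dev + test != row_total:
            problems.append(f"{name}: row total mismatch")
    # Sum each column across all dataset rows.
    column_sums = tuple(sum(column) for column in zip(*rows.values()))
    if column_sums != reported:
        problems.append(f"column sums {column_sums} != {reported}")
    return problems


print(check_table(ROWS, REPORTED_TOTALS))  # []
```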


Samples


Please view this sample.


Updates


None at this time.


Acknowledgements


From University of Colorado


We gratefully acknowledge the support of the National Science Foundation Grant NSF: 0910992 IIS:RI: Large: Collaborative Research: Richer Representations for Machine Translation and the support of DARPA BOLT - HR0011-11-C-0145 and DEFT - FA-8750-13-2-0045 via a subcontract from LDC. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, DARPA or the US government.


From Information Sciences Institute (ISI)


Thanks to NSF (IIS-0908532) for funding the initial design of AMR, and to DARPA MRP (FA-8750-09-C-0179) for supporting a group to construct consensus annotations and the AMR Editor. The initial AMR bank was built under DARPA DEFT FA-8750-13-2-0045 (PI: Stephanie Strassel; co-PIs: Kevin Knight, Daniel Marcu, and Martha Palmer) and DARPA BOLT HR0011-12-C-0014 (PI: Kevin Knight).


From Linguistic Data Consortium (LDC)


This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.


We gratefully acknowledge the support of Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0184 Subcontract 4400165821. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, or the US government.


From Language Weaver (SDL)


This work was partially sponsored by DARPA contract HR0011-11-C-0150 to LanguageWeaver Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or the US government.


Portions © 2002-2005, 2007-2008 Agence France Presse, © 2007 Al Ahram, © 2007 Al Hayat, © 2007 Al-Quds Al-Arabi, © 2007 Asharq Al-Awsat, © 2007 An Nahar, © 2007 Assabah, © 2002-2008 The Associated Press, © 2003-2004, 2007-2008 Central News Agency (Taiwan), © 1997, 2004-2007 China Central TV, © 2007 China Military Online, © 2007 Chinanews.com, © 1987-1989 Dow Jones & Company, Inc., © 2007 Guangming Daily, © 1995, 2003, 2007-2008 Los Angeles Times-Washington Post News Service, Inc., © 2002, 2004-2005, 2007-2008 New York Times, © 1994-1998, 2001-2008 Xinhua News Agency, © 2014, 2017 Language Weaver, Inc., © 2014, 2017 University of Colorado, © 2014, 2017 University of Southern California, © 2003, 2005, 2006, 2007, 2009, 2011, 2013, 2014, 2017 Trustees of the University of Pennsylvania

Authors

  • Knight, Kevin
  • Badarau, Bianca
  • Baranescu, Laura
  • Bonial, Claire
  • Bardocz, Madalina
  • Griffitt, Kira
  • Hermjakob, Ulf
  • Marcu, Daniel
  • Palmer, Martha
  • O'Gorman, Tim
  • Schneider, Nathan
0 Citations · 0 Mentions · 35% FAIR · 0.9 Dataset Index
10.35111/s444-np87 · June 2017

Abstract Meaning Representation (AMR) Annotation Release 1.0

Introduction


Abstract Meaning Representation (AMR) Annotation Release 1.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 13,000 English natural language sentences from newswire, weblogs and web discussion forums.


AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.


LDC also released Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).


Data


The source data includes discussion forums collected for the DARPA BOLT program, Wall Street Journal and translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset                 Training    Dev   Test  Totals
BOLT DF MT                  1061    133    133    1327
Weblog and WSJ                 0    100    100     200
BOLT DF English             1703    210    229    2142
2009 Open MT                 204      0      0     204
Xinhua MT                    741     99     86     926
Totals                      3709    542    548    4799


For those interested in utilizing a standard/community partition for AMR research (for instance in development of semantic parsers), data in the "split" directory contains 13,051 AMRs divided roughly 80/10/10 into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The "unsplit" directory contains the same 13,051 AMRs with no train/dev/test partition.


Samples


Please view this sample.


Updates


None at this time.


Acknowledgements


From University of Colorado


We gratefully acknowledge the support of the National Science Foundation Grant NSF: 0910992 IIS:RI: Large: Collaborative Research: Richer Representations for Machine Translation and the support of DARPA BOLT - HR0011-11-C-0145 and DEFT - FA-8750-13-2-0045 via a subcontract from LDC. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, DARPA or the US government.


From Information Sciences Institute


Thanks to NSF (IIS-0908532) for funding the initial design of AMR, and to DARPA MRP (FA-8750-09-C-0179) for supporting a group to construct consensus annotations and the AMR Editor. The initial AMR bank was built under DARPA DEFT FA-8750-13-2-0045 (PI: Stephanie Strassel; co-PIs: Kevin Knight, Daniel Marcu, and Martha Palmer) and DARPA BOLT HR0011-12-C-0014 (PI: Kevin Knight).


From Linguistic Data Consortium


This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.


We gratefully acknowledge the support of Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0184 Subcontract 4400165821. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, or the US government.


From Language Weaver (SDL)


This work was partially sponsored by DARPA contract HR0011-11-C-0150 to LanguageWeaver Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or the US government.


Portions © 2007 Agence France Presse, Al-Ahram, Al Hayat, Al-Quds Al-Arabi, Asharq Al-Awsat, An Nahar, Assabah, China Military Online, Chinanews.com, Guangming Daily, © 1987-1989 Dow Jones & Company, Inc., © 1994-1998, 2007 Xinhua News Agency, © 2014 Language Weaver, Inc., © 2014 University of Colorado, © 2014 University of Southern California, © 2004, 2007, 2013, 2014 Trustees of the University of Pennsylvania

Authors

  • Palmer, Martha
  • Marcu, Daniel
  • Griffitt, Kira
  • Knight, Kevin
  • Baranescu, Laura
  • Bonial, Claire
  • Georgescu, Madalina
  • Hermjakob, Ulf
  • Schneider, Nathan
0 Citations · 0 Mentions · 35% FAIR · 0.9 Dataset Index
10.35111/0ync-7404 · June 2014