Automated Author Profile

Jones, Karen

Current S-Index: 16.5 (sum of Dataset Indices for all datasets)

Average Dataset Index per Dataset: 1.0

Total Datasets: 17

Average FAIR Score: 34.6% (average FAIR Score per dataset)

Total Citations: 4 (total citations to the author's datasets)

Total Mentions: 0 (total mentions of the author's datasets)
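The summary figures above can be cross-checked with a little arithmetic. The sketch below assumes that the average Dataset Index is simply the S-Index divided by the number of datasets and that the page rounds to one decimal place; neither the formula nor the rounding rule is stated on the page, and the variable names are illustrative.

```python
# Values copied from the profile summary.
s_index = 16.5        # sum of Dataset Indices for all datasets
total_datasets = 17   # total datasets for this author

# Assumption: displayed average = S-Index / dataset count,
# rounded to one decimal place.
average_index = s_index / total_datasets
displayed_average = round(average_index, 1)
```

Under these assumptions the unrounded average is slightly below 1, which is consistent with the displayed value of 1.0.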

[Charts not reproduced: S-Index Interpretation; S-Index Over Time; Cumulative Citations Over Time; Cumulative Mentions Over Time]

Datasets

Multi-Language Conversational Telephone Speech 2011 -- Mandarin Chinese

Introduction


Multi-Language Conversational Telephone Speech 2011 -- Mandarin Chinese was developed by the Linguistic Data Consortium (LDC) and comprises approximately 25 hours of telephone speech in Mandarin Chinese.


The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related.


LDC has also released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:



Data


Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data were collected using LDC's telephone collection infrastructure, which comprises three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.


All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file has 2 channels sampled at 8000 samples/second, with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

group     lng   #calls   #hours   #MB
chinese   cmn   132      25.6     1260
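The "lossless conversion from the original mu-law sample data" mentioned in the format description can be illustrated with the standard G.711 mu-law expansion. This is a minimal sketch of the textbook algorithm, not LDC's actual conversion tool; the function name is illustrative.

```python
def ulaw_to_linear(u_val: int) -> int:
    """Expand one 8-bit G.711 mu-law code word to a linear PCM sample."""
    if not 0 <= u_val <= 0xFF:
        raise ValueError("mu-law code words are single bytes (0-255)")
    u_val = ~u_val & 0xFF              # code words are stored bit-inverted
    mantissa = u_val & 0x0F            # 4-bit mantissa
    exponent = (u_val & 0x70) >> 4     # 3-bit segment (exponent)
    # 0x84 is the bias added during encoding; remove it after scaling.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u_val & 0x80 else magnitude


# Every code word expands to a value that fits in a signed 16-bit integer,
# so storing the expanded samples as 16-bit PCM loses no information.
all_samples = [ulaw_to_linear(code) for code in range(256)]
```

The largest magnitude produced is 32124, comfortably inside the 16-bit range of -32768 to 32767.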

Samples


Please listen to this audio sample.


Updates


None at this time.


Portions © 2020 Trustees of the University of Pennsylvania

Authors

  • Jones, Karen
  • Graff, David
  • Walker, Kevin
  • Strassel, Stephanie
0 Citations | 0 Mentions | 35% FAIR | 0.8 Dataset Index
DOI: 10.35111/h61y-f636 (2020)

2018 NIST Speaker Recognition Evaluation Test Set

Introduction


2018 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains approximately 396 hours of Tunisian Arabic telephone recordings and English web video speech used as development and test data in the NIST-sponsored 2018 Speaker Recognition Evaluation (SRE).


The ongoing series of yearly SRE evaluations conducted by NIST is intended to be of interest to researchers working on the general problem of text-independent speaker recognition. To this end, the evaluations are designed to be simple, to focus on core technology issues, to be fully supported and to be accessible to those wishing to participate.


The SRE task is speaker detection, that is, to determine whether a specified target speaker is speaking during a segment of speech. In addition to the traditional focus on telephone speech recorded over a variety of handset types for the training and test conditions, SRE18 added voice over IP data and audio from video. Further information about the evaluation, including the features added in SRE18, is contained in the evaluation plan included in this release.
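A speaker detection trial of the kind described above is typically judged by comparing a system score against a threshold and then counting errors against the answer key. The sketch below computes miss and false-alarm rates; it is a generic illustration, not the official SRE18 metric (which is defined in the evaluation plan and not reproduced here). The function name, the scores and the keys are made up.

```python
from typing import Sequence, Tuple


def detection_error_rates(
    scores: Sequence[float],
    is_target: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (miss_rate, false_alarm_rate) for a set of detection trials.

    A trial is accepted when its score is at or above the threshold.
    """
    if len(scores) != len(is_target):
        raise ValueError("scores and is_target must have the same length")
    target_scores = [s for s, t in zip(scores, is_target) if t]
    nontarget_scores = [s for s, t in zip(scores, is_target) if not t]
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target trial")
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)


# Made-up scores and keys, purely for illustration.
example_scores = [2.1, 0.3, -1.0, 1.5, -0.2, -2.4]
example_keys = [True, True, True, False, False, False]
miss_rate, fa_rate = detection_error_rates(example_scores, example_keys, 0.0)
```

Raising the threshold trades false alarms for misses, which is why detection results are usually reported across a range of operating points.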


Data


The telephone speech data was drawn from the Call My Net 2 (CMN2) collection conducted by LDC in Tunisia, in which Tunisian Arabic speakers called friends or relatives who agreed to record their telephone conversations lasting 8-10 minutes. The speech segments include PSTN (public switched telephone network) and VOIP (voice over IP) data.


The English audio was sampled from amateur web videos collected by LDC as part of the Video Annotation for Speech Technology (VAST) project.


Telephone speech is presented as 8-bit a-law with a sample rate of 8000 Hz.


The VAST data are presented as 16-bit FLAC files sampled at 44 kHz.


In addition to development and evaluation data, this corpus also contains answer keys, trial and train files, and evaluation documentation.


Samples


Please view this telephone sample (SPH) and audio from video sample (FLAC).


Updates


None at this time.


Portions © 2011-2018 YouTube, LLC, © 2020 Trustees of the University of Pennsylvania

Authors

  • Greenberg, Craig
  • Sadjadi, Omid
  • Walker, Kevin
  • Jones, Karen
  • Wright, Jonathan
  • Strassel, Stephanie
  • Singer, Elliot
1 Citation | 0 Mentions | 35% FAIR | 1.2 Dataset Index
DOI: 10.35111/secv-qh25 (2020)

2016 NIST Speaker Recognition Evaluation Test Set

Introduction


2016 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains approximately 340 hours of short segments of Tagalog, Cantonese, Cebuano and Mandarin telephone speech used as development and test data in the NIST-sponsored 2016 Speaker Recognition Evaluation (SRE).


The ongoing series of yearly SRE evaluations conducted by NIST is intended to be of interest to researchers working on the general problem of text-independent speaker recognition. To this end, the evaluations are designed to be simple, to focus on core technology issues, to be fully supported and to be accessible to those wishing to participate.


The SRE task is speaker detection, that is, to determine whether a specified target speaker is speaking during a given segment of speech. As in previous evaluations, SRE16 focused on telephone speech recorded over a variety of handset types for the training and test conditions. Further information about the evaluation, including some features added in SRE16, is contained in the evaluation plan included in this release.


Data


The telephone speech data was drawn from the Call My Net 2015 Corpus collected by LDC. Native speakers of Tagalog, Cantonese, Cebuano or Mandarin (220 unique speakers) made a total of ten telephone calls each, talking to people within their existing social networks. Speakers were encouraged to use different telephone instruments in a variety of acoustic settings and were instructed to talk for 8-10 minutes per call on a topic of their choice. All conversations were collected outside North America.
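The collection figures in this paragraph give a rough bound on the amount of raw conversation time. The sketch below only multiplies out the numbers stated above; how the released 340 hours of short segments relate to this raw total is not stated in the description, so treat the result as a plausibility check rather than a derivation.

```python
speakers = 220                     # unique speakers, as stated above
calls_per_speaker = 10             # "a total of ten telephone calls each"
min_minutes, max_minutes = 8, 10   # instructed call length

total_calls = speakers * calls_per_speaker
min_hours = total_calls * min_minutes / 60
max_hours = total_calls * max_minutes / 60
```

With these inputs the collection amounts to 2200 calls and roughly 293 to 367 hours of conversation if every call stayed within the instructed length.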


Speech data is encoded as a-law, sampled at 8 kHz, and stored in SPHERE-formatted files.
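For readers who need linear PCM, a-law samples can be expanded with the standard G.711 algorithm. The following is a minimal sketch of that textbook expansion; it does not parse SPHERE headers, and the function name is illustrative.

```python
def alaw_to_linear(a_val: int) -> int:
    """Expand one 8-bit G.711 A-law code word to a linear PCM sample."""
    if not 0 <= a_val <= 0xFF:
        raise ValueError("A-law code words are single bytes (0-255)")
    a_val ^= 0x55                      # undo the even-bit inversion
    magnitude = (a_val & 0x0F) << 4    # 4-bit mantissa
    segment = (a_val & 0x70) >> 4      # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # In A-law the sign bit is set for positive samples.
    return magnitude if a_val & 0x80 else -magnitude


# Expanded values for all 256 possible code words.
all_samples = [alaw_to_linear(code) for code in range(256)]
```

Unlike mu-law, A-law has no code word for exactly zero; the smallest magnitudes are +8 and -8.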


In addition to development and evaluation data, this corpus also contains trial lists, their associated keys, tables containing metadata information, and evaluation documentation.


Samples


Please view this speech sample.


Updates


None at this time.


Portions © 2015, 2019 Trustees of the University of Pennsylvania

Authors

  • Greenberg, Craig
  • Jones, Karen
  • Walker, Kevin
  • Strassel, Stephanie
  • Graff, David
  • Sadjadi, Omid
  • Kheyrkhah, Timothee
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/bd9y-k619 (2019)

Multi-Language Conversational Telephone Speech 2011 -- East Asian

Introduction


Multi-Language Conversational Telephone Speech 2011 -- East Asian was developed by the Linguistic Data Consortium (LDC) and comprises approximately 19 hours of telephone speech in two distinct languages of East Asia: Thai and Lao.


The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related.


LDC has also released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:



Data


Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data were collected using LDC's telephone collection infrastructure, which comprises three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.


All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file has 2 channels sampled at 8000 samples/second, with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

group     lng      #calls   #hours   #MB
e_asian   lao      63       12.4     539
e_asian   tha      38       6.9      354
          totals   101      19.3     893
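The table can be checked against the stated 15-minute call limit by computing the mean call length per language. This sketch assumes the #hours column is total call duration; the variable and function names are illustrative.

```python
MAX_CALL_MINUTES = 15  # "one call, up to 15 minutes"

# (language code, number of calls, hours of audio) from the table.
rows = [("lao", 63, 12.4), ("tha", 38, 6.9)]


def mean_call_minutes(calls: int, hours: float) -> float:
    """Average call length in minutes."""
    return hours * 60 / calls


averages = {lng: mean_call_minutes(calls, hours) for lng, calls, hours in rows}
within_limit = all(m <= MAX_CALL_MINUTES for m in averages.values())
```

Both averages come out between 10 and 12 minutes, consistent with the collection protocol.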

Samples


Please view this sample.


Updates


None at this time.


Portions © 2019 Trustees of the University of Pennsylvania

Authors

  • Jones, Karen
  • Graff, David
  • Walker, Kevin
  • Strassel, Stephanie
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/e67e-j627 (2019)

Multi-Language Conversational Telephone Speech 2011 -- English Group

Introduction


Multi-Language Conversational Telephone Speech 2011 -- English Group was developed by the Linguistic Data Consortium (LDC) and comprises approximately 18 hours of telephone speech in two general varieties of English: American and South Asian.


The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related.


LDC has also released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:



Data


Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data were collected using LDC's telephone collection infrastructure, which comprises three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.


All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file has 2 channels sampled at 8000 samples/second, with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

group     lng      #calls   #hours   #MB
english   eng      62       13.5     589
english   eni      26       4.9      242
english   totals   88       18.4     831

Samples


Please listen to this audio sample.


Updates


None at this time.


Portions © 2019 Trustees of the University of Pennsylvania

Authors

  • Jones, Karen
  • Graff, David
  • Walker, Kevin
  • Strassel, Stephanie
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/grhk-n339 (2019)

Multi-Language Conversational Telephone Speech 2011 -- Arabic Group

Introduction


Multi-Language Conversational Telephone Speech 2011 -- Arabic Group was developed by the Linguistic Data Consortium (LDC) and comprises approximately 117 hours of telephone speech in distinct dialects of colloquial Arabic: Iraqi, Levantine and Maghrebi.


The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related.


LDC has also released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:



Data


Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data were collected using LDC's telephone collection infrastructure, which comprises three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.


All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file has 2 channels sampled at 8000 samples/second, with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

group    lng         #calls   #hours   #MB
arabic   iraqi       210      37.4     1908
arabic   levantine   225      41.1     2041
arabic   maghrebi    207      38.6     2024
arabic   totals      642      117.1    5973
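The totals row can be verified by summing the per-dialect rows. This short sketch uses the figures from the table; the variable names are illustrative, and hours are rounded to one decimal place to match the table's precision.

```python
# dialect -> (calls, hours, compressed MB), copied from the table.
rows = {
    "iraqi": (210, 37.4, 1908),
    "levantine": (225, 41.1, 2041),
    "maghrebi": (207, 38.6, 2024),
}
reported_totals = (642, 117.1, 5973)

total_calls = sum(r[0] for r in rows.values())
total_hours = round(sum(r[1] for r in rows.values()), 1)
total_mb = sum(r[2] for r in rows.values())
totals_match = (total_calls, total_hours, total_mb) == reported_totals
```

All three column sums agree with the reported totals.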

Samples


Please view this audio sample.


Updates


None at this time.


Portions © 2019 Trustees of the University of Pennsylvania

Authors

  • Jones, Karen
  • Graff, David
  • Walker, Kevin
  • Strassel, Stephanie
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/w8hd-px33 (2019)

Multi-Language Conversational Telephone Speech 2011 -- Spanish

Introduction


Multi-Language Conversational Telephone Speech 2011 -- Spanish was developed by the Linguistic Data Consortium (LDC) and comprises approximately 23 hours of telephone speech in Spanish.


The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related.


LDC has also released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:



Data


Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data were collected using LDC's telephone collection infrastructure, which comprises three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.


All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file has 2 channels sampled at 8000 samples/second, with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

group     lng   #calls   #hours   #MB
spanish   spa   125      23.6     1148
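The stated audio format makes it possible to estimate the uncompressed size and compare it with the compressed size in the table. The sketch below assumes that #hours is the duration of the two-channel recordings, that 1 MB means 10^6 bytes, and that file headers are negligible; none of these is confirmed by the description.

```python
CHANNELS = 2
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2   # 16-bit signed integers

hours = 23.6           # from the table
compressed_mb = 1148   # from the table

bytes_per_second = CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
uncompressed_mb = hours * 3600 * bytes_per_second / 1_000_000
compression_ratio = compressed_mb / uncompressed_mb
```

Under these assumptions the uncompressed audio would occupy about 2719 MB, so the FLAC files are a little over 40% of the raw PCM size.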

Samples


Please listen to this audio sample.


Updates


None at this time.


Portions © 2018 Trustees of the University of Pennsylvania

Authors

  • Jones, Karen
  • Graff, David
  • Walker, Kevin
  • Strassel, Stephanie
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/yfn1-4402 (2018)

2011 NIST Language Recognition Evaluation Test Set

Introduction


2011 NIST Language Recognition Evaluation Test Set contains selected training data and the evaluation test set for the 2011 NIST Language Recognition Evaluation. It consists of approximately 204 hours of conversational telephone speech and broadcast audio collected by the Linguistic Data Consortium (LDC) in the following 24 languages and dialects: Arabic (Iraqi), Arabic (Levantine), Arabic (Maghrebi), Arabic (Standard), Bengali, Czech, Dari, English (American), English (Indian), Farsi, Hindi, Lao, Mandarin, Punjabi, Pashto, Polish, Russian, Slovak, Spanish, Tamil, Thai, Turkish, Ukrainian and Urdu.


The goal of the NIST (National Institute of Standards and Technology) Language Recognition Evaluation (LRE) is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. NIST conducted language recognition evaluations in 1996, 2003, 2005, 2007, and 2009. The 2011 evaluation emphasized the language pair condition and involved both conversational telephone speech (CTS) and broadcast narrow-band speech (BNBS). Further information regarding this evaluation can be found in the evaluation plan which is also included in the documentation for this release.


LDC released the prior LREs as:



  • 2003 NIST Language Recognition Evaluation (LDC2006S31)

  • 2005 NIST Language Recognition Evaluation (LDC2008S05)

  • 2007 NIST Language Recognition Evaluation Test Set (LDC2009S04)

  • 2007 NIST Language Recognition Evaluation Supplemental Training Set (LDC2009S05)

  • 2009 NIST Language Recognition Evaluation Test Set (LDC2014S06)


Data


This release includes training data for nine language varieties that had not been represented in prior LRE cycles -- Arabic (Iraqi), Arabic (Levantine), Arabic (Maghrebi), Arabic (Standard), Czech, Lao, Punjabi, Polish and Slovak -- contained in 893 audited segments of roughly 30 seconds duration and in 400 full-length CTS recordings. The evaluation test set comprises a total of 29,511 audio files, all manually audited at LDC for language and divided equally into three different test conditions according to the nominal amount of speech content per segment.


Data was collected by LDC between 2009 and 2011. The CTS data was obtained using a "claque" collection model in which speakers (claques) called friends or relatives in their social network for a 10-minute conversation in the claque's native language, such that each call would involve a unique callee. Participants were free to speak on topics of their own choosing. All calls were routed through a telephone collection system at LDC which stored the raw mu-law sample stream into separate audio files for each call side. Auditing and selection were applied to the callee side of every call and to the caller (claque) side in at most one call made by each claque. Contiguous regions containing between 25 and 35 seconds of speech were identified by signal analysis and extracted for manual audit. In some cases, shorter segments were also selected for audit.
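The audit rule described above (the callee side of every call, plus the caller side in at most one call per claque) can be sketched as a simple selection pass. Which of a claque's calls supplied the caller side is not stated, so the sketch assumes the first call encountered; the function name and identifiers are hypothetical.

```python
from typing import List, Tuple


def select_sides_for_audit(calls: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Pick call sides for audit.

    calls: list of (claque_id, callee_id) pairs.
    Returns (call index, side) pairs, where side is "callee" or "caller".
    """
    selected: List[Tuple[int, str]] = []
    audited_claques = set()
    for index, (claque_id, _callee_id) in enumerate(calls):
        # The callee side of every call is audited.
        selected.append((index, "callee"))
        # Assumption: take the caller side from the claque's first call only.
        if claque_id not in audited_claques:
            audited_claques.add(claque_id)
            selected.append((index, "caller"))
    return selected


example = select_sides_for_audit([("c1", "a"), ("c1", "b"), ("c2", "c")])
```

Limiting each claque to one audited caller side keeps a prolific caller from dominating the pool of segments.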


Broadcast audio was recorded via capture of satellite-receiver MPEG streams or analog audio receivers digitizing at 16 kHz. Platforms for data capture were located at LDC and in Tunisia and India. Recordings were analyzed to extract contiguous segments of narrow-band speech of at least 33 seconds duration; longer segments were trimmed to a maximum length of 35 seconds for audit.
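The duration rule for broadcast segments can be expressed in a few lines. This sketch models durations only; where within a long segment the trimming was applied is not specified in the description, and the names are illustrative.

```python
from typing import Iterable, List

MIN_SECONDS = 33.0   # shorter segments are not extracted
MAX_SECONDS = 35.0   # longer segments are trimmed to this length


def segments_for_audit(durations: Iterable[float]) -> List[float]:
    """Keep segments of at least MIN_SECONDS, capped at MAX_SECONDS."""
    return [min(d, MAX_SECONDS) for d in durations if d >= MIN_SECONDS]


audit_durations = segments_for_audit([20.0, 33.0, 34.2, 50.0])
```

In this example the 20-second segment is dropped and the 50-second segment is cut to 35 seconds.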


All audited segments for training and test are presented as 8 kHz, 16-bit PCM, single-channel audio files with NIST SPHERE headers. The full-length CTS data is the same, except that it consists of two channels.


Samples


Please listen to this Urdu sample, Pashto sample, and English sample.


Updates


None at this time.


Portions © 2011 ABP News, © 2010 Alkass Sports Channel, © 2010-2011 Aljazeera, © 2010 Al Mustakillah TV, © 2010-2011 Al-Shirkatul Islamiyyah, © 2010 Alsumaria TV, © 2010 American Broadcasting Company, © 2011 Amrit Bani TV, © 2010 Android Television Network, © 2011 Appadana International Broadcasting Corp., © 2010-2011 Ariamehr International TV, © 2010-2011 Ariana Afghanistan International Television Network, © 2010 Assyria Sat, © 2010-2011 Atimemedia Co., Ltd, © 2011 Bayyinah Productions LLC, © 2009-2011 BBC, © 2010 Cable News Network, LP, LLLP, © 2010-2011 Channel One TV, © 2009-2010 China Central TV, © 2011 Czech Television, © 2011 ET Now, © 2011 Frequency 1, © 2010 Impact Television Network, © 2011 Independent News Service, © 2010 Iran TVNetwork.com, © 2010 IRINN, © 2010 Jiangsu Radio and Television General Station, © 2010 National Broadcasting Company, Inc., © 2010-2011 NATTV.com, © 2011 Natural TV, © 2010 New Tang Dynasty TV, © 2010-2011 Persian Radio & Iranian Live TV, © 2010 Persian TV One, © 2010 Phoenix TV, © 2010-2011 Polskie Radio S.A., © 2010 Qatar Radio, © 2011 Radio National Television of Laos, © 2010-2011 Radio Sedaye Ashena, © 2010-2011 Radio Television of Afghanistan, © 2010 Radio Tunis, © 2010-2011 Rangarang, © 2010 RAZ-E-ZINDAGI, © 2011 Rajya Sahba TV, © 2011 RTVS - Rozhlas a televízia Slovenska, © 2010 SAT-7 International, © 2010 Sharjah Media Corporation, © 2009 Spanish Radio and Television Corporation, © 2010 Syria TV, © 2010-2011 Thai TV Global Network, © 2011 TOLOnews, © 2010-2011 TRT World, © 2010 UTR, © 2010 Yemen TV, © 2009, 2010, 2011, 2018 Trustees of the University of Pennsylvania

Authors

  • Greenberg, Craig
  • Martin, Alvin
  • Graff, David
  • Walker, Kevin
  • Jones, Karen
  • Strassel, Stephanie
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/8j8e-vy57 (2018)

RATS Language Identification

Introduction


RATS Language Identification was developed by the Linguistic Data Consortium (LDC) and comprises approximately 5,400 hours of Levantine Arabic, Farsi, Dari, Pashto and Urdu conversational telephone speech with annotation of speech segments. The corpus was created to provide training, development and initial test sets for the Language Identification (LID) task in the DARPA RATS (Robust Automatic Transcription of Speech) program.


The goal of the RATS program was to develop human language technology systems capable of performing speech detection, language identification, speaker identification and keyword spotting on the severely degraded audio signals that are typical of various radio communication channels, especially those employing various types of handheld portable transceiver systems. To support that goal, LDC assembled a system for the transmission, reception and digital capture of audio data that allowed a single source audio signal to be distributed and recorded over eight distinct transceiver configurations simultaneously. Those configurations included three frequencies -- high, very high and ultra high -- variously combined with amplitude modulation, frequency hopping spread spectrum, narrow-band frequency modulation, single-side-band or wide-band frequency modulation. Annotations on the clear source audio signal, e.g., time boundaries for the duration of speech activity, were projected onto the corresponding eight channels recorded from the radio receivers.
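The projection of source annotations onto the eight recorded channels can be sketched as copying each annotated time span to every channel. The channel identifiers and the optional per-channel time offset below are hypothetical; the description does not say how channels are named or whether any alignment correction was applied.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, label)


def project_annotations(
    source_segments: Sequence[Segment],
    channel_ids: Sequence[str],
    offsets: Optional[Dict[str, float]] = None,
) -> Dict[str, List[Segment]]:
    """Copy source-audio annotations onto each recorded channel.

    offsets (hypothetical) maps a channel ID to a time shift in seconds.
    """
    offsets = offsets or {}
    projected: Dict[str, List[Segment]] = {}
    for channel in channel_ids:
        shift = offsets.get(channel, 0.0)
        projected[channel] = [
            (start + shift, end + shift, label)
            for start, end, label in source_segments
        ]
    return projected


channels = list("ABCDEFGH")                 # eight hypothetical channel IDs
source = [(1.0, 2.5, "speech"), (4.0, 6.0, "speech")]
by_channel = project_annotations(source, channels, offsets={"B": 0.5})
```

Annotating once on the clean source and projecting avoids labeling the same speech eight times on degraded audio.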


Data


The source audio consists of two kinds of recordings: (1) conversational telephone speech (CTS) recordings, taken either from previous LDC CTS corpora or from CTS data collected specifically for the RATS program from Levantine Arabic, Pashto, Urdu, Farsi and Dari native speakers; and (2) portions of VOA broadcast news recordings, taken from data used in the 2009 NIST Language Recognition Evaluation. The 2009 LRE Test Set is available from LDC as LDC2014S06.


CTS recordings were audited by annotators who listened to short segments and determined whether the audio was in the target language. Annotations on the audio files include start time, end time, speech activity detection (SAD) label, SAD provenance, language ID and LID provenance.


All audio files are presented as single-channel, 16-bit PCM, 16000 samples per second; lossless FLAC compression is used on all files; when uncompressed, the files have typical "MS-WAV" (RIFF) file headers.


The data is divided into training, initial development and initial evaluation sets.


Samples


Please view this audio sample.


Updates


None at this time.


Acknowledgment


This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D10PC20016. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.


Portions © 2000, 2001, 2004, 2005, 2007, 2014, 2018 Trustees of the University of Pennsylvania

Authors

  • Graff, David
  • Ma, Xiaoyi
  • Strassel, Stephanie
  • Walker, Kevin
  • Jones, Karen
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/xjdn-0g13 (2018)

Multi-Language Conversational Telephone Speech 2011 -- Central European

Introduction


Multi-Language Conversational Telephone Speech 2011 -- Central European was developed by the Linguistic Data Consortium (LDC) and comprises approximately 44 hours of telephone speech in two distinct language varieties of Central Europe: Czech and Slovak.


The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related.


LDC has also released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:



Data


Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data were collected using LDC's telephone collection infrastructure, which comprises three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.


All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file has 2 channels sampled at 8000 samples/second, with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

group     lng      #calls   #hours   #MB
c_europ   ces      92       21.9     1076
c_europ   slk      93       22.6     1104
c_europ   Totals   185      44.5     2180

Samples


Please listen to this sample.


Updates


None at this time.


Portions © 2018 Trustees of the University of Pennsylvania

Authors

  • Jones, Karen
  • Graff, David
  • Walker, Kevin
  • Strassel, Stephanie
0 Citations | 0 Mentions | 35% FAIR | 0.9 Dataset Index
DOI: 10.35111/gf0w-dw70 (2018)