Automated Author Profile

Sessa, Stephanie

Current S-Index

2.0

Sum of Dataset Indices for all datasets

Average Dataset Index per Dataset

2.0

Average Dataset Index per dataset

Total Datasets

1

Total datasets for this author

Average FAIR Score

34.6%

Average FAIR Score per dataset

Total Citations

2

Total citations to the author's datasets

Total Mentions

0

Total mentions of the author's datasets
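
The summary figures above follow directly from the card descriptions: the S-Index is the sum of the Dataset Indices, and the averages divide by the number of datasets. The sketch below reproduces that arithmetic for this profile. The record type and field names are hypothetical, and how an individual Dataset Index is derived is not stated on this page, so it is taken as a given input.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    # Hypothetical record; field names are illustrative, not the profile's schema.
    title: str
    dataset_index: float
    fair_score: float  # percent
    citations: int
    mentions: int


def summarize(records):
    """Aggregate per-dataset values into the profile-level figures."""
    n = len(records)
    s_index = sum(r.dataset_index for r in records)
    return {
        "s_index": s_index,  # sum of Dataset Indices
        "avg_dataset_index": s_index / n if n else 0.0,
        "total_datasets": n,
        "avg_fair_score": sum(r.fair_score for r in records) / n if n else 0.0,
        "total_citations": sum(r.citations for r in records),
        "total_mentions": sum(r.mentions for r in records),
    }


# The single dataset listed on this profile.
profile = summarize(
    [DatasetRecord("RATS Speech Activity Detection", 2.0, 34.6, 2, 0)]
)
```

With only one dataset, the sum and the average coincide, which is why both cards show 2.0.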

[Charts: S-Index Over Time; Cumulative Citations Over Time; Cumulative Mentions Over Time]

Datasets

RATS Speech Activity Detection

Introduction


RATS Speech Activity Detection was developed by the Linguistic Data Consortium (LDC) and comprises approximately 3,000 hours of Levantine Arabic, English, Farsi, Pashto, and Urdu conversational telephone speech with automatic and manual annotation of speech segments. The corpus was created to provide training, development and initial test sets for the Speech Activity Detection (SAD) task in the DARPA RATS (Robust Automatic Transcription of Speech) program.


The goal of the RATS program was to develop human language technology systems capable of performing speech detection, language identification, speaker identification and keyword spotting on the severely degraded audio signals that are typical of radio communication channels, especially those using handheld portable transceivers. To support that goal, LDC assembled a system for the transmission, reception and digital capture of audio data that allowed a single source audio signal to be distributed and recorded over eight distinct transceiver configurations simultaneously. Those configurations covered three frequency bands -- high, very high and ultra high -- variously combined with amplitude modulation, frequency hopping spread spectrum, narrow-band frequency modulation, single-sideband or wide-band frequency modulation. Annotations made on the clear source audio signal, e.g., time boundaries for the duration of speech activity, were projected onto the corresponding eight channels recorded from the radio receivers.
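
As an illustration of what projecting source annotations onto the recorded channels can look like, the sketch below shifts each speech segment by a per-channel time offset. This is an assumption for illustration only: the text does not describe how LDC performed the projection, and the constant-offset model, the channel labels and the offset values are all hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) of one speech region


def project_segments(
    source_segments: List[Segment],
    channel_offsets: Dict[str, float],
) -> Dict[str, List[Segment]]:
    """Copy source-audio segments onto each channel, shifted by that
    channel's (assumed constant) delay in seconds."""
    projected = {}
    for channel, offset in channel_offsets.items():
        projected[channel] = [
            (round(start + offset, 3), round(end + offset, 3))
            for start, end in source_segments
        ]
    return projected


# Hypothetical example: two speech regions and two of the eight channels.
source = [(0.5, 2.0), (3.0, 4.5)]
offsets = {"A": 0.0, "B": 0.25}
by_channel = project_segments(source, offsets)
```

Annotating once on the clean source and reusing the boundaries avoids labelling the same speech eight times on degraded audio, where boundaries are harder to hear.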


Data


The source audio consists of conversational telephone speech recordings collected by LDC: (1) data collected for the RATS program from Levantine Arabic, Farsi, Pashto and Urdu speakers; and (2) material from the Fisher English (LDC2004S13, LDC2005S13) and Fisher Levantine Arabic (LDC2007S02) telephone studies, as well as from CALLFRIEND Farsi (LDC2014S01).


Annotation was performed in three steps. First, LDC's automatic speech activity detector was run against the audio data to produce a speech segmentation for each file. Next, a manual first-pass annotation provided a quick correction of the automatic speech activity detection output. Finally, in a manual second-pass annotation, annotators reviewed the first-pass output and adjusted segments as needed.


All audio files are presented as single-channel, 16-bit PCM at 16,000 samples per second. Lossless FLAC compression is used on all files; when uncompressed, the files have typical "MS-WAV" (RIFF) file headers.
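
A quick way to confirm that a decompressed file matches the stated format is to read its RIFF header. The sketch below uses Python's standard `wave` module and assumes the FLAC file has already been decoded to WAV with a FLAC decoder. The function names are illustrative, and the demo builds a synthetic file in memory rather than reading corpus data.

```python
import io
import wave

# Format stated in the corpus description: mono, 16-bit PCM, 16000 Hz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def describe_wav(source):
    """Return header fields of a RIFF/WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_corpus_format(source):
    """True if the WAV header matches the stated corpus format."""
    return describe_wav(source) == EXPECTED


# Self-contained demo: one second of silence written to an in-memory WAV.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)       # 2 bytes = 16 bits per sample
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)
demo_ok = matches_corpus_format(buffer)
```

For a real file, pass the path of the decoded WAV to `matches_corpus_format` instead of the in-memory buffer.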


Samples


Please view this audio sample and annotation sample.


Updates


None at this time.


Acknowledgment


This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D10PC20016. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.


Portions © 2015 Trustees of the University of Pennsylvania

Authors

  • Walker, Kevin
  • Ma, Xiaoyi
  • Graff, David
  • Strassel, Stephanie
  • Sessa, Stephanie
  • Jones, Karen
2 Citations | 0 Mentions | 35% FAIR | 2.0 Dataset Index
DOI: 10.35111/x1mr-fq11 | 2015