Scholar Data

Mixer 4 and 5 Speech

Description

Introduction

Mixer 4 and 5 Speech was developed by the Linguistic Data Consortium (LDC) and comprises approximately 14,185 hours of audio recordings of conversational telephone speech, interviews, elicitation exercises and transcript readings involving 616 distinct speakers. The material was collected in 2007 as part of the Mixer project, and recordings in this corpus were used in the 2008 NIST Speaker Recognition Evaluation (SRE).

The data in this release was collected in 2007 by LDC at its Human Subjects Data Collection Laboratories in Philadelphia and by the International Computer Science Institute (ICSI) at the University of California, Berkeley. The Mixer 4 and Mixer 5 collections were conducted simultaneously, as a collaborative, carefully coordinated activity at both recording sites.

The telephone protocol connected recruited speakers through a robot operator to carry on casual conversations. In Mixer 4, 400 subjects made ten 10-minute calls; half of those subjects also visited one of the collection sites, where they made two telephone calls while also being recorded on a cross-channel platform. In Mixer 5, 300 subjects each completed ten calls and six interview sessions at either LDC or ICSI; those sessions were conducted on a cross-channel platform and included a telephone call in one of three vocal-effort conditions: normal, high and low. Mixer participants were nearly all native English speakers, the rest being bilingual English speakers.

Researchers interested in applying NIST 2008 SRE benchmark test sets should consult the respective NIST Evaluation Plans for guidelines on allowable training data for those tests. Training, evaluation and supplemental data from 2008 SRE are available in the LDC Catalog: 2008 NIST Speaker Recognition Evaluation Training Set Part 1 (LDC2011S05), 2008 NIST Speaker Recognition Evaluation Training Set Part 2 (LDC2011S07), 2008 NIST Speaker Recognition Evaluation Test Set (LDC2011S08) and 2008 NIST Speaker Recognition Evaluation Supplemental Set (LDC2011S11).

Data

The Mixer 4 and 5 collection contains 2,568 recordings made via the public telephone network and 2,152 sessions of multiple microphone recordings in office-room settings. The telephone recordings are presented as 8-kHz 2-channel NIST SPHERE files, and the microphone recordings are 16-kHz 1-channel flac/ms-wav files.

When the microphone recording flac files are uncompressed, they become ms-wav/RIFF files (flac compression does not presently support SPHERE file format).

The telephone audio is presented in SPHERE format because this is consistent with other LDC telephone audio releases and because flac does not support ulaw sample encoding. The open-source SoX utility is able to handle both formats as input. Other utilities are available for flac and SPHERE formats.
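As a rough illustration of what the SPHERE container looks like to a program, the sketch below parses the plain-text header that precedes the audio samples. It assumes the usual NIST_1A layout (a magic line, a header-size line, then `name -type value` fields terminated by `end_head`). The function name and the sample header are made up for illustration and are not taken from this release.

```python
def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII header found at the start of a NIST SPHERE file."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    fields = {"header_size": int(lines[1].strip())}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        name, type_tag, value = line.split(None, 2)
        if type_tag == "-i":      # integer field
            fields[name] = int(value)
        elif type_tag == "-r":    # real-valued field
            fields[name] = float(value)
        else:                     # "-sN": string field of N characters
            fields[name] = value
    return fields


# Synthetic header for illustration only (not copied from the corpus).
SAMPLE_HEADER = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 2\n"
    b"sample_rate -i 8000\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
).ljust(1024, b" ")

print(parse_sphere_header(SAMPLE_HEADER))
```

In practice, passing the first few kilobytes of a .sph file to such a function is usually enough, since the audio samples only begin after the number of bytes given on the second header line.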

Metadata about the calls and speakers is also included in this release, along with time-aligned entries for many of the component portions of the recording sessions.

Samples

Please listen to this telephone sample (SPH) and microphone sample (FLAC).

Updates

None at this time.

Authors

Mirghafori, Nikki ;
Brandschain, Linda ;
Walker, Kevin ;
Graff, David ;
Cieri, Christopher ;
Neely, Abby ;
Peskin, Barbara ;
King, Mike ;
Godfrey, Jack ;
Strassel, Stephanie ;
Goodman, Fred ;
Doddington, George R.

0 Citations, 0 Mentions, 35% FAIR, 0.8 Dataset Index

10.35111/xq98-yj91 (2020)

2010 NIST Speaker Recognition Evaluation Test Set

Introduction

2010 NIST Speaker Recognition Evaluation Test Set was developed by the Linguistic Data Consortium (LDC) and NIST (National Institute of Standards and Technology). It contains 2,255 hours of American English telephone speech and microphone-channel speech recorded in an interview scenario, which served as test data in the NIST-sponsored 2010 Speaker Recognition Evaluation (SRE).

The ongoing series of yearly SREs conducted by NIST is intended to be of interest to researchers working on the general problem of text-independent speaker recognition. To this end, the evaluations are designed to be simple, to focus on core technology issues, to be fully supported and to be accessible to those wishing to participate.

Like the 2008 evaluation, the 2010 evaluation included in the training and test conditions for the core test not only conversational telephone speech (CTS) recorded over ordinary telephone channels, but also CTS and conversational interview speech recorded over a room microphone channel. Unlike prior evaluations, some of the conversational telephone-style speech was collected in a manner designed to produce particularly high or particularly low vocal effort on the part of the speaker of interest.

Data

The speech recordings in this release were collected in 2009 and 2010 by LDC at its Human Subjects Collection facility in Philadelphia. This collection was part of the Mixer 6 project, which was designed to support the development of robust speaker recognition technology by providing carefully collected and audited speech from a large pool of speakers recorded simultaneously across numerous microphones.

The telephone speech segments include two-channel excerpts of approximately 10 seconds and 5 minutes. There are also summed-channel excerpts of approximately 5 minutes. The microphone excerpts are 3-15 minutes in duration. As in prior evaluations, intervals of silence were not removed. The data included in this release is 8-bit μ-law with a sample rate of 8000 Hz.

In addition to evaluation data, this package also includes answer keys, trial and train files, development data and evaluation documentation.

Samples

Please listen to this audio sample.

Updates

None at this time.

Authors

Greenberg, Craig ;
Martin, Alvin ;
Graff, David ;
Brandschain, Linda ;
Walker, Kevin

2 Citations, 0 Mentions, 35% FAIR, 1.9 Dataset Index

10.35111/fjsq-a117 (2017)

2009 NIST Language Recognition Evaluation Test Set

Introduction

2009 NIST Language Recognition Evaluation Test Set contains approximately 215 hours of conversational telephone speech and radio broadcast conversation collected by the Linguistic Data Consortium (LDC) in the following 23 languages and dialects: Amharic, Bosnian, Cantonese, Creole (Haitian), Croatian, Dari, English (American), English (Indian), Farsi, French, Georgian, Hausa, Hindi, Korean, Mandarin, Pashto, Portuguese, Russian, Spanish, Turkish, Ukrainian, Urdu and Vietnamese.

The goal of the NIST (National Institute of Standards and Technology) Language Recognition Evaluation (LRE) is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. NIST conducted language recognition evaluations in 1996, 2003, 2005 and 2007. The 2009 evaluation increased the number of target languages. Most of the test data originated from multilingual Voice of America (VOA) radio broadcasts assessed as being of telephone bandwidth; the remainder was conversational telephone speech. Further information regarding this evaluation can be found in the evaluation plan, which is included in the documentation for this release.

LDC has released other LRE data sets as:

2003 NIST Language Recognition Evaluation (LDC2006S31)

2005 NIST Language Recognition Evaluation (LDC2008S05)

2007 NIST Language Recognition Evaluation Test Set (LDC2009S04)

2007 NIST Language Recognition Evaluation Supplemental Training Set (LDC2009S05)

2011 NIST Language Recognition Evaluation Test Set (LDC2018S06)

Data

The VOA speech data was collected by LDC in 2000 and 2001 and constitutes approximately 75% of the test set. The telephone speech was taken from LDC's Mixer 3 collection recorded between 2005 and 2007.

All test speech segments are presented as a sampled data stream in standard 8-bit 8-kHz μ-law format. Each segment is stored separately in a single channel SPHERE format file.
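For readers who have not worked with 8-bit μ-law before, the following is a minimal sketch of how one μ-law byte expands to a linear PCM value. It follows the widely used G.711 reference expansion (bias of 0x84); it is offered as an illustration of the encoding, not as the tooling used to prepare this release, and the function names are hypothetical.

```python
ULAW_BIAS = 0x84  # bias constant from the common G.711 reference code


def ulaw_byte_to_linear(byte: int) -> int:
    """Expand one 8-bit mu-law code to a signed linear sample."""
    u = ~byte & 0xFF               # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07     # 3-bit segment number
    mantissa = u & 0x0F            # 4-bit step within the segment
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list:
    """Decode a run of mu-law bytes (one byte per sample)."""
    return [ulaw_byte_to_linear(b) for b in data]


print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))
```

The three bytes in the example are the code for silence and the two full-scale codes, which makes them convenient for a quick sanity check.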

The test segments contain three nominal durations of speech: 3 seconds, 10 seconds and 30 seconds. Actual speech durations vary, but were constrained to be within the ranges of 2-4 seconds, 7-13 seconds and 23-35 seconds, respectively. Non-speech portions were retained so that each segment contained a continuous sample of the source recording. Therefore, the test segments may be significantly longer than the speech duration, depending on how much non-speech was included.
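The mapping from measured speech duration to nominal duration can be sketched as a simple range lookup. The ranges are the ones stated above; treating the bounds as inclusive, and the function and constant names, are assumptions made for this example.

```python
# Nominal duration (seconds) -> allowed range of actual speech (seconds).
NOMINAL_RANGES = {
    3: (2.0, 4.0),
    10: (7.0, 13.0),
    30: (23.0, 35.0),
}


def nominal_duration(speech_seconds: float):
    """Return 3, 10 or 30 for a speech duration, or None if out of range."""
    for nominal, (low, high) in NOMINAL_RANGES.items():
        if low <= speech_seconds <= high:  # inclusive bounds (assumed)
            return nominal
    return None


for seconds in (3.5, 12.0, 23.0, 5.0):
    print(seconds, "->", nominal_duration(seconds))
```

Note that the input is the amount of speech, not the file length: because non-speech is kept, a file's total duration can fall outside these ranges even when its speech content does not.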

Samples

Please listen to this audio sample.

Updates

None at this time.

Authors

Martin, Alvin ;
Greenberg, Craig ;
Graff, David ;
Walker, Kevin ;
Brandschain, Linda

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/qv7y-5026 (2014)

Mixer 6 Speech

Introduction

Mixer 6 Speech was developed by the Linguistic Data Consortium (LDC) and comprises 15,863 hours of audio recordings of interviews, transcript readings and conversational telephone speech involving 594 distinct native English speakers. This material was collected by LDC in 2009 and 2010 as part of the Mixer project, specifically phase 6, the focus of which was on native American English speakers local to the Philadelphia area.

The speech data in this release was collected by LDC at its Human Subjects Collection facilities in Philadelphia. The telephone collection protocol was similar to other LDC telephone studies (e.g., Switchboard-2 Phase III Audio - LDC2002S06): recruited speakers were connected through a robot operator to carry on casual conversations lasting up to 10 minutes, usually about a daily topic announced by the robot operator at the start of the call. The raw digital audio content for each call side was captured as a separate channel, and each full conversation was presented as a 2-channel interleaved audio file, with 8000 samples/second and u-law sample encoding. Each speaker was asked to complete 15 calls.
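To make the 2-channel interleaved layout concrete, here is a small sketch that separates the two call sides from raw sample data. It assumes the header has already been stripped and that each sample is a single byte (8-bit u-law), so samples alternate side A, side B, side A, and so on. The function names are illustrative only.

```python
def split_call_sides(interleaved: bytes):
    """Split interleaved 2-channel, 1-byte-per-sample audio into two sides."""
    side_a = interleaved[0::2]  # even positions: first channel
    side_b = interleaved[1::2]  # odd positions: second channel
    return side_a, side_b


def duration_seconds(channel: bytes, sample_rate: int = 8000) -> float:
    """Duration of a single 1-byte-per-sample channel."""
    return len(channel) / sample_rate


a, b = split_call_sides(bytes([1, 2, 3, 4, 5, 6]))
print(a, b)
print(duration_seconds(bytes(8000)))
```

At 8000 samples/second per side, a full 10-minute call amounts to 4,800,000 bytes per channel before any header is counted.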

The multi-microphone portion of the collection utilized 14 distinct microphones installed identically in two multi-channel audio recording rooms at LDC. Each session was guided by collection staff using prompting and recording software to conduct the following activities: (1) repeat questions (less than one minute), (2) informal conversation (typically 15 minutes), (3) transcript reading (approximately 15 minutes) and (4) telephone call (generally 10 minutes). Speakers recorded up to three 45-minute sessions on distinct days. The 14 channels were recorded synchronously into separate single-channel files, using 16-bit PCM sample encoding at 16000 samples/second.
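The recording parameters above imply a fairly large uncompressed footprint per session. The arithmetic can be sketched as follows; it assumes a full 45-minute session on all 14 channels and ignores file headers and FLAC compression, so it is an upper-bound estimate rather than a figure from the release documentation.

```python
SAMPLE_RATE_HZ = 16000   # samples per second, per channel
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 14            # one file per microphone


def uncompressed_bytes(minutes: float, channels: int = CHANNELS) -> int:
    """Raw PCM size of a session, before headers or compression."""
    return int(minutes * 60 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels)


per_channel = uncompressed_bytes(45, channels=1)
per_session = uncompressed_bytes(45)
print(per_channel, per_session)
```

For a 45-minute session this works out to 86,400,000 bytes per channel and 1,209,600,000 bytes across all 14 channels.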

Certain demographic information about the speakers was collected, including date of birth, level of education, native language, other language capability, place of birth, place of residence and occupation.

The recordings in this corpus were used in NIST Speaker Recognition Evaluation (SRE) test sets for 2010. Researchers interested in applying those benchmark test sets should consult the respective NIST Evaluation Plans for guidelines on allowable training data for those tests.

Data

The collection contains 4,410 recordings made via the public telephone network and 1,425 sessions of multiple microphone recordings in office-room settings. The telephone recordings are presented as 8-kHz 2-channel NIST SPHERE files, and the microphone recordings are 16-kHz 1-channel flac/ms-wav files. All audio file names indicate the date and time when the recording began, along with other identifying information, as follows:

Telephone: {yyyymmdd}_{hrmnsc}_{callid}.sph

Microphone: {yyyymmdd}_{hrmnsc}_{room}_{subjid}_CH{nn}.flac

yyyymmdd is the year, month and day of recording.

hrmnsc is the hour, minute and second when recording began.

callid is a unique, incremental number assigned to each call.

room is either LDC or HRM, indicating which office was used.

subjid is a numeric identifier assigned to the speaker.
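The naming convention above lends itself to a small parser. The sketch below is one way to do it, under some assumptions not stated in the description: that callid and subjid consist only of digits, that the channel number nn is two digits, and that the example file names (which are invented here) resemble real ones.

```python
import re
from datetime import datetime

# Patterns derived from the naming templates; digit widths are assumptions.
TELEPHONE_RE = re.compile(r"^(\d{8})_(\d{6})_(\d+)\.sph$")
MICROPHONE_RE = re.compile(r"^(\d{8})_(\d{6})_(LDC|HRM)_(\d+)_CH(\d{2})\.flac$")


def parse_audio_name(name: str):
    """Return a dict describing a corpus audio file name, or None."""
    m = TELEPHONE_RE.match(name)
    if m:
        return {
            "kind": "telephone",
            "start": datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S"),
            "callid": m.group(3),
        }
    m = MICROPHONE_RE.match(name)
    if m:
        return {
            "kind": "microphone",
            "start": datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S"),
            "room": m.group(3),
            "subjid": m.group(4),
            "channel": int(m.group(5)),
        }
    return None


# Hypothetical file names, made up for illustration.
print(parse_audio_name("20091005_143000_12345.sph"))
print(parse_audio_name("20091005_143000_LDC_120345_CH07.flac"))
```

Identifiers are kept as strings so that any leading zeros survive; only the channel number is converted to an integer.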

When the flac files are uncompressed, they become ms-wav/RIFF files (flac compression does not presently support SPHERE file format).

The telephone audio is presented in SPHERE format because (a) this is consistent with other telephone audio releases from LDC, and (b) flac does not support ulaw sample encoding. The current release of the open-source SoX utility is able to handle both formats as input. Other utilities are available for both flac and SPHERE formats.
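As one possible way to use SoX from a script, the sketch below builds a command line that converts either kind of file to 16-bit PCM WAV. It assumes a `sox` binary on the PATH that was built with SPHERE and FLAC support; the helper names and file names are hypothetical, and the command is only constructed here, not run.

```python
import subprocess


def sox_to_wav_command(src: str, dst: str) -> list:
    """Build a SoX command converting src to 16-bit signed PCM at dst."""
    # Output format options go immediately before the output file name.
    return ["sox", src, "-e", "signed-integer", "-b", "16", dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion; requires SoX to be installed (assumption)."""
    subprocess.run(sox_to_wav_command(src, dst), check=True)


# Hypothetical file names, for illustration only.
print(" ".join(sox_to_wav_command("call.sph", "call.wav")))
```

Requesting signed 16-bit output explicitly avoids ending up with a WAV file that still carries u-law samples, which some downstream tools handle poorly.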

Samples

Please listen to this audio sample.

Updates

None at this time.

Additional Licensing Instructions

This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact [email protected] for information about becoming a member.

Authors

Brandschain, Linda ;
Graff, David ;
Walker, Kevin ;
Cieri, Christopher

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/s1w9-y411 (2013)

Greybeard

Introduction

Greybeard was developed by the Linguistic Data Consortium (LDC) and comprises approximately 590 hours of English telephone conversation speech collected in October and November 2008 by LDC. The goal was to record new telephone conversations among subjects who had participated in one or more previous LDC telephone collections, from Switchboard-1 (1991) through the Mixer studies (2006).

A total of 172 subjects were enrolled in the Greybeard collection, all of whom had participated in one of the following:

Switchboard-1 (LDC97S62) 1991-1992: 2 subjects
Switchboard-2 (LDC98S75, LDC99S79, LDC2002S06) 1996-1997: 16 subjects
Mixer 1 and 2 2003-2005: 103 subjects
Mixer 3 2006: 51 subjects

Most Greybeard participants completed 12 calls. Some subjects completed up to 24 calls. Calls were made or received via an automatic operator system at LDC which connected two participants and announced a topic for discussion.

Data

This release consists of 4,680 calls: the complete set of calls recorded during the Greybeard collection (1,098 calls) as well as all calls from the legacy collections that involved the Greybeard speakers.

The audio from each call was captured digitally by the operator system and stored in a separate file as raw mu-law sample data. As the recordings were uploaded daily from the robot operator to network disk storage, automated processes reformatted the audio into a 2-channel SPHERE-format file for each conversation and queued the recordings for manual audit to verify speaker identification and to check other aspects of the recording. Auditors provided impressionistic judgments on overall audio quality, presence of background noise and cross-channel echo and any other technical difficulty with the call, in addition to confirming the speaker-ID on each channel. These auditor decisions are provided in the call_info tables, described in more detail in the included documentation.

For this release, each 2-channel recording was converted from SPHERE to MS-WAV file format and compressed using FLAC. All audio files are 2-channel, 8 kHz, 16-bit PCM sample data, in FLAC-compressed form (http://flac.sourceforge.net). When uncompressed, they have MS-WAV/RIFF headers.

Samples

Please listen to the following audio sample.

Updates

None at this time.

Authors

Brandschain, Linda ;
Graff, David

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/tahq-9n25 (2013)

Automated Author Profile
Brandschain, Linda

