Summary: | "This file contains documentation on the CSLU Speaker Recognition Corpus, version 1.1, Linguistic Data Consortium (LDC) catalog number LDC2006S26 and ISBN 1-58563-382-8. The Speaker Recognition corpus (formerly known as Speaker Verification) consists of telephone speech from 91 participants. Each participant recorded speech in twelve sessions over a two-year period, answering questions like 'what is your eye color' or responding to prompts like 'describe a typical day in your life.' Most of the utterances in the release of the corpus have corresponding non-time-aligned word-level transcriptions. In most of the CSLU data collections, each participant calls a toll-free telephone number and answers a few questions. CSLU records the speech, transcribes it, then packages it as a released corpus. The Speaker Recognition data collection was considerably more complicated. The goal of the data collection was to collect speech from each participant over a two-year period. Each participant called the data collection system twelve times over the two-year period and said the same utterances each time. Some of the recording sessions were only a few days apart and others were several weeks apart. Participants followed this calling schedule: during the first month, they called twice in a week. No calls were made in the second and third months. In the fourth month they made one call. No calls were made in the fifth and sixth months. This pattern repeated three more times for a total of twelve calls per participant. In order to balance the workload required to remind participants to call and to avoid large data collection bursts on the system, the participants were divided into twelve groups. Each group began the two-year schedule in successive months: the first group started in September 1996, the second group in October 1996, and so on."--index.html.
|