Summary: | "Produce[d] by Center for Spoken Language Understanding and distributed by the Linguistic Data Consortium, the 22 Language corpus consists of telephone speech from 21 languages: Eastern Arabic, Cantonese, Czech, Farsi, German, Hindi, Hungarian, Japanese, Korean, Malay, Mandarin, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Tamil, Vietnamese, and English. The corpus contains fixed vocabulary utterances (e.g. days of the week) as well as fluent continuous speech. Each of the 50191 utterances is verified by a native speaker to determine if the caller followed instructions when answering the prompts. For this release, approximately 19758 utterances have corresponding orthographic transcriptions."--Introd.
|