MDE RT-04 training data speech

Bibliographic Details
Imprint: [Philadelphia, PA] : Linguistic Data Consortium, c2005.
Description: 2 DVD-ROMs ; 4 3/4 in.
Language: English
Format: DVD Video E-Resource
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/7729193
Hidden Bibliographic Details
Varying Form of Title: Metadata extraction RT-04 training data speech
Other authors / contributors: Lee, Haejoong.
Strassel, Stephanie.
Linguistic Data Consortium.
ISBN: 1585633577
9781585633579
Computer file characteristics: Computer data.
Notes: Title from index.html on DVD.
"LDC2005S16."
System requirements: DVD-ROM drive; software to process .wav files.
Summary: "This corpus was created by Linguistic Data Consortium to provide training data for the RT-04 Fall Metadata Extraction (MDE) Evaluation, part of the DARPA EARS (Efficient, Affordable, Reusable Speech-to-Text) Program. This data set has been created and distributed by Linguistic Data Consortium. This data was previously released to the EARS MDE community as LDC2004E31. The goal of MDE is to enable technology that can take raw Speech-to-Text output and refine it into forms that are of more use to humans and to downstream automatic processes. In simple terms, this means the creation of automatic transcripts that are maximally readable. This readability might be achieved in a number of ways: flagging non-content words like filled pauses and discourse markers for optional removal; marking sections of disfluent speech; and creating boundaries between natural breakpoints in the flow of speech so that each sentence or other meaningful unit of speech might be presented on a separate line within the resulting transcript. Natural capitalization, punctuation and standardized spelling, plus sensible conventions for representing speaker turns and identity, are further elements in the readable transcript. LDC has defined a SimpleMDE annotation task specification and has annotated English telephone and broadcast news data to provide training data for MDE. The transcript and annotation files corresponding to this release are available as LDC2005ST4 (MDE RT04 Training Text/Annotations). In general, there is a one-to-one correspondence between speech files and annotation files, with one exception: several .ag.xml annotation files correspond to one speech file in the bnews corpus. This is because the bnews files were divided into roughly 5-minute chunks and each chunk was then annotated as a unit. These chunks are labeled with "-split001", "-split002", etc.
Note that the ag-to-rttm script combines these chunks together, so the one-to-one correspondence is kept between the speech files and .rttm files."--index.html
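
The chunk labeling described in the summary can be illustrated with a minimal Python sketch that groups .ag.xml annotation files under the speech file they belong to. This is not the corpus's ag-to-rttm script. The function name group_annotations, the sample file names, and the assumed pattern "<speech-id>-splitNNN.ag.xml" are hypothetical; only the "-split001"-style suffix and the .ag.xml extension come from the record.

```python
import re
from collections import defaultdict

# Assumed naming (hypothetical): "<speech-id>-splitNNN.ag.xml" for chunked
# bnews annotations, "<speech-id>.ag.xml" for everything else.
_SPLIT_RE = re.compile(r"^(?P<base>.+?)(?:-split(?P<idx>\d+))?\.ag\.xml$")


def group_annotations(filenames):
    """Map each speech-file ID to its annotation files, in chunk order."""
    groups = defaultdict(list)
    for name in filenames:
        m = _SPLIT_RE.match(name)
        if m is None:
            continue  # skip anything that is not an .ag.xml annotation file
        # Unchunked files get index 0 so they sort before any chunk.
        idx = int(m.group("idx")) if m.group("idx") else 0
        groups[m.group("base")].append((idx, name))
    # Sort by chunk index, then drop the index from the result.
    return {base: [n for _, n in sorted(items)] for base, items in groups.items()}


# Illustrative file names only; they are not taken from the corpus.
files = [
    "show_a-split002.ag.xml",
    "show_a-split001.ag.xml",
    "call_b.ag.xml",
    "readme.txt",
]
grouped = group_annotations(files)
```

With a grouping like this, each key corresponds to one speech file, which mirrors the one-to-one correspondence the summary says is restored at the .rttm level.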

Mansueto

Holdings details from Mansueto
Call Number: DVD P98.M455 2005
c.1 Available Loan period: standard loan

c.2 Available Loan period: standard loan