GALE phase 2 Chinese broadcast news transcripts.

Bibliographic Details
Imprint: [Philadelphia, PA] : Linguistic Data Consortium, c2013.
Description: 1 CD-ROM ; 4 3/4 in.
Language: Chinese
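Subject: Chinese language -- Spoken Chinese -- Databases.
Automatic speech recognition.
Linguistics -- Research.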
Format: E-Resource
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/9352101
Hidden Bibliographic Details
Other authors / contributors: Glenn, Meghan.
Linguistic Data Consortium.
ISBN: 1585636576
9781585636570
Notes: LDC corpora LDC2013T20
Title from disc label.
Data type: Text.
Data source: Broadcast news.
Application: Speech recognition.
Authors: Meghan Glenn, Haejoong Lee, Stephanie Strassel, Kazuaki Maeda.
Also available on the Internet.
Chinese.
Summary: "The source broadcast recordings feature news broadcasts focusing principally on current events from the following sources: Anhui TV, a regional television station in Mainland China, Anhui Province, China Central TV (CCTV), a national and international broadcaster in Mainland China and Phoenix TV, a Hong Kong-based satellite television station .... The transcript files are in plain-text, tab-delimited format (TDF) with UTF-8 encoding, and the transcribed data totals 1,593,049 tokens." -- LDC online catalogue.
GALE Phase 2 Chinese Broadcast News Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 110 hours of Chinese broadcast news speech collected in 2006 and 2007 by LDC and Hong Kong University of Science and Technology (HKUST), Hong Kong, during Phase 2 of the DARPA GALE (Global Autonomous Language Exploitation) Program.
The source broadcast recordings feature news broadcasts focusing principally on current events from the following sources: Anhui TV, a regional television station in Anhui Province, Mainland China; China Central TV (CCTV), a national and international broadcaster in Mainland China; and Phoenix TV, a Hong Kong-based satellite television station.
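Because the summary describes the transcript files as plain-text, tab-delimited (TDF) with UTF-8 encoding, a file of that shape could be read roughly as follows. This is a minimal sketch, not part of the corpus documentation: the function name `read_tdf` is hypothetical, and no column names or header conventions are assumed, so the corpus documentation should be consulted for the actual field layout.

```python
import csv


def read_tdf(path):
    """Read a UTF-8, tab-delimited text file and return its non-empty rows.

    Hypothetical helper: each row is returned as a plain list of strings,
    with no assumption about what the columns mean.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps any quotation marks in the text as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader if row]
```

Disabling quote handling is a deliberate choice here: transcript text may contain unbalanced quotation marks, which a default CSV reader would otherwise try to interpret.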

MARC

LEADER 00000cmm a2200000Ia 4500
001 9352101
003 ICU
005 20140423152200.0
007 co nga---uuuuu
008 131121s2013 pau d chi d
020 |a 1585636576 
020 |a 9781585636570 
035 |a (OCoLC)863639772 
035 |a 9352101 
040 |a UAB  |b eng  |c UAB  |d UAB  |d UtOrBLW 
049 |a CGUA 
090 |a DATA LIB 00840 LDC 2013T20  |b AEU 
245 0 0 |a GALE phase 2 Chinese broadcast news transcripts. 
260 |a [Philadelphia, PA] :  |b Linguistic Data Consortium,  |c c2013. 
300 |a 1 CD-ROM ;  |c 4 3/4 in. 
336 |a computer dataset  |b cod  |2 rdacontent  |0 http://id.loc.gov/vocabulary/contentTypes/cod 
337 |a computer  |b c  |2 rdamedia  |0 http://id.loc.gov/vocabulary/mediaTypes/c 
338 |a other  |b cz  |2 rdacarrier 
500 |a LDC corpora LDC2013T20 
500 |a Title from disc label. 
546 |a Chinese. 
500 |a Data type: Text. 
500 |a Data source: Broadcast news. 
500 |a Application: Speech recognition. 
500 |a Authors: Meghan Glenn, Haejoong Lee, Stephanie Strassel, Kazuaki Maeda. 
520 |a "The source broadcast recordings feature news broadcasts focusing principally on current events from the following sources: Anhui TV, a regional television station in Mainland China, Anhui Province, China Central TV (CCTV), a national and international broadcaster in Mainland China and Phoenix TV, a Hong Kong-based satellite television station .... The transcript files are in plain-text, tab-delimited format (TDF) with UTF-8 encoding, and the transcribed data totals 1,593,049 tokens." -- LDC online catalogue. 
520 |a GALE Phase 2 Chinese Broadcast News Transcripts was developed by the Linguistic Data Consortium (LDC) and contains transcriptions of approximately 110 hours of Chinese broadcast news speech collected in 2006 and 2007 by LDC and Hong Kong University of Science and Technology (HKUST), Hong Kong, during Phase 2 of the DARPA GALE (Global Autonomous Language Exploitation) Program. 
520 |a The source broadcast recordings feature news broadcasts focusing principally on current events from the following sources: Anhui TV, a regional television station in Mainland China, Anhui Province, China Central TV (CCTV), a national and international broadcaster in Mainland China and Phoenix TV, a Hong Kong-based satellite television station. 
530 |a Also available on the Internet. 
650 0 |a Chinese language  |x Spoken Chinese  |v Databases. 
650 0 |a Automatic speech recognition.  |0 http://id.loc.gov/authorities/subjects/sh85010109 
650 0 |a Linguistics  |x Research.  |0 http://id.loc.gov/authorities/subjects/sh2008106989 
650 7 |a Automatic speech recognition.  |2 fast  |0 http://id.worldcat.org/fast/fst00822769 
650 7 |a Chinese language  |x Spoken Chinese.  |2 fast  |0 http://id.worldcat.org/fast/fst00857534 
655 7 |a Databases.  |2 fast  |0 http://id.worldcat.org/fast/fst01411643 
700 1 |a Glenn, Meghan. 
710 2 |a Linguistic Data Consortium.  |0 http://id.loc.gov/authorities/names/no2003104537  |1 http://viaf.org/viaf/130534201 
856 4 1 |z For information on downloading data files, see the LDC website:  |u http://catalog.ldc.upenn.edu/LDC2013T20 
903 |a HeVa 
929 |a cat 
999 f f |i f64a1409-e113-5700-b4a8-c7df233df1c9  |s 3f7ab2eb-a870-51ba-9ee4-193e88e97376 
928 |t Library of Congress classification  |a PL1074.5.G3842 2013  |p CDRom  |l ASR  |c ASR-JRLASR  |i 1154086 
928 |t Library of Congress classification  |a PL1074.5.G3842 2013  |p CDRom  |l ASR  |c ASR-JRLASR  |i 1154087 
928 |t Library of Congress classification  |a P121.G3522 2013  |l Online  |c UC-FullText  |n For information on downloading data files, see the LDC website:  |u http://catalog.ldc.upenn.edu/LDC2013T20  |g ebooks  |i 7645874 
927 |t Library of Congress classification  |a PL1074.5.G3842 2013  |p CDRom  |l ASR  |c ASR-JRLASR  |b 69204726  |i 9219854 
927 |t Library of Congress classification  |a PL1074.5.G3842 2013  |p CDRom  |l ASR  |c ASR-JRLASR  |b 69204668  |i 9219855
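
The MARC lines above follow a simple display layout: a field tag, optional indicators, then subfields introduced by `|` and a one-character code. A minimal sketch for splitting one such line is shown below. It assumes this catalog's text rendering only (not the binary MARC 21 exchange format) and assumes that `|` never occurs inside a subfield value; the function name `parse_marc_line` is hypothetical.

```python
import re


def parse_marc_line(line):
    """Split one displayed MARC line into (tag, indicators, subfields).

    Subfields are returned as (code, value) pairs. Fields with no subfields
    (for example 001 or 008) are returned with a single ("", value) pair.
    """
    tag, _, rest = line.strip().partition(" ")
    head, sep, body = rest.partition("|")
    if not sep:
        # No subfield delimiter: treat the remainder as a bare value.
        return tag, "", [("", rest.strip())]
    # Whatever precedes the first "|" holds the indicators, e.g. "0 0".
    indicators = head.replace(" ", "")
    pairs = re.findall(r"\|(\w) ([^|]*)", "|" + body)
    return tag, indicators, [(code, value.strip()) for code, value in pairs]


tag, indicators, subfields = parse_marc_line(
    "260 |a [Philadelphia, PA] :  |b Linguistic Data Consortium,  |c c2013. "
)
print(tag, subfields)
```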