GALE Chinese-English word alignment and tagging training part 1 : newswire and Web.

Bibliographic Details
Author / Creator: Li, Xuansong.
Imprint: [Philadelphia, PA] : Linguistic Data Consortium, c2012.
Description: 1 CD-ROM ; 4 3/4 in.
Language: Chinese
English
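Subject: Chinese language -- Data processing.
Chinese language -- Machine translating -- Data processing.
Linguistics -- Research
Computational linguistics.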
Format: E-Resource
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/8926013
Hidden Bibliographic Details
Other authors / contributors: Grimes, Stephen M.
Strassel, Stephanie M.
Linguistic Data Consortium.
ISBN: 158563624X
9781585636242
Notes: Title from disc label.
Data type: Text.
Data sources: Newswire, weblogs.
Application: Automatic content extraction, content-based retrieval, machine translation, tagging.
"LDC2012T16".
Authors: Xuansong Li, Stephen Grimes, Stephanie Strassel.
Also available on the Internet.
Chinese, English.
Summary: "... contains 150,068 tokens of word aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program ...." -- LDC online catalogue.
GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web was developed by the Linguistic Data Consortium (LDC) and contains 150,068 tokens of word-aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program.
Some approaches to statistical machine translation incorporate linguistic knowledge into word-aligned text to improve automatic word alignment and machine translation quality. This is accomplished with two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and the translation relations between them, using minimum-match and attachment annotation approaches. The tagging scheme defines a set of word tags and alignment link tags that describe these translation units and relations. Tagging adds contextual, syntactic and language-specific features to the alignment annotation.
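As a rough illustration of the two annotation layers described above, the sketch below models a word-aligned sentence pair in Python: each alignment link groups source and target token positions into one translation unit and carries a link tag, while word tags are stored per token. This is not the LDC file format or tag set. The class names, the toy sentence, and the labels `SEM` and `MEASURE_WORD` are all hypothetical, chosen only to show the shape of such data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlignmentLink:
    """One translation unit: Chinese token positions linked to English ones."""
    source: tuple[int, ...]  # 0-based indices into the Chinese tokens
    target: tuple[int, ...]  # 0-based indices into the English tokens
    link_tag: str            # hypothetical link-type label, not an LDC tag


@dataclass
class AlignedSentence:
    chinese: list[str]
    english: list[str]
    links: list[AlignmentLink] = field(default_factory=list)
    # Per-token word tags keyed by (side, index); the labels are hypothetical.
    word_tags: dict[tuple[str, int], str] = field(default_factory=dict)

    def translation_units(self) -> list[tuple[str, str, str]]:
        """Return (Chinese text, English text, link tag) for every link."""
        units = []
        for link in self.links:
            zh = "".join(self.chinese[i] for i in link.source)
            en = " ".join(self.english[j] for j in link.target)
            units.append((zh, en, link.link_tag))
        return units

    def unaligned_english(self) -> list[int]:
        """Return English token positions that no link covers."""
        linked = {j for link in self.links for j in link.target}
        return [j for j in range(len(self.english)) if j not in linked]


# Invented toy sentence pair, for illustration only.
example = AlignedSentence(
    chinese=["我", "喜欢", "这", "本", "书"],
    english=["I", "like", "this", "book"],
    links=[
        AlignmentLink((0,), (0,), "SEM"),
        AlignmentLink((1,), (1,), "SEM"),
        # The measure word is attached to the demonstrative in one unit.
        AlignmentLink((2, 3), (2,), "SEM"),
        AlignmentLink((4,), (3,), "SEM"),
    ],
    word_tags={("zh", 3): "MEASURE_WORD"},
)

for unit in example.translation_units():
    print(unit)
```

Keeping the link tag on the link and the word tag on the token mirrors the split the summary draws between alignment link tags and word tags; the corpus documentation defines the real tag inventory and file layout.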

MARC

LEADER 00000cmm a2200000 a 4500
001 8926013
003 ICU
005 20131127105500.0
007 co ng|---uuuuu
008 121029s2012 pau d chi d
020 |a 158563624X 
020 |a 9781585636242 
035 |a (OCoLC)815276805 
035 |a 8926013 
040 |a UAB  |c UAB  |d UAB  |d UtOrBLW 
041 0 |a chi  |a eng 
049 |a CGUA 
090 |a PL1074.5  |b .L551 2012 
100 1 |a Li, Xuansong. 
245 1 0 |a GALE Chinese-English word alignment and tagging training part 1. :  |b newswire and Web. 
260 |a [Philadelphia, PA] :  |b Linguistic Data Consortium,  |c c2012. 
300 |a 1 CD-ROM ;  |c 4 3/4 in. 
336 |a computer dataset  |b cod  |2 rdacontent  |0 http://id.loc.gov/vocabulary/contentTypes/cod 
337 |a computer  |b c  |2 rdamedia  |0 http://id.loc.gov/vocabulary/mediaTypes/c 
338 |a other  |b cz  |2 rdacarrier 
500 |a Title from disc label. 
546 |a Chinese, English. 
500 |a Data type: Text. 
500 |a Data sources: Newswire, weblogs. 
500 |a Application: Automatic content extraction, content-based retrieval, machine translation, tagging. 
500 |a "LDC2012T16". 
500 |a Authors: Xuansong Li, Stephen Grimes, Stephanie Strassel. 
520 |a "... contains 150,068 tokens of word aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program ...." -- LDC online catalogue. 
520 |a GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web was developed by the Linguistic Data Consortium (LDC) and contains 150,068 tokens of word aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program. 
520 |a Some approaches to statistical machine translation include the incorporation of linguistic knowledge in word aligned text as a means to improve automatic word alignment and machine translation quality. This is accomplished with two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and translation relations by using minimum-match and attachment annotation approaches. A set of word tags and alignment link tags are designed in the tagging scheme to describe these translation units and relations. Tagging adds contextual, syntactic and language-specific features to the alignment annotation. 
530 |a Also available on the Internet. 
650 0 |a Chinese language  |x Data processing.  |0 http://id.loc.gov/authorities/subjects/sh86008027 
650 0 |a Chinese language  |x Machine translating  |x Data processing. 
650 0 |a Linguistics  |x Research  |0 http://id.loc.gov/authorities/subjects/sh2008106989 
650 7 |a Chinese language  |x Data processing.  |2 fast  |0 http://id.worldcat.org/fast/fst00857415 
650 7 |a Computational linguistics.  |2 fast  |0 http://id.worldcat.org/fast/fst00871998 
700 1 |a Grimes, Stephen M.  |0 http://id.loc.gov/authorities/names/no2011160548  |1 http://viaf.org/viaf/187281423 
700 1 |a Strassel, Stephanie M.  |0 http://id.loc.gov/authorities/names/no2008036255  |1 http://viaf.org/viaf/21945618 
710 2 |a Linguistic Data Consortium.  |0 http://id.loc.gov/authorities/names/no2003104537  |1 http://viaf.org/viaf/130534201 
856 4 1 |z For additional information on data files, see the LDC website:  |u http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2012T16 
903 |a HeVa 
929 |a cat 
999 f f |i 78ad3c66-e77a-56de-8ae9-8968b9de339f  |s b70282bc-6de2-5bcb-a643-409fb1c90ecc 
928 |t Library of Congress classification  |a PL1074.5.L551 2012  |p CDRom  |l ASR  |c ASR-JRLASR  |i 1153310 
928 |t Library of Congress classification  |a PL1074.5.L551 2012  |p CDRom  |l ASR  |c ASR-JRLASR  |i 1153311 
928 |t Library of Congress classification  |a PL1074.5.L551 2012  |l Online  |c UC-FullText  |n For additional information on data files, see the LDC website:  |u http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2012T16  |g ebooks  |i 7663293 
927 |t Library of Congress classification  |a PL1074.5.L551 2012  |p CDRom  |l ASR  |c ASR-JRLASR  |b 103951659  |i 9098113 
927 |t Library of Congress classification  |a PL1074.5.L551 2012  |p CDRom  |l ASR  |c ASR-JRLASR  |b 103951641  |i 9098114