GALE Chinese-English word alignment and tagging training part 1: newswire and Web.

Bibliographic Details
Author / Creator: Li, Xuansong.
Imprint: [Philadelphia, PA] : Linguistic Data Consortium, c2012.
Description: 1 CD-ROM ; 4 3/4 in.
Language: Chinese, English
Format: E-Resource
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/8926013
Hidden Bibliographic Details
Other authors / contributors: Grimes, Stephen M.; Strassel, Stephanie M.; Linguistic Data Consortium.
ISBN: 158563624X; 9781585636242
Notes: Title from disc label.
Data type: Text.
Data sources: Newswire, weblogs.
Application: Automatic content extraction, content-based retrieval, machine translation, tagging.
"LDC2012T16".
Authors: Xuansong Li, Stephen Grimes, Stephanie Strassel.
Also available on the Internet.
Chinese, English.
Summary: "... contains 150,068 tokens of word aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program ...." -- LDC online catalogue.
GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web was developed by the Linguistic Data Consortium (LDC) and contains 150,068 tokens of word-aligned Chinese and English parallel text enriched with linguistic tags. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program.
Some approaches to statistical machine translation incorporate linguistic knowledge into word-aligned text to improve automatic word alignment and machine translation quality. This is accomplished with two annotation schemes: alignment and tagging. Alignment identifies minimum translation units and translation relations using minimum-match and attachment annotation approaches. The tagging scheme defines a set of word tags and alignment link tags to describe these translation units and relations. Tagging adds contextual, syntactic, and language-specific features to the alignment annotation.