Summary: | The Unified Linguistic Annotation Text Collection, Linguistic Data Consortium (LDC) catalog number LDC2009T07 and ISBN 1-58563-511-1, consists of two separate corpora: the Language Understanding Annotation Corpus (LDC2009T10) and REFLEX Entity Translation Training/DevTest (LDC2009T11). Most recent language annotation efforts have focused on small pieces of the larger problem of semantic annotation rather than on producing a single unified representation. The Unified Linguistic Annotation (ULA) project, sponsored by the National Science Foundation, seeks to integrate different layers of annotation (e.g., semantics, discourse, temporal, opinions) into one framework using various existing resources, including PropBank, NomBank, TimeBank, the Penn Discourse Treebank, and coreference and opinion annotations. The project represents a concerted effort of researchers from several institutions to develop a large word corpus with balanced and annotated data. The ULA Text Collection is provided as a resource for the ULA effort. It consists of two datasets: the Language Understanding Annotation Corpus, from the Johns Hopkins Center of Excellence in Human Language Technology, and ACE Reflex Entity Translation Training Dev/Test, developed by LDC.
|