Notes: | "LDC2012T07." Title from disc label. "Authors: Mohamed Maamouri ... [et al.]" -- LDC catalogue. Data type: Text. Data source: Newswire. Applications: Automatic content extraction, cross-lingual information retrieval, information detection, natural language processing. "This release contains 432,976 source tokens before clitics were split, and 517,080 tree tokens after clitics were separated for treebank annotation. The source materials are Arabic broadcast news stories collected by LDC during the period 2005-2008 from the following sources: Abu Dhabi TV, Al Alam News Channel, Al Arabiya, Al Baghdadya TV, Al Fayha, Alhurra, Al Iraqiyah, Aljazeera, Al Ordiniyah, Al Sharqiyah, Dubai TV, Kuwait TV, Lebanese Broadcasting Corp., Oman TV, Radio Sawa, Saudi TV and Syria TV. The transcripts were produced by LDC." -- LDC online catalogue. Also available on the Internet.
|