Compiling and Annotating a Learner Corpus for a Morphologically Rich Language

Bibliographic Details
Author / Creator: Rosen, Alexandr.
Imprint: Prague : Karolinum Press, 2020.
Description: 1 online resource (281 pages)
Language: English
Format: E-Resource Book
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/12873857
Hidden Bibliographic Details
Other authors / contributors: Hana, Jiří.
Vidová Hladká, Barbora.
ISBN: 9788024647654
8024647656
9788024647593
Notes: Automatic annotation checking.
Print version record.
Other form: Print version: Rosen, Alexandr. Compiling and Annotating a Learner Corpus for a Morphologically Rich Language. Prague : Karolinum Press, ©2020 9788024647593
Table of Contents:
  • Cover
  • Contents
  • List of abbreviations
  • Introduction
  • About this book
  • Reasons to study non-native Czech
  • Some properties of non-native Czech
  • Morphology
  • Syntax
  • Word segmentation
  • Learner corpus
  • Roadmap
  • Learner corpora
  • Terminology
  • Various types of learner corpora
  • The choice of texts
  • Annotation
  • Textual annotation
  • Linguistic annotation
  • Error annotation - correction
  • Error annotation - categorization
  • Annotation scheme
  • Data access
  • Some learner corpora
  • ASK
  • CLC
  • COPLE2
  • CroLTeC
  • Falko
  • ICLE
  • MERLIN
  • RLC
  • SweLL
  • Relationships of CzeSL with other learner corpora
  • Introducing the CzeSL project
  • Specifications of CzeSL
  • Intended usage
  • AKCES - the umbrella project
  • Procurement of texts
  • Text collection
  • Transcription
  • Anonymization
  • Metadata
  • Error annotation
  • Errors and learner language
  • More than one way to annotate errors in CzeSL
  • A wishlist for error annotation
  • Interference and other types of explanation
  • Interpretation in terms of TH
  • Word order
  • Style
  • Communication goal
  • The two-tier annotation scheme
  • Annotation scheme as a compromise
  • Why multiple tiers
  • How many tiers
  • Multiple tiers in a tabular format
  • Content of the tiers
  • A sample text with T1 vs. T2 corrections
  • Links between tiers
  • Error tags
  • Morphosyntactic references
  • Follow-up corrections
  • Alternative target hypotheses
  • Error tagset
  • Based on linguistic categories
  • Grammar-based vs. formal errors
  • Extent of the annotated unit
  • Grammar-based tags
  • Errors at T1
  • Errors at T2
  • Coarse-grained
  • An example of complex annotation
  • Evaluation of the manual tiered error annotation
  • Inter-annotator agreement (IAA)
  • A pilot annotation
  • IAA on all doubly-annotated texts
  • Error tags depend on target hypothesis
  • Possible causes of the annotators' disagreements
  • Formal tags
  • Automatic extension and modification of error annotation
  • Automatic detection of formal errors on T1
  • Formal orthographic errors
  • Formal errors sometimes influencing pronunciation
  • Formal errors influencing pronunciation
  • Other types of errors
  • Automatic classification of word-boundary errors
  • Implicit error annotation
  • Multi-dimensional error annotation (MD)
  • Focus on morphology
  • All annotation applied to the source text
  • Extent of the annotated unit
  • Alternative error domains
  • Source text, target hypothesis, annotated strings
  • Domains and features
  • Linguistic annotation
  • Annotation with tools for Standard Czech
  • Annotation of target hypothesis
  • Annotation of T1
  • Annotation of source texts
  • Annotation of interlanguage in UD
  • Tokenization
  • Part-of-speech and morphology
  • Lemmata
  • Syntactic Structure
  • Evaluation
  • Annotation process
  • Overview of the annotation process
  • Transcription and anonymization of manuscripts
  • Tiered error annotation
  • Manual error annotation