http://tinyurl.com/669o4zt
You can only find what is in the corpus...
Its annotation can:
• be incorrect
• be inconsistent
• follow the 'wrong' theory
• have the 'wrong' level of granularity
• use the 'wrong' tag-set
• introduce subjective interpretations
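Some of these problems, inconsistency in particular, can be checked mechanically. The sketch below assumes the corpus is already available as a list of (word, tag) pairs and lists word forms that received more than one tag, so that a human can review them. Variation is not automatically an error, since many words are genuinely ambiguous. The function names, the `min_count` threshold and the sample data are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def tag_distribution(tagged_tokens):
    """Map each lower-cased word form to a Counter of the tags it received."""
    dist = defaultdict(Counter)
    for word, tag in tagged_tokens:
        dist[word.lower()][tag] += 1
    return dist


def flag_variation(tagged_tokens, min_count=2):
    """Return word forms tagged in more than one way, as candidates for review.

    min_count is a hypothetical threshold: forms seen fewer times are ignored.
    """
    flagged = {}
    for word, tags in tag_distribution(tagged_tokens).items():
        if len(tags) > 1 and sum(tags.values()) >= min_count:
            flagged[word] = dict(tags)
    return flagged


# Hypothetical sample: "remarks" appears once as a plural noun, once as a verb.
sample = [("the", "DT"), ("remarks", "NNS"), ("offended", "VBN"),
          ("her", "PRP$"), ("remarks", "VBZ"), ("offended", "VBN")]
print(flag_variation(sample))  # {'remarks': {'NNS': 1, 'VBZ': 1}}
```

Here "remarks" really can be a verb ("she remarks"), so the flagged item may well be correct; the check only narrows down where to look.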
How do you annotate?
• What are you marking up? (POS, lemma, clause?)
• How are you annotating? (manually/automatically?)
• With which tag-set? (CLAWS, Penn Treebank?)
• Format of annotation? (HTML, XML, CHAT?)
• Whose linguistic analysis? (mine, or a more established, standard and 'consensus-backed' way of doing it?)
• How are you going to use the annotations for your analysis?
• How are your annotations going to be shared with other researchers?
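The questions about format and sharing are easier to weigh with a concrete case. The sketch below assumes the tagged tokens are held as (word, tag) pairs. It stores them as XML with one `<w>` element per word and the tag in a `pos` attribute, loosely in the style of TEI-like corpus markup, and reads them back. The element and attribute names (`s`, `w`, `pos`, `n`) and the function names are assumptions for illustration, not a required schema.

```python
import xml.etree.ElementTree as ET


def to_xml(tagged_tokens, sentence_id="s1"):
    """Serialise (word, tag) pairs as <s><w pos="TAG" n="i">word</w>...</s>."""
    s = ET.Element("s", id=sentence_id)
    for i, (word, tag) in enumerate(tagged_tokens, start=1):
        w = ET.SubElement(s, "w", pos=tag, n=str(i))
        w.text = word
    return ET.tostring(s, encoding="unicode")


def from_xml(xml_text):
    """Read the (word, tag) pairs back from the XML produced by to_xml."""
    s = ET.fromstring(xml_text)
    return [(w.text, w.get("pos")) for w in s.iter("w")]


tokens = [("John", "NNP"), ("was", "VBD"), ("very", "RB"),
          ("offended", "VBN"), ("by", "IN"), ("her", "PRP$"),
          ("remarks", "NNS"), (".", ".")]
xml_text = to_xml(tokens)
print(xml_text)
```

Keeping the tags in attributes leaves the running text recoverable (strip the markup and the words remain), and any XML tool can read the file without knowing the tagger that produced it.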
Good practice in annotation
(NNP John) (VBD was) (RB very) (VBN offended) (IN by) (PP$ her) (NNS remarks) (. .)
([ John_NNP ]) <: was_VBD :> very_RB offended_VBN by_IN ([ her_PRP$ remarks_NNS ])._.
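The two examples encode the same tagging in different formats: `(TAG word)` brackets in the first, `word_TAG` tokens with bracket and `<: :>` markers in the second. The sketch below reads both into (word, tag) pairs so they can be compared. The markers in the second format are simply discarded here. The two examples also spell the possessive-pronoun tag differently (`PP$` vs `PRP$`), so a small normalisation table is applied; that table and the regular expressions are assumptions that cover these examples, not a general parser for either format.

```python
import re

# "(TAG word)" pairs, as in the bracketed example
BRACKETED = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")
# "word_TAG" tokens; bracket markers such as "([", "])", "<:" and ":>" are skipped
INLINE = re.compile(r"([^\s_()\[\]<>]+)_([^\s_()\[\]<>]+)")

# Hypothetical normalisation table: map the PP$ spelling onto PRP$.
TAG_MAP = {"PP$": "PRP$"}


def parse_bracketed(text):
    """Return (word, tag) pairs from '(TAG word)' annotation."""
    return [(word, TAG_MAP.get(tag, tag)) for tag, word in BRACKETED.findall(text)]


def parse_inline(text):
    """Return (word, tag) pairs from 'word_TAG' annotation, ignoring markers."""
    return [(word, TAG_MAP.get(tag, tag)) for word, tag in INLINE.findall(text)]


bracketed = ("(NNP John) (VBD was) (RB very) (VBN offended) (IN by) "
             "(PP$ her) (NNS remarks) (. .)")
inline = ("([ John_NNP ]) <: was_VBD :> very_RB offended_VBN by_IN "
          "([ her_PRP$ remarks_NNS ])._.")

print(parse_bracketed(bracketed) == parse_inline(inline))  # True
```

Converting both formats to one neutral representation before analysis makes it explicit which information (here, the chunk brackets) is lost along the way.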
Next week: Creating a corpus
Reading tip: