nessie.detectors.variational_principle
Module Contents
Classes
- VariationNGrams: Detecting Inconsistencies in Treebanks (Dickinson and Meurers, TLT 2003)
- VariationNGramsSpan: span-level variant following Larson et al. (COLING 2020)
- class nessie.detectors.variational_principle.VariationNGrams
Bases: nessie.detectors.error_detector.Detector
Based on “Detecting Inconsistencies in Treebanks” by Markus Dickinson and Walt Detmar Meurers, Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003), Växjö, Sweden.
- correct(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray) → awkward.Array
- error_detector_kind(self) → nessie.detectors.error_detector.DetectorKind
- score(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray, **kwargs) → awkward.Array
Collects n-grams and their respective label sequences. If the same n-gram occurs with different label sequences, the occurrences whose labeling is in the minority are flagged.
We use the implementation described in “Errator: a Tool to Help Detect Annotation Errors in the Universal Dependencies Project” by Guillaume Wisniewski (LREC 2018) and “How Bad are PoS Taggers in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project” by Guillaume Wisniewski and François Yvon (NAACL 2018).
It uses generalized suffix trees to find token sequences that repeat across sentences and flags repetitions that are labeled differently.
- Parameters
sentences – a (num_instances, num_tokens) ragged string sequence containing the text (surface form) of each token in each instance
tags – a (num_instances, num_tokens) ragged string sequence containing the (possibly noisy) label of each token in each instance
- Returns
a (num_instances, num_tokens) ragged boolean array containing a flag for each token in each instance
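The majority-vote idea behind score can be illustrated with a minimal pure-Python sketch. This is not nessie's implementation: it assumes plain lists of lists instead of awkward arrays, replaces the generalized suffix tree with a dictionary over every n-gram up to a hypothetical max_n, and the function name variation_ngram_flags is hypothetical. Within a minority occurrence it flags only the tokens whose label differs from the most frequent label sequence.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def variation_ngram_flags(
    sentences: List[List[str]],
    tags: List[List[str]],
    max_n: int = 3,
) -> List[List[bool]]:
    """Hypothetical helper: flag tokens labeled against the majority of a repeated n-gram."""
    # Group every occurrence of every n-gram (up to max_n tokens) by its surface form.
    # A plain dictionary stands in for a generalized suffix tree here.
    occurrences: Dict[Tuple[str, ...], List[Tuple[int, int, Tuple[str, ...]]]] = defaultdict(list)
    for i, (words, labels) in enumerate(zip(sentences, tags)):
        for n in range(1, max_n + 1):
            for start in range(len(words) - n + 1):
                gram = tuple(words[start:start + n])
                occurrences[gram].append((i, start, tuple(labels[start:start + n])))

    flags = [[False] * len(words) for words in sentences]
    for occs in occurrences.values():
        counts = Counter(label_seq for _, _, label_seq in occs)
        if len(counts) < 2:
            continue  # always labeled the same way: no variation
        majority_seq, majority_count = counts.most_common(1)[0]
        for i, start, label_seq in occs:
            # Only strict minorities are flagged; tied label sequences are left alone.
            if counts[label_seq] < majority_count:
                for offset, (label, expected) in enumerate(zip(label_seq, majority_seq)):
                    if label != expected:
                        flags[i][start + offset] = True
    return flags


sentences = [["I", "saw", "her"], ["I", "saw", "him"], ["I", "saw", "it"], ["I", "saw", "you"]]
tags = [["PRON", "VERB", "PRON"]] * 3 + [["PRON", "NOUN", "PRON"]]
print(variation_ngram_flags(sentences, tags))
# [[False, False, False], [False, False, False], [False, False, False], [False, True, False]]
```

Enumerating all n-grams is quadratic in max_n per sentence, which is one reason a suffix-tree approach is attractive for finding long repetitions in large corpora.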
- supports_correction(self) → bool
- class nessie.detectors.variational_principle.VariationNGramsSpan(k: int = 1)
Bases: nessie.detectors.error_detector.Detector
We use the implementation described in “Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods” by Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach and Jonathan K. Kummerfeld, Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020).
- correct(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray) → awkward.Array
- error_detector_kind(self) → nessie.detectors.error_detector.DetectorKind
- score(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray, **kwargs) → awkward.Array
Collects n-grams and their respective label sequences. If the same n-gram occurs with different label sequences, the occurrences whose labeling is in the minority are flagged.
It considers a window of k tokens to the left and right of each span. If two windows have the same surface form but the span inside them is labeled differently, the span is flagged, unless its label is the majority label for that window.
- Parameters
sentences – a (num_instances, num_tokens) ragged string sequence containing the text (surface form) of each token in each instance
tags – a (num_instances, num_tokens) ragged string sequence containing the (possibly noisy) label of each token in each instance
- Returns
a (num_instances, num_tokens) ragged boolean array containing a flag for each token in each instance
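The window check can likewise be sketched in pure Python. Again this is an illustration under stated assumptions, not nessie's code: it assumes BIO-encoded token tags, extracts spans with a hypothetical bio_spans helper, keys each span by its surface form plus k tokens of context on each side (shorter near sentence boundaries), and flags spans whose label is a strict minority for that key. The function names are hypothetical, and the sketch only compares spans that are labeled somewhere, so it does not catch a span that is annotated in one sentence and left as O in another.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def bio_spans(labels: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: return (start, end, type) for each BIO chunk; end is exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start, current = 0, None
    for i, tag in enumerate(list(labels) + ["O"]):  # trailing "O" closes the last chunk
        if tag.startswith("I-") and tag[2:] == current:
            continue  # continues the open chunk
        if current is not None:
            spans.append((start, i, current))
        if tag == "O":
            current = None
        else:
            # "B-X", or an "I-X" that does not continue the open chunk, starts a new chunk
            start, current = i, tag[2:]
    return spans


def span_window_flags(
    sentences: List[List[str]],
    tags: List[List[str]],
    k: int = 1,
) -> List[List[bool]]:
    """Hypothetical helper: flag spans labeled against the majority for the same window."""
    # Key: (left context of up to k tokens, span tokens, right context of up to k tokens).
    groups: Dict[tuple, List[Tuple[int, int, int, str]]] = defaultdict(list)
    for i, (words, labels) in enumerate(zip(sentences, tags)):
        for start, end, label in bio_spans(labels):
            key = (
                tuple(words[max(0, start - k):start]),
                tuple(words[start:end]),
                tuple(words[end:end + k]),
            )
            groups[key].append((i, start, end, label))

    flags = [[False] * len(words) for words in sentences]
    for occs in groups.values():
        counts = Counter(label for _, _, _, label in occs)
        if len(counts) < 2:
            continue  # same window, same label everywhere: consistent
        majority_count = max(counts.values())
        for i, start, end, label in occs:
            if counts[label] < majority_count:  # strict minority only
                for j in range(start, end):
                    flags[i][j] = True
    return flags


sentences = [["fly", "to", "paris", "today"]] * 3
tags = [["O", "O", "B-CITY", "O"], ["O", "O", "B-CITY", "O"], ["O", "O", "B-PERSON", "O"]]
print(span_window_flags(sentences, tags, k=1))
# [[False, False, False, False], [False, False, False, False], [False, False, True, False]]
```

A larger k makes the context match stricter, so fewer spans are grouped together and fewer are flagged.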
- supports_correction(self) → bool