`nessie.detectors.variational_principle`

Module Contents

Classes

`VariationNGrams`	Token-level variation n-gram detector (Dickinson and Meurers, TLT 2003).
`VariationNGramsSpan`	Span-level variation n-gram detector (Larson et al., COLING 2020).

class nessie.detectors.variational_principle.VariationNGrams

Bases: nessie.detectors.error_detector.Detector

“Detecting Inconsistencies in Treebanks” by Markus Dickinson and Walt Detmar Meurers, Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT 2003), Växjö, Sweden.

correct(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray) → awkward.Array

error_detector_kind(self) → nessie.detectors.error_detector.DetectorKind

score(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray, **kwargs) → awkward.Array

Collects n-grams and their respective label sequences. If an n-gram occurs with conflicting label sequences, the occurrences that carry a minority labeling are flagged.

We use the implementation described in “Errator: a Tool to Help Detect Annotation Errors in the Universal Dependencies Project” by Guillaume Wisniewski (LREC 2018) and “How Bad are PoS Taggers in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project” by Guillaume Wisniewski and François Yvon (NAACL 2018).

It uses generalized suffix trees to find token sequences that are repeated across sentences; a repetition is flagged if its occurrences are labeled differently.

Parameters

sentences – a (num_instances, num_tokens) ragged string sequence containing the text/surface form of each instance
tags – a (num_instances, num_tokens) ragged string sequence containing the noisy label for each instance

Returns

a (num_instances, num_tokens) ragged boolean array containing one flag per token of each instance
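The minority-flagging idea described above can be illustrated with a minimal, self-contained sketch. It is not the nessie implementation: it assumes a single fixed n-gram length `n` instead of generalized suffix trees, works on plain Python lists rather than awkward arrays, and the function name `flag_minority_ngrams` is hypothetical.

```python
from collections import Counter, defaultdict


def flag_minority_ngrams(sentences, tags, n=2):
    """Illustrative only: flag tokens whose n-gram labeling is in the minority.

    Assumption: a fixed n-gram length `n`; the real detector finds
    repetitions of arbitrary length with generalized suffix trees.
    """
    # Count, for every surface n-gram, the label sequences it occurs with.
    label_counts = defaultdict(Counter)
    for words, labels in zip(sentences, tags):
        for i in range(len(words) - n + 1):
            label_counts[tuple(words[i:i + n])][tuple(labels[i:i + n])] += 1

    flags = [[False] * len(words) for words in sentences]
    for s, (words, labels) in enumerate(zip(sentences, tags)):
        for i in range(len(words) - n + 1):
            counts = label_counts[tuple(words[i:i + n])]
            majority, majority_count = counts.most_common(1)[0]
            observed = tuple(labels[i:i + n])
            # Only strictly less frequent labelings are flagged; ties are left alone.
            if counts[observed] < majority_count:
                for j in range(n):
                    if observed[j] != majority[j]:
                        flags[s][i + j] = True
    return flags


sentences = [
    ["the", "can", "rusts"],
    ["the", "can", "leaks"],
    ["the", "can", "falls"],
]
tags = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["DET", "VERB", "VERB"],  # "can" disagrees with the two sentences above
]
flags = flag_minority_ngrams(sentences, tags, n=2)
```

In this toy input the bigram "the can" is labeled `DET NOUN` twice and `DET VERB` once, so only the token "can" in the third sentence is flagged.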

supports_correction(self) → bool

class nessie.detectors.variational_principle.VariationNGramsSpan(k: int = 1)

Bases: nessie.detectors.error_detector.Detector

We use the implementation described in “Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods” by Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach, and Jonathan K. Kummerfeld, Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020).

correct(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray) → awkward.Array

error_detector_kind(self) → nessie.detectors.error_detector.DetectorKind

score(self, sentences: nessie.types.RaggedStringArray, tags: nessie.types.RaggedStringArray, **kwargs) → awkward.Array

Collects n-grams and their respective label sequences. If an n-gram occurs with conflicting label sequences, the occurrences that carry a minority labeling are flagged.

We use the implementation described in “Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods” by Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach, and Jonathan K. Kummerfeld, Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020).

It considers a window of k tokens to the left and right of each span. If the same window surface form occurs elsewhere with a different label for the span, the span is flagged, unless it carries the majority label.

Parameters

sentences – a (num_instances, num_tokens) ragged string sequence containing the text/surface form of each instance
tags – a (num_instances, num_tokens) ragged string sequence containing the noisy label for each instance

Returns

a (num_instances, num_tokens) ragged boolean array containing one flag per token of each instance
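The window-based comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the nessie implementation: it assumes the tags follow a BIO scheme (`B-x`, `I-x`, `O`), uses plain Python lists instead of awkward arrays, and the helper names `bio_spans` and `flag_minority_spans` are hypothetical.

```python
from collections import Counter, defaultdict


def bio_spans(labels):
    """Return (start, end, type) for each BIO span; `end` is exclusive."""
    spans, start, kind = [], None, None
    for i, label in enumerate(list(labels) + ["O"]):  # sentinel closes the last span
        continues = label.startswith("I-") and kind == label[2:]
        if start is not None and not continues:
            spans.append((start, i, kind))
            start, kind = None, None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    return spans


def flag_minority_spans(sentences, tags, k=1):
    """Illustrative only: flag spans whose label is a minority for their context."""
    contexts = defaultdict(Counter)
    occurrences = []
    for s, (words, labels) in enumerate(zip(sentences, tags)):
        for start, end, kind in bio_spans(labels):
            lo = max(0, start - k)
            # Key: surface form of the span plus k tokens on each side,
            # together with the span's position inside that window.
            key = (tuple(words[lo:end + k]), start - lo, end - lo)
            contexts[key][kind] += 1
            occurrences.append((s, start, end, key, kind))

    flags = [[False] * len(words) for words in sentences]
    for s, start, end, key, kind in occurrences:
        counts = contexts[key]
        # Flag only labels that are strictly less frequent than the majority.
        if counts[kind] < max(counts.values()):
            for i in range(start, end):
                flags[s][i] = True
    return flags


sentences = [
    ["fly", "to", "boston", "today"],
    ["drive", "to", "boston", "today"],
    ["walk", "to", "boston", "today"],
]
tags = [
    ["O", "O", "B-city", "O"],
    ["O", "O", "B-city", "O"],
    ["O", "O", "B-airport", "O"],  # same window "to boston today", different label
]
flags = flag_minority_spans(sentences, tags, k=1)
```

With `k=1` all three spans share the window "to boston today"; `city` is the majority label, so only the `airport` span in the third sentence is flagged. The sketch compares only annotated spans, so a span that was left unlabeled (`O`) in one sentence is not detected here.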

supports_correction(self) → bool
