nessie.detectors.projection_ensemble

Module Contents

Classes

MaxEntProjectionEnsemble

Identifying Incorrect Labels in the CoNLL-2003 Corpus

Functions

tqdm_joblib(tqdm_object)

Context manager that patches joblib so that it reports progress to the tqdm progress bar passed as the argument.

class nessie.detectors.projection_ensemble.MaxEntProjectionEnsemble(n_components: List[int] = None, seeds: List[int] = None, num_jobs: int = 4, max_iter: int = 10000)

Bases: nessie.detectors.error_detector.Detector

Implements the method from: Frederick Reiss, Hong Xu, Bryan Cutler, Karthik Muthuraman, Zachary Eichenberger. Identifying Incorrect Labels in the CoNLL-2003 Corpus. Proceedings of the 24th Conference on Computational Natural Language Learning, 2020. https://aclanthology.org/2020.conll-1.16/

property ensemble_size(self) → int
error_detector_kind(self) → nessie.detectors.error_detector.DetectorKind
score(self, X_train_embedded: nessie.types.FloatArray2D, y_train_encoded: numpy.typing.NDArray[int], X_eval_embedded: nessie.types.FloatArray2D, y_eval_encoded: numpy.typing.NDArray[int], **kwargs) → Tuple[List[str], List[List[str]], numpy.typing.NDArray[bool]]

Uses an ensemble of logistic regression models, each trained on a different Gaussian projection of the dense embeddings. The predictions of the ensemble members are aggregated via majority vote, and instances whose label disagrees with the majority prediction are flagged.

Parameters
  • X_train_embedded – shape (n_instances, encoding_dim)

  • y_train_encoded – shape (n_instances)

  • X_eval_embedded – shape (n_instances, encoding_dim)

  • y_eval_encoded – shape (n_instances)

Returns

A tuple of three elements:

  • predictions – a list of strings with the prediction for every instance after the majority vote

  • ensemble predictions – a list of string lists with the predictions for every instance before the majority vote

  • flags – a boolean sequence containing the flags

Return type

Tuple[List[str], List[List[str]], numpy.typing.NDArray[bool]]
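The aggregation step described for score can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the helper name majority_vote_flags is hypothetical, the layout of the votes (one inner list per instance, one entry per ensemble member) is an assumption, and ties are resolved in favor of the label seen first, which may differ from what nessie does.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_vote_flags(
    ensemble_predictions: Sequence[Sequence[str]],
    given_labels: Sequence[str],
) -> Tuple[List[str], List[bool]]:
    """Hypothetical helper: aggregate per-model votes and flag disagreements.

    ensemble_predictions[i][j] is assumed to be the label that ensemble
    member j predicted for instance i.
    """
    predictions: List[str] = []
    flags: List[bool] = []
    for votes, label in zip(ensemble_predictions, given_labels):
        # most_common(1) yields the label with the highest count; on a tie
        # the label that appears first in `votes` wins.
        winner = Counter(votes).most_common(1)[0][0]
        predictions.append(winner)
        # An instance is flagged when its given label differs from the vote.
        flags.append(winner != label)
    return predictions, flags


votes = [
    ["PER", "PER", "ORG"],
    ["LOC", "LOC", "LOC"],
    ["ORG", "PER", "PER"],
]
labels = ["PER", "ORG", "PER"]
predictions, flags = majority_vote_flags(votes, labels)
print(predictions)  # ['PER', 'LOC', 'PER']
print(flags)        # [False, True, False]
```

In this toy input only the second instance is flagged, because all three ensemble members predict LOC while the given label is ORG.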

nessie.detectors.projection_ensemble.tqdm_joblib(tqdm_object)

Context manager that patches joblib so that it reports progress to the tqdm progress bar passed as the argument.