nessie.detectors.mean_distance
Module Contents
Classes
- class nessie.detectors.mean_distance.MeanDistance(metric: str = 'euclidean')
Bases: nessie.detectors.error_detector.Detector
Stefan Larson, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, Jason Mars. Outlier Detection for Improved Data Quality and Diversity in Dialog Systems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 517–527).
https://www.aclweb.org/anthology/N19-1051.pdf
- error_detector_kind(self) → nessie.detectors.error_detector.DetectorKind
- score(self, labels: nessie.types.StringArray, embedded_instances: nessie.types.FloatArray2D, **kwargs) → numpy.typing.NDArray[float]
Compute the mean vector of the instances belonging to each label. The outlier score of an instance is its distance to the mean vector of its label.
For each class:
Generate a vector representation of each instance.
Average vectors to get a mean representation.
Calculate the distance of each instance from the mean.
Rank by distance in ascending order.
(Cut off the list, keeping only the top k% as outliers.)
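The per-class scoring described in these steps can be sketched in a few lines of NumPy. This is an illustration only, not the library's implementation: `mean_distance_scores` is a hypothetical helper, the embeddings are assumed to be precomputed, and the distance is hard-coded to Euclidean rather than taken from the `metric` argument.

```python
import numpy as np


def mean_distance_scores(labels, embedded_instances):
    """Hypothetical helper (not part of nessie): Euclidean distance of each
    instance to the mean embedding of the instances sharing its label."""
    labels = np.asarray(labels)
    vectors = np.asarray(embedded_instances, dtype=float)
    scores = np.empty(len(labels), dtype=float)

    for label in np.unique(labels):
        mask = labels == label
        # Mean representation of this class
        centroid = vectors[mask].mean(axis=0)
        # Distance of every instance of this class to the class mean
        scores[mask] = np.linalg.norm(vectors[mask] - centroid, axis=1)

    return scores


# Toy data: three instances labeled "a", two labeled "b"
labels = ["a", "a", "a", "b", "b"]
vectors = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [5.0, 5.0], [5.0, 7.0]])
scores = mean_distance_scores(labels, vectors)
```

In this toy data the third instance lies farthest from the mean of class "a", so it receives the largest score in that class.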
- Parameters
labels – a (num_instances,) string sequence containing the noisy label for each instance
embedded_instances – 2D NumPy array of shape (num_instances, encoding_dim)
- Returns
a (num_instances,) NumPy array containing the score for each instance
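The optional cut-off step can be applied to the returned scores afterwards. The sketch below assumes that a larger distance means a more likely outlier. `top_k_percent` and its parameter `k` are hypothetical names introduced for illustration and are not part of nessie.

```python
import numpy as np


def top_k_percent(scores, k=10.0):
    """Hypothetical helper: indices of the k% highest-scoring instances,
    ordered from highest to lowest score."""
    scores = np.asarray(scores, dtype=float)
    # Number of instances to flag, rounded up so a small k still flags something
    n_flagged = int(np.ceil(len(scores) * k / 100.0))
    # Stable sort on negated scores gives a descending ranking
    order = np.argsort(-scores, kind="stable")
    return order[:n_flagged]


# Made-up scores for five instances
example_scores = np.array([0.3, 2.5, 0.1, 1.7, 0.4])
flagged = top_k_percent(example_scores, k=40.0)
```

Here `flagged` holds the indices of the two instances with the largest scores; a suitable value of `k` depends on how noisy the labels are expected to be.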