nessie.detectors.irt

Module Contents

Classes

ItemResponseTheoryFlagger

Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?

TwoParamLog

2PL IRT model taken from https://github.com/nd-ball/py-irt

class nessie.detectors.irt.ItemResponseTheoryFlagger(device: str = 'cpu', num_iters: int = 10000)

Bases: nessie.detectors.error_detector.Detector

Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. ACL 2021.

https://research.fb.com/wp-content/uploads/2021/07/Evaluation-Examples-Are-Not-Equally-Informative-How-Should-That-Change-NLP-Leaderboards.pdf

convert_data(self, data: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
error_detector_kind(self) → nessie.detectors.error_detector.DetectorKind
optimize(self, model, guide, items: numpy.ndarray, subjects: numpy.ndarray, correctnesses: numpy.ndarray)
score(self, labels: nessie.types.StringArray, ensemble_predictions: nessie.types.StringArray2D, **kwargs) → numpy.typing.NDArray[bool]

Flags instances whose discrimination, as estimated by an IRT model, is negative. The detector is typically applied to the predictions of several different models, much like an ensemble.

Parameters
  • labels – a (num_samples, ) numpy array containing the noisy labels to be corrected

  • ensemble_predictions – a (num_models, num_samples) numpy array containing predictions for each model

Returns

a (num_samples, ) boolean numpy array that is True for each instance with negative discrimination
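As an illustration of the inputs and output described above, the following minimal sketch uses only NumPy and does not call nessie. It assumes that the IRT model is fitted on a binary response matrix recording whether each model's prediction agrees with the given label. The long-format layout and the discrimination values are hypothetical stand-ins; in practice the discriminations would come from the fitted IRT model.

```python
import numpy as np

# Toy inputs with the documented shapes.
labels = np.array(["A", "B", "A", "B"])          # (num_samples,)
ensemble_predictions = np.array([                # (num_models, num_samples)
    ["A", "B", "A", "A"],
    ["A", "B", "B", "A"],
    ["A", "A", "A", "A"],
])
num_models, num_samples = ensemble_predictions.shape

# Assumption: a response is "correct" when a model agrees with the given label.
correctness = ensemble_predictions == labels[np.newaxis, :]

# Assumption: a long format with one (subject, item, response) triple per cell.
subjects = np.repeat(np.arange(num_models), num_samples)
items = np.tile(np.arange(num_samples), num_models)
responses = correctness.ravel().astype(float)

# Hypothetical per-item discriminations standing in for fitted IRT parameters.
discrimination = np.array([1.2, 0.4, 0.7, -0.9])

# Items with negative discrimination are flagged.
flags = discrimination < 0
```

Here `flags` has shape (num_samples, ) and boolean dtype, matching the documented return value.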

uses_probabilities(self) → bool
class nessie.detectors.irt.TwoParamLog(*, num_items: int, num_subjects: int, device: str = 'cpu')

2PL IRT model taken from https://github.com/nd-ball/py-irt
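For orientation, the textbook two-parameter logistic (2PL) model gives the probability of a correct response as the logistic function of discrimination * (ability - difficulty). The sketch below implements that standard form in plain Python. The function name is hypothetical, and the exact parameterisation and priors used by this class may differ.

```python
import math


def two_pl_probability(ability: float, difficulty: float, discrimination: float) -> float:
    """Probability of a correct response under the textbook 2PL model."""
    logit = discrimination * (ability - difficulty)
    return 1.0 / (1.0 + math.exp(-logit))


# Positive discrimination: the stronger subject is more likely to be correct.
strong_pos = two_pl_probability(ability=2.0, difficulty=0.0, discrimination=1.5)
weak_pos = two_pl_probability(ability=-2.0, difficulty=0.0, discrimination=1.5)

# Negative discrimination: the relationship is reversed.
strong_neg = two_pl_probability(ability=2.0, difficulty=0.0, discrimination=-1.5)
weak_neg = two_pl_probability(ability=-2.0, difficulty=0.0, discrimination=-1.5)
```

With negative discrimination, stronger subjects become less likely to agree with the label than weaker ones, which is why such items are treated as suspicious.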

export(self)
get_guide(self)
get_model(self)
guide_hierarchical(self, subjects, items, obs)

Initialize a 2PL guide with hierarchical priors

model_hierarchical(self, subjects, items, obs)

Initialize a 2PL model with hierarchical priors