nessie.detectors.irt
Module Contents
Classes
ItemResponseTheoryFlagger | Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?
TwoParamLog | 2PL IRT model taken from https://github.com/nd-ball/py-irt
- class nessie.detectors.irt.ItemResponseTheoryFlagger(device: str = 'cpu', num_iters: int = 10000)
Bases: nessie.detectors.error_detector.Detector
Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. ACL 2021.
- convert_data(self, data: numpy.ndarray) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
- error_detector_kind(self) -> nessie.detectors.error_detector.DetectorKind
- optimize(self, model, guide, items: numpy.ndarray, subjects: numpy.ndarray, correctnesses: numpy.ndarray)
- score(self, labels: nessie.types.StringArray, ensemble_predictions: nessie.types.StringArray2D, **kwargs) -> numpy.typing.NDArray[bool]
Flags instances with negative discrimination, as computed by an IRT model. It is typically applied to the predictions of several different models, in a manner similar to ensembling.
- Parameters
labels – a (num_samples, ) numpy array containing the noisy labels to be corrected
ensemble_predictions – a (num_models, num_samples) numpy array containing predictions for each model
- Returns
a (num_samples, ) boolean numpy array flagging the instances that have negative discrimination
- uses_probabilities(self) -> bool
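The score docstring describes flagging instances with negative discrimination over an ensemble's predictions. The sketch below shows one plausible data flow under stated assumptions. We assume, without confirming this against the nessie source, that each sample is an IRT item and each model is an IRT subject. We also assume a response counts as "correct" when a model's prediction agrees with the noisy label. The parameter names of optimize (items, subjects, correctnesses) and the three-array return of convert_data suggest such a long format, but that is an inference. The helpers correctness_matrix, to_long_format and flag_negative_discrimination are hypothetical and not part of nessie. Fitting the discrimination values is omitted, so the discrimination array in the usage is made up.

```python
import numpy as np


def correctness_matrix(labels: np.ndarray, ensemble_predictions: np.ndarray) -> np.ndarray:
    # Hypothetical helper (not part of nessie).
    # labels: (num_samples,); ensemble_predictions: (num_models, num_samples).
    # Assumption: a response is "correct" (1) when a model's prediction
    # agrees with the noisy label, else 0. Broadcasting compares every
    # model row against the same label vector.
    return (ensemble_predictions == labels[np.newaxis, :]).astype(np.int64)


def to_long_format(correct: np.ndarray):
    # Hypothetical helper: flatten a (num_models, num_samples) 0/1 matrix into
    # parallel (items, subjects, correctnesses) arrays, one entry per response.
    # Samples act as IRT items, models act as IRT subjects.
    num_models, num_samples = correct.shape
    subjects, items = np.meshgrid(
        np.arange(num_models), np.arange(num_samples), indexing="ij"
    )
    return items.ravel(), subjects.ravel(), correct.ravel()


def flag_negative_discrimination(discrimination: np.ndarray) -> np.ndarray:
    # One boolean per item (sample): True where the fitted discrimination is
    # negative, i.e. stronger models are less likely to agree with the label.
    return discrimination < 0


labels = np.array(["A", "B", "A"])
ensemble_predictions = np.array([["A", "B", "B"],
                                 ["A", "A", "B"]])
correct = correctness_matrix(labels, ensemble_predictions)
print(correct.tolist())  # [[1, 1, 0], [1, 0, 0]]

items, subjects, correctnesses = to_long_format(correct)
print(items.tolist())     # [0, 1, 2, 0, 1, 2]
print(subjects.tolist())  # [0, 0, 0, 1, 1, 1]

# Made-up discrimination values standing in for a fitted IRT model.
flags = flag_negative_discrimination(np.array([0.8, 1.2, -0.5]))
print(flags.tolist())  # [False, False, True]
```

A negative discrimination is read as a hint of label noise. On such an item, models that are stronger overall agree with the given label less often than weaker ones. This matches the motivation of the cited paper, but the exact thresholding nessie applies is not shown here.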
- class nessie.detectors.irt.TwoParamLog(*, num_items: int, num_subjects: int, device: str = 'cpu')
2PL IRT model taken from https://github.com/nd-ball/py-irt
- export(self)
- get_guide(self)
- get_model(self)
- guide_hierarchical(self, subjects, items, obs)
Initialize a 2PL guide with hierarchical priors
- model_hierarchical(self, subjects, items, obs)
Initialize a 2PL model with hierarchical priors
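TwoParamLog is documented only as the 2PL model taken from py-irt. The NumPy sketch below gives the standard two-parameter logistic response function and its Bernoulli log-likelihood for orientation. It uses an ability θ per subject, plus a difficulty b and a discrimination a per item. We assume, without checking the source, that TwoParamLog follows this common parameterization. The function names are hypothetical. The sketch leaves out the hierarchical priors and the Pyro-style model/guide pair that model_hierarchical and guide_hierarchical set up. py-irt itself is built on Pyro.

```python
import numpy as np


def two_pl_prob(ability, difficulty, discrimination):
    # Standard 2PL response function: P(correct) = sigmoid(a * (theta - b)).
    logits = discrimination * (ability - difficulty)
    return 1.0 / (1.0 + np.exp(-logits))


def two_pl_log_likelihood(items, subjects, obs, ability, difficulty, discrimination):
    # Bernoulli log-likelihood of 0/1 responses given per-subject ability and
    # per-item difficulty/discrimination. No priors and no inference here.
    p = two_pl_prob(ability[subjects], difficulty[items], discrimination[items])
    return float(np.sum(obs * np.log(p) + (1 - obs) * np.log1p(-p)))


abilities = np.array([-1.0, 0.0, 1.0])     # weak -> strong subjects
p_pos = two_pl_prob(abilities, 0.0, 2.0)   # a > 0: probability rises with ability
p_neg = two_pl_prob(abilities, 0.0, -2.0)  # a < 0: probability falls with ability
print(p_pos[0] < p_pos[2], p_neg[0] > p_neg[2])  # True True
```

The sign of a alone determines whether a stronger subject is more or less likely to answer an item correctly. That is why a negative fitted discrimination can serve as a flag for a suspicious label.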