nessie.helper

Module Contents

Classes

Callback

CallbackList

CrossValidationHelper

RaggedResult

Result

SingeSplitCV

State

Functions

get_cross_validator(n_splits: int, stratified: bool = True) → Union[sklearn.model_selection.BaseCrossValidator, SingeSplitCV]

obtain_repeated_probabilities_flat(model: nessie.models.Model, X: nessie.types.StringArray, num_repetitions: int) → numpy.typing.NDArray[float]

Uses Monte-Carlo dropout to obtain several probability estimates per instance.

obtain_repeated_probabilities_ragged_flattened(model: nessie.models.SequenceTagger, X: nessie.types.StringArray2D, num_repetitions: int) → numpy.typing.NDArray[float]

Uses Monte-Carlo dropout to obtain several probability estimates per instance.

Attributes

logger

nessie.helper.logger
class nessie.helper.Callback
on_after_fitting(self, state: State)
on_after_predicting(self, state: State)
on_before_fitting(self, state: State)
on_before_predicting(self, state: State)
on_begin(self, state: State)
class nessie.helper.CallbackList

Bases: Callback

add_callback(self, cb: Callback)
add_callbacks(self, cb: List[Callback])
on_after_fitting(self, state: State)
on_after_predicting(self, state: State)
on_before_fitting(self, state: State)
on_before_predicting(self, state: State)
on_begin(self, state: State)
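
The hook names listed above suggest that CallbackList forwards each event to the callbacks registered on it. The sketch below illustrates that pattern with self-contained stand-in classes. It does not import nessie, and the no-op defaults, the `_dispatch` helper, the reduced `State`, and `RecordingCallback` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class State:
    # Hypothetical stand-in carrying only two of the documented attributes.
    num_samples: int = 0
    num_labels: int = 0


class Callback:
    # Hook names follow the listing above; the no-op bodies are an assumption.
    def on_begin(self, state): pass
    def on_before_fitting(self, state): pass
    def on_after_fitting(self, state): pass
    def on_before_predicting(self, state): pass
    def on_after_predicting(self, state): pass


class CallbackList(Callback):
    def __init__(self):
        self._callbacks = []

    def add_callback(self, cb):
        self._callbacks.append(cb)

    def add_callbacks(self, cbs):
        self._callbacks.extend(cbs)

    def _dispatch(self, hook_name, state):
        # Hypothetical helper: call the same hook on every registered callback.
        for cb in self._callbacks:
            getattr(cb, hook_name)(state)

    def on_begin(self, state): self._dispatch("on_begin", state)
    def on_before_fitting(self, state): self._dispatch("on_before_fitting", state)
    def on_after_fitting(self, state): self._dispatch("on_after_fitting", state)
    def on_before_predicting(self, state): self._dispatch("on_before_predicting", state)
    def on_after_predicting(self, state): self._dispatch("on_after_predicting", state)


class RecordingCallback(Callback):
    # Overrides only the hooks it cares about and records when they fire.
    def __init__(self):
        self.events = []

    def on_begin(self, state):
        self.events.append("begin")

    def on_after_fitting(self, state):
        self.events.append("after_fitting")


recorder = RecordingCallback()
callbacks = CallbackList()
callbacks.add_callback(recorder)

state = State(num_samples=4, num_labels=2)
callbacks.on_begin(state)
callbacks.on_before_fitting(state)  # not overridden by the recorder, so nothing is recorded
callbacks.on_after_fitting(state)
print(recorder.events)
```

Because the list is itself a Callback, code that expects a single callback can receive a whole collection without changes.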
class nessie.helper.CrossValidationHelper(n_splits: int = 10, num_repetitions: Optional[int] = 50)
add_callback(self, cb: Callback)
run(self, X: nessie.types.StringArray, y_noisy: nessie.types.StringArray, model: nessie.models.Model) → Result

Uses cross-validation to obtain predictions and probabilities from the given model on the given data.

Parameters
  • X – The inputs used to train the model

  • y_noisy – The (possibly noisy) labels used to train the model

  • model – The model that is trained during cross-validation and whose outputs are used for the detectors

Returns

Model results evaluated via cross-validation.
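
The general idea of obtaining predictions via cross-validation can be sketched as follows: each instance is predicted by a model that did not see it during training. This is a minimal sketch using only the standard library, not the nessie implementation. `k_fold_indices`, `MajorityModel`, and `out_of_fold_predictions` are hypothetical names, and the interleaved fold assignment is an assumption chosen for brevity.

```python
from collections import Counter


def k_fold_indices(num_samples, n_splits):
    """Yield (train_indices, eval_indices); fold i evaluates samples i, i + n_splits, ..."""
    for fold in range(n_splits):
        eval_idx = list(range(fold, num_samples, n_splits))
        eval_set = set(eval_idx)
        train_idx = [i for i in range(num_samples) if i not in eval_set]
        yield train_idx, eval_idx


class MajorityModel:
    """Toy stand-in for a model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]

    def predict(self, X):
        return [self.label_ for _ in X]


def out_of_fold_predictions(X, y_noisy, model, n_splits):
    predictions = [None] * len(X)
    for train_idx, eval_idx in k_fold_indices(len(X), n_splits):
        # Train on every fold except the held-out one.
        model.fit([X[i] for i in train_idx], [y_noisy[i] for i in train_idx])
        # Predict only the held-out instances and store them at their original positions.
        fold_predictions = model.predict([X[i] for i in eval_idx])
        for i, prediction in zip(eval_idx, fold_predictions):
            predictions[i] = prediction
    return predictions


X = ["a", "b", "c", "d", "e", "f"]
y_noisy = ["pos", "pos", "pos", "pos", "neg", "pos"]
predictions = out_of_fold_predictions(X, y_noisy, MajorityModel(), n_splits=3)

# Instances whose held-out prediction disagrees with the given label.
disagreements = [i for i, (p, y) in enumerate(zip(predictions, y_noisy)) if p != y]
print(predictions, disagreements)
```

In this toy run the only disagreement is at index 4, the kind of signal a downstream detector could inspect.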

run_for_ragged(self, X: nessie.types.RaggedStringArray, y_noisy: nessie.types.RaggedStringArray, model: nessie.models.Model) → RaggedResult

Uses cross-validation to obtain predictions and probabilities from the given model on the given data. This is used for tasks with ragged inputs and outputs like sequence labeling.

Parameters
  • X – The ragged inputs used to train the model

  • y_noisy – The (possibly noisy) ragged labels used to train the model

  • model – The model that is trained during cross-validation and whose outputs are used for the detectors

Returns

Model results evaluated via cross-validation.

class nessie.helper.RaggedResult
le :sklearn.preprocessing.LabelEncoder
predictions :awkward.Array
probabilities :awkward.Array
repeated_probabilities :awkward.Array
flatten(self) → Result
property sizes(self) → numpy.typing.NDArray[int]
class nessie.helper.Result
le :sklearn.preprocessing.LabelEncoder
predictions :numpy.typing.NDArray[str]
probabilities :numpy.typing.NDArray[float]
repeated_probabilities :Optional[numpy.typing.NDArray[float]]
unflatten(self, sizes: nessie.types.IntArray) → RaggedResult
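
The relationship between flatten, sizes, and unflatten can be illustrated with plain Python lists. This is only a sketch of the round trip under the assumption that sizes holds the length of each inner sequence. The documented classes store awkward and NumPy arrays, and the free functions below are hypothetical stand-ins rather than the library's methods.

```python
def flatten(ragged):
    """Concatenate all inner sequences into one flat list."""
    return [item for row in ragged for item in row]


def sizes(ragged):
    """Length of each inner sequence, in order."""
    return [len(row) for row in ragged]


def unflatten(flat, sizes):
    """Cut a flat list back into rows of the given lengths."""
    if sum(sizes) != len(flat):
        raise ValueError("sizes must sum to the length of the flat list")
    rows, start = [], 0
    for size in sizes:
        rows.append(flat[start:start + size])
        start += size
    return rows


# Per-token predictions for three sentences of different lengths.
ragged_predictions = [["B-PER", "O"], ["O"], ["B-LOC", "I-LOC", "O"]]

flat_predictions = flatten(ragged_predictions)
row_sizes = sizes(ragged_predictions)
restored = unflatten(flat_predictions, row_sizes)
print(flat_predictions, row_sizes, restored)
```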
class nessie.helper.SingeSplitCV
split(self, X, *args, **kwargs)
class nessie.helper.State
eval_indices :numpy.typing.NDArray[int]
labels_eval :numpy.typing.NDArray[int]
num_labels :int
num_repetitions :int
num_samples :int
probas_eval :numpy.typing.NDArray[float]
repeated_probabilities :numpy.typing.NDArray[float]
should_compute_repeated_probabilities :bool
nessie.helper.get_cross_validator(n_splits: int, stratified: bool = True) → Union[sklearn.model_selection.BaseCrossValidator, SingeSplitCV]
nessie.helper.obtain_repeated_probabilities_flat(model: nessie.models.Model, X: nessie.types.StringArray, num_repetitions: int) → numpy.typing.NDArray[float]

Uses Monte-Carlo dropout to obtain several probability estimates per instance.

Parameters
  • model – The model to use

  • X – The input

  • num_repetitions – Number of repetitions

Returns: An ndarray of shape (|X|, num_repetitions, |classes|)
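
To make the returned shape concrete, here is a minimal standard-library sketch of Monte-Carlo dropout: dropout stays active at prediction time, so repeated forward passes over the same instance give different probability estimates. The toy linear classifier, the numeric feature vectors (the documented function takes strings and a nessie model), and all names below are assumptions for illustration, not the library's implementation.

```python
import math
import random


def softmax(scores):
    # Subtract the maximum for numerical stability.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba_with_dropout(features, weights, drop_rate, rng):
    # Inverted dropout: zero each feature with probability drop_rate, rescale the rest.
    keep = 1.0 - drop_rate
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    scores = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(scores)


def repeated_probabilities(X, weights, num_repetitions, drop_rate=0.5, seed=0):
    """Nested list of shape (|X|, num_repetitions, |classes|)."""
    rng = random.Random(seed)
    return [
        [predict_proba_with_dropout(x, weights, drop_rate, rng) for _ in range(num_repetitions)]
        for x in X
    ]


X = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]]            # 2 instances, 3 features each
weights = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]    # 2 classes x 3 features

probs = repeated_probabilities(X, weights, num_repetitions=5)
shape = (len(probs), len(probs[0]), len(probs[0][0]))
print(shape)
```

The spread of the estimates along the repetition axis can then serve as a rough per-instance uncertainty signal.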

nessie.helper.obtain_repeated_probabilities_ragged_flattened(model: nessie.models.SequenceTagger, X: nessie.types.StringArray2D, num_repetitions: int) → numpy.typing.NDArray[float]

Uses Monte-Carlo dropout to obtain several probability estimates per instance.

Parameters
  • model – The model to use

  • X – The inputs (need to be ragged, e.g. for token labeling)

  • num_repetitions – Number of repetitions

Returns: An ndarray of shape (|X|, num_repetitions, |classes|)