nessie.dataloader

Module Contents

Classes

SequenceLabelingDataset

Dataset containing tokens, gold and noisy labels for tasks like POS tagging or NER.

TextClassificationDataset

Dataset containing texts, gold and noisy labels for tasks like sentiment analysis or topic classification.

Functions

load_example_span_classification_data() → SequenceLabelingDataset

load_example_text_classification_data() → TextClassificationDataset

load_example_token_labeling_data() → SequenceLabelingDataset

load_sequence_labeling_dataset(path: Union[str, pathlib.Path]) → SequenceLabelingDataset

load_text_classification_tsv(path: Union[str, pathlib.Path]) → TextClassificationDataset

class nessie.dataloader.SequenceLabelingDataset

Dataset containing tokens, gold and noisy labels for tasks like POS tagging or NER.

Parameters
  • sentences – List of lists of strings, one inner list of tokens per sentence

  • gold_labels – List of lists of strings

  • noisy_labels – List of lists of strings
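
As a rough illustration of the input layout described above, the sketch below builds three parallel nested lists in plain Python and checks that they line up. It does not import nessie. The example sentences and tags and the check_aligned helper are hypothetical, and the assumption that each sentence carries exactly one gold and one noisy label per token is inferred from the parameter descriptions rather than stated on this page.

```python
from typing import List

# Hypothetical toy data: one inner list per sentence.
sentences: List[List[str]] = [
    ["Nessie", "lives", "in", "Scotland"],
    ["She", "swims"],
]
gold_labels: List[List[str]] = [
    ["B-PER", "O", "O", "B-LOC"],
    ["O", "O"],
]
noisy_labels: List[List[str]] = [
    ["B-PER", "O", "O", "O"],  # last tag disagrees with the gold tag
    ["O", "O"],
]


def check_aligned(tokens, gold, noisy) -> None:
    """Raise ValueError unless the three nested lists have matching shapes.

    Hypothetical helper, not part of nessie.
    """
    if not (len(tokens) == len(gold) == len(noisy)):
        raise ValueError("number of sentences differs")
    for i, (t, g, n) in enumerate(zip(tokens, gold, noisy)):
        if not (len(t) == len(g) == len(n)):
            raise ValueError(f"sentence {i}: token/label counts differ")


check_aligned(sentences, gold_labels, noisy_labels)

# Per-sentence token counts and the total number of tokens.
sentence_lengths = [len(s) for s in sentences]
num_tokens = sum(sentence_lengths)
```

Lists shaped like these are what the three parameters expect; checking the alignment up front is a cheap way to catch a truncated label column before constructing the dataset.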

gold_labels :awkward.Array
noisy_labels :awkward.Array
sentences :awkward.Array
__post_init__(self)
flatten(self) → TextClassificationDataset
property num_instances(self) → int
property num_labels(self) → int
property num_sentences(self) → int
property sizes(self) → numpy.typing.NDArray[int]
subset(self, n: int) → SequenceLabelingDataset
property tagset_noisy(self) → Set[str]
class nessie.dataloader.TextClassificationDataset

Dataset containing texts, gold and noisy labels for tasks like sentiment analysis or topic classification.

Parameters
  • texts – Sequence of strings, such as a list or numpy array

  • gold_labels – Sequence of strings, such as a list or numpy array

  • noisy_labels – Sequence of strings, such as a list or numpy array

gold_labels :numpy.typing.NDArray[str]
noisy_labels :numpy.typing.NDArray[str]
texts :numpy.typing.NDArray[str]
__post_init__(self)
property flags(self) → numpy.typing.NDArray[bool]

Returns an array that indicates differences between gold and noisy labels.

Returns

A boolean numpy array of shape (num_instances,) that is True where the gold label differs from the noisy label and False otherwise.
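
The comparison that flags describes can be sketched in plain Python as an element-wise inequality. This is an illustration only, not nessie's implementation: compute_flags is a hypothetical helper, the labels are made up, and it returns a list where the property returns a numpy array.

```python
from typing import List, Sequence


def compute_flags(gold: Sequence[str], noisy: Sequence[str]) -> List[bool]:
    """Return True at each position where the gold and noisy labels differ.

    Hypothetical stand-in for the documented `flags` property.
    """
    if len(gold) != len(noisy):
        raise ValueError("gold and noisy labels must have the same length")
    return [g != n for g, n in zip(gold, noisy)]


# Made-up labels for four instances.
gold = ["pos", "neg", "pos", "neg"]
noisy = ["pos", "pos", "pos", "neg"]

flags = compute_flags(gold, noisy)  # only the second instance disagrees
num_flagged = sum(flags)            # True counts as 1
```

For two numpy string arrays of equal shape, the expression `gold != noisy` yields the same boolean mask directly.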

property num_instances(self) → int
property num_labels(self) → int
subset(self, n: int) → TextClassificationDataset
property tagset_noisy(self) → Set[str]
nessie.dataloader.load_example_span_classification_data() → SequenceLabelingDataset
nessie.dataloader.load_example_text_classification_data() → TextClassificationDataset
nessie.dataloader.load_example_token_labeling_data() → SequenceLabelingDataset
nessie.dataloader.load_sequence_labeling_dataset(path: Union[str, pathlib.Path]) → SequenceLabelingDataset
nessie.dataloader.load_text_classification_tsv(path: Union[str, pathlib.Path]) → TextClassificationDataset