nessie.dataloader
Module Contents
Classes
SequenceLabelingDataset – Dataset containing tokens, gold and noisy labels for tasks like POS tagging or NER.
TextClassificationDataset – Dataset containing texts, gold and noisy labels for tasks like sentiment analysis or topic classification.
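Functions
load_example_span_classification_data() → SequenceLabelingDataset
load_example_text_classification_data() → TextClassificationDataset
load_example_token_labeling_data() → SequenceLabelingDataset
load_sequence_labeling_dataset(path) → SequenceLabelingDataset
load_text_classification_tsv(path) → TextClassificationDataset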
- class nessie.dataloader.SequenceLabelingDataset
Dataset containing tokens, gold and noisy labels for tasks like POS tagging or NER.
- Parameters
sentences – List of lists of strings (the tokens of each sentence)
gold_labels – List of lists of strings (one gold label per token)
noisy_labels – List of lists of strings (one noisy label per token)
- gold_labels : awkward.Array
- noisy_labels : awkward.Array
- sentences : awkward.Array
- __post_init__(self)
- flatten(self) → TextClassificationDataset
- property num_instances(self) → int
- property num_labels(self) → int
- property num_sentences(self) → int
- property sizes(self) → numpy.typing.NDArray[int]
- subset(self, n: int) → SequenceLabelingDataset
- property tagset_noisy(self) → Set[str]
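The relationship between the nested, per-sentence layout and a flat, per-token layout can be illustrated with plain Python lists. This is a minimal sketch of the idea only, not the library's implementation: the toy data is hypothetical, and it assumes that a `sizes`-like value holds the number of tokens per sentence and that flattening concatenates all sentences in order.

```python
from itertools import chain

# Hypothetical toy data: two tokenized sentences with one label per token.
sentences = [["Obama", "visited", "Paris"], ["Hello", "world"]]
gold_labels = [["B-PER", "O", "B-LOC"], ["O", "O"]]
noisy_labels = [["B-PER", "O", "O"], ["O", "O"]]

# Assumption: per-sentence token counts, similar in spirit to `sizes`.
sizes = [len(sentence) for sentence in sentences]

# Assumption: flattening concatenates the inner lists in order,
# yielding one instance per token.
flat_tokens = list(chain.from_iterable(sentences))
flat_gold = list(chain.from_iterable(gold_labels))
flat_noisy = list(chain.from_iterable(noisy_labels))

print(sizes)
print(list(zip(flat_tokens, flat_gold, flat_noisy)))
```

Keeping the per-sentence sizes around makes it possible to regroup token-level results back into sentences later.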
- class nessie.dataloader.TextClassificationDataset
Dataset containing texts, gold and noisy labels for tasks like sentiment analysis or topic classification.
- Parameters
texts – Sequence of strings, such as a list or numpy array, …
gold_labels – Sequence of strings, such as a list or numpy array, …
noisy_labels – Sequence of strings, such as a list or numpy array, …
- gold_labels : numpy.typing.NDArray[str]
- noisy_labels : numpy.typing.NDArray[str]
- texts : numpy.typing.NDArray[str]
- __post_init__(self)
- property flags(self) → numpy.typing.NDArray[bool]
Returns an array that marks the instances whose gold and noisy labels differ.
- Returns
A boolean numpy array of shape (num_instances,) that is True where the gold label disagrees with the noisy label and False otherwise.
- property num_instances(self) → int
- property num_labels(self) → int
- subset(self, n: int) → TextClassificationDataset
- property tagset_noisy(self) → Set[str]
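The description of `flags` corresponds to an element-wise comparison of the two label arrays. The following sketch reproduces that comparison with plain numpy arrays. The label values are made up for illustration, and the set built at the end is only an assumption about what `tagset_noisy` contains (the distinct noisy labels).

```python
import numpy as np

# Hypothetical labels for four instances.
gold_labels = np.array(["pos", "neg", "pos", "neg"])
noisy_labels = np.array(["pos", "pos", "pos", "neg"])

# True where the gold label disagrees with the noisy label.
flags = gold_labels != noisy_labels

# Assumption: the noisy tagset is the set of distinct noisy labels.
tagset_noisy = set(noisy_labels.tolist())

print(flags)
print(int(flags.sum()), "of", len(flags), "instances are flagged")
```

Summing the boolean array gives the number of mislabeled instances, which is a quick sanity check on the noise level of a dataset.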
- nessie.dataloader.load_example_span_classification_data() → SequenceLabelingDataset
- nessie.dataloader.load_example_text_classification_data() → TextClassificationDataset
- nessie.dataloader.load_example_token_labeling_data() → SequenceLabelingDataset
- nessie.dataloader.load_sequence_labeling_dataset(path: Union[str, pathlib.Path]) → SequenceLabelingDataset
- nessie.dataloader.load_text_classification_tsv(path: Union[str, pathlib.Path]) → TextClassificationDataset
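This page does not describe the file layout that the loaders expect. As a rough illustration of what reading a text classification TSV might involve, the sketch below parses an in-memory file using only the standard library. The three-column layout (text, gold label, noisy label) is an assumption made for this example, not something this page confirms, and the sample rows are invented.

```python
import csv
import io

# Assumed layout (not confirmed by this page): text<TAB>gold<TAB>noisy
raw = "I love it\tpositive\tpositive\nNot great\tnegative\tpositive\n"

texts, gold_labels, noisy_labels = [], [], []

# QUOTE_NONE keeps quote characters inside the text untouched.
reader = csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
for text, gold, noisy in reader:
    texts.append(text)
    gold_labels.append(gold)
    noisy_labels.append(noisy)

print(texts)
print(gold_labels)
print(noisy_labels)
```

For a real file, `io.StringIO(raw)` would be replaced by an open file handle, and the resulting three lists match the `texts`, `gold_labels` and `noisy_labels` parameters listed for `TextClassificationDataset`.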