`nessie.task_support.span_labeling`

Module Contents

Classes

`AlignedData`
`AlignmentResult`
`SpanId`

Functions

`aggregate_scores_to_spans`(labels: nessie.types.RaggedStringArray, scores: nessie.types.RaggedFloatArray, span_aggregator: Optional[Callable[[List[numpy.ndarray]], numpy.ndarray]] = None) → awkward.Array
`align_for_span_labeling`(noisy_labels: nessie.types.RaggedStringArray, predictions: nessie.types.RaggedStringArray, probabilities: nessie.types.RaggedFloatArray2D, repeated_probabilities: nessie.types.RaggedFloatArray3D, le: sklearn.preprocessing.LabelEncoder, span_aggregator: Optional[Callable[[List[numpy.ndarray]], numpy.ndarray]] = None, function_aggregator: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → AlignmentResult	The goal of this function is to align existing and predicted sequence labeling annotations
`align_span_labeling_data`(tokens: nessie.types.RaggedStringArray, gold_labels: nessie.types.RaggedStringArray, noisy_labels: nessie.types.RaggedStringArray) → AlignedData	Aligns spans from gold data with noisy ones.
`align_span_labeling_result`(noisy_labels: nessie.types.RaggedStringArray, result: nessie.helper.RaggedResult) → AlignmentResult
`embed_spans`(X: nessie.types.RaggedStringArray, y: nessie.types.RaggedStringArray, embedder: nessie.models.featurizer.FlairTokenEmbeddingsWrapper, aggregate: Callable[[List[numpy.ndarray]], numpy.ndarray] = None) → awkward.Array
`span_matching`(tagging_A: List[Tuple[int, int]], tagging_B: List[Tuple[int, int]], keep_A: bool = False) → Dict[int, int]	Assume we have a list of tokens which was tagged with spans by two different approaches A and B.

Attributes

`RaggedArray`
`UNALIGNED_LABEL`

nessie.task_support.span_labeling.RaggedArray

nessie.task_support.span_labeling.UNALIGNED_LABEL = '___NESSIE_NO_ALIGNMENT___'

class nessie.task_support.span_labeling.AlignedData

gold_labels :List[str]

noisy_labels :List[str]

span_ids :List[SpanId]

surface_forms :List[str]

__len__(self) → int

property flags(self) → List[bool]

class nessie.task_support.span_labeling.AlignmentResult

labels :numpy.typing.NDArray[str]

le :sklearn.preprocessing.LabelEncoder

predictions :numpy.typing.NDArray[str]

probabilities :numpy.typing.NDArray[float]

repeated_probabilities :Optional[numpy.typing.NDArray[float]]

span_ids :List[SpanId]

class nessie.task_support.span_labeling.SpanId

end :int

sentence :int

start :int

nessie.task_support.span_labeling.aggregate_scores_to_spans(labels: nessie.types.RaggedStringArray, scores: nessie.types.RaggedFloatArray, span_aggregator: Optional[Callable[[List[numpy.ndarray]], numpy.ndarray]] = None) → awkward.Array

nessie.task_support.span_labeling.align_for_span_labeling(noisy_labels: nessie.types.RaggedStringArray, predictions: nessie.types.RaggedStringArray, probabilities: nessie.types.RaggedFloatArray2D, repeated_probabilities: nessie.types.RaggedFloatArray3D, le: sklearn.preprocessing.LabelEncoder, span_aggregator: Optional[Callable[[List[numpy.ndarray]], numpy.ndarray]] = None, function_aggregator: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → AlignmentResult

The goal of this function is to align existing and predicted sequence labeling annotations and their respective probabilities.

Original and predicted sequence labelings are aligned in a way that maximizes the overlap between them (see `span_matching`).
BIO-tagged sequences are reduced to a list of spans with their type, e.g. [O, B-PER, I-PER, O, B-LOC] becomes [PER, LOC].
Probabilities for spans are aggregated.
Spans that exist in the original sequence and have no counterpart in the predicted one get a special label and are assigned the probability of the `O` label.
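
The BIO reduction and probability aggregation steps above can be illustrated with a minimal sketch. It is not the library's implementation: `bio_to_spans` and `mean_aggregator` are hypothetical helpers, the exclusive end index is an assumption, and averaging is only one possible choice for `span_aggregator` (the default used by nessie is not stated here).

```python
from typing import List, Tuple

import numpy as np


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Reduce a BIO tag sequence to (start, end, type) spans; end is exclusive.

    Hypothetical helper for illustration only.
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        # An I- tag of the same type continues the currently open span.
        if tag.startswith("I-") and current == tag[2:]:
            continue
        # Any other tag closes the open span, if there is one.
        if current is not None:
            spans.append((start, i, current))
            start, current = None, None
        # B- opens a new span; a stray I- is treated leniently as a span start.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, current = i, tag[2:]
    if current is not None:
        spans.append((start, len(tags), current))
    return spans


def mean_aggregator(token_probs: List[np.ndarray]) -> np.ndarray:
    """Assumed aggregator: average the per-token probability vectors of a span."""
    return np.mean(token_probs, axis=0)


tags = ["O", "B-PER", "I-PER", "O", "B-LOC"]
# One row per token, one column per class in the (assumed) order O, PER, LOC.
probs = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.4, 0.5, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
])

spans = bio_to_spans(tags)
span_types = [label for _, _, label in spans]
# One aggregated probability vector per span.
span_probs = np.stack([mean_aggregator(list(probs[s:e])) for s, e, _ in spans])
```

Here `span_types` holds one type per span and `span_probs` has one row per span, so both can be indexed together after alignment.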

Parameters

noisy_labels –
predictions –
probabilities –
repeated_probabilities –
le –
span_aggregator –
function_aggregator –

Returns:

nessie.task_support.span_labeling.align_span_labeling_data(tokens: nessie.types.RaggedStringArray, gold_labels: nessie.types.RaggedStringArray, noisy_labels: nessie.types.RaggedStringArray) → AlignedData

Aligns spans from gold data with noisy ones.

If a span in the noisy labels has no match in the gold labels, it is assigned a special label that does not occur in the original data. The returned surface forms use the gold span boundaries if a match exists; otherwise they use the boundaries from the noisy data.
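
As a rough illustration of this behaviour, the sketch below pairs each noisy span with the gold span it overlaps most. It is an assumption-laden example, not the function's actual code: `overlap` and `align_noisy_to_gold` are hypothetical names, spans are assumed to be `(start, end, type)` tuples with an exclusive end, the special label is assumed to fill the gold side of an unmatched pair, and the greedy matching shown here does not enforce a one-to-one assignment the way `span_matching` may.

```python
from typing import List, Tuple

# Same value as the UNALIGNED_LABEL attribute documented for this module.
UNALIGNED = "___NESSIE_NO_ALIGNMENT___"

Span = Tuple[int, int, str]  # (start, end_exclusive, type); assumed layout


def overlap(a: Span, b: Span) -> int:
    """Number of token positions shared by two spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def align_noisy_to_gold(
    tokens: List[str], gold: List[Span], noisy: List[Span]
) -> List[Tuple[str, str, str]]:
    """Return (surface_form, gold_label, noisy_label) for every noisy span."""
    rows = []
    for n in noisy:
        # Pick the gold span with the largest overlap, if any gold spans exist.
        best = max(gold, key=lambda g: overlap(g, n), default=None)
        if best is not None and overlap(best, n) > 0:
            # Match found: use the gold boundaries and the gold label.
            start, end, gold_label = best
        else:
            # No match: keep the noisy boundaries and use the placeholder label.
            start, end, gold_label = n[0], n[1], UNALIGNED
        rows.append((" ".join(tokens[start:end]), gold_label, n[2]))
    return rows


tokens = ["Anna", "Maria", "visited", "New", "York", "yesterday"]
gold = [(0, 2, "PER"), (3, 5, "LOC")]
noisy = [(0, 1, "PER"), (3, 5, "ORG"), (5, 6, "DATE")]

rows = align_noisy_to_gold(tokens, gold, noisy)
```

In this example the first noisy span covers only "Anna", but its surface form is taken from the matching gold span, while the unmatched "yesterday" span keeps its noisy boundaries and receives the placeholder label.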