nessie.task_support.span_labeling
Module Contents
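Classes
- AlignedData
- AlignmentResult
Functions
- aggregate_scores_to_spans
- align_for_span_labeling: The goal of this function is to align existing and predicted sequence labeling annotations and their respective probabilities.
- align_span_labeling_data: Aligns spans from gold data with noisy ones.
- align_span_labeling_result
- embed_spans
- span_matching: Assume we have a list of tokens which was tagged with spans by two different approaches A and B.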
Attributes
- nessie.task_support.span_labeling.RaggedArray
- nessie.task_support.span_labeling.UNALIGNED_LABEL = ___NESSIE_NO_ALIGNMENT___
- class nessie.task_support.span_labeling.AlignedData
- gold_labels :List[str]
- noisy_labels :List[str]
- span_ids :List[SpanId]
- surface_forms :List[str]
- __len__(self) int
- property flags(self) List[bool]
- class nessie.task_support.span_labeling.AlignmentResult
- labels :numpy.typing.NDArray[str]
- le :sklearn.preprocessing.LabelEncoder
- predictions :numpy.typing.NDArray[str]
- probabilities :numpy.typing.NDArray[float]
- repeated_probabilities :Optional[numpy.typing.NDArray[float]]
- span_ids :List[SpanId]
- nessie.task_support.span_labeling.aggregate_scores_to_spans(labels: nessie.types.RaggedStringArray, scores: nessie.types.RaggedFloatArray, span_aggregator: Optional[Callable[[List[numpy.ndarray]], numpy.ndarray]] = None) awkward.Array
- nessie.task_support.span_labeling.align_for_span_labeling(noisy_labels: nessie.types.RaggedStringArray, predictions: nessie.types.RaggedStringArray, probabilities: nessie.types.RaggedFloatArray2D, repeated_probabilities: nessie.types.RaggedFloatArray3D, le: sklearn.preprocessing.LabelEncoder, span_aggregator: Optional[Callable[[List[numpy.ndarray]], numpy.ndarray]] = None, function_aggregator: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) AlignmentResult
The goal of this function is to align existing and predicted sequence labeling annotations and their respective probabilities.
Original and predicted sequence labelings are aligned so that the overlap between them is maximized (see span_matching).
BIO-tagged sequences are reduced to a list of spans with their types, e.g. [O, B-PER, I-PER, O, B-LOC] becomes [PER, LOC].
Probabilities are aggregated per span.
Spans that exist in the original sequence but have no counterpart in the predicted one receive a special label and are assigned the probability of the O label.
- Parameters
noisy_labels –
predictions –
probabilities –
repeated_probabilities –
le –
span_aggregator –
function_aggregator –
Returns:
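The reduction of BIO tags to spans and the per-span aggregation of probabilities can be illustrated with a minimal sketch. It does not use nessie; the helper names bio_to_spans and aggregate_span_scores are hypothetical, and taking the mean is only an assumed aggregation, since the aggregator is configurable via span_aggregator.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, type); hypothetical helper type


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect (start, end, type) spans from BIO tags; end is exclusive."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An open span continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # A stray I- tag without an open span is treated as a span start here.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def aggregate_span_scores(scores: List[float], spans: List[Span]) -> List[float]:
    """Assumed aggregation: mean of the token scores inside each span."""
    return [sum(scores[s:e]) / (e - s) for s, e, _ in spans]


tags = ["O", "B-PER", "I-PER", "O", "B-LOC"]
spans = bio_to_spans(tags)
span_types = [label for _, _, label in spans]  # ['PER', 'LOC']
span_scores = aggregate_span_scores([0.9, 0.5, 0.75, 0.9, 0.25], spans)  # [0.625, 0.25]
```

The span types match the example in the description above: five token tags collapse into the two spans PER and LOC, each with one aggregated score.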
- nessie.task_support.span_labeling.align_span_labeling_data(tokens: nessie.types.RaggedStringArray, gold_labels: nessie.types.RaggedStringArray, noisy_labels: nessie.types.RaggedStringArray) AlignedData
Aligns spans from gold data with noisy ones.
If a span in the noisy labels has no match in gold, it is assigned a special label that does not occur in the original data. The returned surface forms use the gold span boundaries if a match exists and the noisy span boundaries otherwise.
- Parameters
tokens – The tokens that contain the text
gold_labels – Gold labels in BIO format
noisy_labels – Noisy labels in BIO format
Returns: The alignment between gold and noisy data
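The behaviour described for unmatched noisy spans can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper align_noisy_to_gold and its noisy_to_gold mapping are hypothetical, spans are assumed to be (start, end, type) with an exclusive end, and only the case described above (a noisy span without a gold match) is covered.

```python
from typing import Dict, List, Tuple

UNALIGNED_LABEL = "___NESSIE_NO_ALIGNMENT___"  # value as listed under Attributes

Span = Tuple[int, int, str]  # (start, end_exclusive, type); hypothetical helper type


def align_noisy_to_gold(
    tokens: List[str],
    gold_spans: List[Span],
    noisy_spans: List[Span],
    noisy_to_gold: Dict[int, int],  # assumed to come from a prior span matching
) -> Tuple[List[str], List[str], List[str]]:
    gold_labels, noisy_labels, surface_forms = [], [], []
    for idx, (n_start, n_end, n_type) in enumerate(noisy_spans):
        gold_idx = noisy_to_gold.get(idx)
        if gold_idx is None:
            # No gold counterpart: use the special label and the noisy boundaries.
            gold_labels.append(UNALIGNED_LABEL)
            start, end = n_start, n_end
        else:
            # Matched: use the gold type and the gold boundaries.
            start, end, gold_type = gold_spans[gold_idx]
            gold_labels.append(gold_type)
        noisy_labels.append(n_type)
        surface_forms.append(" ".join(tokens[start:end]))
    return gold_labels, noisy_labels, surface_forms


tokens = ["Anna", "Smith", "visited", "New", "York", "yesterday"]
gold_spans = [(0, 2, "PER"), (3, 5, "LOC")]
noisy_spans = [(0, 1, "PER"), (3, 5, "ORG"), (5, 6, "DATE")]
gold, noisy, forms = align_noisy_to_gold(tokens, gold_spans, noisy_spans, {0: 0, 1: 1})
# gold  == ['PER', 'LOC', '___NESSIE_NO_ALIGNMENT___']
# forms == ['Anna Smith', 'New York', 'yesterday']
```

Note that the first surface form is "Anna Smith" rather than "Anna": the gold boundaries win when a match exists.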
- nessie.task_support.span_labeling.align_span_labeling_result(noisy_labels: nessie.types.RaggedStringArray, result: nessie.helper.RaggedResult) AlignmentResult
- nessie.task_support.span_labeling.embed_spans(X: nessie.types.RaggedStringArray, y: nessie.types.RaggedStringArray, embedder: nessie.models.featurizer.FlairTokenEmbeddingsWrapper, aggregate: Callable[[List[numpy.ndarray]], numpy.ndarray] = None) awkward.Array
- nessie.task_support.span_labeling.span_matching(tagging_A: List[Tuple[int, int]], tagging_B: List[Tuple[int, int]], keep_A: bool = False) Dict[int, int]
Assume we have a list of tokens which was tagged with spans by two different approaches A and B. This method tries to find the best 1:1 assignment of spans from B to spans from A. If one tagging contains more spans than the other, the surplus spans go unmatched. The quality of an assignment between two spans depends on their overlap in tokens. Pairs of spans that are entirely disjoint are removed from the assignment.
Note: In case A contains two (or more) spans of the same length which are a single span in B (or vice versa), either of the spans from A may be mapped to the span in B. Which exact span from A is mapped is undefined.
- Parameters
tagging_A – List of spans, defined by (start, end) token offsets (exclusive); the spans must be non-overlapping
tagging_B – A second list of spans over the same sequence, in the same format as tagging_A
keep_A – Include unmatched spans from A as [idx_A, None] in the returned value
Returns: Dict[int, int] where keys are indices from A and values are indices from B
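To make the idea of an overlap-maximizing 1:1 assignment concrete, here is a small sketch that simply tries every assignment. It is not the library's algorithm: match_spans and overlap are hypothetical names, end offsets are assumed to be exclusive, and the exhaustive search is only practical for a handful of spans.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Offsets = Tuple[int, int]  # (start, end) token offsets, end assumed exclusive


def overlap(a: Offsets, b: Offsets) -> int:
    """Number of tokens shared by two spans (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def match_spans(
    tagging_a: List[Offsets], tagging_b: List[Offsets], keep_a: bool = False
) -> Dict[int, Optional[int]]:
    # Each span in A gets either a distinct span index from B or None.
    choices: List[Optional[int]] = list(range(len(tagging_b))) + [None] * len(tagging_a)
    best: Tuple[Optional[int], ...] = (None,) * len(tagging_a)
    best_score = -1
    for assignment in permutations(choices, len(tagging_a)):
        score = sum(
            overlap(tagging_a[i], tagging_b[j])
            for i, j in enumerate(assignment)
            if j is not None
        )
        if score > best_score:  # ties keep the first assignment found
            best, best_score = assignment, score
    result: Dict[int, Optional[int]] = {}
    for i, j in enumerate(best):
        if j is not None and overlap(tagging_a[i], tagging_b[j]) > 0:
            result[i] = j  # keep only pairs that actually share tokens
        elif keep_a:
            result[i] = None  # unmatched span from A
    return result


tagging_a = [(0, 2), (3, 5), (7, 8)]
tagging_b = [(0, 1), (4, 6)]
matched = match_spans(tagging_a, tagging_b)               # {0: 0, 1: 1}
matched_keep = match_spans(tagging_a, tagging_b, True)    # {0: 0, 1: 1, 2: None}
```

The third span in A shares no tokens with any span in B, so it is dropped by default and reported with None when keep_a is set. For realistic input sizes, a polynomial-time assignment solver would be the natural replacement for the exhaustive loop.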