compare_backends

Compare CLTK NLP backends on the same text and report differences.

Example

>>> from cltk.evaluation.compare_backends import compare_backends
>>> report = compare_backends(
...     "lati1261",
...     "Amor vincit omnia.",
...     ["stanza", "openai"],
... )
>>> print(report["summary"]["agreement_rates"]["upos"])

Report schema (high-level):

report = {
    "meta": {
        "language": str,
        "backends": list[str],
        "base_backend": str,
        "timestamp": str,
        "text_hash": str,
        "cltk_version": str | None,
    },
    "backends": {
        backend: {
            "model": str | None,
            "backend_config": dict | None,
            "metadata": dict,
        },
    },
    "sentences": [
        {
            "index": int,
            "text": str | None,
            "alignment": {
                "base_backend": str,
                "ops": {backend: list[dict]},
                "strategy": {backend: str},
                "edit_distance": {backend: int},
            },
            "tokens": [
                {
                    "row": int,
                    "base_index": int | None,
                    "by_backend": {
                        backend: {
                            "index": int | None,
                            "string": str | None,
                            "lemma": str | None,
                            "upos": str | None,
                            "feats": str | None,
                            "head": int | None,
                            "deprel": str | None,
                        } | None,
                    },
                    "diff": {
                        field: {
                            "agree": bool,
                            "values": {backend: str | int | None},
                        },
                    },
                },
            ],
            "metrics": {
                "agreement_rates": {field: {pair: dict}},
            },
        },
    ],
    "summary": {
        "agreement_rates": {field: {pair: dict}},
        "most_disagreed_tokens": list[dict],
        "confusion": {
            "upos": {pair: {tag_a: {tag_b: int}}},
            "deprel": {pair: {tag_a: {tag_b: int}}},
        },
    },
}
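Because the report is made of plain dicts and lists, it can be inspected with ordinary Python. The sketch below walks `report["sentences"][*]["tokens"][*]["diff"]` and collects the rows where a given field disagrees. The helper `disagreements` is hypothetical, and the small `report` literal is hand-written to follow the schema above; its values are illustrative, not real backend output.

```python
from typing import Any


def disagreements(report: dict[str, Any], field: str) -> list[tuple[int, int, dict[str, Any]]]:
    """Collect (sentence index, row, per-backend values) where `field` disagrees."""
    found = []
    for sentence in report["sentences"]:
        for token in sentence["tokens"]:
            # Defensive: tolerate rows that have no diff entry for this field.
            entry = token["diff"].get(field)
            if entry is not None and not entry["agree"]:
                found.append((sentence["index"], token["row"], entry["values"]))
    return found


# Hand-written fragment shaped like the schema; the values are hypothetical.
report = {
    "sentences": [
        {
            "index": 0,
            "tokens": [
                {"row": 0, "diff": {"upos": {"agree": True,
                                             "values": {"stanza": "NOUN", "openai": "NOUN"}}}},
                {"row": 1, "diff": {"upos": {"agree": False,
                                             "values": {"stanza": "VERB", "openai": "AUX"}}}},
            ],
        }
    ]
}

for sent_idx, row, values in disagreements(report, "upos"):
    print(sent_idx, row, values)  # prints: 0 1 {'stanza': 'VERB', 'openai': 'AUX'}
```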

FieldName `module-attribute`

FieldName = str

COMPARE_FIELDS `module-attribute`

COMPARE_FIELDS: tuple[FieldName, ...] = (
    "tokenization",
    "lemma",
    "upos",
    "feats",
    "head",
    "deprel",
)
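The schema stores agreement as `{field: {pair: dict}}` without spelling out the inner dict. As an assumption, the sketch below computes, for each field in `COMPARE_FIELDS` and each backend pair, how many rows agree out of the rows where both backends supplied a value. The function `pairwise_agreement`, the `"a|b"` pair key, and the `agree`/`total`/`rate` keys are all hypothetical, not CLTK's actual format; the input is a list of per-row `diff` dicts as in the report schema.

```python
from itertools import combinations
from typing import Any

# Reproduced from the module constant so the sketch runs standalone.
COMPARE_FIELDS = ("tokenization", "lemma", "upos", "feats", "head", "deprel")


def pairwise_agreement(diffs: list[dict[str, Any]], backends: list[str]) -> dict[str, dict[str, dict[str, Any]]]:
    """Per field and backend pair, count rows where both values are present and equal."""
    rates: dict[str, dict[str, dict[str, Any]]] = {}
    for field in COMPARE_FIELDS:
        rates[field] = {}
        for a, b in combinations(backends, 2):
            agree = total = 0
            for diff in diffs:
                values = diff.get(field, {}).get("values", {})
                va, vb = values.get(a), values.get(b)
                if va is None or vb is None:
                    continue  # assumption: rows missing either value are excluded
                total += 1
                agree += int(va == vb)
            rates[field][f"{a}|{b}"] = {
                "agree": agree,
                "total": total,
                "rate": agree / total if total else None,
            }
    return rates


# Hypothetical diffs for two aligned rows.
diffs = [
    {"upos": {"agree": True, "values": {"stanza": "NOUN", "openai": "NOUN"}},
     "lemma": {"agree": True, "values": {"stanza": "amor", "openai": "amor"}}},
    {"upos": {"agree": False, "values": {"stanza": "VERB", "openai": "AUX"}},
     "lemma": {"agree": True, "values": {"stanza": "vinco", "openai": "vinco"}}},
]
rates = pairwise_agreement(diffs, ["stanza", "openai"])
print(rates["upos"]["stanza|openai"])  # {'agree': 1, 'total': 2, 'rate': 0.5}
```

Excluding rows where one side is missing keeps tokenization gaps from being double-counted as tag disagreements; a real implementation may choose differently.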

NormalizedToken `dataclass`

NormalizedToken(
    index: Optional[int],
    string: Optional[str],
    lemma: Optional[str],
    upos: Optional[str],
    feats: Optional[str],
    head: Optional[int],
    deprel: Optional[str],
)

Comparable token representation extracted from a CLTK word.
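Once tokens from different backends are aligned, each comparable field can be checked row by row. The sketch below mirrors the documented fields and builds a `diff` entry shaped like the report schema (`{field: {"agree": bool, "values": {backend: value}}}`). The helper names `attr_for` and `diff_row` are hypothetical, and comparing the `"tokenization"` field through the token's `string` is an assumption. A backend with no token at a row contributes `None`.

```python
from dataclasses import dataclass
from typing import Any, Optional

COMPARE_FIELDS = ("tokenization", "lemma", "upos", "feats", "head", "deprel")


@dataclass
class NormalizedToken:
    # Mirrors the documented fields; reproduced so the sketch runs standalone.
    index: Optional[int]
    string: Optional[str]
    lemma: Optional[str]
    upos: Optional[str]
    feats: Optional[str]
    head: Optional[int]
    deprel: Optional[str]


def attr_for(field: str) -> str:
    # Assumption: "tokenization" is compared via the token's surface string.
    return "string" if field == "tokenization" else field


def diff_row(by_backend: dict[str, Optional[NormalizedToken]]) -> dict[str, dict[str, Any]]:
    """Agreement means every backend reports the same value (None if no token)."""
    diff = {}
    for field in COMPARE_FIELDS:
        values = {
            name: getattr(tok, attr_for(field)) if tok is not None else None
            for name, tok in by_backend.items()
        }
        diff[field] = {"agree": len(set(values.values())) == 1, "values": values}
    return diff


# Hypothetical annotations for the same word from two backends.
a = NormalizedToken(1, "vincit", "vinco", "VERB", "Mood=Ind|Number=Sing", 0, "root")
b = NormalizedToken(1, "vincit", "vinco", "VERB", "Mood=Ind", 0, "root")
d = diff_row({"stanza": a, "openai": b})
print(d["feats"]["agree"], d["upos"]["agree"])  # False True
```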

index `instance-attribute`

index: Optional[int]

string `instance-attribute`

string: Optional[str]

lemma `instance-attribute`

lemma: Optional[str]

upos `instance-attribute`

upos: Optional[str]

feats `instance-attribute`

feats: Optional[str]

head `instance-attribute`

head: Optional[int]

deprel `instance-attribute`

deprel: Optional[str]

NormalizedSentence `dataclass`

NormalizedSentence(
    index: int,
    text: Optional[str],
    tokens: list[NormalizedToken],
)

Comparable sentence representation with normalized tokens.

index `instance-attribute`

index: int

text `instance-attribute`

text: Optional[str]

tokens `instance-attribute`

tokens: list[NormalizedToken]

AlignmentOp `dataclass`

AlignmentOp(
    op: str,
    base_index: Optional[int],
    other_index: Optional[int],
    base_token: Optional[str],
    other_token: Optional[str],
)

Single alignment operation between base and other token lists.
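One way to produce per-token operations of this shape uses the standard library's `difflib.SequenceMatcher`, whose opcodes cover token ranges. The sketch below expands each range into one `AlignmentOp` per token. This is a swapped-in technique for illustration, not necessarily how CLTK aligns; the function `difflib_ops` and the op names `"equal"`, `"replace"`, `"delete"`, and `"insert"` are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import zip_longest
from typing import Optional


@dataclass
class AlignmentOp:
    # Mirrors the documented fields; reproduced so the sketch runs standalone.
    op: str
    base_index: Optional[int]
    other_index: Optional[int]
    base_token: Optional[str]
    other_token: Optional[str]


def difflib_ops(base: list[str], other: list[str]) -> list[AlignmentOp]:
    ops = []
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Pair tokens position by position; leftover tokens in an uneven
        # "replace" range become deletions or insertions.
        for i, j in zip_longest(range(i1, i2), range(j1, j2)):
            if i is not None and j is not None:
                name = "equal" if tag == "equal" else "replace"
            elif i is not None:
                name = "delete"
            else:
                name = "insert"
            ops.append(AlignmentOp(
                name, i, j,
                base[i] if i is not None else None,
                other[j] if j is not None else None,
            ))
    return ops


ops = difflib_ops(["Amor", "vincit", "omnia", "."], ["Amor", "vincit", "omnia."])
print([(o.op, o.base_index, o.other_index) for o in ops])
# [('equal', 0, 0), ('equal', 1, 1), ('replace', 2, 2), ('delete', 3, None)]
```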

op `instance-attribute`

op: str

base_index `instance-attribute`

base_index: Optional[int]

other_index `instance-attribute`

other_index: Optional[int]

base_token `instance-attribute`

base_token: Optional[str]

other_token `instance-attribute`

other_token: Optional[str]

AlignmentResult `dataclass`

AlignmentResult(
    strategy: str, cost: int, ops: list[AlignmentOp]
)

Alignment output with ops and edit cost metadata.
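The `cost` field suggests an edit-distance alignment. The sketch below is a textbook Levenshtein alignment over token strings with unit costs, followed by a backtrace that recovers one optimal op sequence. It is an assumption about how an `AlignmentResult` might be filled, not CLTK's code; the function `levenshtein_align`, the strategy name `"levenshtein"`, and the op names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignmentOp:
    op: str
    base_index: Optional[int]
    other_index: Optional[int]
    base_token: Optional[str]
    other_token: Optional[str]


@dataclass
class AlignmentResult:
    strategy: str
    cost: int
    ops: list[AlignmentOp]


def levenshtein_align(base: list[str], other: list[str]) -> AlignmentResult:
    n, m = len(base), len(other)
    # dist[i][j] = edit distance between base[:i] and other[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if base[i - 1] == other[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # delete base token
                             dist[i][j - 1] + 1,        # insert other token
                             dist[i - 1][j - 1] + sub)  # match or substitute
    # Backtrace from the bottom-right cell, preferring diagonal moves.
    ops: list[AlignmentOp] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (base[i - 1] != other[j - 1]):
            name = "equal" if base[i - 1] == other[j - 1] else "substitute"
            ops.append(AlignmentOp(name, i - 1, j - 1, base[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(AlignmentOp("delete", i - 1, None, base[i - 1], None))
            i -= 1
        else:
            ops.append(AlignmentOp("insert", None, j - 1, None, other[j - 1]))
            j -= 1
    ops.reverse()
    return AlignmentResult(strategy="levenshtein", cost=dist[n][m], ops=ops)


result = levenshtein_align(["Amor", "vincit", "omnia", "."], ["Amor", "vincit", "omnia."])
print(result.cost)  # 2
```

When several alignments share the minimum cost, the tie-breaking order in the backtrace decides which one is reported; here a diagonal move wins over a deletion.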

strategy `instance-attribute`

strategy: str

cost `instance-attribute`

cost: int

ops `instance-attribute`

ops: list[AlignmentOp]

AlignmentRow

Bases: TypedDict

Row structure for aligned token comparisons.
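Rows of this shape can be derived from the base backend's tokens plus each other backend's alignment. In the sketch below, every base token gets a row, and a token that another backend has but the base lacks gets its own row with `base_index` set to `None`, placed after the base row it follows. The function `build_rows` is hypothetical, and its `pairs` input, a simplified list of `(base_index, other_index)` tuples taken from alignment ops, is an assumption.

```python
from typing import Any, Optional, TypedDict


class AlignmentRow(TypedDict):
    # Mirrors the documented TypedDict, written with Optional so it also runs on Python 3.9.
    base_index: Optional[int]
    by_backend: dict[str, Optional[dict[str, Any]]]


def build_rows(
    base_backend: str,
    tokens: dict[str, list[dict[str, Any]]],
    pairs: dict[str, list[tuple[Optional[int], Optional[int]]]],
) -> list[AlignmentRow]:
    base_tokens = tokens[base_backend]
    others = [b for b in tokens if b != base_backend]
    # One row per base token; other backends start as None until aligned.
    rows = [
        AlignmentRow(base_index=i,
                     by_backend={b: (tok if b == base_backend else None) for b in tokens})
        for i, tok in enumerate(base_tokens)
    ]
    # Tokens with no base counterpart go after the base row they follow
    # (-1 means before the first base row).
    extra: dict[int, list[AlignmentRow]] = {}
    for backend in others:
        last_base = -1
        for base_i, other_i in pairs[backend]:
            if base_i is not None:
                last_base = base_i
                rows[base_i]["by_backend"][backend] = (
                    tokens[backend][other_i] if other_i is not None else None
                )
            elif other_i is not None:
                by_backend: dict[str, Optional[dict[str, Any]]] = {b: None for b in tokens}
                by_backend[backend] = tokens[backend][other_i]
                extra.setdefault(last_base, []).append(
                    AlignmentRow(base_index=None, by_backend=by_backend)
                )
    ordered: list[AlignmentRow] = list(extra.get(-1, []))
    for i, row in enumerate(rows):
        ordered.append(row)
        ordered.extend(extra.get(i, []))
    return ordered


tokens = {
    "stanza": [{"string": "a"}, {"string": "b"}],
    "openai": [{"string": "a"}, {"string": "x"}, {"string": "b"}],
}
rows = build_rows("stanza", tokens, {"openai": [(0, 0), (None, 1), (1, 2)]})
print([r["base_index"] for r in rows])  # [0, None, 1]
```

In this sketch, a row's position in the returned list could serve as the `row` value in the report, but that correspondence is an assumption.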

base_index `instance-attribute`

base_index: Optional[int]

by_backend `instance-attribute`

by_backend: dict[str, dict[str, Any] | None]