ud_pos
Universal Dependencies (UD) part‑of‑speech (POS) tags.
This module defines the core UD POS tag inventory and provides small, validated data models and helpers for working with POS tags in CLTK.
References
- UD POS: https://universaldependencies.org/u/pos/index.html
UD_POS_TAGS
module-attribute
UD_POS_TAGS: dict[str, UDPartOfSpeech] = {
    "ADJ": UDPartOfSpeech(
        tag="ADJ",
        name="adjective",
        description="Adjectives are words that typically modify nouns and specify their properties or attributes.",
        open_class=True,
    ),
    "ADP": UDPartOfSpeech(
        tag="ADP",
        name="adposition",
        description="Adpositions are words that introduce prepositional or postpositional phrases. They typically express relationships in space or time.",
        open_class=False,
    ),
    "ADV": UDPartOfSpeech(
        tag="ADV",
        name="adverb",
        description="Adverbs modify verbs, adjectives, other adverbs, or whole clauses. They often express time, manner, place, or degree.",
        open_class=True,
    ),
    "AUX": UDPartOfSpeech(
        tag="AUX",
        name="auxiliary",
        description="Auxiliary verbs accompany the main verb and express grammatical distinctions such as tense, aspect, mood, or voice.",
        open_class=False,
    ),
    "CCONJ": UDPartOfSpeech(
        tag="CCONJ",
        name="coordinating conjunction",
        description="Coordinating conjunctions link words, phrases, or clauses that are syntactically equal.",
        open_class=False,
    ),
    "DET": UDPartOfSpeech(
        tag="DET",
        name="determiner",
        description="Determiners modify nouns and express reference, quantity, possession, etc. They include articles, demonstratives, possessives, etc.",
        open_class=False,
    ),
    "INTJ": UDPartOfSpeech(
        tag="INTJ",
        name="interjection",
        description="Interjections are words or phrases that express emotion, hesitation, or fillers. They often stand alone outside sentence structure.",
        open_class=True,
    ),
    "NOUN": UDPartOfSpeech(
        tag="NOUN",
        name="noun",
        description="Nouns are a part of speech typically denoting a person, place, thing, animal or idea.",
        open_class=True,
    ),
    "NUM": UDPartOfSpeech(
        tag="NUM",
        name="numeral",
        description="Numerals are words that express numbers or quantities.",
        open_class=False,
    ),
    "PART": UDPartOfSpeech(
        tag="PART",
        name="particle",
        description="Particles are function words that do not fit well into other categories and are often used to express grammatical relationships.",
        open_class=False,
    ),
    "PRON": UDPartOfSpeech(
        tag="PRON",
        name="pronoun",
        description="Pronouns substitute for nouns or noun phrases and often encode grammatical features such as person, number, and gender.",
        open_class=False,
    ),
    "PROPN": UDPartOfSpeech(
        tag="PROPN",
        name="proper noun",
        description="Proper nouns are names of specific people, places, organizations, etc., and are usually capitalized.",
        open_class=True,
    ),
    "PUNCT": UDPartOfSpeech(
        tag="PUNCT",
        name="punctuation",
        description="Punctuation marks are non-alphabetic symbols that structure and organize written language.",
        open_class=False,
    ),
    "SCONJ": UDPartOfSpeech(
        tag="SCONJ",
        name="subordinating conjunction",
        description="Subordinating conjunctions introduce dependent (subordinate) clauses and indicate relationships such as cause, time, or condition.",
        open_class=False,
    ),
    "SYM": UDPartOfSpeech(
        tag="SYM",
        name="symbol",
        description="Symbols are non-verbal characters used to represent concepts or quantities (e.g., currency, math, music).",
        open_class=False,
    ),
    "VERB": UDPartOfSpeech(
        tag="VERB",
        name="verb",
        description="Verbs are words that typically denote actions, processes, or states and agree with the subject in person and number.",
        open_class=True,
    ),
    "X": UDPartOfSpeech(
        tag="X",
        name="other",
        description="This category is used for words that do not fit into any other category, such as foreign words or unclassified items.",
        open_class=False,
    ),
}
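Because the registry is a plain dict keyed by tag, lookups and filters are ordinary dict operations. The following is a minimal sketch, not CLTK code: `PosEntry` and `REGISTRY_EXCERPT` are hypothetical stand-ins (a stdlib dataclass and a three-entry excerpt) so that the example runs without CLTK or pydantic installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PosEntry:
    """Hypothetical stand-in for UDPartOfSpeech (the real class is a pydantic model)."""

    tag: str
    name: str
    open_class: bool


# Three entries copied from UD_POS_TAGS; descriptions omitted for brevity.
REGISTRY_EXCERPT: dict[str, PosEntry] = {
    "ADJ": PosEntry(tag="ADJ", name="adjective", open_class=True),
    "ADP": PosEntry(tag="ADP", name="adposition", open_class=False),
    "NOUN": PosEntry(tag="NOUN", name="noun", open_class=True),
}

# Look up a single entry by its tag.
noun = REGISTRY_EXCERPT["NOUN"]

# Collect the open-class tags from the excerpt, in sorted order.
open_tags = sorted(t for t, e in REGISTRY_EXCERPT.items() if e.open_class)
```

The same comprehension should work against the full `UD_POS_TAGS` dict, since its values expose an `open_class` attribute.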
UDPartOfSpeech
Bases: BaseModel
Canonical UD POS definition.
Encapsulates a single UD POS tag together with its human‑readable name, brief description, and whether it is considered an open or closed class in the UD taxonomy.
Attributes:
- tag (str) – Short UD tag (e.g., "ADJ", "NOUN").
- name (str) – Human‑readable name (e.g., "adjective").
- description (str) – Official UD description for the POS tag.
- open_class (bool) – Whether the POS is an open class (True) or closed (False).
UDPartOfSpeechTag
Bases: BaseModel
Concrete tag instance attached to a token.
Validates that the provided tag is a known UD POS tag (optionally
normalizing common variants), and fills in convenience fields like
name and open_class from the canonical registry.
Attributes:
- tag (str) – UD POS tag abbreviation (e.g., "ADJ").
- name (Optional[str]) – Auto‑filled human‑readable name once validated.
- open_class (Optional[bool]) – Auto‑filled open/closed class flag once validated.
normalize_ud_pos_tag
staticmethod
Normalize a POS tag to the standard UD tag used in UD_POS_TAGS.
Handles common LLM and upstream tagging errors, for example mapping the UD v1 tag "CONJ" to "CCONJ".
Parameters:
- tag (str) – The POS tag to normalize (e.g., "CONJ", "N", "V", "PRP").
Returns:
- str – The normalized UD POS tag (e.g., "NOUN", "VERB").
Raises:
- ValueError – If the tag cannot be normalized to a known UD POS tag.
Source code in cltk/morphosyntax/ud_pos.py
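The normalization contract described above (map known variants, then reject anything still unknown) can be sketched as follows. This is an illustration under stated assumptions, not the CLTK implementation: the function name `normalize_tag_sketch` and the `ALIASES` table are hypothetical, only the "CONJ" to "CCONJ" mapping is documented, and the whitespace stripping and upper-casing are assumed rather than confirmed behavior.

```python
# Canonical UD tags, as listed in UD_POS_TAGS.
UD_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})

# Hypothetical alias table. Only "CONJ" -> "CCONJ" is documented;
# the other entries are assumptions made for illustration.
ALIASES = {"CONJ": "CCONJ", "N": "NOUN", "V": "VERB", "PRP": "PRON"}


def normalize_tag_sketch(tag: str) -> str:
    """Return the canonical UD tag for `tag`, or raise ValueError."""
    # Assumed clean-up step: trim whitespace and upper-case the input.
    candidate = tag.strip().upper()
    # Replace a known variant with its canonical tag, if there is one.
    candidate = ALIASES.get(candidate, candidate)
    if candidate not in UD_TAGS:
        raise ValueError(f"Unknown UD POS tag: {tag!r}")
    return candidate
```

Keeping the alias table separate from the canonical set means a new variant can be added without touching the validation check.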
validate_tag
classmethod
Normalize and validate the UD POS tag.
Converts common alternates (e.g., "CONJ" → "CCONJ") and ensures
the final value appears in the UD_POS_TAGS registry.
Parameters:
- v (str) – Candidate POS tag value.
Raises:
- ValueError – If the tag cannot be normalized to a known UD POS tag.
Returns:
- str – The validated and normalized POS tag.
Source code in cltk/morphosyntax/ud_pos.py
fill_fields
Populate derived fields from the canonical registry.
After a successful tag validation, look up the canonical entry in
UD_POS_TAGS and set name and open_class accordingly.
Returns:
- UDPartOfSpeechTag – The model instance with enriched fields.
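The two-step flow (validate the tag, then fill derived fields from the registry) can be approximated with a stdlib dataclass. This is only a sketch: the real class is a pydantic model, and `TagSketch`, `REGISTRY`, and the single-entry alias map are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Excerpt of the registry as (name, open_class) pairs; values taken from UD_POS_TAGS.
REGISTRY = {
    "CCONJ": ("coordinating conjunction", False),
    "NOUN": ("noun", True),
    "VERB": ("verb", True),
}


@dataclass
class TagSketch:
    """Hypothetical stand-in for UDPartOfSpeechTag."""

    tag: str
    name: Optional[str] = None
    open_class: Optional[bool] = None

    def __post_init__(self) -> None:
        # Step 1 (validate_tag analogue): normalize, then check the registry.
        tag = {"CONJ": "CCONJ"}.get(self.tag, self.tag)
        if tag not in REGISTRY:
            raise ValueError(f"Unknown UD POS tag: {self.tag!r}")
        self.tag = tag
        # Step 2 (fill_fields analogue): copy derived fields from the registry.
        self.name, self.open_class = REGISTRY[tag]


token_tag = TagSketch(tag="CONJ")
```

Here `token_tag.tag` ends up as "CCONJ", with `name` and `open_class` taken from the registry entry, so callers only ever need to supply the tag.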