processes
Sentence splitting processes.
This module exposes a lightweight, language-aware sentence splitter built
around per-language regular expressions, with each language identified by its Glottolog code.
It defines a generic SentenceSplittingProcess and many concrete
subclasses, one per language or historical language stage.
SentenceSplittingProcess
Bases: Process
Base class for sentence splitting processes.
Subclasses set glottolog_id and inherit the default algorithm,
which delegates to a multi‑language regex splitter.
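The inheritance pattern described above can be sketched as follows. This is a self-contained stand-in, not CLTK's code. The registry `SUPPORTED`, the placeholder splitter `_whole_text` and both `Sketch*` classes are hypothetical names, and `lati1261` is used as the Glottolog code for Latin. The sketch assumes `algorithm` is backed by `functools.cached_property`. That fits the "cached property" label, but this reference does not confirm it.

```python
from functools import cached_property
from typing import Callable, Optional

# (text, glottolog_id) -> list of (start, stop) offsets, as documented for `algorithm`.
Splitter = Callable[[str, str], list[tuple[int, int]]]


def _whole_text(text: str, glottolog_id: str) -> list[tuple[int, int]]:
    """Hypothetical placeholder: treat the entire text as one sentence."""
    return [(0, len(text))] if text else []


# Hypothetical registry mapping Glottolog codes to boundary functions.
SUPPORTED: dict[str, Splitter] = {"lati1261": _whole_text}


class SketchSentenceSplittingProcess:
    """Stand-in for SentenceSplittingProcess; not the real class."""

    glottolog_id: Optional[str] = None

    @cached_property
    def algorithm(self) -> Splitter:
        # Resolved on first access, then stored on the instance.
        if self.glottolog_id not in SUPPORTED:
            raise ValueError(f"Unsupported glottolog_id: {self.glottolog_id!r}")
        return SUPPORTED[self.glottolog_id]


class SketchLatinSentenceSplittingProcess(SketchSentenceSplittingProcess):
    # A concrete subclass only pins the language code.
    glottolog_id = "lati1261"


process = SketchLatinSentenceSplittingProcess()
print(process.algorithm("Gallia est omnis divisa.", process.glottolog_id))  # [(0, 24)]
```

Keeping the language code as the only per-subclass difference keeps every subclass a one-liner. Lookup failures then surface as a `ValueError` on first access to `algorithm`.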
Attributes:
- glottolog_id (Optional[str]) – Target language Glottolog code, used to choose the punctuation rules for sentence boundaries.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
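A boundary function with the documented `(text, glottolog_id) -> list[tuple[int, int]]` signature might look like the minimal sketch below. The terminator table `_TERMINATORS` and its patterns are assumptions for illustration, because CLTK's actual per-language rules are not shown in this reference. The Glottolog codes `lati1261` (Latin) and `anci1242` (Ancient Greek) key the table. The sketch also treats `stop` as exclusive, matching Python slicing, which is another assumption.

```python
import re

# Hypothetical terminator patterns keyed by Glottolog code (illustrative only).
_TERMINATORS: dict[str, re.Pattern] = {
    "lati1261": re.compile(r"[.!?]+"),
    # The Greek question mark may be encoded as U+003B or U+037E.
    "anci1242": re.compile(r"[.;\u037e]+"),
}


def split_sentences(text: str, glottolog_id: str) -> list[tuple[int, int]]:
    """Return one (start, stop) character span per sentence."""
    try:
        terminators = _TERMINATORS[glottolog_id]
    except KeyError:
        raise ValueError(f"Unsupported glottolog_id: {glottolog_id!r}") from None

    spans: list[tuple[int, int]] = []
    start = 0
    for match in terminators.finditer(text):
        stop = match.end()
        # Skip whitespace so each span begins at the sentence's first character.
        while start < stop and text[start].isspace():
            start += 1
        if start < stop:
            spans.append((start, stop))
        start = stop

    # Text after the last terminator still forms a final sentence.
    end = len(text.rstrip())
    while start < end and text[start].isspace():
        start += 1
    if start < end:
        spans.append((start, end))
    return spans


text = "Veni. Vidi. Vici."
print([text[a:b] for a, b in split_sentences(text, "lati1261")])
# ['Veni.', 'Vidi.', 'Vici.']
```

Returning offsets instead of substrings lets later stages map sentences back onto the original text without copying it.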
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
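The `run` contract can be sketched with a dataclass standing in for Doc. The contract has three steps: validate the inputs, compute the spans, and return a shallow copy. `SketchDoc`, `run_sketch`, `whole_text` and the `splitter` parameter are hypothetical names. The real Doc has more fields, and whether CLTK copies it with `dataclasses.replace` is an assumption.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Optional

# Same shape as the documented boundary function: (text, glottolog_id) -> offsets.
Splitter = Callable[[str, str], list[tuple[int, int]]]


@dataclass
class SketchDoc:
    """Hypothetical stand-in for Doc, holding only the fields used here."""

    normalized_text: Optional[str] = None
    sentence_boundaries: list[tuple[int, int]] = field(default_factory=list)


def run_sketch(
    input_doc: SketchDoc, glottolog_id: Optional[str], splitter: Splitter
) -> SketchDoc:
    """Return a shallow copy of input_doc with sentence_boundaries filled in."""
    if input_doc.normalized_text is None:
        raise ValueError("input_doc.normalized_text is missing")
    if glottolog_id is None:
        raise ValueError("glottolog_id is not set on the process")
    boundaries = splitter(input_doc.normalized_text, glottolog_id)
    # replace() builds a new SketchDoc; fields it does not override are shared, not copied.
    return replace(input_doc, sentence_boundaries=boundaries)


def whole_text(text: str, glottolog_id: str) -> list[tuple[int, int]]:
    """Trivial demo splitter: the entire text is one sentence."""
    return [(0, len(text))]


doc = SketchDoc(normalized_text="Arma virumque cano.")
updated = run_sketch(doc, "lati1261", whole_text)
print(updated.sentence_boundaries)  # [(0, 19)]
print(doc.sentence_boundaries)  # [] (the input is left unchanged)
```

If stop is exclusive (an assumption, since the reference does not say), `[text[a:b] for a, b in updated.sentence_boundaries]` recovers the sentence strings from normalized_text.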
LycianASentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Lycian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
LydianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Lydian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
PalaicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Palaic.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
CarianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Carian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
CuneiformLuwianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Cuneiform Luwian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
HieroglyphicLuwianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Hieroglyphic Luwian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalArmenianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Armenian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
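This reference does not show which characters the Classical Armenian rule treats as terminators, so the sketch below is a hypothetical illustration only. The Armenian full stop (։, U+0589) ends a sentence. The Armenian question mark (՞, U+055E), however, sits over a vowel inside the word, so a rule that split on it would cut words in half. The sketch also accepts an ASCII colon as a terminator. That is an assumption, based on digital texts that type `:` in place of U+0589. The pattern `ARMENIAN_SENTENCE` and the function name are hypothetical.

```python
import re

# Hypothetical rule: a sentence runs up to U+0589 ARMENIAN FULL STOP or an ASCII ':'.
# U+055E (Armenian question mark) is deliberately not a terminator: it appears inside words.
ARMENIAN_SENTENCE = re.compile(r"[^\s\u0589:][^\u0589:]*[\u0589:]*")


def armenian_sentence_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, stop) offsets for each sentence-like span."""
    spans = []
    for match in ARMENIAN_SENTENCE.finditer(text):
        start, stop = match.span()
        # An unterminated final span may end in whitespace; trim it.
        while stop > start and text[stop - 1].isspace():
            stop -= 1
        spans.append((start, stop))
    return spans


sample = "Ո՞վ ես դու\u0589 Ես եմ\u0589"
print([sample[a:b] for a, b in armenian_sentence_spans(sample)])
```

Accepting `:` trades precision for recall: it also fires on genuine colons, which a real rule might need to rule out.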
MiddleArmenianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle Armenian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
AkkadianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Akkadian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
AncientGreekSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Ancient Greek.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
AncientHebrewSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Ancient Hebrew.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalSyriacSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Syriac.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalTibetanSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Tibetan.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
CopticSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Coptic.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
LatinSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Latin.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OfficialAramaicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Official Aramaic.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldEnglishSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old English.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldNorseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Norse.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
PaliSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Pali.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalSanskritSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Sanskrit.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
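The reference does not state the Classical Sanskrit rule, so the sketch below is hypothetical. Devanagari texts mark boundaries with the danda (।, U+0964) and the double danda (॥, U+0965). In verse, a single danda usually closes a half-verse rather than a sentence, so treating both marks as terminators over-splits verse. The `split_on_single_danda` flag makes that trade-off explicit. The function and the flag are illustrative names, not CLTK parameters. The demo line is the opening verse of the Bhagavad Gītā in IAST with Devanagari dandas.

```python
import re


def danda_spans(text: str, split_on_single_danda: bool = True) -> list[tuple[int, int]]:
    """Return (start, stop) offsets, ending spans at dandas (hypothetical rule)."""
    # U+0964 DEVANAGARI DANDA, U+0965 DEVANAGARI DOUBLE DANDA.
    marks = "\u0964\u0965" if split_on_single_danda else "\u0965"
    # A span starts at a non-space, non-mark character and runs through the next mark(s).
    pattern = re.compile(rf"[^\s{marks}][^{marks}]*[{marks}]*")
    spans = []
    for match in pattern.finditer(text):
        start, stop = match.span()
        # Trim trailing whitespace from an unterminated final span.
        while stop > start and text[stop - 1].isspace():
            stop -= 1
        spans.append((start, stop))
    return spans


verse = (
    "dharmakṣetre kurukṣetre samavetā yuyutsavaḥ\u0964 "
    "māmakāḥ pāṇḍavāś caiva kim akurvata sañjaya\u0965"
)
print(len(danda_spans(verse)))  # 2 (one span per half-verse)
print(len(danda_spans(verse, split_on_single_danda=False)))  # 1 (the whole verse)
```

For prose, splitting on both marks is usually what you want. For verse, the second setting keeps each verse intact.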
VedicSanskritSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Vedic Sanskrit.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalArabicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Arabic.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ChurchSlavonicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Church Slavonic.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleEnglishSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle English.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleFrenchSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle French.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddlePersianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle Persian.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldFrenchSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old French.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleHighGermanSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle High German.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldHighGermanSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old High German.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
GothicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Gothic.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
HindiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Hindi.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
KhariBoliSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Khari Boli (Hindi dialect).
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
BrajSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Braj Bhasha.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
AwadhiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Awadhi.
algorithm (cached property)
Return the language-appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets, one per sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
UrduSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Urdu.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_idis not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
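Urdu is written right to left, but the offsets a splitter returns index the string in logical (storage) order, so text direction needs no special handling. The sketch below makes an illustrative assumption about terminators: the Arabic full stop ۔ (U+06D4), the Arabic question mark ؟ (U+061F) and "!". The names `URDU_SENTENCE` and `split_urdu` are hypothetical. This is not CLTK's actual Urdu rule.

```python
import re

# Assumed terminators: Arabic full stop (U+06D4), Arabic question mark (U+061F), "!".
URDU_SENTENCE = re.compile(r"[^\u06D4\u061F!]+[\u06D4\u061F!]*")


def split_urdu(text: str) -> list[tuple[int, int]]:
    """Return (start, stop) offsets in logical order, trimmed of whitespace."""
    spans = []
    for match in URDU_SENTENCE.finditer(text):
        chunk = match.group()
        start = match.start() + len(chunk) - len(chunk.lstrip())
        stop = match.end() - (len(chunk) - len(chunk.rstrip()))
        if start < stop:
            spans.append((start, stop))
    return spans


text = "یہ کتاب ہے۔ کیا یہ قلم ہے؟"
for start, stop in split_urdu(text):
    print(start, stop, text[start:stop])
```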
LiteraryChineseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Chinese.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
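Classical Chinese has no spaces between words, and punctuated editions often close a quotation after the sentence-final mark, as in ？」. The sketch below assumes the terminators 。！？ followed by optional closing quotes 」』”, and treats unterminated trailing text as a final sentence. The pattern and the name `split_chinese` are illustrative assumptions, not CLTK's rule. Many premodern texts are unpunctuated; on such input, a punctuation-based splitter like this one can only return the whole text as one span.

```python
import re

# Terminators 。！？ (U+3002, U+FF01, U+FF1F), then optional closing quotes
# 」』” (U+300D, U+300F, U+201D); a trailing unterminated run is also kept.
CJK_SENTENCE = re.compile(
    r"[^\u3002\uFF01\uFF1F]+[\u3002\uFF01\uFF1F]+[\u300D\u300F\u201D]*"
    r"|[^\u3002\uFF01\uFF1F]+$"
)


def split_chinese(text: str) -> list[tuple[int, int]]:
    """Return raw (start, stop) match offsets; whitespace is not trimmed."""
    return [match.span() for match in CJK_SENTENCE.finditer(text)]


text = "子曰：「學而時習之，不亦說乎？」有朋自遠方來。"
print(split_chinese(text))  # [(0, 16), (16, 23)]
```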
OldChineseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Chinese.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleChineseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle Chinese.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
BaihuaChineseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Early Vernacular Chinese (Baihua).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
PanjabiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Panjabi.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ParthianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Parthian.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
DemoticSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Demotic Egyptian.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
BengaliSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Bengali.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
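Bengali traditionally ends sentences with the danda ।. Unicode encodes the danda (U+0964) and double danda ॥ (U+0965) once, in the Devanagari block, for use across Brahmi-derived scripts, so one character class can serve several of the languages listed here. Offsets count code points, not user-perceived characters: a vowel sign such as the ি in আমি occupies its own index. The sketch below assumes that "?" and "!" also end sentences. The pattern and the name `split_indic` are illustrative assumptions, not CLTK's rules.

```python
import re

# Danda (U+0964), double danda (U+0965), plus "?" and "!" (assumed).
INDIC_SENTENCE = re.compile(r"[^\u0964\u0965?!]+[\u0964\u0965?!]*")


def split_indic(text: str) -> list[tuple[int, int]]:
    """Return whitespace-trimmed (start, stop) code-point offsets."""
    spans = []
    for match in INDIC_SENTENCE.finditer(text):
        chunk = match.group()
        start = match.start() + len(chunk) - len(chunk.lstrip())
        stop = match.end() - (len(chunk) - len(chunk.rstrip()))
        if start < stop:
            spans.append((start, stop))
    return spans


text = "আমি ভাত খাই। তুমি কী খাও?"
for start, stop in split_indic(text):
    print(start, stop, text[start:stop])
```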
OdiaSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Odia (Oriya).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
AssameseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Assamese.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
GujaratiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Gujarati.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MarathiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Marathi.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
BagriSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Bagri (Rajasthani).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
SinhalaSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Sinhala.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
SindhiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Sindhi.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
KashmiriSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Kashmiri.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldBurmeseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Burmese.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalBurmeseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Burmese.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
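Burmese punctuation distinguishes two marks:

- the section mark ။ (U+104B), which ends a sentence;
- the little section mark ၊ (U+104A), which separates clauses much like a comma.

The sketch below assumes only ။ is a terminator, so ၊ stays inside a sentence. The names are hypothetical, and the rule is an illustration, not necessarily what CLTK applies to Old or Classical Burmese.

```python
import re

# Only the section mark (U+104B) ends a sentence; the little section mark
# (U+104A) is left inside the span.
BURMESE_SENTENCE = re.compile(r"[^\u104B]+\u104B*")


def split_burmese(text: str) -> list[tuple[int, int]]:
    """Return whitespace-trimmed (start, stop) offsets."""
    spans = []
    for match in BURMESE_SENTENCE.finditer(text):
        chunk = match.group()
        start = match.start() + len(chunk) - len(chunk.lstrip())
        stop = match.end() - (len(chunk) - len(chunk.rstrip()))
        if start < stop:
            spans.append((start, stop))
    return spans


# Letters KA, KHA, GA (U+1000..U+1002) stand in for real words.
print(split_burmese("\u1000\u104A \u1001\u104B \u1002\u104B"))  # [(0, 5), (6, 8)]
```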
TangutSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Tangut.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
NewarSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Newar (Classical Nepal Bhasa).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MeiteiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Meitei (Classical Manipuri).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
SgawKarenSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Sgaw Karen.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleMongolSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle Mongol.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalMongolianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Mongolian.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
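Traditional Mongolian script has its own punctuation: the Mongolian full stop ᠃ (U+1803) ends a sentence, while the Mongolian comma ᠂ (U+1802) does not. The sketch below assumes those roles. For contrast with the regex sketches, it uses a plain linear scan instead of a regular expression. The function name and its `terminators` parameter are hypothetical, not CLTK's API.

```python
MONGOLIAN_FULL_STOP = "\u1803"  # assumed sentence terminator
# The Mongolian comma (U+1802) is deliberately not listed.


def split_mongolian(text: str, terminators: str = MONGOLIAN_FULL_STOP) -> list[tuple[int, int]]:
    """Linear scan: close a span after each run of terminator characters."""
    spans = []
    start = 0
    i = 0
    n = len(text)
    while i < n:
        if text[i] in terminators:
            # Absorb consecutive terminators into the current sentence.
            while i < n and text[i] in terminators:
                i += 1
            spans.append((start, i))
            start = i
        else:
            i += 1
    if start < n:
        spans.append((start, n))  # trailing text without a terminator
    # Trim whitespace at both ends of every span and drop empty ones.
    trimmed = []
    for s, e in spans:
        while s < e and text[s].isspace():
            s += 1
        while e > s and text[e - 1].isspace():
            e -= 1
        if s < e:
            trimmed.append((s, e))
    return trimmed


# Letters A and E (U+1820, U+1821) stand in for real words.
print(split_mongolian("\u1820\u1802\u1821\u1803 \u1821\u1803"))  # [(0, 4), (5, 7)]
```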
MogholiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Mogholi (Moghol).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
NumidianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Numidian (Ancient Berber).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
TaitaSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Taita Cushitic.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
HausaSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Hausa.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldJurchenSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Jurchen.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldJapaneseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Japanese.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldHungarianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Hungarian.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ChagataiSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Chagatai.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldTurkicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Turkic.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldTamilSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Tamil.
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
HittiteSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Hittite (hitt1242).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
TocharianASentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Tocharian A (toch1238).
algorithm
cached
property
Return the language‑appropriate sentence boundary function.
The returned callable takes (text, glottolog_id) and
returns a list of (start, stop) character offsets for each
sentence.
Returns:
-
Callable[[str, str], list[tuple[int, int]]]–A callable implementing sentence boundary detection.
Raises:
-
ValueError–If the
glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
-
input_doc(Doc) –Document whose
normalized_text will be segmented.
Returns:
-
Doc–A shallow copy of
input_doc with sentence_boundaries set to a list of
(start, stop) character indices.
Raises:
-
ValueError–If
normalized_textis missing or ifglottolog_idis not set on the process.
Source code in cltk/sentence/processes.py
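The run contract (validate inputs, compute boundaries, return a shallow copy) can be sketched as a free function. This is an illustration only: the Doc dataclass below is a hypothetical stand-in with just two fields, the glottolog_id and algorithm are passed in explicitly rather than read from the process, and dataclasses.replace is one assumed way to produce the shallow copy.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

Boundaries = list[tuple[int, int]]


@dataclass
class Doc:
    """Hypothetical stand-in for cltk's Doc with only the fields used here."""
    normalized_text: Optional[str] = None
    sentence_boundaries: Optional[Boundaries] = None


def run(
    input_doc: Doc,
    glottolog_id: Optional[str],
    algorithm: Callable[[str, str], Boundaries],
) -> Doc:
    """Mirror the documented contract of run() as a free function."""
    if input_doc.normalized_text is None:
        raise ValueError("normalized_text is missing")
    if glottolog_id is None:
        raise ValueError("glottolog_id is not set on the process")
    boundaries = algorithm(input_doc.normalized_text, glottolog_id)
    # dataclasses.replace builds a new Doc; other field values are shared, not deep-copied.
    return replace(input_doc, sentence_boundaries=boundaries)


def one_sentence(text: str, glottolog_id: str) -> Boundaries:
    """Trivial boundary function: the whole text is one sentence."""
    return [(0, len(text))] if text else []


doc = Doc(normalized_text="abc")
out = run(doc, "toch1238", one_sentence)
print(out.sentence_boundaries, doc.sentence_boundaries)  # [(0, 3)] None
```

The input document is left untouched, which is what makes it safe to rerun a pipeline stage on the same Doc.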
TocharianBSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Tocharian B (toch1237).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
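Per the module description, each concrete subclass only sets glottolog_id and inherits the default algorithm. A minimal sketch of that pattern with dataclasses follows; the *Sketch class names and the describe method are hypothetical and are not the CLTK classes, which derive from Process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentenceSplittingSketch:
    """Hypothetical base: shared behaviour lives here, language code is unset."""
    glottolog_id: Optional[str] = None

    def describe(self) -> str:
        # Shared method inherited unchanged by every language subclass.
        return f"{type(self).__name__} -> {self.glottolog_id}"


@dataclass
class HittiteSketch(SentenceSplittingSketch):
    glottolog_id: Optional[str] = "hit1242"


@dataclass
class TocharianASketch(SentenceSplittingSketch):
    glottolog_id: Optional[str] = "toch1238"


@dataclass
class TocharianBSketch(SentenceSplittingSketch):
    glottolog_id: Optional[str] = "toch1237"


for cls in (HittiteSketch, TocharianASketch, TocharianBSketch):
    print(cls().describe())
```

Adding a language then costs one short subclass, which is why this page lists so many near-identical entries.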
AvestanSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Avestan (aves1237).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
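Because run stores only offsets, callers recover sentence strings by slicing normalized_text. The helper below is a hypothetical sketch that also checks the spans are ordered, non-overlapping, and in range; it assumes stop is exclusive, which this page does not state explicitly.

```python
def sentences_from_offsets(text: str, boundaries: list[tuple[int, int]]) -> list[str]:
    """Slice text into sentences after sanity-checking the (start, stop) spans."""
    previous_stop = 0
    for start, stop in boundaries:
        # Spans must be non-empty, inside the text, and must not overlap earlier ones.
        if not (previous_stop <= start < stop <= len(text)):
            raise ValueError(f"Invalid span {(start, stop)} for text of length {len(text)}")
        previous_stop = stop
    return [text[start:stop] for start, stop in boundaries]


print(sentences_from_offsets("ab. cd.", [(0, 3), (4, 7)]))  # ['ab.', 'cd.']
```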
BactrianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Bactrian (bact1239).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
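algorithm is marked as a cached property: the boundary function is resolved on first access and reused afterwards. The sketch below shows that behaviour with functools.cached_property; the ProcessSketch class and its lookups counter are hypothetical, and the trivial returned function stands in for real language rules.

```python
from functools import cached_property
from typing import Callable

Boundaries = list[tuple[int, int]]


class ProcessSketch:
    """Hypothetical process showing how a cached algorithm property behaves."""

    def __init__(self, glottolog_id: str) -> None:
        self.glottolog_id = glottolog_id
        self.lookups = 0  # counts how many times the property body runs

    @cached_property
    def algorithm(self) -> Callable[[str, str], Boundaries]:
        self.lookups += 1

        # Placeholder rule: treat the whole text as one sentence.
        def whole_text(text: str, glottolog_id: str) -> Boundaries:
            return [(0, len(text))] if text else []

        return whole_text


process = ProcessSketch("bact1239")
first = process.algorithm
second = process.algorithm
print(first is second, process.lookups)  # True 1
```

cached_property stores the result in the instance's __dict__, so the cache is per process instance rather than shared across instances.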
SogdianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Sogdian (sogd1245).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
KhotaneseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Khotanese (khot1251).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
TumshuqeseSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Tumshuqese (tums1237).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldPersianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Old Persian (oldp1254).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
EarlyIrishSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Old Irish (oldi1245).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
UgariticSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Ugaritic (ugar1238).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
PhoenicianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Phoenician (phoe1239).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
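Since each subclass is keyed by a Glottolog code, a caller holding only a code can map it to the matching process. The dictionary below is a hypothetical lookup table built from codes listed in this page's docstrings; it maps to class names as strings so the sketch stays self-contained, and it mirrors the documented ValueError for unsupported codes.

```python
# Hypothetical lookup built from Glottolog codes named in the class docstrings.
PROCESS_BY_GLOTTOLOG: dict[str, str] = {
    "hit1242": "HittiteSentenceSplittingProcess",
    "toch1238": "TocharianASentenceSplittingProcess",
    "toch1237": "TocharianBSentenceSplittingProcess",
    "aves1237": "AvestanSentenceSplittingProcess",
    "oldp1254": "OldPersianSentenceSplittingProcess",
    "ugar1238": "UgariticSentenceSplittingProcess",
    "phoe1239": "PhoenicianSentenceSplittingProcess",
}


def process_for(glottolog_id: str) -> str:
    """Return the process class name for a Glottolog code, or raise ValueError."""
    try:
        return PROCESS_BY_GLOTTOLOG[glottolog_id]
    except KeyError:
        # Re-raise as ValueError to match the documented error type.
        raise ValueError(f"Unsupported glottolog_id: {glottolog_id!r}") from None


print(process_for("phoe1239"))  # PhoenicianSentenceSplittingProcess
```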
GeezSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Geez (geez1241).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
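Ge'ez script has its own punctuation: the Ethiopic full stop ። (U+1362) and question mark ፧ (U+1367) end sentences, while the Ethiopic wordspace ፡ (U+1361) only separates words. This page does not say which characters the geez1241 rules use; the sketch below simply assumes these two terminators and shows why the word separator must stay out of the terminator class. The name geez_boundaries is hypothetical.

```python
import re

# Assumed terminators: Ethiopic full stop (U+1362) and question mark (U+1367).
# The Ethiopic wordspace (U+1361) is deliberately NOT a terminator.
TERMINATORS = "\u1362\u1367"
# A sentence starts at a non-space, non-terminator character, runs up to the
# next terminator, and absorbs the run of terminators that closes it.
SENTENCE = re.compile(f"[^\\s{TERMINATORS}][^{TERMINATORS}]*[{TERMINATORS}]*")


def geez_boundaries(text: str) -> list[tuple[int, int]]:
    """Return (start, stop) spans, trimming trailing whitespace from each match."""
    spans = []
    for match in SENTENCE.finditer(text):
        stop = match.start() + len(match.group().rstrip())
        spans.append((match.start(), stop))
    return spans


sample = "ab\u1361cd\u1362 ef\u1367"  # placeholder letters: ab፡cd። ef፧
print(geez_boundaries(sample))  # [(0, 6), (7, 10)]
```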
MiddleEgyptianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Middle Egyptian (midd1369).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldEgyptianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Old Egyptian (olde1242).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
LateEgyptianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitter for Late Egyptian (late1256).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldMiddleWelshSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle Welsh.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleBretonSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle Breton.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleCornishSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Cornish.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldPrussianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Prussian.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
LithuanianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Lithuanian.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
LatvianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Latvian.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
AlbanianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Albanian.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
SauraseniPrakritSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Sauraseni Prakrit.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
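Prakrit texts are often printed in Devanagari, where the daṇḍa । (U+0964) and double daṇḍa ॥ (U+0965) close sentences and verses. The factory below is a hypothetical sketch, not CLTK's rule for Sauraseni Prakrit: it turns any set of terminator characters into a boundary function, passing them through re.escape so ASCII punctuation such as "." or "?" is safe inside the character class.

```python
import re
from typing import Callable

Boundaries = list[tuple[int, int]]


def make_splitter(terminators: str) -> Callable[[str], Boundaries]:
    """Build a boundary function that ends a sentence after any run of terminators."""
    cls = re.escape(terminators)
    # Start on a non-space, non-terminator char; stop after the closing terminator run.
    sentence = re.compile(f"[^\\s{cls}][^{cls}]*[{cls}]*")

    def split(text: str) -> Boundaries:
        # Trim trailing whitespace so each stop offset lands just after the sentence.
        return [
            (m.start(), m.start() + len(m.group().rstrip()))
            for m in sentence.finditer(text)
        ]

    return split


# Assumed terminator set for Devanagari-script Prakrit text.
split_devanagari = make_splitter("\u0964\u0965")
print(split_devanagari("ab\u0964 cd\u0965"))  # [(0, 3), (4, 7)]
```

Keeping the terminator set as data means one factory can serve many of the languages on this page, with only the character set varying.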
MaharastriPrakritSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Maharastri Prakrit.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MagadhiPrakritSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Magadhi Prakrit.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
GandhariSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Gandhari.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MoabiteSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Moabite.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
AmmoniteSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Ammonite.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
EdomiteSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Edomite.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldAramaicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Aramaic (up to 700 BCE).
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
OldAramaicSamalianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Old Aramaic–Samʾalian.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
MiddleAramaicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Middle Aramaic.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
ClassicalMandaicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Classical Mandaic.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
HatranSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Hatran.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
JewishBabylonianAramaicSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Jewish Babylonian Aramaic.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.
Source code in cltk/sentence/processes.py
SamalianSentenceSplittingProcess
Bases: SentenceSplittingProcess
Sentence splitting process for Samʾalian.
algorithm (cached property)
Return the language‑appropriate sentence boundary function. The returned callable takes (text, glottolog_id) and returns a list of (start, stop) character offsets for each sentence.
Returns:
- Callable[[str, str], list[tuple[int, int]]] – A callable implementing sentence boundary detection.
Raises:
- ValueError – If the glottolog_id is not supported.
run
Compute sentence boundaries and return an updated document.
Parameters:
- input_doc (Doc) – Document whose normalized_text will be segmented.
Returns:
- Doc – A shallow copy of input_doc with sentence_boundaries set to a list of (start, stop) character indices.
Raises:
- ValueError – If normalized_text is missing or if glottolog_id is not set on the process.