6. Pipelines, Processes, Docs, and Words

Tip

See the notebook at https://github.com/cltk/cltk/blob/master/notebooks/CLTK%20data%20types.ipynb for a detailed walkthrough of the CLTK data types.

The CLTK has four important native data types: Pipeline, Process, Doc, and Word.

digraph Pipeline {
    fontname = "Bitstream Vera Sans"
    fontsize = 8
    node [
        fontname = "Bitstream Vera Sans"
        fontsize = 8
        shape = "record"
    ]
    edge [
        arrowtail = "empty"
    ]
    Pipeline [
        label = "{Pipeline|\l| run(): Doc}"
    ]
    LatinPipeline [
        label = "{LatinPipeline|\l|processes: [LatinStanzaProcess,\l LatinEmbeddingsProcess,\l StopsProcess,\l LatinNERProcess]}"
    ]
    GreekPipeline [
        label = "{GreekPipeline|\l|processes: [GreekStanzaProcess,\l GreekEmbeddingsProcess,\l StopsProcess,\l GreekNERProcess]}"
    ]
    EtcPipeline [
        label = "{…|\l|processes: list[Process]}"
    ]
    Pipeline -> LatinPipeline [dir=back]
    Pipeline -> GreekPipeline [dir=back]
    Pipeline -> EtcPipeline [dir=back]
}

Inheritance of Pipeline class
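
The inheritance relationship in the figure can be illustrated with a small, self-contained sketch. It is not the CLTK's actual implementation: the class names are borrowed from the diagram, but the `Doc` fields (`raw`, `applied`), the signature of `run()`, and the placeholder process bodies are assumptions made only to show how a language-specific pipeline might inherit from a base `Pipeline` and supply its own list of processes.

```python
from dataclasses import dataclass, field
from typing import List, Type


@dataclass
class Doc:
    """Toy stand-in for a document; records which processes have run (hypothetical fields)."""
    raw: str
    applied: List[str] = field(default_factory=list)


class Process:
    """Toy stand-in for a single processing step."""

    def run(self, doc: Doc) -> Doc:
        # A real process would annotate the Doc; this one only logs its own name.
        doc.applied.append(type(self).__name__)
        return doc


# Placeholder processes named after those in the figure; they do no real work.
class LatinStanzaProcess(Process): ...
class LatinEmbeddingsProcess(Process): ...
class StopsProcess(Process): ...
class LatinNERProcess(Process): ...


class Pipeline:
    """Base class: runs each process in order and returns the resulting Doc."""
    processes: List[Type[Process]] = []

    def run(self, text: str) -> Doc:
        doc = Doc(raw=text)
        for process_cls in self.processes:
            doc = process_cls().run(doc)
        return doc


class LatinPipeline(Pipeline):
    """Subclass that only overrides the list of processes."""
    processes = [
        LatinStanzaProcess,
        LatinEmbeddingsProcess,
        StopsProcess,
        LatinNERProcess,
    ]


doc = LatinPipeline().run("Gallia est omnis divisa in partes tres")
print(doc.applied)
```

In this sketch the subclass changes only the `processes` attribute, so a pipeline for another language would be written the same way with a different list.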