8.1.17. cltk.tag package

8.1.17.1. Submodules

8.1.17.2. cltk.tag.ner module

Named entity recognition (NER).

class cltk.tag.ner.NamedEntityReplacer[source]

Bases: object

tag_ner_fr(input_text, output_type=<class 'list'>)[source]

cltk.tag.ner._check_latest_data(lang)[source]

Check whether the proper names directory is present; clone it if not.

cltk.tag.ner.tag_ner(lang, input_text, output_type=<class 'list'>)[source]

Run NER for chosen language.
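
The reference to a proper names directory suggests a list-based approach. The following is a minimal sketch of that idea only, not CLTK's implementation: the helper name tag_by_lookup, the sample names, and the output shape are all assumptions made for illustration.

```python
def tag_by_lookup(tokens, proper_names):
    """Pair each token with 'Entity' if it is in proper_names (hypothetical helper)."""
    tagged = []
    for token in tokens:
        if token in proper_names:
            tagged.append((token, "Entity"))
        else:
            # Tokens that are not known names are kept as 1-tuples.
            tagged.append((token,))
    return tagged


# Made-up sample list, not CLTK data.
names = {"Gallia", "Caesar"}
result = tag_by_lookup("Gallia est omnis divisa".split(), names)
print(result)
```

A set is used for the name list so that each membership check is constant time.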

8.1.17.3. cltk.tag.pos module

Tag part of speech (POS) using CLTK taggers.

class cltk.tag.pos.POSTag(language)[source]

Bases: object

Tag words’ parts-of-speech.

_setup_language_variables(lang)[source]

Check for language availability and presence of tagger files.

Parameters:

lang – str: The language argument given to the class.

Return type:

dict

tag_unigram(untagged_string)[source]

Tag POS with unigram tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)

tag_bigram(untagged_string)[source]

Tag POS with bigram tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)

tag_trigram(untagged_string)[source]

Tag POS with trigram tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)

tag_ngram_123_backoff(untagged_string)[source]

Tag POS with a 1-, 2-, 3-gram backoff tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)

tag_ngram_12_backoff(untagged_string)[source]

Tag POS with a 1-, 2-gram backoff tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)
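
To illustrate what backoff means for the n-gram taggers, here is a minimal pure-Python sketch under stated assumptions. It is not the tagger CLTK ships: the function names train and tag_with_backoff, the toy training data, and the tag labels are invented for the example, and it takes a list of tokens rather than an untokenized string. The bigram table is consulted first; if the (previous tag, word) context was never seen, the tagger backs off to the unigram table, and finally to None.

```python
from collections import Counter, defaultdict


def train(tagged_sents):
    """Count tags per word (unigram) and per (previous tag, word) pair (bigram)."""
    uni = defaultdict(Counter)
    bi = defaultdict(Counter)
    for sent in tagged_sents:
        prev = None
        for word, tag in sent:
            uni[word][tag] += 1
            bi[(prev, word)][tag] += 1
            prev = tag
    return uni, bi


def tag_with_backoff(tokens, uni, bi):
    """Try the bigram context first, then the unigram table, then give up (None)."""
    tagged = []
    prev = None
    for word in tokens:
        if (prev, word) in bi:
            tag = bi[(prev, word)].most_common(1)[0][0]
        elif word in uni:
            tag = uni[word].most_common(1)[0][0]
        else:
            tag = None
        tagged.append((word, tag))
        prev = tag
    return tagged


# Toy training data with made-up tags, for illustration only.
training = [
    [("arma", "N"), ("virumque", "N"), ("cano", "V")],
    [("cano", "V"), ("arma", "N")],
]
uni, bi = train(training)
tagged = tag_with_backoff(["virumque", "arma", "cano", "Troiae"], uni, bi)
print(tagged)
```

In this run, "cano" is resolved by the bigram table, "virumque" and "arma" fall back to the unigram table, and the unseen "Troiae" receives None.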

tag_tnt(untagged_string)[source]

Tag POS with TnT tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)

tag_crf(untagged_string)[source]

Tag POS with CRF tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)

tag_perceptron(untagged_string)[source]

Tag POS with Perceptron tagger.

Parameters:

untagged_string – str: An untagged, untokenized string of text.

Return type:

str (tagged_text)

8.1.17.4. cltk.tag.treebanks module

Generate a Python dict from the tags of a treebank given as a string. As of this version, only treebanks in Penn notation are supported.

cltk.tag.treebanks.set_path(dicts, keys, v)[source]

Helper function for modifying nested dictionaries.

Parameters:
  • dicts – dict: the given dictionary

  • keys – list of str: path to the value to be added

  • v – str: value to be added

>>> d = dict()
>>> set_path(d, ['a', 'b', 'c'],  'd')
>>> d
{'a': {'b': {'c': ['d']}}}

In case of duplicate paths, the additional value is appended to the leaf node rather than replacing it:

>>> set_path(d, ['a', 'b', 'c'],  'e')
>>> d
{'a': {'b': {'c': ['d', 'e']}}}
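
The behaviour shown in these doctests could be reproduced along the following lines. This is a sketch that matches the examples above, not the library's source; the name set_path_sketch is hypothetical.

```python
def set_path_sketch(dicts, keys, v):
    """Append v to the list stored at the nested key path, creating levels as needed."""
    node = dicts
    # Walk (or create) every intermediate dictionary except the last key.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # The leaf holds a list, so duplicate paths accumulate values.
    node.setdefault(keys[-1], []).append(v)


d = {}
set_path_sketch(d, ["a", "b", "c"], "d")
set_path_sketch(d, ["a", "b", "c"], "e")
print(d)  # {'a': {'b': {'c': ['d', 'e']}}}
```

Using dict.setdefault at each level creates missing levels without overwriting existing ones.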

cltk.tag.treebanks.get_paths(src)[source]

Generates root-to-leaf paths, given a treebank in string format. Note that get_paths returns an iterator: paths are produced one at a time rather than all at once.

Parameters:

src – str: treebank
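
As an illustration of lazy path generation, the sketch below walks a Penn-style bracketed string and yields one root-to-leaf path per word. It is an assumption-laden stand-in, not the library's code: the name iter_paths and the sample tree are invented, and it assumes every opening bracket is immediately followed by a label.

```python
import re


def iter_paths(src):
    """Lazily yield one root-to-leaf path per word in a Penn-style bracketed string."""
    stack = []  # labels of the constituents that are currently open
    expect_label = False
    for token in re.findall(r"\(|\)|[^\s()]+", src):
        if token == "(":
            expect_label = True
        elif token == ")":
            stack.pop()
        elif expect_label:
            # First token after "(" is the constituent label.
            stack.append(token)
            expect_label = False
        else:
            # Any other token is a word: emit the path leading to it.
            yield stack + [token]


tree = "(S (NP (DT the) (NN cat)) (VP (VBD sat)))"
paths = iter_paths(tree)
first = next(paths)   # only the first path has been computed so far
rest = list(paths)    # consume the remaining paths
print(first)
print(rest)
```

Because the function is a generator, a caller can stop after the first few paths without scanning the rest of the string.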

cltk.tag.treebanks.parse_treebanks(st)[source]

Returns the corresponding tree of the treebank, in the form of a nested dictionary.

Parameters:

st – str: treebank using Penn notation
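
A nested dictionary of this kind could be built in a single pass, as sketched below. This is a hedged illustration, not the library's implementation: the name parse_penn_sketch and the sample tree are hypothetical, and storing words in lists at the leaves is an assumption carried over from the set_path examples.

```python
import re


def parse_penn_sketch(st):
    """Build a nested dict from a Penn-style bracketed string (illustrative only)."""
    root = {}
    stack = []  # labels of the constituents that are currently open
    expect_label = False
    for token in re.findall(r"\(|\)|[^\s()]+", st):
        if token == "(":
            expect_label = True
        elif token == ")":
            stack.pop()
        elif expect_label:
            stack.append(token)
            expect_label = False
        else:
            # A word: descend along the open labels, then append at the leaf.
            node = root
            for label in stack[:-1]:
                node = node.setdefault(label, {})
            node.setdefault(stack[-1], []).append(token)
    return root


parsed = parse_penn_sketch("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
print(parsed)
```

One limitation of this sketch: sibling constituents that share a label are merged under a single key, because dictionary keys are unique.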