8.1.20. cltk.utils package

Init for cltk.utils.

8.1.20.1. Submodules

8.1.20.2. cltk.utils.feature_extraction module

Helper functions for extracting features from CLTK data structures, especially for the purpose of preparing data for machine learning.

cltk.utils.feature_extraction.cltk_doc_to_features_table(cltk_doc)

Take a CLTK Doc and return a list of lists ready for machine learning.

This expects the default features available for Greek and Latin (word embeddings, morphology, syntax, lemmata). It should be improved to fail gracefully when fewer features are available in the input Doc.

TODO: Fail gracefully when missing info in Doc.

Return type

Tuple[List[str], List[List[Union[str, int, float, None]]]]
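
The return type pairs a list of column names with a list of rows. The following is a minimal sketch of consuming such a table with the standard library, assuming the first element is the header and each inner list is one row aligned with it. The helper names table_to_records and table_to_csv, the column names, and the cell values are hypothetical placeholders, not actual CLTK output.

```python
import csv
import io
from typing import Dict, List, Union

Cell = Union[str, int, float, None]


def table_to_records(header: List[str], rows: List[List[Cell]]) -> List[Dict[str, Cell]]:
    """Pair each row with the header, producing one dict per row (hypothetical helper)."""
    records = []
    for row in rows:
        # Catch rows that do not line up with the header.
        if len(row) != len(header):
            raise ValueError(f"Row has {len(row)} cells but header has {len(header)}.")
        records.append(dict(zip(header, row)))
    return records


def table_to_csv(header: List[str], rows: List[List[Cell]]) -> str:
    """Serialize header and rows as CSV text; csv writes None as an empty cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical placeholder data in the (header, rows) shape.
header = ["token", "feature_a", "feature_b"]
rows = [["arma", "x", 1], ["cano", None, 2.5]]
print(table_to_records(header, rows))
print(table_to_csv(header, rows))
```

Keeping the header separate from the rows, as the return type does, lets the rows be passed unchanged to tools that expect a plain list of lists.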

8.1.20.3. cltk.utils.file_operations module

Miscellaneous file operations used by various parts of the CLTK.

cltk.utils.file_operations.make_cltk_path(*fp_list)

Take an arbitrary number of str arguments (not a list) and return the expanded, absolute path to that location inside the user’s (or user-defined) cltk_data dir.

Example:

In [8]: make_cltk_path('greek', 'model', 'greek_models_cltk')
Out[8]: '/Users/kyle/cltk_data/greek/model/greek_models_cltk'

Parameters

fp_list (str) – Path components to join together, beginning from the root cltk_data folder.

Return type

str
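
A minimal sketch of the path joining described above, assuming the data root is ~/cltk_data. The name make_data_path and its root parameter are hypothetical; the library presumably takes its root from the cltk_data dir resolved by get_cltk_data_dir() rather than from an argument.

```python
import os


def make_data_path(*parts: str, root: str = "~/cltk_data") -> str:
    """Join path components under a data root and return an absolute path.

    Hypothetical stand-in for make_cltk_path; the root is a plain parameter here.
    """
    # Mirror the "str arguments, not a list" contract.
    for part in parts:
        if not isinstance(part, str):
            raise TypeError(f"Expected str components, got {type(part)!r}.")
    base = os.path.abspath(os.path.expanduser(root))
    return os.path.join(base, *parts)


print(make_data_path("greek", "model", "greek_models_cltk"))
```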

cltk.utils.file_operations.open_pickle(path)

Open a pickle and return the loaded pickle object.

Parameters

path (str) – File path to the pickle file to be opened.

Return type

Any
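
A sketch of what opening a pickle involves, using only the standard pickle module. The name load_pickle is hypothetical. Note that pickle.load can execute arbitrary code, so only trusted files should be loaded this way.

```python
import pickle
from typing import Any


def load_pickle(path: str) -> Any:
    """Open a pickle file in binary mode and return the unpickled object."""
    # Only unpickle trusted files: unpickling can run arbitrary code.
    with open(path, "rb") as file_handle:
        return pickle.load(file_handle)
```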

cltk.utils.file_operations.md5(filename)

Given a filename, produce an MD5 hash of the file’s contents.

>>> import tempfile, os
>>> f = tempfile.NamedTemporaryFile(delete=False)
>>> f.write(b'Hello Wirld!')
12
>>> f.close()
>>> md5(f.name)
'997c62b6afe9712cad3baffb49cb8c8a'
>>> os.unlink(f.name)

Return type

str
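
A minimal sketch of hashing a file with hashlib, reading in fixed-size chunks so large files need not fit in memory. The name md5_of_file and the chunk_size parameter are hypothetical; the library’s md5 may read the file differently.

```python
import hashlib


def md5_of_file(filename: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(filename, "rb") as file_handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: file_handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Chunking does not change the result: feeding the bytes to update() in pieces yields the same digest as hashing them all at once.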

8.1.20.4. cltk.utils.utils module

Module for commonly reused classes and functions.

class cltk.utils.utils.CLTKEnumMeta(cls, bases, classdict)

Bases: enum.EnumMeta

class cltk.utils.utils.CLTKEnum(value)

Bases: enum.IntEnum

An enumeration.

cltk.utils.utils.file_exists(file_path, is_dir=False)

Try to expand ~/ and check if a file or dir exists. Optionally check if it’s a dir.

>>> file_exists('~/fake_file')
False
>>> file_exists('~/', is_dir=True)
True

Return type

bool
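
A minimal sketch of the same check with os.path. The name path_exists is hypothetical, and one behavior is an assumption: with is_dir=False this sketch accepts any existing path (file or directory), whereas the library may require a regular file.

```python
import os


def path_exists(file_path: str, is_dir: bool = False) -> bool:
    """Expand a leading ~ and report whether the path exists.

    With is_dir=True, additionally require the path to be a directory.
    """
    expanded = os.path.expanduser(file_path)
    if is_dir:
        return os.path.isdir(expanded)
    # Assumption: any existing path counts when is_dir is False.
    return os.path.exists(expanded)
```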

cltk.utils.utils.reverse_dict(input_dict, ignore_keys=None)

Take a dict and swap its keys and values. The optional ignore_keys parameter lists keys to skip.

>>> ids_lang = dict(anci1242='Ancient Greek', lati1261='Latin', unlabeled=['Ottoman'])
>>> reverse_dict(ids_lang, ignore_keys=['unlabeled'])
{'Ancient Greek': 'anci1242', 'Latin': 'lati1261'}
>>> reverse_dict(dict(anci1242='Ancient Greek', lati1261='Latin'))
{'Ancient Greek': 'anci1242', 'Latin': 'lati1261'}
>>> reverse_dict(ids_lang)
Traceback (most recent call last):
  ...
TypeError: This function can only convert type str value to a key. Received value type `<class 'list'>` for key `unlabeled` instead. Consider using `ignore_keys` for this key-value pair to be skipped.
>>> reverse_dict(ids_lang, ignore_keys='unlabeled')
Traceback (most recent call last):
  ...
TypeError: The `ignore_key` parameter must be either types None or list. Received type `<class 'str'>` instead.
>>> reverse_dict(ids_lang, ignore_keys=['UNUSED-KEY'])
Traceback (most recent call last):
  ...
TypeError: This function can only convert type str value to a key. Received value type `<class 'list'>` for key `unlabeled` instead. Consider using `ignore_keys` for this key-value pair to be skipped.

Return type

Dict[str, str]
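
A minimal sketch of the inversion and the two type checks shown in the doctests above, assuming ignore_keys is validated before any values are inspected. The name invert_dict and its error messages are hypothetical.

```python
from typing import Dict, List, Optional


def invert_dict(input_dict: Dict[str, object],
                ignore_keys: Optional[List[str]] = None) -> Dict[str, str]:
    """Swap keys and values, skipping any key listed in ignore_keys."""
    # Validate ignore_keys up front, as the doctests imply.
    if ignore_keys is not None and not isinstance(ignore_keys, list):
        raise TypeError(f"ignore_keys must be None or a list, got {type(ignore_keys)!r}.")
    skipped = set(ignore_keys or [])
    output: Dict[str, str] = {}
    for key, value in input_dict.items():
        if key in skipped:
            continue
        # Only str values can become keys in the inverted dict.
        if not isinstance(value, str):
            raise TypeError(f"Value for key {key!r} is {type(value)!r}, not str.")
        output[value] = key
    return output
```

If two keys share the same value, the later key wins in this sketch, since dict assignment overwrites.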

cltk.utils.utils.suppress_stdout()

Use this as a context manager to suppress printing to the screen inside a with block.

Source: https://thesmithfam.org/blog/2012/10/25/temporarily-suppress-console-output-in-python/.

>>> print("You can see this")
You can see this
>>> with suppress_stdout():
...     print("YY")
>>> print("And you can see this again")
And you can see this again
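
One way to get the same effect with the standard library is contextlib.redirect_stdout pointed at os.devnull. This is a swapped-in technique that may differ from the linked recipe, and the name quiet_stdout is hypothetical.

```python
import contextlib
import os
from typing import Iterator


@contextlib.contextmanager
def quiet_stdout() -> Iterator[None]:
    """Temporarily send sys.stdout to os.devnull inside a with block."""
    with open(os.devnull, "w") as devnull:
        # redirect_stdout restores the previous sys.stdout on exit.
        with contextlib.redirect_stdout(devnull):
            yield
```

redirect_stdout replaces only Python’s sys.stdout, so output that C extensions or subprocesses write directly to file descriptor 1 is not suppressed.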

cltk.utils.utils.get_cltk_data_dir()

Define where to look for the cltk_data dir. By default, it is located in the user’s home directory (~/cltk_data), where it is created if it does not exist. However, a user may customize this location with the OS environment variable $CLTK_DATA; if the variable is set, its value is used instead.

>>> from cltk.utils import CLTK_DATA_DIR
>>> import os
>>> os.environ["CLTK_DATA"] = os.path.expanduser("~/cltk_data")
>>> cltk_data_dir = get_cltk_data_dir()
>>> os.path.split(cltk_data_dir)[1]
'cltk_data'
>>> del os.environ["CLTK_DATA"]
>>> os.environ["CLTK_DATA"] = os.path.expanduser("~/custom_dir")
>>> cltk_data_dir = os.environ.get("CLTK_DATA")
>>> os.path.split(cltk_data_dir)[1]
'custom_dir'
>>> del os.environ["CLTK_DATA"]

Return type

str
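
A minimal sketch of the lookup order described above: use $CLTK_DATA when it is set, otherwise fall back to ~/cltk_data. The name resolve_data_dir and its parameters are hypothetical, and unlike the behavior described above, this sketch does not create the directory.

```python
import os


def resolve_data_dir(env_var: str = "CLTK_DATA", default: str = "~/cltk_data") -> str:
    """Return the directory named by env_var if set, else the default."""
    # An unset or empty variable falls back to the default location.
    raw = os.environ.get(env_var) or default
    return os.path.abspath(os.path.expanduser(raw))
```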

cltk.utils.utils.query_yes_no(question, default='yes')

Ask a yes/no question via input() and return True/False.

Source: https://stackoverflow.com/a/3041990.

Parameters
  • question (str) – Question string presented to the user.

  • default (Optional[str]) – Presumed answer if the user just hits <Enter>. It must be “yes” (the default), “no”, or None (meaning an answer is required of the user).

Return type

bool

Returns

True for “yes” or False for “no”.
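
A minimal sketch of the prompt loop, modeled loosely on the behavior described above. The name ask_yes_no and the input_fn parameter (added so the function can be tested without a keyboard) are hypothetical, and the accepted spellings ("yes", "y", "no", "n") are an assumption.

```python
from typing import Callable, Optional

# Assumed set of accepted replies (case-insensitive after normalization).
_ANSWERS = {"yes": True, "y": True, "no": False, "n": False}


def ask_yes_no(question: str, default: Optional[str] = "yes",
               input_fn: Callable[[str], str] = input) -> bool:
    """Prompt until a yes/no answer is given; <Enter> picks the default."""
    prompts = {None: " [y/n] ", "yes": " [Y/n] ", "no": " [y/N] "}
    if default not in prompts:
        raise ValueError(f"Invalid default answer: {default!r}")
    while True:
        reply = input_fn(question + prompts[default]).strip().lower()
        # An empty reply only counts when a default exists.
        if reply == "" and default is not None:
            return _ANSWERS[default]
        if reply in _ANSWERS:
            return _ANSWERS[reply]
        print("Please respond with 'yes' or 'no' (or 'y' or 'n').")
```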

cltk.utils.utils.mk_dirs_for_file(file_path)

Make all directories needed for the final file. If a directory already exists, continue silently.

Parameters

file_path (str) – Path to the file whose parent directories should be created (as with mkdir -p).

Return type

None

Returns

None
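
A minimal sketch of the mkdir -p behavior using os.makedirs with exist_ok=True, assuming only the parent directories of file_path (not the file itself) should be created. The name make_parent_dirs is hypothetical.

```python
import os


def make_parent_dirs(file_path: str) -> None:
    """Create every missing parent directory of file_path (like mkdir -p)."""
    parent = os.path.dirname(file_path)
    # A bare filename has no parent to create; exist_ok makes reruns silent.
    if parent:
        os.makedirs(parent, exist_ok=True)
```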

cltk.utils.utils.get_file_with_progress_bar(model_url, file_path)

Download file with a progress bar.

Source: https://stackoverflow.com/a/37573701

Parameters
  • model_url (str) – URL from which to download the file.

  • file_path (str) – Location at which to save file.

Raises

IOError – If the size of the downloaded file differs from the value in the remote’s Content-Length header.

Return type

None

Returns

None
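
A minimal sketch of a download that reports progress and verifies the byte count against Content-Length, using only urllib.request. The name download_with_progress, the chunk_size parameter, and the plain-text percentage display are hypothetical; the library may use a different progress-bar implementation. The sketch raises OSError, which IOError aliases in Python 3.

```python
import sys
import urllib.request


def download_with_progress(url: str, file_path: str, chunk_size: int = 8192) -> None:
    """Stream url to file_path, printing percent progress to stderr.

    Raises OSError if the byte count differs from the Content-Length header.
    """
    with urllib.request.urlopen(url) as response, open(file_path, "wb") as out:
        header = response.headers.get("Content-Length")
        expected = int(header) if header is not None else None
        received = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            received += len(chunk)
            # Skip the percentage when the size is unknown or zero.
            if expected:
                percent = 100 * received // expected
                sys.stderr.write(f"\r{percent:3d}% ({received}/{expected} bytes)")
        sys.stderr.write("\n")
    # Compare against the advertised size, mirroring the documented IOError.
    if expected is not None and received != expected:
        raise OSError(f"Expected {expected} bytes but received {received}.")
```

Because urllib.request also handles file:// URLs, the sketch can be tried locally with a URI built by pathlib.Path.as_uri().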