The Persian digits and alphabet are defined in cltk/corpus/persian/alphabet.py.
The digits are stored in the list DIGITS, where each digit sits at the list index equal to its value (0-9). For example, the Persian digit for 5 can be accessed in this manner:
In [1]: from cltk.corpus.persian.alphabet import DIGITS
In [2]: DIGITS[5]
Out[2]: '۵'
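Because each digit sits at the index equal to its value, the list can serve as a lookup table in both directions. The following is a minimal sketch, not part of CLTK: `to_persian_digits` and `from_persian_digits` are hypothetical helpers, and the local `DIGITS` list is a stand-in that assumes the module's list holds the code points U+06F0 to U+06F9 in order.

```python
# Stand-in for cltk.corpus.persian.alphabet.DIGITS (assumption: the real
# list holds the Extended Arabic-Indic digits U+06F0..U+06F9 in order).
DIGITS = [chr(0x06F0 + value) for value in range(10)]


def to_persian_digits(number):
    """Hypothetical helper: render a non-negative integer with Persian digits."""
    # str(number) yields ASCII digits; each one is used as a list index.
    return ''.join(DIGITS[int(ch)] for ch in str(number))


def from_persian_digits(text):
    """Hypothetical helper: parse a string of Persian digits into an int."""
    # list.index() returns the position of the digit, which is its value.
    return int(''.join(str(DIGITS.index(ch)) for ch in text))


if __name__ == '__main__':
    persian = to_persian_digits(1402)
    print(persian)
    print(from_persian_digits(persian))
```

With CLTK installed, the stand-in list could presumably be replaced by the import shown above.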
Persian has three SHORT_VOWELS that are essentially diacritics used in the script. It also has three LONG_VOWELS that are actually part of the alphabet. The corresponding lists can be imported:
In [1]: from cltk.corpus.persian.alphabet import SHORT_VOWELS
In [2]: SHORT_VOWELS
Out[2]: ['َ', 'ِ', 'ُ']
In [3]: from cltk.corpus.persian.alphabet import LONG_VOWELS
In [4]: LONG_VOWELS
Out[4]: ['ا', 'و', 'ی']
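Since the short vowels are diacritics and not letters, one plausible use of the list is to strip them from vocalized text before comparing words. This is only a sketch: `strip_short_vowels` is a hypothetical helper, not a CLTK function, and the local `SHORT_VOWELS` list is a stand-in that assumes the three entries are the combining marks U+064E, U+0650 and U+064F.

```python
# Stand-in for cltk.corpus.persian.alphabet.SHORT_VOWELS (assumption: the
# three entries are the combining marks listed below).
SHORT_VOWELS = [
    '\u064e',  # ARABIC FATHA
    '\u0650',  # ARABIC KASRA
    '\u064f',  # ARABIC DAMMA
]


def strip_short_vowels(text):
    """Hypothetical helper: drop short-vowel diacritics, keep everything else."""
    return ''.join(ch for ch in text if ch not in SHORT_VOWELS)


if __name__ == '__main__':
    # Kaf + kasra + te + alef + be: a vocalized spelling used for illustration.
    vocalized = '\u06a9\u0650\u062a\u0627\u0628'
    print(strip_short_vowels(vocalized))
```

The long vowels are ordinary letters, so they are left untouched by this filter.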
The rest of the alphabet consists of CONSONANTS, which can be accessed in a similar way.
There are three SPECIAL characters that are ligatures or alternative orthographic shapes of letters of the alphabet:
In [1]: from cltk.corpus.persian.alphabet import SPECIAL
In [2]: SPECIAL
Out[2]: ['ﺁ', 'ۀ', 'ﻻ']
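To see what these special shapes correspond to, one option is Unicode compatibility normalization from Python's standard library, which maps presentation forms to their base letters. The sketch below is an illustration, not CLTK behaviour: `fold_presentation_forms` is a hypothetical helper, and the local `SPECIAL` list is a stand-in that assumes the entries are the code points U+FE81, U+06C0 and U+FEFB, as they appear in the listing above.

```python
import unicodedata

# Stand-in for cltk.corpus.persian.alphabet.SPECIAL (assumption: these are
# the code points behind the three characters shown in the listing).
SPECIAL = ['\ufe81', '\u06c0', '\ufefb']


def fold_presentation_forms(text):
    """Hypothetical helper: apply NFKC so presentation forms become base letters."""
    return unicodedata.normalize('NFKC', text)


if __name__ == '__main__':
    for ch in SPECIAL:
        folded = fold_presentation_forms(ch)
        # Show each code point before and after normalization.
        before = 'U+%04X' % ord(ch)
        after = ' '.join('U+%04X' % ord(c) for c in folded)
        print(before, '->', after)
```

Under this assumption, the lam-alef ligature folds into the two separate letters lam and alef, which is useful when matching text against the plain alphabet lists.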