Old English is the earliest historical form of the English language, spoken in England and southern and eastern Scotland in the early Middle Ages. It was brought to Great Britain by Anglo-Saxon settlers, probably in the mid-5th century, and the first Old English literary works date from the mid-7th century. (Source: Wikipedia)
The following example applies the CLTK's built-in stopword list to a line from Beowulf:
In : from nltk.tokenize.punkt import PunktLanguageVars
In : from cltk.stop.old_english.stops import STOPS_LIST
In : sentence = 'þe hie ær drugon aldorlease lange hwile.'
In : p = PunktLanguageVars()
In : tokens = p.word_tokenize(sentence.lower())
In : [w for w in tokens if w not in STOPS_LIST]
Out: ['hie', 'drugon', 'aldorlease', 'lange', 'hwile', '.']
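The filtering step itself can be sketched without CLTK or NLTK installed. The block below is a minimal illustration, not the library's implementation: `SAMPLE_STOPS`, `simple_tokenize`, and `remove_stops` are hypothetical names, the stop set is a hand-picked two-word stand-in for `STOPS_LIST`, and the regex tokenizer is only a rough substitute for `PunktLanguageVars.word_tokenize`.

```python
import re

# Hypothetical stop set for illustration only; it is NOT CLTK's STOPS_LIST.
SAMPLE_STOPS = {"þe", "ær"}


def simple_tokenize(text):
    """Lowercase the text, then split it into word tokens and single punctuation marks."""
    # \w matches Unicode letters in Python 3, so þ and æ are kept inside words.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def remove_stops(tokens, stops):
    """Return the tokens that are not in the stop set, preserving order."""
    return [t for t in tokens if t not in stops]


sentence = "þe hie ær drugon aldorlease lange hwile."
filtered = remove_stops(simple_tokenize(sentence), SAMPLE_STOPS)
print(filtered)
```

Storing the stop words in a set keeps each membership test constant-time, which matters when filtering a long text rather than a single line.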
The corpus module has a class for generating a Swadesh list for Old English.
In : from cltk.corpus.swadesh import Swadesh
In : swadesh = Swadesh('eng_old')
In : swadesh.words()[:10]
Out: ['ic, iċċ, ih', 'þū', 'hē', 'wē', 'ġē', 'hīe', 'þēs, þēos, þis', 'sē, sēo, þæt', 'hēr', 'þār, þāra, þǣr, þēr']
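Several Swadesh entries pack more than one form into a single comma-separated string. A possible post-processing step, sketched here with a hypothetical helper `split_variants` and the ten entries from the output above hard-coded so that it runs without CLTK, is to split each entry into its individual variants:

```python
# First ten Old English Swadesh entries, copied from the output above.
entries = [
    'ic, iċċ, ih', 'þū', 'hē', 'wē', 'ġē', 'hīe',
    'þēs, þēos, þis', 'sē, sēo, þæt', 'hēr', 'þār, þāra, þǣr, þēr',
]


def split_variants(entry):
    """Split one Swadesh entry into its comma-separated variant forms."""
    return [form.strip() for form in entry.split(",")]


# One list of variants per Swadesh concept.
variants = [split_variants(entry) for entry in entries]

# Treating the first listed form as the headword is an assumption of this sketch.
first_forms = [forms[0] for forms in variants]
print(first_forms)
```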