Chinese

Corpora

Use CorpusImporter() or browse the CLTK GitHub organization (anything beginning with chinese_) to discover available Chinese corpora.

In [1]: from cltk.corpus.utils.importer import CorpusImporter

In [2]: c = CorpusImporter('chinese')

In [3]: c.list_corpora
Out[3]:
['chinese_text_cbeta_01',
 'chinese_text_cbeta_02',
 'chinese_text_cbeta_indices']