CBOLD Downloadable Data Sources

CBOLD has gathered data from a variety of sources. While we are still working on much of it, we are making it available for preliminary evaluation.

Please let us know what you think and if you find any of the material useful.

Notes on the use of our data: Please refer to the Bantuists' Manifesto for the conditions under which we make this data available to the public. To make sure that everyone who uses the data contained here acts in a manner consistent with the Bantuists' Manifesto, please inform anyone to whom you distribute any of our files of the terms contained in the manifesto. If you redistribute the material on the web, please refer users directly to the manifesto or put a copy of our terms of use on your own page.

Approximate statistics for the CBOLD data sources as of June 20, 1999:


Number of Sources: 70
Number of Languages: 200
Number of Lexical Items: 445,000


Dictionaries and Word Lists: This is where to download the dictionaries and word lists we have received and processed. Note that the data for a particular language often exists in more than one form: the original document may be a word processor file, but CBOLD may have created a FileMaker database on the basis of the original document. The quality of these data sources varies. Some are in excellent condition for doing online research, while others are scanned dictionaries which we have proofread but not completely processed.

List of data sources by language: This lists the available data sources, organized by language.



20.6.99 jcgood