Raw data downloads extracted from Wiktionary

This page contains download links for the raw data extracted from Wiktionary using Wiktextract. This data is updated regularly (usually at least once a week).

English-language edition of Wiktionary

The current version was extracted from the enwiktionary dump dated 2024-07-01. It contains data for hundreds of languages, and has glosses and other metadata in English. The data formats are documented at https://github.com/tatuylonen/wiktextract.
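As a rough illustration of how the raw data might be consumed, the sketch below assumes the download is a JSON Lines file (one JSON object per line) and that entries carry fields such as "word", "lang", "pos", and "senses" with "glosses". These field names and the helper functions iter_entries and glosses_for are assumptions made for this example; the wiktextract documentation linked above is the authority on the actual format.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_entries(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed entry per non-empty line (assumed JSON Lines input)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        yield json.loads(line)


def glosses_for(entry: Dict) -> List[str]:
    """Collect every gloss from an entry's senses (field names are assumed)."""
    glosses: List[str] = []
    for sense in entry.get("senses", []):
        glosses.extend(sense.get("glosses", []))
    return glosses


# Hypothetical sample lines standing in for a downloaded file.
sample = [
    '{"word": "kissa", "lang": "Finnish", "pos": "noun", "senses": [{"glosses": ["cat"]}]}',
    "",
    '{"word": "run", "lang": "English", "pos": "verb", "senses": [{"glosses": ["to move quickly"]}, {}]}',
]

entries = list(iter_entries(sample))
for entry in entries:
    print(entry["word"], entry["lang"], glosses_for(entry))
```

With a real download, the same iter_entries function could be passed an open file object instead of the sample list, which would process the file line by line rather than loading it all into memory.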

For post-processed data, see the download links at the end of the main page for each language (or the page for all languages combined) under https://kaikki.org/dictionary/.

Raw downloads for other Wiktionary editions

Each edition of Wiktionary requires substantial work before Wiktextract can process it, so only a few other editions are supported so far. Support for these is still a work in progress.

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2024-07-13 from the enwiktionary dump dated 2024-07-01 using wiktextract (f8674bc and 7cfad79).

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.