Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables found for each language. When wiktextract parses word heads and tables, it assigns tags describing grammatical or contextual information to the forms it encounters. The forms and tags found in each head section or table are first kept separate from those of other head sections and tables; later they are merged into table types, where every instance of a type contains the same number of word forms with the same tags for those forms.
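The grouping described above can be sketched roughly as follows. This is an illustration only, not wiktextract's actual implementation: the entry layout (a "word" plus a list of "forms", each with "form" and "tags") is loosely modeled on wiktextract's JSON output, and the function names table_signature and group_by_table_type are hypothetical.

```python
from collections import defaultdict


def table_signature(forms):
    """Build a hashable signature from the tag sets of a table's forms.

    Tags within a form and the forms themselves are sorted, so two tables
    with the same number of forms and the same tags get the same signature
    regardless of the order in which they were parsed.
    """
    return tuple(sorted(tuple(sorted(f["tags"])) for f in forms))


def group_by_table_type(entries):
    """Map each table signature to the list of words that share it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[table_signature(entry["forms"])].append(entry["word"])
    return dict(groups)


# Made-up sample entries, for illustration only.
entries = [
    {"word": "cat", "forms": [{"form": "cats", "tags": ["plural"]}]},
    {"word": "dog", "forms": [{"form": "dogs", "tags": ["plural"]}]},
    {"word": "go", "forms": [
        {"form": "went", "tags": ["past"]},
        {"form": "gone", "tags": ["participle", "past"]},
    ]},
]

table_types = group_by_table_type(entries)
```

Here "cat" and "dog" end up in one table type and "go" in another, because a signature depends only on how many forms a table has and which tags those forms carry.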

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type that has only a few unique instances is quite likely to contain some kind of minor error in the original data.
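As a rough sketch of that heuristic, the snippet below counts how often each table type occurs and reports the ones at or below a threshold. The function name rare_table_types, the string signatures and the max_instances threshold are all assumptions made for illustration; the page itself does not define a cutoff.

```python
from collections import Counter


def rare_table_types(signatures, max_instances=2):
    """Return {signature: count} for table types seen at most max_instances times."""
    counts = Counter(signatures)
    return {sig: n for sig, n in counts.items() if n <= max_instances}


# Made-up signatures: one common type, one genuinely rare paradigm,
# and one that looks like a typo of the common type.
signatures = ["plural", "plural", "plural", "past|participle", "plurla"]

rare = rare_table_types(signatures, max_instances=1)
```

A rare type is only a candidate for review: it may be a legitimate but unusual paradigm, or it may point to a typo in the source entry.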

Language | Table forms | Errors (% affected words)
Bahas Melayu | 1 | 0 (0.00%)
Bahasa Afrikaans | 3 | 8 (48.00%)
Bahasa Ainu | 1 | 2 (100.00%)
Bahasa Amuzgo San Pedro Amuzgos | 1 | 0 (0.00%)
Bahasa Arab | 3 | 4 (86.67%)
Bahasa Arab Hijaz | 1 | 0 (0.00%)
Bahasa Asturia | 1 | 0 (0.00%)
Bahasa Azerbaijan | 1 | 2 (0.00%)
Bahasa Azeri | 2 | 8 (20.00%)
Bahasa Bali | 1 | 6 (100.00%)
Bahasa Banjar | 1 | 0 (0.00%)
Bahasa Belanda | 3 | 4 (96.00%)
Bahasa Bugis | 1 | 2 (100.00%)
Bahasa Burma | 1 | 2 (100.00%)
Bahasa Cam Barat | 2 | 2 (0.00%)
Bahasa Catalan | 1 | 0 (0.00%)
Bahasa Cina | 1 | 0 (0.00%)
Bahasa Denmark | 1 | 10 (100.00%)
Bahasa Dhivehi | 1 | 0 (0.00%)
Bahasa Estonia | 2 | 4 (1.75%)
Bahasa Farefare | 1 | 0 (0.00%)
Bahasa Georgia | 1 | 2 (100.00%)
Bahasa Ghotuo | 1 | 0 (0.00%)
Bahasa Hindi | 2 | 2 (0.69%)
Bahasa Hungary | 1 | 0 (0.00%)
Bahasa Iceland | 1 | 4 (100.00%)
Bahasa Igbo | 2 | 2 (50.00%)
Bahasa Ilocano | 1 | 2 (100.00%)
Bahasa Indonesia | 4 | 2 (8.33%)
Bahasa Ingeris | 1 | 0 (0.00%)
Bahasa Inggeris | 6 | 4 (4.80%)
Bahasa Ireland | 1 | 2 (100.00%)
Bahasa Itali | 1 | 0 (0.00%)
Bahasa Jawa | 4 | 8 (89.13%)
Bahasa Jepun | 4 | 2 (30.00%)
Bahasa Jerman | 7 | 12 (3.44%)
Bahasa Kazakh | 1 | 2 (100.00%)
Bahasa Kikuyu | 1 | 0 (0.00%)
Bahasa Korea | 4 | 4 (76.36%)
Bahasa Kunigami | 1 | 2 (100.00%)
Bahasa Kurdi Utara | 1 | 2 (100.00%)
Bahasa Ladino | 1 | 2 (100.00%)
Bahasa Lower Sorbian | 1 | 2 (100.00%)
Bahasa Makassar | 1 | 2 (100.00%)
Bahasa Makau | 1 | 0 (0.00%)
Bahasa Malta | 1 | 0 (0.00%)
Bahasa Mandarin | 4 | 6 (1.02%)
Bahasa Maori | 1 | 2 (100.00%)
Bahasa Melayu | 15 | 18 (3.28%)
Bahasa Melayu Brunei | 2 | 2 (71.43%)
Bahasa Melayu Kedah | 1 | 2 (100.00%)
Bahasa Melayu Kelantan-Patani | 1 | 0 (0.00%)
Bahasa Melayu Sarawak | 1 | 0 (0.00%)
Bahasa Melayu Terengganu Pesisir | 1 | 0 (0.00%)
Bahasa Minangkabau | 2 | 2 (50.00%)
Bahasa Miranda | 1 | 0 (0.00%)
Bahasa Moore | 1 | 0 (0.00%)
Bahasa Mooré | 2 | 0 (0.00%)
Bahasa Norman | 1 | 0 (0.00%)
Bahasa Norway Bokmål | 1 | 2 (100.00%)
Bahasa Norway Nynorsk | 1 | 8 (100.00%)
Bahasa Okinawa | 1 | 4 (100.00%)
Bahasa Ottoman Turkish | 1 | 2 (100.00%)
Bahasa Parsi | 3 | 6 (70.71%)
Bahasa Perancis | 4 | 10 (9.95%)
Bahasa Perancis Lama | 1 | 6 (100.00%)
Bahasa Piedmont | 1 | 0 (0.00%)
Bahasa Poland | 1 | 4 (100.00%)
Bahasa Portugis | 1 | 0 (0.00%)
Bahasa Provençal Lama | 1 | 6 (100.00%)
Bahasa Punic | 1 | 0 (0.00%)
Bahasa Punjabi | 2 | 2 (87.50%)
Bahasa Rungus | 1 | 0 (0.00%)
Bahasa Scots | 1 | 0 (0.00%)
Bahasa Semai | 1 | 0 (0.00%)
Bahasa Sepanyol | 4 | 10 (20.39%)
Bahasa Serbia | 1 | 2 (100.00%)
Bahasa Sinhala | 1 | 0 (0.00%)
Bahasa Slovene | 1 | 2 (100.00%)
Bahasa Sunda | 1 | 2 (100.00%)
Bahasa Suryani Klasik | 1 | 0 (0.00%)
Bahasa Swahili | 1 | 2 (100.00%)
Bahasa Tagalog | 1 | 2 (100.00%)
Bahasa Tajik | 1 | 2 (100.00%)
Bahasa Tatar | 1 | 2 (100.00%)
Bahasa Telugu | 1 | 0 (0.00%)
Bahasa Thai | 1 | 4 (100.00%)
Bahasa Turki | 2 | 4 (4.35%)
Bahasa Turkmen | 3 | 10 (13.11%)
Bahasa Urdu | 2 | 4 (88.89%)
Bahasa Uyghur | 1 | 0 (0.00%)
Bahasa Uzbek | 1 | 0 (0.00%)
Bahasa Wales | 1 | 0 (0.00%)
Bahasa Yonaguni | 1 | 2 (100.00%)
Bahasa Yunani | 1 | 2 (100.00%)
Bahasa Yup'ik | 1 | 0 (0.00%)
English | 1 | 0 (0.00%)
Inggeris | 1 | 0 (0.00%)
Persian | 1 | 0 (0.00%)
Rentas bahasa | 4 | 4 (33.33%)
Translingual | 3 | 2 (0.00%)

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2025-08-09 from the mswiktionary dump dated 2025-07-21 using wiktextract (99a4ed9 and 3c020d2). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.