Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables found in the data. When wiktextract parses word heads and inflection tables, it tags each form it encounters with grammatical or contextual information. The forms and tags from each head section or table are first kept separate. They are then grouped with those from other heads and tables into table types, where every member of a type contains the same number of word forms carrying the same tags.
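The grouping step can be illustrated with a small sketch. It assumes, hypothetically, that each parsed head or table is a list of dicts with `form` and `tags` keys (loosely resembling the `forms` entries in wiktextract's JSON output). It treats two tables as the same type when they have the same number of forms and the same multiset of tag sets. The names `table_signature` and `group_table_types` are illustrative. This is not wiktextract's actual code.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Tuple

# Hypothetical input shape: one parsed table = list of {"form": str, "tags": [str, ...]}.
Form = Dict[str, Any]
Signature = Tuple[FrozenSet[str], ...]


def table_signature(forms: List[Form]) -> Signature:
    """Describe a table by the tag sets of its forms, ignoring the word forms themselves.

    Sorting makes the signature independent of the order in which forms appear;
    keeping a tuple (not a set) preserves how many forms carry each tag set.
    """
    return tuple(sorted((frozenset(f["tags"]) for f in forms), key=sorted))


def group_table_types(tables: Dict[str, List[Form]]) -> Dict[Signature, List[str]]:
    """Map each table type (signature) to the words whose tables share it."""
    types: Dict[Signature, List[str]] = defaultdict(list)
    for word, forms in tables.items():
        types[table_signature(forms)].append(word)
    return dict(types)


# Illustrative input, not taken from the real data.
tables = {
    "kucing": [{"form": "kucing-kucing", "tags": ["plural"]}],
    "buku": [{"form": "buku-buku", "tags": ["plural"]}],
    "makan": [
        {"form": "memakan", "tags": ["active"]},
        {"form": "dimakan", "tags": ["passive"]},
    ],
}
for signature, words in group_table_types(tables).items():
    # Number of forms in this table type, then the words that have it.
    print(len(signature), words)
```

Here "kucing" and "buku" fall into one single-form type and "makan" into a separate two-form type, because their tag sets differ.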

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type with only a few unique instances is quite likely to contain some kind of minor error in the original data.
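The heuristic that rare table types are suspect amounts to counting how many tables share each type and inspecting the smallest groups. A minimal sketch follows. It assumes each table has already been reduced to some hashable type signature (plain strings stand in for them here), and the cut-off `max_instances=2` is an arbitrary illustrative choice, not a value used by wiktextract.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def rare_table_types(
    signatures: Iterable[Hashable], max_instances: int = 2
) -> List[Tuple[Hashable, int]]:
    """Return (table type, instance count) pairs seen at most `max_instances` times.

    `signatures` holds one entry per parsed table; results are ordered rarest first.
    The default threshold is an illustrative assumption, not a wiktextract setting.
    """
    counts = Counter(signatures)
    rare = [(sig, n) for sig, n in counts.items() if n <= max_instances]
    return sorted(rare, key=lambda item: item[1])


# Hypothetical signatures: strings stand in for real tag-set signatures.
observed = ["noun/2 forms"] * 40 + ["verb/4 forms"] * 25 + ["noun/3 forms"]
print(rare_table_types(observed))  # candidates to inspect for source errors
```

The single "noun/3 forms" table is flagged. Whether it reflects a real paradigm or a formatting mistake still has to be checked by hand.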

Language                          Table forms  Errors (% of affected words)
Bahas Melayu                      1            0 (0.00%)
Bahasa Afrikaans                  3            8 (48.00%)
Bahasa Ainu                       1            2 (100.00%)
Bahasa Amuzgo San Pedro Amuzgos   1            0 (0.00%)
Bahasa Arab                       3            8 (86.67%)
Bahasa Arab Hijaz                 1            0 (0.00%)
Bahasa Asturia                    1            0 (0.00%)
Bahasa Azerbaijan                 1            2 (0.00%)
Bahasa Azeri                      2            8 (20.00%)
Bahasa Bali                       1            6 (100.00%)
Bahasa Banjar                     1            0 (0.00%)
Bahasa Belanda                    3            6 (96.00%)
Bahasa Bugis                      1            2 (100.00%)
Bahasa Burma                      1            2 (100.00%)
Bahasa Cam Barat                  2            2 (0.00%)
Bahasa Catalan                    1            0 (0.00%)
Bahasa Cina                       1            0 (0.00%)
Bahasa Denmark                    1            10 (100.00%)
Bahasa Dhivehi                    1            0 (0.00%)
Bahasa Estonia                    2            4 (1.75%)
Bahasa Farefare                   1            0 (0.00%)
Bahasa Georgia                    1            2 (100.00%)
Bahasa Ghotuo                     1            0 (0.00%)
Bahasa Hindi                      2            2 (0.69%)
Bahasa Hungary                    1            0 (0.00%)
Bahasa Iceland                    1            2 (100.00%)
Bahasa Igbo                       2            2 (50.00%)
Bahasa Ilocano                    1            2 (100.00%)
Bahasa Indonesia                  4            2 (8.00%)
Bahasa Ingeris                    1            0 (0.00%)
Bahasa Inggeris                   6            4 (4.83%)
Bahasa Ireland                    1            4 (100.00%)
Bahasa Itali                      1            0 (0.00%)
Bahasa Jawa                       4            8 (89.13%)
Bahasa Jepun                      4            2 (30.00%)
Bahasa Jerman                     7            12 (3.44%)
Bahasa Kazakh                     1            2 (100.00%)
Bahasa Kelantan                   1            0 (0.00%)
Bahasa Kikuyu                     1            0 (0.00%)
Bahasa Korea                      4            6 (76.36%)
Bahasa Kunigami                   1            2 (100.00%)
Bahasa Kurdi Utara                1            2 (100.00%)
Bahasa Ladino                     1            2 (100.00%)
Bahasa Lower Sorbian              1            2 (100.00%)
Bahasa Makassar                   1            2 (100.00%)
Bahasa Makau                      1            0 (0.00%)
Bahasa Malta                      1            0 (0.00%)
Bahasa Mandarin                   4            6 (1.02%)
Bahasa Maori                      1            2 (100.00%)
Bahasa Melayu                     15           22 (3.25%)
Bahasa Melayu Brunei              2            2 (71.43%)
Bahasa Melayu Kedah               1            2 (100.00%)
Bahasa Melayu Kelantan-Patani     2            2 (0.00%)
Bahasa Melayu Sarawak             1            0 (0.00%)
Bahasa Melayu Terengganu Pesisir  1            0 (0.00%)
Bahasa Minangkabau                2            2 (50.00%)
Bahasa Miranda                    1            0 (0.00%)
Bahasa Moore                      1            0 (0.00%)
Bahasa Mooré                      2            0 (0.00%)
Bahasa Norman                     1            0 (0.00%)
Bahasa Norway Bokmål              1            8 (100.00%)
Bahasa Norway Nynorsk             1            8 (100.00%)
Bahasa Okinawa                    1            4 (100.00%)
Bahasa Ottoman Turkish            1            2 (100.00%)
Bahasa Parsi                      3            4 (70.71%)
Bahasa Perancis                   4            10 (9.95%)
Bahasa Perancis Lama              1            10 (100.00%)
Bahasa Piedmont                   1            0 (0.00%)
Bahasa Poland                     1            4 (100.00%)
Bahasa Portugis                   1            0 (0.00%)
Bahasa Provençal Lama             1            6 (100.00%)
Bahasa Punic                      1            0 (0.00%)
Bahasa Punjabi                    2            2 (87.50%)
Bahasa Rungus                     1            0 (0.00%)
Bahasa Scots                      1            0 (0.00%)
Bahasa Semai                      1            0 (0.00%)
Bahasa Sepanyol                   4            10 (20.39%)
Bahasa Serbia                     1            2 (100.00%)
Bahasa Sinhala                    1            0 (0.00%)
Bahasa Slovene                    1            2 (100.00%)
Bahasa Sunda                      1            2 (100.00%)
Bahasa Suryani Klasik             1            0 (0.00%)
Bahasa Swahili                    1            2 (100.00%)
Bahasa Tagalog                    1            2 (100.00%)
Bahasa Tajik                      1            2 (100.00%)
Bahasa Tatar                      1            2 (100.00%)
Bahasa Telugu                     1            0 (0.00%)
Bahasa Thai                       1            2 (100.00%)
Bahasa Turki                      2            4 (4.35%)
Bahasa Turkmen                    3            10 (13.11%)
Bahasa Urdu                       2            4 (88.89%)
Bahasa Uyghur                     1            0 (0.00%)
Bahasa Uzbek                      1            0 (0.00%)
Bahasa Wales                      1            0 (0.00%)
Bahasa Yonaguni                   1            2 (100.00%)
Bahasa Yunani                     1            2 (100.00%)
Bahasa Yup'ik                     1            0 (0.00%)
English                           1            0 (0.00%)
Inggeris                          1            0 (0.00%)
Persian                           1            0 (0.00%)
Rentas bahasa                     4            6 (33.33%)
Translingual                      3            2 (0.00%)
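Each row pairs a language name with three figures: the number of table forms, the number of errors, and the percentage of affected words. To rank languages another way, for example by number of table forms, the rows can be parsed and re-sorted. The sketch below is illustrative only. The regular expression assumes each row sits on its own line with the fields in that order, and the names `ROW_RE`, `Row`, `parse_row` and `by_table_forms` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Language name, table-form count, error count, then the percentage in parentheses.
ROW_RE = re.compile(
    r"^(?P<lang>.+?)\s+(?P<forms>\d+)\s+(?P<errors>\d+)\s+\((?P<pct>\d+(?:\.\d+)?)%\)$"
)


class Row(NamedTuple):
    language: str
    table_forms: int
    errors: int
    pct_affected: float


def parse_row(line: str) -> Row:
    """Parse one table row; raises ValueError if the line does not match."""
    m = ROW_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return Row(m["lang"], int(m["forms"]), int(m["errors"]), float(m["pct"]))


def by_table_forms(rows: List[Row]) -> List[Row]:
    """Most table forms first; ties broken alphabetically by language name."""
    return sorted(rows, key=lambda r: (-r.table_forms, r.language))


rows = [
    parse_row(line)
    for line in [
        "Bahasa Afrikaans 3 8 (48.00%)",
        "Bahasa Jerman 7 12 (3.44%)",
        "Bahasa Melayu 15 22 (3.25%)",
    ]
]
for r in by_table_forms(rows):
    print(r.language, r.table_forms, f"{r.pct_affected:.2f}%")
```

Sorting on `pct_affected` instead would surface the languages where errors touch the largest share of words.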

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2025-09-19 from the mswiktionary dump dated 2025-09-02 using wiktextract (740530f and 0c495c6). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.