Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables. When wiktextract parses word heads and tables, it tags each form it encounters with grammatical or contextual information. The forms and tags found in each head section or table are first kept separate from those of other head sections and tables; later, they are merged into table types, each of which groups the heads and tables that contain the same number of word forms with the same tags for those forms.
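The grouping described above can be sketched roughly as follows. This is an illustrative sketch only, not wiktextract's actual implementation: the helper names `table_signature` and `group_into_table_types` are hypothetical, and the input is assumed to be a simplified list of entries, each with a `word` and a `forms` list whose items carry a `form` and its `tags`.

```python
from collections import defaultdict


def table_signature(forms):
    """Return an order-independent signature for one table:
    the sorted tuple of each form's sorted tags (hypothetical helper)."""
    return tuple(sorted(tuple(sorted(f["tags"])) for f in forms))


def group_into_table_types(entries):
    """Group words whose tables have the same number of forms
    with the same tags (assumed, simplified data layout)."""
    types = defaultdict(list)
    for entry in entries:
        types[table_signature(entry["forms"])].append(entry["word"])
    return dict(types)


# Made-up sample entries, for illustration only.
entries = [
    {"word": "cat", "forms": [{"form": "cats", "tags": ["plural"]}]},
    {"word": "dog", "forms": [{"form": "dogs", "tags": ["plural"]}]},
    {"word": "walk", "forms": [
        {"form": "walked", "tags": ["past"]},
        {"form": "walking", "tags": ["participle", "present"]},
    ]},
]

table_types = group_into_table_types(entries)
for signature, words in table_types.items():
    print(len(signature), "form(s):", signature, "->", words)
```

Here "cat" and "dog" end up in the same table type because both have a single form tagged `plural`, while "walk" forms a type of its own.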

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type with only a few unique instances is quite likely to stem from some kind of minor error in the original data.
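As a rough illustration of that heuristic, the sketch below counts how often each table type occurs and flags the rare ones. The function name `rare_table_types`, the `max_instances` threshold and the sample signatures are all assumptions made for this example; they are not part of wiktextract.

```python
from collections import Counter


def rare_table_types(signatures, max_instances=2):
    """Return table-type signatures seen at most `max_instances` times
    (hypothetical helper; the threshold is an arbitrary assumption)."""
    counts = Counter(signatures)
    return sorted(sig for sig, n in counts.items() if n <= max_instances)


# Made-up data: one signature per parsed table.
observed = (
    ["plural"] * 40
    + ["past|present-participle"] * 25
    + ["plurall"]  # a single stray instance, e.g. caused by a typo
)

suspicious = rare_table_types(observed)
print(suspicious)
```

A type flagged this way is only a candidate for review: a rare table type can also be a legitimate but unusual paradigm.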

Language Table forms Errors (% affected words)
Bahas Melayu 1 0 (0.00%)
Bahasa Afrikaans 1 2 (100.00%)
Bahasa Ainu 1 2 (100.00%)
Bahasa Amuzgo San Pedro Amuzgos 1 2 (100.00%)
Bahasa Arab 1 2 (100.00%)
Bahasa Arab Hijaz 1 2 (100.00%)
Bahasa Asturia 1 2 (100.00%)
Bahasa Azerbaijan 1 4 (100.00%)
Bahasa Azeri 1 6 (100.00%)
Bahasa Bali 1 6 (100.00%)
Bahasa Banjar 1 0 (0.00%)
Bahasa Belanda 1 2 (100.00%)
Bahasa Bugis 1 2 (100.00%)
Bahasa Burma 1 2 (100.00%)
Bahasa Cam Barat 2 2 (0.00%)
Bahasa Catalan 1 2 (100.00%)
Bahasa Cina 1 0 (0.00%)
Bahasa Denmark 1 10 (100.00%)
Bahasa Dhivehi 1 2 (100.00%)
Bahasa Estonia 1 4 (100.00%)
Bahasa Farefare 1 2 (100.00%)
Bahasa Georgia 1 2 (100.00%)
Bahasa Ghotuo 1 2 (100.00%)
Bahasa Hindi 1 2 (100.00%)
Bahasa Hungary 1 2 (100.00%)
Bahasa Iceland 1 4 (100.00%)
Bahasa Igbo 1 2 (100.00%)
Bahasa Ilocano 1 2 (100.00%)
Bahasa Indonesia 3 2 (25.00%)
Bahasa Ingeris 1 2 (100.00%)
Bahasa Inggeris 4 4 (98.52%)
Bahasa Ireland 1 4 (100.00%)
Bahasa Itali 1 2 (100.00%)
Bahasa Jawa 2 4 (97.62%)
Bahasa Jepun 2 4 (90.00%)
Bahasa Jerman 1 4 (100.00%)
Bahasa Kazakh 1 2 (100.00%)
Bahasa Kikuyu 1 2 (100.00%)
Bahasa Korea 2 2 (96.36%)
Bahasa Kunigami 1 2 (100.00%)
Bahasa Kurdi Utara 1 2 (100.00%)
Bahasa Ladino 1 2 (100.00%)
Bahasa Lower Sorbian 1 2 (100.00%)
Bahasa Makassar 1 2 (100.00%)
Bahasa Makau 1 2 (100.00%)
Bahasa Malta 1 2 (100.00%)
Bahasa Mandarin 1 4 (100.00%)
Bahasa Maori 1 2 (100.00%)
Bahasa Melayu 9 10 (1.74%)
Bahasa Melayu Brunei 2 2 (83.33%)
Bahasa Melayu Kedah 1 2 (100.00%)
Bahasa Melayu Kelantan-Patani 1 0 (0.00%)
Bahasa Melayu Melaka 1 2 (100.00%)
Bahasa Melayu Sarawak 1 0 (0.00%)
Bahasa Melayu Terengganu Pesisir 1 0 (0.00%)
Bahasa Minangkabau 2 2 (50.00%)
Bahasa Miranda 1 2 (100.00%)
Bahasa Moore 1 2 (100.00%)
Bahasa Mooré 2 2 (50.00%)
Bahasa Norman 1 2 (100.00%)
Bahasa Norway Bokmål 1 8 (100.00%)
Bahasa Norway Nynorsk 1 8 (100.00%)
Bahasa Okinawa 1 4 (100.00%)
Bahasa Ottoman Turkish 1 2 (100.00%)
Bahasa Parsi 1 2 (100.00%)
Bahasa Perancis 2 6 (100.00%)
Bahasa Perancis Lama 1 10 (100.00%)
Bahasa Piedmont 1 2 (100.00%)
Bahasa Poland 1 4 (100.00%)
Bahasa Portugis 1 2 (100.00%)
Bahasa Provençal Lama 1 6 (100.00%)
Bahasa Punic 1 2 (100.00%)
Bahasa Punjabi 1 2 (100.00%)
Bahasa Rungus 1 2 (100.00%)
Bahasa Scots 1 2 (100.00%)
Bahasa Semai 1 2 (100.00%)
Bahasa Sepanyol 2 2 (99.03%)
Bahasa Serbia 1 2 (100.00%)
Bahasa Sinhala 1 2 (100.00%)
Bahasa Slovene 1 4 (100.00%)
Bahasa Sunda 1 2 (100.00%)
Bahasa Suryani Klasik 1 2 (100.00%)
Bahasa Swahili 1 2 (100.00%)
Bahasa Tagalog 1 2 (100.00%)
Bahasa Tajik 1 2 (100.00%)
Bahasa Tatar 1 2 (100.00%)
Bahasa Telugu 1 2 (100.00%)
Bahasa Thai 1 2 (100.00%)
Bahasa Turki 1 4 (100.00%)
Bahasa Turkmen 1 4 (100.00%)
Bahasa Urdu 1 2 (100.00%)
Bahasa Uyghur 1 2 (100.00%)
Bahasa Uzbek 1 2 (100.00%)
Bahasa Wales 1 2 (100.00%)
Bahasa Yonaguni 1 2 (100.00%)
Bahasa Yunani 1 2 (100.00%)
Bahasa Yup'ik 1 2 (100.00%)
Inggeris 1 0 (0.00%)
Persian 1 2 (100.00%)
Rentas bahasa 2 2 (75.00%)
Translingual 2 6 (66.67%)

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2025-04-13 from the mswiktionary dump dated 2025-04-03 using wiktextract (aeaf2a1 and fb63907). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.