Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables found in the data. When wiktextract parses word heads and tables, it tags each form it encounters with grammatical or contextual information. The forms and tags found in each head section or table are first kept separate from those of other head sections and tables; later they are merged into table types, each of which groups the heads and tables that contain the same number of word forms with the same tags for those forms.
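The grouping described above can be illustrated with a minimal sketch. This is not wiktextract's actual implementation; the function names, the input shape (a list of `(word, forms)` pairs), and the sample words are assumptions made for illustration. The idea is that two tables share a table type when they have the same number of forms carrying the same tags, regardless of the form text itself.

```python
from collections import defaultdict


def table_signature(forms):
    """Build a hashable signature from the tag sets of all forms.

    The form text is ignored on purpose: only the number of forms
    and their tags decide which table type a table belongs to.
    """
    return tuple(sorted(tuple(sorted(f["tags"])) for f in forms))


def group_table_types(tables):
    """Group words by the signature of their inflection table."""
    groups = defaultdict(list)
    for word, forms in tables:
        groups[table_signature(forms)].append(word)
    return dict(groups)


# Hypothetical sample data, not taken from the real extraction.
tables = [
    ("rumah", [{"form": "rumah-rumah", "tags": ["plural"]}]),
    ("buku", [{"form": "buku-buku", "tags": ["plural"]}]),
    ("makan", [
        {"form": "memakan", "tags": ["active"]},
        {"form": "dimakan", "tags": ["passive"]},
    ]),
]

table_types = group_table_types(tables)
for signature, words in table_types.items():
    print(len(signature), "form(s):", signature, "->", words)
```

In this sketch "rumah" and "buku" end up in one table type and "makan" in another, so the sample yields two table types.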

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos, and badly formatted Wiktionary entries. A table type that has only a few unique instances is quite likely to contain some kind of minor error in the original data.
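As a rough illustration of that heuristic, the sketch below counts how often each table type occurs and reports the rare ones. The string signatures, the `max_instances` threshold, and the sample data are all hypothetical; a real check would use whatever representation of table types the extraction produces.

```python
from collections import Counter


def rare_table_types(signatures, max_instances=2):
    """Return (signature, count) pairs for table types seen at most max_instances times."""
    counts = Counter(signatures)
    return sorted((sig, n) for sig, n in counts.items() if n <= max_instances)


# Hypothetical observations: one signature contains a misspelled tag.
observed = ["plural"] * 40 + ["plural+possessive"] * 12 + ["plural+posessive"]

suspects = rare_table_types(observed)
for signature, count in suspects:
    print(f"{signature!r} occurs only {count} time(s); worth checking the source entry")
```

A low count does not prove an error, since some paradigms are genuinely rare, so the result is best treated as a list of entries to inspect by hand.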

Language                                    Table forms  Errors (% affected words)
:Templat:Lampung Api                        1            2 (100.00%)
:Templat:dusun balangan                     1            2 (100.00%)
:Templat:maanyan siong                      1            2 (100.00%)
:Templat:samihim                            1            2 (100.00%)
Arti                                        1            4 (100.00%)
Bahasa Indonesia                            1            0 (0.00%)
Bahasa Jawa                                 3            2 (80.00%)
Bahasa Jepang                               1            0 (0.00%)
Bahasa Vietnam                              1            26 (100.00%)
GHWOSMXbahasa Indonesia                     1            0 (0.00%)
Lintas bahasa                               1            0 (0.00%)
bahasa Aceh                                 2            2 (66.67%)
bahasa Afrikaans                            1            0 (0.00%)
bahasa Amis                                 1            0 (0.00%)
bahasa Arab                                 1            2 (100.00%)
bahasa Armenia                              1            0 (0.00%)
bahasa Badui                                1            2 (100.00%)
bahasa Bahnar                               1            2 (100.00%)
bahasa Banjar                               3            2 (25.00%)
bahasa Batak Mandailing                     1            2 (100.00%)
bahasa Batak Simalungun                     3            4 (40.00%)
bahasa Batak Toba                           2            2 (75.00%)
bahasa Belanda                              3            4 (66.67%)
bahasa Berawan                              1            2 (100.00%)
bahasa Betawi                               1            2 (100.00%)
bahasa Bugis                                1            2 (100.00%)
bahasa Bunun                                1            0 (0.00%)
bahasa Cham Timur                           2            2 (83.33%)
bahasa Esperanto                            1            2 (100.00%)
bahasa Galisia                              1            0 (0.00%)
bahasa Gorontalo                            1            2 (100.00%)
bahasa Gorontalo ( dalam Bahasa Belanda )   1            2 (100.00%)
bahasa Hakka                                1            10 (100.00%)
bahasa Indonesia                            17           22 (3.84%)
bahasa Indonesia Peranakan                  11           4 (1.83%)
bahasa Inggris                              2            2 (90.00%)
bahasa Italia                               1            0 (0.00%)
bahasa Jawa                                 4            2 (64.71%)
bahasa Jawa kuno                            2            4 (40.00%)
bahasa Jepang                               3            20 (94.34%)
bahasa Jepang Kuno                          1            2 (100.00%)
bahasa Jepang lama                          1            2 (100.00%)
bahasa Kanakanabu                           2            2 (80.00%)
bahasa Kangean                              1            0 (0.00%)
bahasa Kavalan                              1            0 (0.00%)
bahasa Kendayan                             1            2 (100.00%)
bahasa Kerinci                              1            0 (0.00%)
bahasa Kimaragang                           1            2 (100.00%)
bahasa Korea                                1            4 (100.00%)
bahasa Kristang                             1            2 (100.00%)
bahasa Lampung Api                          1            2 (100.00%)
bahasa Madura                               1            2 (100.00%)
bahasa Makassar                             1            2 (100.00%)
bahasa Melayu                               5            2 (83.91%)
bahasa Melayu Pontianak                     1            2 (100.00%)
bahasa Melayu Tengah                        1            4 (100.00%)
bahasa Minangkabau                          2            2 (99.52%)
bahasa Musi                                 1            2 (100.00%)
bahasa Nias                                 1            2 (100.00%)
bahasa Okinawa                              1            2 (100.00%)
bahasa Paiwan                               1            2 (100.00%)
bahasa Palembang                            1            2 (100.00%)
bahasa Polandia                             1            0 (0.00%)
bahasa Portugis                             1            0 (0.00%)
bahasa Prancis                              1            0 (0.00%)
bahasa Rukai                                1            2 (100.00%)
bahasa Rusia                                1            2 (100.00%)
bahasa Sanskerta                            1            4 (100.00%)
bahasa Sunda                                2            4 (96.00%)
bahasa Sunda kuno                           1            2 (100.00%)
bahasa Tamiang                              2            2 (50.00%)
bahasa Tetun                                2            2 (50.00%)
bahasa Tionghoa                             2            2 (90.00%)
bahasa Tsou                                 1            0 (0.00%)
bahasa Turkmen                              1            0 (0.00%)
bahasa Urak Lawoi'                          1            2 (100.00%)
bahasa Vietnam                              2            4 (90.77%)
bahasa Yami                                 1            0 (0.00%)
bahasa Zazaki                               1            2 (100.00%)
bahasa indonesia                            1            0 (0.00%)

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2025-07-20 from the idwiktionary dump dated 2025-07-02 using wiktextract (45c4a21 and f1c2b61). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.
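For readers who want to inspect inflection data in the raw download themselves, here is a minimal sketch. It assumes the raw data is a JSON Lines file (one JSON object per line) and that entries carry `lang` and `forms` fields; both the field names and the sample lines are assumptions for illustration and should be checked against the actual file. The sketch counts how many forms are recorded per language.

```python
import json
from collections import Counter


def count_forms_by_language(lines):
    """Count recorded forms per language from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        # "lang" and "forms" are assumed field names; entries without
        # forms contribute zero.
        counts[entry.get("lang", "unknown")] += len(entry.get("forms", []))
    return counts


# Hypothetical sample lines standing in for a downloaded file.
sample = [
    '{"word": "rumah", "lang": "bahasa Indonesia", '
    '"forms": [{"form": "rumah-rumah", "tags": ["plural"]}]}',
    '{"word": "omah", "lang": "bahasa Jawa"}',
]

form_counts = count_forms_by_language(sample)
for lang, n in sorted(form_counts.items()):
    print(lang, n)
```

With a real file, the same function can be fed an open file object, since iterating over a text file yields one line at a time and avoids loading the whole dump into memory.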

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.