Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables found for each language. When wiktextract parses word heads and tables, it assigns each form it encounters tags that describe grammatical or contextual information. The forms and tags found in one head section or table are first kept separate from those of other head sections and tables. Later they are merged into table types, where every instance of a type contains the same number of word forms with the same tags for those forms.

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type that has only a few unique instances is quite likely to contain some kind of minor error in the original data.
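The grouping idea described above can be illustrated with a minimal Python sketch. It is not wiktextract's actual implementation: the function names, the `max_instances` threshold, the dictionary layout and the sample words are all assumptions made up for illustration. Each table is reduced to a signature (the sorted tag sets of its forms), and tables whose signature occurs only rarely are flagged as candidates for a closer look.

```python
from collections import Counter


def table_signature(forms):
    """Return a hashable signature for a table: the sorted tag sets of its forms.

    Hypothetical helper; two tables with the same number of forms and the
    same tags for those forms get the same signature.
    """
    return tuple(sorted(tuple(sorted(form["tags"])) for form in forms))


def rare_table_types(tables, max_instances=1):
    """Return the words whose table type occurs at most max_instances times."""
    counts = Counter(table_signature(forms) for forms in tables.values())
    return {
        word: table_signature(forms)
        for word, forms in tables.items()
        if counts[table_signature(forms)] <= max_instances
    }


# Made-up sample data, not taken from the extracted dump.
tables = {
    "makan": [
        {"form": "memakan", "tags": ["active"]},
        {"form": "dimakan", "tags": ["passive"]},
    ],
    "minum": [
        {"form": "meminum", "tags": ["active"]},
        {"form": "diminum", "tags": ["passive"]},
    ],
    "baca": [
        {"form": "membaca", "tags": ["active"]},
    ],
}

# "makan" and "minum" share one table type; "baca" is the only instance of its type.
print(rare_table_types(tables))
```

In this sample only "baca" is flagged, because its table type has a single instance, which matches the heuristic that rare table types deserve a manual check.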

Language | Table forms | Errors (% affected words)
:Templat:Lampung Api | 1 | 2 (100.00%)
:Templat:batak toba | 1 | 2 (100.00%)
:Templat:dusun balangan | 1 | 2 (100.00%)
:Templat:jawa ngapak | 1 | 2 (100.00%)
:Templat:jawa ngoko | 1 | 2 (100.00%)
:Templat:maanyan siong | 1 | 2 (100.00%)
:Templat:palembang ogan | 1 | 2 (100.00%)
:Templat:samihim | 1 | 2 (100.00%)
:Templat:semendo | 1 | 2 (100.00%)
Bahasa Belanda | 1 | 4 (100.00%)
Bahasa Indonesia | 3 | 0 (0.00%)
Bahasa Isam | 1 | 6 (100.00%)
Bahasa Jawa | 3 | 2 (80.00%)
Bahasa Jepang | 1 | 0 (0.00%)
Bahasa Lampung Api | 1 | 2 (100.00%)
Bahasa Melayu | 1 | 2 (100.00%)
Bahasa Minangkabau | 1 | 8 (100.00%)
Bahasa Using | 1 | 2 (100.00%)
Bahasa Vietnam | 1 | 6 (100.00%)
GHWOSMXbahasa Indonesia | 1 | 0 (0.00%)
Lintas bahasa | 1 | 0 (0.00%)
bahasa Aceh | 1 | 2 (100.00%)
bahasa Alune | 1 | 2 (100.00%)
bahasa Armenia | 1 | 0 (0.00%)
bahasa Bahnar | 1 | 2 (100.00%)
bahasa Bakumpai | 1 | 4 (100.00%)
bahasa Bali | 1 | 2 (100.00%)
bahasa Banjar | 3 | 2 (40.00%)
bahasa Batak Simalungun | 3 | 4 (40.00%)
bahasa Batak Toba | 1 | 2 (100.00%)
bahasa Belanda | 3 | 4 (80.00%)
bahasa Betawi | 1 | 4 (100.00%)
bahasa Brunei | 1 | 2 (100.00%)
bahasa Cham Timur | 1 | 2 (100.00%)
bahasa Galisia | 1 | 0 (0.00%)
bahasa Gorontalo | 1 | 4 (100.00%)
bahasa Gorontalo ( dalam Bahasa Belanda ) | 1 | 2 (100.00%)
bahasa Hakka | 1 | 6 (100.00%)
bahasa Hokkien | 1 | 0 (0.00%)
bahasa Indonesia | 17 | 22 (3.79%)
bahasa Indonesia Peranakan | 10 | 4 (1.79%)
bahasa Inggris | 3 | 2 (77.78%)
bahasa Italia | 2 | 2 (50.00%)
bahasa Jawa | 2 | 2 (63.64%)
bahasa Jepang | 3 | 20 (94.62%)
bahasa Jepang Kuno | 1 | 2 (100.00%)
bahasa Jepang lama | 1 | 2 (100.00%)
bahasa Kangean | 2 | 2 (50.00%)
bahasa Kawi | 1 | 8 (100.00%)
bahasa Kerinci | 1 | 2 (100.00%)
bahasa Kimaragang | 1 | 2 (100.00%)
bahasa Komering | 1 | 2 (100.00%)
bahasa Korea | 1 | 4 (100.00%)
bahasa Kristang | 1 | 2 (100.00%)
bahasa Lampung Api | 1 | 2 (100.00%)
bahasa Madura | 1 | 2 (100.00%)
bahasa Makassar | 1 | 4 (100.00%)
bahasa Malagasi | 1 | 4 (100.00%)
bahasa Mandailing | 1 | 2 (100.00%)
bahasa Melayu | 4 | 4 (69.05%)
bahasa Melayu Manado | 1 | 2 (100.00%)
bahasa Melayu Tengah | 1 | 4 (100.00%)
bahasa Minangkabau | 2 | 6 (98.88%)
bahasa Musi | 1 | 2 (100.00%)
bahasa Pakpak | 1 | 2 (100.00%)
bahasa Palembang | 1 | 2 (100.00%)
bahasa Polandia | 1 | 0 (0.00%)
bahasa Portugis | 1 | 0 (0.00%)
bahasa Rusia | 1 | 2 (100.00%)
bahasa Sanskerta | 1 | 4 (100.00%)
bahasa Sunda | 1 | 4 (100.00%)
bahasa Sunda kuno | 1 | 2 (100.00%)
bahasa Tamiang | 1 | 0 (0.00%)
bahasa Tionghoa | 1 | 0 (0.00%)
bahasa Turkmen | 1 | 0 (0.00%)
bahasa Vietnam | 2 | 4 (90.77%)
bahasa Yami | 1 | 0 (0.00%)
bahasa indonesia | 1 | 0 (0.00%)

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2025-12-06 from the idwiktionary dump dated 2025-12-01 using wiktextract (ddb1505 and 9905b1f). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.