Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables found for each language. When wiktextract parses word heads and tables, it tags each form it encounters with grammatical or contextual information. The forms and tags found in one head section or table are first kept separate from those of other head sections and tables. Later they are merged into table types, where each type groups the heads and tables that contain the same number of word forms with the same tags for those forms.

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos, and badly formatted Wiktionary entries. A table type that has only a few unique instances is quite likely to reflect some minor error in the original data.
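The grouping idea described above can be illustrated with a minimal sketch. It is not wiktextract's actual implementation: the sample entries are invented, their shape (a `forms` list whose items carry a `form` string and a `tags` list) is only assumed to resemble wiktextract's JSON output, and the helper names `table_signature`, `group_by_table_type`, and `rare_table_types` are hypothetical. Each entry is reduced to a signature made of the tag sets of its forms, entries with identical signatures fall into the same table type, and types with very few instances are flagged for inspection.

```python
from collections import defaultdict

# Hypothetical sample entries; the shape is an assumption modelled on
# wiktextract-style JSON ("forms" items with "form" and "tags").
entries = [
    {"word": "Haus", "forms": [
        {"form": "Haus", "tags": ["nominative", "singular"]},
        {"form": "Häuser", "tags": ["nominative", "plural"]},
    ]},
    {"word": "Maus", "forms": [
        {"form": "Maus", "tags": ["nominative", "singular"]},
        {"form": "Mäuse", "tags": ["nominative", "plural"]},
    ]},
    {"word": "Laus", "forms": [
        {"form": "Laus", "tags": ["nominative", "singular"]},
        # Deliberately misspelled tag to simulate a badly formatted entry.
        {"form": "Läuse", "tags": ["nominative", "plurla"]},
    ]},
]


def table_signature(entry):
    """Build an order-independent key from the tag sets of all forms.

    Two entries share a signature only if they have the same number of
    forms and the same tags on those forms.
    """
    return tuple(sorted(tuple(sorted(f["tags"])) for f in entry["forms"]))


def group_by_table_type(entries):
    """Map each signature to the list of words whose tables match it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[table_signature(entry)].append(entry["word"])
    return dict(groups)


def rare_table_types(groups, max_instances=1):
    """Return table types with at most max_instances words (likely errors)."""
    return {sig: words for sig, words in groups.items()
            if len(words) <= max_instances}


groups = group_by_table_type(entries)
suspects = rare_table_types(groups)
```

With this sample data, the two well-formed entries share one table type, while the entry carrying the misspelled tag ends up alone in its own type and is returned by `rare_table_types`. The threshold `max_instances` is an arbitrary choice for illustration; a real check would likely scale it to the number of words in the language.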

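Each row below shows the same data in two sort orders: on the left, sorted by language name (ascending); on the right, sorted by number of table forms (descending).
Language Table forms Errors (% affected words) Language Table forms Errors (% affected words)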
Afrikaans 3 6 (10.53%) Deutsch 28210 0 (0.00%)
Akkadisch 4 150 (86.67%) Altgriechisch 75 608 (20.15%)
Albanisch 1 44 (100.00%) Latein 44 256 (3.70%)
Altenglisch 4 36 (30.19%) Prußisch 34 330 (92.89%)
Altfranzösisch 1 0 (0.00%) Schwedisch 29 72 (0.00%)
Altgriechisch 75 608 (20.15%) Italienisch 20 28 (0.01%)
Althochdeutsch 2 0 (0.00%) Niederländisch 19 122 (9.01%)
Altirisch 3 0 (0.00%) Armenisch 18 134 (3.69%)
Altnordisch 2 0 (0.00%) Südpikenisch 13 0 (0.00%)
Arabisch 2 76 (98.54%) Gotisch 13 28 (54.37%)
Armenisch 18 134 (3.69%) Polnisch 12 8 (0.03%)
Aserbaidschanisch 2 0 (0.00%) Französisch 12 134 (10.29%)
Asturisch 2 0 (0.00%) Okzitanisch 12 18 (0.05%)
Bairisch 1 0 (0.00%) Niederdeutsch 10 56 (66.94%)
Baschkirisch 2 28 (100.00%) Niedersorbisch 10 10 (0.38%)
Baskisch 4 42 (100.00%) Dänisch 9 40 (15.48%)
Bosnisch 4 0 (0.00%) Isländisch 9 50 (11.43%)
Bretonisch 2 0 (0.00%) Finnisch 9 82 (5.02%)
Bulgarisch 1 18 (100.00%) Irisch 9 4 (3.54%)
Deutsch 28210 0 (0.00%) Serbisch 9 134 (1.39%)
Durango-Nahuatl 1 0 (0.00%) Englisch 8 16 (17.16%)
Dänisch 9 40 (15.48%) Neugriechisch 8 64 (5.22%)
Englisch 8 16 (17.16%) Ungarisch 8 52 (80.65%)
Esperanto 5 8 (0.00%) Russisch 8 36 (0.00%)
Estnisch 1 22 (100.00%) Kroatisch 8 14 (0.00%)
Finnisch 9 82 (5.02%) Obersorbisch 8 16 (0.00%)
Französisch 12 134 (10.29%) Spanisch 7 2 (12.55%)
Friaulisch 3 0 (0.00%) Norwegisch 7 18 (20.19%)
Frühneuhochdeutsch 5 0 (0.00%) Katalanisch 7 42 (3.85%)
Fulfulde 1 0 (0.00%) Marsisch 7 0 (0.00%)
Färöisch 5 62 (89.47%) Portugiesisch 6 40 (1.44%)
Galicisch 3 0 (0.00%) Slowakisch 6 0 (0.00%)
Georgisch 4 76 (87.66%) Walisisch 6 0 (0.00%)
Gotisch 13 28 (54.37%) Slowenisch 6 2 (0.00%)
Guaraní 1 0 (0.00%) Kurdisch 6 38 (80.54%)
Guerrero-Nahuatl 1 0 (0.00%) Türkisch 6 136 (86.55%)
Hausa 2 12 (3.51%) Volskisch 6 0 (0.00%)
Hebräisch 1 12 (100.00%) Klassisches Nahuatl 6 0 (0.00%)
Hethitisch 6 0 (0.00%) Zentral-Nahuatl 6 0 (0.00%)
Hindi 2 0 (0.00%) Hethitisch 6 0 (0.00%)
Huastekisches Ost-Nahuatl 1 0 (0.00%) Tschechisch 5 0 (0.00%)
Huastekisches West-Nahuatl 1 0 (0.00%) Esperanto 5 8 (0.00%)
Huastekisches Zentral-Nahuatl 5 0 (0.00%) Luxemburgisch 5 96 (9.09%)
Hurritisch 1 0 (0.00%) Färöisch 5 62 (89.47%)
Ido 4 20 (43.89%) Litauisch 5 0 (0.00%)
Interlingua 1 0 (0.00%) Huastekisches Zentral-Nahuatl 5 0 (0.00%)
Interlingue 1 0 (0.00%) Scots 5 16 (17.65%)
Irisch 9 4 (3.54%) Weißrussisch 5 0 (0.00%)
Isländisch 9 50 (11.43%) Suaheli 5 38 (31.78%)
Italienisch 20 28 (0.01%) Mazedonisch 5 0 (0.00%)
Jamaika-Kreolisch 4 6 (20.00%) Urdu 5 72 (5.24%)
Jiddisch 3 0 (0.00%) Ukrainisch 5 0 (0.00%)
Kasachisch 1 14 (100.00%) Frühneuhochdeutsch 5 0 (0.00%)
Kaschubisch 2 0 (0.00%) Umbrisch 5 0 (0.00%)
Katalanisch 7 42 (3.85%) Ido 4 20 (43.89%)
Kirgisisch 1 14 (100.00%) Westfriesisch 4 54 (29.36%)
Klassisches Nahuatl 6 0 (0.00%) Baskisch 4 42 (100.00%)
Klassisches Nahuatl‎ 3 0 (0.00%) Paschtu 4 54 (1.68%)
Komi 1 0 (0.00%) Sardisch 4 0 (0.00%)
Komorisch 1 0 (0.00%) Bosnisch 4 0 (0.00%)
Korsisch 1 0 (0.00%) Georgisch 4 76 (87.66%)
Kotava 1 2 (100.00%) Venezianisch 4 0 (0.00%)
Krimtatarisch 1 0 (0.00%) Oskisch 4 0 (0.00%)
Kroatisch 8 14 (0.00%) Akkadisch 4 150 (86.67%)
Kurdisch 6 38 (80.54%) Altenglisch 4 36 (30.19%)
Latein 44 256 (3.70%) Jamaika-Kreolisch 4 6 (20.00%)
Lettisch 1 32 (100.00%) Vestinisch 4 0 (0.00%)
Litauisch 5 0 (0.00%) Usbekisch 3 48 (0.00%)
Luxemburgisch 5 96 (9.09%) Galicisch 3 0 (0.00%)
Maltesisch 1 18 (100.00%) Friaulisch 3 0 (0.00%)
Marsisch 7 0 (0.00%) Afrikaans 3 6 (10.53%)
Mazedonisch 5 0 (0.00%) Persisch 3 0 (0.00%)
Mezquital-Otomi 1 0 (0.00%) Altirisch 3 0 (0.00%)
Mittelgriechisch 1 0 (0.00%) Serbokroatisch 3 14 (0.00%)
Mittelhochdeutsch 2 0 (0.00%) Tetelcingo-Nahuatl 3 0 (0.00%)
Mongolisch 1 6 (100.00%) Temascaltepec-Nahuatl 3 0 (0.00%)
Nauruisch 1 0 (0.00%) Jiddisch 3 0 (0.00%)
Nepalesisch 1 12 (0.00%) Klassisches Nahuatl‎ 3 0 (0.00%)
Neugriechisch 8 64 (5.22%) Sesotho 3 0 (0.00%)
Niederdeutsch 10 56 (66.94%) Sindhi 3 24 (0.00%)
Niederländisch 19 122 (9.01%) Altnordisch 2 0 (0.00%)
Niedersorbisch 10 10 (0.38%) Rumänisch 2 92 (85.03%)
Nord-Sotho 1 0 (0.00%) Asturisch 2 0 (0.00%)
Norwegisch 7 18 (20.19%) Hausa 2 12 (3.51%)
Novial 1 0 (0.00%) Mittelhochdeutsch 2 0 (0.00%)
Obersorbisch 8 16 (0.00%) Kaschubisch 2 0 (0.00%)
Okzitanisch 12 18 (0.05%) Shona 2 0 (0.00%)
Orizaba-Nahuatl 2 0 (0.00%) Bretonisch 2 0 (0.00%)
Oskisch 4 0 (0.00%) Arabisch 2 76 (98.54%)
Papiamentu 2 0 (0.00%) Aserbaidschanisch 2 0 (0.00%)
Paschtu 4 54 (1.68%) Althochdeutsch 2 0 (0.00%)
Persisch 3 0 (0.00%) Hindi 2 0 (0.00%)
Polabisch 2 2 (0.00%) Papiamentu 2 0 (0.00%)
Polnisch 12 8 (0.03%) Volapük 2 0 (0.00%)
Portugiesisch 6 40 (1.44%) Baschkirisch 2 28 (100.00%)
Prußisch 34 330 (92.89%) Tadschikisch 2 0 (0.00%)
Rumänisch 2 92 (85.03%) Samoanisch 2 0 (0.00%)
Russisch 8 36 (0.00%) isiZulu 2 0 (0.00%)
Rätoromanisch 2 0 (0.00%) Rätoromanisch 2 0 (0.00%)
Samoanisch 2 0 (0.00%) West-Pandschabi 2 36 (4.46%)
Sanskrit 1 52 (100.00%) Orizaba-Nahuatl 2 0 (0.00%)
Sardisch 4 0 (0.00%) Polabisch 2 2 (0.00%)
Schottisch-Gälisch 1 32 (100.00%) Estnisch 1 22 (100.00%)
Schwedisch 29 72 (0.00%) Interlingua 1 0 (0.00%)
Scots 5 16 (17.65%) Interlingue 1 0 (0.00%)
Serbisch 9 134 (1.39%) Bulgarisch 1 18 (100.00%)
Serbokroatisch 3 14 (0.00%) Albanisch 1 44 (100.00%)
Sesotho 3 0 (0.00%) Maltesisch 1 18 (100.00%)
Shona 2 0 (0.00%) Krimtatarisch 1 0 (0.00%)
Sindarin 1 0 (0.00%) Lettisch 1 32 (100.00%)
Sindhi 3 24 (0.00%) Huastekisches Ost-Nahuatl 1 0 (0.00%)
Sizilianisch 1 0 (0.00%) Tagalog 1 6 (100.00%)
Slowakisch 6 0 (0.00%) Nauruisch 1 0 (0.00%)
Slowenisch 6 2 (0.00%) Tok Pisin 1 0 (0.00%)
Spanisch 7 2 (12.55%) Schottisch-Gälisch 1 32 (100.00%)
Suaheli 5 38 (31.78%) Tuvaluisch 1 2 (0.00%)
Sumerisch 1 20 (100.00%) Westflämisch 1 4 (0.00%)
Südpikenisch 13 0 (0.00%) Hebräisch 1 12 (100.00%)
Tadschikisch 2 0 (0.00%) Turkmenisch 1 12 (0.00%)
Tagalog 1 6 (100.00%) Altfranzösisch 1 0 (0.00%)
Tatarisch 1 12 (0.00%) Tatarisch 1 12 (0.00%)
Temascaltepec-Nahuatl 3 0 (0.00%) Kasachisch 1 14 (100.00%)
Tetelcingo-Nahuatl 3 0 (0.00%) Kirgisisch 1 14 (100.00%)
Tok Pisin 1 0 (0.00%) Komi 1 0 (0.00%)
Tschechisch 5 0 (0.00%) Mongolisch 1 6 (100.00%)
Turkmenisch 1 12 (0.00%) Huastekisches West-Nahuatl 1 0 (0.00%)
Tuvaluisch 1 2 (0.00%) Sizilianisch 1 0 (0.00%)
Twi 1 0 (0.00%) Zentrales Puebla-Nahuatl 1 0 (0.00%)
Türkisch 6 136 (86.55%) Korsisch 1 0 (0.00%)
Ukrainisch 5 0 (0.00%) Nord-Sotho 1 0 (0.00%)
Umbrisch 5 0 (0.00%) Sindarin 1 0 (0.00%)
Ungarisch 8 52 (80.65%) Guaraní 1 0 (0.00%)
Urdu 5 72 (5.24%) Sumerisch 1 20 (100.00%)
Usbekisch 3 48 (0.00%) Hurritisch 1 0 (0.00%)
Venezianisch 4 0 (0.00%) Bairisch 1 0 (0.00%)
Vestinisch 4 0 (0.00%) Nepalesisch 1 12 (0.00%)
Volapük 2 0 (0.00%) Twi 1 0 (0.00%)
Volskisch 6 0 (0.00%) Novial 1 0 (0.00%)
Walisisch 6 0 (0.00%) Sanskrit 1 52 (100.00%)
Weißrussisch 5 0 (0.00%) Mittelgriechisch 1 0 (0.00%)
West-Pandschabi 2 36 (4.46%) Mezquital-Otomi 1 0 (0.00%)
Westflämisch 1 4 (0.00%) Guerrero-Nahuatl 1 0 (0.00%)
Westfriesisch 4 54 (29.36%) Durango-Nahuatl 1 0 (0.00%)
Zentral-Nahuatl 6 0 (0.00%) Kotava 1 2 (100.00%)
Zentrales Puebla-Nahuatl 1 0 (0.00%) Komorisch 1 0 (0.00%)
isiZulu 2 0 (0.00%) Fulfulde 1 0 (0.00%)

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2024-05-30 from the dewiktionary dump dated 2024-05-02 using wiktextract (91e95e7 and db5a844). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.