Wiktionary data extraction errors and warnings

Inflection check

List of the different kinds of inflection tables. When wiktextract parses word heads and tables, it labels the forms it encounters with tags that describe grammatical or contextual information. The forms and tags found in each head section or table are kept separate from those of other head sections and tables. Later, heads and tables are merged into table types: every member of a table type contains the same number of word forms, with the same tags on those forms.
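The grouping into table types can be illustrated with a short Python sketch. It assumes a hypothetical representation in which each parsed head or table is a list of (form, tags) pairs; this is not wiktextract's internal data model, and the function names are invented for illustration. Two tables fall into the same type when, ignoring the actual word forms, they have the same number of forms carrying the same tag sets.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: one parsed table is a list of (form, tags) pairs.
Form = Tuple[str, FrozenSet[str]]
Signature = Tuple[Tuple[str, ...], ...]


def table_signature(table: List[Form]) -> Signature:
    """Drop the word forms and keep only how many forms there are and which
    tags each one carries. Sorting makes the result independent of the order
    in which the cells were read."""
    return tuple(sorted(tuple(sorted(tags)) for _form, tags in table))


def group_table_types(tables: Dict[str, List[Form]]) -> Dict[Signature, List[str]]:
    """Map each table type (signature) to the words whose tables share it."""
    types: Dict[Signature, List[str]] = defaultdict(list)
    for word, table in tables.items():
        types[table_signature(table)].append(word)
    return dict(types)
```

Because both levels are sorted, cell order does not matter; if the position of a cell in the original table were significant, a different signature would be needed.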

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type with only a few unique instances is quite likely to contain some kind of minor error in the original data.
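A minimal sketch of the "few instances" heuristic, assuming every word has already been assigned a hashable table-type key (for example a tag signature). The function name `rare_table_types` and its default threshold of 2 are hypothetical choices for illustration; a flagged type is only a candidate for manual review, not a confirmed error.

```python
from collections import Counter
from typing import Dict, Hashable, List


def rare_table_types(
    word_to_type: Dict[str, Hashable], threshold: int = 2
) -> Dict[Hashable, List[str]]:
    """Return the table types used by at most `threshold` words, each mapped
    to the words that use it. The threshold is an arbitrary illustrative value."""
    counts = Counter(word_to_type.values())
    rare: Dict[Hashable, List[str]] = {}
    for word, ttype in word_to_type.items():
        if counts[ttype] <= threshold:
            rare.setdefault(ttype, []).append(word)
    return rare
```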

Left columns: sorted by language (ascending). Right columns: sorted by table forms (descending).
Language Table forms Errors (% affected words) Language Table forms Errors (% affected words)
Abanyom 2 0 (0.00%) Français 35541 11270 (45.62%)
Afrikaans 14 2 (1.11%) Espagnol 3041 180724 (57.74%)
Akan 1 0 (0.00%) Japonais 1007 41758 (23.42%)
Albanais 11 36 (51.35%) Breton 857 42520 (7.63%)
Allemand 115 230 (10.01%) Allemand 115 230 (10.01%)
Alémanique alsacien 3 0 (0.00%) Latin 114 16 (0.27%)
Amami du Nord 2 0 (0.00%) Tchèque 96 10 (0.14%)
Amami du Sud 2 0 (0.00%) Suédois 95 14 (0.03%)
Ancien français 59 8 (4.88%) Russe 84 330 (87.62%)
Ancien occitan 5 2 (12.93%) Roumain 80 120 (40.71%)
Angevin 9 0 (0.00%) Néerlandais 74 42 (1.41%)
Anglais 63 132 (0.28%) Espéranto 66 40 (22.68%)
Anglo-normand 1 0 (0.00%) Anglais 63 132 (0.28%)
Arabe 13 52 (8.91%) Occitan 63 198 (31.57%)
Arabe judéo-tripolitain 21 0 (0.00%) Ancien français 59 8 (4.88%)
Arabe marocain 3 2 (0.00%) Luxembourgeois 58 168 (3.72%)
Aragonais 2 0 (0.00%) Italien 53 12 (53.53%)
Arménien 5 0 (0.00%) Polonais 52 8 (0.60%)
Asturien 7 4 (0.00%) Gallois 52 26 (42.91%)
Azéri 8 6 (10.17%) Gaélique irlandais 44 30 (6.41%)
Bambara 1 0 (0.00%) Slovaque 43 30 (6.17%)
Baoulé 2 0 (0.00%) Same du Nord 43 42 (4.33%)
Bas-sorabe 4 0 (0.00%) Portugais 42 52 (63.09%)
Basque 6 10 (97.16%) Danois 39 56 (61.22%)
Bengali 1 0 (0.00%) Lituanien 36 4 (0.00%)
Berrichon 1 0 (0.00%) Kotava 34 0 (0.00%)
Biélorusse 5 54 (100.00%) Grec ancien 34 2 (0.14%)
Bosniaque 2 0 (0.00%) Letton 33 0 (0.00%)
Brabançon 10 0 (0.00%) Norvégien (bokmål) 32 0 (0.00%)
Breton 857 42520 (7.63%) Slovène 31 4 (0.05%)
Bulgare 12 24 (20.56%) Hindi 30 0 (0.00%)
Catalan 20 152 (24.37%) Finnois 27 798 (32.50%)
Chakali 2 0 (0.00%) Vieux slave 27 8 (1.93%)
Champenois 2 0 (0.00%) Gallo 26 2 (0.14%)
Chaoui 1 2 (100.00%) Limbourgeois 25 16 (0.00%)
Cherokee 6 0 (0.00%) Islandais 25 170 (0.14%)
Chinois 6 2 (0.07%) Hébreu 23 22 (10.10%)
Chleuh 2 0 (0.00%) Flamand occidental 22 8 (0.82%)
Cornique 7 0 (0.00%) Grec 22 28 (0.39%)
Corse 8 0 (0.00%) Norvégien 21 34 (63.99%)
Coréen 3 38 (91.42%) Ukrainien 21 46 (54.56%)
Croate 3 0 (0.00%) Arabe judéo-tripolitain 21 0 (0.00%)
Créole du Cap-Vert 11 12 (2.74%) Norvégien (nynorsk) 20 0 (0.00%)
Créole guadeloupéen 2 0 (0.00%) Catalan 20 152 (24.37%)
Créole haïtien 2 0 (0.00%) Moyen français 19 16 (54.29%)
Créole indo-portugais 2 0 (0.00%) Mongol 15 40 (63.04%)
Créole martiniquais 2 0 (0.00%) Afrikaans 14 2 (1.11%)
Créole réunionnais 2 0 (0.00%) Arabe 13 52 (8.91%)
Danois 39 56 (61.22%) Hongrois 12 2 (0.00%)
Dioula 1 0 (0.00%) Normand 12 2 (0.89%)
Douala 3 2 (55.56%) Bulgare 12 24 (20.56%)
Espagnol 3041 180724 (57.74%) Franc-comtois 11 18 (20.19%)
Espéranto 66 40 (22.68%) Albanais 11 36 (51.35%)
Estonien 1 0 (0.00%) Créole du Cap-Vert 11 12 (2.74%)
Fe’fe’ 1 4 (100.00%) Brabançon 10 0 (0.00%)
Finnois 27 798 (32.50%) Galicien 10 16 (7.60%)
Flamand occidental 22 8 (0.82%) Angevin 9 0 (0.00%)
Flamand oriental 2 0 (0.00%) Tourangeau 9 2 (0.52%)
Franc-comtois 11 18 (20.19%) Turc 9 2 (0.61%)
Francique rhénan 1 0 (0.00%) Volapük réformé 9 82 (92.31%)
Francique ripuaire 2 0 (0.00%) Azéri 8 6 (10.17%)
Francoprovençal 6 0 (0.00%) Corse 8 0 (0.00%)
Français 35541 11270 (45.62%) Okinawaïen 8 4 (0.70%)
Frison 3 72 (92.31%) Cornique 7 0 (0.00%)
Féroïen 5 0 (0.00%) Vénitien 7 0 (0.00%)
Galicien 10 16 (7.60%) Asturien 7 4 (0.00%)
Gallo 26 2 (0.14%) Vieil anglais 7 0 (0.00%)
Gallo-italique de Sicile 2 0 (0.00%) Nahuatl classique 7 2 (0.61%)
Gallois 52 26 (42.91%) Peul 7 0 (0.00%)
Gaulois 2 0 (0.00%) Chinois 6 2 (0.07%)
Gaélique irlandais 44 30 (6.41%) Basque 6 10 (97.16%)
Gotique 4 2 (1.50%) Francoprovençal 6 0 (0.00%)
Grec 22 28 (0.39%) Hunsrik 6 0 (0.00%)
Grec ancien 34 2 (0.14%) Lingala 6 48 (99.37%)
Grec cargésien 3 2 (16.67%) Serbe 6 2 (58.33%)
Griko 2 0 (0.00%) Cherokee 6 0 (0.00%)
Haoussa 2 0 (0.00%) Temné 6 0 (0.00%)
Hassanya 2 0 (0.00%) Féroïen 5 0 (0.00%)
Hindi 30 0 (0.00%) Picard 5 2 (4.00%)
Hindoustani caribéen 2 0 (0.00%) Ancien occitan 5 2 (12.93%)
Hongrois 12 2 (0.00%) Ido 5 2 (62.20%)
Hunsrik 6 0 (0.00%) Sicilien 5 2 (44.35%)
Hébreu 23 22 (10.10%) Oki-no-erabu 5 0 (0.00%)
Hébreu ancien 3 24 (99.90%) Serbo-croate 5 2 (11.11%)
Ido 5 2 (62.20%) Quenya 5 136 (100.00%)
Ik 2 0 (0.00%) Moyen breton 5 0 (0.00%)
Indonésien 2 0 (0.00%) Vieux norrois 5 0 (0.00%)
Interlingua 3 0 (0.00%) Toku-no-shima 5 0 (0.00%)
Inuktitut 3 6 (99.80%) Biélorusse 5 54 (100.00%)
Islandais 25 170 (0.14%) Arménien 5 0 (0.00%)
Istro-roumain 3 2 (5.56%) Vieil okinawaïen 5 14 (100.00%)
Italien 53 12 (53.53%) Salentin 4 2 (16.67%)
Japonais 1007 41758 (23.42%) Tatare 4 24 (83.33%)
Judéo-espagnol 3 0 (0.00%) Bas-sorabe 4 0 (0.00%)
Kabyle 1 0 (0.00%) Nahuatl central 4 10 (3.57%)
Kachoube 3 0 (0.00%) Kunigami 4 0 (0.00%)
Kazakh 1 0 (0.00%) Kikaï 4 2 (25.00%)
Kikaï 4 2 (25.00%) Xhosa 4 32 (0.00%)
Kikuyu 2 0 (0.00%) Gotique 4 2 (1.50%)
Kinyarwanda 2 30 (94.74%) Coréen 3 38 (91.42%)
Kirghiz 2 0 (0.00%) Scots 3 12 (18.84%)
Kotava 34 0 (0.00%) Frison 3 72 (92.31%)
Koyukon 3 102 (0.00%) Alémanique alsacien 3 0 (0.00%)
Kunigami 4 0 (0.00%) Interlingua 3 0 (0.00%)
Kurde 1 0 (0.00%) Judéo-espagnol 3 0 (0.00%)
Lacandon 2 0 (0.00%) Yonaguni 3 0 (0.00%)
Latin 114 16 (0.27%) Mirandais 3 4 (20.00%)
Letton 33 0 (0.00%) Koyukon 3 102 (0.00%)
Limbourgeois 25 16 (0.00%) Grec cargésien 3 2 (16.67%)
Lingala 6 48 (99.37%) Nǀu 3 0 (0.00%)
Lituanien 36 4 (0.00%) Inuktitut 3 6 (99.80%)
Lorrain 2 0 (0.00%) Croate 3 0 (0.00%)
Luxembourgeois 58 168 (3.72%) Istro-roumain 3 2 (5.56%)
Macédonien 3 10 (14.29%) Macédonien 3 10 (14.29%)
Malgache 1 0 (0.00%) Moyen anglais 3 0 (0.00%)
Maltais 2 0 (0.00%) Kachoube 3 0 (0.00%)
Mannois 1 8 (100.00%) Tamoul 3 0 (0.00%)
Marathe 1 0 (0.00%) Arabe marocain 3 2 (0.00%)
Mbochi 1 2 (100.00%) Douala 3 2 (55.56%)
Micmac 1 2 (100.00%) Oubykh 3 2 (82.35%)
Mirandais 3 4 (20.00%) Yupik central 3 102 (70.59%)
Miyako 2 0 (0.00%) Shimaoré 3 936 (100.00%)
Mongol 15 40 (63.04%) Hébreu ancien 3 24 (99.90%)
Monégasque 2 0 (0.00%) Yoron 3 0 (0.00%)
Moré 2 0 (0.00%) Vietnamien 2 0 (0.00%)
Moyen anglais 3 0 (0.00%) Haoussa 2 0 (0.00%)
Moyen breton 5 0 (0.00%) Créole réunionnais 2 0 (0.00%)
Moyen français 19 16 (54.29%) Romanche 2 2 (66.67%)
Moyen okinawaïen 2 4 (100.00%) Indonésien 2 0 (0.00%)
Nahuatl central 4 10 (3.57%) Lacandon 2 0 (0.00%)
Nahuatl classique 7 2 (0.61%) Créole haïtien 2 0 (0.00%)
Nahuatl de Guerrero 2 0 (0.00%) Gallo-italique de Sicile 2 0 (0.00%)
Nahuatl de la Huasteca central 2 0 (0.00%) Gaulois 2 0 (0.00%)
Nahuatl de la Huasteca occidental 1 0 (0.00%) Bosniaque 2 0 (0.00%)
Nahuatl de la Huasteca oriental 1 0 (0.00%) Baoulé 2 0 (0.00%)
Nahuatl de l’Orizaba 2 0 (0.00%) Chleuh 2 0 (0.00%)
Nde-nsele-nta 2 0 (0.00%) Créole martiniquais 2 0 (0.00%)
Normand 12 2 (0.89%) Griko 2 0 (0.00%)
Norvégien 21 34 (63.99%) Kinyarwanda 2 30 (94.74%)
Norvégien (bokmål) 32 0 (0.00%) Flamand oriental 2 0 (0.00%)
Norvégien (nynorsk) 20 0 (0.00%) Champenois 2 0 (0.00%)
Nuuchahnulth 2 0 (0.00%) Aragonais 2 0 (0.00%)
Néerlandais 74 42 (1.41%) Chakali 2 0 (0.00%)
Nǀu 3 0 (0.00%) Solrésol 2 0 (0.00%)
Occitan 63 198 (31.57%) Créole indo-portugais 2 0 (0.00%)
Oki-no-erabu 5 0 (0.00%) Créole guadeloupéen 2 0 (0.00%)
Okinawaïen 8 4 (0.70%) Kikuyu 2 0 (0.00%)
Oneida 1 0 (0.00%) Sango 2 2 (28.69%)
Ossète 1 0 (0.00%) Miyako 2 0 (0.00%)
Oubykh 3 2 (82.35%) Yaeyama 2 0 (0.00%)
Ourdou 2 0 (0.00%) Lorrain 2 0 (0.00%)
Pandunia 2 0 (0.00%) Persan 2 0 (0.00%)
Persan 2 0 (0.00%) Ourdou 2 0 (0.00%)
Peul 7 0 (0.00%) Francique ripuaire 2 0 (0.00%)
Picard 5 2 (4.00%) Maltais 2 0 (0.00%)
Pitcairnais 1 10 (100.00%) Nahuatl de la Huasteca central 2 0 (0.00%)
Poitevin-saintongeais 1 0 (0.00%) Pandunia 2 0 (0.00%)
Polonais 52 8 (0.60%) Tupi 2 2 (20.00%)
Portugais 42 52 (63.09%) Slave molisan 2 0 (0.00%)
Quechua de Cuzco 1 0 (0.00%) Tanjijili 2 0 (0.00%)
Quenya 5 136 (100.00%) Amami du Sud 2 0 (0.00%)
Romanche 2 2 (66.67%) Monégasque 2 0 (0.00%)
Roumain 80 120 (40.71%) Nahuatl de l’Orizaba 2 0 (0.00%)
Russe 84 330 (87.62%) Moré 2 0 (0.00%)
Salentin 4 2 (16.67%) Soussou 2 0 (0.00%)
Same du Nord 43 42 (4.33%) Kirghiz 2 0 (0.00%)
Sango 2 2 (28.69%) Abanyom 2 0 (0.00%)
Sanskrit 1 0 (0.00%) Hindoustani caribéen 2 0 (0.00%)
Scots 3 12 (18.84%) Amami du Nord 2 0 (0.00%)
Serbe 6 2 (58.33%) Ik 2 0 (0.00%)
Serbo-croate 5 2 (11.11%) Nahuatl de Guerrero 2 0 (0.00%)
Shimaoré 3 936 (100.00%) Nuuchahnulth 2 0 (0.00%)
Sicilien 5 2 (44.35%) Étrusque 2 0 (0.00%)
Slave molisan 2 0 (0.00%) Hassanya 2 0 (0.00%)
Slovaque 43 30 (6.17%) Nde-nsele-nta 2 0 (0.00%)
Slovène 31 4 (0.05%) Moyen okinawaïen 2 4 (100.00%)
Solrésol 2 0 (0.00%) Bambara 1 0 (0.00%)
Soussou 2 0 (0.00%) Kurde 1 0 (0.00%)
Suédois 95 14 (0.03%) Malgache 1 0 (0.00%)
Swahili 1 2 (100.00%) Poitevin-saintongeais 1 0 (0.00%)
Note on Swahili: Because subject class concord and object class concord use the same form (for example "c5"), we don't yet have a way to distinguish them. A human can tell them apart by the alignment of the headers (horizontal = subject concord, vertical = object concord), but the parser does not (yet) have this kind of memory or spatial awareness.
Sep 9th 2022: I've recently implemented a way to transform row or column tags into other tags based on language, but it doesn't solve an underlying issue with Swahili tables that I had missed or forgotten about: the horizontal column headers for class and person are not inherited through the whole template, because the template is chopped up into several subtables that don't directly inherit from the headers above them. It's an issue with 'bleed' again.
Later (Dec 2022): I forgot to add this here, but Swahili tables now have their own extensive handling that makes them work, including a "save to register" feature that keeps column headers in memory so they can be used in a later subtable within the same template.
Tadjik 1 0 (0.00%) Estonien 1 0 (0.00%)
Tamazight du Maroc central 1 0 (0.00%) Tatar de Crimée 1 2 (100.00%)
Tamoul 3 0 (0.00%) Wallisien 1 0 (0.00%)
Tanjijili 2 0 (0.00%) Éwé 1 0 (0.00%)
Tatar de Crimée 1 2 (100.00%) Francique rhénan 1 0 (0.00%)
Tatare 4 24 (83.33%) Vieil irlandais 1 8 (100.00%)
Tchèque 96 10 (0.14%) Wallon 1 0 (0.00%)
Temné 6 0 (0.00%) Dioula 1 0 (0.00%)
Toku-no-shima 5 0 (0.00%) Anglo-normand 1 0 (0.00%)
Tourangeau 9 2 (0.52%) Swahili 1 2 (100.00%)
Tupi 2 2 (20.00%) Quechua de Cuzco 1 0 (0.00%)
Turc 9 2 (0.61%) Chaoui 1 2 (100.00%)
Télougou 1 0 (0.00%) Marathe 1 0 (0.00%)
Ukrainien 21 46 (54.56%) Mannois 1 8 (100.00%)
Vieil anglais 7 0 (0.00%) Sanskrit 1 0 (0.00%)
Vieil irlandais 1 8 (100.00%) Pitcairnais 1 10 (100.00%)
Vieil okinawaïen 5 14 (100.00%) Télougou 1 0 (0.00%)
Vietnamien 2 0 (0.00%) Bengali 1 0 (0.00%)
Vieux norrois 5 0 (0.00%) Ossète 1 0 (0.00%)
Vieux polonais 1 0 (0.00%) Akan 1 0 (0.00%)
Vieux slave 27 8 (1.93%) Kazakh 1 0 (0.00%)
Volapük réformé 9 82 (92.31%) Kabyle 1 0 (0.00%)
Vénitien 7 0 (0.00%) Micmac 1 2 (100.00%)
Wallisien 1 0 (0.00%) Mbochi 1 2 (100.00%)
Wallon 1 0 (0.00%) Tadjik 1 0 (0.00%)
Xhosa 4 32 (0.00%) Vieux polonais 1 0 (0.00%)
Yaeyama 2 0 (0.00%) Tamazight du Maroc central 1 0 (0.00%)
Yonaguni 3 0 (0.00%) Nahuatl de la Huasteca occidental 1 0 (0.00%)
Yoron 3 0 (0.00%) Berrichon 1 0 (0.00%)
Yupik central 3 102 (70.59%) Oneida 1 0 (0.00%)
Étrusque 2 0 (0.00%) Nahuatl de la Huasteca oriental 1 0 (0.00%)
Éwé 1 0 (0.00%) Fe’fe’ 1 4 (100.00%)
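The Swahili note above describes two mechanisms: telling subject and object class concord apart by header orientation, and a "save to register" feature that keeps column headers for later subtables. The Python sketch below illustrates both under invented assumptions: a subtable is a grid of strings, a subtable that carries its own header row has an empty top-left cell, column (horizontal) headers yield subject concord tags, and row (vertical) headers yield object concord tags. The tag spellings (`subject-c5`, `object-c5`) and function names are hypothetical and do not reflect wiktextract's actual implementation.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, Tuple[str, ...]]  # (form, tags)


def tag_from_header(header: str, horizontal: bool) -> str:
    """Turn a bare class header such as "c5" into an orientation-specific tag
    (hypothetical naming): column headers mark subject concord, row headers
    mark object concord."""
    role = "subject" if horizontal else "object"
    return f"{role}-{header}"


def parse_subtables(subtables: List[List[List[str]]]) -> List[Cell]:
    """Parse a template split into several subtables.

    A subtable whose top-left cell is "" starts with its own column-header
    row, which is saved in the register. Later subtables without a header
    row reuse the saved column headers instead of losing them."""
    register: Optional[List[str]] = None
    cells: List[Cell] = []
    for grid in subtables:
        rows = grid
        if grid and grid[0] and grid[0][0] == "":
            register = grid[0][1:]  # save column headers for later subtables
            rows = grid[1:]
        if register is None:
            raise ValueError("subtable has no column headers and none are saved")
        for row in rows:
            row_header, forms = row[0], row[1:]
            for col_header, form in zip(register, forms):
                tags = (
                    tag_from_header(col_header, horizontal=True),
                    tag_from_header(row_header, horizontal=False),
                )
                cells.append((form, tags))
    return cells
```

Keeping the register across subtables is what prevents the headers of the first subtable from being lost when the second subtable has none of its own.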

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2024-12-21 from the frwiktionary dump dated 2024-12-20 using wiktextract (d8cb2f3 and 4e554ae). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.