Wiktionary data extraction errors and warnings

Inflection check

List of the different kinds of inflection tables. When wiktextract parses word heads and tables, it annotates the forms it encounters with tags describing grammatical or contextual information. The forms and tags found in each head section or table are first kept separate from those of other heads and tables. Later, heads and tables are merged into table types: every table of a given type contains the same number of word forms, with the same tags on those forms.
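
To make the grouping concrete, here is a minimal Python sketch. It assumes each parsed table is a list of (form, tags) pairs and that two tables belong to the same type when they have the same number of forms carrying the same tags, ignoring the order of the forms. The names table_signature and group_into_table_types are hypothetical; this is an illustration, not wiktextract's actual implementation.

```python
from collections import defaultdict

# Hypothetical representation: a table is a list of (form, tags) pairs,
# where tags is a set of strings. Not wiktextract's internal data model.

def table_signature(table):
    """Signature of a table: the sorted multiset of its tag sets.

    Two tables share a signature when they have the same number of forms
    with the same tags on those forms (form order is ignored here).
    """
    return tuple(sorted(tuple(sorted(tags)) for _form, tags in table))

def group_into_table_types(tables):
    """Group (word, table) pairs into table types keyed by signature."""
    types = defaultdict(list)
    for word, table in tables:
        types[table_signature(table)].append(word)
    return dict(types)

tables = [
    ("chat", [("chat", {"singular"}), ("chats", {"plural"})]),
    ("chien", [("chien", {"singular"}), ("chiens", {"plural"})]),
    ("ciseaux", [("ciseaux", {"plural"})]),
]
types = group_into_table_types(tables)
for signature, words in types.items():
    print(len(signature), words)
# 2 ['chat', 'chien']
# 1 ['ciseaux']
```

The signature deliberately ignores the word forms themselves, which is what lets the tables of different words fall into a single type.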

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type with only a few instances is quite likely to contain some kind of minor error in the original data.
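
The "few instances" heuristic can be sketched in Python as a frequency count over table-type signatures. The function name rare_table_types and the default cut-off of two instances are assumptions chosen for illustration; a rare type only marks entries for review, it does not prove an error.

```python
from collections import Counter

def rare_table_types(signatures, max_instances=2):
    """Return table-type signatures seen at most `max_instances` times,
    rarest first. `signatures` holds one hashable signature per table."""
    counts = Counter(signatures)
    rare = [sig for sig, n in counts.items() if n <= max_instances]
    return sorted(rare, key=lambda sig: counts[sig])

# Hypothetical signatures: one tuple of tags per parsed table.
observed = [("sg", "pl")] * 500 + [("sg", "pl", "pl")] + [("m", "f")] * 40
print(rare_table_types(observed))  # [('sg', 'pl', 'pl')]
```

Raising `max_instances` widens the net at the cost of more false positives, since some paradigms are genuinely rare.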

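Each row below lists two entries side by side, drawn from the same data in two sort orders: the left entry is sorted by language name, the right entry by number of table forms (highest first). Language names are given in French, as they appear in the French Wiktionary.
Language Table forms Errors (% affected words) Language Table forms Errors (% affected words)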
Abanyom 2 0 (0.00%) Français 35383 11256 (45.94%)
Afrikaans 14 2 (1.11%) Espagnol 2933 173008 (56.91%)
Akan 1 0 (0.00%) Japonais 1005 41560 (23.08%)
Albanais 1 32 (100.00%) Italien 895 15840 (50.71%)
Allemand 105 218 (9.73%) Breton 843 41558 (7.59%)
Alémanique alsacien 3 0 (0.00%) Latin 114 16 (0.29%)
Ancien français 59 8 (4.61%) Allemand 105 218 (9.73%)
Ancien occitan 5 2 (12.93%) Tchèque 96 8 (0.14%)
Angevin 9 0 (0.00%) Suédois 95 14 (0.03%)
Anglais 60 94 (0.22%) Russe 82 322 (87.68%)
Anglo-normand 1 0 (0.00%) Néerlandais 74 42 (1.40%)
Arabe 11 44 (10.38%) Roumain 74 116 (40.59%)
Arabe judéo-tripolitain 21 0 (0.00%) Espéranto 61 40 (22.68%)
Arabe marocain 1 2 (0.00%) Occitan 61 186 (31.44%)
Aragonais 2 0 (0.00%) Anglais 60 94 (0.22%)
Asturien 7 4 (0.00%) Ancien français 59 8 (4.61%)
Azéri 8 2 (10.21%) Luxembourgeois 58 168 (3.73%)
Bambara 1 0 (0.00%) Portugais 54 64 (61.45%)
Baoulé 2 0 (0.00%) Gallois 52 24 (42.91%)
Bas-sorabe 4 0 (0.00%) Polonais 49 8 (0.56%)
Basque 6 10 (97.16%) Slovaque 43 30 (6.36%)
Bengali 1 0 (0.00%) Same du Nord 43 42 (4.44%)
Berrichon 1 0 (0.00%) Danois 39 60 (61.17%)
Biélorusse 5 54 (100.00%) Kotava 34 0 (0.00%)
Bosniaque 2 0 (0.00%) Lituanien 34 4 (0.00%)
Brabançon 10 0 (0.00%) Letton 34 0 (0.00%)
Breton 843 41558 (7.59%) Norvégien (bokmål) 32 0 (0.00%)
Bulgare 12 24 (20.57%) Gaélique irlandais 31 18 (1.80%)
Catalan 20 152 (24.37%) Slovène 31 4 (0.05%)
Chakali 2 0 (0.00%) Grec ancien 31 2 (0.14%)
Champenois 2 0 (0.00%) Moyen français 28 10 (1.47%)
Chaoui 1 2 (100.00%) Finnois 27 798 (32.36%)
Cherokee 6 0 (0.00%) Vieux slave 27 8 (1.93%)
Chinois 1 2 (100.00%) Gallo 26 2 (0.14%)
Chleuh 2 0 (0.00%) Limbourgeois 25 16 (0.00%)
Cornique 7 0 (0.00%) Islandais 25 170 (0.14%)
Corse 8 0 (0.00%) Hébreu 23 14 (9.66%)
Coréen 3 40 (91.34%) Flamand occidental 22 8 (0.82%)
Croate 3 0 (0.00%) Norvégien 21 34 (63.73%)
Créole du Cap-Vert 11 12 (2.74%) Ukrainien 21 46 (54.57%)
Créole guadeloupéen 2 0 (0.00%) Arabe judéo-tripolitain 21 0 (0.00%)
Créole haïtien 2 0 (0.00%) Catalan 20 152 (24.37%)
Créole indo-portugais 2 0 (0.00%) Norvégien (nynorsk) 20 0 (0.00%)
Créole martiniquais 2 0 (0.00%) Grec 18 28 (0.40%)
Créole réunionnais 2 0 (0.00%) Mongol 15 40 (63.04%)
Danois 39 60 (61.17%) Afrikaans 14 2 (1.11%)
Dioula 1 0 (0.00%) Franc-comtois 12 18 (14.42%)
Douala 3 2 (55.56%) Hongrois 12 2 (0.00%)
Espagnol 2933 173008 (56.91%) Normand 12 2 (0.91%)
Espéranto 61 40 (22.68%) Bulgare 12 24 (20.57%)
Estonien 1 0 (0.00%) Créole du Cap-Vert 11 12 (2.74%)
Fe’fe’ 1 4 (100.00%) Arabe 11 44 (10.38%)
Finnois 27 798 (32.36%) Brabançon 10 0 (0.00%)
Flamand occidental 22 8 (0.82%) Hindi 10 0 (0.00%)
Flamand oriental 2 0 (0.00%) Angevin 9 0 (0.00%)
Franc-comtois 12 18 (14.42%) Turc 9 2 (0.64%)
Francique rhénan 1 0 (0.00%) Galicien 9 16 (7.41%)
Francique ripuaire 2 0 (0.00%) Tourangeau 9 2 (0.52%)
Francoprovençal 4 0 (0.00%) Azéri 8 2 (10.21%)
Français 35383 11256 (45.94%) Volapük réformé 8 70 (91.67%)
Frison 3 72 (92.31%) Corse 8 0 (0.00%)
Féroïen 5 0 (0.00%) Asturien 7 4 (0.00%)
Galicien 9 16 (7.41%) Cornique 7 0 (0.00%)
Gallo 26 2 (0.14%) Vénitien 7 0 (0.00%)
Gallo-italique de Sicile 2 0 (0.00%) Nahuatl classique 7 2 (0.61%)
Gallois 52 24 (42.91%) Peul 7 0 (0.00%)
Gaulois 2 0 (0.00%) Basque 6 10 (97.16%)
Gaélique irlandais 31 18 (1.80%) Hunsrik 6 0 (0.00%)
Gotique 4 2 (1.50%) Lingala 6 48 (99.37%)
Grec 18 28 (0.40%) Serbe 6 2 (58.33%)
Grec ancien 31 2 (0.14%) Cherokee 6 0 (0.00%)
Grec cargésien 3 2 (16.67%) Temné 6 0 (0.00%)
Griko 2 0 (0.00%) Ancien occitan 5 2 (12.93%)
Haoussa 2 0 (0.00%) Féroïen 5 0 (0.00%)
Hassanya 2 0 (0.00%) Picard 5 2 (4.00%)
Hindi 10 0 (0.00%) Ido 5 2 (62.20%)
Hindoustani caribéen 2 0 (0.00%) Vieil anglais 5 0 (0.00%)
Hongrois 12 2 (0.00%) Quenya 5 150 (100.00%)
Hunsrik 6 0 (0.00%) Moyen breton 5 0 (0.00%)
Hébreu 23 14 (9.66%) Toku-no-shima 5 0 (0.00%)
Hébreu ancien 3 32 (99.90%) Biélorusse 5 54 (100.00%)
Ido 5 2 (62.20%) Sicilien 4 2 (95.65%)
Ik 2 0 (0.00%) Francoprovençal 4 0 (0.00%)
Indonésien 2 0 (0.00%) Salentin 4 2 (18.92%)
Interlingua 3 0 (0.00%) Bas-sorabe 4 0 (0.00%)
Inuktitut 3 6 (99.80%) Tatare 4 24 (83.33%)
Islandais 25 170 (0.14%) Vieux norrois 4 0 (0.00%)
Istro-roumain 3 2 (5.56%) Nahuatl central 4 10 (3.57%)
Italien 895 15840 (50.71%) Xhosa 4 32 (0.00%)
Japonais 1005 41560 (23.08%) Okinawaïen 4 0 (0.00%)
Judéo-espagnol 3 0 (0.00%) Gotique 4 2 (1.50%)
Kabyle 1 0 (0.00%) Scots 3 10 (16.92%)
Kachoube 3 0 (0.00%) Frison 3 72 (92.31%)
Kazakh 1 0 (0.00%) Coréen 3 40 (91.34%)
Kikuyu 2 0 (0.00%) Koyukon 3 102 (0.00%)
Kinyarwanda 2 30 (94.74%) Mirandais 3 4 (20.00%)
Kirghiz 2 0 (0.00%) Interlingua 3 0 (0.00%)
Kotava 34 0 (0.00%) Alémanique alsacien 3 0 (0.00%)
Koyukon 3 102 (0.00%) Judéo-espagnol 3 0 (0.00%)
Kurde 1 0 (0.00%) Grec cargésien 3 2 (16.67%)
Lacandon 2 0 (0.00%) Nǀu 3 0 (0.00%)
Latin 114 16 (0.29%) Oki-no-erabu 3 0 (0.00%)
Letton 34 0 (0.00%) Inuktitut 3 6 (99.80%)
Limbourgeois 25 16 (0.00%) Croate 3 0 (0.00%)
Lingala 6 48 (99.37%) Serbo-croate 3 2 (14.29%)
Lituanien 34 4 (0.00%) Istro-roumain 3 2 (5.56%)
Lorrain 2 0 (0.00%) Macédonien 3 10 (14.29%)
Luxembourgeois 58 168 (3.73%) Moyen anglais 3 0 (0.00%)
Macédonien 3 10 (14.29%) Kachoube 3 0 (0.00%)
Malgache 1 0 (0.00%) Tamoul 3 0 (0.00%)
Maltais 2 0 (0.00%) Douala 3 2 (55.56%)
Mannois 1 8 (100.00%) Oubykh 3 2 (82.35%)
Marathe 1 0 (0.00%) Yupik central 3 102 (68.91%)
Micmac 1 2 (100.00%) Shimaoré 3 936 (100.00%)
Mirandais 3 4 (20.00%) Hébreu ancien 3 32 (99.90%)
Mongol 15 40 (63.04%) Vietnamien 2 0 (0.00%)
Monégasque 2 0 (0.00%) Indonésien 2 0 (0.00%)
Moré 2 0 (0.00%) Romanche 2 4 (66.67%)
Moyen anglais 3 0 (0.00%) Haoussa 2 0 (0.00%)
Moyen breton 5 0 (0.00%) Lacandon 2 0 (0.00%)
Moyen français 28 10 (1.47%) Créole réunionnais 2 0 (0.00%)
Nahuatl central 4 10 (3.57%) Créole haïtien 2 0 (0.00%)
Nahuatl classique 7 2 (0.61%) Gaulois 2 0 (0.00%)
Nahuatl de Guerrero 2 0 (0.00%) Bosniaque 2 0 (0.00%)
Nahuatl de la Huasteca central 2 0 (0.00%) Baoulé 2 0 (0.00%)
Nahuatl de la Huasteca occidental 1 0 (0.00%) Chleuh 2 0 (0.00%)
Nahuatl de la Huasteca oriental 1 0 (0.00%) Créole martiniquais 2 0 (0.00%)
Nahuatl de l’Orizaba 2 0 (0.00%) Kinyarwanda 2 30 (94.74%)
Nde-nsele-nta 2 0 (0.00%) Champenois 2 0 (0.00%)
Normand 12 2 (0.91%) Aragonais 2 0 (0.00%)
Norvégien 21 34 (63.73%) Griko 2 0 (0.00%)
Norvégien (bokmål) 32 0 (0.00%) Chakali 2 0 (0.00%)
Norvégien (nynorsk) 20 0 (0.00%) Flamand oriental 2 0 (0.00%)
Nuuchahnulth 2 0 (0.00%) Gallo-italique de Sicile 2 0 (0.00%)
Néerlandais 74 42 (1.40%) Créole indo-portugais 2 0 (0.00%)
Nǀu 3 0 (0.00%) Solrésol 2 0 (0.00%)
Occitan 61 186 (31.44%) Créole guadeloupéen 2 0 (0.00%)
Oki-no-erabu 3 0 (0.00%) Kikuyu 2 0 (0.00%)
Okinawaïen 4 0 (0.00%) Sango 2 2 (28.77%)
Oneida 1 0 (0.00%) Lorrain 2 0 (0.00%)
Ossète 1 0 (0.00%) Francique ripuaire 2 0 (0.00%)
Oubykh 3 2 (82.35%) Persan 2 0 (0.00%)
Ourdou 2 0 (0.00%) Ourdou 2 0 (0.00%)
Pandunia 2 0 (0.00%) Maltais 2 0 (0.00%)
Persan 2 0 (0.00%) Nahuatl de la Huasteca central 2 0 (0.00%)
Peul 7 0 (0.00%) Tupi 2 2 (20.00%)
Picard 5 2 (4.00%) Slave molisan 2 0 (0.00%)
Pitcairnais 1 10 (100.00%) Tanjijili 2 0 (0.00%)
Poitevin-saintongeais 1 0 (0.00%) Monégasque 2 0 (0.00%)
Polonais 49 8 (0.56%) Nahuatl de l’Orizaba 2 0 (0.00%)
Portugais 54 64 (61.45%) Moré 2 0 (0.00%)
Quechua de Cuzco 1 0 (0.00%) Soussou 2 0 (0.00%)
Quenya 5 150 (100.00%) Pandunia 2 0 (0.00%)
Romanche 2 4 (66.67%) Kirghiz 2 0 (0.00%)
Roumain 74 116 (40.59%) Abanyom 2 0 (0.00%)
Russe 82 322 (87.68%) Hindoustani caribéen 2 0 (0.00%)
Salentin 4 2 (18.92%) Ik 2 0 (0.00%)
Same du Nord 43 42 (4.44%) Nahuatl de Guerrero 2 0 (0.00%)
Sango 2 2 (28.77%) Nuuchahnulth 2 0 (0.00%)
Sanskrit 1 0 (0.00%) Étrusque 2 0 (0.00%)
Scots 3 10 (16.92%) Hassanya 2 0 (0.00%)
Serbe 6 2 (58.33%) Nde-nsele-nta 2 0 (0.00%)
Serbo-croate 3 2 (14.29%) Chinois 1 2 (100.00%)
Shimaoré 3 936 (100.00%) Estonien 1 0 (0.00%)
Sicilien 4 2 (95.65%) Wallisien 1 0 (0.00%)
Slave molisan 2 0 (0.00%) Albanais 1 32 (100.00%)
Slovaque 43 30 (6.36%) Kurde 1 0 (0.00%)
Slovène 31 4 (0.05%) Malgache 1 0 (0.00%)
Solrésol 2 0 (0.00%) Wallon 1 0 (0.00%)
Soussou 2 0 (0.00%) Bambara 1 0 (0.00%)
Suédois 95 14 (0.03%) Poitevin-saintongeais 1 0 (0.00%)
Swahili 1 2 (100.00%) Vieil irlandais 1 8 (100.00%)
Note on Swahili: Because subject class concord and object class concord use the same header label (for example "c5"), we don't yet have a way to distinguish them. A human can tell them apart from the alignment of the headers (horizontal = subject concord, vertical = object concord), but the parser does not (yet) have this kind of memory or spatial awareness.
Sep 9th 2022: I've recently implemented a way to transform row or column tags into other tags based on language, but it doesn't solve an underlying issue with Swahili tables that I'd missed or forgotten about: the horizontal column headers for class and person are not inherited through the whole template, because the template is chopped up into several subtables that don't directly inherit from the ones above. It's an issue with 'bleed' again.
Later (Dec 2022): I forgot to put this here, but Swahili tables now have their own big system that makes them work, including a "save to register" feature that can keep column headers in memory for use in a later subtable within the same template.
Tadjik 1 0 (0.00%) Dioula 1 0 (0.00%)
Tamazight du Maroc central 1 0 (0.00%) Éwé 1 0 (0.00%)
Tamoul 3 0 (0.00%) Tatar de Crimée 1 2 (100.00%)
Tanjijili 2 0 (0.00%) Francique rhénan 1 0 (0.00%)
Tatar de Crimée 1 2 (100.00%) Anglo-normand 1 0 (0.00%)
Tatare 4 24 (83.33%) Chaoui 1 2 (100.00%)
Tchèque 96 8 (0.14%) Quechua de Cuzco 1 0 (0.00%)
Temné 6 0 (0.00%) Swahili 1 2 (100.00%)
Toku-no-shima 5 0 (0.00%) Marathe 1 0 (0.00%)
Tourangeau 9 2 (0.52%) Mannois 1 8 (100.00%)
Tupi 2 2 (20.00%) Sanskrit 1 0 (0.00%)
Turc 9 2 (0.64%) Pitcairnais 1 10 (100.00%)
Télougou 1 0 (0.00%) Télougou 1 0 (0.00%)
Ukrainien 21 46 (54.57%) Bengali 1 0 (0.00%)
Vieil anglais 5 0 (0.00%) Ossète 1 0 (0.00%)
Vieil irlandais 1 8 (100.00%) Akan 1 0 (0.00%)
Vietnamien 2 0 (0.00%) Arabe marocain 1 2 (0.00%)
Vieux norrois 4 0 (0.00%) Kazakh 1 0 (0.00%)
Vieux polonais 1 0 (0.00%) Kabyle 1 0 (0.00%)
Vieux slave 27 8 (1.93%) Micmac 1 2 (100.00%)
Volapük réformé 8 70 (91.67%) Tadjik 1 0 (0.00%)
Vénitien 7 0 (0.00%) Vieux polonais 1 0 (0.00%)
Wallisien 1 0 (0.00%) Tamazight du Maroc central 1 0 (0.00%)
Wallon 1 0 (0.00%) Nahuatl de la Huasteca occidental 1 0 (0.00%)
Xhosa 4 32 (0.00%) Berrichon 1 0 (0.00%)
Yupik central 3 102 (68.91%) Oneida 1 0 (0.00%)
Étrusque 2 0 (0.00%) Nahuatl de la Huasteca oriental 1 0 (0.00%)
Éwé 1 0 (0.00%) Fe’fe’ 1 4 (100.00%)
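
The Swahili note in the table above describes two ideas: disambiguating a class label by the orientation of its header (column = subject concord, row = object concord), and a "save to register" feature that keeps column headers in memory for a later subtable of the same template. The Python sketch below illustrates both under stated assumptions: the Subtable class, the save_headers flag and the "subject-"/"object-" tag prefixes are hypothetical and are not wiktextract's actual code or tag set.

```python
from dataclasses import dataclass

@dataclass
class Subtable:
    # Hypothetical model of one piece of a template that was split up.
    column_headers: list        # e.g. ["c1", "c5"]; empty if none
    rows: list                  # (row_header, [cell, ...]) pairs
    save_headers: bool = False  # remember column_headers for later subtables

def concord_tag(label, axis):
    """Disambiguate a class label such as "c5" by header orientation:
    column (horizontal) headers mark subject concord, row (vertical)
    headers mark object concord."""
    role = "subject" if axis == "column" else "object"
    return f"{role}-{label}"

def parse_template(subtables):
    """Collect (form, tags) pairs from all subtables of one template."""
    register = None  # column headers saved by an earlier subtable
    forms = []
    for sub in subtables:
        headers = sub.column_headers
        if headers and sub.save_headers:
            register = headers        # "save to register"
        elif not headers:
            headers = register or []  # reuse saved headers, if any
        for row_header, cells in sub.rows:
            for col_header, cell in zip(headers, cells):
                tags = {concord_tag(col_header, "column"),
                        concord_tag(row_header, "row")}
                forms.append((cell, tags))
    return forms

# A header-only subtable followed by a subtable without its own headers.
template = [
    Subtable(["c1", "c5"], [], save_headers=True),
    Subtable([], [("c7", ["form1", "form2"])]),
]
for form, tags in parse_template(template):
    print(form, sorted(tags))
# form1 ['object-c7', 'subject-c1']
# form2 ['object-c7', 'subject-c5']
```

Without the register, the second subtable would have no column headers at all and its cells would be dropped (or left without subject tags), which is the "bleed" problem the note describes.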

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2024-09-15 from the frwiktionary dump dated 2024-09-01 using wiktextract (f5e0f37 and f566de1). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.