Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables found for each language. When wiktextract parses word heads and tables, it assigns tags describing grammatical or contextual information to each form it encounters. The forms and tags found in one head section or table are first kept separate from those of other head sections and tables. Later they are merged into table types, where every table in a type contains the same number of word forms with the same tags for those forms.
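The grouping described above can be illustrated with a minimal sketch. This is not wiktextract's actual implementation: the helper names `table_signature` and `group_by_table_type` are hypothetical, and the input shape (a list of forms, each with a `form` string and a `tags` list) is an assumption made for the example.

```python
from collections import defaultdict

def table_signature(forms):
    """Hypothetical helper: describe a table by the tag sets of its forms.

    Two tables get the same signature only if they have the same number
    of forms and the same tags on those forms (order-insensitive).
    """
    return tuple(sorted(tuple(sorted(f["tags"])) for f in forms))

def group_by_table_type(entries):
    """Group (word, forms) pairs by their table signature."""
    groups = defaultdict(list)
    for word, forms in entries:
        groups[table_signature(forms)].append(word)
    return dict(groups)

# Assumed sample data, invented for illustration only.
entries = [
    ("chat", [{"form": "chat", "tags": ["singular"]},
              {"form": "chats", "tags": ["plural"]}]),
    ("chien", [{"form": "chiens", "tags": ["plural"]},
               {"form": "chien", "tags": ["singular"]}]),
    ("beau", [{"form": "beau", "tags": ["masculine", "singular"]},
              {"form": "belle", "tags": ["feminine", "singular"]},
              {"form": "beaux", "tags": ["masculine", "plural"]},
              {"form": "belles", "tags": ["feminine", "plural"]}]),
]

table_types = group_by_table_type(entries)
```

Here "chat" and "chien" fall into one table type and "beau" into another, so the count of table types is two even though three tables were parsed.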

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type that has only a few unique instances is quite likely to contain some kind of minor error in the original data.
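As a rough sketch of that heuristic, rare table types can be flagged by counting how often each one occurs. The function name `rare_table_types`, the threshold `max_instances`, and the string signatures are all assumptions for illustration, not part of wiktextract.

```python
from collections import Counter

def rare_table_types(signatures, max_instances=2):
    """Return {signature: count} for table types seen at most max_instances times.

    The threshold is an arbitrary assumption; a suitable value depends on
    how many tables a language has.
    """
    counts = Counter(signatures)
    return {sig: n for sig, n in counts.items() if n <= max_instances}

# Assumed sample: one signature string per parsed table.
signatures = (
    ["singular/plural"] * 50
    + ["m.sg/f.sg/m.pl/f.pl"] * 20
    + ["singular/plural/plural"]  # a one-off, possibly a duplicated column
)

suspects = rare_table_types(signatures)
```

Entries in `suspects` are only candidates for manual review; a rare table type may also be a legitimate but unusual paradigm.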

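Each row below holds two entries from the same data set. Left entry (sorted by language name): Language, Table forms, Errors (% affected words). Right entry (sorted by table forms, descending): Language, Table forms, Errors (% affected words).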
Abanyom 2 0 (0.00%) Français 32452 10612 (46.23%)
Afrikaans 11 2 (1.16%) Japonais 1001 41488 (22.68%)
Akan 1 0 (0.00%) Breton 227 118 (7.49%)
Albanais 1 32 (100.00%) Latin 112 16 (0.30%)
Allemand 90 112 (9.80%) Italien 101 36 (49.44%)
Alémanique alsacien 3 0 (0.00%) Tchèque 96 10 (0.14%)
Ancien français 56 8 (4.65%) Suédois 95 14 (0.03%)
Ancien occitan 5 2 (12.93%) Allemand 90 112 (9.80%)
Angevin 5 0 (0.00%) Russe 81 320 (87.82%)
Anglais 58 92 (0.07%) Roumain 74 116 (40.60%)
Anglo-normand 1 0 (0.00%) Néerlandais 68 130 (0.86%)
Arabe 11 44 (11.87%) Espagnol 67 58 (56.43%)
Arabe judéo-tripolitain 21 0 (0.00%) Espéranto 59 40 (22.71%)
Arabe marocain 1 2 (0.00%) Anglais 58 92 (0.07%)
Aragonais 2 0 (0.00%) Luxembourgeois 58 168 (3.73%)
Asturien 7 4 (0.00%) Ancien français 56 8 (4.65%)
Azéri 8 6 (10.21%) Portugais 55 66 (61.59%)
Bambara 1 0 (0.00%) Occitan 51 150 (30.78%)
Baoulé 2 0 (0.00%) Gallois 51 18 (42.88%)
Bas-sorabe 4 0 (0.00%) Polonais 49 6 (0.57%)
Basque 6 10 (96.90%) Slovaque 43 30 (6.36%)
Bengali 1 0 (0.00%) Same du Nord 43 42 (4.86%)
Biélorusse 5 54 (100.00%) Danois 38 60 (61.18%)
Bosniaque 2 0 (0.00%) Kotava 34 0 (0.00%)
Brabançon 10 0 (0.00%) Letton 34 0 (0.00%)
Breton 227 118 (7.49%) Norvégien (bokmål) 32 0 (0.00%)
Bulgare 12 24 (20.55%) Gaélique irlandais 31 18 (1.81%)
Catalan 17 32 (24.37%) Slovène 31 4 (0.05%)
Chakali 2 0 (0.00%) Lituanien 31 4 (0.00%)
Champenois 2 0 (0.00%) Grec ancien 30 2 (0.14%)
Chaoui 1 2 (100.00%) Finnois 27 798 (32.36%)
Cherokee 6 0 (0.00%) Moyen français 26 10 (1.48%)
Chinois 1 2 (100.00%) Vieux slave 26 6 (1.45%)
Chleuh 2 0 (0.00%) Gallo 25 2 (0.15%)
Cornique 7 0 (0.00%) Limbourgeois 25 16 (0.00%)
Corse 8 0 (0.00%) Islandais 25 170 (0.14%)
Coréen 3 30 (91.32%) Hébreu 23 20 (9.66%)
Croate 3 0 (0.00%) Flamand occidental 22 8 (0.93%)
Créole du Cap-Vert 11 12 (2.74%) Norvégien 21 34 (63.73%)
Créole guadeloupéen 2 0 (0.00%) Ukrainien 21 46 (54.58%)
Créole haïtien 2 0 (0.00%) Arabe judéo-tripolitain 21 0 (0.00%)
Créole indo-portugais 2 0 (0.00%) Norvégien (nynorsk) 20 0 (0.00%)
Créole martiniquais 2 0 (0.00%) Grec 19 28 (0.42%)
Créole réunionnais 2 0 (0.00%) Catalan 17 32 (24.37%)
Danois 38 60 (61.18%) Mongol 15 40 (63.04%)
Dioula 1 0 (0.00%) Hongrois 12 2 (0.00%)
Douala 2 2 (55.56%) Franc-comtois 12 18 (14.47%)
Espagnol 67 58 (56.43%) Bulgare 12 24 (20.55%)
Espéranto 59 40 (22.71%) Créole du Cap-Vert 11 12 (2.74%)
Estonien 1 0 (0.00%) Afrikaans 11 2 (1.16%)
Fe’fe’ 1 4 (100.00%) Arabe 11 44 (11.87%)
Finnois 27 798 (32.36%) Normand 10 4 (0.93%)
Flamand occidental 22 8 (0.93%) Brabançon 10 0 (0.00%)
Flamand oriental 2 0 (0.00%) Hindi 10 0 (0.00%)
Franc-comtois 12 18 (14.47%) Galicien 9 16 (7.50%)
Francique rhénan 1 0 (0.00%) Azéri 8 6 (10.21%)
Francique ripuaire 2 0 (0.00%) Tourangeau 8 2 (0.56%)
Francoprovençal 2 0 (0.00%) Corse 8 0 (0.00%)
Français 32452 10612 (46.23%) Volapük réformé 8 70 (91.67%)
Frison 3 72 (92.31%) Turc 7 2 (0.00%)
Féroïen 5 0 (0.00%) Asturien 7 4 (0.00%)
Galicien 9 16 (7.50%) Cornique 7 0 (0.00%)
Gallo 25 2 (0.15%) Vénitien 7 0 (0.00%)
Gallo-italique de Sicile 2 0 (0.00%) Nahuatl classique 7 2 (0.61%)
Gallois 51 18 (42.88%) Basque 6 10 (96.90%)
Gaulois 2 0 (0.00%) Hunsrik 6 0 (0.00%)
Gaélique irlandais 31 18 (1.81%) Lingala 6 50 (99.37%)
Gotique 4 2 (1.50%) Serbe 6 2 (58.33%)
Grec 19 28 (0.42%) Cherokee 6 0 (0.00%)
Grec ancien 30 2 (0.14%) Temné 6 0 (0.00%)
Grec cargésien 3 2 (16.67%) Angevin 5 0 (0.00%)
Griko 2 0 (0.00%) Féroïen 5 0 (0.00%)
Haoussa 2 0 (0.00%) Ido 5 2 (62.20%)
Hassanya 2 0 (0.00%) Ancien occitan 5 2 (12.93%)
Hindi 10 0 (0.00%) Picard 5 6 (64.00%)
Hongrois 12 2 (0.00%) Vieil anglais 5 0 (0.00%)
Hunsrik 6 0 (0.00%) Quenya 5 148 (100.00%)
Hébreu 23 20 (9.66%) Moyen breton 5 0 (0.00%)
Hébreu ancien 3 24 (99.90%) Toku-no-shima 5 0 (0.00%)
Ido 5 2 (62.20%) Biélorusse 5 54 (100.00%)
Ik 2 0 (0.00%) Sicilien 4 2 (95.65%)
Indonésien 2 0 (0.00%) Salentin 4 2 (19.44%)
Interlingua 2 0 (0.00%) Bas-sorabe 4 0 (0.00%)
Inuktitut 3 10 (99.80%) Tatare 4 24 (83.33%)
Islandais 25 170 (0.14%) Vieux norrois 4 0 (0.00%)
Istro-roumain 3 2 (5.56%) Nahuatl central 4 10 (3.57%)
Italien 101 36 (49.44%) Xhosa 4 32 (0.00%)
Japonais 1001 41488 (22.68%) Gotique 4 2 (1.50%)
Judéo-espagnol 3 0 (0.00%) Scots 3 10 (31.43%)
Kabyle 1 0 (0.00%) Frison 3 72 (92.31%)
Kachoube 3 0 (0.00%) Coréen 3 30 (91.32%)
Kazakh 1 0 (0.00%) Mirandais 3 4 (20.00%)
Kikuyu 2 0 (0.00%) Alémanique alsacien 3 0 (0.00%)
Kinyarwanda 2 30 (94.74%) Koyukon 3 102 (0.00%)
Kirghiz 2 0 (0.00%) Oki-no-erabu 3 0 (0.00%)
Kotava 34 0 (0.00%) Judéo-espagnol 3 0 (0.00%)
Koyukon 3 102 (0.00%) Grec cargésien 3 2 (16.67%)
Kurde 1 0 (0.00%) Nǀu 3 0 (0.00%)
Lacandon 2 0 (0.00%) Inuktitut 3 10 (99.80%)
Latin 112 16 (0.30%) Croate 3 0 (0.00%)
Letton 34 0 (0.00%) Istro-roumain 3 2 (5.56%)
Limbourgeois 25 16 (0.00%) Macédonien 3 10 (14.29%)
Lingala 6 50 (99.37%) Moyen anglais 3 0 (0.00%)
Lituanien 31 4 (0.00%) Tamoul 3 0 (0.00%)
Lorrain 2 0 (0.00%) Kachoube 3 0 (0.00%)
Luxembourgeois 58 168 (3.73%) Yupik central 3 102 (68.64%)
Macédonien 3 10 (14.29%) Oubykh 3 2 (82.35%)
Malgache 1 0 (0.00%) Hébreu ancien 3 24 (99.90%)
Maltais 2 0 (0.00%) Okinawaïen 3 0 (0.00%)
Mannois 1 8 (100.00%) Vietnamien 2 0 (0.00%)
Marathe 1 0 (0.00%) Indonésien 2 0 (0.00%)
Micmac 1 2 (100.00%) Romanche 2 4 (66.67%)
Mirandais 3 4 (20.00%) Haoussa 2 0 (0.00%)
Mongol 15 40 (63.04%) Lacandon 2 0 (0.00%)
Monégasque 2 0 (0.00%) Interlingua 2 0 (0.00%)
Moré 2 0 (0.00%) Créole haïtien 2 0 (0.00%)
Moyen anglais 3 0 (0.00%) Créole réunionnais 2 0 (0.00%)
Moyen breton 5 0 (0.00%) Gaulois 2 0 (0.00%)
Moyen français 26 10 (1.48%) Champenois 2 0 (0.00%)
Nahuatl central 4 10 (3.57%) Créole martiniquais 2 0 (0.00%)
Nahuatl classique 7 2 (0.61%) Aragonais 2 0 (0.00%)
Nahuatl de Guerrero 2 0 (0.00%) Bosniaque 2 0 (0.00%)
Nahuatl de la Huasteca central 2 0 (0.00%) Francoprovençal 2 0 (0.00%)
Nahuatl de la Huasteca occidental 1 0 (0.00%) Baoulé 2 0 (0.00%)
Nahuatl de la Huasteca oriental 1 0 (0.00%) Chleuh 2 0 (0.00%)
Nahuatl de l’Orizaba 2 0 (0.00%) Kinyarwanda 2 30 (94.74%)
Nde-nsele-nta 2 0 (0.00%) Gallo-italique de Sicile 2 0 (0.00%)
Normand 10 4 (0.93%) Chakali 2 0 (0.00%)
Norvégien 21 34 (63.73%) Flamand oriental 2 0 (0.00%)
Norvégien (bokmål) 32 0 (0.00%) Griko 2 0 (0.00%)
Norvégien (nynorsk) 20 0 (0.00%) Serbo-croate 2 0 (0.00%)
Nuuchahnulth 2 0 (0.00%) Kikuyu 2 0 (0.00%)
Néerlandais 68 130 (0.86%) Créole guadeloupéen 2 0 (0.00%)
Nǀu 3 0 (0.00%) Créole indo-portugais 2 0 (0.00%)
Occitan 51 150 (30.78%) Sango 2 2 (28.77%)
Oki-no-erabu 3 0 (0.00%) Solrésol 2 0 (0.00%)
Okinawaïen 3 0 (0.00%) Francique ripuaire 2 0 (0.00%)
Oneida 1 0 (0.00%) Lorrain 2 0 (0.00%)
Ossète 1 0 (0.00%) Persan 2 0 (0.00%)
Oubykh 3 2 (82.35%) Ourdou 2 0 (0.00%)
Ourdou 2 0 (0.00%) Maltais 2 0 (0.00%)
Pandunia 2 0 (0.00%) Douala 2 2 (55.56%)
Persan 2 0 (0.00%) Nahuatl de la Huasteca central 2 0 (0.00%)
Picard 5 6 (64.00%) Tupi 2 2 (20.00%)
Pitcairnais 1 10 (100.00%) Slave molisan 2 0 (0.00%)
Polonais 49 6 (0.57%) Tanjijili 2 0 (0.00%)
Portugais 55 66 (61.59%) Monégasque 2 0 (0.00%)
Quechua de Cuzco 1 0 (0.00%) Nahuatl de l’Orizaba 2 0 (0.00%)
Quenya 5 148 (100.00%) Moré 2 0 (0.00%)
Romanche 2 4 (66.67%) Pandunia 2 0 (0.00%)
Roumain 74 116 (40.60%) Kirghiz 2 0 (0.00%)
Russe 81 320 (87.82%) Abanyom 2 0 (0.00%)
Salentin 4 2 (19.44%) Ik 2 0 (0.00%)
Same du Nord 43 42 (4.86%) Nahuatl de Guerrero 2 0 (0.00%)
Sango 2 2 (28.77%) Nuuchahnulth 2 0 (0.00%)
Sanskrit 1 0 (0.00%) Étrusque 2 0 (0.00%)
Scots 3 10 (31.43%) Hassanya 2 0 (0.00%)
Serbe 6 2 (58.33%) Nde-nsele-nta 2 0 (0.00%)
Serbo-croate 2 0 (0.00%) Chinois 1 2 (100.00%)
Sicilien 4 2 (95.65%) Kurde 1 0 (0.00%)
Slave molisan 2 0 (0.00%) Albanais 1 32 (100.00%)
Slovaque 43 30 (6.36%) Bambara 1 0 (0.00%)
Slovène 31 4 (0.05%) Estonien 1 0 (0.00%)
Solrésol 2 0 (0.00%) Wallisien 1 0 (0.00%)
Soussou 1 0 (0.00%) Malgache 1 0 (0.00%)
Suédois 95 14 (0.03%) Chaoui 1 2 (100.00%)
Swahili 1 2 (100.00%) Dioula 1 0 (0.00%)
Note on Swahili: Because subject class concord and object class concord use the same form (for example "c5"), we don't yet have a way to distinguish them. The meaning is human-parsable because we can look at the alignment of the headers (horizontal = subject concord, vertical = object concord), but the parser does not (yet) have this kind of memory or spatial awareness.
Sep 9th 2022: I've recently implemented a way to transform row or column tags into other tags based on language, but it doesn't solve an underlying issue with Swahili tables that I'd missed or forgotten about: the horizontal column headers for class and person don't get inherited through the whole template, because it is chopped up into several subtables that don't directly inherit from above. It's an issue with 'bleed' again.
Dec 2022: I forgot to put this here, but Swahili tables now have their own dedicated systems that make them work, including a "save to register" feature that can keep column headers in memory to be used in a later subtable within a template.
Tadjik 1 0 (0.00%) Éwé 1 0 (0.00%)
Tamoul 3 0 (0.00%) Anglo-normand 1 0 (0.00%)
Tanjijili 2 0 (0.00%) Tatar de Crimée 1 2 (100.00%)
Tatar de Crimée 1 2 (100.00%) Francique rhénan 1 0 (0.00%)
Tatare 4 24 (83.33%) Quechua de Cuzco 1 0 (0.00%)
Tchèque 96 10 (0.14%) Marathe 1 0 (0.00%)
Temné 6 0 (0.00%) Swahili 1 2 (100.00%)
Toku-no-shima 5 0 (0.00%) Vieil irlandais 1 8 (100.00%)
Tourangeau 8 2 (0.56%) Mannois 1 8 (100.00%)
Tupi 2 2 (20.00%) Sanskrit 1 0 (0.00%)
Turc 7 2 (0.00%) Pitcairnais 1 10 (100.00%)
Télougou 1 0 (0.00%) Télougou 1 0 (0.00%)
Ukrainien 21 46 (54.58%) Bengali 1 0 (0.00%)
Vieil anglais 5 0 (0.00%) Ossète 1 0 (0.00%)
Vieil irlandais 1 8 (100.00%) Arabe marocain 1 2 (0.00%)
Vietnamien 2 0 (0.00%) Akan 1 0 (0.00%)
Vieux norrois 4 0 (0.00%) Kazakh 1 0 (0.00%)
Vieux polonais 1 0 (0.00%) Kabyle 1 0 (0.00%)
Vieux slave 26 6 (1.45%) Micmac 1 2 (100.00%)
Volapük réformé 8 70 (91.67%) Soussou 1 0 (0.00%)
Vénitien 7 0 (0.00%) Tadjik 1 0 (0.00%)
Wallisien 1 0 (0.00%) Vieux polonais 1 0 (0.00%)
Xhosa 4 32 (0.00%) Nahuatl de la Huasteca occidental 1 0 (0.00%)
Yupik central 3 102 (68.64%) Oneida 1 0 (0.00%)
Étrusque 2 0 (0.00%) Nahuatl de la Huasteca oriental 1 0 (0.00%)
Éwé 1 0 (0.00%) Fe’fe’ 1 4 (100.00%)
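For readers who want to work with the table above programmatically, the following is a minimal sketch of splitting one text row into its entries. It assumes the plain-text layout shown on this page (language name, table forms, errors, percentage in parentheses, with one or two entries per row); the names `Entry`, `ENTRY_RE` and `parse_line` are hypothetical and not part of wiktextract or kaikki.org.

```python
import re
from typing import List, NamedTuple

class Entry(NamedTuple):
    language: str
    table_forms: int
    errors: int
    pct_affected: float

# A language name may contain spaces and parentheses, e.g. "Norvégien (bokmål)",
# so the name is matched lazily up to the first "<int> <int> (<float>%)" group.
ENTRY_RE = re.compile(
    r"(?P<language>\S.*?)\s+(?P<forms>\d+)\s+(?P<errors>\d+)"
    r"\s+\((?P<pct>\d+\.\d+)%\)"
)

def parse_line(line: str) -> List[Entry]:
    """Return every entry found in one row of the table (usually two)."""
    return [
        Entry(
            language=m.group("language"),
            table_forms=int(m.group("forms")),
            errors=int(m.group("errors")),
            pct_affected=float(m.group("pct")),
        )
        for m in ENTRY_RE.finditer(line)
    ]

rows = parse_line("Abanyom 2 0 (0.00%) Français 32452 10612 (46.23%)")
single = parse_line("Norvégien (bokmål) 32 0 (0.00%)")
```

Because the left and right entries of a row are two sorted views of the same data, each language appears twice overall; deduplicating on the language name after parsing recovers one record per language.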

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2024-05-28 from the frwiktionary dump dated 2024-05-02 using wiktextract (9d9fc81 and db5a844). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.