Wiktionary data extraction errors and warnings

Inflection check

This page lists the different kinds of inflection tables found in the data. When wiktextract parses word heads and inflection tables, it assigns tags describing grammatical or contextual information to the forms it encounters. The forms and tags found in each head section or table are first kept separate from those of other head sections and tables. They are later merged into table types, each of which groups the heads and tables that contain the same number of word forms with the same tags for those forms.
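The grouping into table types can be pictured as computing a signature from the tags of each table and collecting the words that share it. The following is a minimal sketch, not wiktextract's actual implementation: the function names `table_signature` and `group_into_table_types`, the input layout (a list of `(form, tags)` pairs per word), and the sample words are all assumptions made for illustration.

```python
from collections import defaultdict


def table_signature(table):
    """Build a hashable key from the tag sets of a table, ignoring the forms.

    Two tables get the same key when they have the same number of forms
    and the same tags for those forms.
    """
    return tuple(sorted(tuple(sorted(tags)) for _form, tags in table))


def group_into_table_types(tables):
    """Map each signature to the list of words whose tables share it."""
    groups = defaultdict(list)
    for word, table in tables.items():
        groups[table_signature(table)].append(word)
    return dict(groups)


# Hypothetical sample data: word -> list of (form, tags) pairs.
tables = {
    "gato": [("gatos", {"plural"}), ("gata", {"feminine"})],
    "menino": [("meninos", {"plural"}), ("menina", {"feminine"})],
    "lápis": [("lápis", {"plural"})],
}
table_types = group_into_table_types(tables)

for signature, words in table_types.items():
    print(signature, words)
```

Here "gato" and "menino" fall into one table type, while "lápis" forms a table type of its own because its table has a different number of forms.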

The information presented here is mostly for debugging, but it can also be used to find interesting word paradigms and to hunt down mistakes, typos and badly formatted Wiktionary entries. A table type that has only a few unique instances is quite likely to contain some kind of minor error in the original data.
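The heuristic of treating rarely occurring table types as suspicious can be sketched as a simple filter. This is an illustration only: the function name `rare_table_types`, the threshold of two instances, the table type labels, and the sample words are assumptions, not values taken from wiktextract.

```python
def rare_table_types(words_by_type, max_instances=2):
    """Return (table_type, unique_word_count) pairs for rare table types.

    A table type counts as rare when at most `max_instances` distinct
    words use it. The result is ordered rarest first, then by name.
    """
    rare = []
    for table_type, words in words_by_type.items():
        unique = len(set(words))  # duplicates of the same word count once
        if unique <= max_instances:
            rare.append((table_type, unique))
    return sorted(rare, key=lambda item: (item[1], item[0]))


# Hypothetical sample data: table type label -> words that use it.
words_by_type = {
    "plural+feminine": ["gato", "menino", "aluno", "gato"],
    "plural": ["lápis"],
    "plural+diminutive": ["casa", "mesa"],
}
suspects = rare_table_types(words_by_type)
print(suspects)
```

The entries reported this way are candidates for manual review, not confirmed errors; a rare table type may also be a legitimately unusual paradigm.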

Language | Table forms | Errors (% affected words)
Africâner/Africânder | 6 | 2 (4.00%)
Albanês | 2 | 4 (13.33%)
Alemânico | 2 | 2 (58.33%)
Alemão | 5 | 12 (72.31%)
Altaico | 1 | 0 (0.00%)
Aragonês | 16 | 2 (9.35%)
Arromeno | 1 | 4 (100.00%)
Asturiano | 25 | 38 (11.24%)
Asurini | 1 | 2 (100.00%)
Azerbaijano | 2 | 2 (50.00%)
Baixo Saxão | 2 | 2 (50.00%)
Baixo Saxão Holandês | 1 | 4 (100.00%)
Basco | 1 | 2 (100.00%)
Bicolano | 1 | 2 (100.00%)
Bielorrusso | 1 | 0 (0.00%)
Bretão | 4 | 0 (0.00%)
Buginês | 1 | 0 (0.00%)
Bávaro | 1 | 0 (0.00%)
Bósnio | 2 | 2 (96.77%)
Búlgaro | 1 | 0 (0.00%)
Cabo-verdiano | 1 | 0 (0.00%)
Catalão | 26 | 4 (5.94%)
Catalão/Valenciano | 1 | 0 (0.00%)
Cebuano | 1 | 2 (100.00%)
Chamorro | 1 | 0 (0.00%)
Chavacano | 1 | 0 (0.00%)
Checo/Tcheco | 2 | 2 (88.89%)
Chinês | 2 | 0 (0.00%)
Coreano | 1 | 0 (0.00%)
Corso/Córsico | 10 | 4 (14.55%)
Croata | 1 | 4 (100.00%)
Cuanhama | 1 | 0 (0.00%)
Curdo | 2 | 2 (74.51%)
Címbrio | 1 | 2 (100.00%)
Córnico | 5 | 2 (43.75%)
Dimili/Zazaki | 2 | 2 (66.67%)
Dinamarquês | 2 | 4 (93.33%)
Emiliano-romanholo | 2 | 2 (66.67%)
Eslavo Eclesiástico | 1 | 2 (100.00%)
Eslovaco | 2 | 2 (81.82%)
Esloveno | 1 | 10 (100.00%)
Espanhol | 37 | 6 (0.99%)
Espanhol Medieval | 1 | 0 (0.00%)
Esperanto | 2 | 2 (69.70%)
Estoniano | 2 | 2 (83.33%)
Estremenho | 4 | 0 (0.00%)
Feroês | 1 | 2 (100.00%)
Finlandês | 2 | 6 (60.00%)
Flamengo | 1 | 4 (100.00%)
Franco-Provençal | 3 | 2 (60.00%)
Francês | 38 | 2 (3.97%)
Francês Antigo | 5 | 6 (33.33%)
Francónio/Francônio/Kölsch | 1 | 0 (0.00%)
Friuliano | 5 | 2 (8.33%)
Frísio | 1 | 0 (0.00%)
Gagauz | 1 | 0 (0.00%)
Galego | 75 | 64 (2.58%)
Galego-Português Medieval | 21 | 18 (8.43%)
Galês | 4 | 2 (60.00%)
Gaélico Escocês | 4 | 4 (75.76%)
Gilbertês | 1 | 2 (100.00%)
Grego | 3 | 2 (52.94%)
Grego Antigo | 2 | 2 (83.33%)
Guarani | 1 | 2 (100.00%)
Haitiano | 1 | 2 (100.00%)
Hebraico | 3 | 2 (83.33%)
Holandês/Neerlandês | 6 | 2 (57.14%)
Húngaro | 2 | 6 (80.00%)
Ido | 4 | 2 (57.14%)
Ilocano | 2 | 2 (50.00%)
Indonésio | 3 | 2 (77.78%)
Inglês | 31 | 6 (4.33%)
Inglês Antigo | 3 | 0 (0.00%)
Interlíngua | 4 | 2 (7.14%)
Interlíngue | 1 | 0 (0.00%)
Inuktitut | 1 | 0 (0.00%)
Iorubá | 1 | 2 (100.00%)
Irlandês | 2 | 2 (78.57%)
Irlandês Antigo | 1 | 2 (100.00%)
Islandês | 2 | 2 (85.00%)
Italiano | 32 | 8 (5.03%)
Iucateco | 1 | 0 (0.00%)
Iídiche | 1 | 2 (100.00%)
Japonês | 3 | 2 (1.56%)
Javanês | 2 | 2 (12.50%)
Ladino | 5 | 4 (16.67%)
Ladino Dolomita | 3 | 0 (0.00%)
Latim | 6 | 10 (90.91%)
Laz | 1 | 4 (100.00%)
Leonês | 9 | 0 (0.00%)
Letão | 2 | 2 (73.91%)
Liguriano | 4 | 2 (28.57%)
Lingala | 2 | 4 (66.67%)
Lituano | 2 | 2 (75.00%)
Livicoviano | 1 | 0 (0.00%)
Lombardo | 3 | 0 (0.00%)
Luxemburguês | 5 | 2 (2.94%)
Língua Franca Nova | 6 | 2 (3.77%)
Macedónio/Macedônio | 2 | 0 (0.00%)
Malaio | 1 | 0 (0.00%)
Malgaxe | 1 | 2 (100.00%)
Maltês | 4 | 2 (25.00%)
Manquês | 2 | 2 (75.00%)
Mari | 1 | 0 (0.00%)
Min Nan | 1 | 0 (0.00%)
Mirandês | 23 | 6 (1.33%)
Mohawk | 1 | 0 (0.00%)
Moldavo | 1 | 2 (100.00%)
Multilíngue | 4 | 0 (0.00%)
Normando | 1 | 2 (100.00%)
Norueguês Bokmål | 2 | 2 (91.67%)
Norueguês Nynorsk | 2 | 2 (50.00%)
Novial | 1 | 0 (0.00%)
Náuatle Clássico | 3 | 0 (0.00%)
Occitano | 14 | 12 (5.88%)
Papiamento | 2 | 2 (3.74%)
Persa | 2 | 0 (0.00%)
Piemontês | 1 | 0 (0.00%)
Polonês | 1 | 2 (100.00%)
Português | 90 | 18 (0.25%)
Português² | 1 | 0 (0.00%)
Português¹ | 1 | 0 (0.00%)
Provençal Antigo | 4 | 0 (0.00%)
Quéchua | 2 | 6 (44.44%)
Romanche | 6 | 0 (0.00%)
Romani Vlax | 2 | 2 (66.67%)
Romeno | 8 | 4 (38.89%)
Russo | 1 | 0 (0.00%)
Sardo | 10 | 16 (26.09%)
Sardo Campidanês | 1 | 0 (0.00%)
Scots | 3 | 2 (26.32%)
Servocroata | 2 | 2 (36.36%)
Sesoto | 1 | 0 (0.00%)
Shor | 1 | 0 (0.00%)
Siciliano | 9 | 12 (18.75%)
Suaíli | 2 | 2 (20.00%)
Sueco | 1 | 4 (100.00%)
Surinamês | 1 | 2 (100.00%)
Sérvio | 2 | 2 (13.79%)
Tagalo | 1 | 2 (100.00%)
Talian | 13 | 2 (22.22%)
Tupi | 3 | 2 (60.00%)
Turco | 2 | 2 (60.00%)
Tuvano | 1 | 0 (0.00%)
Tártaro | 1 | 0 (0.00%)
Tétum | 1 | 0 (0.00%)
Ucraniano | 1 | 2 (100.00%)
Uigure | 1 | 0 (0.00%)
Uzbeque | 1 | 0 (0.00%)
Valenciano | 5 | 0 (0.00%)
Valão | 2 | 0 (0.00%)
Vietnamita | 2 | 2 (80.00%)
Votíaco | 1 | 0 (0.00%)
Véneto/Vêneto | 12 | 2 (5.13%)
Árabe | 2 | 0 (0.00%)
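
Each row of the table consists of a language name followed by the number of table forms, the number of errors, and the percentage of affected words. The sketch below shows one way to parse such rows and rank them by the number of table forms. The `parse_row` helper and the field names are assumptions introduced for illustration; the pattern accepts columns separated by spaces or by `|` characters.

```python
import re

# Language name, table forms, errors, then the percentage in parentheses.
# Columns may be separated by whitespace and/or "|" characters.
ROW = re.compile(
    r"^(?P<language>.+?)[\s|]+(?P<forms>[0-9]+)[\s|]+(?P<errors>[0-9]+)\s+"
    r"\((?P<percent>[0-9]+\.[0-9]+)%\)$"
)


def parse_row(line):
    """Parse one table row into a dict; raise ValueError if it does not match."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return {
        "language": match.group("language"),
        "forms": int(match.group("forms")),
        "errors": int(match.group("errors")),
        "percent": float(match.group("percent")),
    }


rows = [
    "Baixo Saxão Holandês 1 4 (100.00%)",
    "Galego 75 64 (2.58%)",
    "Português 90 18 (0.25%)",
]
parsed = [parse_row(row) for row in rows]

# Rank languages by the number of table forms, largest first.
by_forms = sorted(parsed, key=lambda row: row["forms"], reverse=True)
for row in by_forms:
    print(row["language"], row["forms"], row["errors"], row["percent"])
```

Sorting on `"percent"` instead of `"forms"` gives a ranking by the share of affected words, which highlights the languages where extraction problems are most widespread.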

This page is a part of the kaikki.org machine-readable dictionary. This dictionary is based on structured data extracted on 2025-07-19 from the ptwiktionary dump dated 2025-07-03 using wiktextract (45c4a21 and f1c2b61). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.
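
For readers who want to work with the unprocessed data, the sketch below counts, per language, the entries that carry at least one inflected form. It assumes the raw wiktextract output is in JSON Lines format (one JSON object per line) and that entries use the keys `lang` and `forms`; these key names, the helper `count_entries_with_forms`, and the sample entries are assumptions that should be checked against the downloaded file.

```python
import json
from collections import Counter


def count_entries_with_forms(lines):
    """Count, per language, the entries that have a non-empty "forms" list.

    `lines` is any iterable of JSON Lines strings, such as an open file.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("forms"):
            counts[entry.get("lang", "unknown")] += 1
    return counts


# Hypothetical sample entries in the assumed layout.
sample = [
    '{"word": "gato", "lang": "Português", "forms": [{"form": "gatos", "tags": ["plural"]}]}',
    '{"word": "lápis", "lang": "Português"}',
    '{"word": "chat", "lang": "Francês", "forms": [{"form": "chats", "tags": ["plural"]}]}',
]
counts = count_entries_with_forms(sample)
print(counts)
```

Because the function accepts any iterable of lines, a downloaded file can be passed directly after opening it with `open(path, encoding="utf-8")`, where `path` is wherever the raw data was saved.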

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.