"type-token ratio" meaning in All languages combined

See type-token ratio on Wiktionary

Noun [English]

Forms: type-token ratios [plural], TTR [alternative]
Head templates: {{en-noun}} type-token ratio (plural type-token ratios)
  1. (corpus linguistics) The ratio of the number of different words (types) occurring in a text or corpus to the overall number of words (tokens) in the same text or corpus.
    Wikipedia link: Lexical density
    Related terms: type-token distinction
    Sense id: en-type-token_ratio-en-noun-Ab4mHoTf
    Categories (other): English entries with incorrect language header, Pages with 1 entry, Pages with entries, Linguistics
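
As a rough illustration of the definition above, the ratio can be computed by counting distinct word forms and dividing by the total word count. The sketch below is not part of the Wiktionary entry; the function name `type_token_ratio`, the lowercasing step, and the simple regex tokenizer are all assumptions made for the example, and real corpus tools may tokenize or normalize differently.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return (number of distinct words) / (total number of words).

    Assumption: a "word" is a lowercase run of letters, digits, or
    apostrophes. This is an illustrative tokenizer, not a standard one.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        raise ValueError("text contains no tokens")
    types = set(tokens)  # each distinct word form counts once
    return len(types) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
ttr = type_token_ratio(sample)  # 8 types over 11 tokens

# Repeating the same text doubles the tokens but adds no new types,
# so the ratio halves: 8 types over 22 tokens.
ttr_doubled = type_token_ratio(sample + " " + sample)
```

The second call hints at why the measure depends on text length, as the quotation in the entry notes: longer texts tend to repeat words, which lowers the ratio even when the vocabulary is unchanged.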

{
  "forms": [
    {
      "form": "type-token ratios",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "TTR",
      "tags": [
        "alternative"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {},
      "expansion": "type-token ratio (plural type-token ratios)",
      "name": "en-noun"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "senses": [
    {
      "categories": [
        {
          "kind": "other",
          "name": "English entries with incorrect language header",
          "parents": [],
          "source": "w"
        },
        {
          "kind": "other",
          "name": "Pages with 1 entry",
          "parents": [],
          "source": "w"
        },
        {
          "kind": "other",
          "name": "Pages with entries",
          "parents": [],
          "source": "w"
        },
        {
          "kind": "other",
          "langcode": "en",
          "name": "Linguistics",
          "orig": "en:Linguistics",
          "parents": [],
          "source": "w"
        }
      ],
      "examples": [
        {
          "bold_text_offsets": [
            [
              275,
              291
            ]
          ],
          "ref": "2023 October 30, Herbold et al., “A large-scale comparison of human-written versus ChatGPT-generated essays”, in Scientific Reports, volume 13, page 5:",
          "text": "We identify vocabulary richness by using a well-established measure of textual, lexical diversity (MTLD) which is often used in the field of automated essay grading. It takes into account the number of unique words but unlike the best-known measure of lexical diversity, the type-token ratio (TTR), it is not as sensitive to the difference in the length of the texts.",
          "type": "quote"
        }
      ],
      "glosses": [
        "The ratio of the number of different words (types) occurring in a text or corpus to the overall number of words (tokens) in the same text or corpus."
      ],
      "id": "en-type-token_ratio-en-noun-Ab4mHoTf",
      "links": [
        [
          "linguistics",
          "linguistics"
        ],
        [
          "ratio",
          "ratio"
        ],
        [
          "type",
          "type"
        ],
        [
          "corpus",
          "corpus"
        ],
        [
          "token",
          "token"
        ]
      ],
      "qualifier": "corpus linguistics",
      "raw_glosses": [
        "(corpus linguistics) The ratio of the number of different words (types) occurring in a text or corpus to the overall number of words (tokens) in the same text or corpus."
      ],
      "related": [
        {
          "word": "type-token distinction"
        }
      ],
      "wikipedia": [
        "Lexical density"
      ]
    }
  ],
  "word": "type-token ratio"
}
{
  "forms": [
    {
      "form": "type-token ratios",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "TTR",
      "tags": [
        "alternative"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {},
      "expansion": "type-token ratio (plural type-token ratios)",
      "name": "en-noun"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "related": [
    {
      "word": "type-token distinction"
    }
  ],
  "senses": [
    {
      "categories": [
        "English countable nouns",
        "English entries with incorrect language header",
        "English lemmas",
        "English multiword terms",
        "English nouns",
        "English terms with quotations",
        "Pages with 1 entry",
        "Pages with entries",
        "en:Linguistics"
      ],
      "examples": [
        {
          "bold_text_offsets": [
            [
              275,
              291
            ]
          ],
          "ref": "2023 October 30, Herbold et al., “A large-scale comparison of human-written versus ChatGPT-generated essays”, in Scientific Reports, volume 13, page 5:",
          "text": "We identify vocabulary richness by using a well-established measure of textual, lexical diversity (MTLD) which is often used in the field of automated essay grading. It takes into account the number of unique words but unlike the best-known measure of lexical diversity, the type-token ratio (TTR), it is not as sensitive to the difference in the length of the texts.",
          "type": "quote"
        }
      ],
      "glosses": [
        "The ratio of the number of different words (types) occurring in a text or corpus to the overall number of words (tokens) in the same text or corpus."
      ],
      "links": [
        [
          "linguistics",
          "linguistics"
        ],
        [
          "ratio",
          "ratio"
        ],
        [
          "type",
          "type"
        ],
        [
          "corpus",
          "corpus"
        ],
        [
          "token",
          "token"
        ]
      ],
      "qualifier": "corpus linguistics",
      "raw_glosses": [
        "(corpus linguistics) The ratio of the number of different words (types) occurring in a text or corpus to the overall number of words (tokens) in the same text or corpus."
      ],
      "wikipedia": [
        "Lexical density"
      ]
    }
  ],
  "word": "type-token ratio"
}
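
A record like the ones above can be read with any JSON parser. The following sketch loads a trimmed copy of the record and pulls out the forms by tag and the highlighted span of the quotation. The helper names `forms_by_tag` and `bolded_spans` are hypothetical, and treating `bold_text_offsets` as half-open `[start, end)` character indices is an assumption that happens to fit this record, not a documented guarantee.

```python
import json

# Trimmed copy of the record shown above; only the fields used here are kept.
RECORD = json.loads("""
{
  "word": "type-token ratio",
  "forms": [
    {"form": "type-token ratios", "tags": ["plural"]},
    {"form": "TTR", "tags": ["alternative"]}
  ],
  "senses": [
    {
      "examples": [
        {
          "bold_text_offsets": [[275, 291]],
          "text": "We identify vocabulary richness by using a well-established measure of textual, lexical diversity (MTLD) which is often used in the field of automated essay grading. It takes into account the number of unique words but unlike the best-known measure of lexical diversity, the type-token ratio (TTR), it is not as sensitive to the difference in the length of the texts."
        }
      ]
    }
  ]
}
""")


def forms_by_tag(entry: dict, tag: str) -> list[str]:
    """Return every listed form whose tags include `tag` (hypothetical helper)."""
    return [f["form"] for f in entry.get("forms", []) if tag in f.get("tags", [])]


def bolded_spans(example: dict) -> list[str]:
    """Slice the quotation text at each offset pair.

    Assumption: offsets are half-open [start, end) character indices.
    """
    text = example["text"]
    return [text[start:end] for start, end in example.get("bold_text_offsets", [])]


plural_forms = forms_by_tag(RECORD, "plural")
alternative_forms = forms_by_tag(RECORD, "alternative")
highlighted = bolded_spans(RECORD["senses"][0]["examples"][0])
```

Under that assumption, the offsets in this record select the headword as it appears inside the quotation.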

This page is a part of the kaikki.org machine-readable All languages combined dictionary. This dictionary is based on structured data extracted on 2025-06-01 from the enwiktionary dump dated 2025-05-20 using wiktextract (3dadd05 and f1c2b61). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.