"collocation extraction" meaning in All languages combined

See collocation extraction on Wiktionary

Noun [English]

Head templates: {{en-noun|-}} collocation extraction (uncountable)
  1. (computational linguistics) The automated process of identifying and extracting word combinations that frequently appear together in a given corpus.
     Wikipedia link: collocation extraction
     Tags: uncountable
     Related terms: statistically improbable phrase
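The gloss does not name a specific method. One common way to find "word combinations that frequently appear together" is to count adjacent word pairs (bigrams) and rank them with an association measure. The sketch below assumes pointwise mutual information (PMI) as that measure and whitespace-split tokens; other measures (t-score, chi-squared, log-likelihood ratio) are also used, and `extract_collocations` is a hypothetical helper, not part of any library.

```python
import math
from collections import Counter

def extract_collocations(tokens, min_count=2, top_n=10):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical sketch: only adjacent bigrams are considered, and PMI is
    just one of several association measures used for this task.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # very rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        scored.append(((w1, w2), pmi))
    # Highest association first
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

corpus = "strong tea is good and strong tea is hot and the cat is good".split()
for pair, score in extract_collocations(corpus):
    print(pair, round(score, 3))
```

The `min_count` threshold matters because PMI overrates pairs seen only once; on a real corpus it would typically be set much higher than 2.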
{
  "head_templates": [
    {
      "args": {
        "1": "-"
      },
      "expansion": "collocation extraction (uncountable)",
      "name": "en-noun"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "senses": [
    {
      "categories": [
        {
          "kind": "other",
          "name": "English entries with incorrect language header",
          "parents": [],
          "source": "w"
        },
        {
          "kind": "other",
          "name": "Pages with 1 entry",
          "parents": [],
          "source": "w"
        },
        {
          "kind": "other",
          "name": "Pages with entries",
          "parents": [],
          "source": "w"
        },
        {
          "kind": "other",
          "langcode": "en",
          "name": "Computational linguistics",
          "orig": "en:Computational linguistics",
          "parents": [],
          "source": "w"
        }
      ],
      "glosses": [
        "The automated process of identifying and extracting word combinations that frequently appear together in a given corpus."
      ],
      "id": "en-collocation_extraction-en-noun-StTMZJfn",
      "links": [
        [
          "computational linguistics",
          "computational linguistics"
        ],
        [
          "automated",
          "automated"
        ],
        [
          "identify",
          "identify"
        ],
        [
          "extract",
          "extract"
        ],
        [
          "corpus",
          "corpus"
        ]
      ],
      "raw_glosses": [
        "(computational linguistics) The automated process of identifying and extracting word combinations that frequently appear together in a given corpus."
      ],
      "related": [
        {
          "word": "statistically improbable phrase"
        }
      ],
      "tags": [
        "uncountable"
      ],
      "topics": [
        "computational",
        "computing",
        "engineering",
        "human-sciences",
        "linguistics",
        "mathematics",
        "natural-sciences",
        "physical-sciences",
        "sciences"
      ],
      "wikipedia": [
        "collocation extraction"
      ]
    }
  ],
  "word": "collocation extraction"
}
{
  "head_templates": [
    {
      "args": {
        "1": "-"
      },
      "expansion": "collocation extraction (uncountable)",
      "name": "en-noun"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "related": [
    {
      "word": "statistically improbable phrase"
    }
  ],
  "senses": [
    {
      "categories": [
        "English entries with incorrect language header",
        "English lemmas",
        "English multiword terms",
        "English nouns",
        "English uncountable nouns",
        "Pages with 1 entry",
        "Pages with entries",
        "en:Computational linguistics"
      ],
      "glosses": [
        "The automated process of identifying and extracting word combinations that frequently appear together in a given corpus."
      ],
      "links": [
        [
          "computational linguistics",
          "computational linguistics"
        ],
        [
          "automated",
          "automated"
        ],
        [
          "identify",
          "identify"
        ],
        [
          "extract",
          "extract"
        ],
        [
          "corpus",
          "corpus"
        ]
      ],
      "raw_glosses": [
        "(computational linguistics) The automated process of identifying and extracting word combinations that frequently appear together in a given corpus."
      ],
      "tags": [
        "uncountable"
      ],
      "topics": [
        "computational",
        "computing",
        "engineering",
        "human-sciences",
        "linguistics",
        "mathematics",
        "natural-sciences",
        "physical-sciences",
        "sciences"
      ],
      "wikipedia": [
        "collocation extraction"
      ]
    }
  ],
  "word": "collocation extraction"
}
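Each record above is a JSON object, and the downloadable data is JSONL (one object per line). The following sketch reads such lines with Python's standard `json` module and pulls out the headword, part of speech, glosses and related terms. The field names come from the records shown above; the helper names `iter_entries` and `summarize_entry` and the trimmed sample record are hypothetical. Because `related` appears inside a sense in the first record and at the top level in the second, the sketch checks both places.

```python
import json

def iter_entries(lines):
    """Yield one parsed entry per non-blank JSONL line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def summarize_entry(entry):
    """Collect headword, part of speech, glosses and related terms.

    "related" may sit at the top level or inside a sense, so both are read.
    """
    related = [r["word"] for r in entry.get("related", [])]
    glosses = []
    for sense in entry.get("senses", []):
        glosses.extend(sense.get("glosses", []))
        related.extend(r["word"] for r in sense.get("related", []))
    return {
        "word": entry.get("word"),
        "pos": entry.get("pos"),
        "glosses": glosses,
        # dict.fromkeys drops duplicates while keeping first-seen order
        "related": list(dict.fromkeys(related)),
    }

# Trimmed sample built from fields of the records above
sample_record = {
    "word": "collocation extraction",
    "pos": "noun",
    "related": [{"word": "statistically improbable phrase"}],
    "senses": [{
        "glosses": ["The automated process of identifying and extracting word "
                    "combinations that frequently appear together in a given corpus."],
        "related": [{"word": "statistically improbable phrase"}],
    }],
}
lines = [json.dumps(sample_record), ""]
for entry in iter_entries(lines):
    print(summarize_entry(entry))
```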



This page is a part of the kaikki.org machine-readable All languages combined dictionary. This dictionary is based on structured data extracted on 2025-06-01 from the enwiktionary dump dated 2025-05-20 using wiktextract (3dadd05 and f1c2b61). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.