"corpus" meaning in English


Noun

IPA: /ˈkɔːpəs/ [Received-Pronunciation], /ˈkɔɹpəs/ [General-American]
Audio: en-au-corpus.ogg [Australia]
Forms: corpora [plural], corpuses [plural], corpusses [plural], corpi [plural, proscribed]
Rhymes: -ɔː(ɹ)pəs
Etymology: Borrowed from Latin corpus (“body”). Doublet of corpse, corps, and riff.
Etymology templates: {{root|en|ine-pro|*krep-}}, {{glossary|loanword|Borrowed}} Borrowed, {{bor|en|la|corpus||body|g=|g2=|g3=|id=|lit=|nocat=|pos=|sc=|sort=|tr=|ts=}} Latin corpus (“body”), {{bor+|en|la|corpus||body}} Borrowed from Latin corpus (“body”), {{doublet|en|corpse|corps|riff#Etymology 2}} Doublet of corpse, corps, and riff
Head templates: {{en-noun|corpora|+|corpusses|corpi|pl4qual=proscribed}} corpus (plural corpora or corpuses or corpusses or (proscribed) corpi)
  1. A collection of writings, often on a specific topic, of a specific genre, from a specific demographic or a particular author, etc.
    Synonyms: collection, compilation, aggregation, body
    Translations (linguistics: collection of writings): مَتْن (matn) [masculine] (Arabic), مَكْنَز لُغَوِيّ (maknaz luḡawiyy) [masculine] (Arabic), ко́рпус (kórpus) [masculine] (Belarusian), збор (zbor) [masculine] (Belarusian), ко́рпус (kórpus) [masculine] (Bulgarian), corpus [masculine] (Catalan), 語料庫 (Chinese Mandarin), 语料库 (yǔliàokù) (Chinese Mandarin), korpus [masculine] (Czech), korpus [neuter] (Danish), corpus [neuter] (Dutch), tekstaro (Esperanto), korpuso (Esperanto), korpus (Finnish), corpus [masculine] (French), Korpus [neuter] (German), Textkorpus [neuter] (German), σώμα (sóma) [neuter] (Greek), συλλογή (syllogí) [feminine] (Greek), korpusz (Hungarian), corpus [masculine] (Italian), コーパス (kōpasu) (Japanese), 말뭉치 (malmungchi) (Korean), 코퍼스 (kopeoseu) (Korean), ко́рпус (kórpus) [masculine] (Macedonian), putunga kōrero (Maori), whakaputunga (Maori), korpus [masculine] (Norwegian), corpus [masculine] (Portuguese), ко́рпус (kórpus) [masculine] (Russian), собра́ние (sobránije) [neuter] (Russian), korpus [masculine] (Slovak), korpus [masculine] (Slovene), corpus [masculine] (Spanish), korpus [common-gender] (Swedish), språkbank [common-gender] (Swedish), külliyat (english: all works of a single author) (Turkish), ко́рпус (kórpus) [masculine] (Ukrainian), збі́рник (zbírnyk) [masculine] (Ukrainian)
    Sense id: en-corpus-en-noun-vYHqX49g
    Categories (other): English entries with incorrect language header, English entries with language name categories using raw markup, English links with manual fragments
    Disambiguation of English entries with incorrect language header: 63 26 11
    Disambiguation of English entries with language name categories using raw markup: 59 29 12
    Disambiguation of English links with manual fragments: 46 39 15
    Disambiguation of 'linguistics: collection of writings': 51 22 27
  2. (specifically, linguistics) Such a collection in form of an electronic database used for linguistic analyses.
    Tags: specifically
    Categories (topical): Linguistics
    Synonyms: digital corpus, text corpus
    Sense id: en-corpus-en-noun-pIRAwBVD
    Categories (other): English links with manual fragments
    Disambiguation of English links with manual fragments: 46 39 15
    Topics: human-sciences, linguistics, sciences
  3. (uncommon) A body, a collection.
    Tags: uncommon
    Synonyms: collection, body
    Sense id: en-corpus-en-noun-kqed46eE
    Categories (other): English links with manual fragments
    Disambiguation of English links with manual fragments: 46 39 15
The following are not (yet) sense-disambiguated
Related terms: Wiktionary:Corpora, corpus allatum, corpus callosotomy, corpus fetishism, corpus fimbriatum, corpus juris, corpus separatum, corpus vile

Download JSON data for corpus meaning in English (15.1kB)

{
  "derived": [
    {
      "_dis1": "0 0 0",
      "word": "aligned parallel corpus"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus callosum"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus cavernosum"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus delicti"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus language"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus linguistics"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus luteum"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus manager"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus spongiosum"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus striatum"
    },
    {
      "_dis1": "0 0 0",
      "word": "habeas corpus"
    }
  ],
  "etymology_templates": [
    {
      "args": {
        "1": "en",
        "2": "ine-pro",
        "3": "*krep-"
      },
      "expansion": "",
      "name": "root"
    },
    {
      "args": {
        "1": "loanword",
        "2": "Borrowed"
      },
      "expansion": "Borrowed",
      "name": "glossary"
    },
    {
      "args": {
        "1": "en",
        "2": "la",
        "3": "corpus",
        "4": "",
        "5": "body",
        "g": "",
        "g2": "",
        "g3": "",
        "id": "",
        "lit": "",
        "nocat": "",
        "pos": "",
        "sc": "",
        "sort": "",
        "tr": "",
        "ts": ""
      },
      "expansion": "Latin corpus (“body”)",
      "name": "bor"
    },
    {
      "args": {
        "1": "en",
        "2": "la",
        "3": "corpus",
        "4": "",
        "5": "body"
      },
      "expansion": "Borrowed from Latin corpus (“body”)",
      "name": "bor+"
    },
    {
      "args": {
        "1": "en",
        "2": "corpse",
        "3": "corps",
        "4": "riff#Etymology 2"
      },
      "expansion": "Doublet of corpse, corps, and riff",
      "name": "doublet"
    }
  ],
  "etymology_text": "Borrowed from Latin corpus (“body”). Doublet of corpse, corps, and riff.",
  "forms": [
    {
      "form": "corpora",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "corpuses",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "corpusses",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "corpi",
      "tags": [
        "plural",
        "proscribed"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {
        "1": "corpora",
        "2": "+",
        "3": "corpusses",
        "4": "corpi",
        "pl4qual": "proscribed"
      },
      "expansion": "corpus (plural corpora or corpuses or corpusses or (proscribed) corpi)",
      "name": "en-noun"
    }
  ],
  "hyphenation": [
    "cor‧pus"
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "related": [
    {
      "_dis1": "0 0 0",
      "word": "Wiktionary:Corpora"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus allatum"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus callosotomy"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus fetishism"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus fimbriatum"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus juris"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus separatum"
    },
    {
      "_dis1": "0 0 0",
      "word": "corpus vile"
    }
  ],
  "senses": [
    {
      "categories": [
        {
          "_dis": "63 26 11",
          "kind": "other",
          "name": "English entries with incorrect language header",
          "parents": [
            "Entries with incorrect language header",
            "Entry maintenance"
          ],
          "source": "w+disamb"
        },
        {
          "_dis": "59 29 12",
          "kind": "other",
          "name": "English entries with language name categories using raw markup",
          "parents": [
            "Entries with language name categories using raw markup",
            "Entry maintenance"
          ],
          "source": "w+disamb"
        },
        {
          "_dis": "46 39 15",
          "kind": "other",
          "name": "English links with manual fragments",
          "parents": [
            "Links with manual fragments",
            "Entry maintenance"
          ],
          "source": "w+disamb"
        }
      ],
      "examples": [
        {
          "ref": "2011, Patrick Spedding, James Lambert, “Fanny Hill, Lord Fanny, and the Myth of Metonymy”, in Studies in Philology, volume 108, number 1, page 113",
          "text": "No one suggests that Browning intended to mean vagina when he wrote “owls and bats, / Cowls and twats,” because the context does not allow for it, nor does the greater context of the Browning corpus.",
          "type": "quotation"
        }
      ],
      "glosses": [
        "A collection of writings, often on a specific topic, of a specific genre, from a specific demographic or a particular author, etc."
      ],
      "id": "en-corpus-en-noun-vYHqX49g",
      "links": [
        [
          "collection",
          "collection"
        ],
        [
          "writings",
          "writings"
        ],
        [
          "topic",
          "topic"
        ],
        [
          "genre",
          "genre"
        ],
        [
          "demographic",
          "demographic"
        ],
        [
          "author",
          "author"
        ]
      ],
      "synonyms": [
        {
          "word": "collection"
        },
        {
          "word": "compilation"
        },
        {
          "word": "aggregation"
        },
        {
          "word": "body"
        }
      ],
      "translations": [
        {
          "_dis1": "51 22 27",
          "code": "ar",
          "lang": "Arabic",
          "roman": "matn",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "مَتْن"
        },
        {
          "_dis1": "51 22 27",
          "code": "ar",
          "lang": "Arabic",
          "roman": "maknaz luḡawiyy",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "مَكْنَز لُغَوِيّ"
        },
        {
          "_dis1": "51 22 27",
          "code": "be",
          "lang": "Belarusian",
          "roman": "kórpus",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "ко́рпус"
        },
        {
          "_dis1": "51 22 27",
          "code": "be",
          "lang": "Belarusian",
          "roman": "zbor",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "збор"
        },
        {
          "_dis1": "51 22 27",
          "code": "bg",
          "lang": "Bulgarian",
          "roman": "kórpus",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "ко́рпус"
        },
        {
          "_dis1": "51 22 27",
          "code": "ca",
          "lang": "Catalan",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "corpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "cmn",
          "lang": "Chinese Mandarin",
          "sense": "linguistics: collection of writings",
          "word": "語料庫"
        },
        {
          "_dis1": "51 22 27",
          "code": "cmn",
          "lang": "Chinese Mandarin",
          "roman": "yǔliàokù",
          "sense": "linguistics: collection of writings",
          "word": "语料库"
        },
        {
          "_dis1": "51 22 27",
          "code": "cs",
          "lang": "Czech",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "da",
          "lang": "Danish",
          "sense": "linguistics: collection of writings",
          "tags": [
            "neuter"
          ],
          "word": "korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "nl",
          "lang": "Dutch",
          "sense": "linguistics: collection of writings",
          "tags": [
            "neuter"
          ],
          "word": "corpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "eo",
          "lang": "Esperanto",
          "sense": "linguistics: collection of writings",
          "word": "tekstaro"
        },
        {
          "_dis1": "51 22 27",
          "code": "eo",
          "lang": "Esperanto",
          "sense": "linguistics: collection of writings",
          "word": "korpuso"
        },
        {
          "_dis1": "51 22 27",
          "code": "fi",
          "lang": "Finnish",
          "sense": "linguistics: collection of writings",
          "word": "korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "fr",
          "lang": "French",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "corpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "de",
          "lang": "German",
          "sense": "linguistics: collection of writings",
          "tags": [
            "neuter"
          ],
          "word": "Korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "de",
          "lang": "German",
          "sense": "linguistics: collection of writings",
          "tags": [
            "neuter"
          ],
          "word": "Textkorpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "el",
          "lang": "Greek",
          "roman": "sóma",
          "sense": "linguistics: collection of writings",
          "tags": [
            "neuter"
          ],
          "word": "σώμα"
        },
        {
          "_dis1": "51 22 27",
          "code": "el",
          "lang": "Greek",
          "roman": "syllogí",
          "sense": "linguistics: collection of writings",
          "tags": [
            "feminine"
          ],
          "word": "συλλογή"
        },
        {
          "_dis1": "51 22 27",
          "code": "hu",
          "lang": "Hungarian",
          "sense": "linguistics: collection of writings",
          "word": "korpusz"
        },
        {
          "_dis1": "51 22 27",
          "code": "it",
          "lang": "Italian",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "corpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "ja",
          "lang": "Japanese",
          "roman": "kōpasu",
          "sense": "linguistics: collection of writings",
          "word": "コーパス"
        },
        {
          "_dis1": "51 22 27",
          "code": "ko",
          "lang": "Korean",
          "roman": "malmungchi",
          "sense": "linguistics: collection of writings",
          "word": "말뭉치"
        },
        {
          "_dis1": "51 22 27",
          "code": "ko",
          "lang": "Korean",
          "roman": "kopeoseu",
          "sense": "linguistics: collection of writings",
          "word": "코퍼스"
        },
        {
          "_dis1": "51 22 27",
          "code": "mk",
          "lang": "Macedonian",
          "roman": "kórpus",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "ко́рпус"
        },
        {
          "_dis1": "51 22 27",
          "code": "mi",
          "lang": "Maori",
          "sense": "linguistics: collection of writings",
          "word": "putunga kōrero"
        },
        {
          "_dis1": "51 22 27",
          "code": "mi",
          "lang": "Maori",
          "sense": "linguistics: collection of writings",
          "word": "whakaputunga"
        },
        {
          "_dis1": "51 22 27",
          "code": "no",
          "lang": "Norwegian",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "pt",
          "lang": "Portuguese",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "corpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "ru",
          "lang": "Russian",
          "roman": "kórpus",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "ко́рпус"
        },
        {
          "_dis1": "51 22 27",
          "code": "ru",
          "lang": "Russian",
          "roman": "sobránije",
          "sense": "linguistics: collection of writings",
          "tags": [
            "neuter"
          ],
          "word": "собра́ние"
        },
        {
          "_dis1": "51 22 27",
          "code": "sk",
          "lang": "Slovak",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "sl",
          "lang": "Slovene",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "es",
          "lang": "Spanish",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "corpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "sv",
          "lang": "Swedish",
          "sense": "linguistics: collection of writings",
          "tags": [
            "common-gender"
          ],
          "word": "korpus"
        },
        {
          "_dis1": "51 22 27",
          "code": "sv",
          "lang": "Swedish",
          "sense": "linguistics: collection of writings",
          "tags": [
            "common-gender"
          ],
          "word": "språkbank"
        },
        {
          "_dis1": "51 22 27",
          "code": "tr",
          "english": "all works of a single author",
          "lang": "Turkish",
          "sense": "linguistics: collection of writings",
          "word": "külliyat"
        },
        {
          "_dis1": "51 22 27",
          "code": "uk",
          "lang": "Ukrainian",
          "roman": "kórpus",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "ко́рпус"
        },
        {
          "_dis1": "51 22 27",
          "code": "uk",
          "lang": "Ukrainian",
          "roman": "zbírnyk",
          "sense": "linguistics: collection of writings",
          "tags": [
            "masculine"
          ],
          "word": "збі́рник"
        }
      ]
    },
    {
      "categories": [
        {
          "kind": "topical",
          "langcode": "en",
          "name": "Linguistics",
          "orig": "en:Linguistics",
          "parents": [
            "Language",
            "Social sciences",
            "Communication",
            "Sciences",
            "Society",
            "All topics",
            "Fundamental"
          ],
          "source": "w"
        },
        {
          "_dis": "46 39 15",
          "kind": "other",
          "name": "English links with manual fragments",
          "parents": [
            "Links with manual fragments",
            "Entry maintenance"
          ],
          "source": "w+disamb"
        }
      ],
      "examples": [
        {
          "ref": "2007, Mihail Mihailov, Hannu Tommola, “Compiling Parallel Text Corpora: Towards Automation of Routine Procedures”, in Wolfgang Teubert, editor, Text Corpora and Multilingual Lexicography (Benjamins Current Topics; 8), Amsterdam: John Benjamins Publishing Company, page 60",
          "text": "Text corpora are being used in most current lexicographic projects. Applied linguistic research is another field where text corpora are welcome as an inexhaustible source of empirical information, a polygon for testing various linguistic tools – spell-checkers, OCRs, machine translation systems, NLP systems, etc.",
          "type": "quotation"
        },
        {
          "ref": "2008, Anabel Borja, “Corpora for Translators in Spain. The CDJ-GITRAD Corpus and the GENITT Project.”, in Gunilla [M.] Anderman, Margaret Rogers, editors, Incorporating Corpora: The Linguist and the Translator, Clevedon, North Somerset: Multilingual Matters, page 248",
          "text": "Comparable corpora are made up of texts in different languages that may be related in various ways, but are not translations of each other. They may have nothing in common at all, or be on the same subject, of the same genre, or from the same chronological period, etc.",
          "type": "quotation"
        },
        {
          "ref": "2013, “Introduction”, in Gerry Knowles, Briony Williams, L[ita] Taylor, editors, A Corpus of Formal British English Speech: The Lancaster/IBM Spoken English Corpus, Abingdon, Oxon., New York, N.Y.: Routledge, page 1",
          "text": "The Lancaster/IBM Spoken English Corpus began in September 1984 as part of a research project into the automatic assignment of intonation […] The original design of the corpus was determined by the need to provide data for research into speech synthesis. As a result, unlike most other corpora currently being used in the computational linguistics field, the SEC exists in several forms. […] However, whatever the original motivation for compiling a corpus, it quickly becomes an object of interest in its own right. New users find it valuable for applications for which it was not designed.",
          "type": "quotation"
        },
        {
          "ref": "2014, Giuseppina Balossi, “Corpus Approaches to the Study of Language and Literature”, in A Corpus Linguistic Approach to Literary Language and Characterization: Virginia Woolf's The Waves (Linguistic Approaches to Literature; 18), Amsterdam: John Benjamins Publishing Company, page 41",
          "text": "A corpus approach is a useful methodology for observing, describing and interpreting the stylistic features of language in literary and non-literary texts.",
          "type": "quotation"
        },
        {
          "ref": "2018, James Lambert, “A multitude of ‘lishes’: The nomenclature of hybridity”, in English World-Wide, page 4",
          "text": "Today, computer databases and corpora infinitely increase the ease of this type of research, but the collecting process remains essentially the same.",
          "type": "quotation"
        }
      ],
      "glosses": [
        "Such a collection in form of an electronic database used for linguistic analyses."
      ],
      "id": "en-corpus-en-noun-pIRAwBVD",
      "links": [
        [
          "linguistics",
          "linguistics"
        ],
        [
          "collection",
          "collection"
        ],
        [
          "electronic",
          "electronic"
        ],
        [
          "database",
          "database"
        ],
        [
          "linguistic",
          "linguistic"
        ]
      ],
      "raw_glosses": [
        "(specifically, linguistics) Such a collection in form of an electronic database used for linguistic analyses."
      ],
      "synonyms": [
        {
          "word": "digital corpus"
        },
        {
          "word": "text corpus"
        }
      ],
      "tags": [
        "specifically"
      ],
      "topics": [
        "human-sciences",
        "linguistics",
        "sciences"
      ]
    },
    {
      "categories": [
        {
          "_dis": "46 39 15",
          "kind": "other",
          "name": "English links with manual fragments",
          "parents": [
            "Links with manual fragments",
            "Entry maintenance"
          ],
          "source": "w+disamb"
        }
      ],
      "examples": [
        {
          "ref": "1998, Dimitǎr Draganov, “New Coin Types of Hadrianopolis”, in Ulrike Peter, editor, Stephanos Nomismatikos: Edith Schönert-Geiss zum 65. Geburtstag (Griechisches Münzwerk), Berlin: Akademie Verlag, page 221",
          "text": "About a hundred years ago in Germany, the publishing of corpuses of the ancient Greek coinages was started. […] The significance of those, and some other corpuses is exclusive, because they allowed an enormous amount of numismatic material kept in museum and private collections all over the world, to be studied and systematized.",
          "type": "quotation"
        },
        {
          "ref": "2014, Margaret Darling, Barbara Precious, “Introduction”, in A Corpus of Roman Pottery from Lincoln (Lincoln Archaeological Studies; 6), Oxford: Oxbow Books, page 1",
          "text": "An assessment in 1991 proposed publication of the results of this work in three stages: […] secondly, a corpus of the Roman pottery to present the type series and to discuss the fabrics and forms recovered, […]",
          "type": "quotation"
        }
      ],
      "glosses": [
        "A body, a collection."
      ],
      "id": "en-corpus-en-noun-kqed46eE",
      "links": [
        [
          "body",
          "body"
        ],
        [
          "collection",
          "collection"
        ]
      ],
      "raw_glosses": [
        "(uncommon) A body, a collection."
      ],
      "synonyms": [
        {
          "word": "collection"
        },
        {
          "word": "body"
        }
      ],
      "tags": [
        "uncommon"
      ]
    }
  ],
  "sounds": [
    {
      "ipa": "/ˈkɔːpəs/",
      "tags": [
        "Received-Pronunciation"
      ]
    },
    {
      "ipa": "/ˈkɔɹpəs/",
      "tags": [
        "General-American"
      ]
    },
    {
      "rhymes": "-ɔː(ɹ)pəs"
    },
    {
      "audio": "en-au-corpus.ogg",
      "mp3_url": "https://upload.wikimedia.org/wikipedia/commons/transcoded/4/4b/En-au-corpus.ogg/En-au-corpus.ogg.mp3",
      "ogg_url": "https://upload.wikimedia.org/wikipedia/commons/4/4b/En-au-corpus.ogg",
      "tags": [
        "Australia"
      ],
      "text": "Audio (AU)"
    }
  ],
  "word": "corpus"
}
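As a rough illustration of how a record shaped like the one above might be consumed, the following Python sketch pulls out the non-proscribed plural forms and a labelled gloss for each sense. The embedded `RECORD` is a hand-trimmed copy that keeps only the `word`, `forms`, and `senses` fields; the helper names `accepted_plurals` and `labelled_glosses` are hypothetical and not part of any published tool. It assumes that `forms[].tags` and `senses[].tags` are plain lists of strings, as they appear in this entry.

```python
import json

# Hand-trimmed copy of the record above (assumption: only these fields are needed).
RECORD = json.loads("""
{
  "word": "corpus",
  "forms": [
    {"form": "corpora", "tags": ["plural"]},
    {"form": "corpuses", "tags": ["plural"]},
    {"form": "corpusses", "tags": ["plural"]},
    {"form": "corpi", "tags": ["plural", "proscribed"]}
  ],
  "senses": [
    {"glosses": ["A collection of writings, often on a specific topic, of a specific genre, from a specific demographic or a particular author, etc."]},
    {"glosses": ["Such a collection in form of an electronic database used for linguistic analyses."],
     "tags": ["specifically"]},
    {"glosses": ["A body, a collection."], "tags": ["uncommon"]}
  ]
}
""")


def accepted_plurals(record):
    """Return plural forms that are not tagged as proscribed."""
    result = []
    for form in record.get("forms", []):
        tags = form.get("tags", [])
        if "plural" in tags and "proscribed" not in tags:
            result.append(form["form"])
    return result


def labelled_glosses(record):
    """Return the first gloss of each sense, prefixed by its tags if any."""
    result = []
    for sense in record.get("senses", []):
        tags = sense.get("tags", [])
        glosses = sense.get("glosses", [])
        gloss = glosses[0] if glosses else ""
        label = "(" + ", ".join(tags) + ") " if tags else ""
        result.append(label + gloss)
    return result


if __name__ == "__main__":
    print(RECORD["word"], "plurals:", ", ".join(accepted_plurals(RECORD)))
    for number, text in enumerate(labelled_glosses(RECORD), start=1):
        print(f"{number}. {text}")
```

The `.get(..., [])` lookups are a deliberate choice: in this entry the first sense has no `tags` key at all, so optional fields are treated as empty rather than required.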
{
  "categories": [
    "English 2-syllable words",
    "English countable nouns",
    "English doublets",
    "English entries with incorrect language header",
    "English entries with language name categories using raw markup",
    "English lemmas",
    "English links with manual fragments",
    "English nouns",
    "English nouns with irregular plurals",
    "English terms borrowed from Latin",
    "English terms derived from Latin",
    "English terms derived from Proto-Indo-European",
    "English terms derived from the Proto-Indo-European root *krep-",
    "English terms with IPA pronunciation",
    "English terms with audio links",
    "Rhymes:English/ɔː(ɹ)pəs",
    "Rhymes:English/ɔː(ɹ)pəs/2 syllables"
  ],
  "derived": [
    {
      "word": "aligned parallel corpus"
    },
    {
      "word": "corpus callosum"
    },
    {
      "word": "corpus cavernosum"
    },
    {
      "word": "corpus delicti"
    },
    {
      "word": "corpus language"
    },
    {
      "word": "corpus linguistics"
    },
    {
      "word": "corpus luteum"
    },
    {
      "word": "corpus manager"
    },
    {
      "word": "corpus spongiosum"
    },
    {
      "word": "corpus striatum"
    },
    {
      "word": "habeas corpus"
    }
  ],
  "etymology_templates": [
    {
      "args": {
        "1": "en",
        "2": "ine-pro",
        "3": "*krep-"
      },
      "expansion": "",
      "name": "root"
    },
    {
      "args": {
        "1": "loanword",
        "2": "Borrowed"
      },
      "expansion": "Borrowed",
      "name": "glossary"
    },
    {
      "args": {
        "1": "en",
        "2": "la",
        "3": "corpus",
        "4": "",
        "5": "body",
        "g": "",
        "g2": "",
        "g3": "",
        "id": "",
        "lit": "",
        "nocat": "",
        "pos": "",
        "sc": "",
        "sort": "",
        "tr": "",
        "ts": ""
      },
      "expansion": "Latin corpus (“body”)",
      "name": "bor"
    },
    {
      "args": {
        "1": "en",
        "2": "la",
        "3": "corpus",
        "4": "",
        "5": "body"
      },
      "expansion": "Borrowed from Latin corpus (“body”)",
      "name": "bor+"
    },
    {
      "args": {
        "1": "en",
        "2": "corpse",
        "3": "corps",
        "4": "riff#Etymology 2"
      },
      "expansion": "Doublet of corpse, corps, and riff",
      "name": "doublet"
    }
  ],
  "etymology_text": "Borrowed from Latin corpus (“body”). Doublet of corpse, corps, and riff.",
  "forms": [
    {
      "form": "corpora",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "corpuses",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "corpusses",
      "tags": [
        "plural"
      ]
    },
    {
      "form": "corpi",
      "tags": [
        "plural",
        "proscribed"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {
        "1": "corpora",
        "2": "+",
        "3": "corpusses",
        "4": "corpi",
        "pl4qual": "proscribed"
      },
      "expansion": "corpus (plural corpora or corpuses or corpusses or (proscribed) corpi)",
      "name": "en-noun"
    }
  ],
  "hyphenation": [
    "cor‧pus"
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "related": [
    {
      "word": "Wiktionary:Corpora"
    },
    {
      "word": "corpus allatum"
    },
    {
      "word": "corpus callosotomy"
    },
    {
      "word": "corpus fetishism"
    },
    {
      "word": "corpus fimbriatum"
    },
    {
      "word": "corpus juris"
    },
    {
      "word": "corpus separatum"
    },
    {
      "word": "corpus vile"
    }
  ],
  "senses": [
    {
      "categories": [
        "English terms with quotations"
      ],
      "examples": [
        {
          "ref": "2011, Patrick Spedding, James Lambert, “Fanny Hill, Lord Fanny, and the Myth of Metonymy”, in Studies in Philology, volume 108, number 1, page 113",
          "text": "No one suggests that Browning intended to mean vagina when he wrote “owls and bats, / Cowls and twats,” because the context does not allow for it, nor does the greater context of the Browning corpus.",
          "type": "quotation"
        }
      ],
      "glosses": [
        "A collection of writings, often on a specific topic, of a specific genre, from a specific demographic or a particular author, etc."
      ],
      "links": [
        [
          "collection",
          "collection"
        ],
        [
          "writings",
          "writings"
        ],
        [
          "topic",
          "topic"
        ],
        [
          "genre",
          "genre"
        ],
        [
          "demographic",
          "demographic"
        ],
        [
          "author",
          "author"
        ]
      ],
      "synonyms": [
        {
          "word": "collection"
        },
        {
          "word": "compilation"
        },
        {
          "word": "aggregation"
        },
        {
          "word": "body"
        }
      ]
    },
    {
      "categories": [
        "English terms with quotations",
        "en:Linguistics"
      ],
      "examples": [
        {
          "ref": "2007, Mihail Mihailov, Hannu Tommola, “Compiling Parallel Text Corpora: Towards Automation of Routine Procedures”, in Wolfgang Teubert, editor, Text Corpora and Multilingual Lexicography (Benjamins Current Topics; 8), Amsterdam: John Benjamins Publishing Company, page 60",
          "text": "Text corpora are being used in most current lexicographic projects. Applied linguistic research is another field where text corpora are welcome as an inexhaustible source of empirical information, a polygon for testing various linguistic tools – spell-checkers, OCRs, machine translation systems, NLP systems, etc.",
          "type": "quotation"
        },
        {
          "ref": "2008, Anabel Borja, “Corpora for Translators in Spain. The CDJ-GITRAD Corpus and the GENITT Project.”, in Gunilla [M.] Anderman, Margaret Rogers, editors, Incorporating Corpora: The Linguist and the Translator, Clevedon, North Somerset: Multilingual Matters, page 248",
          "text": "Comparable corpora are made up of texts in different languages that may be related in various ways, but are not translations of each other. They may have nothing in common at all, or be on the same subject, of the same genre, or from the same chronological period, etc.",
          "type": "quotation"
        },
        {
          "ref": "2013, “Introduction”, in Gerry Knowles, Briony Williams, L[ita] Taylor, editors, A Corpus of Formal British English Speech: The Lancaster/IBM Spoken English Corpus, Abingdon, Oxon., New York, N.Y.: Routledge, page 1",
          "text": "The Lancaster/IBM Spoken English Corpus began in September 1984 as part of a research project into the automatic assignment of intonation […] The original design of the corpus was determined by the need to provide data for research into speech synthesis. As a result, unlike most other corpora currently being used in the computational linguistics field, the SEC exists in several forms. […] However, whatever the original motivation for compiling a corpus, it quickly becomes an object of interest in its own right. New users find it valuable for applications for which it was not designed.",
          "type": "quotation"
        },
        {
          "ref": "2014, Giuseppina Balossi, “Corpus Approaches to the Study of Language and Literature”, in A Corpus Linguistic Approach to Literary Language and Characterization: Virginia Woolf's The Waves (Linguistic Approaches to Literature; 18), Amsterdam: John Benjamins Publishing Company, page 41",
          "text": "A corpus approach is a useful methodology for observing, describing and interpreting the stylistic features of language in literary and non-literary texts.",
          "type": "quotation"
        },
        {
          "ref": "2018, James Lambert, “A multitude of ‘lishes’: The nomenclature of hybridity”, in English World-Wide, page 4",
          "text": "Today, computer databases and corpora infinitely increase the ease of this type of research, but the collecting process remains essentially the same.",
          "type": "quotation"
        }
      ],
      "glosses": [
        "Such a collection in form of an electronic database used for linguistic analyses."
      ],
      "links": [
        [
          "linguistics",
          "linguistics"
        ],
        [
          "collection",
          "collection"
        ],
        [
          "electronic",
          "electronic"
        ],
        [
          "database",
          "database"
        ],
        [
          "linguistic",
          "linguistic"
        ]
      ],
      "raw_glosses": [
        "(specifically, linguistics) Such a collection in form of an electronic database used for linguistic analyses."
      ],
      "synonyms": [
        {
          "word": "digital corpus"
        },
        {
          "word": "text corpus"
        }
      ],
      "tags": [
        "specifically"
      ],
      "topics": [
        "human-sciences",
        "linguistics",
        "sciences"
      ]
    },
    {
      "categories": [
        "English terms with quotations",
        "English terms with uncommon senses"
      ],
      "examples": [
        {
          "ref": "1998, Dimitǎr Draganov, “New Coin Types of Hadrianopolis”, in Ulrike Peter, editor, Stephanos Nomismatikos: Edith Schönert-Geiss zum 65. Geburtstag (Griechisches Münzwerk), Berlin: Akademie Verlag, page 221",
          "text": "About a hundred years ago in Germany, the publishing of corpuses of the ancient Greek coinages was started. […] The significance of those, and some other corpuses is exclusive, because they allowed an enormous amount of numismatic material kept in museum and private collections all over the world, to be studied and systematized.",
          "type": "quotation"
        },
        {
          "ref": "2014, Margaret Darling, Barbara Precious, “Introduction”, in A Corpus of Roman Pottery from Lincoln (Lincoln Archaeological Studies; 6), Oxford: Oxbow Books, page 1",
          "text": "An assessment in 1991 proposed publication of the results of this work in three stages: […] secondly, a corpus of the Roman pottery to present the type series and to discuss the fabrics and forms recovered, […]",
          "type": "quotation"
        }
      ],
      "glosses": [
        "A body, a collection."
      ],
      "links": [
        [
          "body",
          "body"
        ],
        [
          "collection",
          "collection"
        ]
      ],
      "raw_glosses": [
        "(uncommon) A body, a collection."
      ],
      "synonyms": [
        {
          "word": "collection"
        },
        {
          "word": "body"
        }
      ],
      "tags": [
        "uncommon"
      ]
    }
  ],
  "sounds": [
    {
      "ipa": "/ˈkɔːpəs/",
      "tags": [
        "Received-Pronunciation"
      ]
    },
    {
      "ipa": "/ˈkɔɹpəs/",
      "tags": [
        "General-American"
      ]
    },
    {
      "rhymes": "-ɔː(ɹ)pəs"
    },
    {
      "audio": "en-au-corpus.ogg",
      "mp3_url": "https://upload.wikimedia.org/wikipedia/commons/transcoded/4/4b/En-au-corpus.ogg/En-au-corpus.ogg.mp3",
      "ogg_url": "https://upload.wikimedia.org/wikipedia/commons/4/4b/En-au-corpus.ogg",
      "tags": [
        "Australia"
      ],
      "text": "Audio (AU)"
    }
  ],
  "translations": [
    {
      "code": "ar",
      "lang": "Arabic",
      "roman": "matn",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "مَتْن"
    },
    {
      "code": "ar",
      "lang": "Arabic",
      "roman": "maknaz luḡawiyy",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "مَكْنَز لُغَوِيّ"
    },
    {
      "code": "be",
      "lang": "Belarusian",
      "roman": "kórpus",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "ко́рпус"
    },
    {
      "code": "be",
      "lang": "Belarusian",
      "roman": "zbor",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "збор"
    },
    {
      "code": "bg",
      "lang": "Bulgarian",
      "roman": "kórpus",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "ко́рпус"
    },
    {
      "code": "ca",
      "lang": "Catalan",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "corpus"
    },
    {
      "code": "cmn",
      "lang": "Chinese Mandarin",
      "sense": "linguistics: collection of writings",
      "word": "語料庫"
    },
    {
      "code": "cmn",
      "lang": "Chinese Mandarin",
      "roman": "yǔliàokù",
      "sense": "linguistics: collection of writings",
      "word": "语料库"
    },
    {
      "code": "cs",
      "lang": "Czech",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "korpus"
    },
    {
      "code": "da",
      "lang": "Danish",
      "sense": "linguistics: collection of writings",
      "tags": [
        "neuter"
      ],
      "word": "korpus"
    },
    {
      "code": "nl",
      "lang": "Dutch",
      "sense": "linguistics: collection of writings",
      "tags": [
        "neuter"
      ],
      "word": "corpus"
    },
    {
      "code": "eo",
      "lang": "Esperanto",
      "sense": "linguistics: collection of writings",
      "word": "tekstaro"
    },
    {
      "code": "eo",
      "lang": "Esperanto",
      "sense": "linguistics: collection of writings",
      "word": "korpuso"
    },
    {
      "code": "fi",
      "lang": "Finnish",
      "sense": "linguistics: collection of writings",
      "word": "korpus"
    },
    {
      "code": "fr",
      "lang": "French",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "corpus"
    },
    {
      "code": "de",
      "lang": "German",
      "sense": "linguistics: collection of writings",
      "tags": [
        "neuter"
      ],
      "word": "Korpus"
    },
    {
      "code": "de",
      "lang": "German",
      "sense": "linguistics: collection of writings",
      "tags": [
        "neuter"
      ],
      "word": "Textkorpus"
    },
    {
      "code": "el",
      "lang": "Greek",
      "roman": "sóma",
      "sense": "linguistics: collection of writings",
      "tags": [
        "neuter"
      ],
      "word": "σώμα"
    },
    {
      "code": "el",
      "lang": "Greek",
      "roman": "syllogí",
      "sense": "linguistics: collection of writings",
      "tags": [
        "feminine"
      ],
      "word": "συλλογή"
    },
    {
      "code": "hu",
      "lang": "Hungarian",
      "sense": "linguistics: collection of writings",
      "word": "korpusz"
    },
    {
      "code": "it",
      "lang": "Italian",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "corpus"
    },
    {
      "code": "ja",
      "lang": "Japanese",
      "roman": "kōpasu",
      "sense": "linguistics: collection of writings",
      "word": "コーパス"
    },
    {
      "code": "ko",
      "lang": "Korean",
      "roman": "malmungchi",
      "sense": "linguistics: collection of writings",
      "word": "말뭉치"
    },
    {
      "code": "ko",
      "lang": "Korean",
      "roman": "kopeoseu",
      "sense": "linguistics: collection of writings",
      "word": "코퍼스"
    },
    {
      "code": "mk",
      "lang": "Macedonian",
      "roman": "kórpus",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "ко́рпус"
    },
    {
      "code": "mi",
      "lang": "Maori",
      "sense": "linguistics: collection of writings",
      "word": "putunga kōrero"
    },
    {
      "code": "mi",
      "lang": "Maori",
      "sense": "linguistics: collection of writings",
      "word": "whakaputunga"
    },
    {
      "code": "no",
      "lang": "Norwegian",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "korpus"
    },
    {
      "code": "pt",
      "lang": "Portuguese",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "corpus"
    },
    {
      "code": "ru",
      "lang": "Russian",
      "roman": "kórpus",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "ко́рпус"
    },
    {
      "code": "ru",
      "lang": "Russian",
      "roman": "sobránije",
      "sense": "linguistics: collection of writings",
      "tags": [
        "neuter"
      ],
      "word": "собра́ние"
    },
    {
      "code": "sk",
      "lang": "Slovak",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "korpus"
    },
    {
      "code": "sl",
      "lang": "Slovene",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "korpus"
    },
    {
      "code": "es",
      "lang": "Spanish",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "corpus"
    },
    {
      "code": "sv",
      "lang": "Swedish",
      "sense": "linguistics: collection of writings",
      "tags": [
        "common-gender"
      ],
      "word": "korpus"
    },
    {
      "code": "sv",
      "lang": "Swedish",
      "sense": "linguistics: collection of writings",
      "tags": [
        "common-gender"
      ],
      "word": "språkbank"
    },
    {
      "code": "tr",
      "english": "all works of a single author",
      "lang": "Turkish",
      "sense": "linguistics: collection of writings",
      "word": "külliyat"
    },
    {
      "code": "uk",
      "lang": "Ukrainian",
      "roman": "kórpus",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "ко́рпус"
    },
    {
      "code": "uk",
      "lang": "Ukrainian",
      "roman": "zbírnyk",
      "sense": "linguistics: collection of writings",
      "tags": [
        "masculine"
      ],
      "word": "збі́рник"
    }
  ],
  "word": "corpus"
}

This page is part of the kaikki.org machine-readable English dictionary. This dictionary is based on structured data extracted on 2024-05-05 from the enwiktionary dump dated 2024-05-02 using wiktextract (f4fd8c9 and c9440ce). The data shown on this site has been post-processed: various details (e.g., extra categories) have been removed, some information has been disambiguated, and additional data has been merged from other sources. See the raw data download page for the unprocessed wiktextract data.
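
As an illustration of how the structured entry above might be consumed, here is a minimal Python sketch. It assumes an entry has already been loaded as a JSON object with the "sounds", "translations", and "word" keys shown on this page. The helper names `translations_by_language` and `ipa_by_accent` are hypothetical and not part of wiktextract, and the sample is an abridged copy of the data above rather than the full entry.

```python
import json
from collections import defaultdict

# Abridged sample in the shape shown on this page; values are copied from
# the "corpus" entry. A real entry has many more fields.
SAMPLE = """
{
  "sounds": [
    {"ipa": "/ˈkɔːpəs/", "tags": ["Received-Pronunciation"]},
    {"ipa": "/ˈkɔɹpəs/", "tags": ["General-American"]},
    {"rhymes": "-ɔː(ɹ)pəs"}
  ],
  "translations": [
    {"code": "de", "lang": "German", "tags": ["neuter"], "word": "Korpus"},
    {"code": "de", "lang": "German", "tags": ["neuter"], "word": "Textkorpus"},
    {"code": "fi", "lang": "Finnish", "word": "korpus"}
  ],
  "word": "corpus"
}
"""


def translations_by_language(entry):
    """Group translation words by language name (hypothetical helper)."""
    grouped = defaultdict(list)
    for translation in entry.get("translations", []):
        grouped[translation["lang"]].append(translation["word"])
    return dict(grouped)


def ipa_by_accent(entry):
    """Map each accent tag to its IPA string (hypothetical helper).

    Items without an "ipa" key (rhymes, audio files) are skipped.
    """
    result = {}
    for sound in entry.get("sounds", []):
        if "ipa" not in sound:
            continue
        for tag in sound.get("tags", []):
            result[tag] = sound["ipa"]
    return result


entry = json.loads(SAMPLE)
print(entry["word"])
print(translations_by_language(entry))
print(ipa_by_accent(entry))
```

Optional keys such as "tags" and "roman" are absent from some items in the entry above (the Finnish translation has neither), so the sketch reads them with `.get()` rather than assuming they are present.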

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.