"tokenize" meaning in English

Verb

IPA: /ˈtoʊ.kən.aɪz/ [General-American]
Forms: tokenizes [present, singular, third-person], tokenizing [participle, present], tokenized [participle, past], tokenized [past]
Etymology: From token + -ize.
Etymology templates: {{suffix|en|token|ize}} token + -ize
Head templates: {{en-verb}} tokenize (third-person singular simple present tokenizes, present participle tokenizing, simple past and past participle tokenized)
  1. (transitive, computing) To reduce to a token or set of tokens by lexical analysis.
    Tags: transitive
    Categories (topical): Computing
    Sense id: en-tokenize-en-verb-jT9kOSc9
    Categories (other): English entries with incorrect language header, English terms suffixed with -ize, Pages with 1 entry, Pages with entries
    Disambiguation of English entries with incorrect language header: 61 33 6
    Disambiguation of English terms suffixed with -ize: 52 28 20
    Disambiguation of Pages with 1 entry: 61 32 7
    Disambiguation of Pages with entries: 65 31 5
    Topics: computing, engineering, mathematics, natural-sciences, physical-sciences, sciences
  2. (transitive, computing) To substitute sensitive data with meaningless placeholders.
    Tags: transitive
    Categories (topical): Computing
    Sense id: en-tokenize-en-verb--Kj3NOCS
    Topics: computing, engineering, mathematics, natural-sciences, physical-sciences, sciences
  3. (transitive) To treat as a token minority.
    Tags: transitive
    Sense id: en-tokenize-en-verb-bFvBu9Vi
The following are not (yet) sense-disambiguated
Derived forms: tokenizable, tokenizer
Related terms: tokenization
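
The two computing senses above can be illustrated with a minimal Python sketch. Nothing here comes from the dictionary entry itself: the function names `lex_tokenize` and `vault_tokenize`, the token pattern, the `tok_` prefix, and the in-memory `vault` dictionary are all hypothetical choices made for illustration. Real lexers and real data-tokenization services are considerably more involved.

```python
import re
import secrets

# Hypothetical toy grammar: integers, ASCII-initial identifiers,
# or any single punctuation character. Whitespace is skipped.
TOKEN_PATTERN = re.compile(r"\d+|[A-Za-z_]\w*|[^\s\w]")


def lex_tokenize(text: str) -> list[str]:
    """Sense 1: reduce text to a list of tokens by simple lexical analysis."""
    return TOKEN_PATTERN.findall(text)


def vault_tokenize(value: str, vault: dict[str, str]) -> str:
    """Sense 2: substitute a sensitive value with a meaningless placeholder.

    The placeholder is random, so it carries no information about the value.
    The (assumed) in-memory vault keeps the mapping back to the original.
    """
    placeholder = "tok_" + secrets.token_hex(8)
    vault[placeholder] = value
    return placeholder


# Sense 1: a short expression split into its lexical tokens.
print(lex_tokenize("x = 42 + y1"))

# Sense 2: the placeholder can be stored or logged instead of the real value.
vault: dict[str, str] = {}
placeholder = vault_tokenize("4111 1111 1111 1111", vault)
print(placeholder, "->", vault[placeholder])
```

In the second case the placeholder differs on every run, which is the point: only a lookup in the vault recovers the original value.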

{
  "derived": [
    {
      "_dis1": "0 0 0",
      "word": "tokenizable"
    },
    {
      "_dis1": "0 0 0",
      "word": "tokenizer"
    }
  ],
  "etymology_templates": [
    {
      "args": {
        "1": "en",
        "2": "token",
        "3": "ize"
      },
      "expansion": "token + -ize",
      "name": "suffix"
    }
  ],
  "etymology_text": "From token + -ize.",
  "forms": [
    {
      "form": "tokenizes",
      "tags": [
        "present",
        "singular",
        "third-person"
      ]
    },
    {
      "form": "tokenizing",
      "tags": [
        "participle",
        "present"
      ]
    },
    {
      "form": "tokenized",
      "tags": [
        "participle",
        "past"
      ]
    },
    {
      "form": "tokenized",
      "tags": [
        "past"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {},
      "expansion": "tokenize (third-person singular simple present tokenizes, present participle tokenizing, simple past and past participle tokenized)",
      "name": "en-verb"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "verb",
  "related": [
    {
      "_dis1": "0 0 0",
      "word": "tokenization"
    }
  ],
  "senses": [
    {
      "categories": [
        {
          "kind": "topical",
          "langcode": "en",
          "name": "Computing",
          "orig": "en:Computing",
          "parents": [
            "Technology",
            "All topics",
            "Fundamental"
          ],
          "source": "w"
        },
        {
          "_dis": "61 33 6",
          "kind": "other",
          "name": "English entries with incorrect language header",
          "parents": [
            "Entries with incorrect language header",
            "Entry maintenance"
          ],
          "source": "w+disamb"
        },
        {
          "_dis": "52 28 20",
          "kind": "other",
          "name": "English terms suffixed with -ize",
          "parents": [],
          "source": "w+disamb"
        },
        {
          "_dis": "61 32 7",
          "kind": "other",
          "name": "Pages with 1 entry",
          "parents": [],
          "source": "w+disamb"
        },
        {
          "_dis": "65 31 5",
          "kind": "other",
          "name": "Pages with entries",
          "parents": [],
          "source": "w+disamb"
        }
      ],
      "glosses": [
        "To reduce to a token or set of tokens by lexical analysis."
      ],
      "id": "en-tokenize-en-verb-jT9kOSc9",
      "links": [
        [
          "computing",
          "computing#Noun"
        ],
        [
          "token",
          "token"
        ],
        [
          "lexical analysis",
          "lexical analysis"
        ]
      ],
      "raw_glosses": [
        "(transitive, computing) To reduce to a token or set of tokens by lexical analysis."
      ],
      "tags": [
        "transitive"
      ],
      "topics": [
        "computing",
        "engineering",
        "mathematics",
        "natural-sciences",
        "physical-sciences",
        "sciences"
      ]
    },
    {
      "categories": [
        {
          "kind": "topical",
          "langcode": "en",
          "name": "Computing",
          "orig": "en:Computing",
          "parents": [
            "Technology",
            "All topics",
            "Fundamental"
          ],
          "source": "w"
        }
      ],
      "glosses": [
        "To substitute sensitive data with meaningless placeholders."
      ],
      "id": "en-tokenize-en-verb--Kj3NOCS",
      "links": [
        [
          "computing",
          "computing#Noun"
        ]
      ],
      "raw_glosses": [
        "(transitive, computing) To substitute sensitive data with meaningless placeholders."
      ],
      "tags": [
        "transitive"
      ],
      "topics": [
        "computing",
        "engineering",
        "mathematics",
        "natural-sciences",
        "physical-sciences",
        "sciences"
      ]
    },
    {
      "categories": [],
      "glosses": [
        "To treat as a token minority."
      ],
      "id": "en-tokenize-en-verb-bFvBu9Vi",
      "links": [
        [
          "token",
          "token"
        ],
        [
          "minority",
          "minority"
        ]
      ],
      "raw_glosses": [
        "(transitive) To treat as a token minority."
      ],
      "tags": [
        "transitive"
      ]
    }
  ],
  "sounds": [
    {
      "ipa": "/ˈtoʊ.kən.aɪz/",
      "tags": [
        "General-American"
      ]
    }
  ],
  "wikipedia": [
    "Tokenization"
  ],
  "word": "tokenize"
}
{
  "categories": [
    "English 3-syllable words",
    "English entries with incorrect language header",
    "English lemmas",
    "English terms suffixed with -ize",
    "English verbs",
    "Pages with 1 entry",
    "Pages with entries"
  ],
  "derived": [
    {
      "word": "tokenizable"
    },
    {
      "word": "tokenizer"
    }
  ],
  "etymology_templates": [
    {
      "args": {
        "1": "en",
        "2": "token",
        "3": "ize"
      },
      "expansion": "token + -ize",
      "name": "suffix"
    }
  ],
  "etymology_text": "From token + -ize.",
  "forms": [
    {
      "form": "tokenizes",
      "tags": [
        "present",
        "singular",
        "third-person"
      ]
    },
    {
      "form": "tokenizing",
      "tags": [
        "participle",
        "present"
      ]
    },
    {
      "form": "tokenized",
      "tags": [
        "participle",
        "past"
      ]
    },
    {
      "form": "tokenized",
      "tags": [
        "past"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {},
      "expansion": "tokenize (third-person singular simple present tokenizes, present participle tokenizing, simple past and past participle tokenized)",
      "name": "en-verb"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "verb",
  "related": [
    {
      "word": "tokenization"
    }
  ],
  "senses": [
    {
      "categories": [
        "English transitive verbs",
        "en:Computing"
      ],
      "glosses": [
        "To reduce to a token or set of tokens by lexical analysis."
      ],
      "links": [
        [
          "computing",
          "computing#Noun"
        ],
        [
          "token",
          "token"
        ],
        [
          "lexical analysis",
          "lexical analysis"
        ]
      ],
      "raw_glosses": [
        "(transitive, computing) To reduce to a token or set of tokens by lexical analysis."
      ],
      "tags": [
        "transitive"
      ],
      "topics": [
        "computing",
        "engineering",
        "mathematics",
        "natural-sciences",
        "physical-sciences",
        "sciences"
      ]
    },
    {
      "categories": [
        "English transitive verbs",
        "en:Computing"
      ],
      "glosses": [
        "To substitute sensitive data with meaningless placeholders."
      ],
      "links": [
        [
          "computing",
          "computing#Noun"
        ]
      ],
      "raw_glosses": [
        "(transitive, computing) To substitute sensitive data with meaningless placeholders."
      ],
      "tags": [
        "transitive"
      ],
      "topics": [
        "computing",
        "engineering",
        "mathematics",
        "natural-sciences",
        "physical-sciences",
        "sciences"
      ]
    },
    {
      "categories": [
        "English transitive verbs"
      ],
      "glosses": [
        "To treat as a token minority."
      ],
      "links": [
        [
          "token",
          "token"
        ],
        [
          "minority",
          "minority"
        ]
      ],
      "raw_glosses": [
        "(transitive) To treat as a token minority."
      ],
      "tags": [
        "transitive"
      ]
    }
  ],
  "sounds": [
    {
      "ipa": "/ˈtoʊ.kən.aɪz/",
      "tags": [
        "General-American"
      ]
    }
  ],
  "wikipedia": [
    "Tokenization"
  ],
  "word": "tokenize"
}


This page is a part of the kaikki.org machine-readable English dictionary. This dictionary is based on structured data extracted on 2025-01-10 from the enwiktionary dump dated 2025-01-01 using wiktextract (df33d17 and 4ed51a5). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.
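
As a rough sketch of how such data might be consumed, the Python below parses one JSONL record and lists its glosses. It assumes each line of the download is a single JSON object with the `word`, `pos`, `senses`, and `glosses` fields seen in the entry on this page. The `sample_line` is an abbreviated stand-in built from that entry, and `summarize_entry` is a hypothetical helper name, not part of wiktextract.

```python
import json

# One abbreviated JSONL line, using only fields that appear in the entry above.
sample_line = json.dumps({
    "word": "tokenize",
    "pos": "verb",
    "senses": [
        {
            "glosses": ["To reduce to a token or set of tokens by lexical analysis."],
            "tags": ["transitive"],
            "topics": ["computing"],
        },
        {
            "glosses": ["To treat as a token minority."],
            "tags": ["transitive"],
        },
    ],
})


def summarize_entry(line: str) -> list[str]:
    """Return one 'word (pos): gloss' string per gloss in a JSONL record."""
    entry = json.loads(line)
    summaries = []
    # Missing "senses" or "glosses" keys are treated as empty lists.
    for sense in entry.get("senses", []):
        for gloss in sense.get("glosses", []):
            summaries.append(f"{entry['word']} ({entry['pos']}): {gloss}")
    return summaries


for summary in summarize_entry(sample_line):
    print(summary)
```

For a full download, the same function could be applied to each line of the file in turn, since every line would be parsed independently.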

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.