"data lake" meaning in English

Noun

Forms: data lakes [plural]
Etymology: Coined by Pentaho CEO James Dixon in 2010.
Etymology templates: {{coin|en|James Dixon|in=2010|nobycat=1|occ=Pentaho CEO|w=-}} Coined by Pentaho CEO James Dixon in 2010
Head templates: {{en-noun}} data lake (plural data lakes)
  1. A massive, easily accessible data repository built on inexpensive computer hardware for storing big data.
     Categories (topical): Databases
     Hyponyms: data swamp
     Derived forms: data lakehouse

Download JSON data for data lake meaning in English (3.8kB)

{
  "etymology_templates": [
    {
      "args": {
        "1": "en",
        "2": "James Dixon",
        "in": "2010",
        "nobycat": "1",
        "occ": "Pentaho CEO",
        "w": "-"
      },
      "expansion": "Coined by Pentaho CEO James Dixon in 2010",
      "name": "coin"
    }
  ],
  "etymology_text": "Coined by Pentaho CEO James Dixon in 2010.",
  "forms": [
    {
      "form": "data lakes",
      "tags": [
        "plural"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {},
      "expansion": "data lake (plural data lakes)",
      "name": "en-noun"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "senses": [
    {
      "categories": [
        {
          "kind": "other",
          "name": "English entries with incorrect language header",
          "parents": [
            "Entries with incorrect language header",
            "Entry maintenance"
          ],
          "source": "w"
        },
        {
          "kind": "other",
          "name": "English entries with topic categories using raw markup",
          "parents": [
            "Entries with topic categories using raw markup",
            "Entry maintenance"
          ],
          "source": "w"
        },
        {
          "kind": "other",
          "name": "English terms with non-redundant non-automated sortkeys",
          "parents": [
            "Terms with non-redundant non-automated sortkeys",
            "Entry maintenance"
          ],
          "source": "w"
        },
        {
          "kind": "other",
          "name": "Undetermined quotations with omitted translation",
          "parents": [
            "Quotations with omitted translation",
            "Entry maintenance"
          ],
          "source": "w"
        },
        {
          "kind": "topical",
          "langcode": "en",
          "name": "Databases",
          "orig": "en:Databases",
          "parents": [
            "Computing",
            "Technology",
            "All topics",
            "Fundamental"
          ],
          "source": "w"
        }
      ],
      "derived": [
        {
          "word": "data lakehouse"
        }
      ],
      "examples": [
        {
          "ref": "2010 September 21, Jos van Dongen, Twitter",
          "text": "Data Lake? Cute! RT @mattcasters: Great intro to Pentaho Hadoop int. by Will @wpgorman \"Battlebricks\" Gorman : http://vimeo.com/14641559",
          "type": "quotation"
        },
        {
          "ref": "2010 October 14, James Dixon, “Pentaho, Hadoop, and Data Lakes”, in James Dixon’s Blog",
          "text": "Based on the requirements above and the problems of the traditional solutions we have created a concept called the Data Lake to describe an optimal solution. If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.",
          "type": "quotation"
        },
        {
          "ref": "2013, Juergen Klenk, Yugal Sharma, Jeni Fan, “Saving Lives with Big Data: Unlocking the Hidden Potential in Electronic Health Records”, in Jay Liebowitz, editor, Big Data and Business Analytics, CRC Press, page 122",
          "text": "Perhaps the most transformative aspect of an analytics architecture that incorporates a data lake is that users do not need to have the possible answers in mind when they ask the questions.",
          "type": "quotation"
        },
        {
          "ref": "2013, Soumendra Mohanty, Madhu Jagadeesh, Harsha Srivatsa, Big Data Imperatives […], Apress, page 43",
          "text": "The difference between a data lake and a data warehouse is that in a data warehouse, the data is pre-categorized at the point of entry, which can dictate how it’s going to be analyzed.",
          "type": "quotation"
        },
        {
          "ref": "2014, Pethuru Raj, Ganesh Chandra Deka, editors, Handbook of Research on Cloud Infrastructures for Big Data Analytics, IGI Global, page 105",
          "text": "The data lake, in turn, supports a two-step process to analyze the data.",
          "type": "quotation"
        },
        {
          "ref": "2014 January 14, Edd Dumbill, “The Data Lake Dream”, in Forbes",
          "text": "One phrase in particular has become popular for describing the massing of data into Hadoop, the “Data Lake”, and indeed, this term has been adopted by Pivotal for their enterprise big data strategy.",
          "type": "quotation"
        }
      ],
      "glosses": [
        "A massive, easily accessible data repository built on inexpensive computer hardware for storing big data."
      ],
      "hyponyms": [
        {
          "word": "data swamp"
        }
      ],
      "id": "en-data_lake-en-noun-Xo5nS0N4",
      "links": [
        [
          "big data",
          "big data"
        ]
      ]
    }
  ],
  "word": "data lake"
}
{
  "derived": [
    {
      "word": "data lakehouse"
    }
  ],
  "etymology_templates": [
    {
      "args": {
        "1": "en",
        "2": "James Dixon",
        "in": "2010",
        "nobycat": "1",
        "occ": "Pentaho CEO",
        "w": "-"
      },
      "expansion": "Coined by Pentaho CEO James Dixon in 2010",
      "name": "coin"
    }
  ],
  "etymology_text": "Coined by Pentaho CEO James Dixon in 2010.",
  "forms": [
    {
      "form": "data lakes",
      "tags": [
        "plural"
      ]
    }
  ],
  "head_templates": [
    {
      "args": {},
      "expansion": "data lake (plural data lakes)",
      "name": "en-noun"
    }
  ],
  "lang": "English",
  "lang_code": "en",
  "pos": "noun",
  "senses": [
    {
      "categories": [
        "English coinages",
        "English countable nouns",
        "English entries with incorrect language header",
        "English entries with topic categories using raw markup",
        "English lemmas",
        "English multiword terms",
        "English nouns",
        "English terms with non-redundant non-automated sortkeys",
        "English terms with quotations",
        "Undetermined quotations with omitted translation",
        "Undetermined terms with quotations",
        "en:Databases"
      ],
      "examples": [
        {
          "ref": "2010 September 21, Jos van Dongen, Twitter",
          "text": "Data Lake? Cute! RT @mattcasters: Great intro to Pentaho Hadoop int. by Will @wpgorman \"Battlebricks\" Gorman : http://vimeo.com/14641559",
          "type": "quotation"
        },
        {
          "ref": "2010 October 14, James Dixon, “Pentaho, Hadoop, and Data Lakes”, in James Dixon’s Blog",
          "text": "Based on the requirements above and the problems of the traditional solutions we have created a concept called the Data Lake to describe an optimal solution. If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.",
          "type": "quotation"
        },
        {
          "ref": "2013, Juergen Klenk, Yugal Sharma, Jeni Fan, “Saving Lives with Big Data: Unlocking the Hidden Potential in Electronic Health Records”, in Jay Liebowitz, editor, Big Data and Business Analytics, CRC Press, page 122",
          "text": "Perhaps the most transformative aspect of an analytics architecture that incorporates a data lake is that users do not need to have the possible answers in mind when they ask the questions.",
          "type": "quotation"
        },
        {
          "ref": "2013, Soumendra Mohanty, Madhu Jagadeesh, Harsha Srivatsa, Big Data Imperatives […], Apress, page 43",
          "text": "The difference between a data lake and a data warehouse is that in a data warehouse, the data is pre-categorized at the point of entry, which can dictate how it’s going to be analyzed.",
          "type": "quotation"
        },
        {
          "ref": "2014, Pethuru Raj, Ganesh Chandra Deka, editors, Handbook of Research on Cloud Infrastructures for Big Data Analytics, IGI Global, page 105",
          "text": "The data lake, in turn, supports a two-step process to analyze the data.",
          "type": "quotation"
        },
        {
          "ref": "2014 January 14, Edd Dumbill, “The Data Lake Dream”, in Forbes",
          "text": "One phrase in particular has become popular for describing the massing of data into Hadoop, the “Data Lake”, and indeed, this term has been adopted by Pivotal for their enterprise big data strategy.",
          "type": "quotation"
        }
      ],
      "glosses": [
        "A massive, easily accessible data repository built on inexpensive computer hardware for storing big data."
      ],
      "hyponyms": [
        {
          "word": "data swamp"
        }
      ],
      "links": [
        [
          "big data",
          "big data"
        ]
      ]
    }
  ],
  "word": "data lake"
}

This page is a part of the kaikki.org machine-readable English dictionary. This dictionary is based on structured data extracted on 2024-05-03 from the enwiktionary dump dated 2024-05-02 using wiktextract (f4fd8c9 and c9440ce). The data shown on this site has been post-processed and various details (e.g., extra categories) removed, some information disambiguated, and additional data merged from other sources. See the raw data download page for the unprocessed wiktextract data.
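
As a rough illustration of how a record like the ones above might be consumed, the following Python sketch parses an abbreviated copy of the entry and pulls out the headword, plural forms, glosses, and related terms. The `RECORD` string reproduces only a subset of the keys shown above, and the `summarize` helper is a hypothetical name introduced for this example; it is not part of wiktextract or kaikki.org. Because `derived` appears inside the sense in one of the records above and at the top level in the other, the sketch checks both places.

```python
import json

# Abbreviated record using the same field names as the JSON shown above.
# Only a subset of keys is reproduced; this is illustrative, not the full entry.
RECORD = """
{
  "word": "data lake",
  "pos": "noun",
  "lang_code": "en",
  "forms": [{"form": "data lakes", "tags": ["plural"]}],
  "senses": [
    {
      "glosses": ["A massive, easily accessible data repository built on inexpensive computer hardware for storing big data."],
      "hyponyms": [{"word": "data swamp"}],
      "derived": [{"word": "data lakehouse"}]
    }
  ]
}
"""


def summarize(entry: dict) -> dict:
    """Collect headword, plural forms, glosses, and related terms from one entry.

    Hypothetical helper for this example only.
    """
    senses = entry.get("senses", [])
    return {
        "word": entry["word"],
        "pos": entry.get("pos"),
        # Keep only forms tagged as plural.
        "plurals": [
            f["form"] for f in entry.get("forms", []) if "plural" in f.get("tags", [])
        ],
        "glosses": [g for s in senses for g in s.get("glosses", [])],
        "hyponyms": [h["word"] for s in senses for h in s.get("hyponyms", [])],
        # "derived" may sit inside a sense or at the top level of the entry.
        "derived": [d["word"] for s in senses for d in s.get("derived", [])]
        + [d["word"] for d in entry.get("derived", [])],
    }


summary = summarize(json.loads(RECORD))
print(summary["word"], summary["plurals"], summary["hyponyms"], summary["derived"])
```

The same function could be applied to each object in a larger download by parsing one JSON object at a time, assuming the download stores one object per line.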

If you use this data in academic research, please cite Tatu Ylonen: Wiktextract: Wiktionary as Machine-Readable Structured Data, Proceedings of the 13th Conference on Language Resources and Evaluation (LREC), pp. 1317-1325, Marseille, 20-25 June 2022. Linking to the relevant page(s) under https://kaikki.org would also be greatly appreciated.