What is a Data Lake? Understanding the Buzzword

lomidrevo · Jan 3, 2021

I think the basic idea is quite clear, as defined for example by Wikipedia:

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
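To make that definition concrete, here is a minimal sketch, assuming a local directory stands in for the object store. The layout (raw/ and transformed/), the function names and the sample payloads are all hypothetical, chosen only for illustration; this is not how any particular product does it. It shows raw copies kept byte-for-byte in their original formats, with a transformed dataset stored next to them.

```python
import csv
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a payload byte-for-byte under raw/<source>/, with no parsing."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def build_report(lake_root: Path) -> Path:
    """Derive a transformed dataset from a raw file and store it in the same lake."""
    # Structure is imposed only here, when the raw CSV is read for reporting.
    raw_csv = lake_root / "raw" / "shop" / "orders.csv"
    orders = list(csv.DictReader(raw_csv.read_text().splitlines()))
    total = sum(float(row["amount"]) for row in orders)
    out = lake_root / "transformed" / "order_summary.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps({"orders": len(orders), "total": total}))
    return out


def demo() -> dict:
    """Ingest structured, semi-structured and binary samples, then summarize."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)  # hypothetical stand-in for a real object store
        ingest_raw(lake, "shop", "orders.csv", b"id,amount\n1,9.50\n2,20.25\n")
        ingest_raw(lake, "sensors", "reading.json", b'{"temp_c": 21.4}')
        ingest_raw(lake, "camera", "frame.bin", bytes([0, 255, 16, 32]))
        report = build_report(lake)
        files = sorted(
            p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file()
        )
        return {"files": files, "summary": json.loads(report.read_text())}


if __name__ == "__main__":
    print(demo())
```

Nothing is parsed or validated at ingest time; the CSV is only interpreted when the report is built, which is roughly what "stored in its natural/raw format" implies.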

But when I search for more about this "technology", I find quite varied ideas about what counts as a data lake. Some of them:

just a synonym for the ETL approach to data processing
a distributed file system, like Apache Hadoop HDFS
a NoSQL database with additional SQL support, like for example MongoDB
or some proprietary architecture involving all of that and maybe some extra tools, like reporting, visualization and maybe machine learning?

How do you understand the term data lake? Is it just a buzzword?

pbuk · Jan 3, 2021

lomidrevo said:

Is it just a buzzword?

Yes. It can mean whatever the author wants it to mean.

lomidrevo · Jan 3, 2021

pbuk said:

Yes. It can mean whatever the author wants it to mean.

That is my current impression, thanks :)

sysprog · Jan 3, 2021

Maybe the 'data lake' is the 'reservoir' that engenders and sustains the 'cloud'. I think that such metaphors are used for enablement of non-rigorous semblances of understanding. I have encountered use of such fanciful terms much more by marketers than by engineers.


