
Monday, May 18, 2020

Technological Benefits of Data Lakes






Data is a core business asset for every organisation, and it must be audited and protected. It can take any form: structured, semi-structured or unstructured. A data lake handles all of these kinds of data as a centralized repository that stores the data as-is (relational data from line-of-business applications, and non-relational data from mobile apps, IoT devices and social media).

The types of raw data that are stored in a data lake can include:

  • Audio, images and video
  • Communications (blogs, emails, social media, click-streams)
  • Operational data (inventory, sales, tickets, tourism)
  • Machine-generated data (log files, IoT sensor readings)
Most importantly, data lakes are specifically designed to run large-scale analytics workloads cost-effectively. Within a data lake, the necessary data is made available to employees at all levels, irrespective of their designation.

All-around availability of data: This is the biggest advantage of implementing a data lake in any organisation, because it ensures that all employees, irrespective of their designation and role, can access the data. This is known as data democratization.
Fetches quality data: A data lake implementation supports many tools and technologies that provide substantial processing power for extracting quality data, such as:

Real-time decision analysis: Data lakes can combine large quantities of consistent data with deep learning algorithms to support real-time decision analytics, with the help of many supporting languages.
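Real-time analytics over a data lake is often fed by streaming or micro-batch ingestion. The following is a minimal Python sketch, assuming a hypothetical stream of `(sensor_id, reading)` events and an arbitrary alert threshold; it groups events into micro-batches and recomputes a running per-sensor average after each batch, so a decision can be taken as data arrives. It is not tied to any particular streaming engine.

```python
from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical event shape: (sensor_id, reading)
Event = tuple[str, float]


def micro_batches(events: Iterable[Event], size: int) -> Iterator[list[Event]]:
    """Yield consecutive micro-batches of at most `size` events."""
    it = iter(events)
    while batch := list(islice(it, size)):
        yield batch


def run(events: Iterable[Event], batch_size: int, threshold: float) -> list[tuple[int, str]]:
    """Keep running per-sensor averages; return (batch_no, sensor_id) alerts."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    alerts = []
    for batch_no, batch in enumerate(micro_batches(events, batch_size)):
        for sensor_id, reading in batch:
            totals[sensor_id] += reading
            counts[sensor_id] += 1
        # Decision step: after each micro-batch, flag every sensor whose
        # running average is above the (hypothetical) threshold
        for sensor_id in sorted(totals):
            if totals[sensor_id] / counts[sensor_id] > threshold:
                alerts.append((batch_no, sensor_id))
    return alerts


events = [("s1", 10.0), ("s2", 50.0), ("s1", 12.0), ("s2", 70.0), ("s1", 90.0), ("s1", 95.0)]
print(run(events, batch_size=2, threshold=40.0))
# [(0, 's2'), (1, 's2'), (2, 's1'), (2, 's2')]
```

Smaller batches lower the delay before a decision at the cost of more frequent recomputation; a production system would typically delegate this to a stream processor.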




Support for SQL and other languages: Conventional data warehouse technologies support SQL, which is good enough for simple analytics. For advanced analytics, data lakes also offer tools such as Pig, Hive, Impala and Tachyon (since renamed Alluxio), and Spark MLlib is available for machine learning.
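To illustrate how plain SQL covers simple analytics, here is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in query engine (an assumption for illustration; engines such as Hive or Impala would run similar SQL directly over files in the lake). The raw JSON records and the `sales` table name are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw sales events, as they might land in the lake (JSON lines)
RAW = """\
{"region": "north", "amount": 120.0}
{"region": "south", "amount": 80.5}
{"region": "north", "amount": 30.0}
"""


def sales_by_region(raw: str) -> list[tuple[str, float]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [json.loads(line) for line in raw.splitlines() if line.strip()]
    # Named placeholders are filled from each parsed JSON dict
    conn.executemany("INSERT INTO sales VALUES (:region, :amount)", rows)
    # Simple aggregate analytics expressed in plain SQL
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


print(sales_by_region(RAW))  # [('north', 150.0), ('south', 80.5)]
```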




Operational analytics monitoring: Data lakes offer many benefits to companies, data managers and data processors. Searching, exploring, filtering, aggregating and visualizing business data in near real time for application monitoring, log analytics and click-stream analytics are straightforward tasks in a data lake. Just as a Twitter user decides whom to connect with, a data lake user can choose the data required to meet different business objectives.
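As a small illustration of log analytics over raw data, the following Python sketch filters and aggregates log lines kept in their original text form. The log layout (`<timestamp> <level> <service> <message>`) and the sample lines are hypothetical; a real lake would hold far more data and use a distributed engine, but the filter-then-aggregate idea is the same.

```python
from collections import Counter

# Hypothetical raw log format: "<timestamp> <level> <service> <message...>"
LOG_LINES = [
    "2020-05-18T10:00:01 INFO  checkout request served",
    "2020-05-18T10:00:02 ERROR payments gateway timeout",
    "2020-05-18T10:00:03 ERROR payments card declined",
    "2020-05-18T10:00:04 WARN  search slow query",
    "2020-05-18T10:00:05 ERROR checkout cart not found",
]


def errors_per_service(lines):
    """Filter ERROR lines and count them per service."""
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue  # skip malformed lines instead of failing the whole job
        _ts, level, service = parts[:3]
        if level == "ERROR":
            counts[service] += 1
    return counts


print(errors_per_service(LOG_LINES).most_common())  # [('payments', 2), ('checkout', 1)]
```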


Scalability, versatility and schema flexibility: This is another major advantage of a data lake. Data volumes are growing exponentially, and unlike a traditional data warehouse, a data lake scales easily and is inexpensive as well. Cloud platforms (AWS, Azure, Google Cloud, etc.) now offer features such as auto-scaling and integration that help reduce the cost of compute usage. A data lake can store versatile data such as XML, logs, multimedia, sensor data, chat, social data, binary data and people data from diverse sources. A Hadoop data lake lets us work schema-free, or define multiple schemas for the same data. It also keeps the schema separate from the data, which is good for analytics.
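The idea of multiple schemas over the same data can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical raw JSON records whose fields vary from record to record; two independent schemas (`UserActivity` and `SensorReading`, both invented here) are applied only when the data is read, and the stored records are never modified.

```python
import json
from dataclasses import dataclass

# Raw records stored as-is; fields vary between records (hypothetical data)
RAW_RECORDS = [
    '{"user": "ana", "device": "ios", "ts": 1589790000, "temp_c": 21.5}',
    '{"user": "raj", "device": "android", "ts": 1589790060}',
    '{"user": "ana", "device": "ios", "ts": 1589790120, "temp_c": 22.0}',
]


@dataclass
class UserActivity:  # schema 1: who was active, on which device, when
    user: str
    device: str
    ts: int


@dataclass
class SensorReading:  # schema 2: only records that carry a temperature
    ts: int
    temp_c: float


def read_as_activity(raw):
    for line in raw:
        rec = json.loads(line)
        yield UserActivity(rec["user"], rec["device"], rec["ts"])


def read_as_sensor(raw):
    for line in raw:
        rec = json.loads(line)
        if "temp_c" in rec:  # records without the field simply don't fit this schema
            yield SensorReading(rec["ts"], float(rec["temp_c"]))


print(list(read_as_sensor(RAW_RECORDS)))
# [SensorReading(ts=1589790000, temp_c=21.5), SensorReading(ts=1589790120, temp_c=22.0)]
```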

Tuesday, July 17, 2018

Data Lake Vs Data Warehouse


Data is a business asset for any organisation, and it must always be kept secure and accessible to business users whenever it is required.
In the current era, two approaches are very popular for storing data for business insights. Here we differentiate them on a number of technical characteristics.

The first is the data warehouse, a highly structured data store that requires a significant amount of discovery, planning, data modeling and development work before the data becomes available to business users for analysis.

The second is the data lake, a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured and unstructured data. The data structure and requirements are not defined until the data is needed. We can say that a data lake is a more organic store of data, kept without regard for its perceived value or structure.

Data lakes are a big opportunity to store large amounts of data affordably, without having to decide upfront how it must be structured and used. They are typically used to complement traditional data warehouses, which remain better suited to highly trusted, tightly governed data such as financial figures, although the two repositories overlap to some extent.

Data warehouses compared to data lakes: Depending on the business requirements, a typical organization will need both a data warehouse and a data lake, as they serve different needs and use cases.
| Characteristics | Data Warehouse | Data Lake |
|---|---|---|
| Type of data stored | Structured data (most often in columns and rows in a relational database) from transactional systems, operational databases and line-of-business applications | Any data structure in any format, including structured, semi-structured and unstructured data from IoT devices, websites, mobile apps, social media and corporate applications |
| Best way to ingest data | Batch processes | Streaming, micro-batch or batch processes |
| Schema | Designed before the DW implementation (schema-on-write) | Structure of the data defined at the time of analysis (schema-on-read) |
| Typical load pattern | ETL (Extract, Transform, then Load) | ELT (Extract, Load, then Transform after loading, typically when the data is used) |
| Price/performance | Fastest query results, using higher-cost storage | Query results getting faster, using low-cost storage |
| Data quality | Highly curated data that serves as the central version of the truth | Any data, which may or may not be curated (i.e. raw data) |
| Users | Business analysts | Data scientists, data developers and business analysts (using curated data) |
| Analytics pattern | Determine structure, acquire data, then analyze it; iterate back to change the structure as needed. Typical uses: batch reporting, BI and visualizations | Acquire data, analyze it, then iterate to determine its final structured form. Typical uses: machine learning, predictive analytics, data discovery and profiling |
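The schema-on-write (ETL) versus schema-on-read (ELT) rows of the comparison can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions: the order records are hypothetical, a Python list stands in for both the warehouse table and the lake storage, and the helper names `etl_load`, `elt_load` and `total_amount_on_read` are invented for this example.

```python
import json

# Hypothetical incoming records; the third does not match the expected shape
INCOMING = [
    '{"order_id": 1, "amount": "19.99"}',
    '{"order_id": 2, "amount": "5.00"}',
    '{"order_id": "x3", "note": "free sample"}',
]


def etl_load(lines):
    """Schema-on-write: transform and validate first, load only conforming rows."""
    warehouse, rejected = [], []
    for line in lines:
        rec = json.loads(line)
        try:
            warehouse.append({"order_id": int(rec["order_id"]), "amount": float(rec["amount"])})
        except (KeyError, TypeError, ValueError):
            rejected.append(line)  # non-conforming data never reaches the warehouse
    return warehouse, rejected


def elt_load(lines):
    """Schema-on-read: load raw lines untouched; structure is applied later."""
    return list(lines)


def total_amount_on_read(lake):
    """Transform at analysis time, skipping records this query cannot interpret."""
    total = 0.0
    for line in lake:
        rec = json.loads(line)
        if "amount" in rec:
            total += float(rec["amount"])
    return total


warehouse, rejected = etl_load(INCOMING)
lake = elt_load(INCOMING)
print(len(warehouse), len(rejected), len(lake))  # 2 1 3
```

The warehouse path discards the odd record up front, while the lake keeps it so that a later analysis with a different purpose can still use it.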
During the development of a traditional data warehouse, a considerable amount of time is spent analyzing data sources, understanding business processes, profiling data and modeling data.
In contrast, the default expectation for a data lake is to acquire all of the data and retain all of the data.
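"Acquire and retain all of the data" is often implemented as an append-only raw landing zone. The Python sketch below is one possible layout, assuming a local directory in place of cloud object storage and a Hive-style `ingest_date=YYYY-MM-DD` partition folder (a common convention, used here as an assumption); the function name `land_raw` and the source name `web_clicks` are hypothetical. Records are written unchanged and existing files are only ever appended to.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, records: list[dict], ingest_date: date) -> Path:
    """Append records unchanged to <root>/raw/<source>/ingest_date=YYYY-MM-DD/part.jsonl."""
    partition = lake_root / "raw" / source / f"ingest_date={ingest_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "part.jsonl"
    # Open in append mode: data already landed is never overwritten
    with target.open("a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    path = land_raw(root, "web_clicks", [{"page": "/home"}], date(2020, 5, 18))
    # A later record with an extra field is kept as-is, with no schema change needed
    land_raw(root, "web_clicks", [{"page": "/cart", "extra": True}], date(2020, 5, 18))
    print(path.relative_to(root))
    print(path.read_text(encoding="utf-8").splitlines())
```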
Please visit us to learn more about:
  1. Collaboration of OLTP and OLAP systems
  2. Major differences between OLTP and OLAP
  3. Data Warehouse - Introduction
  4. Data Warehouse - Multidimensional Cube
  5. Data Warehouse - Multidimensional Cube Types
  6. Data Warehouse - Architecture and Multidimensional Model
  7. Data Warehouse - Dimension tables
  8. Data Warehouse - Fact tables
  9. Data Warehouse - Conceptual Modeling
  10. Data Warehouse - Star schema
  11. Data Warehouse - Snowflake schema
  12. Data Warehouse - Fact constellations
  13. Data Warehouse - OLAP Servers
  14. Preparation for a successful Data Lake in the cloud
  15. Why does cloud make Data Lakes Better?