Emerging Big Data and Cloud Computing
There are, however, certain basic tenets of Big Data that make it even
simpler to answer the question of what Big Data is:
GROUP 2
AMIT
2
Around 2005, people began to realize just how much data users generated
through Facebook, YouTube, and other online services. Hadoop (an open-
source framework created specifically to store and analyze big data sets)
was developed that same year. NoSQL also began to gain popularity during
this time.
With the advent of the Internet of Things (IoT), more objects and devices are
connected to the internet, gathering data on customer usage patterns and
product performance. The emergence of machine learning has produced still
more data.
While big data has come far, its usefulness is only just beginning. Cloud
computing has expanded big data possibilities even further. The cloud offers
truly elastic scalability, where developers can simply spin up ad hoc clusters
to test a subset of data.
Big data makes it possible for you to gain more complete answers
because you have more information.
More complete answers mean more confidence in the data, which in turn
means a completely different approach to tackling problems.
Now that we know what big data is, let’s look at the types of big data:
a) Structured
Structured data is data that can be processed, stored, and retrieved in a
fixed format. It refers to highly organized information that can be readily
and seamlessly stored in, and accessed from, a database by simple search
engine algorithms. For instance, the employee table in a company database
is structured: the employee details, job positions, salaries, etc., are all
present in an organized manner.
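To make "fixed format" concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in database. The table name, columns, and rows are hypothetical; the point is only that every row shares one schema, so a simple query can pull out exactly the fields needed.

```python
import sqlite3

# Hypothetical employee table: every row follows the same fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, position TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, position, salary) VALUES (?, ?, ?)",
    [
        ("Asha", "Engineer", 72000.0),
        ("Ravi", "Analyst", 58000.0),
        ("Meera", "Engineer", 80000.0),
    ],
)


def engineers_by_salary(connection):
    # Because the format is fixed, one query retrieves exactly the fields we want.
    rows = connection.execute(
        "SELECT name, salary FROM employee WHERE position = ? ORDER BY salary DESC",
        ("Engineer",),
    )
    return rows.fetchall()


print(engineers_by_salary(conn))  # [('Meera', 80000.0), ('Asha', 72000.0)]
```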
b) Unstructured
Unstructured data refers to the data that lacks any specific form or structure
whatsoever. This makes it very difficult and time-consuming to process and
analyze unstructured data. Email is an example of unstructured data.
Structured and unstructured are two important types of big data.
c) Semi-structured
Semi-structured data is the third type of big data. It contains both of the
formats mentioned above, that is, structured and unstructured data. To be
precise, it refers to data that, although it has not been classified under a
particular repository (database), still contains vital information or tags that
separate individual elements within the data.
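JSON is a common example of the "tags that separate individual elements" described above. The sketch below uses Python's standard `json` module on two hypothetical email-like records: the keys act as tags, yet the records do not share a fixed schema (only one has an `attachments` field), so the code must tolerate missing elements.

```python
import json

# Hypothetical semi-structured records: keys tag each element,
# but the records do not all have the same fields.
raw = """
[
  {"from": "a@example.com", "subject": "Order 1042", "body": "Please ship today."},
  {"from": "b@example.com", "subject": "Hello", "attachments": ["invoice.pdf"]}
]
"""

records = json.loads(raw)

# The tags let us pull out individual elements without a fixed schema.
subjects = [r["subject"] for r in records]

# A field missing from some records is handled with a default instead of failing.
attachment_counts = [len(r.get("attachments", [])) for r in records]

print(subjects)           # ['Order 1042', 'Hello']
print(attachment_counts)  # [0, 1]
```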
This concludes the types of big data.
The following characteristics, taken on their own, are enough to tell what
big data is. Let’s look at them in depth:
a) Variety
b) Velocity
Velocity refers to the speed at which data is being created in real time.
In a broader perspective, it comprises the rate of change, the linking of
incoming data sets arriving at varying speeds, and activity bursts.
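One simple way to put a number on velocity is to count how many events arrived within a recent time window. The sketch below is a hypothetical `RateMeter` class (not from any library) that keeps a sliding window of millisecond timestamps; it assumes events are recorded in arrival order, and a burst shows up as a jump in the rate.

```python
from collections import deque


class RateMeter:
    """Counts events seen in the last `window_ms` milliseconds (a simple velocity measure)."""

    def __init__(self, window_ms: int = 1000):
        self.window_ms = window_ms
        self.timestamps = deque()  # event times in ms, oldest first

    def _evict(self, now_ms: int) -> None:
        # Drop events that are older than the window.
        while self.timestamps and self.timestamps[0] <= now_ms - self.window_ms:
            self.timestamps.popleft()

    def record(self, t_ms: int) -> None:
        # Assumes events arrive in time order.
        self.timestamps.append(t_ms)
        self._evict(t_ms)

    def rate_per_second(self, now_ms: int) -> float:
        self._evict(now_ms)
        return len(self.timestamps) * 1000 / self.window_ms


meter = RateMeter(window_ms=1000)
for t in [0, 100, 200, 900, 1050, 1100]:  # hypothetical arrival times
    meter.record(t)
print(meter.rate_per_second(1100))  # 4.0
```

Integer milliseconds are used instead of float seconds so that the window boundary comparison is exact.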
c) Volume
Volume is one of the characteristics of big data. As we already know, Big
Data indicates huge ‘volumes’ of data being generated daily from various
sources such as social media platforms, business processes, machines,
networks, and human interactions. Such large amounts of data are stored in
data warehouses. This concludes the characteristics of big data.
The importance of big data does not revolve around how much data a
company has but how a company utilizes the collected data. Every company
uses data in its own way; the more efficiently a company uses its data, the
more potential it has to grow. The company can take data from any source
and analyze it to find answers which will enable:
1. Cost Savings: Big Data tools such as Hadoop and cloud-based analytics
can bring cost advantages to a business when large amounts of data must
be stored, and they also help identify more efficient ways of doing
business.
2. Time Reductions: The high speed of tools like Hadoop and in-memory
analytics makes it easy to identify new sources of data, which helps
businesses analyze data immediately and make quick decisions based on
what they learn.
3. Understand market conditions: By analyzing big data, you can get a
better understanding of current market conditions. For example, by
analyzing customers’ purchasing behaviors, a company can find out which
products sell the most and produce products according to this trend. This
lets it get ahead of its competitors.
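Hadoop processes large data sets with the MapReduce model: a map step emits key/value pairs from each chunk of input, a shuffle groups pairs by key, and a reduce step combines each group. The sketch below imitates those three phases in a single Python process to find the best-selling products from item 3. It is only an illustration of the model, not Hadoop's API, and the purchase data is hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical purchase logs, split into partitions the way a cluster splits input.
partitions = [
    [("tea", 2), ("coffee", 1), ("tea", 1)],
    [("biscuits", 3), ("coffee", 4)],
    [("tea", 1), ("biscuits", 1)],
]


def map_phase(partition):
    # Emit (key, value) pairs: here, (product, quantity sold).
    for product, qty in partition:
        yield product, qty


def shuffle(pairs):
    # Group all values that share a key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Combine each key's values into a single total.
    return {key: sum(values) for key, values in groups.items()}


def top_products(parts, n):
    mapped = chain.from_iterable(map_phase(p) for p in parts)
    totals = reduce_phase(shuffle(mapped))
    # Highest quantity first; ties broken alphabetically for a stable result.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


print(top_products(partitions, 2))  # [('coffee', 5), ('biscuits', 4)]
```

In a real cluster the map calls would run in parallel on the machines holding each partition; here they simply run one after another.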
The customer is the most important asset any business depends on. There is
no single business that can claim success without first having to establish a
solid customer base. However, even with a customer base, a business cannot
afford to disregard the high competition it faces. If a business is slow to learn
what customers are looking for, then it is very easy to begin offering poor
quality products. In the end, loss of clientele will result, and this creates an
adverse overall effect on business success. The use of big data allows
businesses to observe various customer-related patterns and trends.
Observing customer behavior is important for building loyalty.
Insights
Big data analytics can help change all business operations. This includes the
ability to
Another huge advantage of big data is the ability to help companies innovate
and redevelop their products.
2. CLOUD COMPUTING
Introduction
Cloud computing is a type of computing that relies on shared computing
resources rather than on local servers or personal devices to handle
applications.
A space where you can store and process your data and access it from
anywhere in the world.
Used as a metaphor for the internet.
Cloud computing is:
Service: This term in cloud computing is the concept of being able to use
reusable, fine-grained components across a vendor’s network.
According to the NIST, all true cloud environments have five key
characteristics:
5. Measured service: Customers pay for the amount of resources they use
in a given period of time rather than paying for hardware or software upfront.
(Note that in a private cloud, this measured service usually involves some
form of chargeback, where IT keeps track of how many resources different
departments within an organization are using.)
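The measured-service idea reduces to simple arithmetic: multiply what each consumer used by a unit rate and sum. The sketch below applies that to the private-cloud chargeback case. The resource names, rates, and usage figures are all hypothetical; real providers publish their own pricing units.

```python
# Hypothetical unit prices: per VM-hour and per GB-month of storage.
RATES = {"vm_hours": 0.05, "storage_gb_months": 0.02}

# Hypothetical usage recorded for each department over one billing period.
usage = {
    "marketing": {"vm_hours": 300, "storage_gb_months": 500},
    "finance": {"vm_hours": 120, "storage_gb_months": 50},
}


def chargeback(usage_by_dept, rates):
    """Bill each department only for the resources it consumed in the period."""
    bills = {}
    for dept, consumed in usage_by_dept.items():
        total = sum(qty * rates[resource] for resource, qty in consumed.items())
        bills[dept] = round(total, 2)  # round to cents
    return bills


print(chargeback(usage, RATES))  # {'marketing': 25.0, 'finance': 7.0}
```

For marketing this is 300 × 0.05 + 500 × 0.02 = 15 + 10 = 25; nothing is charged for capacity a department did not use.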
2.2 Applications:
i) Storage: The cloud keeps multiple copies of stored data. If any one of
these copies fails, it retrieves the data from another copy.
ii) Database: Databases are repositories for information, with links within
the information that help make the data searchable.
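The storage behaviour in item i) is replication: write every object to several copies and, on read, fall back to another copy when one is unavailable. Below is a minimal in-memory sketch; `ReplicatedStore` is a hypothetical class, with plain dictionaries standing in for real storage nodes and a `failed` set used to simulate outages.

```python
class ReplicatedStore:
    """Keeps a copy of every object on several replicas; reads fall back on failure."""

    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]
        self.failed = set()  # indexes of replicas currently unavailable

    def put(self, key, value):
        # Write the same value to every healthy replica.
        for i, replica in enumerate(self.replicas):
            if i not in self.failed:
                replica[key] = value

    def get(self, key):
        # Try replicas in order, skipping ones that are down or lack the key.
        for i, replica in enumerate(self.replicas):
            if i not in self.failed and key in replica:
                return replica[key]
        raise KeyError(key)


store = ReplicatedStore()
store.put("report.csv", b"q1,q2\n10,12\n")
store.failed.add(0)  # simulate the first copy failing
print(store.get("report.csv"))  # still served from replica 1
```

A production system would also re-copy data onto a replacement node after a failure; that recovery step is omitted here.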
Advantages:
ii. Improved performance: Data is located near the site with the greatest
demand and the database systems are parallelized, which allows the load to
be balanced among the servers.
iv. Flexibility: Systems can be changed and modified without harm to the
entire database.
Disadvantages:
ii. Labour costs: With that added complexity comes the need for more
workers on the payroll.
iii. Security: Database fragments must be secured, and so must the sites
housing the fragments.
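The "Improved performance" advantage rests on spreading queries across parallel database servers. One common balancing rule, used here as an illustrative choice rather than the method any particular database uses, is to send each request to whichever server currently has the least work. The sketch tracks load with Python's `heapq`; server names and request costs are hypothetical.

```python
import heapq


def balance(requests, servers):
    """Assign each (request_id, cost) to the currently least-loaded server."""
    heap = [(0, name) for name in servers]  # (current load, server name)
    heapq.heapify(heap)
    assignment = {name: [] for name in servers}
    for req_id, cost in requests:
        load, name = heapq.heappop(heap)  # least-loaded server; ties go to the smaller name
        assignment[name].append(req_id)
        heapq.heappush(heap, (load + cost, name))
    return assignment


requests = [("q1", 5), ("q2", 3), ("q3", 2), ("q4", 4)]
print(balance(requests, ["db-a", "db-b"]))
# {'db-a': ['q1', 'q4'], 'db-b': ['q2', 'q3']}
```

Final loads are 9 and 5; a simple round-robin would give the same split here but ignores cost, so least-loaded tends to adapt better when query sizes vary.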
Ex: Google Docs
Database services (DaaS): DaaS avoids the complexity and cost of running
your own database.
Benefits:
i. Ease of use: You don’t have to worry about buying, installing, and
maintaining hardware for the database, as there are no servers to provision
and no redundant systems to worry about.
ii. Power: The database isn’t housed locally, but that doesn’t mean it is
not functional and effective. Depending on your vendor, you can get custom
data validation to ensure accurate information. You can create and manage
the database with ease.
iii. Integration: The database can be integrated with your other services to
provide more value and power. For instance, you can tie it in with calendars,
email, and people to make your work more powerful.
There is also the advantage of lower labor costs. It is possible that you are
using the service in Chicago, the physical servers are in Washington state,
and the database administrator is in the Philippines.
• Clients
• Data centre
• Distributed servers
i. Clients:
• Clients are the devices that the end users interact with to manage their
information on the cloud.
b. Thin: Thin clients are computers that don’t have internal hard drives; they
display the information but let the server do all the work.
Thin Vs Thick
iv. Security
v. Data Security
• This gives the service provider more flexibility in options and security.
Ex: Amazon has its cloud solution all over the world; if one site failed, the
service could still be accessed through another site.
• If the cloud needs more hardware, they need not throw more servers into
the safe room; they can add
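The Amazon example describes site-level failover: if one site cannot serve a request, the same request is sent to another site. A minimal sketch of that control flow follows. The site names, the `SiteDown` exception, and the handler functions are hypothetical stand-ins for real network calls and health checks.

```python
class SiteDown(Exception):
    """Raised by a (hypothetical) site handler when that site cannot serve requests."""


def serve_with_failover(request, sites):
    """Try each site in preference order and return the first successful response."""
    errors = []
    for name, handler in sites:
        try:
            return name, handler(request)
        except SiteDown as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sites failed: " + "; ".join(errors))


def down_site(request):
    raise SiteDown("unreachable")


def healthy_site(request):
    return f"ok:{request}"


sites = [("site-1", down_site), ("site-2", healthy_site)]
print(serve_with_failover("GET /index", sites))  # ('site-2', 'ok:GET /index')
```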
Convenience
Scalability
Low costs
Security
Anytime, anywhere access
High availability
Limitations/Disadvantages:
ii. Design services with high availability and disaster recovery in mind.
Leverage the multiple availability zones provided by cloud vendors in your
infrastructure.
iii. If your services have a low tolerance for failure, consider multi-region
deployments with automated failover to ensure the best business continuity
possible.
iv. Define and implement a disaster recovery plan in line with your business
objectives that provides the lowest possible recovery time objective (RTO)
and recovery point objective (RPO).
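RTO bounds how long recovery may take; RPO bounds how much recent data may be lost. A quick check of a plan against both can be done with simple arithmetic. The sketch below is a simplification that assumes worst-case data loss equals one full interval between backups (failure just before the next backup) and ignores how long the backup itself takes; the plan figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    backup_interval_min: float  # time between consecutive backups or replication points
    restore_time_min: float     # time to bring the service back from the last backup


def meets_objectives(plan: RecoveryPlan, rpo_min: float, rto_min: float) -> dict:
    # Worst case: failure strikes just before the next backup, so up to one
    # full interval of writes is lost.
    worst_data_loss = plan.backup_interval_min
    return {
        "worst_case_data_loss_min": worst_data_loss,
        "meets_rpo": worst_data_loss <= rpo_min,
        "meets_rto": plan.restore_time_min <= rto_min,
    }


print(meets_objectives(RecoveryPlan(60, 30), rpo_min=15, rto_min=60))
# {'worst_case_data_loss_min': 60, 'meets_rpo': False, 'meets_rto': True}
```

Hourly backups cannot meet a 15-minute RPO even if restores are fast; shortening the interval (or replicating continuously) is what lowers RPO.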
b) Security and Privacy: Consider Code Spaces and the hacking of its AWS
EC2 console, which led to data deletion and the eventual shutdown of the
company. Its dependence on remote cloud-based infrastructure meant
taking on the risks of outsourcing everything.
c) Vulnerability to Attack: Even the best teams suffer severe attacks and
security breaches from time to time.
f) Cost Savings: Adopting cloud solutions on a small scale and for
short-term projects can be perceived as expensive.
a) Hypervisor
b) Management Software
c) Deployment Software
d) Network
e) Server
The server helps compute resource sharing and offers other services such
as resource allocation and de-allocation, resource monitoring, and security.
f) Storage
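The server's role described under e) (allocating and de-allocating resources and monitoring them) can be pictured as managing a shared pool. The sketch below is a hypothetical `ResourcePool` that hands out CPU cores to tenants, refuses requests it cannot satisfy, reclaims cores on de-allocation, and reports a usage snapshot. Real cloud platforms do this through their own management software, whose interfaces are not shown here.

```python
class ResourcePool:
    """Hypothetical server-side pool that hands out CPU cores to tenants."""

    def __init__(self, total_cores: int):
        self.total = total_cores
        self.allocations = {}  # tenant -> cores currently held

    def free(self) -> int:
        return self.total - sum(self.allocations.values())

    def allocate(self, tenant: str, cores: int) -> bool:
        # Refuse invalid requests or ones the pool cannot satisfy.
        if cores <= 0 or cores > self.free():
            return False
        self.allocations[tenant] = self.allocations.get(tenant, 0) + cores
        return True

    def deallocate(self, tenant: str) -> int:
        # Return everything the tenant held to the pool.
        return self.allocations.pop(tenant, 0)

    def usage(self) -> dict:
        # Simple monitoring snapshot.
        return {"used": self.total - self.free(), "free": self.free(),
                "by_tenant": dict(self.allocations)}


pool = ResourcePool(16)
pool.allocate("team-a", 8)
pool.allocate("team-b", 6)
print(pool.allocate("team-c", 4))  # False: only 2 cores left
pool.deallocate("team-a")
print(pool.usage())  # {'used': 6, 'free': 10, 'by_tenant': {'team-b': 6}}
```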
THE END