Zookeeper

Uploaded by Mayank Devani
Copyright © All Rights Reserved

Store function

A function that specifies how to save the contents of a relation to external storage. Frequently, load and store functions are implemented by the same type. For example, PigStorage, which is used for loading data from delimited text files, can store data in the same format.
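Here is a minimal Python sketch of one type acting as both the load function and the store function. `DelimitedStorage` is a hypothetical class and not Pig's API: the real PigStorage is a Java class used from Pig Latin. The tab delimiter mirrors PigStorage's default of tab-separated fields.

```python
import io


class DelimitedStorage:
    """Hypothetical type that acts as both a load function and a store function."""

    def __init__(self, delimiter="\t"):
        # Tab-separated fields by default, mirroring PigStorage's default.
        self.delimiter = delimiter

    def load(self, stream):
        # Load: turn each non-blank delimited line into a tuple of fields.
        return [tuple(line.rstrip("\n").split(self.delimiter))
                for line in stream if line.strip()]

    def store(self, relation, stream):
        # Store: write each tuple back out in the same delimited format.
        for row in relation:
            stream.write(self.delimiter.join(str(field) for field in row) + "\n")


storage = DelimitedStorage()
relation = storage.load(io.StringIO("alice\t30\nbob\t25\n"))
out = io.StringIO()
storage.store(relation, out)
print(relation)                                   # [('alice', '30'), ('bob', '25')]
print(out.getvalue() == "alice\t30\nbob\t25\n")  # True
```

Because the same object defines both directions, data that is stored can be loaded again without any format conversion.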
3.3 Zookeeper
Zookeeper is a core component of distributed systems. When designing a distributed system, the following coordination services need to be designed and developed:

1. Name service

It is a service which maps a name to the data associated with that name.

E.g. A telephone directory is a name service which maps the name of a user to their phone number.
Similarly, DNS is a name service which maps a domain name to an IP address.
Zookeeper can be used to keep track of the services that are running and to search for them by name.
A name service can be extended to a group membership service, which lets you retrieve the data of the group members related to the name being searched.
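This is a rough illustration of how a name service and group membership map onto Zookeeper's tree of znodes. A name is a path, the data stored at that path is what the name resolves to, and the children of a group node are its members. It is an in-memory stand-in written for this example, not a Zookeeper client. The class `NameService`, the `/services/...` paths and the addresses are all hypothetical. In a real deployment each service would typically register an ephemeral znode, so its entry disappears when its session ends.

```python
class NameService:
    """Hypothetical in-memory stand-in for a Zookeeper-backed name service."""

    def __init__(self):
        # Maps znode paths such as "/services/db" to the data stored at them.
        # Simplification: parent nodes are not required to exist first.
        self.znodes = {}

    def register(self, path, data):
        # Creating a znode binds a name (the path) to some data.
        if path in self.znodes:
            raise KeyError(f"node already exists: {path}")
        self.znodes[path] = data

    def lookup(self, path):
        # Resolving a name returns the data stored under that path.
        return self.znodes[path]

    def members(self, group):
        # Group membership: the direct children of a group node are its members.
        prefix = group.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.znodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])


ns = NameService()
ns.register("/services/db", "10.0.0.5:5432")
ns.register("/services/cache", "10.0.0.7:6379")
print(ns.lookup("/services/db"))  # 10.0.0.5:5432
print(ns.members("/services"))    # ['cache', 'db']
```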
2. Locking
Zookeeper provides an easy way to implement mutexes, which serialize access to the resources shared in a distributed system.
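A Zookeeper mutex is usually built with sequential znodes. Every client creates a sequential child under one lock node. The client whose child has the lowest sequence number holds the lock. Each waiting client watches only the child just before its own. The sketch below simulates that recipe in memory and does not talk to a server. `LockNode` and the `lock-` prefix are hypothetical names. The 10-digit zero-padded counter follows the format Zookeeper uses for sequential node names.

```python
class LockNode:
    """Hypothetical in-memory simulation of the sequential-znode lock recipe."""

    def __init__(self):
        self.counter = 0
        self.children = {}  # child znode name -> client id

    def create_sequential(self, client):
        # Zookeeper appends a monotonically increasing, zero-padded 10-digit counter.
        name = f"lock-{self.counter:010d}"
        self.counter += 1
        self.children[name] = client
        return name

    def holder(self):
        # The client owning the lowest-numbered child holds the lock.
        return self.children[min(self.children)] if self.children else None

    def predecessor(self, name):
        # A waiting client watches only the node just before its own (avoids a herd effect).
        lower = [n for n in self.children if n < name]
        return max(lower) if lower else None

    def release(self, name):
        # Deleting the node (or the session expiring, for ephemeral nodes) frees the lock.
        del self.children[name]


lock = LockNode()
a = lock.create_sequential("client-A")
b = lock.create_sequential("client-B")
print(lock.holder())        # client-A
print(lock.predecessor(b))  # lock-0000000000
lock.release(a)
print(lock.holder())        # client-B
```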
3. Synchronization
Data access across a distributed system must be synchronized, and Zookeeper provides an easy approach for this purpose.
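A barrier is one common synchronization primitive built on Zookeeper. Each participant creates a child node under a barrier node and waits until the number of children reaches an agreed size. The in-process Python sketch below uses a hypothetical `Barrier` class. A set stands in for the barrier node's children, and a `threading.Condition` stands in for a Zookeeper watch. It therefore shows the logic of the recipe, not a networked implementation.

```python
import threading


class Barrier:
    """Hypothetical in-process simulation of the Zookeeper barrier recipe."""

    def __init__(self, size):
        self.size = size
        self.children = set()              # stands in for children of a barrier znode
        self.cond = threading.Condition()  # stands in for a watch on that children list

    def enter(self, name):
        with self.cond:
            self.children.add(name)  # like creating <barrier>/<name>
            self.cond.notify_all()   # like a children-changed watch firing
            # Block until all participants have joined the barrier.
            self.cond.wait_for(lambda: len(self.children) >= self.size)


barrier = Barrier(3)
passed = []
guard = threading.Lock()


def worker(i):
    barrier.enter(f"worker-{i}")
    with guard:
        # Record how many participants had joined when this worker was released.
        passed.append(len(barrier.children))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(passed)  # [3, 3, 3]
```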

4. Configuration management
Zookeeper can be used to store and manage the configuration information of a distributed system centrally. As a result, a node that joins the system receives the updated configuration information from Zookeeper.

Big Data Analytics
HDFS, HIVE and HIVEQL, HBASE
This lets you change the centralized state of the distributed system by changing its configuration through Zookeeper.

5. Leader election
In case a node fails in a distributed system, automatic failover should be implemented. Zookeeper provides support for failover by using leader election.
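Leader election for applications follows the same pattern as locking. Each server volunteers by creating an ephemeral sequential node, and the lowest sequence number wins. When the leader fails, its session expires, its ephemeral node is deleted, and the next server takes over. The simulation below uses a hypothetical `Election` class, runs without a real server, and shows that automatic failover step. This is the application-level recipe. The Zookeeper servers themselves elect their ensemble leader with a separate internal protocol.

```python
class Election:
    """Hypothetical simulation of leader election with ephemeral sequential znodes."""

    def __init__(self):
        self.next_seq = 0
        self.candidates = {}  # sequence number -> server name

    def volunteer(self, server):
        # Each server creates an ephemeral sequential node under an election znode.
        seq = self.next_seq
        self.next_seq += 1
        self.candidates[seq] = server
        return seq

    def leader(self):
        # The server holding the smallest sequence number is the leader.
        return self.candidates[min(self.candidates)] if self.candidates else None

    def session_expired(self, server):
        # When a server fails, its session expires and its ephemeral node is deleted.
        self.candidates = {seq: name for seq, name in self.candidates.items()
                           if name != server}


e = Election()
for s in ["server-1", "server-2", "server-3"]:
    e.volunteer(s)
print(e.leader())              # server-1
e.session_expired("server-1")  # automatic failover
print(e.leader())              # server-2
```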
• Zookeeper is itself a distributed application that provides coordination services for distributed systems.
• Zookeeper uses a client-server model: the nodes that use the service are clients, and the nodes that provide the service are servers.


• A Zookeeper ensemble is formed by a collection of Zookeeper servers. At any time, a client is connected to exactly one Zookeeper server.
• One Zookeeper server can manage any number of client connections at a time.
• The client sends a ping at regular intervals to notify the server that it is alive and connected.
• The Zookeeper server responds by acknowledging the ping, indicating that the server is also alive.
• If the client does not receive the server's acknowledgement within the specified time, it connects to another server in the ensemble. In such a case, the client's session is transferred to the new server.
• Fig. 3.3.1 shows the client-server architecture of ZooKeeper.
[Figure: a Zookeeper service made up of several servers, one of them the leader, each serving one or more clients]
Fig. 3.3.1 Client-server architecture of Zookeeper
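The heartbeat and session-transfer behaviour described above can be sketched as follows. `Server` and `Client` are hypothetical classes. A missing acknowledgement is modelled as `ping` returning `False` rather than as a real timeout. Trying the servers in order is a simplification, since real clients choose among the servers listed in their connection string.

```python
class Server:
    """Hypothetical ensemble member that either acknowledges pings or stays silent."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def ping(self, session_id):
        # A live server acknowledges the ping; a failed one never answers.
        return self.alive


class Client:
    """Hypothetical client that keeps one session while moving between servers."""

    def __init__(self, ensemble, session_id):
        self.ensemble = ensemble
        self.session_id = session_id
        self.index = 0  # connected to exactly one server at a time

    @property
    def server(self):
        return self.ensemble[self.index]

    def heartbeat(self):
        # Ping the current server; without an acknowledgement, move to the next one.
        for _ in range(len(self.ensemble)):
            if self.server.ping(self.session_id):
                return self.server.name
            self.index = (self.index + 1) % len(self.ensemble)
        raise ConnectionError("no server in the ensemble acknowledged the ping")


ensemble = [Server("zk1"), Server("zk2"), Server("zk3")]
client = Client(ensemble, session_id=0x1234)
print(client.heartbeat())  # zk1
ensemble[0].alive = False
print(client.heartbeat())  # zk2 (same session id, new server)
```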
3.3.1 Monitoring Cluster
The Zookeeper services in a cluster can be monitored using the "four letter words".


Zookeeper responds to a small set of commands, each made up of four letters. These commands are issued via telnet or nc to Zookeeper at the client port. Following are a few of the commands used to monitor the cluster:
1) stat

Provides general information about the server and connected clients.


2) srvr

Provides full details for the server.


3) cons

Provides complete connection details for all clients connected to this server. The details include the session id, the last operation performed, the number of packets sent or received, etc.
4) conf
Displays details about the serving configuration.
5) crst
Resets connection/session statistics for all connections.
6) dump
This command only works on the leader node. It lists the outstanding sessions and ephemeral nodes.

7) envi

Displays details of the serving environment.


8) ruok
This command is issued by a connected client to check whether the server is running in a non-error state. A healthy server responds with imok; otherwise it does not respond at all.
9) imok
This is not a command sent to the server but the reply the server sends to ruok when it is running.
10) srst
Reset server statistics.

11) wchs
Lists details on watches for the server.
12) wchc
Displays detailed information about the watches for the server, by session. It displays a list of sessions with their related watches.

13) wchp
Displays detailed information about the watches for the server, by path. It displays a list of paths with their related sessions.
Example of the ruok command:
$ echo ruok | nc 127.0.0.1 5111
imok
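The `nc` example can also be done from code. This minimal Python sketch opens a TCP connection, sends the four-letter command, and reads until the server closes the connection, which is how Zookeeper answers these commands. The function name `four_letter_word` is hypothetical. The default port 2181 is Zookeeper's conventional client port; the example above uses 5111. On Zookeeper 3.5.3 and later, commands other than `srvr` generally have to be enabled through the `4lw.commands.whitelist` setting before the server will answer them.

```python
import socket


def four_letter_word(cmd, host="127.0.0.1", port=2181, timeout=5.0):
    """Hypothetical helper: send a four-letter-word command and return the reply."""
    if len(cmd) != 4:
        raise ValueError("four-letter-word commands are exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `four_letter_word("ruok", port=5111)` should return `"imok"` when a healthy server is listening on that port.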
3.3.2 HBase uses Zookeeper
• For real-time read or write access to huge datasets, HBase, a distributed NoSQL database that runs on top of HDFS, is used.
• HBase follows a master-slave architecture, where the HBase Master acts as the server and the Region Servers act as slaves.
• A distributed HBase installation depends on a running Zookeeper cluster.
• HBase uses Zookeeper to track the status of data distributed across the master and the slaves. This is done by using the centralized configuration and distributed mutex techniques.
• Following are a few use cases of HBase:
Telecom:
The telecom industry stores billions of call records, and accessing this huge data in real time is very difficult. HBase can be used to process such large data in real time, efficiently and easily.
Social network:
Social networking sites like Facebook, LinkedIn and Twitter receive a large amount of data in the form of posts created by users. HBase can be used to discover interesting facts and recent trends in this data.
3.3.3 Building Application with Zookeeper

Zookeeper is a centralized service used for managing centralized configuration, naming, providing group services and distributed synchronization. These services are used by distributed applications. Whenever these services are implemented, a lot of work goes into fixing errors and race conditions, which are unavoidable. Because these services are difficult to implement, applications tend to skimp on them, which makes the applications hard to maintain. Different implementations of such services can also lead to complexity in managing the application when it is deployed.