The chapter discusses cloud application development challenges including performance isolation, reliability, and infrastructure variability. It describes common architectural styles like stateless servers using RPCs and REST. Workflows are defined as coordinating multiple tasks through states and events. ZooKeeper is presented as a distributed coordination service implementing consensus with a shared namespace and guarantees like atomicity. The MapReduce programming model is also mentioned.
Dan C. Marinescu Chapter 4
Contents
■ Challenges for cloud computing.
■ Architectural styles for cloud applications.
■ Workflows - coordination of multiple activities.
■ Coordination based on a state machine model.
■ The MapReduce programming model.
■ A case study: the GrepTheWeb application.
Cloud Computing: Theory and Practice.
Challenges for cloud application development
■ Performance isolation - nearly impossible to achieve in a real system, especially when the system is heavily loaded.
■ Reliability - a major concern; server failures are expected when a large number of servers cooperate on a computation.
■ Cloud infrastructure exhibits latency and bandwidth fluctuations which affect application performance.
■ Performance considerations limit the amount of data logging, yet frequent logging helps identify the source of unexpected results and errors.
Architectural styles for cloud applications
■ Based on the client-server paradigm.
■ Stateless servers - view a client request as an independent transaction and respond to it; the client is not required to first establish a connection to the server.
■ Often clients and servers communicate using Remote Procedure Calls (RPCs).
■ Simple Object Access Protocol (SOAP) - application protocol for web applications; the message format is based on XML. Uses TCP or UDP transport protocols.
■ Representational State Transfer (REST) - software architecture for distributed hypermedia systems. Supports client communication with stateless servers; it is platform independent and language independent, supports data caching, and can be used in the presence of firewalls.
Workflows
■ Process description - structure describing the tasks to be executed and the order of their execution. Resembles a flowchart.
■ Case - an instance of a process description.
■ State of a case at time t - defined in terms of tasks already completed at that time.
■ Events - cause transitions between states.
■ The life cycle of a workflow - creation, definition, verification, and enactment; similar to the life cycle of a traditional program (creation, compilation, and execution).
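The terms above can be made concrete with a minimal Python sketch. The task names, the `PROCESS` table, and the `Case` class are hypothetical and introduced only for illustration. The state of a case is represented as the set of tasks completed so far, and a task-completion event moves the case to a new state.

```python
# Hypothetical process description: each task lists the tasks that must
# complete before it may run (a flowchart flattened into a dependency table).
PROCESS = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


class Case:
    """One instance (case) of a process description."""

    def __init__(self, process):
        self.process = process
        self.completed = set()  # state of the case: tasks completed so far

    def enabled(self):
        """Tasks not yet completed whose predecessors have all completed."""
        return sorted(
            task
            for task, preds in self.process.items()
            if task not in self.completed and preds <= self.completed
        )

    def on_completion(self, task):
        """Event: a task finished, so the case moves to a new state."""
        if task not in self.enabled():
            raise ValueError(f"task {task!r} is not enabled")
        self.completed.add(task)


case = Case(PROCESS)
case.on_completion("A")
print(case.enabled())  # ['B', 'C']
```

Rejecting a completion event for a task that is not enabled is one simple way to keep a case from entering a state the process description does not allow.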
Safety and liveness
■ Desirable properties of workflows.
■ Safety - nothing “bad” ever happens.
■ Liveness - something “good” will eventually happen.
Basic workflow patterns
■ Workflow patterns - the temporal relationship among the tasks of a process.
■ Sequence - several tasks have to be scheduled one after the completion of the other.
■ AND split - both tasks B and C are activated when task A terminates.
■ Synchronization - task C can only start after tasks A and B terminate.
■ XOR split - after completion of task A, either B or C can be activated.
■ XOR merge - task C is enabled when either A or B terminates.
■ OR split - after completion of task A one could activate either B, C, or both.
■ Multiple Merge - once task A terminates, B and C execute concurrently; when the first of them, say B, terminates, then D is activated; then, when C terminates, D is activated again.
■ Discriminator - waits for a number of incoming branches to complete before activating the subsequent activity; then waits for the remaining branches to finish, without taking any action, until all of them have terminated. It then resets itself.
Basic workflow patterns (cont’d)
■ N out of M join - barrier synchronization. Assuming that M tasks run concurrently, N (N<M) of them have to reach the barrier before the next task is enabled. In our example, any two out of the three tasks A, B, and C have to finish before E is enabled.
■ Deferred Choice - similar to the XOR split but the choice is not made explicitly; the run-time environment decides what branch to take.
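The N out of M join can be sketched in Python with the standard `concurrent.futures` module. The helper names `n_out_of_m_join` and `make_task` are assumptions made for this illustration, and threads stand in for the concurrently running tasks. The next task is enabled once, as soon as N tasks have finished; the remaining tasks are left to run to completion.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def n_out_of_m_join(tasks, n, next_task):
    """Run all M tasks concurrently; enable next_task once n of them finish."""
    if not 0 < n <= len(tasks):
        raise ValueError("n must be between 1 and the number of tasks")
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        first_results = []
        for future in as_completed(futures):
            first_results.append(future.result())
            if len(first_results) == n:
                # Barrier reached: the next task is enabled exactly once.
                outcome = next_task(first_results)
                break
        # Leaving the with-block waits for the remaining tasks to finish.
    return outcome


def make_task(name):
    """Hypothetical task that simply reports its own name."""
    return lambda: name


# Any two of A, B, and C must finish before E (here: a function) is enabled.
result = n_out_of_m_join(
    [make_task("A"), make_task("B"), make_task("C")],
    2,
    lambda done: list(done),
)
print(len(result))  # 2
```

Which two tasks reach the barrier first depends on thread scheduling, so only the count is predictable. With n equal to the number of tasks the same helper behaves like the Synchronization pattern.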
Coordination - ZooKeeper
■ Cloud elasticity - computations and data are distributed across multiple systems; coordination among these systems is a critical function in a distributed environment.
■ ZooKeeper
  ■ Distributed coordination service for large-scale distributed systems.
  ■ High throughput and low latency service.
  ■ Implements a version of the Paxos consensus algorithm.
  ■ Open-source software written in Java with bindings for Java and C.
  ■ The servers in the pack communicate and elect a leader.
  ■ A database is replicated on each server; consistency of the replicas is maintained.
  ■ A client connects to a single server, synchronizes its clock with the server, and sends requests and receives responses and watch events through a TCP connection.
ZooKeeper communication
■ Messaging layer - responsible for the election of a new leader when the current leader fails.
■ Messaging protocols use:
  ■ Packets - sequence of bytes sent through a FIFO channel.
  ■ Proposals - units of agreement.
  ■ Messages - sequence of bytes atomically broadcast to all servers.
■ A message is included into a proposal and it is agreed upon before it is delivered.
■ Proposals are agreed upon by exchanging packets with a quorum of servers, as required by the Paxos algorithm.
ZooKeeper communication (cont’d)
■ Messaging layer guarantees:
  ■ Reliable delivery: if a message m is delivered to one server, it will eventually be delivered to all servers.
  ■ Total order: if message m is delivered before message n to one server, it will be delivered before n to all servers.
  ■ Causal order: if message n is sent after m has been delivered by the sender of n, then m must be ordered before n.
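The total order guarantee can be stated as a check over per-server delivery logs. The sketch below is illustrative only: the function name `total_order_holds` and the log format (a dictionary mapping a server name to the list of messages it has delivered, each message delivered at most once) are assumptions, not part of ZooKeeper.

```python
from itertools import combinations


def total_order_holds(logs):
    """Return True if every pair of servers delivered their common
    messages in the same relative order."""
    for log_a, log_b in combinations(logs.values(), 2):
        common = set(log_a) & set(log_b)
        order_a = [m for m in log_a if m in common]
        order_b = [m for m in log_b if m in common]
        if order_a != order_b:
            return False
    return True


# s2 has delivered one more message than s1, but the shared prefix agrees.
print(total_order_holds({"s1": ["m", "n"], "s2": ["m", "n", "p"]}))  # True
# s2 delivered n before m while s1 delivered m before n.
print(total_order_holds({"s1": ["m", "n"], "s2": ["n", "m"]}))  # False
```

Reliable delivery is not checked here: it is an eventual property, so a snapshot of the logs in which one server lags behind another does not by itself violate it.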
Shared hierarchical namespace similar to a file system; znodes instead of inodes.
ZooKeeper service guarantees
■ Atomicity - a transaction either completes or fails.
■ Sequential consistency of updates - updates are applied strictly in the order they are received.
■ Single system image for the clients - a client receives the same response regardless of the server it connects to.
■ Persistence of updates - once applied, an update persists until it is overwritten by a client.
■ Reliability - the system is guaranteed to function correctly as long as the majority of servers function correctly.
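The reliability guarantee implies simple arithmetic on the number of servers. The following sketch, with hypothetical helper names, computes the size of a majority and the number of server failures that leave a majority intact.

```python
def quorum_size(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """Largest number of failed servers that still leaves a majority."""
    return servers - quorum_size(servers)


for n in (3, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
# 6 4 2
```

Under this arithmetic, a sixth server does not raise the number of tolerated failures above that of five servers.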
ZooKeeper API
■ The API is simple - consists of seven operations:
  ■ Create - add a node at a given location on the tree.
  ■ Delete - delete a node.
  ■ Exists - test whether a node exists at a given location.
  ■ Get data - read data from a node.
  ■ Set data - write data to a node.
  ■ Get children - retrieve a list of the children of the node.
  ■ Synch - wait for the data to propagate.
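To show how these operations act on a hierarchical namespace, here is a toy in-memory tree in Python. It is a sketch under stated assumptions and is not the ZooKeeper client API: the class `ZNodeTree` and its method names are hypothetical, there is a single copy of the data and no watches, and a synch operation is omitted because nothing needs to propagate. The sketch covers create, delete, exists, get data, set data, and get children.

```python
class ZNodeTree:
    """Toy in-memory namespace of znodes; NOT the ZooKeeper client API."""

    def __init__(self):
        self.nodes = {"/": b""}  # maps an absolute path to the node's data

    @staticmethod
    def _parent(path):
        # "/app/config" -> "/app"; "/app" -> "/"
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b""):
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        if self._parent(path) not in self.nodes:
            raise KeyError(f"parent of {path} does not exist")
        self.nodes[path] = data

    def delete(self, path):
        if path == "/":
            raise ValueError("the root cannot be deleted")
        if self.get_children(path):
            raise ValueError(f"{path} has children")
        del self.nodes[path]

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        return self.nodes[path]

    def set_data(self, path, data):
        if path not in self.nodes:
            raise KeyError(f"{path} does not exist")
        self.nodes[path] = data

    def get_children(self, path):
        if path not in self.nodes:
            raise KeyError(f"{path} does not exist")
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self.nodes
            if p != "/" and self._parent(p) == path
        )


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"v1")
tree.set_data("/app/config", b"v2")
print(tree.get_children("/app"))  # ['config']
```

Refusing to delete a node that still has children is a design choice of this sketch that keeps the tree free of orphaned paths.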
Elasticity and load distribution
■ Elasticity - ability to use as many servers as necessary to optimally respond to cost and timing constraints of an application.
■ How to divide the load
  ◻ Transaction processing systems - a front-end distributes the incoming transactions to a number of back-end systems. As the workload increases, new back-end systems are added to the pool.
  ◻ For data-intensive batch applications two types of divisible workloads are possible:
    ■ modularly divisible - the workload partitioning is defined a priori.
    ■ arbitrarily divisible - the workload can be partitioned into an arbitrarily large number of smaller workloads of equal or nearly equal size.
■ Many applications in physics, biology, and other areas of computational science and engineering obey the arbitrarily divisible load sharing model.
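For an arbitrarily divisible workload, the partitioning step can be sketched as follows. The function name `partition` and the representation of the workload as a count of independent items are assumptions made for this illustration.

```python
def partition(total_items, workers):
    """Split total_items into `workers` contiguous half-open ranges
    whose sizes differ by at most one item."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    base, extra = divmod(total_items, workers)
    ranges, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # first `extra` ranges get one more
        ranges.append((start, start + size))
        start += size
    return ranges


print(partition(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Because the number of ranges is a free parameter, the same input can be repartitioned when servers are added to or removed from the pool.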
MapReduce philosophy
1. An application starts a master instance, M worker instances for the Map phase and later R worker instances for the Reduce phase.
2. The master instance partitions the input data in M segments.
3. Each map instance reads its input data segment and processes the data.
4. The results of the processing are stored on the local disks of the servers where the map instances run.
5. When all map instances have finished processing their data, the R reduce instances read the results of the first phase and merge the partial results.
6. The final results are written by the reduce instances to a shared storage server.
7. The master instance monitors the reduce instances and when all of them report task completion the application is terminated.
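The steps above can be followed in a single-process Python sketch that counts words. Everything here is a simplification under stated assumptions: the function names are hypothetical, the map and reduce instances run one after another instead of on separate servers, in-memory lists stand in for local disks and shared storage, and routing each key to a reduce instance by hash is an assumption the steps above do not specify.

```python
from collections import defaultdict


def map_task(segment):
    """Map phase: emit a (word, 1) pair for every word in one input segment."""
    return [(word, 1) for line in segment for word in line.split()]


def run_mapreduce(lines, m, r):
    # Step 2: the master partitions the input into M segments.
    segments = [lines[i::m] for i in range(m)]
    # Steps 3-4: each map instance processes its segment; the intermediate
    # lists stand in for results stored on local disks.
    intermediate = [map_task(segment) for segment in segments]
    # Step 5: each of the R reduce instances merges the partial results
    # for the keys assigned to it (assignment by hash is an assumption).
    partitions = [defaultdict(int) for _ in range(r)]
    for pairs in intermediate:
        for word, count in pairs:
            partitions[hash(word) % r][word] += count
    # Step 6: the final results are gathered in one place, standing in
    # for the shared storage server.
    result = {}
    for part in partitions:
        result.update(part)
    return result


counts = run_mapreduce(["a b a", "b c", "a"], m=2, r=2)
print(sorted(counts.items()))  # [('a', 3), ('b', 2), ('c', 1)]
```

Each key is handled by exactly one reduce instance, so the per-instance results can be combined without further merging.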
Case study: GrepTheWeb
■ The application illustrates the means to
  ◻ create an on-demand infrastructure.
  ◻ run it on a massively distributed system in a manner that allows it to run in parallel and scale up and down, based on the number of users and the problem size.
■ GrepTheWeb
  ◻ Performs a search of a very large set of records to identify records that satisfy a regular expression.
  ◻ It is analogous to the Unix grep command.
  ◻ The source is a collection of document URLs produced by the Alexa Web Search, a software system that crawls the web every night.
  ◻ Uses message passing to trigger the activities of multiple controller threads which launch the application, initiate processing, shut down the system, and create billing records.
(a) The simplified workflow showing the inputs:
- the regular expression.
- the input records generated by the web crawler.
- the user commands to report the current status and to terminate the processing.
(b) The detailed workflow.
The system is based on message passing between several queues; four controller threads periodically poll their associated input queues, retrieve messages, and carry out the required actions.
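The queue-and-controller structure described above can be sketched with Python's standard `queue` and `threading` modules. This is an illustration under stated assumptions, not the GrepTheWeb implementation: the stage labels in `STAGES`, the chaining of one queue to the next, the `None` stop signal, and the log that stands in for the real actions are all hypothetical.

```python
import queue
import threading

# Hypothetical labels for the four controllers, following the description above.
STAGES = ["launch", "process", "shutdown", "billing"]


def controller(name, inbox, outbox, log):
    """Poll the input queue, act on each message, pass it to the next stage."""
    while True:
        try:
            message = inbox.get(timeout=0.1)  # periodic poll
        except queue.Empty:
            continue
        if message is None:  # stop signal: forward it and exit
            if outbox is not None:
                outbox.put(None)
            return
        log.append((name, message))  # placeholder for the real action
        if outbox is not None:
            outbox.put(message)


def run_pipeline(jobs):
    queues = [queue.Queue() for _ in STAGES]
    log, threads = [], []
    for i, name in enumerate(STAGES):
        outbox = queues[i + 1] if i + 1 < len(STAGES) else None
        thread = threading.Thread(
            target=controller, args=(name, queues[i], outbox, log)
        )
        thread.start()
        threads.append(thread)
    for job in jobs:
        queues[0].put(job)
    queues[0].put(None)  # no more jobs
    for thread in threads:
        thread.join()
    return log


for entry in run_pipeline(["job-1"]):
    print(entry)
```

Because the controllers share nothing but the queues, each one can be restarted or replicated independently in this sketch.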