SERVICE ORIENTED ARCHITECTURES
Discovery,
Registries,
Metadata, and Databases,
Workflow in Service-Oriented Architectures
• MESSAGE-ORIENTED MIDDLEWARE
• Enterprise Bus: Services interact through messages that use a variety of different
formats (APIs), wire protocols, and transport mechanisms.
• The author of a service should not need to worry that a special port must
be used to avoid firewall difficulties, or that UDP and special fault
tolerance approaches are needed to achieve satisfactory latency
on a long-distance communication.
• Publish-Subscribe Model and Notification: A particular model for
linking source and destination for a message bus.
• The producer of the message (publisher) labels the message in some
fashion, often by associating one or more topic names from a (controlled)
vocabulary.
• The receivers of the message (subscribers) specify the topics for
which they wish to receive associated messages.
• Alternatively, one can use content-based delivery systems, where the content is
queried in some format such as SQL.
• Queuing and Messaging Systems: The Java Message Service (JMS)
specifies a set of interfaces outlining the communication
semantics in pub/sub and queuing systems.
• Advanced Message Queuing Protocol (AMQP) specifies the set of
wire formats for communications; unlike APIs, wire formats are
cross-platform (SOAP, SSL, and SMTP).
• E.g., Amazon Simple Queue Service and Azure Queue.
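The topic-based publish-subscribe model described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular message bus is implemented; the class name `TopicBus` and the topic names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]


class TopicBus:
    """Hypothetical in-memory message bus with topic-based delivery."""

    def __init__(self) -> None:
        # topic name -> callbacks registered by subscribers
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """A subscriber names the topic it wants messages for."""
        self._subscribers[topic].append(callback)

    def publish(self, topics: List[str], body: str) -> int:
        """The publisher labels the message with topics; return the delivery count."""
        delivered = 0
        for topic in topics:
            for callback in self._subscribers[topic]:
                callback(topic, body)
                delivered += 1
        return delivered


bus = TopicBus()
inbox: List[str] = []
bus.subscribe("weather", lambda topic, body: inbox.append(f"{topic}: {body}"))
bus.publish(["weather", "traffic"], "storm warning")  # only "weather" has a subscriber
print(inbox)
```

The publisher never learns who received the message, which is the decoupling the bus is meant to provide.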
DISCOVERY, REGISTRIES, METADATA, AND
DATABASES
• Distributed applications need to discover resources that suit their
needs and manage them.
• Business services need to discover appropriate services to use and to
integrate with.
• Services are discovered dynamically at runtime by classifying and categorizing
services or metadata information about services.
• A registry requires a set of data structure specifications for the
metadata to be stored in the registry.
• It also requires a set of operations such as Create, Read, Update, and Delete (CRUD)
for storing, deleting, and querying the data. The stored metadata covers
ownership, containment, and categorization of services.
• Registries usually contain three categories of information:
• White pages contain name and general contact information about an
entity.
• Yellow pages contain classification information about the types and
location of the services the entry offers.
• Green pages contain information about the details of how to invoke
the offered services (technical data regarding the service).
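A registry with CRUD operations and the three categories of information can be sketched as follows. This is an illustrative toy, assuming an in-memory dictionary as the store; the names `RegistryEntry`, `ServiceRegistry`, and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RegistryEntry:
    name: str                       # white pages: who offers the service
    contact: str                    # white pages: general contact information
    categories: List[str] = field(default_factory=list)  # yellow pages: classification
    endpoint: str = ""              # green pages: how to invoke the service


class ServiceRegistry:
    """Hypothetical registry exposing Create, Read, Update, and Delete."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def create(self, entry: RegistryEntry) -> None:
        if entry.name in self._entries:
            raise KeyError(f"{entry.name} is already registered")
        self._entries[entry.name] = entry

    def read(self, name: str) -> Optional[RegistryEntry]:
        return self._entries.get(name)

    def update(self, name: str, **changes: object) -> None:
        # Build a new entry with the changed fields; raises KeyError if absent.
        self._entries[name] = replace(self._entries[name], **changes)

    def delete(self, name: str) -> None:
        del self._entries[name]

    def find_by_category(self, category: str) -> List[str]:
        """Yellow-pages lookup: names of entries classified under a category."""
        return sorted(e.name for e in self._entries.values() if category in e.categories)


registry = ServiceRegistry()
registry.create(RegistryEntry("WeatherCo", "ops@weatherco.example",
                              ["weather"], "https://weatherco.example/v1"))
registry.update("WeatherCo", endpoint="https://weatherco.example/v2")
print(registry.find_by_category("weather"))
```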
UDDI and Service Registries
• UDDI specifications define a way to describe, publish, and discover information about web
services by creating a platform-independent, open framework.
• UDDI provides a name service and a directory service for looking up service
descriptions by name or by a specific attribute.
• There are two primary types of registries.
• A public registry is a logically centralized, distributed service that replicates
data with other public registries on a regular basis.
• A private registry is only accessible within a single organization or is shared
by a group of business partners for a special purpose. Also called a
semiprivate or shared registry.
• The UDDI Business Registry consists of replicated registries (initially hosted
by IBM and Microsoft) called UDDI operators.
• A UDDI registry is an instance of a web service, and its entries can be
published and queried using a SOAP-based interface.
• UDDI defines data structures and APIs for programmatically
publishing service descriptions and querying the registry.
Data structure of UDDI
• Data in a UDDI registry is organized as instance types:
• BusinessEntity: Describes an organization or a business that provides the web services,
including the company name, contact information, industry/product/geographic
classification, and so on.
• BusinessService: Describes a collection of related instances of web services offered by an
organization, such as the name of the service, a description, and so forth.
• BindingTemplate: Describes the technical information necessary to use a particular web
service, such as the URL address to access the web service instance and references to its
description.
• tModel: A generic container for specification of WSDL documents in general web services.
• PublisherAssertion: Defines a relationship between two or more businessEntity elements.
• Subscription: A standing request to keep track of changes to the entities in the
subscription.
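The containment relationships among the first four instance types can be sketched with Python dataclasses. This is a simplified illustration, not the UDDI schema: the field names (`access_point`, `tmodel_keys`, `overview_url`) and the helper `access_points` are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TModel:
    key: str
    overview_url: str  # e.g. where a WSDL document describing the interface lives


@dataclass
class BindingTemplate:
    access_point: str                                      # URL of the service instance
    tmodel_keys: List[str] = field(default_factory=list)   # references to descriptions


@dataclass
class BusinessService:
    name: str
    bindings: List[BindingTemplate] = field(default_factory=list)


@dataclass
class BusinessEntity:
    name: str
    services: List[BusinessService] = field(default_factory=list)


def access_points(entity: BusinessEntity, tmodel_key: str) -> List[str]:
    """URLs of all bindings of this entity that reference the given tModel."""
    return [
        binding.access_point
        for service in entity.services
        for binding in service.bindings
        if tmodel_key in binding.tmodel_keys
    ]


quote_api = TModel("tmodel:quote-v1", "https://example.org/quote.wsdl")
acme = BusinessEntity("Acme", [
    BusinessService("Quotes", [
        BindingTemplate("https://acme.example/quotes", [quote_api.key]),
        BindingTemplate("https://acme.example/legacy", []),
    ]),
])
print(access_points(acme, quote_api.key))
```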
A UDDI registry can be used by service providers,
service requestors, or other registries.
UDDI entities and their relationship
UDDI provides a set of APIs
Although UDDI is an open standard, it has never gained
much popularity
• ProgrammableWeb.com is a registry of a variety of Web 2.0 applications,
such as mashups and APIs organized by category, date, or popularity.
• It has similar goals to UDDI, but does not use the detailed UDDI
specifications.
• Mashups are composite Web 2.0 applications which combine capabilities
from existing web-based applications, typically RESTful web services.
• Mashups can be compared to workflows, as they both implement
distributed programming at the service level. Content used in mashups is
typically sourced from a third party via a public interface or API.
Databases and Publish-Subscribe
• Publish-subscribe is a design pattern that enables asynchronous
interaction among distributed applications.
• Many high-level applications regularly query the database in order to
adapt their execution according to this information.
• Periodic data polling is not only inefficient and unscalable, but also
resource-demanding on both sides, especially when the interval
between calls to the database is very small, or in cases when there
is more than one consumer application, it may increase the amount
of network traffic and CPU usage dramatically.
• The publish-subscribe mechanism, already largely adopted in the
implementation of today’s applications, solves this issue.
• In a publish-subscribe interaction, event subscribers register to
particular event types and receive notifications from the event
publishers when they generate such events.
• There is a dynamic, many-to-many relationship between event publishers
and event subscribers, as there can be any number of
publishers/subscribers for any type of event which can vary at any time.
• Publish-subscribe adds dynamicity to the static nature of databases.
• While the publish-subscribe pattern was first implemented in centralized
client/server systems, current research focuses mainly on distributed
versions.
• The key benefit of the distributed publish-subscribe mechanism is the
natural decoupling of publishers and subscribers.
• Since the publishers are unconcerned with the potential consumers of
their data, and the subscribers are unconcerned with the locations of the
potential producers of interesting data, the client interface of the
publish-subscribe system is simple and intuitive.
• Publish-subscribe systems are classified as either topic-based or content-based.
• In topic-based systems, publishers generate events with respect to a topic or subject.
• Subscribers then specify their interest in a particular topic, and receive all events
published on that topic.
• Defining events in terms of topic names only is inflexible and requires subscribers to
filter events belonging to general topics.
• Content-based systems solve this problem by introducing a subscription scheme based
on the contents of events.
• Content-based systems are preferable as they give users the ability to express their
interest by specifying predicates over the values of a number of well-defined attributes.
• The matching of publications (events) to subscriptions (interest) is done based on the
content.
• Distributed solutions are mainly focused on topic-based publish-subscribe systems.
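Content-based matching, where a subscription is a predicate over event attributes rather than a topic name, can be sketched as below. This is a minimal illustration under assumed names (`ContentBroker`, the `symbol` and `price` attributes); real systems index subscriptions instead of testing every predicate.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


class ContentBroker:
    """Hypothetical broker that matches events to subscriptions by content."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Predicate]] = []

    def subscribe(self, subscriber: str, predicate: Predicate) -> None:
        self._subscriptions.append((subscriber, predicate))

    def publish(self, event: Event) -> List[str]:
        """Return the subscribers whose predicate holds for this event."""
        return [name for name, predicate in self._subscriptions if predicate(event)]


broker = ContentBroker()
# alice filters on two attributes; bob only on one
broker.subscribe("alice", lambda e: e.get("symbol") == "ACME" and e.get("price", 0) > 100)
broker.subscribe("bob", lambda e: e.get("symbol") == "ACME")

print(broker.publish({"symbol": "ACME", "price": 120}))
print(broker.publish({"symbol": "ACME", "price": 90}))
```

With a topic-based system, alice would have to subscribe to an "ACME" topic and discard low-priced events herself.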
• Database systems provide many features that a messaging-based
architecture can exploit, such as reliable storage, transactions, and
triggers.
• Integrated publish-subscribe capabilities in the database make for
information-sharing systems that are simpler to deploy and maintain.
• Because publish-subscribe and database technology have evolved
independently, designing and implementing
database-publish-subscribe-aware systems requires bringing together concepts and
functionality from two separate worlds.
• A combination of features is introduced to allow a publish-subscribe style of messaging
between applications.
• These features include rule-based subscribers, message propagation, the listen feature, and
notification capabilities.
• Oracle Streams Advanced Queuing is built on top of Oracle Streams and leverages the
functionality of Oracle Database so that messages can be stored persistently, propagated
between queues on different computers and databases, and transmitted using Oracle Net
Services and HTTP(S).
• As Oracle Streams Advanced Queuing is implemented in database tables, all operational benefits
of high availability (HA), scalability, and reliability are also applicable to queue data.
• Standard database features such as recovery, restart, and security are supported by Oracle
Streams Advanced Queuing.
• Database development and management tools such as Oracle Enterprise Manager can be applied
to monitor queues. Like other database tables, queue tables can be imported and exported.
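The idea of a queue stored in a database table, so that messages inherit the database's persistence and transactions, can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a stand-in; this is not Oracle Streams Advanced Queuing or its API, and the table and function names are assumptions.

```python
import sqlite3
from typing import Optional


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database holding a single queue table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, topic: str, body: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO queue (topic, body) VALUES (?, ?)", (topic, body))


def dequeue(conn: sqlite3.Connection, topic: str) -> Optional[str]:
    """Remove and return the oldest message for a topic, or None if empty."""
    with conn:
        row = conn.execute(
            "SELECT id, body FROM queue WHERE topic = ? ORDER BY id LIMIT 1",
            (topic,),
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return row[1]


conn = open_queue()
enqueue(conn, "orders", "order-1")
enqueue(conn, "orders", "order-2")
first = dequeue(conn, "orders")
print(first)
```

Passing a file path instead of `":memory:"` would keep queued messages across restarts, which is the property the bullets above attribute to table-backed queues.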
Metadata Catalogs
• Metadata catalogs play a vital role in distributed heterogeneous environments
such as grids by providing users and applications the means to discover and
locate the desired data and services among the many sites in such environments.
• Metadata is important, since it adds context to the data, in order to identify,
locate, and interpret it.
• Key metadata on the grid includes the name and location of the data resource,
structure of the data held within the data resource, data item names and
descriptions, and user information (name, address, and profiles and preferences),
or basic listings and simple lookup of available services, relating function and
location without significant rich context.
• Metadata catalogs are used by various groups and communities, ranging from
high-energy physics to biomedical, earth observation, and geological science.
• One of the earliest metadata catalogs is the Metadata Catalog Service
(MCAT) [90], which is a part of the Storage Resource Broker (SRB).
• It aims to provide an abstraction layer over heterogeneous storage
devices and file systems either inside or across computing centers.
MCAT stores the data hierarchically using a tree of collections and is
both a file and metadata catalog. Later versions of MCAT support
replication and federation of data resources.
AMGA (the ARDA Metadata for Grid
Applications)
• Attributes are represented as key-value pairs with type information, and
each entry assigns an individual value to the attributes of its collection.
• A schema can be a representation of a directory, which can contain either
entries or other schemas.
• As an advantage of this tree-like structure, users can define a hierarchical
structure which can help to better organize metadata in subtrees that can
be queried independently.
• The server supports several storage systems by using modules.
• AMGA can manage groups of users with different permissions on
directories.
• In a grid environment, file and metadata catalogs are used by users for
discovering and locating data among the hundreds of grid sites.
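The tree-like organization described above, with typed attributes per directory and subtrees that can be queried independently, can be sketched as follows. This is a toy model of the idea only, not the AMGA interface; the `Directory` class, its methods, and the attribute names are hypothetical.

```python
from typing import Any, Dict, List


class Directory:
    """Hypothetical catalog directory: a typed schema, entries, and subdirectories."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = schema                          # attribute name -> expected type
        self.entries: Dict[str, Dict[str, Any]] = {}  # entry name -> attribute values
        self.children: Dict[str, "Directory"] = {}    # nested directories

    def add_entry(self, name: str, **attrs: Any) -> None:
        """Store one entry, checking each value against the directory schema."""
        for key, value in attrs.items():
            if key not in self.schema:
                raise KeyError(f"unknown attribute: {key}")
            if not isinstance(value, self.schema[key]):
                raise TypeError(f"{key} must be {self.schema[key].__name__}")
        self.entries[name] = attrs

    def query(self, **conditions: Any) -> List[str]:
        """Names of entries in this directory only that match all conditions."""
        return sorted(
            name for name, attrs in self.entries.items()
            if all(attrs.get(k) == v for k, v in conditions.items())
        )


root = Directory(schema={})
runs = root.children["runs"] = Directory(schema={"detector": str, "events": int})
runs.add_entry("run001", detector="A", events=1200)
runs.add_entry("run002", detector="B", events=800)
print(runs.query(detector="A"))
```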
WORKFLOW IN SERVICE-ORIENTED
ARCHITECTURES
• Services are the basic unit for constructing distributed systems.
• A “real system” consists of multiple interacting (generalized) services.
• The prototypical complete system could be a “grid of services,” but
we also talk about a “grid of grids” or even a “grid of clouds.”
• An example is multiple application grids in different areas of “critical infrastructure.”
• Component grids (subgrids) are invoked for collaboration,
visualization, sensor fusion, computing, and GIS applications.
• Workflow is used to integrate component grids and services.
Basic Workflow Concepts
• Workflow is the approach to “programming the interaction
between services.”
• Besides describing workflow as “programming the web or grid,” one can also use
terms such as “software coordination,” “service orchestration,”
“service or process coordination,” “service conversation,” “web or
grid scripting,” “application integration,” or “software bus.”
• Different approaches emphasize control flow, scheduling, and/or
data flow.
• The basic services are programmed in traditional languages—C, C++,
FORTRAN, Java, and Python—while workflow describes the coarser-
grained programming of services interacting with one another.
• Each service is programmed in traditional languages while their
interaction is described by workflow.
• A useful analogy is elementary shell programming, where pipes are often used to link
executing programs.
• Scripting (as in shell scripts) is one popular approach to workflow with
distributed programming constructs.
• In the distributed case, TCP channels or publish-subscribe messaging replace pipes.
• The Workflow Management Coalition is largely concerned with business process management,
with steps in a workflow often involving human rather than computer
steps.
• For example, Allen defined business workflow as the automation of a
business process, in whole or in part, during which documents,
information, or tasks are passed from one participant to another for
action, according to a set of procedural rules.
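The shell-pipe analogy above can be made concrete with a small sketch in which each stage runs independently and stages are linked by message channels. Here in-process `queue.Queue` objects stand in for the TCP channels or publish-subscribe messaging a distributed workflow would use; the function names and sample inputs are assumptions.

```python
import queue
import threading
from typing import Any, Callable, List

_DONE = object()  # sentinel marking the end of the stream


def _start_stage(func: Callable[[Any], Any],
                 inbox: "queue.Queue[Any]",
                 outbox: "queue.Queue[Any]") -> threading.Thread:
    """Run one stage in its own thread: read, transform, forward."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate shutdown downstream
                return
            outbox.put(func(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(funcs: List[Callable[[Any], Any]], items: List[Any]) -> List[Any]:
    """Link the stages with channels, like `a | b | c` in a shell."""
    channels = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [_start_stage(f, channels[i], channels[i + 1]) for i, f in enumerate(funcs)]
    for item in items:
        channels[0].put(item)
    channels[0].put(_DONE)

    results = []
    while True:
        item = channels[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


print(run_pipeline([str.strip, str.upper], ["  load ", " merge"]))
```

Each stage knows only its input and output channel, so a stage could be replaced by a remote service without changing the others.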
Workflow Standards
• There is a realization that the goal of interoperability through standards
produced heavyweight architectures
where the tooling could not keep up with the support of the many
standards.
• Today there is greater emphasis on lightweight systems where interoperability
is achieved by ad hoc transformations where necessary.
• Another problem of the standardization work was that it largely
preceded the deployment of systems, and so one found premature
standardization that missed key points.
• The successful activities have a business process flavor, and for
scientific workflow.
• XML is not well suited to specifying programming constructs;
although XML can express data structures well, it is possible but not
natural to express loops and conditionals that are essential to any
language and the control of a workflow.
Workflow Architecture and Specification
• Most workflow systems have two key components corresponding to
the language and runtime components of any programming
environment.
• These components are the workflow specification and the workflow execution
engine.
Workflow Specification
• Scripting-based workflow systems can specify workflow in traditional
language syntax similar to Python, JavaScript, or Perl.
• One can also directly specify the (XML) interface document driving
the execution engine, although that is pretty low-level.
• However, most workflow systems use a graphical interface.
• Two typical (Load and Merge) workflows from the Pan-STARRS
astronomy data processing area.
Workflow Execution Engine
• There are many different workflow systems. Workflow does not have
strong performance constraints.
• The typically large execution times of nodes make overhead less
important.
• This same feature typically allows workflow to be executed in a
distributed fashion—the network latency of long communication
hops is often not important.
• Workflows can be examined according to their size, resource use, graph pattern, data
pattern, and usage scenario.
• Graph patterns describe the control and not the data flow of a workflow. Of course, the control structure
implies the data flow structure for a given set of nodes.
• A more general workflow structure is the directed acyclic graph (DAG): a collection of vertices and
directed edges, each edge connecting one vertex to another, such that there are
no cycles.
• That is, there is no way to start at some vertex V and follow a sequence of edges that
eventually loops back to that vertex V again.
• In spite of sophisticated specialized workflow systems, scripting using traditional
languages and toolkits is perhaps the dominant technique used to build
workflows.
• This is done in an informal fashion using any environment with distributed
computing (Internet) support; PHP may be the most popular environment for
building mashups, but Python and JavaScript are also well used.
• The nodes of a workflow can be either services or collections of
services (subworkflows). This is consistent with the Grid of Grids
concept.
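A workflow whose structure is a graph without cycles can be executed by visiting nodes in an order that respects the edges. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the node names and the `run_workflow` helper are assumptions, and real engines would also run independent nodes in parallel.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(dependencies: Dict[str, Set[str]],
                 tasks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every node after all of its predecessors; return the order used.

    `dependencies` maps each node to the set of nodes it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for node in order:
        tasks[node]()
    return order


log: List[str] = []
dependencies = {"merge": {"load_a", "load_b"}, "publish": {"merge"}}
tasks = {name: (lambda n=name: log.append(n))
         for name in ("load_a", "load_b", "merge", "publish")}

order = run_workflow(dependencies, tasks)
print(order)

# A graph that loops back to its starting vertex is rejected.
try:
    run_workflow({"a": {"b"}, "b": {"a"}}, {})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)
```

The two load nodes have no edge between them, so their relative order is not fixed; only the constraint that both precede `merge` is guaranteed.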
How do you create a workflow in AWS?
1. Introduction.
2. Step 1: Create a State Machine.
3. Step 2: Create an AWS Identity and Access Management (IAM) Role.
4. Step 3: Design a Serverless Workflow.
5. Step 4: Create your AWS Lambda Functions.
6. Step 5: Populate your Workflow.
7. Step 6: Execute your Workflow.
8. Step 7: Terminate resources.
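As a rough illustration of what the state machine in these steps looks like, the sketch below builds a two-state definition in the Amazon States Language as a Python dictionary and serializes it to JSON. The state names, the account ID, and the Lambda function ARNs are placeholders, not values from any real deployment, and `execution_path` is only a local sanity check for linear workflows, not an AWS API.

```python
import json
from typing import Any, Dict, List

# Placeholder definition: two Task states, each backed by a Lambda function.
definition: Dict[str, Any] = {
    "Comment": "Hypothetical two-step workflow",
    "StartAt": "LoadData",
    "States": {
        "LoadData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:load-data",
            "Next": "MergeData",
        },
        "MergeData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:merge-data",
            "End": True,
        },
    },
}


def execution_path(defn: Dict[str, Any]) -> List[str]:
    """Follow StartAt and Next links until a state marked End is reached."""
    path: List[str] = []
    name = defn["StartAt"]
    while True:
        if name in path:
            raise ValueError(f"state {name} is visited twice")
        state = defn["States"][name]
        path.append(name)
        if state.get("End"):
            return path
        name = state["Next"]


print(execution_path(definition))
print(json.dumps(definition, indent=2))
```

The JSON text is what would be supplied as the state machine definition when populating the workflow; the IAM role from Step 2 is what would let the state machine invoke the functions.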
• One important technology choice is the mechanism for transferring
information between the nodes of the graph. The simplest choice is
that each node reads from and writes to disk, and this allows one to
treat the execution of each node as an independent job invoked when
all its needed input data is available on disk.
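The disk-based choice can be sketched as a node that runs only once all of its input files exist and then writes its own output file for downstream nodes. This is a minimal illustration using a temporary directory; the file names and the `run_if_ready` helper are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def run_if_ready(inputs: List[Path], output: Path,
                 job: Callable[[List[str]], str]) -> bool:
    """Run `job` only when every input file exists; write its result to `output`."""
    if not all(path.exists() for path in inputs):
        return False  # some upstream node has not finished yet
    output.write_text(job([path.read_text() for path in inputs]))
    return True


with tempfile.TemporaryDirectory() as tmp:
    a, b, merged = Path(tmp, "a.txt"), Path(tmp, "b.txt"), Path(tmp, "merged.txt")

    a.write_text("1\n")
    first_try = run_if_ready([a, b], merged, "".join)   # b.txt is still missing
    b.write_text("2\n")
    second_try = run_if_ready([a, b], merged, "".join)  # both inputs are on disk
    merged_text = merged.read_text()

print(first_try, second_try)
```

Because the only coupling between nodes is the presence of files, each node can be submitted as an independent job to any scheduler that can see the shared disk.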