Quality of Service: Delivering QoS on the Internet and in Corporate Networks
by Paul Ferguson and Geoff Huston
John Wiley & Sons © 1998
This masterful guide offers an in-depth theoretical look at the technical and logistical
issues surrounding Quality of Service on the network.
Introduction
“Any sufficiently advanced technology is indistinguishable from magic.”
—Arthur C. Clarke
Overview
Quality of Service (QoS) is one of the most elusive, confounding, and confusing topics in
data networking today. Why has such an apparently simple concept reached such
dizzying heights of confusion? After all, the entire communications industry
appears to be using the term with ease, and with such common usage, it
is reasonable to expect a common level of understanding of the term.
Our research into this topic for this book has led to the conclusion that such levels of
confusion arise primarily because QoS means so many things to so many people. The
trade press, hardware and software vendors, consumers, researchers, and industry
pundits all seem to have their own ideas and definitions of what QoS actually is, the
magic it will perform, and how to effectively deliver it. The unfortunate byproduct,
however, is that these usually are conflicting concepts that often involve complex,
impractical, incompatible, and non-complementary mechanisms for delivering the desired
(and sometimes expected) results. Is QoS a mechanism to selectively allocate scarce
resources to a certain class of network traffic at a higher level of precedence? Or is it a
mechanism to ensure the highest possible delivery rates? Does QoS somehow ensure
that some types of traffic experience less delay than other types of traffic? Is it a dessert
topping, or is it a floor wax? In an effort to deliver QoS, you first must understand and
define the problem. If you look deep enough into the QoS conundrum, this is the heart of
the problem—there is seldom a single definition of QoS. The term QoS itself is an
ambiguous acronym. And, of course, multiple definitions of a problem yield multiple
possible solutions.
To illustrate the extent of the problem, we provide an observation made in early May
1997 while participating in the Next Generation Internet (NGI) workshop in northern
Virginia. The workshop formed several working groups that focused on specific
technology areas, one of which was Quality of Service. The QoS group spent a
great deal of time spinning its wheels trying to define QoS, and of course, there
was considerable initial disagreement. During the course of two days, however,
the group came to the conclusion that the best definition it could formulate for QoS was
one that defined methods for differentiating traffic and services—a fairly broad brush
stroke, and one that may be interpreted differently. The point here is that Quality of
Service is clearly an area in which research is needed; tools must be developed; and the
industry needs much improvement, less rhetoric, and more consensus.
Tip You can find details on the Workshop on Research Directions for the Next
Generation Internet (NGI) held in May 1997 and sponsored by the Computer
Research Association at www.cra.org/Policy/NGI/.
We have all experienced similar situations in virtually every facet of our everyday
professional lives—although QoS seems to have catapulted into a position where it is an
oft-used buzzword, there are wildly varying opinions on what it means and how to deliver
it.
A Quality of Service Reality Check
This book is a project we strongly felt needed to be undertaken to inject some semblance of
equity into the understanding of QoS—a needed reality check, if you will. As a
community, we need to do a much better job of articulating the problem we are trying to
solve before we start thrusting multiple solutions onto the networking landscape. We also
need to define the parameters of the QoS environment and then deliver an equitable
panorama of available options to approach the QoS problem. This book attempts to
provide the necessary tools to define QoS, examine the problems, and provide an
overview of the possible solutions.
In the process of reviewing the QoS issues, we also attempt to provide an objective
summary of the benefits and liabilities of the various technologies. It is important to draw
the distinction between service quality and quality of service and to contemplate why
QoS has been such an elusive and difficult issue to pursue.
This book is geared toward all types of readers—the network manager trying to understand
the technology, the network engineering staff pondering methods to provide differentiated
services, and the consultant or systems integrator trying to verify a theory on why a
particular approach may not be optimal.
Expectations of QoS
Although QoS means different things to different people in the population at large, it also
means different things to individuals within an organization attempting to deploy it.
Empirical evidence indicates that QoS mechanisms are in more demand in corporate,
academic, and other private campus intranets (private networks that interconnect
portions of a corporation or organization) than they are on the global Internet (the Internet
proper, which is the global network of interconnected computer networks) and the ISP
(Internet Service Provider) community. Soliciting comments and discussing the issue of
QoS on the NANOG (North American Network Operators Group) mailing list, as well as
within several Usenet newsgroups, reveals that unique requirements exist in each
community. Not only do we see unique expectations within each of these communities,
but we also see different expectations within each organization.
In the simplest of terms, Usenet news, or netnews, is a forum (or, rather, several
thousand forums) for online discussion. Many computers around the world regularly
exchange Usenet articles on almost every subject imaginable. These computers
are not physically connected to the same network. Instead, they are connected
logically in their capability to exchange data; they form the logical network referred
to as Usenet. For more information on Usenet, see
www.eff.org/papers/eegtti/eeg_68.html.
You don’t need a bolt of lightning to strike you to realize that engineering and marketing
organizations sometimes have conflicting views of reality. The marketing organization of
an ISP, for example, may be persistent about implementing some type of QoS purely on
the expectation that it will create new opportunities for additional revenue. Conversely,
the engineering group may want a QoS implementation only to provide predictable
behavior for a particular application or to reduce the resource contention within a
particular portion of the network topology. Engineering purists are not driven by factors of
economics, but instead by the intrinsic value of engineering elegance and stability—
building a network that functions, scales, and outperforms inferior designs.
By the same token, unique expectations exist in the corporate and academic
communities with regard to what QoS can provide to their organizations. In the academic
community, it is easy to imagine the desire to give researchers preferential use of
network resources for their meritorious applications and research traffic instead of forcing
them to contend for network resources within the same environment where dormitory
students are playing Quake across the campus network. Campus network administrators
also may want to pay a slightly higher fee to their service provider to transport the
research traffic with priority, while relegating student traffic to a lower-fee service perhaps
associated with a lower best-effort service quality. Similarly, in the corporate network
environment, there may be many reasons to provide preferential treatment to a particular
application or protocol or to assign some application to a lower level of service quality.
These are the types of issues for which QoS conjures the imagination—automagic
mechanisms that provide this differential service quality. It is not a simple issue that
generates industry consensus, especially when agreement cannot be reached within the
network engineering community or, worse, within a single organization. This also
introduces a fair amount of controversy as a result of the various expectations of what
the problem is that must be solved. Definitions often are crafted to deliver functionality to
meet short-term goals. Is it technical? Or economic? Market driven? Or all of these
things?
These are the types of issues that may become clearer as we focus on the technical
aspects; perhaps some of the political aspects will shake out along the way.
Chapter 1: What Is Quality of Service? In this chapter, you’ll examine the elusive
definitions of Quality of Service, as well as the perceptions of what QoS provides. This
aspect is especially interesting and is intended for all audiences, both lay and technical.
We discuss how QoS needs have evolved from a historical perspective, what
mechanisms exist, how they are being used, and why QoS is important today as well as
in the immediate and far-reaching future.
Chapter 2: Implementing Policy in Networks. This chapter focuses on defining policies in the network, because policies are the
fundamental building blocks with which you can begin to implement QoS in any particular
network environment. The policy aspects of QoS are intended for both network
managers and administrators, because understanding what tools are available to control
traffic flow is paramount in successfully implementing policy control. We also provide a
sanity check for network engineers, planners, and architects, so that these networking
professionals can validate their engineering, design, and architectural assumptions
against the principles that enable networks to be constructed with the properties
necessary to implement QoS. Additionally, Chapter 2 examines measurement tools for
quantifying QoS performance and looks behind the tools to the underlying framework of
network performance measurement.
Chapter 3: QoS and Queuing Disciplines, Traffic Shaping, and Admission Control.
Here, we provide an overview of the various queuing disciplines you can use to
implement differentiated QoS, because management of queuing behavior is a central
aspect of managing a network’s performance. Additionally, we highlight the importance of
traffic shaping and admission control, which are extraordinarily important tools used in
controlling network resources. Network engineers will find Chapter 3 especially
interesting, because we illustrate the intrinsic differences between each of these tools
and exactly how you can use them to complement any QoS implementation.
Chapter 4: QoS and TCP/IP: Finding the Common Denominator. Next, we examine
QoS from the perspective of a number of common transport protocols, discussing
features of QoS implementations used in IP, Frame Relay, and ATM networks. We also
examine why congestion avoidance and congestion control are increasingly important.
This chapter on TCP/IP and QoS examines the TCP/IP protocol suite and discusses
which QoS mechanisms are possible at these layers of the protocol stack. TCP/IP does
provide an extensive set of quality and differentiation hooks within the protocol, and you
will see how you can apply these within the network.
Chapter 5: QoS and Frame Relay. This chapter provides a historical perspective of
X.25, which is important in understanding how Frame Relay came into existence. We
then move to Frame Relay and discuss how you can use Frame Relay mechanisms to
differentiate traffic and provide QoS. You'll also see why Frame Relay may not be as
effective a QoS tool as you might expect.
Chapter 6: QoS and ATM. In this chapter, you’ll examine the QoS mechanisms within
ATM as well as the pervasive misperception that ATM QoS mechanisms are “good
enough” for implementing QoS. Both Chapters 5 and 6 are fairly technical and are
geared toward the network engineering professional who has a firm understanding of
Frame Relay and ATM technologies.
Chapter 7: The Integrated Services Architecture. It is important to closely examine
RSVP (Resource ReSerVation Setup Protocol) as well as the Internet Engineering Task
Force’s (IETF) Integrated Services model to define and deliver QoS. In this chapter,
you’ll take an in-depth look at the Integrated Services architecture, RSVP, and ancillary
mechanisms you can use to provide QoS. By the same token, however, we play devil’s
advocate and point out several existing issues you must consider before deciding
whether this approach is appropriate for your network.
Chapter 8: QoS and Dial Access. This chapter examines QoS and dial-up services.
Dial-up services encompass a uniquely different set of issues and problems, which you
will examine in detail. Because a very large majority of network infrastructure, at least in
the Internet, consists of dial-up services, we also provide an analytical overview of QoS-
implementation possibilities that make sense in the dial-access marketplace. This
chapter provides technical details of how you might implement QoS in the dial-access
portion of the network. Therefore, product managers tasked with developing a dial-up
QoS product will find this chapter particularly interesting.
Chapter 9: QoS and Future Possibilities. Here, you’ll take a look at QoS possibilities in
the future. Several promising technologies may deliver QoS in a method that makes it
easy to understand and simple to implement. However, if current trends are any
indication, the underlying mechanics of these technologies may prove to be
extraordinarily complex and excessively difficult to troubleshoot when problems arise.
This may not be the case; however, it is difficult at this point to look into a crystal ball and
see the future. In any event, we do provide an overview of emerging technologies that
might play a role in future QoS schemes, and we provide some editorial comments on
the applicability of each scheme.
Among the possible future technologies that may have QoS significance are the IETF’s
emerging MPLS (Multi Protocol Label Switching) and QoSR (Quality of Service Routing)
efforts, each of which is still in the early development stage, each with various submitted
proposals and competing technologies. You'll also take a look at QoS possibilities with
IPv6 (IP version 6) and several proposed extensions to RSVP (Resource ReSerVation
Setup Protocol).
Chapter 10: QoS: Final Thoughts and Observations. In this final chapter, you’ll review
all the possible methods of implementing QoS and look at analyses on how each might fit
into the overall QoS scheme. This chapter is particularly relevant, because we provide an
overview of economic factors that play a role in the implementation of QoS. More
important, we provide a simple summary of actions you can undertake within the
network, as well as within host systems, to improve the service efficiency and service
quality. You’ll then examine a summary of actions that can result in levels of
differentiation in an effort to effectively provide QoS in the network environment.
Despite the efforts of many, other networking protocols are in use today, and many
networks support multiprotocol environments, so QoS within IP networks is perhaps not
all of the QoS story for many readers.
Many organizations also are in the process of migrating legacy networks to more contemporary protocols, platforms, and technologies.
It is equally important to understand the past, present, and future QoS technologies and
implementations that have been or have yet to be introduced. It doesn’t pay to reinvent
the wheel; it is much easier to learn from past mistakes (someone else’s, preferably) than
it is to foolishly repeat them. Also, network managers need to seriously consider the level
of expertise required to implement, maintain, and debug specific QoS technologies that
may be deployed in their networks. An underestimation of the proficiency level required
to comprehend the technology ultimately
could prove disastrous. It is important that the chosen QoS implementation function as
expected; otherwise, all the theoretical knowledge in the world won’t help you if your
network is broken. It is no less important that the selected QoS implementation be
somewhat cost-effective, because an economically prohibitive QoS model that may have
a greater chance of success, and perhaps more technical merit, may be disregarded in
favor of an inferior approach that is cheaper to implement. Also, sometimes brute force is
better—buying more bandwidth and more raw-switching capability is a more effective
solution than attempting a complex and convoluted QoS implementation.
It was with these thoughts in mind that we designed the approach in this book.
Part I
Chapter 1: What Is Quality of Service?
Overview
What’s in a name? In this case, not much, because Quality of Service (QoS)
sometimes has wildly varying definitions. This is partly due to the ambiguity
and vagueness of the words quality and service. With this in mind, you also
should understand the distinction between service quality and quality of
service.
Quality of Service: The Elusive Elephant
Recall the old parable of the blind men who each touch a different part of an
elephant and come away describing entirely different animals. In this case, think
of the elephant as the concept of QoS. Different people
see QoS as different concepts, because various and ambiguous QoS
problems exist. People just seem to have a natural tendency to adapt an
ambiguous set of concepts to a single paradigm that encompasses just
their particular set of problems. By the same token, this ambiguity within
QoS yields different possible solutions to various problems, which leaves
somewhat of a schism in the networking industry on the issue of QoS.
Another analogy often used when explaining QoS is that of the generic
12-step program for habitual human vices: The first step on the road to
recovery is acknowledging and defining the problem. Of course, in this
case, the immediate problem is the lack of consensus on a clear definition
of QoS.
To examine the concept of QoS, first examine the two operative words:
quality and service. Both words can be equally ambiguous. This chapter
examines why this situation exists and provides a laundry list of available
definitions.
What Is Quality?
In the context of networking, quality generally implies a process of
delivering data in a reliable manner or even somehow in a manner better than normal. This
method includes the aspects of data loss, minimal (or no) induced delay or
latency, consistent delay characteristics (that is, low jitter, or variation in
delay), and the capability to determine the most efficient use of network
resources (such as the shortest distance between two endpoints or maximum
efficiency of circuit bandwidth).
distinguishing property, so people also use quality to define particular
characteristics of specific networking applications or protocols.
What Is Service?
Service Guarantees
Consider an environment where virtually all traffic is packet-based, and structured
packet loss may not be to the detriment of an application; indeed, it may
be used as a signal to dynamically determine optimal data-flow rates.
Offering a guaranteed service of some sort implies that not only will no
loss occur, but that the performance of the network is consistent and
predictable. In a world of packet-based networks, this is a major
engineering challenge.
Two forms of latency exist, for example: real and induced. Real latency
consists of the physical, binding characteristics of the transport media
(electronic signaling and clocked speed) and the round-trip time (RTT) of
data as it traverses the distance between two points, as bound by the speed
of electromagnetic radiation. This constraint often is referred to as the
speed-of-light problem: because changing the speed at which light travels
generally is considered to be an impossibility, it forms the ultimate boundary
on how much real latency is inherently present in a network and how
quickly data can be transmitted across any arbitrary distance.
Induced latency is the delay introduced into the network by the queuing
delay in the network devices, processing delay inherent to the end-
systems, and any congestion present at intermediate points in the transit
path of the data in flight. Queuing delay is not well ordered in large
networks; queuing delay variation over time can be described only as
highly chaotic, and this resultant induced latency is the major source of
the uncertainty in protocol-level estimates of RTT.
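As a rough illustration of how real and induced latency combine, the following sketch estimates one-way delay across a long-haul path. The propagation speed, link speed, packet size, and queuing delay used here are illustrative assumptions for the sake of the arithmetic, not measurements from any particular network.

```python
# Rough one-way latency estimate: propagation + serialization + queuing.
# All figures below are illustrative assumptions, not measured values.

def one_way_latency_ms(distance_km, link_kbps, packet_bytes, queue_ms):
    """Estimate one-way latency in milliseconds."""
    # Real latency: signal propagation, roughly two-thirds the speed of light in fiber.
    propagation_km_per_ms = 200.0            # ~200,000 km/s expressed per millisecond
    propagation = distance_km / propagation_km_per_ms

    # Serialization: time to clock the packet onto the slowest link in the path.
    serialization = (packet_bytes * 8) / link_kbps   # kbps is bits per millisecond

    # Induced latency: queuing and processing delay, highly variable in practice.
    return propagation + serialization + queue_ms

# Example: 4,000 km path, 56-Kbps access link, 1,500-byte packet, 20 ms of queuing.
print(round(one_way_latency_ms(4000, 56, 1500, 20), 1), "ms")
```

Even in this toy calculation, the induced components (serialization on the slow link plus queuing) dwarf the speed-of-light component, which is why managing queuing behavior figures so prominently in any QoS discussion.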
Service quality also bears an important
relationship to human perception. Partridge refers to this phenomenon as
the “human-in-the-loop” principle—humans can absorb large amounts of
information and are sensitive to delays in the delivery and presentation of
the information. This principle describes some fascinating situations; for
example, users of a network service have a tendency to remember
infrequent failures of the network more than its overall success in
delivering information in a consistent and reliable manner. This perception
leaves users with the impression that the quality of service is poor, when
the overall quality of the service actually is quite good. As you can see,
predictable service quality is of critical importance in any QoS
implementation, because human perception is the determining factor in
the success or failure of a specific implementation.
Traffic shaping. Using a “leaky bucket” to map traffic into separate
output queues to provide some semblance of predictable behavior.
This can be done at the IP layer or lower in the stack at the
Asynchronous Transfer Mode (ATM) layer.
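A minimal sketch of the leaky-bucket idea follows. This version behaves as a meter or policer, dropping arrivals that would overflow the bucket; a true shaper would queue and delay the excess instead, but the conformance test is the same. The rate and bucket depth are illustrative assumptions.

```python
# Minimal leaky-bucket conformance check: arrivals fill the bucket, the bucket
# drains at a constant rate, and arrivals that would overflow are dropped.
# The rate and depth values are illustrative assumptions.

class LeakyBucket:
    def __init__(self, rate_bytes_per_sec, depth_bytes):
        self.rate = rate_bytes_per_sec   # constant drain (output) rate
        self.depth = depth_bytes         # maximum burst the bucket can absorb
        self.level = 0.0                 # bytes currently held in the bucket
        self.last_time = 0.0

    def offer(self, now, packet_bytes):
        """Offer a packet at time 'now' (seconds); return True if it conforms."""
        # Drain the bucket for the time elapsed since the last arrival.
        self.level = max(0.0, self.level - (now - self.last_time) * self.rate)
        self.last_time = now
        if self.level + packet_bytes > self.depth:
            return False                 # would overflow: drop (or mark) it
        self.level += packet_bytes
        return True

# Shape a bursty arrival pattern to 8,000 bytes/sec with a 4,000-byte bucket.
shaper = LeakyBucket(rate_bytes_per_sec=8000, depth_bytes=4000)
for t, size in [(0.0, 1500), (0.01, 1500), (0.02, 1500), (0.5, 1500)]:
    print(t, "accepted" if shaper.offer(t, size) else "dropped")
```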
In the earlier days of networking, the concept of QoS did not really exist;
getting packets to their destination was the first and foremost concern.
You might have once considered the objective of getting packets to their
destinations successfully as a primitive binary form of QoS—the traffic
was either transmitted and received successfully, or it wasn’t. Part of the
underlying mechanisms of the TCP/IP (Transmission Control
Protocol/Internet Protocol) suite evolved to make the most efficient use of
this paradigm. In fact, TCP [DARPA1981a] has evolved to be somewhat
graceful in the face of packet loss by shrinking its transmission window
and going into slow start when packet loss is detected, a congestion
avoidance mechanism pioneered by Van Jacobson [Jacobson1988] at
Lawrence Berkeley Laboratory. By the same token, virtually all of the IP
[DARPA1981b] routing protocols use various timer-expiration
mechanisms to detect packet loss or oscillation of connectivity between
adjacent nodes in a network.
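The window-shrinking behavior described above can be caricatured in a few lines. The following is a deliberately simplified model of the slow start and congestion avoidance interplay, not a faithful TCP implementation; real TCP also includes fast retransmit, fast recovery, and timer management.

```python
# Simplified view of TCP's reaction to loss: the congestion window (cwnd)
# grows exponentially in slow start, grows linearly in congestion avoidance,
# and collapses back to slow start when a loss is detected.  This is a toy
# model of the behavior described above, not a faithful TCP implementation.

def step_cwnd(cwnd, ssthresh, loss_detected):
    """Return (new_cwnd, new_ssthresh) after one round-trip time."""
    if loss_detected:
        ssthresh = max(cwnd // 2, 2)     # remember half the window at loss
        cwnd = 1                         # drop back into slow start
    elif cwnd < ssthresh:
        cwnd *= 2                        # slow start: exponential growth
    else:
        cwnd += 1                        # congestion avoidance: linear growth
    return cwnd, ssthresh

cwnd, ssthresh = 1, 16
for rtt in range(12):
    loss = (rtt == 6)                    # pretend a single packet is lost on RTT 6
    cwnd, ssthresh = step_cwnd(cwnd, ssthresh, loss)
    print(f"rtt={rtt:2d}  cwnd={cwnd:3d}  ssthresh={ssthresh}")
```

The point of the sketch is simply that the sender's transmission rate is driven by loss signals from the network, which is why packet drops are not merely a failure mode but an integral part of how TCP finds its operating rate.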
Prior to the transition of the NSFnet from the supervision of the National
Science Foundation in the early 1990s to the commercial Internet Service
Providers (ISPs) that now comprise the national Internet backbones in the
United States, congestion management and differentiation of services were
not nearly as critical issues as they are now. The principal interest prior to
that time was simply keeping the traffic flowing, the links up, and the
routing system stable [and complying with the NSFnet Acceptable Use
Policy (AUP), but we won’t go into that]. Not much has changed since the
transition in that regard, other than the fact that the same problems have
been amplified significantly. The ISP community still is plagued with the
same problems and, in fact, is facing many more issues of complex
policy, scaling, and stability. Only recently has the interest in QoS built
momentum.
By the same logic, there are equally compelling reasons to provide differentiated Classes of Service (CoS) in
the corporate, academic, and research network arenas. Although the competitive and
economic incentives might not explicitly apply in these types of networks, providing
different levels of CoS still may be desirable. Imagine a university campus where the
network administrator would like to give preferential treatment to professors conducting
research between computers that span the diameter of the campus network. It certainly
would be ideal to give this type of traffic a higher priority when contending for network
resources, while giving other types of best-effort traffic secondary resources.
One tried-and-true method of providing ample network resources to every
user and subscriber is to over-engineer the network. This basically implies
that there will never be more aggregate traffic demand than the network
can accommodate, and thus, no resulting congestion. Of course, it is
hardly practical to believe that this can always be accomplished, but
believe it or not, this is how many networks have traditionally been
engineered. This does not always mean, however, that all link speeds
have been engineered to accommodate more aggregate traffic than what
may be destined for them. In other words, if there is traffic from five 56-
Kbps (kilobits per second) circuits destined for a single link (or
aggregation point) in a network, you would explicitly over-provision the
aggregation point circuit to accommodate more than 100 percent of all
aggregated traffic (in this example, greater than 280 Kbps; Figure 1.1).
Instead, there may be cases when the average utilization on certain links
may have been factored into the equation when determining how much
aggregate bandwidth actually is needed in the network. Suppose that you
are aggregating five 56-Kbps links that are each averaging 10 percent link
utilization. It stands to reason that the traffic from these five links can be
aggregated into a single 56-Kbps link farther along in the network
topology, and that this single link would use only, on the average, 50
percent of its available resources (Figure 1.2). This practice is called
oversubscription and is fairly common in virtually all types of networks. It
can be dangerous, however, if traffic volumes change dramatically before
additional links can be provisioned, because any substantial increase in
traffic can result in significant congestion. If each of the five links increases
to an average 50 percent link utilization, for example, the one 56-Kbps link
acting as the aggregation point suddenly becomes a bottleneck—it now is
250 percent oversubscribed.
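The oversubscription arithmetic in this example fits in a few lines. The sketch below simply reproduces the five-by-56-Kbps figures used above and reports the aggregate offered load against the capacity of the aggregation link.

```python
# Oversubscription arithmetic from the example above: five 56-Kbps access
# links feeding a single 56-Kbps aggregation link.

def offered_load_kbps(link_speeds_kbps, utilization):
    """Aggregate traffic offered to the aggregation point, in Kbps."""
    return sum(speed * utilization for speed in link_speeds_kbps)

access_links = [56] * 5          # five 56-Kbps access circuits
aggregate_link = 56              # single 56-Kbps aggregation circuit

for util in (0.10, 0.50):
    load = offered_load_kbps(access_links, util)
    print(f"{util:.0%} utilization: {load:.0f} Kbps offered, "
          f"{load / aggregate_link:.0%} of the aggregation link")
```

At 10 percent average utilization the aggregation link carries half its capacity; at 50 percent it is offered two and a half times its capacity, which is the bottleneck condition described in the text.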
For the most part, the practice of overengineering and oversubscription has
been the standard method of ensuring that subscriber traffic can be
accommodated throughout the network. However, oversubscribing often
does not reflect sound engineering principles, and overengineering is not
always economically feasible. Oversubscription is difficult to calculate; a
sudden change in traffic volume or patterns or an innocent miscalculation
can dramatically affect the performance of the overall network. Also, the
behavior of TCP as a flow-management protocol can undermine
oversubscription engineering. In this case, oversubscription is not simply a
matter of multiplexing constant-rate sessions by examining the time and
duration patterns of the sessions, but also a matter of adding in adaptive rate
behavior within each session that attempts to sense the maximum available
bandwidth and then stabilize the data-transmission rate at this level.
Oversubscription of the network inevitably leads to congestion at the
aggregation points of the network that are oversubscribed. On the other
hand, overengineering is an extravagant luxury that is unrealistic in today's
competitive commercial network services market.
Why Hasn’t QoS Been Deployed?
Most of the larger ISPs are simply pedaling as fast as they can just to
meet the growing demand for bandwidth; they have historically not been
concerned with looking at ways to intelligently introduce CoS
differentiation. There are two reasons for this. First, the tools have been
primitive, and the implementation of these tools on high-speed links has
traditionally had a negative impact on packet-forwarding performance.
The complex queue-manipulation and packet-reordering algorithms that
implement queuing-based CoS differentiation have, until recently, been a
liability at higher link speeds. Second, reliable measurement tools have
not been available to ensure that one class of traffic actually is being
treated with preference over other classes of traffic. If providers cannot
adequately demonstrate to their customers that a quantifiable amount of
traffic is being treated with priority, and if the customer cannot
independently verify these levels of differentiation, the QoS
implementation is worthless.
There are also a couple of possible reasons why QoS has not historically
been deployed in corporate and campus networks. One plausible
reason is that on the campus network, bandwidth is
relatively cheap; it sometimes is a trivial matter to increase the bandwidth
between network segments when the network only spans several floors
within the same building or several buildings on the same campus. If this
is indeed the case, and because the traffic-engineering tools are primitive
and may negatively impact performance, it sometimes is preferable to
simply throw bandwidth at congestion problems. On a global scale,
however, overengineering is considered an economically prohibitive
luxury. Within a well-defined scope of deployment, overengineering can
be a cost-effective alternative to QoS structures.
WAN Efficiency
The QoS problem is evident when trying to efficiently use wide-area links,
which traditionally have a much lower bandwidth and much higher
monthly costs than the one-time cost of much of the campus
infrastructure. In a corporate network with geographically diverse locations
connected by a Wide-Area Network (WAN), it may be imperative to give
some mission-critical traffic higher priority than other types of application
traffic.
Several years ago, it was not uncommon for the average Local-Area
Network (LAN) to consist of a 10-Mbps (megabits per second) Ethernet
segment or perhaps even several of them interconnected by a router or
shared hub. During the same time frame, it was not uncommon for the
average WAN link to consist of a 56-Kbps circuit that connected
geographically diverse locations. Given the disparity in bandwidth, only a
very small amount of LAN traffic could be accommodated on the WAN. If
the WAN link was consistently congested and the performance proved to
be unbearable, an organization would most likely get a faster link—
unfortunately, it was also not uncommon for this to occur on a regular
basis. If the organization obtained a T-1 (approximately 1.5 Mbps), and
application traffic increased to consume this link as well, the process
would be repeated. This paradigm follows a corollary to Moore’s Law: As
you increase the capacity of any system to accommodate user demand,
user demand will increase to consume system capacity. On a humorous
note, this is also known as a “vicious cycle.”
Moore’s Law
This situation very much depends on the availability of fiber in specific
geographic areas. Some portions of the planet, and even more
specifically, some local exchange carriers within certain areas, seem to be
better prepared to deal with the growing demand for fiber. In some areas
of North America, for example, there may be a six-month lead time for an
exorbitantly expensive DS3/T3 (45 Mbps) circuit, whereas in another area,
an OC48 (2.4 Gbps) fiber loop may be relatively inexpensive and can be
delivered in a matter of days.
A Fiber-Optic Bottleneck
There is also a disparity in the availability of devices that can handle these
higher speeds. A couple of years ago, it was common for a high-speed
circuit to consist of a T3 point-to-point circuit between two routers in the
Internet backbone. As bandwidth demands grew, a paradigm shift began
to occur in which ATM was the only technology that could deliver OC3
(155 Mbps) speeds, and router platforms could not accommodate these
speeds. After router vendors began to supply OC3 ATM interfaces for
their platforms, the need for bandwidth increased yet again to OC12 (622
Mbps). The balance could seesaw back and forth like this for the foreseeable
future, but another paradigm shift could certainly occur that levels the
playing field—it is difficult to predict (Figure 1.3).
Figure 1.3: Service provider backbone growth curve.
It is also difficult to gauge how traffic patterns will change over time;
paradigm shifts happen at the most unexpected times. Until recently,
network administrators engineered WAN networks under the 80/20 rule,
where 80 percent of the network traffic stays locally within the campus
LAN and only 20 percent of local traffic is destined for other points beyond
the local WAN gateway. Given this rule of thumb, network administrators
usually could design a network with an ear to the ground and an eye on
WAN-link utilization and be somewhat safe in their assumptions.
However, remember that paradigm shifts happen at the most unexpected
times. The Internet “gold rush” of the mid-1990s revealed that the 80/20
rule no longer held true—a larger percentage of traffic was destined for
points beyond the LAN. Some parts of the world are now reporting a
complete reversal of traffic dispersion along a 25/75 pattern, where 75
percent of all delivered traffic is not only non-locally originated but
imported from an international source.
Combine this paradigm shift with the fact that LAN media speeds have
increased 100-fold in recent years and that a 1000-fold increase is right
around the corner. Also consider that although WAN link speeds have likewise
increased 100-fold, WAN circuits are sometimes economically prohibitive
and may be unavailable in certain locations. Obviously, a WAN link will be
congested if the aggregate LAN traffic destined for points beyond the
gateway is greater than the WAN bandwidth capacity. It becomes
increasingly apparent that the LAN-to-WAN aggregation points have
become a bottleneck and that disparity in LAN and WAN link speeds still
exists.
Other Changes Affecting QoS
Other methods of connecting to network services also have changed dramatically. Prior
to the phenomenal growth and interest in the Internet in the early- to mid-1990s,
providing dial-up services was generally a method to accommodate off-site staff
members who needed to connect to network services. There wasn’t a need for a huge
commodity dial-up infrastructure. However, this changed beyond all expectation as the general
public discovered the Internet and the number of ISPs offering dial-up access to the
Internet skyrocketed. Of course, some interesting problems were introduced as ISPs
tried to capitalize on the opportunity to make money and connect subscribers to the
Internet; they soon found out that customers complained bitterly about network
performance when the number of subscribers they had connected vastly exceeded
the network capacity they had provisioned. Many learned the hard way that you
can't put 10 pounds of traffic in a 5-pound pipe.
The development and deployment of the World Wide Web (WWW or the Web) fueled the
fire; as more content sites on the Internet began to appear, more people clamored to get
connected. This trend has accelerated considerably over the last few years to the point
where services once unimaginable now are offered on the Web. One of the earliest Web
browsers, Mosaic, which was developed at the National Center for Supercomputing
Applications (NCSA), was the initial implementation of the killer application that has since
driven the hunger for network bandwidth over the past few years.
Corporate networks have also experienced this trend, but not quite in the same vein.
Telecommuting—employees dialing into the corporate network and working from home—
enjoyed enormous popularity during this period and still is growing in popularity and
practice. Corporate networks also discovered the value of the Web by deploying
company information, documents, policies, and other corporate information assets on
internal Web servers for easier employee access.
The magic bullet that network administrators now are seeking has fallen into this gray
area called QoS in hopes of efficiently using network resources, especially in situations
where the demand for resources could overwhelm the bandwidth supply. You might think
that if there is no congestion, there is no problem; eventually, however, demand for more
resources will surface. Congestion-control mechanisms therefore become increasingly
important in heavily used or oversubscribed networks and can play a key role in the QoS provided.
Network administrators are looking for mechanisms to identify specific traffic types that
can be treated as first class and other traffic types to be treated as coach. Of course, this
is a simplistic comparison, but the analogy for the most part is accurate. Additional
service classes may exist as well, such as second class, third class, and so forth.
Overbooking of the aircraft, which results in someone being “bumped” and having to take
another flight, is analogous to oversubscribing a portion of the network. In terms of
networking, this means that an element’s traffic is dropped when congestion is
introduced.
Selective Forwarding
Most of the methods of providing differentiated CoS until this point have been quite
primitive. As outdated and simplistic as some methods may first appear, many still are
used with a certain degree of success today. To determine whether some of these
simpler approaches are adequate for CoS distinction in a network today, you need to
examine the problem before deciding whether the solution is appropriate. In many cases,
the benefits outweigh the liabilities; in other cases, the exact opposite is true.
In the simplest of implementations, and when parallel paths exist in the network, it would
be ideal for traffic that can be classified as priority to be forwarded along a faster path
and traffic that can be classified as best-effort to be forwarded along a slower path. As
Figure 1.4 shows, higher-priority traffic uses the higher speed and a lower hop-count
path, whereas best-effort traffic uses the lower-speed, higher hop-count path.
This simple scheme of using parallel paths is complicated by the traditional methods in
which traffic is forwarded through a network. Forwarding traffic based on its destination
is called destination-based forwarding. This also has been referred to as destination-
based routing, but this is a misnomer since a very important distinction must be made
between routing and forwarding. Various perceptions and implementations of QoS/CoS
hinge on each of these concepts.
It is important to illustrate the properties of basic IP routing protocols and packet-forwarding
mechanisms at this point.
Most distance-vector routing protocols are based on the Bellman-Ford (also known as
Ford-Fulkerson) algorithm that calculates the best, or shortest, path to a particular
destination. A classic example of a routing protocol that falls into this category is RIP, or
Routing Information Protocol [IETF1988], in which the best path between two
points is the one with the lowest hop count, or number of routers along the transit path.
Variations of modified distance-vector routing protocols also take into account certain
other metrics in the path calculation, such as link utilization, but these are not germane to
this discussion. With link-state routing protocols, such as OSPF (Open Shortest Path
First) [IETF1994a], each node maintains a database with a topology map of all other
nodes that reside within its portion or area of the network, in addition to a matrix of path
weights on which to base a preference for a particular route. When a change in the
topology occurs that is triggered by a link-state change, each node recalculates the
topology information within its portion of the network, and if necessary, installs another
preferred route if any particular path becomes unavailable. This recalculation process
commonly is referred to as the SPF (Shortest Path First) calculation. Each time a link-
state change occurs, each node in the proximity of the change must recalculate its
topology database.
Tip The SPF calculation is based on an algorithm developed by E.W. Dijkstra. This
algorithm is described in many books, including the very thorough Introduction
to Algorithms, by T. Cormen, C. Leiserson, and R. Rivest (McGraw Hill, 1990).
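As a compact illustration of the SPF calculation a link-state protocol performs over its topology database, the following sketch runs Dijkstra's algorithm over a small example graph. The nodes and link costs are arbitrary example values, not any standard OSPF metric assignment.

```python
import heapq

# Minimal Dijkstra shortest-path-first calculation over a link-state style
# topology database: node -> {neighbor: link cost}.  Topology and costs are
# arbitrary example values.

def spf(topology, source):
    """Return the lowest total cost from 'source' to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue                       # stale queue entry, already improved
        for neighbor, link_cost in topology[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist

topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(spf(topology, "A"))   # e.g. {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```

Each node in an OSPF area performs essentially this computation against the shared topology database every time a link-state change is flooded, which is why the cost of frequent recalculation matters as networks grow.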
To complicate matters further, an underlying frame relay or ATM switched network also
can provide routing, but not in the same sense that IP routing protocols provide routing
for packet forwarding. Packet routing and forwarding protocols are connectionless, and
determinations are made on a hop-by-hop basis on how to calculate topology information
and where to forward packets. Because ATM is a connection-oriented technology,
virtual-circuit establishment and call setup are analogous to the way in which calls are
established in the switched telephone network. These technologies do not have any
interaction with higher-layer protocols, such as TCP/IP, and do not explicitly forward
packets. These lower-layer technologies are used to explicitly provide transport—to build
an end-to-end link-layer path that higher-layer protocols can use to forward packets. Frame
Relay and ATM routing mechanisms, for example, are transparent to higher-level
protocols and are designed primarily to accomplish VC (virtual circuit) rerouting in
situations where segments of the inter-switch VC path may fail between arbitrary
switches (Figure 1.5). Additionally, the ATM Forum’s specification for the PNNI (Private
Network-to-Network Interface) protocol also provides for QoS variables to be factored
into path calculations, which you will examine in more detail in Chapter 6, “QoS and
ATM.” Frame relay switching environments have similar routing mechanisms, but VC
rerouting is vendor-specific because of the current lack of an industry standard.
Tip Detailed PNNI specification documents are available from the ATM Forum at
www.atmforum.com.
Packet Forwarding
Packet forwarding is exactly what it sounds like—the process of forwarding traffic through
the network. Routing and forwarding are two distinct functions (Figure 1.6). Forwarding
tables in a router are built by comparing incoming traffic to available routing information
and then populating a forwarding cache if a route to the destination is present. This
process generally is referred to as on-demand or traffic-driven route-cache population. In
other words, when a packet is received by a router, it checks its forwarding cache to
determine whether it has a cache entry for the destination specified in the packet. If no
cache entry exists, the router checks the routing process to determine whether a route
table entry exists for the specified destination. If a route table entry exists (to include a
default route), the forwarding cache is populated with the destination entry, and the
packet is sent on its way. If no route table entry exists, an ICMP (Internet Control
Message Protocol) [DARPA1981c] unreachable message is sent to the originator, and
the packet is dropped. An existing forwarding cache entry would have been created by
previous packets bound for the same destination. If there is an existing forwarding cache
entry, the packet is sent to the appropriate outbound interface and sent on its way to the
next hop.
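The traffic-driven cache population just described can be sketched as follows. The route table, forwarding cache, and ICMP response here are simplified stand-ins (dictionaries and strings); a real router performs longest-prefix matching and generates actual ICMP messages.

```python
# Traffic-driven forwarding-cache population, as described above.  The route
# table and cache are plain dictionaries keyed by destination prefix; a real
# router would do longest-prefix matching and generate a real ICMP message.

route_table = {"10.1.0.0/16": "if0", "0.0.0.0/0": "if1"}   # includes a default route
forwarding_cache = {}

def lookup_route(destination):
    """Return the outbound interface for a destination (simplified lookup)."""
    # Stand-in for longest-prefix matching: exact prefix, else the default route.
    return route_table.get(destination) or route_table.get("0.0.0.0/0")

def forward(destination):
    if destination in forwarding_cache:               # cache hit: fast path
        return f"forwarded via {forwarding_cache[destination]}"
    interface = lookup_route(destination)             # cache miss: consult routing
    if interface is None:
        return "ICMP destination unreachable sent; packet dropped"
    forwarding_cache[destination] = interface         # populate the cache on demand
    return f"forwarded via {interface} (cache populated)"

print(forward("10.1.0.0/16"))   # first packet: cache miss, entry created
print(forward("10.1.0.0/16"))   # subsequent packet: cache hit
print(forward("192.0.2.0/24"))  # falls through to the default route
```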
Tip Currently, several router vendors are working to develop policy-based
forwarding in their products that has a negligible impact on forwarding
performance.
As mentioned previously, the case of forwarding traffic along parallel paths becomes
problematic, because no practical mechanism exists to forward traffic based on origin. As
Figure 1.7 shows, it is trivial to advertise a particular destination or group of destinations
along a particular path and another destination along another path. Forwarding traffic
along parallel paths is easily accomplished by using static routing or filtering dynamic
route propagation. Unfortunately, this method deviates from the desire to forward traffic
based on source.
More elaborate variations on source-based routing are discussed later, but first you should
examine a fundamental problem in using forwarding strategies as a method for
implementing differentiation.
Insurmountable Complexity?
It has been suggested that most of the methods of implementing a QoS scheme are
much too complex and that until the issues surrounding QoS are simplified, a large
majority of network engineering folks won’t even consider the more complicated
mechanisms currently available. Not to cast a shadow on any particular technology, but
all you need to do to understand this trepidation is to explore the IETF Web site to learn
more about RSVP (Resource ReSerVation Setup Protocol) and the Integrated Services
working group efforts within the IETF.
The work produced by the various working groups (and individuals) within the IETF is
documented in Internet Drafts (I-Ds). After the I-Ds are refined to a final document,
they generally are advanced and published as RFCs (Requests For Comments).
Virtually all the core networking protocols in the Internet are published as RFCs,
and more recently, even a few technologies that are not native to the Internet have
been published as RFCs. For more information on the IETF, see www.ietf.org and
for more information on RSVP, see www.ietf.org/html.charters/rsvp-charter.html.
Although most of the technical documentation produced within the IETF has been
complex to begin with, the complexity of older protocols and other technical
documentation pales in comparison to the complexity of the documents recently
produced by the RSVP and Integrated Services working groups. It is quite
understandable why some people consider some of this technology overwhelming.
And there is a good reason why these topics are complex: The technology is difficult. It is
difficult to envision, it is difficult to develop, it is difficult to implement, and it is difficult to
manage. And just as with any sufficiently new technology, it will take some time to evolve
and mature. And it also has its share of naysayers who believe that simpler methods are
available to accomplish approximately the same goals.
You will examine the IETF RSVP and Integrated Services architecture in Chapter 7, but it
is noteworthy to mention at this point that because of the perceived complexity involved,
some people are shying away from attempting to implement QoS using RSVP and
Integrated Services today. Whether the excessive complexity involved in these
technologies is real or imagined is not important. For the sake of discussion, you might
consider that perception is reality.
One of the more unfortunate aspects of both ATM and Frame Relay is that although both
provide a great deal of flexibility, they also make it very easy for people to build sloppy
networks. In very large networks, it is unwise to attempt to make all Layer 3 devices
(routers) only one hop away from one another. Depending on the routing protocol being
used, this may introduce instability into the routing system, as the network grows larger,
because of the excessive computational overhead required to maintain a large number of
adjacencies. You’ll look at scaling and architectural issues in Chapter 2, “Implementing
Policy in Networks,” because these issues may have a serious impact on the ability to
provide QoS.
Problems of this nature are due to the “ships in the night” paradigm that exists when
devices in the lower layers of the OSI (Open Systems Interconnection) reference model
stack (in this case, the link layer) also provide buffering and queuing and in some cases,
routing intelligence (e.g., PNNI). This is problematic, because traditional network
management tools are based on SNMP (Simple Network Management Protocol)
[IETF1990a], which is reliant on UDP (User Datagram Protocol) [DARPA1980] and the
underlying presence of IP routing. Each of these protocols (SNMP, UDP, and IP) is at a
different layer in the TCP/IP protocol stack, yet there is an explicit relationship between
them; SNMP relies on UDP for transport and an IP host address for object identification,
and UDP relies on IP for routing. Also, a clean, black-and-white comparison cannot be
made of the relationship between the TCP/IP protocol suite and the layers in the OSI
reference model (Figure 1.8).
The problem represented here is one of no interlayer communication between the link-
layer (ATM, frame relay, et al.) and the network-layer and transport-layer protocols (e.g.,
IP, UDP, and TCP)—thus the “ships in the night” comparison. The higher-layer protocols
have no explicit knowledge of the lower link-layer protocols, and the lower link-layer
protocols have no explicit knowledge of higher-layer protocols. Thus, for an organization
to manage both a TCP/IP-based network and an underlying ATM (link-layer) switched
network, it must maintain two discrete management platforms. Also, it is not too difficult
to imagine situations in which problems may surface within the link-layer network that are
not discernible to troubleshooting and debugging tools that are native to the upper-layer
protocols. Although link-layer-specific MIB (SNMP management information base)
variables may exist to help isolate link-layer problems, it is common to be unable to
recognize specific link-layer problems with SNMP. This can be especially frustrating in
situations in which ATM VC routing problems may exist, and the Network Operations
Center (NOC) staff has to quickly isolate a problem using several tools. This is clearly a
situation in which too much diversity can be a liability, depending on the variety and
complexity of the network technologies, the quality of the troubleshooting tools, and the
expertise of the NOC staff.
This problem becomes even more complicated when the network in question is
predominantly multiprotocol, or even when a small amount of multiprotocol traffic exists
in the network. The more protocols and applications present in the network, the more
expertise required of the support staff to adequately isolate and troubleshoot problems in
the network.
Differentiated Classes of Service (CoS)
When most people refer to quality of service, they usually are referring
to differentiated classes of service, whether or not they realize it, coupled with perhaps a
few additional mechanisms that provide traffic policing, admission control, and
administration. Differentiation is the operative word here, because before you can
provide a higher quality of service to any particular customer, application, or protocol, you
must classify the traffic into classes and then determine the way in which to handle the
various traffic classes as traffic moves throughout the network. This brings up several
important concepts.
When differentiation is performed, it is done to identify traffic by some unique criteria and
to classify the incoming traffic into classes, each of which can be recognized distinctly by
the classification mechanisms at the network ingress point, as well as farther along in the
network topology. The differentiation can be done in any particular manner, but some of
the more common methods of initially differentiating traffic consist of identifying and
classifying traffic by the following criteria (a simple classification sketch follows this list):
Protocol. Network and transport protocols such as IP, TCP, UDP, IPX, and so on.
Source protocol port. Application-specific protocols, such as Telnet, IPX SAPs, and
so on, as identified by their source port.
Source host address. Protocol-specific host address indicating the originator of the
traffic.
Source device interface. Interface on which the traffic entered a particular device,
otherwise known as an ingress interface.
Flow. A combination of the source and destination host address, as well as the
source and destination port.
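As referenced above, these criteria map naturally onto a simple classifier that inspects each packet at the network ingress point and assigns it a class. The packet field names, class names, and rule ordering below are illustrative assumptions, not any standard classification scheme.

```python
# Simple ingress traffic classifier based on the criteria listed above:
# protocol, source port, source address, ingress interface, and flow.
# Field names, class names, and rule ordering are illustrative assumptions.

def classify(packet):
    """Assign a service class to a packet (a dict of header fields)."""
    flow = (packet["src"], packet["dst"], packet["sport"], packet["dport"])

    if packet["src"].startswith("10.1."):        # source host address criterion
        return "research", flow
    if packet["proto"] == "udp" and packet["dport"] == 53:
        return "dns", flow                       # protocol + destination port criterion
    if packet["ingress"] == "dorm-lan":          # ingress interface criterion
        return "best-effort", flow
    return "default", flow

packets = [
    {"src": "10.1.2.3", "dst": "192.0.2.7", "proto": "tcp",
     "sport": 40000, "dport": 80, "ingress": "lab-lan"},
    {"src": "172.16.9.9", "dst": "192.0.2.53", "proto": "udp",
     "sport": 5353, "dport": 53, "ingress": "dorm-lan"},
]
for p in packets:
    service_class, flow = classify(p)
    print(service_class, flow)
```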
Figure 1.9: Differentiation and active policing.
Based on the preceding criteria, a network edge device can take several courses of
action after the traffic identification and classification is accomplished. One of the
simplest courses of action to take is to queue the various classes of traffic differently to
provide diverse servicing classes. Several other choices are available, such as
selectively forwarding specific classes of traffic along different paths in the network. You
could do this by traditional (or nontraditional) packet-forwarding schemes or by mapping
traffic classes to specific Layer 2 (frame relay or ATM VCs) switched paths in the network
cloud. A variation of mapping traffic classes to multiple Layer 2 switched paths is to also
provide different traffic-shaping schemes or congestion thresholds for each virtual circuit
in the end-to-end path. In fact, there are probably more ways to provide differentiated
service through the network than are outlined here, but the important aspect of this
exercise is that the identification, classification, and marking of the traffic is the
fundamental concept in providing differentiated CoS support. Without these basic
building blocks, chances are that any effort to provide any type of QoS will not provide
the desired behavior in the network.
The first issue a network administrator must tackle, however, is understanding the problem
to be solved. Understanding the problem is paramount in determining the appropriate
solution. There are almost as many reasons for providing and integrating QoS/CoS support
into the network as there are methods of doing so—among them, capacity management
and value-added services. On the other hand, understanding the benefits and liabilities of
each approach can be equally important. Implementing a particular technology to achieve
QoS differentiation without completely understanding the limitations of a particular
implementation can yield unpredictable and perhaps disastrous results.
Chapter 2: Implementing Policy in Networks
Overview
QoS Architectures
What architectural choices are available to support QoS environments? To understand
the architecture of the application of QoS in an Internet environment, you first should
understand the generic architecture of an Internet environment and then examine where
and how QoS mechanisms can be applied to the architecture.
The key to scaling very large networks is to maintain strict levels of hierarchy, commonly
referred to as core, distribution, and access levels (Figure 2.2). This limits the degree of
meshing among nodes. The core portion of the hierarchy generally is considered the
central portion, or backbone, of the network. The access level represents the outermost
portion of the hierarchy, and the distribution level represents the aggregation points and
transit portions between the core and access levels of the hierarchy.
The core routers can be interconnected by using dedicated point-to-point links. Core
systems also can use high-speed switched systems such as ATM as the transport
mechanism. The core router’s primary task is to achieve packet switching rates that are
as fast as possible. To achieve this, the core routers assume that all security filtering
already has been performed and that the packet to be switched is being presented to the
core router because it conforms to the access and transmission policy of the network.
Accordingly, the core router has the task of determining which
interface to pass the packet to, and by removing all other access-related tasks from the
router, higher switching speeds can be achieved. Core routers also have to actively
participate in routing environments, passing connectivity information along within the
routing system, and populating the forwarding tables with routing information based on
the status of the routing information.
Access routers use a different design philosophy. Access routers have to terminate a
number of customer connections, typically of lower speed than the core routing systems.
Access routers do not need to run complete routing systems and can be designed to run
with a smaller routing table and a default route pointing toward a next-hop core router.
Access routers generally do not have to handle massive packet switching rates driving
high-speed trunk circuits, and they typically have to manage a more complex traffic-filtering
environment in which packets to and from the customer are passed through a packet
filter to ensure the integrity of the network. A typical filtering
environment checks the source address of each packet passed from the customer
to ensure that it is one of the addresses the
customer has declared an intention to advertise via the upstream provider (to prevent a
form of source address spoofing). The filtering environment also may check packets
passed along to the customer, generally checking the source address to ensure that it is
not one advertised by the customer.
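The ingress check described above amounts to testing each customer packet's source address against the prefixes that customer has declared. The sketch below shows the idea using Python's ipaddress module; the customer prefixes are made-up documentation addresses.

```python
import ipaddress

# Ingress source-address filter of the kind described above: a packet arriving
# from a customer is accepted only if its source address falls inside one of
# the prefixes that customer has declared.  The prefixes are made-up examples.

customer_prefixes = [ipaddress.ip_network("203.0.113.0/24"),
                     ipaddress.ip_network("198.51.100.0/25")]

def accept_from_customer(source_ip):
    """Return True if the packet's source address matches a declared prefix."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in prefix for prefix in customer_prefixes)

print(accept_from_customer("203.0.113.45"))   # True: inside a declared prefix
print(accept_from_customer("192.0.2.10"))     # False: likely spoofed, so drop it
```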
Scaling a large network is also a principal concern, especially at high speeds, and it is an
issue you need to examine carefully in the conceptual design and planning stages.
Striking the right hierarchical balance can make the difference between a high-
performance network and one on the verge of congestion collapse. It is important to
recognize the architectural concepts of applying policy control in a large network. This is
necessary to minimize the performance overhead associated with different mechanisms
that provide traffic and route filtering, bandwidth control (rate-limiting), routing policy,
congestion management, and differentiated classes of service. Scaling a network
properly also plays a role in whether quality can be maintained in the network. If the network is
unstable because of sloppy design, architecture, or implementation, attempting to
introduce a QoS scheme into the network is an effort in futility.
It is also important to consider where specific policies are implemented in the
network topology, because mechanisms used to achieve a specific policy may have an
adverse effect on the overall network performance. Because high-speed traffic
forwarding usually is performed in the network core, and lower-speed forwarding
generally is done lower in the network hierarchy, varying degrees of performance impact
can result, depending on where these types of policies are implemented. In most cases,
a much larger percentage of traffic transits the network core than transits any particular
access point, so implementing policy in the network core has a higher degree of impact
on a larger percentage of the traffic. Therefore, policy implementation in large networks
should be done at the lowest possible levels of the hierarchy to avoid performance
degradation that impacts the entire network (noting that in this architectural model, the
access level occupies the lowest level in the hierarchy). Traffic and route filtering, for
example, are types of policies that may fall into this category.
Implementing policy lower in the hierarchy has only a nominal impact on the overall
performance of the network compared to the impact the same mechanism may have
when implemented in the core, for several reasons. Some mechanisms, such as traffic
filtering, have less of an impact on forwarding performance on lower-speed lines.
Because the speed of internode connectivity generally gets faster as you go higher in the
network hierarchy, the impact of implementing policy in the higher levels of the network
hierarchy increases.
The same principle holds true for traffic accounting, access control, bandwidth
management, preference routing, and other types of policy implementations. These
mechanisms are more appropriately applied to nodes that reside in the lower portions of
the network hierarchy, where processor and memory budgets are less critical.
If traffic congestion is a principal concern in the distribution and core levels of the network
hierarchy, you should consider the implementation of an intelligent congestion-
management mechanism, such as Random Early Detection (RED) [Floyd1993], in these
portions of the network topology. You’ll examine RED in more detail in Chapter 3, “QoS
and Queuing Disciplines, Traffic Shaping, and Admission Control.”
Another compelling reason to push policy out toward the network periphery is to maintain
stability in the core of the network. In the case of BGP (Border Gateway Protocol)
[IETF1995a] peering with exterior networks, periods of route instability may exist between
exterior routing domains, which can destabilize a router’s capability to forward traffic at an
optimal rate. Pushing exterior peering out to the distribution or access levels of the network
hierarchy protects the core network and minimizes the destabilizing effect on the network’s
capability to provide maximum performance. This fundamental approach allows the overall
network to achieve maximum performance and maintain stability while preserving the
capability to scale the network to a much larger size and higher speeds, and it is this
stability that underpins service quality.
The first approach is a stateless one, in which each router applies a locally configured
rule set that matches fields in the packet header and places the packet in an output
queue of the corresponding precedence. Note that this stateless queuing-based action is relevant only when the
incoming packet rate exceeds the egress line capacity. When the network exhibits lower
load levels, such QoS mechanisms will have little or no impact on traffic performance.
This stateless approach produces a number of drawbacks. First, it requires the universal
adoption of rule sets in every router in the network, the consequent checking of each
packet against the rule sets for a match, and the modification of the standard FIFO
(First In, First Out) queuing algorithm to support multiple priority levels, all of which is
highly inefficient. For this QoS structure to be effective, the core routers may need to be
loaded with the same rule sets as the access routers, and if the rule sets include a number
of specific cases with corresponding queuing instructions, the switching load for each
packet increases dramatically, causing the core network to require higher-cost switching
hardware or to accept increased switching time (induced latency). Neither course of action
scales well. Additionally, the operator must maintain consistent rule sets and associated
queuing instructions on all routers on the network, because queuing delay is
accumulated across all routers in the end-to-end path.
Thus, stateless, locally configured QoS architectures suffer from two major drawbacks:
the architecture does not scale well into the core of the network, and the operational
management overhead is prone to inconsistencies.
The second approach is a modification to the first, where the access routers modify a
field in the packet header to reflect the QoS service level to be allocated to this packet.
This marking of the packet header can be undertaken by the application of a rule set
similar to that described earlier, where the packet header matching rules can be applied
to each packet entering the network. The consequent action at this stage is to set the IP
precedence header field to the desired QoS level in accordance with the rule set. Within
the core network, the rule set now is based on the IP precedence field value (QoS value)
in every packet header. Queuing instructions or discard directives then can be expressed
in terms of this QoS value in the header instead of attempting to undertake a potentially
lengthy matching of the packet header against the large statically configured rule set.
To extend this example, you can use the 3-bit IP precedence field to represent eight
discrete QoS levels. All access routers could apply a common rule set to incoming
packets so that Usenet packets (matching the TCP, or Transmission Control Protocol,
destination port against the decimal value 119) may have their IP precedence field set to 6, whereas
Telnet and RLOGIN packets have their IP precedence field set to 1. All other packets
have their QoS header field cleared to the value 0, which resets any values that may
have been applied by the customer or a neighbor transit network. The core routers then
can be configured to use a relatively simple switching or queuing algorithm in which all
packets with IP precedence values of 6 are placed in the low-priority queue, all packets
with values of 1 are placed in the high-priority queue, and all other packets are placed
within the normal queuing process. The router’s scheduler may use a relative weighting
of scheduling four packets from the high-precedence queue for every two from the
“normal” queue to every single packet from the low-precedence queue. This architectural
approach attempts to define the QoS level of every packet on entry to the network, and
thereby defines the routers’ scheduling of the packet on its transit through the network.
The task of the core routers can be implemented very efficiently, because the lookup is to
a fixed field in the packet and the potential actions are limited by the size of the field.
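The following Python sketch illustrates, under simplifying assumptions, the two-stage model just described: a marking function that an access router might apply, and a weighted servicing rotation that a core router could use against the precedence value alone. The port numbers and the 4:2:1 weighting follow the example in the text; the packet representation and function names are invented for illustration.

# Access routers mark the 3-bit precedence field; core routers schedule on it.
from collections import deque

def mark_precedence(dst_port):
    if dst_port == 119:            # Usenet (NNTP) -> precedence 6 (low priority)
        return 6
    if dst_port in (23, 513):      # Telnet, RLOGIN -> precedence 1 (high priority)
        return 1
    return 0                       # everything else: reset to normal

queues = {"high": deque(), "normal": deque(), "low": deque()}
queue_for = {1: "high", 0: "normal", 6: "low"}
weights = {"high": 4, "normal": 2, "low": 1}   # 4:2:1 service ratio

def enqueue(packet, dst_port):
    queues[queue_for[mark_precedence(dst_port)]].append(packet)

def schedule_round():
    """One scheduler rotation: drain up to 'weight' packets per queue."""
    sent = []
    for name in ("high", "normal", "low"):
        for _ in range(weights[name]):
            if queues[name]:
                sent.append(queues[name].popleft())
    return sent

for i, port in enumerate([23, 119, 80, 23, 119, 80, 513, 80]):
    enqueue("pkt%d" % i, port)
print(schedule_round())   # high-priority packets drain first, 4:2:1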
The third approach to QoS architecture is to define a flow state within a sequence of
routers, and then mark packets as being a component of the defined flow. The
implementation of flows is similar to switched-circuit establishment, in which an initial
discovery packet containing characteristics of the subsequent flow is passed through the
network, and the response acknowledgment commits each router in the path to support
the flow.
The major architectural issues behind QoS networks that maintain flow-state information are
predominantly concerns about the control over the setting of QoS behavior and the
transitivity of QoS across networks, as well as the computational and system-resource
impact on maintaining state information on thousands of flows established across a diverse
network.
Who can trigger QoS behavior within the network? Of the models
discussed in the preceding section, the first two implicitly assume that the
network operator sets the QoS behavior and that the customer has no
control over the setting. This may not be the desired behavior of a QoS
environment. Instead of having the network determine how it will
selectively degrade service performance under load, the network operator
may allow the customer to determine which packets should be handled
preferentially (at some premium fee, presumably). Of course, if customers
can trigger QoS behavior by generating QoS labeled packets, they may
also want to probe the network to determine whether setting the QoS bits
would make any appreciable difference in the performance of the end-to-
end application.
Obviously, there is an implicit expectation that if the packet is passed to a
neighbor network, the packet priority field will have a consistent
interpretation, or at worst, the field will not be interpreted at all. If the
neighbor network had chosen an opposite interpretation of field values,
with 1 as low priority and 6 as high priority, obviously, the resultant traffic
flow would exaggerate degradation levels. Packets passed quickly
through the first network will have a very high probability of delay, and
even discard, through the second network, whereas packets passed with
high delay through the first network without discard then are passed
quickly through the second network.
This type of problem has a number of solutions. One is that the network
operator must translate the QoS field values of all packets being passed
to a neighbor network into corresponding QoS field values that preserve
the original QoS semantics as specified by the originating customer.
Obviously, this type of manipulation of the packet headers involves some
per-packet processing cost in the router. Accordingly, this approach does
not scale easily in very high-speed network interconnect environments. A
second solution is that a number of network operators reach agreement
that certain QoS field values have a fixed QoS interpretation, so that the
originally set QoS values are honored consistently by all networks in the
end-to-end path.
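A minimal sketch of the first solution, translating precedence values at the border so that the neighbor's opposite interpretation preserves the original semantics, might look like the following; the value mapping assumes the two interpretations described above.

# Rewrite precedence values at the border between two networks whose
# interpretations are opposed (ours: 6 = low, 1 = high; theirs: 1 = low, 6 = high).
local_to_neighbor = {6: 1, 1: 6, 0: 0}   # hypothetical translation table

def translate_precedence(packet_precedence):
    # Unknown values are passed through unchanged (a policy choice).
    return local_to_neighbor.get(packet_precedence, packet_precedence)

for value in (6, 1, 0):
    print(value, "->", translate_precedence(value))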
Routing and Policy
There are basically two types of routing policy: intradomain and interdomain. Intradomain
policy is implemented within a network that is a single administrative domain, such as
within a single AS (Autonomous System) in the Internet or a single corporate network.
Interdomain policy deals with more than a single administrative network domain, which is
much more difficult, because different organizations usually have diverse policies and,
even when their policies may be similar, they rarely can agree on similar methods of
implementing them. When dealing with policy issues, it is much easier to implement
policy when working within a single organization, where you can control the end-to-end
network paths. When traffic leaves your administrative domain, you lose control over the
interaction of policy and technology.
As an illustration of intradomain routing control, Figure 2.3 shows that from node A, there
are three possible paths to node 10.1.1.15. One path (path B) is of much higher
speed (T3) than the other two (paths A and C), yet a lower-speed link (path A) has a
lower hop count. The path-selection process in this case would very much depend on the
interior routing protocol being used. If RIP (Routing Information Protocol) were the routing
protocol in use, router Z would use path A to forward traffic toward 10.1.1.15,
because its hop count of 5 is the lowest. However, an administrator who maintained the routers in this
network could artificially increase the hop count at any point within this network to force
the path-selection process to favor one path over another.
Figure 2.3: Controlling routing policy within a single administrative domain (RIP).
If OSPF (Open Shortest Path First) is the interior routing protocol being used, each link in
the network can be assigned a cost, and OSPF will choose the least-cost path based
on the sum of the costs of each link in the transit path to the appropriate
destination. In this particular example (Figure 2.4), router Z would choose the lowest-cost
path (20), because it has a lower end-to-end path cost than the other two paths.
This is fairly straightforward, because this path is also the one with the higher bandwidth. For
policy reasons, though, an administrator could alternatively cost each path differently so
that a lower-bandwidth path is preferred over a higher-bandwidth path.
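The underlying computation is a shortest-path calculation over the assigned link costs. The following sketch uses an invented topology (it is not Figure 2.4) to show how the summed link costs, rather than the hop count, determine the selected path.

# OSPF-style least-cost path selection over administratively assigned costs.
import heapq

links = {                     # (node, node): assigned link cost (illustrative)
    ("Z", "A"): 10, ("A", "D"): 10,                   # path via A: total cost 20
    ("Z", "B"): 15, ("B", "C"): 15, ("C", "D"): 15,   # path via B, C: total cost 45
}
graph = {}
for (u, v), cost in links.items():
    graph.setdefault(u, []).append((v, cost))
    graph.setdefault(v, []).append((u, cost))

def least_cost_path(src, dst):
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + c, nxt, path + [nxt]))
    return None

print(least_cost_path("Z", "D"))   # (20, ['Z', 'A', 'D']): lowest summed cost wins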
Similar methods of using other interior routing protocol metrics to manually prefer one
path over another are available. The important point is that these mechanisms
can be used to implement an alternative forwarding policy, by static configuration, in place of
what the native mechanisms of a dynamic routing protocol would choose without human
intervention.
Predictive path selection and control of routing information between different ASs also is
important to understand; more important, it is vital to understand what is possible and
what is not possible to control. As mentioned earlier, it is rarely possible to control path
selection for traffic coming into your AS because of the routing policies of exterior
neighboring ASs. There are several ways to control how traffic exits an interior routing
domain, and there are a couple of ways to control how traffic enters your domain that
may prove feasible in specific scenarios. You will briefly look at a few of these scenarios
in a moment.
CIDR’s Influence
A fundamental concept in route aggregation was introduced by CIDR (Classless
InterDomain Routing), which allows for contiguous blocks of address space to be
aggregated and advertised as singular routes instead of individual entries in a routing
table. This helps keep the size of the routing tables smaller than if each address were
advertised individually and should be done whenever possible. The Internet community
generally agrees that aggregation is a good thing.
Tip The concepts of CIDR are discussed in RFC1517, RFC1518, RFC1519, and
RFC1520. You can find the CIDR Frequently Asked Questions (FAQ) at
www.rain.net/faqs/cidr.faq.html.
Tip For a comparison of classful and classless address notation, see RFC1878,
“Variable Length Subnet Table For IPv4” (T. Pummill and B. Manning,
December 1995).
However, CIDR does reduce the granularity of routing decisions. In aggregating a number
of destinations into a single routing advertisement, routing policy then can be applied only to
the aggregated routing entry. Specific exceptions for components of the aggregated routing
block can be generated by advertising the exception in addition to the aggregate, but the
cost of this disaggregation is directly related to the extent to which this differentiation of
policy must be advertised. A local differentiation within a single provider would impose a
far lower incremental cost than a global differentiation that was promulgated across the
entire Internet, but in both cases the cost is higher than no differentiation at all. This conflict
between the desire for fine-level granularity of differentiated routing policy and the desire to
deploy very coarse granularity to preserve functionality in the face of massive scaling is very
common within today’s Internet.
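As a small worked example of aggregation and of a more-specific exception, the following sketch uses hypothetical address blocks:

# Sketch of CIDR aggregation, plus a more-specific exception advertised
# alongside the aggregate.  Address blocks are illustrative only.
import ipaddress

customer_blocks = [ipaddress.ip_network(p) for p in
                   ("203.0.113.0/26", "203.0.113.64/26",
                    "203.0.113.128/26", "203.0.113.192/26")]

# Four contiguous /26s collapse into a single /24 advertisement.
aggregate = list(ipaddress.collapse_addresses(customer_blocks))
print(aggregate)                      # [IPv4Network('203.0.113.0/24')]

# A policy exception for one component is expressed by advertising the
# more-specific prefix in addition to the aggregate.
advertisements = aggregate + [ipaddress.ip_network("203.0.113.64/26")]
print(advertisements)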
There are several methods of influencing the BGP path-selection process, each of which
has topological significance. In other words, these tools have very explicit purposes and
are helpful only when used in specific places in the network topology. These tools
revolve around the manipulation of certain BGP protocol attributes. Certain BGP attribute
tools have very specific uses in some situations and are inappropriate in others. Also,
some of these tools are nothing more than tie-breakers to be used when the same
prefix (route) is advertised across parallel paths. A path for which a more specific prefix is
announced will always be preferred over a path that advertises a shorter (less specific)
aggregate. This is a fundamental aspect of longest-match, destination-based routing.
The tools that can provide path-selection influence are:
• AS path prepending
• BGP communities
• Local preference
• Multi-Exit Discriminator (MED)
As depicted in Figure 2.6, a route for the prefix 199.1.1.0/24 that has traversed AS200 to
reach AS100 will have an AS path of 200 300 when it reaches AS100. This is because
the route originated in AS300. If multiple prefixes were originated within AS300, the route
propagation of any particular prefix could be blocked, or filtered, virtually anywhere along
the transit path based on the AS, in order to deny a forwarding path for a particular prefix
or set of prefixes. The filtering could be done on an inbound (accept) or outbound
(propagate) basis, across multiple links, based on any variation of the data found in the
AS path attribute. Filtering can be done on the origin AS, any single AS, or any set of
ASs found in the AS path information. This provides a great deal of granularity for
selectively accepting and propagating routing information in BGP. By the same token,
however, if an organization begins to get exotic with its routing policies, the level of
complexity increases dramatically, especially when multiple entry and exit points exist.
The downside to filtering routes based on information found in the AS path attribute is
that it only really provides binary selection criteria; you either accept the route and
propagate it, or you deny it and do not propagate it. This does not provide a mechanism
to define a primary and backup path.
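A simple sketch of such binary accept/deny filtering on the AS path might look like the following; the filter conditions are hypothetical policy, and the AS numbers mirror the example above.

# Accept/deny route filtering keyed on the AS path attribute.
def accept_route(prefix, as_path):
    origin = as_path[-1]          # last AS in the path originated the prefix
    if origin == 300:             # deny anything originated by AS300
        return False
    if 666 in as_path:            # deny anything that transited AS666
        return False
    return True                   # otherwise accept and propagate

print(accept_route("199.1.1.0/24", [200, 300]))   # False: originated by AS300
print(accept_route("10.0.0.0/8",  [200, 400]))    # True: accepted and propagated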
AS Path Prepend
When comparing two advertisements of the same prefix with differing AS paths, the
default action of BGP is to prefer the path with the lowest number of transit AS hops, or
in other words, the preference is for the shorter AS path length. To complicate matters
further, it is also possible to manipulate the AS path attribute in an attempt to influence this
form of downstream route selection. This process is called AS path prepending, and it takes
its name from the practice of inserting additional instances of the originating AS at the
beginning of the AS path prior to announcing the route to an exterior neighbor. However,
this method of attempting to influence downstream path selection is feasible only when
comparing prefixes of the same length, because an instance of a more specific prefix
always is preferable. In other words, in Figure 2.7, if AS400 automatically aggregates
199.1.1.0/24 into a larger announcement, say 199.1.1.0/23, and advertises it to its
neighbor in AS100, while 199.1.1.0/24 is advertised from AS200, then AS200 continues
to be the preferred path because it is a more specific route to the destination
199.1.1.0/24. The AS path length is examined only if identical prefixes are advertised
from multiple external peers.
Similar to AS path filtering, AS path prepending is a negative biasing of the BGP path-
selection process. The lengthening of the AS path is an attempt to make the path less
desirable than would otherwise be the case. It commonly is used as a mechanism of
defining candidate backup paths.
BGP Communities
Another popular method of making policy decisions using BGP is to use the BGP
community attribute to group destinations into communities and apply policy decisions
based on this attribute instead of directly on the prefixes. Recently, this mechanism has
grown in popularity, especially in the Internet Service Provider (ISP) arena, because it
provides a simple and straightforward method with which to apply policy decisions. As
depicted in Figure 2.8, AS100 has classified its downstream peers into BGP communities
100:10 and 100:20. These values need only have meaning to the administrator of
AS100, because he is the one using these community values for making policy
decisions. He may have a scheme for classifying all his peers into these two
communities, and depending on a prearranged agreement between AS100 and all its
peers, these two communities may have various meanings.
One of the more common applications is one in which community 100:10 could represent
all peers who want to receive full BGP routing from AS100, and community 100:20 could
represent all peers who want only partial routing. The difference in semantics here is that
full routing includes all routes advertised by all neighbors (generally considered the entire
default-free portion of the global Internet routing table), and partial includes only prefixes
originated by AS100, including prefixes originated by directly attached customer
networks. The latter could be a significantly smaller number of prefixes, depending on the
size and scope of AS100’s customer base.
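As a rough sketch of how such a community convention might be applied, the following uses the hypothetical 100:10 and 100:20 values to select between full and partial routing for a peer; the route entries are invented for illustration.

# Select the routes advertised to a peer based on its community classification.
routes = [
    {"prefix": "10.1.0.0/16",   "origin_as": 100, "is_local": True},
    {"prefix": "172.16.0.0/12", "origin_as": 300, "is_local": False},
    {"prefix": "192.168.0.0/16","origin_as": 100, "is_local": True},
]

def routes_for_peer(peer_community):
    if peer_community == "100:10":          # full routing: everything we know
        return routes
    if peer_community == "100:20":          # partial: only locally originated routes
        return [r for r in routes if r["is_local"]]
    return []

print(len(routes_for_peer("100:10")))   # 3 routes advertised
print(len(routes_for_peer("100:20")))   # 2 routes advertised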
BGP communities are very flexible tools, allowing groups of routing advertisements to be
grouped by a common attribute value specific to a given AS. The community can be set
locally within the AS or remotely to trigger specific actions within the target AS. You then
can use the community attribute value in a variety of contexts, such as defining backup
or primary paths, defining the scope of further advertisement of the path, or even
triggering other router actions, such as using the community attribute as an entry
condition for rewriting the TOS (Type of Service) or precedence field of IP packets, which
consequently allows a model of remote triggering of QoS mechanisms. This use of BGP
communities extends beyond manipulation of the default forwarding behavior into an
extended set of actions that can modify any of the actions undertaken by the router.
Local Preference
As an example of using the local preference attribute, look at Figure 2.9. Here, the AS
border routers B and C would propagate a local preference of 150 and 100,
respectively, to other iBGP (Internal BGP) peers
within AS100. Because the local preference with the highest value is preferred when
multiple paths for the same prefix exist, router A would prefer routes passed along from
router B when considering how to forward traffic. This is especially helpful when an
administrator prefers to send the majority of the traffic exiting the routing domain along a
higher speed link, as shown in Figure 2.9, when prefixes of the same length are being
advertised internally from both gateway routers. The downside to this approach,
however, is that when an administrator wants equitable use among parallel paths, a
single path will be used heavily, and others with a lower local preference will get very
little, if any, use. Depending on what type of links these are (local or wide area), this may
be an acceptable situation. The primary consideration observed here may be one of
economics—exorbitant recurring monthly costs for a wide-area circuit may force an
administrator to consider other alternatives. However, when network administrators want
to get traffic out of their network as quickly as possible (also known as hot-potato
routing), this may indeed be a satisfactory approach. The operative word in this approach
is local. As mentioned previously, this attribute is not passed along to exterior ASs and
only affects the path in which traffic is forwarded as it exits the local routing domain.
Once again, it is important to consider the level of service quality you are attempting to
deliver.
iBGP is not a different protocol from what is traditionally known simply as BGP. It is just a
specific application of the BGP protocol within an AS. In this model, the network interior
routing (e.g., OSPF, dual IS-IS) carries next-hop and interior-routing information, and
iBGP carries all exterior-routing information.
Using iBGP
Two basic principles exist for using iBGP. The first is to use BGP as a mechanism to
carry exterior-routing information along the transit path between exit peers within the
same AS, so that internal traffic can use the exterior-routing information to make a more
intelligent decision on how to forward traffic bound for external destinations. The second
principle is that this allows a higher degree of routing stability to be maintained in the
interior network. In other words, it obviates the need to redistribute BGP into interior
routing. As depicted in Figure 2.10, exterior-routing information is carried along the transit
path, between the AS exit points, via iBGP. A default route could be injected into the
interior routing system from the backbone transit-path routers and redistributed to the
lower portions of the hierarchy, or it could be statically defined on a hop-by-hop basis so
that packets bound for destinations not found in the interior-routing table (e.g., OSPF) are
forwarded automatically to the backbone, where a recursive route lookup would reveal
the appropriate destination (found in the iBGP table) and forwarding path for the
destination.
Figure 2.10: Internal BGP (iBGP).
Scaling iBGP becomes a concern if the number of iBGP peers becomes large, because
iBGP routers must be fully meshed from a peering perspective due to iBGP
readvertisement restrictions to prevent route looping. Quantifying large can be difficult
and can vary depending on the characteristics of the network. It is recommended to
monitor the resources on routers on an ongoing basis to determine whether scaling the
iBGP mesh is starting to become a problem. Before critical mass is reached with the
single-level iBGP mesh, and router meltdown occurs, it is necessary to introduce BGP
route reflectors, creating a small iBGP core mesh in which the remaining iBGP peers are
configured as route-reflector clients of a core-mesh iBGP participant.
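A back-of-the-envelope calculation shows why the full mesh stops scaling: n iBGP speakers require n(n-1)/2 sessions, whereas a route-reflector design needs roughly one session per client plus a small core mesh. The following sketch, with invented router counts, makes the comparison concrete.

# Compare full-mesh iBGP session counts with a simple route-reflector design.
def full_mesh_sessions(n):
    return n * (n - 1) // 2

def reflector_sessions(n_clients, n_core):
    # Core routers remain fully meshed; each client peers with one reflector.
    return full_mesh_sessions(n_core) + n_clients

for n in (10, 50, 200):
    print(n, "routers: full mesh =", full_mesh_sessions(n),
          "sessions; 4-router reflector core =", reflector_sessions(n - 4, 4))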
Multi-Exit Discriminator (MED)
The primary utility of the MED (Multi-Exit Discriminator) attribute is to provide a mechanism with which a network
administrator can indicate to a neighboring AS which link he prefers for traffic entering his
administrative domain when multiple links exist between the two ASs, each advertising
the same length prefix. Again, MED is a tie-breaker. As illustrated in Figure 2.11, for
example, if the administrator in AS100 wants to influence the path selection of AS200 in
relation to which link it prefers to use for forwarding traffic for the same prefix, he passes
a lower MED to AS200 on the link that connects routers A and B. It is important to ensure
that the neighboring AS is actually honoring MEDs, because they may have a routing
policy in conflict with a local administrator’s (AS100, in this case).
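Taken together, these attributes act as an ordered set of tie-breakers. The following sketch captures a simplified subset of that order (longest prefix, then highest local preference, then shortest AS path, then lowest MED); it is not the full BGP decision process, and the candidate routes are invented.

# Simplified best-path selection among candidate routes.
def best_route(candidates):
    return max(candidates, key=lambda r: (r["prefix_len"],      # more specific wins
                                          r["local_pref"],      # higher local pref wins
                                          -len(r["as_path"]),   # shorter AS path wins
                                          -r["med"]))           # lower MED wins

candidates = [
    {"via": "B", "prefix_len": 24, "local_pref": 150, "as_path": [200, 300], "med": 20},
    {"via": "C", "prefix_len": 24, "local_pref": 100, "as_path": [400, 300], "med": 10},
]
print(best_route(candidates)["via"])   # "B": higher local preference wins the tie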
Each of these tools provides unique methods of influencing route and path selection at
different points in the network. By using these tools with one another, a network
administrator can have a higher degree of control over local and neighbor path selection
than if left completely to a dynamic routing protocol. However, these BGP knobs do not
provide a great deal of functionality, including definitive levels of QoS. Path-selection
criteria can be important in providing QoS, but it is only a part of the big picture. The
importance of this path-selection criteria should not be underestimated, however,
because more intelligent methods of integrating path selection based on QoS metrics are
not available today.
Asymmetric Routing
A brief note is appropriate at this point on the problems with asymmetric routing. Traffic in
the Internet (and other networks, for that matter) is bi-directional—a source has a path to
its destination, and the destination has a path back to the source. Because different organizations
may have radically different routing policies, it is not uncommon for traffic to traverse a
path on its journey back to its source that is entirely different from the path it traversed on its
initial journey to the destination. This type of asymmetry may cause application problems,
not explicitly because of the asymmetry itself, but sometimes because of vastly
different characteristics of the paths. One path may have low latency, whereas another may
have extraordinarily high latency. One path may have traffic filters in place that restrict certain types
of traffic, whereas another path may not. This is not always a problem, and most
applications used today are unaffected. However, as new and emerging multimedia
applications, such as real-time audio and video, become more reliant on predictable
network behavior, this becomes increasingly problematic. The same holds true for IP-
level signaling protocols, such as RSVP (Resource ReSerVation Setup Protocol), that
require symmetry to function.
Policy, Routing, and QoS
A predictive routing system is integral to QoS—so much so that currently resources are
being expended on examining ways to integrate QoS metrics and path-selection criteria
into the routing protocols themselves. Several proposals are examined in Chapter 9,
“QoS and Future Possibilities,” but this fact alone is a clear indication that traditional
methods of routing lack the characteristics necessary for advanced QoS functionality.
These characteristics include monitoring of available bandwidth and link use on parallel
end-to-end paths, stability metrics, measured jitter and latency, and instantaneous path
selection based on these measurements.
This can be distilled into two questions: whether link cost metrics should be fixed or
variable, and whether a link cost is represented as a vector of costs or as a single
value.
In theory, using variable link costs results in optimal routing that attempts to route around
congestion in the same way as current routing protocols attempt to route around
damage. Instead of seeing a link with an infinite cost (down) or a predetermined cost
(up), the variable cost model attempts to vary the cost depending on some form of
calculation of availability. In the 1977 ARPAnet routing model, the link cost was set from the
outcome of a PING measurement; in other words, the link metric was set to the sum of the
propagation delay plus the queuing and processing delays. This approach has very immediate
feedback loops, such that the selection of an alternate route as the primary path due to
transient congestion of the current primary path shifts the congestion to the alternate
path, which during the next cycle of link metric calculation yields the selection of the
original primary path, and so on.
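A toy simulation of this feedback loop, with invented delay figures, shows the flip-flop behavior: whichever path is selected becomes the loaded (and therefore more expensive) path in the next measurement cycle.

# Toy model: the link metric simply tracks measured delay, so traffic flips
# between two parallel paths on every metric recalculation.
base_delay = {"primary": 10, "alternate": 12}     # propagation delay (ms), illustrative
load_penalty = 20                                  # queuing delay added to the loaded path

metric = dict(base_delay)
for cycle in range(6):
    chosen = min(metric, key=metric.get)           # route traffic onto the cheaper path
    # Next cycle: the chosen path now carries the load, so its measured delay
    # rises, while the idle path falls back to its base delay.
    metric = dict(base_delay)
    metric[chosen] += load_penalty
    print("cycle", cycle, ": traffic on", chosen, ", next metrics", metric)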
Not surprisingly, the problem of introducing such feedback systems into the routing
protocol results in highly unstable oscillation and lack of convergence of the routing
system. Additionally, you need to factor in the cost of traffic disruption where the network
topology is in a state of flux because of continual route metric changes. In this
unconverged state, the probability of poor routing decisions increases significantly, and
the consequent result may well be a reduced level of network availability. The
consequent engineering problem in this scenario is one of defining a methodology of
damping the link cost variation to reduce the level of route oscillation and instability.
To date, no scaleable routing tools have exhibited a stable balance between route
stability and dynamic adaptation to changing traffic-flow profiles. The real question is
perhaps not the viability of this approach but one simply of return on effort. As Radia
Perlman points out, “I believe that the difference in network capacity between the
simplest form of link cost assignment (fixed values depending on delay and total
bandwidth of the link) and ‘perfect routing’ (in which a deity routes every packet optimally,
taking into account traffic currently in the network as well as traffic that will enter the
network in the near future) is not very great” [Perlman1992].
The other essential change in the route-selection process is that of using a vector of link
metrics, such as that defined in the IS-IS protocol. A vector metric could be defined as
the 4-tuple of (default, delay, cost, bandwidth), for example, and a packet could be routed
according to a converged topology that uses the delay metric if the header TOS field of
the packet specifies minimum delay. Of course, this does admit the less well-defined
situation in which a combination of metrics is selected; effectively, if there are n
specific metrics in the link metric vector, on the order of 2^n distinct network topologies
must be computed. Obviously, this is not a highly scaleable approach in terms of
expansion of the metric set, nor does it apply readily to large networks where
convergence of a single metric is already a performance issue and multiplying the convergence
calculation by such a factor is untenable. Again, Radia's comments are relevant here, where she
notes that, “Personally, I dislike the added complexity of multiple metrics....I think that
any potential gains in terms of better routes will be more than offset by the overhead and
the likelihood of extreme sub-optimality due to misconfiguration or misunderstanding of
the way multiple metrics work” [Perlman1992].
Regardless of the perceived need for the integration of QoS and routing, routing as a
policy tool is more easily implemented and tenable when approached from an
intradomain perspective; from an interdomain perspective, however, there are many
barriers that may prevent successful policy implementation. This is because, within a
single routing domain, a network administrator can control the end-to-end network
characteristics. Once the traffic leaves your administrative domain, all bets are off—there
is no guarantee that traffic will behave as you might expect and, in most cases, will not
behave in any predictable manner.
Measuring QoS
Given that there are a wide variety of ways to implement QoS within a network, the
critical issue is how to measure the effectiveness of the QoS implementation chosen.
This section provides an overview of the techniques and potential approaches to
measuring QoS in the network.
The question of which measurements can be obtained directly from the various components of the network and what
measurements can be derived from these primary data points is at the forefront of the
QoS measurement problem.
A customer of a QoS network service may want to measure the difference in service
levels between a QoS-enabled transaction and a non-QoS-enabled transaction. The
customer also might want to measure the levels of variability of a particular transaction
and derive a metric of the constancy of the QoS environment. If the QoS environment is
based on a reservation of resources, the customer might want to measure the capability
to consume network resources up to the maximum reservation level to determine the
effectiveness of the reservation scheme.
One of the major factors behind the motivations for measurement of the QoS
environment is to determine how well it can deliver services that correspond to the
particular needs of the customer. In effect, this is measuring the level of consistency
between the QoS environment and the subscriber needs. Second, the customer is
interested in measuring the success or failure of the network provider in meeting the
agreed QoS parameters. Therefore, the customer is highly motivated to perform
measurements that correspond to the metrics of the particular QoS environment
provided.
The network operator also is keenly interested in measuring the QoS environment, for
perhaps slightly different reasons than the customer. The operator is interested in
measuring the level of network resources allocated to the QoS environment and the
impact this allocation has on non-QoS resource allocations. The operator also is keenly
interested in measuring the performance of the QoS environment as delivered to the
customer, to ensure that the QoS service levels are within the parameters of a service
contract. Similarly, the network operator is strongly motivated to undertake remote
measurements of peer-networked environments to determine the transitivity of the QoS
architecture. If there are QoS peering contracts with financial settlements, the network
operator is strongly motivated to measure the QoS exchange to ensure that the level of
resources consumed by the QoS transit is matched by the level of financial settlement
between the providers.
Not all measurements can be undertaken by all agents. Although a network operator has
direct access to the performance data generated by the network switching elements
within the network, the customers generally do not enjoy the same level of access to this
data. The customers are limited to measuring the effects of the network’s QoS
architecture by measuring the performance data on the end-systems that generate
network transactions and cannot see directly the behavior of the network’s internal
switches. The end-system measurement can be undertaken by measuring the behavior
of particular transactions or by generating probes into the network and measuring the
responses to these probes. For remote network measurements, the agent, whether a
remote operator or a remote customer, generally is limited to these end-system
measurement techniques of transaction measurement or probe measurement.
Accordingly, there is no single means of measuring QoS and no resulting single QoS
metric.
Measuring QoS environments can be regarded as a special case of measuring
performance in an Internet environment, and some background in this subject area is
appropriate. A measurement tool must be based on an understanding of the
architecture being measured if it is to be considered effective.
Tip There has been a recent effort within the IETF (Internet Engineering Task
Force) to look at the issues of measurement of Internet networks and their
performance as they relate to customer performance, as well as from the
perspective of the network operator. The IPPM (IP Provider Metric) working
group is active; its charter is available at www.ietf.org. Activities of this working
group are archived at www.advanced.org/IPPM. A taxonomy of network
measurement tools is located at www.caida.org/Tools/taxonomy.html.
The effectively random nature of packet drops in a multiuser environment and the
relatively indeterminate nature of cumulative queuing hold times across a series of
switches imply that the measurement of the performance of any Internet network does
not conform to a precise and well-defined procedure. The behavior of the network is
based on the complex interaction of the end-systems generating the data transactions
according to TCP behavior or UDP (User Datagram Protocol) application-based traffic
shaping, as well as the behavior of the packet-forwarding devices in the interior of the
network.
The simplest of the probe tools is the PING probe. PING is an ICMP (Internet Control
Message Protocol) echo request packet directed at a particular destination, which in turn
responds with an ICMP echo reply. (You can find an annotated listing of the PING utility
in W. Richard Stevens’ UNIX Network Programming, Prentice Hall, 1990.) The system
generating the packet can determine immediately whether the destination system is
reachable. Repeated measurement of the delay between the sending of the echo request
packet and the arrival of the matching echo reply can indicate the level of congestion along the path, given that the variability in ping
transaction times is assumed to be caused by switch queuing rather than route instability.
Additionally, the measurement of dropped packets can illustrate the extent to which the
congestion level has been sustained over queue thresholds.
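As a sketch of how such probe results might be reduced to congestion indicators, the following computes loss rate, mean RTT, and a crude variability figure from a hypothetical series of PING samples (with None marking an unanswered probe):

# Derive simple congestion indicators from a series of probe round-trip times.
rtt_samples_ms = [22.1, 23.4, 21.9, None, 58.7, 61.2, None, 24.0]   # hypothetical samples

answered = [s for s in rtt_samples_ms if s is not None]
loss_rate = 1 - len(answered) / len(rtt_samples_ms)
mean_rtt = sum(answered) / len(answered)
spread = max(answered) - min(answered)        # crude variability (jitter) measure

print("loss %.0f%%, mean RTT %.1f ms, spread %.1f ms"
      % (loss_rate * 100, mean_rtt, spread))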
One variation of the PING technique is to extend the probe from the simple model of
measuring the probe and its matching response to a model of emulating the behavior of
the TCP algorithm, sending a sequence of probes in accordance with TCP flow-control
algorithms. TRENO (www.psc.edu/~pscnoc/treno.html) is one such tool which, by
emulating a TCP session in its control of the emission of probes, provides an indication
of the available path capacity at a particular point in time.
data. The other weakness of this approach is that a probe is typically not part of a normal
TCP traffic flow, and if the QoS mechanism is based on tuning the switch performance
for TCP flow behavior, the probe will not readily detect the QoS mechanism in operation
in real time. If the QoS mechanism being deployed is a Random Early Detection (RED)
algorithm in which the QoS mechanism is a weighting of the discard algorithm, for
example, network probes will not readily detect the QoS operation.
The other approach is to measure repetitions of a known TCP flow transaction. This
could be the measurement of a file transfer, for example—measuring the elapsed time to
complete the transfer or a WWW (World Wide Web) download. Such a measurement
could be a simple measurement of elapsed time for the transaction; or it could undertake
a deeper analysis by looking at the number of retransmissions, the variation of the TCP
Round-Trip Time (RTT) estimates across the duration of the flow, the interpacket arrival times,
and similar levels of detail. Such measurements can indicate the amount of distortion of
the original transmission sequence the network imposed on the transaction. From this
data, a relatively accurate portrayal of the effectiveness of a QoS mechanism can be
depicted. Of course, such a technique does impose a heavier load on the total system
than simple short probes, and the risk in undertaking large-scale measurements based
on this approach is that the act of measuring QoS has a greater impact on the total
network system and therefore does not provide an accurate measurement.
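A minimal sketch of the transaction-measurement idea is simply to time a repeatable retrieval and derive an effective throughput figure; the URL below is a placeholder, and the deeper per-packet analysis described above would require packet capture rather than this end-to-end timing.

# Time a repeatable Web retrieval and compute an effective throughput figure.
import time
import urllib.request

def measure_transfer(url):
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        body = response.read()
    elapsed = time.monotonic() - start
    return len(body), elapsed, len(body) * 8 / elapsed   # bytes, seconds, bits/s

size, secs, bps = measure_transfer("http://www.example.com/")   # placeholder URL
print("%d bytes in %.2f s = %.0f kbit/s" % (size, secs, bps / 1000))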
Although such an approach can produce a set of metrics that can determine availability
of network resources, there is still a disconnect between these measurements and user-
level performance, because the operator's measurements cannot directly measure end-
to-end data performance.
From a network router perspective, the introduction of QoS behavior within the network
may be implemented as an adjustment of queue-management behavior so that data
packets are potentially reordered within the queue and potentially dropped completely.
The intention is to reduce (or possibly increase) the latency of various predetermined
classes of packets by adjusting their positions in the output buffer queue or to signal a
request to slow down the end-to-end protocol transmission rate by using packet discard.
When examining this approach further, it becomes clear that QoS is apparent only when the
network exhibits load. In the absence of queuing in any of the network switches, there is
no switch-initiated reordering of queued packets or any requirement for packet discard.
As a consequence, no measurable change is attributable to QoS mechanisms. In such
an environment, end-system behavior dictates the behavior of the protocol across the
network. When segments of the network are placed under load, the affected switches
move to a QoS enforcement mode of behavior. In this model of QoS implementation, the
entire network does not switch operating modes. Congestion is not systemic but instead
is a local phenomenon that may occur on a single switch in isolation. As a result, end-to-
end paths that do not traverse a loaded network may never have QoS control imposed
anywhere in the transit path.
Of course, the measurement of a QoS structure is dependent on the precise nature of the
QoS mechanism used. Some forms of QoS are absolute precedence structures, in which
packets that match the QoS filters are given absolute priority over non-QoS traffic. Other
mechanisms are self-limited in a manner similar to Frame Relay Committed Information
Rate (CIR), where precedence is given to traffic that conforms to a QoS filter up to a defined
level and excess traffic is handled on a best-effort basis. The methodology of measuring the
effectiveness of QoS in these cases is dependent on the QoS structure; the measurements
may be completely irrelevant or, worse, misleading when applied out of context.
There are two primary methods of measuring QoS traffic in a network: intrusive and
nonintrusive. Each of these approaches is discussed in this section.
Nonintrusive Measurements
The nonintrusive measurement method measures the network behavior by observing the
packet-arrival rate at an end-system and making some deduction on the state of the
network, thereby deducing the effectiveness of QoS on the basis of these observations.
In general, this practice requires intimate knowledge of the application generating the
data flows being observed so that the measurement tool can distinguish between
remote-application behavior and moderation of this remote behavior by a network-
imposed state. Simply monitoring one side of an arbitrary data exchange does not yield
much information of value and can result in wildly varying interpretations of the results.
For this reason, single-ended monitoring as the basis of measurement of network
performance and, by inference, QoS performance is not recommended as an effective
approach to the problem. However, you can expect to see further deployment of host-based
monitoring tools that look at the behavior of TCP flows and the timing of packets within
the flow. Careful interpretation of the matching of packets sent with arriving packets can
offer some indication as to the extent of queuing-induced distortion within the network
path to the remote system. This interpretation also can provide an approximate indication
of the available data capacity of the path, although the caveat here is that the
interpretation must be done carefully and the results must be reported with considerable
caution.
Intrusive Measurements
Intrusive measurements refer to the controlled injection of packets into the network and
the subsequent collection of packets (the same packet or a packet remotely generated
as a result of reception of the original packet). PING—the ICMP echo request and echo
reply—packet exchange is a very simple example of this measurement methodology. By
sending sequences of PING packets at regular intervals, a measurement station can
measure parameters such as reachability, the transmission RTT to a remote location,
and the expectation of packet loss on the round-trip path. By making a number of
secondary assumptions regarding queuing behavior within the switches, combined with
measurement of packet loss and imposed jitter, you can make some assumptions about
the available bandwidth between the two points and the congestion level. Such
measurements are not measurements of the effectiveness with regard to QoS, however.
To measure the effectiveness of a QoS structure, you must undertake a data transaction
typical of the network load and measure its performance under controlled conditions. You
can do this by using a remotely triggered data generator. The required methodology
is to measure the effective sustained data rate, the retransmission rate, the stability of
the RTT estimates, and the overall transaction time. With the introduction of QoS
measures on the network, a comparison of the level of degradation of these metrics from
an unloaded network to one that is under load should yield a metric of the effectiveness
of a QoS network.
The problem is that QoS mechanisms are visible only when parts of the network are
under resource contention, and the introduction of intrusive measurements into the
system further exacerbates the overload condition. Accordingly, attempts to observe the
dynamic state of a network disturb that state, and there is a limit to the accuracy of such
network state measurements. Oddly enough, this is a re-expression of a well-understood
physical principle: Within a quantum physical system, there is an absolute limit to the
accuracy of a simultaneous measurement of position and momentum of an electron.
Werner Heisenberg described the general principle in 1927 with the Heisenberg
Uncertainty Principle, which relates the limit of accuracy of simultaneous
measurement of a body's position and momentum to Planck's constant, h.
This observation is certainly an appropriate comparison for the case of networks in which
intrusive measurements are used. No single consistent measurement methodology is
available that will produce deterministic measurements of QoS performance, so
effectively measuring QoS remains a relatively imprecise area and one in which popular
mythology continues to dominate.
Why, then, is measurement so important? The major reason appears to be that the
implementation of QoS within a network is not without cost to the operator of the network.
This cost is expressed both in terms of the cost of deployment of QoS technology within
the routers of the network and in terms of cost of displacement of traffic, where
conventional best-effort traffic is displaced by some form of QoS-mediated preemption of
network resources. The network operator probably will want to measure this cost and
apply some premium to the service fee to recoup this additional cost. Because customers
of the QoS network service probably will be charged some premium for use of a QoS
service, they undoubtedly will want to measure the cost effectiveness of this investment
in quality. Accordingly, the customer will be well motivated to undertake comparative
measurements that will indicate the level of service differentiation obtained by use of the
QoS access structures.
Despite this wealth of motivation for good measurement tools, such tools and techniques
are still in their infancy, and it appears that further efforts must be made to understand the
protocol-directed interaction between the host and the network, the internal interaction
between individual elements of the network, and the moderation of this interaction through
the use of QoS mechanisms before a suite of tools can be devised to address these needs
of the provider and the consumer.
Chapter 3: QoS and Queuing Disciplines,
Traffic Shaping, and Admission Control
Overview
Managing Queues
The choices of the algorithm for placing packets into a queue and the maximal length of
the queue itself may at first blush appear to be relatively simple and indeed trivial
choices. However, you should not take these choices lightly, because queue
management is one of the fundamental mechanisms for providing the underlying quality
of the service and one of the fundamental mechanisms for differentiating service levels.
The correct configuration choice for queue length can be extraordinarily difficult because
of the apparently random traffic patterns present on a network. If you impose too deep a
queue, you introduce an unacceptable amount of latency and RTT (Round-Trip Time)
jitter, which can break applications and end-to-end transport protocols, not to mention
make some users quite unhappy if the performance of the network
degenerates considerably. If the queues are too shallow, which sometimes can be the
case, you may run into the problem of trying to dump data into the network faster than
the network can accept it, thus resulting in a significant amount of discarded packets or
cells. In a reliable traffic flow, such discarded packets have to be identified and
retransmitted in a sequence of end-to-end protocol exchanges. In a real-time unreliable
flow (such as audio or video), such packet loss is manifested as signal degradation.
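The latency cost of a deep queue can be estimated directly: a packet arriving at the tail of a full FIFO waits roughly the queue depth times the packet size divided by the link rate. The following sketch, with illustrative figures for a T1 link, shows how quickly this delay grows.

# Estimate the delay seen by a packet at the tail of a full output queue.
def queue_delay_ms(queue_depth_pkts, packet_bytes, link_bps):
    return queue_depth_pkts * packet_bytes * 8 / link_bps * 1000

for depth in (10, 100, 1000):
    print(depth, "packets of 1500 bytes queued on a T1:",
          round(queue_delay_ms(depth, 1500, 1544000), 1), "ms of added delay")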
Queuing occurs as both a natural artifact of TCP (Transmission Control Protocol) rate
growth (where the data packets are placed onto the wire in a burst configuration during
slow start and two packets are transmitted within the ACK timing of a single packet’s
reception) and as a natural artifact of a dynamic network of multiple flows (where the
imposition of a new flow on a fully occupied link will cause queuing while the flow rates all
adapt to the increased load). Queuing also can happen when a Layer 3 device (a router)
queues and switches traffic toward the next-hop Layer 3 device faster than the
underlying Layer 2 network devices (e.g., frame relay or ATM switches) in the transit path
can accommodate. Some intricacies in the realm of queue management, as well as their
effect on network traffic, are very difficult to understand.
Tip Len Kleinrock is arguably one of the most prolific theorists on traffic queuing
and buffering in networks. You can find a bibliography of his papers at
millennium.cs.ucla.edu/LK/Bib/.
In any event, the following descriptions of the queuing disciplines focus on output
queuing strategies, because this is the predominant strategic location for store-and-
forward traffic management and QoS-related queuing.
FIFO Queuing
FIFO (First In, First Out) queuing is considered to be the standard method for store-and-
forward handling of traffic from an incoming interface to an outgoing interface. For the
sake of this discussion, however, you can consider anything more elaborate than FIFO
queuing to be exotic or “abnormal.” This is not to say that non-FIFO queuing
mechanisms are inappropriate—quite the contrary. Non-FIFO queuing techniques
certainly have their merit and usefulness. It is more an issue of knowing what their
limitations are, when they should be considered, and perhaps more important,
understanding when they should be avoided.
As Figure 3.2 shows, as packets enter the input interface queue, they are placed into the
appropriate output interface queue in the order in which they are received—thus the
name first in, first out.
Figure 3.2: FIFO queuing.
FIFO queuing usually is considered default behavior, and many router vendors have
highly optimized their forwarding paths to make this standard behavior as fast as
possible. In fact, when coupled with topology-driven forwarding cache population, this
particular combination of technologies quite possibly could be considered the fastest
forwarding implementation available today as far as packets-per-second forwarding is
concerned. This is because, over time, developers have learned how to optimize the
software to take advantage of simple queuing technologies. When more elaborate
queuing strategies are implemented instead of FIFO, there is a strong possibility of
some negative impact on forwarding performance and an increase (sometimes dramatic)
in the computational overhead of the system. This depends, of course, on the queuing
discipline and the quality of the vendor implementation.
When the load on the network increases, the transient bursts cause significant queuing
delay (significant in terms of the fraction of the total transmission time), and when the
queue is fully populated, all subsequent packets are discarded. When the network
operates in this mode for extended periods, the offered service level inevitably
degenerates. Different queuing strategies can alter this service-level degradation,
allowing some services to continue to operate without perceptible degradation while
imposing more severe degradation on other services. This is the fundamental principle of
using queue management as the mechanism to provide QoS arbitrated differentiated
services.
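A minimal sketch of a FIFO output queue with tail drop, using invented packet names and queue depth, captures this baseline behavior:

# FIFO output queue with tail drop: packets leave in arrival order, and
# arrivals beyond the configured queue depth are simply discarded.
from collections import deque

class FifoQueue:
    def __init__(self, max_depth):
        self.q = deque()
        self.max_depth = max_depth
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.q) >= self.max_depth:
            self.dropped += 1            # tail drop under congestion
        else:
            self.q.append(packet)

    def dequeue(self):
        return self.q.popleft() if self.q else None

fifo = FifoQueue(max_depth=3)
for i in range(5):
    fifo.enqueue("pkt%d" % i)
print([fifo.dequeue() for _ in range(3)], "dropped:", fifo.dropped)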
Priority Queuing
One of the first queuing variations to be widely implemented was priority queuing. This is
based on the concept that certain types of traffic can be identified and shuffled to the
front of the output queue so that some traffic always is transmitted ahead of other types
of traffic. Priority queuing certainly could be considered a primitive form of traffic
differentiation, but this approach is less than optimal for certain reasons. Priority queuing
may have an adverse effect on forwarding performance because of packet reordering
(non-FIFO queuing) in the output queue. Also, because the router’s processor may have
to look at each packet in detail to determine how the packet must be queued, priority
queuing also may have an adverse impact on processor load. On slower links, a router
has more time to closely examine and manipulate packets. However, as link speeds
increase, the impact on the performance of the router becomes more noticeable.
As shown in Figure 3.3, as packets are received on the input interface, they are
reordered based on user-defined criteria that determine the order in which
packets are placed in the output queue. In this example, high-priority packets are placed in the
output queue before normal packets, which are held in packet memory until no further
high-priority packets are awaiting transmission.
Another possible vulnerability in this queuing approach is that if the volume of high-
priority traffic is unusually high, normal traffic waiting to be queued may be dropped
because of buffer starvation—a situation that can occur for any number of reasons.
Buffer starvation usually occurs because of overflow caused by too many packets waiting
to be queued and not enough room in the queue to accommodate them. Another
consideration is the adverse impact induced latency may have on applications when
traffic sits in a queue for an extended period. It sometimes is hard to calculate how non-
FIFO queuing may inject additional latency into the end-to-end round-trip time. In a
worst-case scenario, some applications may not function correctly because of added
latency or perhaps because some more time-sensitive routing protocols may time-out
due to acknowledgments not being received within a predetermined period of time.
The problem here is well known in operating systems design; absolute priority scheduling
systems can cause complete resource starvation to all but the highest-priority tasks.
Thus, the use of priority queues creates an environment where the degradation of the
highest-priority service class is delayed until the entire network is devoted to processing
only the highest-priority service. The side-effect of this resource preemption is that the
lower levels of service are starved of system resources very quickly, and during this
phase of reallocation of critical resources, the total effective throughput of the network
degenerates dramatically.
In any event, priority queuing has been used for a number of years as a primitive method
of differentiating traffic into various classes of service. Over the course of time, however,
it has been discovered that this mechanism simply does not scale to provide the desired
performance at higher speeds.
The configuration in Figure 3.4, for example, has created three buffers: high, medium,
and low. The router could be configured to service 200 bytes from the high-priority
queue, 150 bytes from the medium-priority queue, and then 100 bytes from the low-
priority queue on each rotation. After traffic in each queue is processed, packets continue
to be serviced until the byte count exceeds the configured threshold or the current queue
is empty. In this fashion, traffic that has been categorized and classified into the various
queues has a reasonable chance of being transmitted without inducing noticeable
amounts of latency, while allowing the system to avoid buffer starvation. CBQ
also was designed with the concept that certain classes of traffic, or applications, may
need minimal queuing latency to function properly; CBQ provides the mechanisms to
configure how much traffic can be drained off each queue in a servicing rotation,
providing a method to ensure that a specific class does not sit in the outbound queue for
too long. Of course, an administrator may have to fumble around with the various queue
parameters to gauge whether the desired behavior is achieved. The implementation may
be somewhat hit and miss.
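The following sketch illustrates the byte-count servicing rotation just described, using the 200/150/100-byte thresholds from the example. The packet sizes are arbitrary, and the classification step that places traffic into the queues is omitted.

from collections import deque

def byte_count_round_robin(queues, byte_counts):
    """Service each queue in rotation, draining packets until the configured
    byte count for that queue is reached or the queue is empty, as described
    for Figure 3.4.  Returns the resulting transmit order."""
    transmitted = []
    while any(queues.values()):
        for name, threshold in byte_counts.items():
            sent = 0
            q = queues[name]
            while q and sent < threshold:
                packet_len = q[0]
                transmitted.append((name, packet_len))
                sent += packet_len
                q.popleft()
    return transmitted

if __name__ == "__main__":
    # Packets are represented simply by their length in bytes.
    queues = {
        "high": deque([120, 120, 120]),
        "medium": deque([100, 100]),
        "low": deque([80, 80]),
    }
    order = byte_count_round_robin(queues, {"high": 200, "medium": 150, "low": 100})
    print(order)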
resource allocation, because the resource level is reduced for low-precedence traffic and
the end-systems still can receive the signal of the changing state of the network and
adapt their transmission rates accordingly.
CBQ also can be considered a primitive method of differentiating traffic into various
classes of service, and for several years, it has been considered a reasonable method of
implementing a technology that provides link sharing for Classes of Service (CoS) and an
efficient method for queue-resource management. However, CBQ simply does not scale
to provide the desired performance in some circumstances, primarily because of the
computational overhead and forwarding impact packet reordering and intensive queue
management imposes in networks with very high-speed links. Therefore, although CBQ
does provide the basic mechanisms to provide differentiated CoS, it is appropriate only at
lower-speed links, which limits its usefulness.
Tip A great deal of informative research on CBQ has been done by Sally Floyd and
Van Jacobson at the Lawrence Berkeley National Laboratory for Network
Research. For more detailed information on CBQ, see
ftp.ee.lbl.gov/floyd/cbq.html.
The weighted aspect of WFQ is dependent on the way in which the servicing algorithm is
affected by other extraneous criteria. This aspect is usually vendor-specific, and at least
one implementation uses the IP precedence bits in the TOS (Type of Service) field in the
IP packet header to weight the method of handling individual traffic flows. The higher the
precedence value, the more access to queue resources a flow is given. You’ll look at IP
precedence in more depth in Chapter 4, “QoS and TCP/IP: Finding the Common
Denominator.”
Another drawback of WFQ is in the granularity—or lack of granularity—in the control of the
mechanisms that WFQ uses to favor some traffic flows over others. By default, WFQ
protects low-volume traffic flows from larger ones in an effort to provide equity for all data
flows. The weighting aspect is attractive for introducing controlled unfairness; however, no knobs
are available to tune these parameters and inject a higher degree of unfairness into the
queuing scheme—at least not from the router configuration perspective.
Of course, you could assume that if the IP precedence value in each packet were set by the
corresponding hosts, for example, they would be treated accordingly. You could assume
that higher-precedence packets could be treated with more priority, lesser precedence with
lesser priority, and no precedence treated fairly in the scope of the traditional WFQ queue
servicing. The method of preferring some flows over others is statically defined in the
vendor-specific implementation of the WFQ algorithm, and the degree of control over this
mechanism may leave something to be desired.
Traffic Shaping and Admission Control
As an alternative, or perhaps as a complement, to using non-FIFO queuing disciplines in
an attempt to control the priority in which certain types of traffic are transmitted on a router
interface, there are other methods, such as admission control and traffic shaping, which
are used to control what traffic actually is transmitted into the network or the rate at which
it is admitted. Traffic shaping and admission control each have very distinctive
differences, which you will examine in more detail. There are several schemes for both
admission control and traffic shaping, some of which are used as stand-alone
technologies and others that are used integrally with other technologies, such as IETF
Integrated Services architecture. How each of these schemes is used and the approach
each attempts to use in conjunction with other specific technologies defines the purpose
each scheme is attempting to serve, as well as the method and mechanics by which
each is being used.
It also is important to understand the basic concepts and differences between traffic
shaping, admission control, and policing. Traffic shaping is the practice of controlling the
volume of traffic entering the network, along with controlling the rate at which it is
transmitted. Admission control, in the most primitive sense, is the simple practice of
discriminating which traffic is admitted to the network in the first place. Policing is the
practice of determining on a hop-by-hop basis within the network beyond the ingress
point whether the traffic being presented is compliant with prenegotiated traffic-shaping
policies or other distinguishing mechanisms.
As mentioned previously, you should consider the resource constraints placed on each
device in the network when determining which of these mechanisms to implement, as
well as the performance impact on the overall network system. For this reason, traffic
shaping and admission-control schemes need to be implemented at the network edges
to control the traffic entering the network. Traffic policing obviously needs one of two
things in order to function properly: Each device in the end-to-end path must implement
an adaptive shaping mechanism similar to what is implemented at the network edges, or
a dynamic signaling protocol must exist that collects path and resource information,
maintains state within each transit device concerning the resource status of the network,
and dynamically adapts mechanisms within each device to police traffic to conform to
shaping parameters.
Traffic Shaping
Traffic shaping provides a mechanism to control the amount and volume of traffic being
sent into the network and the rate at which the traffic is being sent. It also may be
necessary to identify traffic flows at the ingress point (the point at which traffic enters the
network) with a granularity that allows the traffic-shaping control mechanism to separate
traffic into individual flows and shape them differently.
Two predominant methods for shaping traffic exist: a leaky-bucket implementation and a
token-bucket implementation. Both these schemes have distinctly different properties
and are used for distinctly different purposes. Discussions of combinations of each
scheme follow. These schemes expand the capabilities of the simple traffic-shaping
paradigm and, when used in tandem, provide a finer level of granularity than each
method alone provides.
Leaky-Bucket Implementation
You use a leaky bucket (Figure 3.6) to control the rate at which traffic is sent to the
network. A leaky bucket provides a mechanism by which bursty traffic can be shaped to
present a steady stream of traffic to the network, as opposed to traffic with erratic bursts
of low- and high-volume flows. An appropriate analogy for the leaky bucket is a scenario
in which four lanes of automobile traffic converge into a single lane. A regulated
admission interval into the single lane of traffic flow helps the traffic move. The benefit of
this approach is that traffic flow into the major traffic arteries (the network) is predictable
and controlled. The major liability is that when the volume of traffic is vastly greater than
the bucket size, in conjunction with the drainage-time interval, traffic backs up in the
bucket beyond bucket capacity and is discarded.
The leaky bucket was designed to control the rate at which ATM cell traffic is transmitted
within an ATM network, but it has also found uses in the Layer 3 (packet datagram)
world. The size (depth) of the bucket and the transmit rate generally are user-
configurable and measured in bytes. The leaky-bucket control mechanism uses a
measure of time to indicate when traffic in a FIFO queue can be transmitted to control the
rate at which traffic is leaked into the network. It is possible for the bucket to fill up and
subsequent flows to be discarded. This is a very simple method to control and shape the
rate at which traffic is transmitted to the network, and it is a fairly straightforward
implementation.
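A simple leaky-bucket shaper can be sketched as follows. The depth and leak rate shown are arbitrary, and a real implementation would schedule the queued traffic for transmission rather than merely account for it.

class LeakyBucket:
    """Leaky-bucket shaper sketch: arrivals fill a finite bucket, and the
    bucket drains toward the network at a fixed leak rate.  Bytes arriving
    to a full bucket are discarded."""
    def __init__(self, depth_bytes, leak_rate):
        self.depth = depth_bytes
        self.leak_rate = leak_rate            # bytes per second drained
        self.level = 0.0                      # bytes currently held in the bucket
        self.last_time = 0.0
        self.dropped = 0

    def offer(self, packet_bytes, now):
        # Drain the bucket for the time elapsed since the last arrival ...
        self.level = max(0.0, self.level - (now - self.last_time) * self.leak_rate)
        self.last_time = now
        # ... then admit the packet only if it still fits in the bucket.
        if self.level + packet_bytes > self.depth:
            self.dropped += 1
            return False
        self.level += packet_bytes
        return True

if __name__ == "__main__":
    bucket = LeakyBucket(depth_bytes=3000, leak_rate=1000)
    for t in (0.0, 0.1, 0.2, 0.3):            # a burst of 1,500-byte packets
        print(t, bucket.offer(1500, t))
    print("dropped:", bucket.dropped)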
The important concept to bear in mind here is that this type of traffic shaping has an
important and subtle significance in controlling network resources in the core of the
network. Essentially, you could use traffic shaping as a mechanism to conform traffic
flows in the network to a use threshold that a network administrator has calculated
arbitrarily. This is especially useful if the administrator has oversubscribed the network capacity.
However, although this is an effective method of shaping traffic into flows with a fixed
rate of admission into the network, it is ineffective in providing a mechanism that provides
traffic shaping for variable rates of admission.
It also can be argued that the leaky-bucket implementation does not efficiently use
available network resources. Because the leak rate is a fixed parameter, there will be
many instances when the traffic volume is very low and large portions of network
resources (bandwidth) are not being used. Therefore, no mechanism exists in the leaky-
bucket implementation to allow individual flows to burst up to port speed, effectively
consuming network resources at times when there would not be resource contention in
the network core.
Token-Bucket Implementation
Another method of providing traffic shaping and ingress rate control is the token bucket
(Figure 3.7). The token bucket differs from the leaky bucket substantially. Whereas the
leaky bucket fills with traffic and steadily transmits traffic at a continuous fixed rate when
traffic is present, traffic does not actually transit the token
bucket. The token bucket is a control mechanism that dictates when traffic can be
transmitted, based on the presence of tokens in the bucket. The token bucket also more
efficiently uses available network resources by allowing flows to burst up to a configurable
burst threshold.
The token bucket contains tokens, each of which can represent a unit of bytes. The
administrator specifies how many tokens are needed to transmit a given number of
bytes; when tokens are present, a flow is allowed to transmit traffic. If there are no
tokens in the bucket, a flow cannot transmit its packets. Therefore, a flow can transmit
traffic up to its peak burst rate if there are adequate tokens in the bucket and if the burst
threshold is configured appropriately.
The token bucket is similar in some respects to the leaky bucket, but the primary
difference is that the token bucket allows bursty traffic to continue transmitting while there
are tokens in the bucket, up to a user-configurable threshold, thereby accommodating
traffic flows with bursty characteristics. An appropriate analogy for the way a token
bucket operates is a toll plaza on the interstate highway system.
Vehicles (packets) are permitted to pass as long as they pay the toll. The packets must
be admitted by the toll plaza operator (the control and timing mechanisms), and the
money for the toll (tokens) is controlled by the toll plaza operator, not the occupants of
the vehicles (packets).
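In code, the token-bucket check reduces to a small amount of accounting, as in the following sketch. The fill rate and burst size are illustrative, and a real shaper would also decide whether nonconforming packets are queued, marked, or dropped.

class TokenBucket:
    """Token-bucket sketch: tokens accumulate at a fixed fill rate up to a
    configurable burst threshold; a packet may be transmitted only when
    enough tokens are present, so flows can burst while tokens last."""
    def __init__(self, fill_rate, burst_size):
        self.fill_rate = fill_rate            # tokens (bytes) added per second
        self.burst_size = burst_size          # maximum tokens the bucket holds
        self.tokens = burst_size
        self.last_time = 0.0

    def conforms(self, packet_bytes, now):
        # Replenish tokens for the elapsed interval, capped at the burst size.
        self.tokens = min(self.burst_size,
                          self.tokens + (now - self.last_time) * self.fill_rate)
        self.last_time = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True                       # enough tokens: transmit now
        return False                          # no tokens: hold or drop the packet

if __name__ == "__main__":
    tb = TokenBucket(fill_rate=1000, burst_size=4000)
    for t in (0.0, 0.0, 0.0, 1.0):
        print(t, tb.conforms(1500, t))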
A variation of the simple token bucket implementation with a singular bucket and a
singular burst rate threshold is one that includes multiple buckets and multiple thresholds
(Figure 3.8). In this case, a traffic classifier could interact with separate token buckets,
each with a different peak-rate threshold, thus permitting different classes of traffic to be
shaped independently. You could use this approach with other mechanisms in the
network to provide differentiated CoS; if traffic could be discriminantly tagged after it
exceeds a certain threshold, it could be treated differently as it travels through the
network. In the next section, you’ll look at possible ways of accomplishing this by using
IP layer QoS, IP precedence bits, and congestion-control mechanisms.
Figure 3.8: Multiple token buckets.
Another variation on these two schemes is a combination of the leaky bucket and token
bucket. The reasoning behind this type of implementation is that, while a token bucket
allows individual flows to burst to their peak rate, it also allows individual flows to
dominate network resources for the time during which tokens are present in the token
bucket or up to the defined burst threshold. This situation, if not configured correctly,
allows some traffic to consume more bandwidth than other flows and may interfere with
the capability of other traffic flows to transmit adequately due to resource contention in
the network. Therefore, a leaky bucket could be used to smooth the traffic rates at a
user-configurable threshold after the traffic has been subject to the token bucket. In this
fashion, both tools can be used together to create an altogether different traffic shaping
mechanism.
The bottom line is that traffic-shaping mechanisms are yet another tool in controlling how
traffic flows into (and possibly out of) the network. By themselves, these traffic-shaping tools
are inadequate to provide QoS, but coupled with other tools, they can provide useful
mechanisms for a network administrator to provide differentiated CoS.
Network-Admission Policy
Network-admission policy is an especially troublesome topic and one for which there is
no clear agreed-on approach throughout the networking industry. In fact, network-
admission policy has various meanings, depending on the implementation and context.
By and large, the most basic definition is one in which admission to the network, or basic
access to the network itself, is controlled by the imposition of a policy constraint. Some
traffic (traffic that conforms to the policy) may be admitted to the network, and the
remainder may not. Some traffic may be admitted under specific conditions, and when
the conditions change, the traffic may be disallowed. By contrast, admission may be
based on identity through the use of an authentication scheme.
Also, you should not confuse the difference between access control and admission
policy—subtle differences, as well as similarities, exist in the scope of each approach.
Access Control
Access control can be defined as allowing someone or some thing (generally a remote
daemon or service) to gain access to a particular machine, device, or virtual service. By
contrast, admission policy can be thought of as controlling what type of traffic is allowed
to enter or transit the network. One example of primitive access control dates back to the
earliest days of computer networking: the password. If someone cannot provide the
appropriate password to a computer or other networking device, he is denied access.
Network dial-up services have used the same mechanism; however, several years ago,
authentication control mechanisms began to appear that provided more granular and
manageable remote access control. With these types of mechanisms, access to network
devices is not reliant on the password stored on the network devices but instead on a
central authentication server operated under the supervision of the network or security
administrator.
Admission Policy
Admission policy can be a very important area of networking, especially in the realm of
providing QoS. Revisiting earlier comments on architectural principles and service
qualifications, if you cannot identify services (e.g., specific protocols, sources, and
destinations), the admission-control aspect is less important. Although the capability to
limit the sheer amount of traffic entering the network and the rate at which it enters is
indeed important (to some degree), it is much less important if the capability to
distinguish services is untenable, because no real capability exists to treat one type of
traffic differently than another.
Admission policy is a crucial part of any QoS implementation. If you cannot control the
traffic entering the network, you have no control over the introduction of congestion into
the network system and must rely on congestion-control and avoidance mechanisms to
maintain stability. This is an undesirable situation, because if the traffic originators have
the capability to induce severe congestion situations into the network, the network may
be ill-conceived and improperly designed, or admission-control mechanisms may be
inherently required to be implemented with prejudice. The prejudice factor can be
determined by economics. Those who pay for a higher level of service, for example, get
a shot at the available bandwidth first, and those who do not pay are throttled, or dropped,
when utilization rates reach a congestion state. In this fashion, admission control could
feasibly be coupled with traffic shaping.
Admission policy, within the context of the IETF Integrated Services architecture,
determines whether a network node has sufficiently available resources to supply the
requested QoS. If the originator of the traffic flow requests QoS levels from the network
using parameters within the RSVP specification, the admission-control module in the RSVP
process checks the requester’s TSpec and RSpec (described in more detail in Chapter 7) to
determine whether the resources are available along each node in the transit path to
provide the requested resources. In this vein, the QoS parameters are defined by the IETF
Integrated Services working group, which is discussed in Chapter 7, “The Integrated
Services Architecture.”
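At its simplest, the admission-control decision is a comparison of the requested resources against what remains unreserved on the node, as in the following sketch. This is a deliberate simplification of the Integrated Services model; the function and parameter names are illustrative and are not drawn from the RSVP specification.

def admit_flow(requested_rate, link_capacity, reserved_rate):
    """Highly simplified admission test: admit the flow only if the node still
    has enough unreserved capacity to honor the requested rate.  Real RSVP
    admission control evaluates the flow's TSpec/RSpec against per-node
    policy at every hop in the transit path (see Chapter 7)."""
    available = link_capacity - reserved_rate
    if requested_rate <= available:
        return True, reserved_rate + requested_rate
    return False, reserved_rate

if __name__ == "__main__":
    reserved = 0.0
    for rate in (2e6, 3e6, 6e6):              # requested rates on a 10-Mbps link
        ok, reserved = admit_flow(rate, link_capacity=10e6, reserved_rate=reserved)
        print(rate, "admitted" if ok else "rejected")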
Chapter 4: QoS and TCP/IP: Finding the Common Denominator
Overview
As mentioned several times in this book, wildly varying opinions exist on how to provide
differentiation of Classes of Service (CoS) and where it can be provided most effectively
in the network topology.
After many heated exchanges, it is safe to assume that there is sufficient agreement on
at least one premise: The most appropriate place to provide differentiation is within the
most common denominator, where common is defined in terms of the level of end-to-end
deployment in today’s networks. In this vein, it becomes an issue of which component
has the most prevalence in the end-to-end traffic path. In other words, what is the
common bearer service? Is it ATM? Is it IP? Is it the end-to-end application protocol?
It is also the case that the common bearer service must operate in an end-to-end
fashion, using a signaling mechanism that spans the entire traversal of the network in a consistent
fashion. IP is the end-to-end transportation service in most cases, so that although it is
possible to create QoS services in substrate layers of the protocol stack, such services
cover only part of the end-to-end data path. Such partial measures often have their
effects masked by the effects of the traffic distortion created from the remainder of the
end-to-end path in which they are not present, and hence the overall outcome of a partial
QoS structure often is ineffectual.
When the end-to-end path does not consist of a single pervasive data-link layer, any
effort to provide differentiation within a particular link-layer technology most likely will not
provide the desired result. This is the case for several reasons. In the Internet, for
example, an IP packet may traverse any number of heterogeneous link-layer paths, each
of which may or may not possess characteristics that inherently provide methods to
provide traffic differentiation. However, the packet also inevitably traverses links that
cannot provide any type of differentiated services at the data-link layer. With data-link
layer technologies such as Ethernet and token ring, for example, only a framing
encapsulation is provided, and in which a MAC (Media Access Control) address is
rewritten in the frame and then forwarded on its way toward its destination. Of course,
solutions have been proposed that may provide enhancements to specific link-layer
technologies (e.g., Ethernet) in an effort to manage bandwidth and provide flow-control.
(This is discussed in Chapter 9, “QoS and Future Possibilities.”) However, for the sake of
this discussion, you will examine how you can use IP for differentiated CoS because, in
the Internet, an IP packet is an IP packet is an IP packet. The only difference is that a
few bytes are added and removed by link-layer encapsulations (and subsequently are
unencapsulated) as the packet travels along its path to its destination.
Of course, the same can be said of other routable Layer 3 and Layer 4 protocols, such as
IPX/SPX, AppleTalk, DECnet, and so on. However, you should recognize that the
TCP/IP suite is the fundamental, de facto protocol base used in the global Internet.
Evidence also exists that private networks are moving toward TCP/IP as the ubiquitous
end-to-end protocol, but migrating legacy network protocols and host applications
sometimes is a slow and tedious process. Also, some older legacy network protocols
may not have inherent hooks in the protocol specification to provide a method to
differentiate packets as to how they are treated as they travel hop by hop to their
destination.
Tip A very good reference on the way in which packet data is encapsulated by the
various devices and technologies as it travels through heterogeneous networks
is provided in Interconnections: Bridges and Routers, by Radia Perlman,
published by Addison-Wesley Publishing.
Differentiation by Class
Several interesting proposals exist on providing differentiation of CoS within IP; two
particular approaches merit mention.
Per-Flow Differentiation
One approach proposes per-flow differentiation. You can think of a flow in this context as
a sequence of packets that share some unique information. This information consists of a
4-tuple (source IP address, source UDP or TCP port, destination IP address, destination
UDP or TCP port) that can uniquely identify a particular end-to-end application-defined
flow or conversation. The objective is to provide a method of extracting information from
the IP packet header and have some capability to associate it with previous packets. The
intended result is to identify the end-to-end application stream of which the packet is a
member. Once a packet can be assigned to a flow, the packet can be forwarded with an
associated class of service that may be defined on a per-flow basis.
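The following sketch shows this classification step in miniature: a 4-tuple key is extracted from each packet and used to look up a per-flow class of service. The dictionary-based packet representation and the class names are purely illustrative.

def flow_key(packet):
    """Extract the 4-tuple described in the text; packets with the same key
    belong to the same end-to-end application flow."""
    return (packet["src_ip"], packet["src_port"],
            packet["dst_ip"], packet["dst_port"])

def classify(packets, per_flow_class, default_class="best-effort"):
    """Assign each packet the class of service configured for its flow."""
    return [(flow_key(p), per_flow_class.get(flow_key(p), default_class))
            for p in packets]

if __name__ == "__main__":
    packets = [
        {"src_ip": "10.0.0.1", "src_port": 5004, "dst_ip": "10.0.0.2", "dst_port": 5004},
        {"src_ip": "10.0.0.3", "src_port": 1025, "dst_ip": "10.0.0.4", "dst_port": 80},
    ]
    policy = {("10.0.0.1", 5004, "10.0.0.2", 5004): "real-time"}
    for key, cls in classify(packets, policy):
        print(key, "->", cls)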
This is very similar in concept to assigning CoS characteristics to Virtual Circuits (VCs)
within a frame relay or ATM network. The general purpose of per-flow differentiation is to
be able to define similar CoS characteristics to particular IP end-to-end sessions,
allowing real-time flows, for example, to be forwarded with CoS parameters different from
other non-real-time flows. However, given the number of flows that may be active in the
core of the Internet at any given time (hint: in many cases, there are in excess of 256,000
active flows), this approach is widely considered to be impractical as far as scaleability is
concerned. Maintaining state and manipulating flow information for this large a number of
flows would require more computational overhead than is practical or desired. This is
primarily the approach that RSVP takes, and it has a lot of people gravely concerned that
it will not be able to scale in a sufficiently large network, let alone the Internet. Thus, a
simpler and more scaleable approach may be necessary for larger networks.
The TOS field has been a part of the IP specification since the beginning and has been
little used in the past. The semantics of this field are documented in RFC1349, “Type of
Service in the Internet Protocol Suite” [IETF1992b], which suggests that the values
specified in the TOS field could be used to determine how packets are treated with
monetary considerations. As an aside, at least two routing protocols, including OSPF
(Open Shortest Path First) [IETF1994a] and Integrated IS-IS [IETF1990b], can be
configured to compute paths separately for each TOS value specified.
In contrast to the use of this 4-bit TOS field as described in the original 1981 DARPA
Internet Protocol Specification RFC791 [DARPA1981b], where each bit has its own
meaning, RFC1349 attempts to redefine this field as a set of bits that should be
considered collectively rather than individually. RFC1349 redefined the semantics of
these field values as depicted here:
1000   minimize delay
0100   maximize throughput
0010   maximize reliability
0001   minimize monetary cost
0000   normal service
In any event, the concept of using the 4-bit TOS values in this fashion did not gain in
popularity, and no substantial protocol implementation (aside from the two routing
protocols mentioned earlier) has made any tangible use of this field. It is important to
remember this, because you will revisit the topic of using the IP TOS field during the
discussion of QoS-based routing schemes in Chapter 9, “QoS and Future Possibilities.”
On the other hand, at least one I-D (Internet Draft) in the IETF has proposed using the IP
precedence in a protocol negotiation, of sorts, for transmitting traffic across
administrative boundaries [ID1996e]. Also, you can be assured that components in the IP
TOS field are supported by all IP-speaking devices, because use of the precedence bits
in the TOS field is required to be supported in the IETF “Requirements for IP Version 4
Routers” document, RFC1812 [IETF1995b].
Although RFC1812 does not explicitly describe how to perform these IP precedence-
related functions in detail, it does furnish references on previous related works and an
overview of some possible implementations where administration, management, and
inspection of the IP precedence field may be quite useful.
The issue here with the proposed semantics of the IP precedence fields is that, instead
of the router attempting to perform an initial classification of a packet into one of
potentially many thousands of active flows and then applying a CoS rule that applies to that
form of flow, you can use the IP precedence bits to reduce the scope of the task
considerably. There is not a wide spectrum of CoS actions the interior switch may take,
and the alternative IP-precedence approach is to mark the IP packet with the desired
CoS when the packet enters the network. On all subsequent interior routers, the required
action is to look up the IP precedence bits and apply the associated CoS action to the
packet. This approach can scale quickly and easily, given that the range of CoS actions
is a finite number and does not grow with traffic volume, whereas flow identification is a
computational task related to traffic volume.
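The contrast with per-flow classification can be seen in the following sketch: the interior router's work collapses to extracting 3 bits and indexing a small, fixed table. The bit positions follow the IPv4 TOS byte layout; the action table itself is invented purely for illustration.

# The IP precedence bits are the top 3 bits of the IPv4 TOS byte; the next
# 4 bits are the RFC1349 TOS field (Figure 4.2).  The action table below is
# an illustrative example, not taken from any vendor implementation.

COS_ACTIONS = {
    0: "best-effort queue",
    5: "low-latency queue",
    6: "network-control queue",
    7: "network-control queue",
}

def precedence(tos_byte):
    return (tos_byte >> 5) & 0x7      # precedence: the three high-order bits

def cos_action(tos_byte):
    # Interior routers need only this small, fixed lookup; no per-flow state.
    return COS_ACTIONS.get(precedence(tos_byte), "best-effort queue")

if __name__ == "__main__":
    for tos in (0x00, 0xA0, 0xE0):
        print(hex(tos), "precedence", precedence(tos), "->", cos_action(tos))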
The format and positioning of the TOS and IP precedence fields in the IP packet header
are shown in Figure 4.2.
Reviewing Topological Significance
When examining real-world implementations of differentiated CoS using the IP
precedence field, it is appropriate to review the topological significance of where specific
functions take place—where IP precedence is set, where it is policed, and where it is
used to administer traffic transiting the network core. As mentioned previously,
architectural principles exist regarding where certain technologies should be deployed in
a network. Some technologies, or implementations of a particular technology, have more
of an adverse impact on network performance than other technologies.
As discussed in the section on implementing policy in Chapter 3, and again within the
section on admission control and traffic shaping, this is another example of a technology
implementation that needs to be performed at the edge of the network. In the network
core, you can simply examine the precedence field and make some form of forwarding,
queuing, and scheduling decision. Once you understand why the architectural principles
are important, it becomes an issue of understanding how using IP precedence can
actually deliver a way of differentiating traffic in a network.
TCP is considered to be self-clocking; when a TCP sender determines that a packet has
been lost (because of a loss or time-out of a packet-receipt acknowledgment), it backs off,
ceases transmitting, shrinks its transmission window size, and retransmits the lost packet at
what effectively can be considered a slower rate in an attempt to complete the original data
transfer. After a reliable transmission rate is established, TCP commences to probe whether
more resources have become available and, to the extent permitted by the sender and
receiver, attempts to increase the transmission rate until the limit of available network
resources is reached. This process commonly is referred to as a TCP ramp up and comes
in two flavors: the initial aggressive compound doubling of the rate in slow-start mode and
the more conservative linear increase in the rate in congestion-avoidance mode. The critical
signal that this maximum level has been exceeded is the occurrence of packet loss, and
after this occurs, the transmitter again backs off the data rate and reprobes. This is the
nature of TCP self-clocking.
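The following sketch traces this two-phase ramp-up in the abstract, counting the congestion window in segments per round-trip time. It is a caricature of TCP behavior (no fast retransmit, delayed acknowledgments, or timer details) intended only to show the exponential and linear growth phases and the back-off on loss.

def tcp_window_trace(rounds, ssthresh, loss_at=None):
    """Congestion window in segments, one value per round-trip time: doubling
    in slow start, plus one per RTT in congestion avoidance, collapsing back
    to slow start when a loss is detected."""
    cwnd, trace = 1, []
    for rtt in range(rounds):
        if loss_at is not None and rtt == loss_at:
            ssthresh = max(cwnd // 2, 2)      # remember half the window ...
            cwnd = 1                          # ... and drop back to slow start
        elif cwnd < ssthresh:
            cwnd *= 2                         # slow start: exponential growth
        else:
            cwnd += 1                         # congestion avoidance: linear growth
        trace.append(cwnd)
    return trace

if __name__ == "__main__":
    print(tcp_window_trace(rounds=12, ssthresh=16, loss_at=8))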
TCP Congestion Avoidance
One major drawback in any high-volume IP network is that when there are congestion
hot spots, uncontrolled congestion can wreak havoc on the overall performance of the
network to the point of congestion collapse. When thousands of flows are active at the
same time, and a congestion situation occurs within the network at a particular
bottleneck, each flow conceivably could experience loss at approximately the same time,
creating what is known as global synchronization. Global, in this case, has nothing to do
with an all-encompassing planetary phenomenon; instead, it refers to all TCP flows in a
given network that traverse a common path. Global synchronization occurs when
hundreds or thousands of flows back off and go into TCP slow start at roughly the same
time. Each TCP sender detects loss and reacts accordingly, going into slow-start,
shrinking its window size, pausing for a moment, and then attempting to retransmit the
data once again. If the congestion situation still exists, each TCP sender detects loss
once again, and the process repeats itself over and over again, resulting in network
gridlock [Zhang1990].
Figure 4.3: RED selects traffic from random flows to discard in an effort to avoid
buffer overflow.
Figure 4.4: RED: The more the queue fills, the more traffic is discarded.
The RED approach does not possess the same undesirable overhead characteristics as
some of the non-FIFO queuing techniques discussed earlier. With RED, it is simply a
matter of who gets into the queue in the first place—no packet reordering or queue
management takes place. When packets are placed into the outbound queue, they are
transmitted in the order in which they are queued. Priority, class-based, and weighted-fair
queuing, however, require a significant amount of computational overhead because of
packet reordering and queue management. RED requires much less overhead than
fancy queuing mechanisms, but then again, RED performs a completely different
function.
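The essence of RED's admission decision can be sketched in a few lines. The thresholds and maximum drop probability here are arbitrary, and real RED implementations additionally maintain an exponentially weighted average of the queue depth and space the drops more evenly than this simple ramp does.

import random

def red_admit(avg_queue, min_th, max_th, max_p=0.1):
    """RED early-drop sketch: below the minimum threshold every packet is
    queued; above the maximum threshold every packet is dropped; in between,
    packets are dropped with a probability that rises as the average queue
    depth grows (Figure 4.4)."""
    if avg_queue < min_th:
        return True
    if avg_queue >= max_th:
        return False
    drop_probability = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() >= drop_probability

if __name__ == "__main__":
    random.seed(1)
    for depth in (10, 30, 50, 70):
        admitted = sum(red_admit(depth, min_th=20, max_th=60) for _ in range(1000))
        print("avg queue", depth, "-> admitted", admitted, "of 1000")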
IP rate-adaptive signaling happens in units of end-to-end Round Trip Time (RTT). For this
reason, when network congestion occurs, it can take some time to clear, because
transmission rates will not immediately back off in response to packet loss. The signaling
involved is that packet loss will not be detected until the sender’s retransmission timer
expires or until duplicate acknowledgments from the receiver arrive back at the sender.
Hence, when congestion occurs, it is not cleared quickly. The
objective of RED is to start the congestion-signaling process at a slightly earlier time than
queue saturation. The use of random selection of flows to drop packets could be argued to
favor dropping packets from flows in which the rate has opened up and which have been
running the longest, because such flows present the most packets for random selection;
these flows generally are considered to be the greatest contributor to the
longevity of normal congestion situations.
Introducing Unfairness
RED can be said to be fair: It chooses random flows from which to discard traffic in an
effort to avoid global synchronization and congestion collapse, as well as to maintain
equity in deciding which traffic actually is discarded. Fairness is all well and good, but what is
really needed here is a tool that can induce unfairness—a tool that can allow the network
administrator to predetermine what traffic is dropped first or last when RED starts to
select flows from which to discard packets. You can’t differentiate services with fairness.
Figure 4.5: Using IP precedence to indicate drop preference with congestion
avoidance.
This is a political issue, not a technical one, and network administrators need to
determine whether they will passively admit traffic with precedence already set and
simply bill accordingly for the service or actively police traffic as it enters their network.
The mention of billing for any service assumes that the necessary tools are in place to
adequately measure traffic volumes generated by each downstream subscriber. Policing
assumes that the network operator determines the mechanisms for allocation of network
resources to competing clients, and presumably the policy enforced by the policing
function yields the most beneficial economic outcome to the operator.
The case can be made, however, that the decision as to which flow is of greatest
economic value to the customer is a decision that can be made only by the customer and
that IP precedence should be a setting that can be passed only into the network by the
customer and not set (or reset) by the network. Obviously, within this scenario, the
transportation of a packet in which precedence is set attracts a higher transport fee than
if the precedence is not set, regardless of whether the network is in a state of congestion.
It is not immediately obvious which approach yields the greatest financial return for the
network operator. On the one hand, the policing function creates a network that exhibits
congestion loading in a relatively predictable manner, as determined by the policies of
the network administrator. The other approach allows the customer to identify the traffic
flows of greatest need for priority handling, and the network manages congestion in a
way that attempts to impose degradation on precedence traffic to the smallest extent
possible.
Threshold Triggering
The interesting part of the threshold-triggering approach is in the traffic-shaping
mechanisms and the associated thresholds, which can be used to provide a method to
mark a packet’s IP precedence. As mentioned previously, precedence can be set in two
places: by the originating host or by the ingress router that polices incoming traffic. In the
latter case, you can use token-bucket thresholds to set precedence. You can implement
a token bucket to define a particular bit-rate threshold, for example, and when this
threshold is exceeded, mark packets with a lower IP precedence. Traffic transmitted
within the threshold can be marked with a higher precedence. This allows traffic that
conforms to the specified bit rate to be marked with a lower probability of discard in times
of congestion. This also allows traffic in excess of the configured bit-rate threshold to
burst up to the port speed, yet with a higher probability of discard than traffic that
conforms to the threshold.
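A sketch of this threshold-triggered marking follows, reusing the token-bucket accounting shown earlier. The rate, burst size, and precedence values are illustrative only.

def mark_precedence(packets, rate, burst, in_profile=5, out_of_profile=0):
    """Mark precedence by token-bucket conformance: traffic within the
    configured rate gets the higher precedence (lower discard probability
    under congestion), while excess traffic still bursts to port speed but
    carries the lower precedence value."""
    tokens, last = burst, 0.0
    marked = []
    for arrival_time, size in packets:                # (seconds, bytes)
        tokens = min(burst, tokens + (arrival_time - last) * rate)
        last = arrival_time
        if size <= tokens:
            tokens -= size
            marked.append((arrival_time, size, in_profile))
        else:
            marked.append((arrival_time, size, out_of_profile))
    return marked

if __name__ == "__main__":
    burst_of_packets = [(0.00, 1500), (0.01, 1500), (0.02, 1500), (1.00, 1500)]
    for pkt in mark_precedence(burst_of_packets, rate=50000, burst=3000):
        print(pkt)   # (time, bytes, precedence)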
You can realize additional flexibility by adding multiple token buckets, each with similar or
dissimilar thresholds, for various types of traffic flows. Suppose that you have an
interface connected to a 45-Mbps circuit. You could configure three token buckets: one
for FTP, one for HTTP, and one for all other types of traffic. If, as a network
administrator, you want to provide a better level of service for HTTP, a lesser quality of
service for FTP, and a yet lesser quality for all other types of traffic, you could configure
each token bucket independently. You also could select precedence values based on
what you believe to be reasonable service levels or based on agreements between
yourself and your customers.
For example, you could strike a service-level agreement that states that all FTP traffic up
to 10 Mbps is reasonably important, but that all traffic (with the exception of HTTP) in
excess of 10 Mbps simply should be marked as best effort (Figure 4.6). In times of
congestion, this traffic is discarded first.
Figure 4.6: Multiple token buckets—thresholds for marking precedence.
This gives you, as the network administrator, a great deal of flexibility in determining the
value of the traffic as well as deterministic properties in times of congestion. Again, this
approach does not require a great deal of computational overhead as do the fancy queuing
mechanisms discussed earlier, because RED is still the underlying congestion-avoidance
mechanism, and it does not have to perform packet reordering or complicated queue-
management functions.
Of course, a large contingent of people will want guaranteed delivery, and this is what
RSVP attempts to provide. You’ll look at this in more detail in Chapter 7, “The Integrated
Services Architecture,” as well as some opinions on how these two approaches stack up.
Chapter 5: QoS and Frame Relay
Overview
What is Frame Relay? It has been described as an “industry-standard, switched data
link-layer protocol that handles multiple virtual circuits using HDLC encapsulation
between connected devices. Frame Relay is more efficient than X.25, the protocol for
which it is generally considered a replacement. See also X.25” [Cisco1995].
Perhaps you should examine “see also X.25” and then look at Frame Relay as a transport
technology. This will provide the groundwork for looking at QoS mechanisms that are
supportable across a Frame Relay network. So to start, here’s a thumbnail sketch of the
X.25 protocol.
The X.25 protocol specifications do not specify the internals of the packet-switched
network but do specify the interface between the network and the client. The network
boundary point is the Data Communications Equipment (DCE), or the network switch,
and the Customer Premise Equipment (CPE) is the Data Termination Equipment (DTE),
where the appropriate equipment is located at the customer premise.
About Precision
Many computer and network protocol descriptions suffer from “acronym density.”
The excuse given is that to use more generic words would be too imprecise in the
context of the protocol definition, and therefore each protocol proudly develops its
own terminology and related acronym set. X.25 and Frame Relay are no exception
to this general rule. This book stays with the protocol-defined terminology and
acronyms for the same reason of precision and brevity of description. We ask for
your patience as we work through the various protocol descriptions.
The major control operations in X.25 are call setup, data transfer, and call clear. Call
setup establishes a virtual circuit between two computers. The call setup operation
consists of a handshake: One computer initiates the call, and the call is answered from
the remote end as it returns a signal confirming receipt of the call. Each DCE/DTE
interface then has a locally defined Local Channel Identifier (LCI) associated with the
call. All further references to the call use the LCI as the identification of the call instead of
referring to the called remote DCE.
The LCI is not a constant value within the network—each data-link circuit within the end-
to-end path uses a unique LCI so that the initial DTE/DCE LCI is mapped into successive
channel identifiers in each X.25 switch along the end-to-end call path. Thus, when the
client computer refers to an LCI as the prefix for a data transfer, it is only a locally
significant reference. Then, when a frame is transmitted out on a local LCI, the DTE
simply assumes that it is correctly being transmitted to the appropriate switch because of
the configured association between the LCI and the virtual circuit. The X.25 switch
receives the HDLC (High-Level Data Link Control) frame and then passes it on to the
next hop in the path, using local lookup tables to determine how to similarly switch the
frame out on one of various locally defined channels. Because each DCE uses only the
LCI as the identification for this particular virtual circuit, there is no need for the DCE to
be aware of the remote LCI.
This level of network functionality allows relatively simple end-systems to make a number
of assumptions about the transference of data. In the normal course of operation, all data
passed to the network will be delivered to the call destination in order and without error,
with the original framing preserved.
“Smartness”
In the same way that telephony uses simple peripheral devices (dumb handsets)
and a complex interior switching system (smart network), X.25 attempts to place
the call- and flow-management complexity into the interior of the network and
create a simple interface with minimal functional demands on the connected
peripheral devices. TCP (Transmission Control Protocol) implements the opposite
technology model, with a simple best-effort datagram delivery network (dumb
network) and flow control and error detection and recovery in the end-system
(smart peripherals).
No special mechanisms really exist that provide Quality of Service (QoS) within the X.25
protocol. Therefore, any differentiation of services needs to be at the network layer (e.g., IP)
or by preferential queuing (e.g., priority, CBQ, or WFQ). This is a compelling reason to
consider Frame Relay instead of X.25 as a wide-area technology for implementing QoS-
based services.
Frame Relay
So how does Frame Relay differ from X.25? Frame Relay has been described as being
faster, more streamlined, and more efficient as a transport protocol than X.25. The reality
is that Frame Relay removes the switch-to-switch flow control, sequence checking, and
error detection and correction from X.25, while preserving the connection orientation of
data calls as defined in X.25. This allows for higher-speed data transfers with a lighter-
weight transport protocol.
Tip You can find approved and pending Frame Relay technical specifications at the
Frame Relay Forum Web site, located at www.frforum.com.
Frame Relay’s origins lie in the development of ISDN (Integrated Services Digital
Network) technology, where Frame Relay originally was seen as a packet-service
technology for ISDN networks. The Frame Relay rationale proposed was the perceived
need for the efficient relaying of HDLC framed data across ISDN networks. With the
removal of data link-layer error detection, retransmission, and flow control, Frame Relay
opted for end-to-end signaling at the transport layer of the protocol stack model to
undertake these functions. This allows the network switches to forward data-link frames
without waiting for positive acknowledgment from the next switch.
This in turn allows the switches to operate with less memory and to drive faster circuits
with the reduced switch functionality required by Frame Relay.
However, like X.25, Frame Relay has a definition of the interface between the client and
the Frame Relay network called the UNI (User-to-Network Interface). Switches within the
confines of the Frame Relay network may use varying technologies, such as cell relay or
HDLC frame passing. However, whereas interior Frame Relay switches have no
requirement to undertake error detection and frame retransmission, the Frame Relay
specification does specify that frames must be delivered in their original order, which is
most commonly implemented using a connection-oriented interior switching structure.
Current Frame Relay standards address only permanent virtual circuits that are
administratively configured and managed in the Frame Relay network; however, Frame
Relay Forum standards-based work currently is underway to support Switched Virtual
Circuits (SVCs). Additionally, work recently was completed within the Frame Relay
Forum to define Frame Relay high-speed interfaces at HSSI (52 Mbps), T3 (45 Mbps)
and E3 (34 Mbps) speeds, augmenting the original T1/E1 specifications.
The Frame Relay frame format is based on Recommendation Q.921/I.441 [ANSI T1S1 “DSSI Core Aspects of Frame Relay,” March
1990]. Figure 5.2 shows this format. The minimum, and default, Frame Address field size
is 16 bits. In the Frame Address field, the Data Link Connection Identifier (DLCI) is
addressed using 10 bits, the extended address field is 2 bits, the Forward Explicit
Congestion Notification (FECN) is 1 bit, the Backward Explicit Congestion Notification
(BECN) is 1 bit, and the Discard Eligible (DE) field is 1 bit.
These final 3 bits within the Frame Relay header are perhaps the most significant
components of Frame Relay when examining QoS possibilities.
The first-hop Frame Relay switch (DCE) has the responsibility of enforcing the CIR (Committed Information Rate) at the
ingress point of the Frame Relay network. When the information rate is exceeded,
frames are marked as exceeding the CIR. This allows the network to subsequently
enforce the committed rate at some point internal to the network. This is implemented
using a rate filter on incoming frames. When the frame arrival rate at the DCE exceeds
the CIR, the DCE marks the excess frames with the Discard Eligible bit set to 1 (DE = 1).
The DE bit instructs the interior switches of the Frame Relay network to select those
frames with the DE bit set as discard eligible in the event of switch congestion and
discard these frames in preference to frames with their DE field set to 0 (DE = 0).
As long as the overall capacity design of the Frame Relay network is sufficiently robust to
allow the network to meet the sustained requirements for all PVCs operating at their
respective CIRs, bandwidth in the network may be consumed above the CIR rate up to
the port speed of each network-attached DTE device. This mechanism of using the DE
bit to discard frames as congestion is introduced into the Frame Relay network provides
a method to accommodate traffic bursts while providing capacity protection for the Frame
Relay network.
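The ingress tagging function can be sketched roughly as follows. The fixed measurement interval, the absence of an excess-burst (Be) limit, and the frame sizes are simplifications made purely for illustration.

def tag_discard_eligible(frames, cir_bps, tc=1.0):
    """Ingress DE tagging sketch: within each measurement interval Tc the DCE
    accepts up to the committed burst (CIR x Tc) bits with DE = 0; frames
    beyond that in the same interval are forwarded with DE = 1 and become
    the first candidates for discard inside the network."""
    committed_bits = cir_bps * tc
    tagged, window_start, used = [], 0.0, 0
    for arrival_time, size_bits in frames:
        if arrival_time - window_start >= tc:       # start a new Tc interval
            window_start, used = arrival_time, 0
        used += size_bits
        tagged.append((arrival_time, size_bits, 0 if used <= committed_bits else 1))
    return tagged

if __name__ == "__main__":
    frames = [(0.0, 8000), (0.2, 8000), (0.4, 8000), (1.1, 8000)]
    for frame in tag_discard_eligible(frames, cir_bps=16000):
        print(frame)   # (time, bits, DE)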
No signaling mechanism (to speak of) is available between the network DCE and the
DTE to indicate that a DE marked frame has been discarded (Figure 5.3). This is an
extremely important aspect of Frame Relay to understand. The job of recognizing that
frames somehow have been discarded in the Frame Relay network is left to higher-layer
protocols, such as TCP.
Figure 5.3: Frames with the DE bit set.
The architecture of this ingress rate tagging is a useful mechanism. The problem with
Frame Relay, however, is that the marking of the DE bit is not well integrated with the
higher-level protocols. Frames normally are selected for DE tagging by the DCE switch
without any signaling from the higher-level application or protocol engine that resides in
the DTE device.
These mechanisms traditionally are implemented within a Frame Relay switch so that it
typically uses three queue thresholds for frames held in the switch queues, awaiting
access to the transmission-scheduling resources.
When the frame queue exceeds the first threshold, the switch sets the FECN or BECN
bit of all frames. Both bits are not set simultaneously—the precise action of whether the
notification is forward or backward is admittedly somewhat arbitrary and appears to
depend on whether the notification is generated at the egress from the network (FECN)
or at the ingress (BECN). The intended result is to signal the sender or receiver on the
UNI interface that there is congestion in the interior of the network. No specific action is
defined for the sending or receiving node on receipt of this signal, although the objective
is that the node recognizes that congestion may be introduced if the present traffic level
is sustained and that some avoidance action may be necessary to reduce the level of
transmitted traffic.
If the queue length continues to grow past the second threshold, the switch then discards
all frames that have the Discard Eligible (DE) bit set. At this point, the switch is
functionally enforcing the CIR levels on all VCs that pass through the switch in an effort
to reduce queue depth. The intended effect is that the sending or receiving nodes
recognize that traffic has been discarded and subsequently throttle traffic rates to operate
within the specified CIR level, at least for some period before probing for the availability
of burst capacity. The higher-level protocol is responsible for detecting lost frames and
retransmitting them and also is responsible for using this discard information as a signal
to reduce transmission rates to help the network back off from the congestion point.
The third threshold is the queue size itself, and when the frame queue reaches this
threshold, all further frames are discarded (Figure 5.4).
Figure 5.4: Frame Relay switch queue thresholds.
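The three thresholds can be summarized as a single decision per arriving frame, as in the following sketch. The threshold values are arbitrary, and the FECN/BECN choice is shown as one combined action rather than the direction-dependent signaling described above.

def switch_queue_action(queue_depth, frame_de, t1, t2, queue_size):
    """One decision per arriving frame: past the first threshold congestion is
    signaled (FECN/BECN), past the second all DE-marked frames are discarded,
    and at the queue limit every frame is discarded."""
    if queue_depth >= queue_size:
        return "discard"                        # third threshold: queue is full
    if queue_depth >= t2 and frame_de:
        return "discard-DE"                     # second threshold: enforce CIR
    if queue_depth >= t1:
        return "queue+set-FECN/BECN"            # first threshold: signal congestion
    return "queue"

if __name__ == "__main__":
    for depth in (10, 40, 70, 100):
        print(depth, switch_queue_action(depth, frame_de=True, t1=30, t2=60, queue_size=100))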
However, can Frame Relay congestion management mechanisms be used so that the
end user can set IP QoS policies, which can in turn provide some direction to the
congestion management behavior of the underlying Frame Relay network? Also, can the
Frame Relay congestion signals (FECN and BECN) be used to trigger IP layer
congestion management behavior?
VC Congestion Signals
When considering such questions, the first observation to be made is that Frame Relay
uses connection-based Virtual Circuits (VCs). Frame Relay per-VC congestion signaling
flows along these fixed paths, whereas IP flows do not use fixed paths through the
network. Given that Frame Relay signals take some time to propagate back through the
network, there is always the risk that the end-to-end IP path may be dynamically altered
before the signal reaches its destination, and that the consequent action may be
completely inappropriate for handling the current situation. Countering this is the more
pragmatic observation that the larger the network, the greater the pressure to dampen
the dynamic nature of routing-induced, logical topology changes. The resulting probability
of a topology change occurring within any single TCP end-to-end session then becomes
very low indeed. Accordingly, it is relevant to consider whether any translation between
IP QoS signaling and Frame Relay QoS signaling is feasible.
The BECN and FECN signals are analogous to the ICMP (Internet Control Message
Protocol) source quench signal. They are intended to inform the transmitter’s protocol
stack that congestion is being experienced in the network and that some reduction of the
transmission rate is advisable. However, this signal is not used in the implementation of
IP over Frame Relay for good reason. As indicated in the IETF (Internet Engineering
Task Force) document, “Requirements for IP Version 4 Routers,” RFC1812 [IETF1995b],
“Research seems to suggest that Source Quench consumes network bandwidth but is an
ineffective (and unfair) antidote to congestion.” In the case of BECN and FECN, no
additional bandwidth is being consumed (the signal is a bit set in the Frame Relay
header so that there is no additional traffic overhead), but the issue of effectiveness and
fairness is relevant. Although these notifications can indeed be signaled back (or forward,
as the case may be) to the CPE, where the transmission rate corresponding to the DLCI
can be reduced, such an action must be done at the Frame Relay layer. To translate this
back up the protocol stack to the IP layer, a subsequent reaction would be necessary for
the Frame Relay interface equipment, upon receipt of a BECN or FECN signal, to set a
condition that generates an ICMP source quench for all IP packets that correspond to
such signaled frames. In this situation, the cautionary advice of RFC1812 is particularly
relevant.
However, the most interesting observation about the interaction of Frame Relay and IP is
one that indicates what is missing rather than what is provided. The DE bit is a powerful
mechanism that allows the interior of the Frame Relay network to take rapid and
predictable actions to reduce traffic load when under duress. The challenge is to relate
this action to the environment of the end-user of the IP network.
Within the UNI specification, the Frame Relay specification allows the DTE (router) to set
the DE bit in the frame header before passing it to the DCE (switch), and in fact, this is
possible in a number of router vendor implementations. This allows a network
administrator to specify a simple binary priority. However, this rarely is done in a
heterogeneous network, simply because it is somewhat self-defeating if no other
subscriber undertakes the same action.
the DCE, which then can confirm or clear the DE bit in accordance with the procedures
outlined earlier.
Without coherence between the data-link transport-signaling structures and the higher-
level protocol stack, the result is somewhat more complex. Currently, the Frame Relay
network works within a locally defined context of using selective frame discard as a
means of enforcing rate limits on traffic as it enters the network. This is done as the
primary response to congestion. The basis of this selection is undertaken without respect
to any hints provided by the higher-layer protocols. The end-to-end TCP protocol uses
packet loss as the primary signaling mechanism to indicate network congestion, but it is
recognized only by the TCP session originator. The result is that when the network starts
to reach a congestion state, the method in which end-system applications are degraded
matches no particular imposed policy, and in this current environment, Frame Relay
offers no great advantage over any other data-link transport technology in addressing
this.
However, if the TOS field in the IP header were used to allow a change to the DE bit
semantics in the UNI interface, it is apparent that Frame Relay would allow a more
graceful response to network congestion by attempting to reduce load in accordance
with upper-layer protocol policy directives. In other words, you can construct an IP over
Frame Relay network that adheres to QoS policies if you can modify the standard
Frame Relay mode of operation.
Part II
Chapter List
Chapter 6: QoS and ATM
Chapter 6: QoS and ATM
Overview
Asynchronous Transfer Mode (ATM) is undeniably the only technology that provides
data-transport speeds in excess of OC-3 (155 Mbps) today. This is perhaps the
predominant reason why ATM has enjoyed success in the Internet backbone
environment. In addition to providing a high-speed bit-rate clock, ATM also provides a
complex subset of traffic-management mechanisms, Virtual Circuit (VC) establishment
controls, and Quality of Service (QoS) parameters. It is important to understand why
these underlying mechanisms are not being exploited by a vast number of organizations
that are using ATM as a data-transport mechanism for Internet networks in the wide
area. The predominant use of ATM in today's Internet networks stems simply from the high data-clocking rate and multiplexing flexibility available with ATM implementations.
This chapter is not intended to be a detailed description of the inner workings of ATM
networking, but a glimpse at the underlying mechanics is necessary to understand why
ATM and certain types of QoS are inextricably related. The majority of information in this
chapter is condensed from the ATM Forum Traffic Management Specification Version 4.0
[AF1996a] and the ATM Forum Private Network-Network Interface Specification Version
1.0 [AF1996c].
ATM Background
Historically, organizations have used TDM (Time Division Multiplexing) equipment to
combine, or mux, different data and voice streams into a single physical circuit (Figure
6.1), and subsequently de-mux the streams on the receiving end, effectively breaking
them out into their respective connections on the remote customer premise. Placing a
mux on both ends of the physical circuit in this manner provided a means to an end; it
was considered economically more attractive to mux multiple data streams together into
a single physical circuit than it was to purchase different individual circuits for each
application. This economic principle still holds true today.
There are, of course, some drawbacks to using the TDM approach, but the predominant
liability is that once multiplexed, it is impossible to manage each individual data stream.
This is sometimes an unacceptable paradigm in a service-provider environment,
especially when management of data services is paramount to the service provider’s
economic livelihood. By the same token, it may not be possible to place a mux on both
ends of a circuit because of the path a circuit may traverse. In the United States, for
example, it is common that one end of a circuit may terminate in one RBOC’s (Regional
Bell Operating Company) network, whereas the other end may terminate in another
RBOC’s network on the other side of the country. Since the breakup of AT&T in the mid-
1980s, the resulting RBOC’s (Pacific Bell, Bell Atlantic, U.S. West, et al.) commonly have
completely different policies, espouse different network philosophies and architectures,
provide various services, and often deploy noninteroperable and diverse hardware
platforms. Other countries have differing regulatory restrictions on different classes of
traffic, particularly where voice is concerned, and the multiplexing of certain classes of
traffic may conflict with such regulations. Another drawback to this approach is that the
multiplexors represent a single point of failure.
ATM Switching
The introduction of ATM in the early 1990s provided an alternative method to traditional
multiplexing, in which the basic concept (similar to its predecessor, frame relay) is that
multiple Virtual Channels (VCs) or Virtual Paths (VPs) now could be used for multiple
data streams. Many VCs can be delivered on a single VP, and many VPs can be
delivered on a single physical circuit (Figure 6.2). This is very attractive from an
economic, as well as a management, perspective. The economic allure is obvious.
Multiple discrete traffic paths can be configured and directed through a wide-area ATM
switched network, while avoiding the monthly costs of several individual physical circuits.
Traffic is switched end-to-end across the network—a network consisting of several ATM
switches. Each VC or VP can be mapped to a specific path through the network, either
statically by a network administrator or dynamically via a switch-to-switch routing protocol
used to determine the best path from one end of the ATM network to the other, as
simplified in Figure 6.3.
ATM provides a cell-based multiplexing and switching environment that can support virtually any type of traffic, such as voice,
data, or video applications. ATM segments and multiplexes user data into 53-byte cells.
Each cell is identified with a VC and a VP Identifier (VCI and VPI, respectively) which
indicate how the cell is to be switched from its origin to its destination in the ATM
switched network.
The ATM switching function is fairly straightforward. Each device in the ATM end-to-end
path rewrites the VPI/VCI value, because it is only locally significant. That is, the VPI/VCI
value is used only on a switch to indicate which local interface and/or VPI/VCI a cell is to
be forwarded on. An ATM switch, or router, receives a cell on an incoming interface with
a known VPI/VCI value, looks up the value in a local translation table to determine the
outbound interface and the corresponding VPI/VCI value, rewrites the VPI/VCI value, and
then switches the cell onto the outbound interface for retransmission with the appropriate
connection identifiers.
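The translation step itself amounts to a table lookup and rewrite, which the short sketch below illustrates (the table entries and field layout are invented for illustration; real switches perform this operation in hardware on the VPI/VCI fields of each cell header).

# Minimal sketch of the ATM cell-switching step: look up the incoming
# (port, VPI, VCI), rewrite the identifiers, and forward on the outgoing port.
switch_table = {
    # (in_port, in_vpi, in_vci): (out_port, out_vpi, out_vci)
    (1, 0, 32): (3, 5, 100),
    (2, 1, 48): (3, 5, 101),
}

def switch_cell(in_port, vpi, vci):
    """Return (out_port, new_vpi, new_vci) for a cell, or None if no entry exists."""
    return switch_table.get((in_port, vpi, vci))

# A cell arriving on port 1 with VPI/VCI 0/32 leaves port 3 relabeled as 5/100.
print(switch_cell(1, 0, 32))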
Foiled by Committee
Why 53 bytes per ATM cell? This is a good example of why technology choices
should not be made by committee vote. The 53 bytes are composed of a 5-byte cell
header and a 48-byte payload. The original committee work refining the ATM
model resulted in two outcomes, with one group proposing 128 bytes of payload
per cell and the other 16 bytes of payload per cell. Further negotiation within the
committee brought these two camps closer, until the two proposals were for 64 and
32 bytes of payload per cell. The proponents of the smaller cell size argued that the
smaller size reduced the level of network-induced jitter and the level of signal loss
associated with the drop of a single cell. The proponents of the large cell size
argued that the large cell size permitted a higher data payload in relation to the cell
header. Interestingly enough, both sides were proposing data payload sizes that
were powers of 2, allowing for relatively straightforward memory mapping of data
structures into cell payloads with associated efficiency of payload descriptor fields.
The committee resolved this apparent impasse simply by taking the median value
of 48 for the determined payload per cell. The committee compromise of 48 really
suited neither side. It is considered too large for voice use and too small for data
use; current measurements indicate that there is roughly a 20 percent efficiency
overhead in using ATM as the transport substrate for an IP network. Sometimes,
technology committees can concentrate too heavily on reaching a consensus and
lose sight of their major responsibility to define rational technology.
ATM Connections
ATM networks essentially are connection oriented; a virtual circuit must be set up and
established across the ATM network before any data can be transferred across it. There
are two types of ATM connections: Permanent Virtual Connections (PVCs) and Switched
Virtual Connections (SVCs). PVCs generally are configured statically by some external
mechanism—usually, a network-management platform of some sort. PVCs are
configured by a network administrator. Each incoming and outgoing VPI/VCI, on a
switch-by-switch basis, must be configured for each end-to-end connection.
Obviously, when a large number of VCs must be configured, PVCs require quite a bit of
administrative overhead. SVCs are set up automatically by a signaling protocol, or rather,
the interaction of different signaling protocols. There are also soft PVCs; the end-points of
the soft PVC—that is, the segment of the VC between the ingress or egress switch to the
end-system or router—remain static. However, if a VC segment in the ATM network
(between switches) becomes unavailable, experiences abnormal levels of cell loss, or
becomes overly congested, an interswitch-routing protocol reroutes the VC within the
confines of the ATM network. Thus, to the end-user, there is no noticeable change in the availability or status of the local PVC.
ATM Traffic-Management Functions
As mentioned several times earlier, certain architectural choices in any network design
may impact the success of a network. The same principles ring just as true with regard to
ATM as they would with other networking technologies. Being able to control traffic in the
ATM network is crucial to ensuring the success of delivering differentiated QoS to the
various applications that request and rely on the controls themselves. The primary
responsibility of traffic-management mechanisms in the ATM network is to promote
network efficiency and avoid congestion situations so that the overall performance of the
network does not degenerate. It is also a critical design objective of ATM that the network
utilization imposed by transporting one form of application data does not adversely
impact the capability to efficiently transport other traffic in the network. It may be critically
important, for example, that the transport of bursty traffic does not introduce an excessive
amount of jitter into the transportation of constant bit rate, real-time traffic for video, or
audio applications.
To deliver this stability, the ATM Forum has defined the following set of functions to be
used independently or in conjunction with one another to provide for traffic management
and control of network resources:
Connection Admission Control (CAC). Actions taken by the network during call
setup to determine whether a connection request can be accepted or rejected.
Usage Parameter Control (UPC). Actions taken by the network to monitor and
control traffic and to determine the validity of ATM connections and the associated
traffic transmitted into the network. The primary purpose of UPC is to protect the
network from traffic misbehavior that can adversely impact the QoS of already
established connections. UPC detects violations of negotiated traffic parameters and
takes appropriate actions—either tagging cells as CLP = 1 or discarding cells
altogether.
Cell Loss Priority (CLP) control. If the network is configured to distinguish the
indication of the CLP bit, the network may selectively discard cells with their CLP bit
set to 1 in an effort to protect traffic with cells marked as a higher priority (CLP = 0).
Different strategies for network resource allocation may be applied, depending on
whether CLP = 0 or CLP = 1 for each traffic flow.
Frame discard. A congested network may discard traffic at the AAL (ATM
Adaptation Layer) frame level, rather than at the cell level, in an effort to maximize
discard efficiency.
ABR flow control. You can use the Available Bit Rate (ABR) flow-control protocol to
adapt subscriber traffic rates in an effort to maximize the efficiency of available
network resource utilization. You can find ABR flow-control details in the ATM Forum
Traffic Management Specification 4.0 [AF1996a]. ABR flow control also provides a
crankback mechanism to reroute traffic around a particular node when loss or
congestion is introduced, or when the traffic contract is in danger of being violated as
a result of a local CAC (connection admission control) determination. With the
crankback mechanism, an intervening node signals back to the originating node that
it no longer is viable for a particular connection and no longer can deliver the
committed QoS.
The major strengths of any networking technology are simplicity and consistency.
Simplicity yields scaleable implementations that can readily interoperate.
Consistency results in a set of capabilities that are complementary. The preceding
list of ATM functions may look like a grab bag of fashionable tools for traffic
management without much regard for simplicity or consistency across the set of
functions. This is no accident. Again, as an outcome of the committee process, the
ATM technology model is inclusive, without the evidence of operation of a filter of
consistency. It is left as an exercise for the network operator to take a subset of
these capabilities and create a stable set of network services.
The ATM policing function is called Usage Parameter Control (UPC) and also is
performed at the ingress ATM switch. Although connection monitoring at the public or
private UNI is referred to as UPC and connection monitoring at a NNI (Network-to-
Network Interface) can be called NPC (Network Parameter Control), UPC is the generic
reference commonly used to describe either one. UPC is the activity of monitoring and
controlling traffic in the network at the point of entry.
The primary objective of UPC is to protect the network from malicious, as well as
unintentional, misbehavior that can adversely affect the QoS of other, already
established connections in the network. The UPC function checks the validity of the VPI
and/or VCI values and monitors the traffic entering the network to ensure that it conforms
to its negotiated traffic contract. The UPC actions consist of allowing the cells to pass
unmolested, tagging the cell with CLP = 1 (marking the cell as discard eligible), or
discarding the cells altogether. No priority scheme to speak of is associated with ATM
connection services. However, an explicit bit in the cell header indicates when a cell may
be dropped—usually, in the face of switch congestion. This bit is called the CLP (Cell
Loss Priority) bit. Setting the CLP bit to 1 indicates that the cell may be dropped in
preference to cells with the CLP bit set to 0. Although this bit may be set by end-systems,
it is set predominantly by the network in specific circumstances. This bit is advisory and
not mandatory. Cells with the CLP set to 1 are not dropped when switch congestion is
not present. Cells with CLP set to 0 may be dropped if there is switch congestion. The
function of the CLP bit is a two-level prioritization of cells used to determine which cells to
discard first in the event of switch congestion.
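In practice, the conformance test behind UPC typically is expressed as the Generic Cell Rate Algorithm (GCRA). The sketch below shows the virtual-scheduling form of that test; the parameter values are illustrative only, and whether a non-conforming cell is tagged with CLP = 1 or discarded is left to local policy.

class GCRAPolicer:
    """Virtual-scheduling form of the Generic Cell Rate Algorithm.
    increment -- expected inter-cell spacing in seconds (1 / PCR)
    limit     -- tolerance (CDVT) in seconds
    """
    def __init__(self, increment, limit):
        self.increment = increment
        self.limit = limit
        self.tat = 0.0  # theoretical arrival time of the next conforming cell

    def conforming(self, arrival_time):
        if arrival_time < self.tat - self.limit:
            return False  # cell arrived too early: tag with CLP = 1 or discard
        self.tat = max(arrival_time, self.tat) + self.increment
        return True       # cell passes unmolested

# Police a connection to 10,000 cells per second (one cell per 100 microseconds)
# with 0.5 ms of cell delay variation tolerance.
policer = GCRAPolicer(increment=1 / 10_000, limit=0.0005)
for k in range(8):                      # a burst arriving every 20 microseconds
    t = k * 0.00002
    print(round(t * 1e6), policer.conforming(t))   # the final cell is non-conforming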
ATM Signaling and Routing
There are two basic types of ATM signaling: the User-to-Network Interface (UNI) and the
Network-to-Network Interface (NNI), which sometimes is referred to as the Network-to-
Node Interface. UNI signaling is used between ATM-connected end-systems, such as
routers and ATM-attached workstations, as well as between separate, interconnected
private ATM networks. Public UNI signaling is used between an end-system and a public ATM network or between different private ATM networks; private UNI signaling is used between an end-system and a private ATM network. NNI signaling is used between
ATM switches within the same administrative ATM switch network. A public NNI signaling
protocol called B-ICI (Broadband ISDN Inter-Carrier Interface), which is
depicted in Figure 6.4 and described in [AF1995a], is used to communicate between
public ATM networks.
The UNI signaling request is mapped by the ingress ATM switch into NNI signaling, and
then is mapped from NNI signaling back to UNI signaling at the egress switch. An end-
system UNI request, for example, may interact with an interswitch NNI signaling protocol,
such as PNNI (Private Network-to-Network Interface).
PNNI is a dynamic signaling and routing protocol that is run within the ATM network
between switches and sets up SVCs through the network. PNNI uses a complex
algorithm to determine the best path through the ATM switch network and to provide
rerouting services when a VC failure occurs. The generic PNNI reference is specified in
[AF1996c].
The original specification of PNNI, Phase 0, also is called IISP (Interim Interswitch
Signaling Protocol) . The name change is intended to avoid confusion between PNNI
Phase 0 and PNNI Phase 1.
PNNI provides for highly complex VC path-selection services that calculate paths through
the network based on the cost associated with each interswitch link. The costing can be
configured by the network administrator to indicate preferred links in the switch topology.
PNNI is similar to OSPF (Open Shortest Path First) in many regards—both are fast
convergence link-state protocols. However, whereas PNNI is used only to route signaling
requests across the ATM network and ultimately provide for VC establishment, OSPF is
used at the network layer in the OSI (Open Systems Interconnection) reference model to
calculate the best path for packet forwarding. PNNI does not forward packets, and PNNI
does not forward cells; it simply provides routing and path information for VC
establishment.
PNNI does provide an aspect of QoS within ATM, however. Unlike other link-state routing
protocols, PNNI not only advertises the link metrics of the ATM network, it also
advertises information about each node in the ATM network, including the internal state
of each switch and the transit behavior of traffic between switches in the network. PNNI
also performs source routing (also known as explicit routing), in which the ingress switch
determines the entire path to the destination, as opposed to path calculation being done
on a hop-by-hop basis. This behavior is one of the most attractive features of ATM
dynamic VC establishment and path calculation—the capability to determine the state of
the network, to determine the end-to-end path characteristics (such as congestion,
latency, and jitter), and to build connections according to this state. With the various ATM
service categories (listed in the following section), as well as the requested QoS
parameters (e.g., cell delay, delay variance, and loss ratio), PNNI also provides an
admission-control function (CAC). When a connection is requested, the ingress switch
determines whether it can honor the request based on the traffic parameters included in
the request. If it cannot, the connection request is rejected.
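Conceptually, this CAC decision reduces to comparing the requested traffic parameters against the capacity still uncommitted along the candidate path. The sketch below illustrates the accept-or-reject step with invented numbers and a deliberately crude effective-bandwidth rule; production switches use considerably more sophisticated models.

def admit_connection(requested_pcr, requested_scr, link_capacity, committed_load):
    """Simplified CAC check: admit a request only if the path can still carry it.
    A crude effective-bandwidth rule is used here: reserve the SCR plus half of
    the burst headroom (PCR - SCR). All rates are in cells per second.
    """
    effective_bw = requested_scr + 0.5 * (requested_pcr - requested_scr)
    return committed_load + effective_bw <= link_capacity

# An OC-3 link carries roughly 353,000 cells/second; 300,000 are already committed.
print(admit_connection(requested_pcr=80_000, requested_scr=20_000,
                       link_capacity=353_000, committed_load=300_000))   # accepted
print(admit_connection(requested_pcr=200_000, requested_scr=120_000,
                       link_capacity=353_000, committed_load=300_000))   # rejected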
Tip You can find approved and pending ATM technical specifications, including the
Traffic Management 4.0 and PNNI 1.0 Specifications, at the ATM Forum Web site,
located at www.atmforum.com.
Real-Time Variable Bit Rate (rt-VBR)   (For further study)        Statistical mux, real time
Available Bit Rate (ABR)               Available Bit Rate (ABR)   Resource exploitation, feedback control
The basic differences among these service categories are described in the following
sections.
You can use the VBR service categories for any class of applications that might benefit
from sending data at variable rates to most efficiently use network resources. You could
use Real-Time VBR (rt-VBR), for example, for multimedia applications with lossy
properties—applications that can tolerate a small amount of cell loss without noticeably
degrading the quality of the presentation. Some multimedia protocol formats may use a
lossy compression scheme that provides these properties. You could use Non-Real-Time
VBR (nrt-VBR), on the other hand, for transaction-oriented applications, such as
interactive reservation systems, where traffic is sporadic and bursty.
ABR uses Resource Management (RM) cells to provide feedback that controls the traffic
source in response to fluctuations in available resources within the interior ATM network.
The specification for ABR flow control uses these RM cells to control the flow of cell
traffic on ABR connections. The ABR service expects the end-system to adapt its traffic
rate in accordance with the feedback so that it may obtain its fair share of available
network resources. The goal of ABR service is to provide fast access to available
network resources at up to the specified Peak Cell Rate (PCR).
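The ABR feedback loop can be approximated with the simplified source behavior sketched below: the Allowed Cell Rate (ACR) is increased by a fraction of the PCR while backward RM cells report no congestion and is cut multiplicatively when the Congestion Indication (CI) bit is set, but never below the MCR. The increase and decrease factors here stand in for the RIF and RDF parameters of the actual specification and are illustrative only.

def update_acr(acr, pcr, mcr, congestion_indicated,
               rate_increase_fraction=1 / 16, rate_decrease_fraction=1 / 8):
    """Simplified ABR source behavior driven by backward RM-cell feedback.
    Increase additively (a fraction of PCR) when no congestion is reported;
    decrease multiplicatively when the CI bit is set. Clamp to [MCR, PCR].
    """
    if congestion_indicated:
        acr -= acr * rate_decrease_fraction
    else:
        acr += pcr * rate_increase_fraction
    return max(mcr, min(pcr, acr))

acr = 50_000                            # starting Allowed Cell Rate, in cells/second
for ci in (False, False, True, False):  # feedback carried by four backward RM cells
    acr = update_acr(acr, pcr=150_000, mcr=10_000, congestion_indicated=ci)
    print(round(acr))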
These service categories provide a method to relate traffic characteristics and QoS
requirements to network behavior. ATM network functions such as VC/VP path
establishment, CAC, and bandwidth allocation are structured differently for each category.
The service categories are characterized as being real-time or non-real-time. There are two
real-time service categories: CBR and rt-VBR, each of which is distinguished by whether
the traffic descriptor contains only the Peak Cell Rate (PCR) or both the PCR and the
Sustained Cell Rate (SCR) parameters. The remaining three service categories are
considered non-real-time services: nrt-VBR, UBR, and ABR. Each service class differs in its
method of obtaining service guarantees provided by the network and relies on different
mechanisms implemented in the end-systems and the higher-layer protocols to realize
them. Selection of an appropriate service category is application specific.
ATM Traffic Parameters
Each ATM connection contains a set of parameters that describes the traffic
characteristics of the source. These parameters are called source traffic parameters.
Source traffic parameters, coupled with another parameter called the CDVT (Cell Delay
Variation Tolerance) and a conformance-definition parameter, characterize the traffic
properties of an ATM connection. Not all these traffic parameters are valid for each
service category. When an end-
system requests an ATM SVC (Switched Virtual Connection) to be set up, it indicates to
the ingress ATM switch the type of service required, the traffic parameters of each data
flow (in both directions), and the QoS parameters requested in each direction. These
parameters form the traffic descriptor for the connection. You just examined the service
categories; the traffic parameters consist of the following:
Peak Cell Rate (PCR). The maximum allowable rate at which cells can be
transported along a connection in the ATM network. The PCR is the determining
factor in how often cells are sent in relation to time in an effort to minimize jitter. PCR
generally is coupled with the CDVT, which indicates how much jitter is allowable.
Sustainable Cell Rate (SCR). A calculation of the average allowable, long-term cell
transfer rate on a specific connection.
Maximum Burst Size (MBS). The maximum allowable burst size of cells that can be
transmitted contiguously on a particular connection.
Minimum Cell Rate (MCR). The minimum allowable rate at which cells can be
transported along an ATM connection.
Two other important aspects of topology information carried around in PNNI routing
updates are topology attributes and topology metrics. A topology metric is the cumulative
information about each link in the end-to-end path of a connection. A topology attribute is
the information about a single link. The PNNI path-selection process determines whether
a link is acceptable or desirable for use in setting up a particular connection based on the
topology attributes of a particular link or node. These parameters are where you begin to
get into the mechanics of ATM QoS. The topology metrics follow:
Cell Delay Variation (CDV). An algorithmic determination for the variance in the cell
delay, primarily intended to determine the amount of jitter. The CDV is a required
metric for CBR and rt-VBR service categories. It is not applicable to nrt-VBR, ABR,
and UBR service categories.
Maximum Cell Transfer Delay (maxCTD). A cumulative summary of the cell delay
on a switch-by-switch basis along the transit path of a particular connection,
measured in microseconds. The maxCTD is a required topology metric for CBR, rt-
VBR, and nrt-VBR service categories. It is not applicable to UBR and ABR service.
Cell Loss Ratio (CLR). CLR is the ratio of the number of cells unsuccessfully
transported across a link, or to a particular node, compared to the number of cells
successfully transmitted. CLR is a required topology attribute for CBR, rt-VBR, and
nrt-VBR service categories and is not applicable to ABR and UBR service categories.
CLR is defined for a connection as CLR = Lost Cells / Total Transmitted Cells.
Available Cell Rate (AvCR). AvCR is a measure of effective available capacity for
CBR, rt-VBR, and nrt-VBR service categories. For ABR service, AvCR is a measure
of capacity available for Minimum Cell Rate (MCR) reservation.
Cell Rate Margin (CRM). CRM is the difference between effective bandwidth
allocation and the allocation for Sustained Cell Rate (SCR) measured in units of cells
per second. CRM is an indication of the safety margin allocated above the aggregate
sustained cell rate. CRM is an optional topology attribute for rt-VBR and nrt-VBR
service categories and is not applicable to CBR, ABR, and UBR service categories.
Figure 6.5 provides a chart illustrating the PNNI topology state parameters. Figure 6.6
shows a matrix of the various ATM service categories and how they correspond to their
respective traffic and QoS parameters.
Figure 6.5: PNNI topology state parameters [AF1996c].
The ATM Forum’s Traffic Management Specification 4.0 specifies six QoS service
parameters that correspond to network-performance objectives. Three of these
parameters may be negotiated between the end-system and the network, and one or
more of these parameters may be offered on a per-connection basis.
The following three negotiated QoS parameters were described earlier; they are
repeated here because they also are topology metrics carried in the PNNI Topology
State Packets (PTSPs). Two of these negotiated QoS parameters are considered delay
parameters (CDV and maxCTD), and one is considered a dependability parameter
(CLR):
Cell Error Ratio (CER). Successfully transferred cells and errored cells contained in cell blocks counted as SECBR (Severely Errored Cell Block Ratio) cells should be excluded from this calculation. The CER is defined for a connection as CER = Errored Cells / (Successfully Transferred Cells + Errored Cells).
Severely Errored Cell Block Ratio (SECBR). A severely errored cell block determination occurs when a specific threshold of errored, lost, or misinserted cells is observed in a transmitted cell block. The SECBR is defined as SECBR = Severely Errored Cell Blocks / Total Transmitted Cell Blocks.
Cell Misinsertion Rate (CMR). The CMR most often is caused by an undetected
error in the header of a cell being transmitted. This performance parameter is defined
as a rate rather than a ratio, because the mechanism that produces misinserted cells
is independent of the number of transmitted cells received. The SECBR should be
excluded when calculating the CMR. The CMR can be defined as CMR = Misinserted Cells / Time Interval.
Table 6.2 lists the cell-transfer performance parameters and their corresponding QoS
characterizations.
ATM QoS Classes
There are two types of ATM QoS classes: one that explicitly specifies performance
parameters (specified QoS class) and one for which no performance parameters are
specified (unspecified QoS class). QoS classes are associated with a particular
connection and specify a set of performance parameters and objective values for each
performance parameter specified. Examples of performance parameters that could be
specified in a given QoS class are CTD, CDV, and CLR.
An ATM network may support several QoS classes. At most, however, only one
unspecified QoS class can be supported by the network. It also stands to reason that the
performance provided by the network overall should meet or exceed the performance
parameters requested by the ATM end-system. The ATM connection indicates the
requested QoS by a particular class specification. For PVCs, the Network Management
System (NMS) is used to indicate the QoS class across the UNI signaling. For SVCs, a
signaling protocol’s information elements are used to communicate the QoS class across
the UNI to the network.
A correlation for QoS classes and ATM service categories results in a general set of
service classes:
QoS class 1. Supports a QoS that meets service class A performance requirements.
This should provide performance comparable to digital private lines.
QoS class 2. Supports a QoS that meets service class B performance requirements.
Should provide performance acceptable for packetized video and audio in
teleconferencing and multimedia applications.
QoS class 3. Supports a QoS that meets service class C performance requirements.
Should provide acceptable performance for interoperability connection-oriented
protocols, such as Frame Relay.
QoS class 4. Supports a QoS that meets service class D performance requirements.
Should provide for interoperability of connectionless protocols, such as IP.
The primary difference between specified and unspecified QoS classes is that with an
unspecified QoS class, no objective is specified for the performance parameters. However,
the network may determine a set of internal QoS objectives for the performance
parameters, resulting in an implicit QoS class being introduced. For example, a UBR
connection may select best-effort capability, an unspecified QoS class, and only a traffic
parameter for the PCR with CLP = 1. These criteria then can be used to support data
capable of adapting the traffic flow into the network based on time-variable resource
fluctuation.
ATM and IP Multicast
Although it does not have a direct bearing on QoS issues, it is nonetheless important to
touch on the basic elements that provide for the interaction of IP multicast and ATM.
These concepts will surface again later and be more relevant when discussing the IETF
Integrated Services architecture, the RSVP (Resource ReSerVation Protocol), and ATM.
Of course, there are no outstanding issues when IP multicast is run with ATM PVCs,
because all ATM end-systems are static and generally available at all times. Multicast
receivers are added to a particular multicast group as they normally would in any point-
to-point or shared media environment.
The case of ATM SVCs is a bit more complex. There are basically two methods for using
ATM SVCs for IP multicast traffic. The first is the establishment of an end-to-end VC for
each sender-receiver pair in the multicast group. This is fairly straightforward; however,
depending on the number of nodes participating in a particular multicast group, this
approach has obvious scaling issues associated with it. The second method uses ATM
SVCs to provide an ingenious mechanism to handle IP multicast traffic by point-to-
multipoint VCs. As multicast receivers are added to the multicast tree, new branches are
added to the point-to-multipoint VC tree (Figure 6.7).
ATM-based IP hosts and routers may alternatively use a Multicast Address Resolution
Server (MARS) [IETF1996b] to support RFC 1112 [IETF1989a] style Level 2 IP multicast
over the ATM Forum’s UNI 3.0/3.1 point-to-multipoint connection service. The MARS server
is an extension of the ATM ARP (Address Resolution Protocol) server described in
RFC1577 [IETF1994c], and for matters of practicality, the MARS functionality can be
incorporated into the router to facilitate multicast-to-ATM host address resolution services.
MARS messages support the distribution of multicast group membership information
between the MARS server and multicast end-systems. End-systems query the MARS
server when an IP address needs to be resolved to a set of ATM endpoints making up the
multicast group, and end-systems inform MARS when they need to join or leave a multicast
group.
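Stripped to its essentials, the MARS model maintains a mapping from an IP multicast group address to the set of ATM endpoints that have joined the group. The toy registry below captures only that idea; the class and method names are invented, and the real MARS message exchanges defined in [IETF1996b] are considerably richer.

class MulticastAddressRegistry:
    """Toy stand-in for a MARS: maps IP group addresses to sets of ATM endpoints."""

    def __init__(self):
        self.groups = {}                 # group address -> set of ATM addresses

    def join(self, group, atm_address):
        self.groups.setdefault(group, set()).add(atm_address)

    def leave(self, group, atm_address):
        self.groups.get(group, set()).discard(atm_address)

    def resolve(self, group):
        """Return the ATM endpoints a sender must add as point-to-multipoint leaves."""
        return sorted(self.groups.get(group, set()))

registry = MulticastAddressRegistry()
registry.join("224.1.1.1", "atm-endpoint-A")
registry.join("224.1.1.1", "atm-endpoint-B")
print(registry.resolve("224.1.1.1"))     # ['atm-endpoint-A', 'atm-endpoint-B']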
Factors That May Affect ATM QoS Parameters
It is important to consider factors that may have an impact on QoS parameters—factors
that may be the result of undesirable characteristics of a public or private ATM network.
As outlined in [AF1996d], there are several reasons why QoS might become degraded,
and certain network events may adversely impact the network’s capability to provide
qualitative QoS. One of the principal reasons why QoS might become degraded is
because of the ATM switch architecture itself. The ATM switching matrix design may be
suboptimal, or the buffering strategy may be shared across multiple ports, as opposed to
providing per-port or per-VC buffering. Buffering capacity therefore may be less than
satisfactory, and as a result, congestion situations may be introduced into the network.
Other sources of QoS degradation include media errors; excessive traffic load; excessive
capacity reserved for a particular set of connections; and failures introduced by port, link,
or switch loss. Table 6.3 lists the QoS parameters associated with particular degradation
scenarios.
The degradation sources covered in Table 6.3 are propagation delay, switch architecture, buffer capacity, number of tandem nodes, traffic load, failures, and resource allocation.
A related consideration is the overhead incurred in segmenting variable-length IP packets into fixed-length ATM cells.
The last cell of an AAL5 frame, for example, will contain anywhere from 0 to 39 bytes
of padding, which can be considered wasted bandwidth. Assuming that a broad range of
packet sizes exists in the Internet, you could conclude that the average waste is about 20
bytes per packet. Based on an average packet size of 200 bytes, for example, the waste
caused by cell padding is about 10 percent. However, because of the broad distribution
of packet sizes, the actual overhead may vary substantially. Note that this 10 percent is
in addition to the 10 percent overhead imposed by the 5-byte ATM cell headers (5 bytes
subtracted from the cell size of 53 bytes is approximately a 10 percent overhead) and
various other overhead (some of which also is present in frame-over-SONET,
Synchronous Optical Network, schemes).
Suppose that you want to estimate the ATM bandwidth available on an OC3 circuit. With
OC-3 SONET, 155.520 Mbps is reduced to 149.760 Mbps due to section, line, and
SONET path overhead. Next, you reduce this figure by 10 percent, because an average
of 20 bytes per 200-byte packet is lost (due to ATM cell padding), which results in
134.784 Mbps. Next, you can subtract 9.43 percent due to ATM headers of 5 bytes in 53-byte cells. Thus, you end up with a 122.069-Mbps available bandwidth figure, which is
about 78.5 percent of the nominal OC-3 capacity. Of course, this figure may vary
depending on the size of the packet data and the amount of padding that must be done
to segment the packet into a 48-byte cell payload and fully populate the last cell with
padding. Additional overhead is added for AAL5 (the most common ATM adaptation
layer used to transmit data across ATM networks), framing (4 bytes length and 4 bytes
CRC), and LLC (Link Layer Control) SNAP (SubNetwork Access Protocol) (8 bytes)
encapsulation of frame-based traffic.
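The arithmetic behind these figures is easy to reproduce. The short calculation below reworks the OC-3 example under the same assumptions stated in the text: 149.760 Mbps of usable SONET payload, an average of 20 bytes of padding per 200-byte packet, and a 5-byte header in every 53-byte cell.

# Reworking the OC-3 "cell tax" estimate from the text.
sonet_payload_mbps = 149.760       # OC-3 payload after SONET section/line/path overhead

padding_fraction = 20 / 200        # ~20 bytes of cell padding per 200-byte packet
after_padding = sonet_payload_mbps * (1 - padding_fraction)

header_fraction = 5 / 53           # 5-byte header in every 53-byte cell
after_headers = after_padding * (1 - header_fraction)

print(f"After padding loss:    {after_padding:.3f} Mbps")       # ~134.784 Mbps
print(f"After cell headers:    {after_headers:.3f} Mbps")       # ~122.069 Mbps
print(f"Share of nominal OC-3: {after_headers / 155.520:.1%}")  # ~78.5%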
When you compare this scenario to the 7 bytes of overhead for traditional PPP (Point-to-
Point Protocol) encapsulation, which traditionally is run on point-to-point circuits, you can
see how this produces a philosophical schism between IP engineering purists and ATM
proponents. Although conflicts in philosophies regarding engineering efficiency clearly
exist, once you can get beyond the cell tax, as the ATM overhead is called, ATM does
provide interesting traffic-management capabilities. This is why you can consider the
philosophical arguments over the cell tax as fruitless: After you accept the fact that ATM
does indeed consume a significant amount of overhead, you still can see that ATM
provides more significant benefits than liabilities.
In this vein, you can see that virtual multiplexing technologies such as ATM and frame
relay also provide the necessary tools that enable people to build sloppy networks—poor
designs that attempt to create a flat network, in which all end-points are virtually one hop
away from one another, regardless of how many physical devices are in the transit path.
This design approach is not a reason for concern in ATM networks in which only a small number of possible ATM end-points exists. However, in networks in
which a substantially large number of end-points exists, this design approach presents a
reason for serious concern over scaling the Layer 3 routing system. Many Layer 3 routing
protocols require that routers maintain adjacencies or peering relationships with other
routers to exchange routing and topology information. The more peers or adjacencies,
the greater the computational resources consumed by each device. Therefore, in a flat
network topology, which has no hierarchy, a much larger number of peers or adjacencies
exists. Failure to introduce a hierarchy into a large network in an effort to promote scaling
sometimes can be suicidal.
Many people feel that ATM is excessively complex and that when tested against the
principle of Occam’s Razor, ATM by itself would not be the choice for QoS services,
simply because of the complexity involved compared with other technologies that provide
similar results. However, the application of Occam’s Razor does not provide assurances
that the desired result will be delivered; instead, it simply expresses a preference for
simplicity.
Occam’s Razor
ATM enthusiasts correctly point out that ATM is complex for good reason: to provide
predictive, proactive, and real-time services, such as dynamic network resource
allocation, resource guarantees, virtual circuit rerouting, and virtual circuit path
establishment to accommodate subscriber QoS requests, ATM’s complexity is
unavoidable.
It also has been observed that higher-layer protocols, such as TCP/IP, provide the end-
to-end transportation service in most cases, so that although it is possible to create QoS
services in a lower layer of the protocol stack, namely ATM in this case, such services
may cover only part of the end-to-end data path. This gets to the heart of the problem in
delivering QoS with ATM, when the true end-to-end bearer service is not pervasive ATM.
Such partial QoS measures often have their effects masked by the effects of the traffic
distortion created from the remainder of the end-to-end path in which they do not reside,
and hence the overall outcome of a partial QoS structure often is ineffectual.
In other words, if ATM is not pervasively deployed end-to-end in the data path, efforts to
deliver QoS using ATM can be ineffectual. The traffic distortion is introduced into the
ATM landscape by traffic-forwarding devices that service the ATM network and upper-
layer protocols such as IP, TCP, and UDP, as well as other upper-layer network
protocols. Queuing and buffering introduced into the network by routers and non-ATM-
attached hosts skew the accuracy with which the lower-layer ATM services calculate
delay and delay variation. Routers also may introduce needless congestion states,
dependent on the quality of the hardware platform or the network design.
Differing Lines of Reasoning
An opposing line of reasoning suggests that end-stations simply could be ATM-attached.
However, a realization of this suggestion introduces several new problems, such as the
inability to aggregate downstream traffic flows and provide adequate bandwidth capacity
in the ATM network. Efficient utilization of bandwidth resources in the ATM network
continues to be a primary concern for network administrators, nonetheless. Yet another
line of reasoning suggests that upper-layer protocols are unnecessary, because they
tend to render ATM QoS mechanisms ineffectual by introducing congestion bottlenecks
and unwanted latency into the equation. The flaw in this line of reasoning is that native
ATM applications do not exist for the majority of popular, commodity, off-the-shelf
software applications, and even if they did, the capability to build a hierarchy and the
separation of administrative domains into the network system is diminished severely.
Scaleability such as exists in the global Internet is impossible with native ATM.
On a related note, some have suggested that most traffic on ATM networks would be
primarily UBR or ABR connections, because higher-layer protocols and applications
cannot request specific ATM QoS service classes and therefore cannot fully exploit the
QoS capabilities of the VBR service categories. A cursory examination of deployed ATM
networks and their associated traffic profiles reveals that this is indeed the case, except
in the rare instance when an academic or research organization has developed its own
native ATM-aware applications that can fully exploit the QoS parameters available to the
rt-VBR and nrt-VBR service categories. Although this certainly is possible and has been
done on many occasions, real-world experience reveals that this is the proverbial
exception and not the rule.
Aside from traditional data services that may use UBR, ABR, or VBR services, circuit-emulation services provisioned using the CBR service category clearly can provide the QoS necessary for telephony communications. However, this
becomes an exercise in comparing apples and oranges. Delivering voice services on
virtual digital circuits using circuit emulation is quite different from delivering packet-based
data found in local area and wide-area networks.
By the same token, providing QoS in these two environments is substantially different; it
is substantially more difficult to deliver QoS for data, because the higher-layer
applications and protocols do not provide the necessary hooks to exploit the QoS
mechanisms in the ATM network. As a result, an intervening router must make the QoS
request on behalf of the application, and thus the ATM network really has no way of
discerning what type of QoS the application may truly require. This particular deficiency
has been the topic of recent research and development efforts to address this
shortcoming and investigate methods of allowing the end-systems to request network
resources using RSVP [IETF1997f], and then map these requests to native ATM QoS
service classes as appropriate. You will revisit this issue in Chapter 7, “The Integrated
Services Architecture.”
This leads to the conclusion that the gratuitous use of the term guarantee is quite
misleading and should not be taken in a literal sense. Although ATM certainly is capable
of delivering QoS when dealing with native cell-based traffic, the introduction of packet-
based traffic (i.e., IP) and Layer 3 forwarding devices (routers) into this environment may
have an adverse impact on the ATM network’s capability to properly deliver QoS and
certainly may produce unpredictable results. With the upper-layer protocols, there is no
equivalent of a guarantee. In fact, packet loss is expected to occur to implicitly signal the
traffic source that errors are present or that the network or the specified destination is not
capable of accepting traffic at the rate at which it is being transmitted. When this occurs,
these discrete mechanisms that operate at various layers of the protocol stack
(e.g., ATM traffic parameter monitoring, TCP congestion avoidance, random early
detection, ABR flow control) may well demonstrate self-defeating behavior because of
these internetworking discrepancies—the inability for these different mechanisms to
explicitly communicate with one another. ATM provides desirable properties with regard
to increased speed of data-transfer rates, but in most cases, the underlying signaling and
QoS mechanisms are viewed as excess baggage when the end-to-end bearer service is
not ATM.
When considering this question, the basic differences in the design of ATM and IP
become apparent. The prevailing fundamental design philosophy for the Internet is to
offer coherent end-to-end data delivery services that are not reliant on any particular
transport technology and indeed can function across a path that uses a diverse collection
of transport technologies. To achieve this functionality, the basic TCP/IP signaling
mechanism uses two very basic parameters for end-to-end characterization: a dynamic
estimate of end-to-end Round Trip Time (RTT) and packet loss. If the network exhibits a
behavior in which congestion occurs within a window of the RTT, the end-to-end
signaling can accurately detect and adjust to the dynamic behavior of the network.
ATM, like many other data-link layer transport technologies, uses a far richer set of
signaling mechanisms. The intention here is to support a wider set of data-transport
applications, including a wide variety of real-time applications and traditional non-real-
time applications. This richer signaling capability is available simply because of the
homogenous nature of the ATM network, and the signaling capability can be used to
support a wide variety of traffic-shaping profiles that are available in ATM switches.
However, this richer signaling environment, together with the use of a profile adapted
toward real-time traffic with very low jitter tolerance, can create a somewhat different
congestion paradigm. For real-time traffic, the response to congestion is immediate load
reduction, on the basis that queuing data can dramatically increase the jitter and
lengthen the congestion event duration. The design objective in a real-time environment
is the immediate and rapid discarding of cells to clear the congestion event. Given the
assumption that integrity of real-time traffic is of critical economic value, data that
requires integrity will use end-to-end signaling to detect and retransmit the lost data;
hence, the longer recovery time for data transfer is not a significant economic factor to
the service provider.
The result of this design objective is that congestion events in an ATM environment occur
and are cleared (or at the very least, are attempted to be cleared) within time intervals
that generally are well within a single end-to-end IP round-trip time. Therefore, when the
ATM switch discards cells to clear local queue overflow, the resultant signaling of IP
packet loss to the destination system (and the return signaling, through duplicate acknowledgments or a retransmission timeout, that a packet is missing) takes a time interval of up to one RTT. By the time the TCP session reduces the
transmit window in response to this signaling, the ATM congestion event is cleared. The
resulting observation indicates that it is a design challenge to define the ATM traffic-
shaping characteristics for IP-over-ATM traffic paths in order for end-to-end TCP
sessions to sustain maximal data-transfer rates. This, in turn, impacts the overall
expectation that ATM provide a promise of increased cost efficiency through multiplexing
different traffic streams over a single switching environment; it is countered by the risks
of poor payload delivery efficiency.
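A back-of-the-envelope comparison makes the mismatch in timescales concrete. Using assumed but representative numbers, a per-port cell buffer on an OC-3 trunk drains in a handful of milliseconds, whereas a continental TCP round-trip time is measured in tens of milliseconds, so the congestion event can come and go before the TCP source ever reacts to the loss.

# Assumed, illustrative numbers: compare how long an ATM queue overflow persists
# with how long TCP needs before it can react to the resulting packet loss.
buffer_cells = 2_048                      # assumed per-port output buffer, in cells
cell_bits = 53 * 8
oc3_payload_bps = 149.760e6               # usable OC-3 payload rate

queue_drain = buffer_cells * cell_bits / oc3_payload_bps
tcp_rtt = 0.070                           # an assumed continental round-trip time

print(f"Queue drain time : {queue_drain * 1000:.1f} ms")    # roughly 5.8 ms
print(f"TCP reaction time: at least {tcp_rtt * 1000:.0f} ms before the window shrinks")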
v v v
The QoS objective for networks similar in nature to the Internet lies principally in directing
the network to alter the switching behavior at the IP layer so that certain IP packets are
delayed or discarded at the onset of congestion to delay (or completely avoid if at all
possible) the impact of congestion on other classes of IP traffic. When looking at IP-over-
ATM, the issue (as with IP-over-frame relay) is that there is no mechanism for mapping
such IP-level directives to the ATM level, nor is it desirable, given the small size of ATM
cells and the consequent requirement for rapid processing or discard. Attempting to
increase the complexity of the ATM cell discard mechanics to the extent necessary to
preserve the original IP QoS directives by mapping them into the ATM cell is
counterproductive.
Thus, it appears that the default IP QoS approach is best suited to IP-over-ATM. It also
stands to reason that if the ATM network is adequately dimensioned to handle burst
loads without the requirement to undertake large-scale congestion avoidance at the ATM
layer, there is no need for the IP layer to invoke congestion-management mechanisms.
Thus, the discussion comes full circle to an issue of capacity engineering, and not
necessarily one of QoS within ATM.
Chapter 7: The Integrated Services
Architecture
Overview
Anyone faced with the task of reviewing, understanding, and comparing approaches in
providing Quality of Service might at first be overwhelmed by the complexities involved in
the IETF (Internet Engineering Task Force) Integrated Services architecture, described in
detail in [IETF1994b]. The Integrated Services architecture was designed to provide a set
of extensions to the best-effort traffic delivery model currently in place in the Internet. The
framework was designed to provide special handling for certain types of traffic and to
provide a mechanism for applications to choose between multiple levels of delivery
services for its traffic.
The IETF I-Ds (Internet Drafts) referenced in this book should be considered works
in progress because of the ongoing work by their authors (and respective working
group participants) to refine the specifications and semantics contained in them.
The I-D document versions may change over the course of time, and some of
these drafts may be advanced and subsequently published as Requests for
Comments (RFCs) as they are finalized. Links to updated and current versions of
these documents, proposals, and technical specifications mentioned in this chapter
generally can be found on the IETF Web site, located at www.ietf.org, within the
Integrated Services (IntServ), Integrated Services over Specific Link Layers
(ISSLL), or Resource ReSerVation Setup Protocol (RSVP) working groups
sections.
The Integrated Services architecture, in and of itself, is amazingly similar in concept to the
technical mechanics espoused by ATM—namely, in an effort to provision “guaranteed”
services, as well as differing levels of best effort via a “controlled-load” mechanism. In fact,
the Integrated Services architecture and ATM are somewhat analogous; IntServ provides
signaling for QoS parameters at Layer 3 in the OSI (Open Systems Interconnection)
reference model, and ATM provides signaling for QoS parameters at Layer 2.
A Background on Integrated Services Framework
The concept of the Integrated Services framework begins with the suggestion that the
basic underlying Internet architecture does not need to be modified to provide
customized support for different applications. Instead, it suggests that a set of extensions
can be developed that provide services beyond the traditional best-effort service. The
IETF IntServ working group charter [INTSERVa] articulates that efforts within the working
group are focused on three primary goals:
Clearly defining the services to be provided. The first task faced by this working
group was to define and document this “new and improved” enhanced Internet
service model.
Quality of Service, in the context of the Integrated Services framework, refers to the
nature of the packet delivery service provided by the network, as characterized by
parameters such as achieved bandwidth, packet delay, and packet loss rates
[IETF1997e]. A network node is any component of the network that handles data packets
and is capable of imposing QoS control over data flowing through it. Nodes include
routers, subnets (the underlying link-layer transport technologies), and end-systems. A
QoS-capable or IS-capable node can be described as a network node that can provide
one or more of the services defined in the Integrated Services model. A QoS-aware or
IS-aware node is a network node that supports the specific interfaces required by the
Integrated Services service definitions but cannot provide the requested service.
Although a QoS-aware node may not be able to provide any of the QoS services itself, it still can understand the service request parameters and deny QoS service requests accordingly.
Service or QoS control service refers to a coordinated set of QoS control capabilities
provided by a single node. The definition of a service includes a specification of the
functions to be performed by the node, the information required by the node to perform
these functions, and the information made available by a specific node to other nodes in
the network.
The Integrated Services architecture consists of five key components: QoS requirements,
resource-sharing requirements, allowances for packet dropping, provisions for usage
feedback, and a resource reservation protocol (in this case, RSVP).
QoS Requirements
The Integrated Services model is concerned primarily with the time-of-delivery of traffic;
therefore, per-packet delay is the central theme in determining QoS commitments.
Understanding the characterization of real-time and non-real-time, or elastic, applications
and how they behave in the network is an important aspect of the Integrated Services
model.
The amount of latency introduced is variable, because latency is the cumulative sum of
the transmission times and queuing hold times (where queuing hold-times can be highly
variable). This variation in latency or jitter in the real-time signal is what must be
smoothed by the playback. The receiver compensates for this jitter by buffering the
received data for a period of time (an offset delay) before playing back the data stream,
in an attempt to negate the effects of the jitter introduced by the network. The trick is in
calculating the offset delay, because having an offset delay that is too short for the
current level of jitter effectively renders the original real-time signal pretty much
worthless.
The ideal scenario is to have a mechanism that can dynamically calculate and adjust the
offset delay in response to fluctuations in the average jitter induced. An application that
can adjust its offset delay is called an adaptive playback application. The predominant
trait of a real-time application is that it does not wait for the late arrival of packets when
playing back the data signal at the receiver; it simply imposes an offset delay prior to
processing.
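A minimal sketch of an adaptive playback point follows: the receiver keeps a smoothed estimate of recent delay variation and sets its offset delay to a multiple of that estimate, playing packets that arrive within the offset and skipping later arrivals. The smoothing constant and multiplier used here are arbitrary illustrative choices, not values drawn from any specification.

class AdaptivePlayback:
    """Toy adaptive playback buffer: the offset delay tracks an EWMA of observed jitter."""

    def __init__(self, initial_offset=0.040, alpha=0.125, multiplier=4.0):
        self.offset = initial_offset    # playback offset delay, in seconds
        self.jitter = 0.0               # smoothed jitter estimate, in seconds
        self.alpha = alpha
        self.multiplier = multiplier

    def packet_arrived(self, network_delay, nominal_delay):
        """Update the jitter estimate and report whether the packet is playable."""
        deviation = abs(network_delay - nominal_delay)
        self.jitter = (1 - self.alpha) * self.jitter + self.alpha * deviation
        self.offset = max(self.multiplier * self.jitter, 0.010)
        return network_delay <= nominal_delay + self.offset   # late packets are skipped

player = AdaptivePlayback()
for delay in (0.051, 0.049, 0.060, 0.120):        # seconds of one-way delay
    print(player.packet_arrived(delay, nominal_delay=0.050), round(player.offset, 4))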
For tolerant applications, the Integrated Services model recommends the use of a
predictive service, otherwise known as a controlled-load service. For intolerant
applications, IntServ recommends a guaranteed service model. The fundamental
difference in these two models is that one provides a reliable upper bound on delay
(guaranteed) and the other (controlled-load) provides a less-than-reliable delay bound.
An elastic application, in contrast, always waits for late-arriving packets rather than discarding the data. Several types of elastic applications exist: interactive burst (e.g., Telnet),
interactive bulk transfer (e.g., File Transfer Protocol, or FTP), and asynchronous bulk
transfer (e.g., Simple Mail Transport Protocol, or SMTP). The delay sensitivity varies
dramatically with each of the types, so they can be referred to as belonging to a best-
effort service class.
The effect of real-time traffic flows is to lock the sender and receiver into a common
clocking regime, where the only major difference is the offset of the receiver’s
clock, as determined by the network-propagation delay. This state must be imposed
on the network as distinct from the sender adapting the rate clocking to the current
state of the network.
NON_IS_HOP
The NON_IS_HOP parameter provides information about the presence of nodes that do
not implement QoS control services along the data path. In this vein, the IS portion of this
and other parameters also means Integrated Services–aware, in that an IS-aware
element is one that conforms to the requirements specified in the Integrated Services
architecture. A flag is set in this object if a node does not implement the relevant QoS
control service or knows that there is a break in the traffic path of nodes that implement
the service. This also is called a break bit, because it represents a break in the chain of
network elements required to provide an end-to-end traffic path for the specified QoS
service class.
NUMBER_OF_IS_HOPS
The NUMBER_OF_IS_HOPS parameter is represented by a counter that is a cumulative
total incremented by 1 at each IS-aware hop. This parameter is used to inform the flow
end-points of the number of IS-aware nodes that lie in the data path. Valid values for
this parameter range from 1 to 255; in practice, the value is limited by the bound on the IP hop
count.
AVAILABLE_PATH_BANDWIDTH
The AVAILABLE_PATH_BANDWIDTH parameter provides information about the
available bandwidth along the path followed by a data flow. This is a local parameter and
provides an estimate of the bandwidth available for traffic following the path.
Values for this parameter are measured in bytes per second and range in value from 1
byte per second to 40 terabytes per second (which is believed to be the theoretical
maximum bandwidth of a single strand of fiber).
MINIMUM_PATH_LATENCY
The MINIMUM_PATH_LATENCY local parameter is a representation of the latency in
the forwarding process associated with the node, where the latency is defined to be the
smallest possible packet delay added by the node itself. This delay results from speed-
of-light propagation delay, packet-processing limitations, or both. It does not include any
variable queuing delay that may be introduced. The purpose of this parameter is to
provide a baseline minimum path latency figure to be used with services that provide
estimates or bounds on additional path delay, such as the guaranteed service class.
Together with the queuing delay bound offered by the guaranteed service class, this
parameter gives the application a priori knowledge of both the minimum and maximum
packet-delivery delay. Knowing both minimum and maximum latencies experienced by
traffic allows the receiving application to attempt to accurately compute buffer
requirements to remove network-induced jitter.
PATH_MTU
The PATH_MTU parameter is a representation of the Maximum Transmission Unit
(MTU) for packets traversing the data path, measured in bytes. This parameter informs
the end-point of the packet MTU size that can traverse the data path without being
fragmented. A correct and valid value for this parameter must be specified by all IS-
aware nodes. This value is required to invoke QoS control services that require the IP
packet size to be strictly limited to a specific MTU. Existing MTU discovery mechanisms
cannot be used, because they provide information only to the sender, and they do not
directly allow for QoS control services to specify MTUs smaller than the physical MTU.
The local parameter is the IP MTU, where the MTU of the node is defined as the
maximum size the node can transmit without fragmentation, including upper-layer and IP
headers but excluding link-layer headers.
TOKEN_BUCKET_TSPEC
The TOKEN_BUCKET_TSPEC parameter describes traffic parameters using a simple
token-bucket filter and is used by data senders to characterize the traffic they expect to
generate. This parameter also is used by QoS control services to describe the
parameters of traffic for which the subsequent reservation should apply. This parameter
takes the form of a token-bucket specification plus a peak rate, a minimum policed unit,
and a maximum packet size. The token-bucket specification itself includes an average
token rate and a bucket depth.
The token rate (r) is measured in bytes of IP datagrams per second and may range in
value from 1 byte per second to 40 terabytes per second. The token-bucket depth (b) is
measured in bytes and values range from 1 byte to 250 gigabytes. The peak traffic rate
(p) is measured in bytes of IP datagrams per second and may range in value from 1 byte
per second to 40 terabytes per second.
The minimum policed unit (m) is an integer measured in bytes. The purpose of this
parameter is to allow a reasonable estimate of the per-packet resources needed to
process a flow’s packets; the maximum packet rate can be computed from the values
expressed in b and m. The size includes the application data and all associated protocol
headers at or above the IP layer. It does not include the link-layer headers, because
these may change in size as a packet traverses different portions of a network. All
datagrams less than size m are treated as being of size m for the purposes of resource
allocation and policing.
The maximum packet size (M) is the largest packet that will conform to the traffic
specification, also measured in bytes. Packets transmitted that are larger than M may not
receive QoS-controlled service, because they are considered to be nonconformant with
the traffic specification.
The range of values that can be specified in these parameters is intentionally designed to be
large enough to allow for future network technologies—a node is not expected to support
the full range of values.
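As a concrete sketch of how these parameters interact, the following fragment checks an arriving packet against a TSpec expressed as (r, b, p, m, M). The dual-bucket structure and the choice of M as the depth of the peak-rate bucket are illustrative assumptions rather than part of the specification:

    # Illustrative TSpec conformance check using token rate r, bucket depth b,
    # peak rate p, minimum policed unit m, and maximum packet size M.
    class TSpecPolicer:
        def __init__(self, r, b, p, m, M):
            self.r, self.b, self.p, self.m, self.M = r, b, p, m, M
            self.tokens = b        # average-rate bucket starts full
            self.peak_tokens = M   # peak-rate bucket holds at most one packet
            self.last = 0.0        # arrival time of the previous packet

        def conforms(self, size, now):
            """True if a packet of `size` bytes arriving at time `now` (seconds)
            conforms; conformant packets are charged against both buckets."""
            if size > self.M:
                return False                 # larger than M is nonconformant
            size = max(size, self.m)         # packets smaller than m count as m
            elapsed = now - self.last
            self.last = now
            self.tokens = min(self.b, self.tokens + self.r * elapsed)
            self.peak_tokens = min(self.M, self.peak_tokens + self.p * elapsed)
            if size <= self.tokens and size <= self.peak_tokens:
                self.tokens -= size
                self.peak_tokens -= size
                return True
            return False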
To ensure that this set of conditions is met, the application requesting the controlled-load
service provides the network with an estimation of the traffic it will generate—the TSpec
or traffic specification. The controlled-load service uses the TOKEN_BUCKET_TSPEC to
describe a data flow’s traffic parameters and therefore is synonymous with the term
TSpec referenced hereafter. In turn, each node handling the controlled-load service
request ensures that sufficient resources are available to accommodate the request. The
TSpec does not have to match the available resources in the network with absolute
precision. If the requested resources fall outside the bounds of what is
available, the traffic originator may experience some induced delay or possibly dropped
packets because of congestion. However, the degree to which traffic may be dropped or
delayed should be slight enough for the adaptive real-
time applications to function without noticeable degradation.
The controlled-load service does not accept or use specific values for control parameters
that include information about delay or loss. Acceptance of a controlled-load request
implies a commitment to provide a better-than-best-effort service that approximates
network behavior under nominal network-utilization conditions.
The method a node uses to determine whether adequate resources are available to
accommodate a service request is purely a local matter and may be implementation
dependent; only the control parameters and message formats are required to be
interoperable.
Links on which the controlled-load service is run are not allowed to fragment packets.
Packets larger than the MTU of the link must be treated as nonconformant with the
TSpec.
The controlled-load service is provided to a flow when traffic conforms to the TSpec
given at the time of flow setup. When nonconformant packets are presented within a
controlled-load flow, the node must ensure that three things happen. First, the node must
ensure that it continues to provide the contracted QoS to those controlled-load flows that
are conformant. Second, the node should prevent nonconformant traffic in a controlled-
load flow from unfairly impacting other conformant controlled-load flows. Third, the node
must attempt to forward nonconformant traffic on a best-effort basis if sufficient resources
are available. Nodes should not assume that nonconformant traffic is indicative of an
error, because large numbers of packets may be nonconformant as a matter of course.
This nonconformancy occurs because some downstream nodes may not police extended
bursts of traffic to conform with the specified TSpec and in fact will borrow available
bandwidth resources to clear traffic bursts that have queued up. If a flow obtains its exact
fixed-token rate in the presence of an extended burst, for example, there is a danger that
the queue will fill up to the point of packet discard. To prevent this situation, the
controlled-load node may allow the flow to exceed its token rate in an effort to reduce the
queue buildup. Thus, nodes should be prepared to accommodate bursts larger than the
advertised TSpec.
The guaranteed service guarantees that packets will arrive within a certain delivery time
and will not be discarded because of queue overflows, provided that the flow’s traffic
stays within the bounds of its specified traffic parameters. The guaranteed service does
not control the minimal or average delay of traffic, and it doesn’t control or minimize jitter
(the variance between the minimal and maximal delay)—it only controls the maximum
queuing delay.
The guaranteed service is invoked by a sender specifying the flow’s traffic parameters (the
TSpec) and the receiver subsequently requesting a desired service level (the RSpec).
The guaranteed service also uses the TOKEN_BUCKET_TSPEC parameter as the
TSpec. The RSpec (reservation specification) consists of a data rate (R) and a slack term
(S), where R must be greater than or equal to the token-bucket data rate (r). The rate (R)
is measured in bytes of IP datagrams per second and may range in value from 1
byte per second to 40 terabytes per second. The slack term (S) is measured in
microseconds. The RSpec rate can be larger than the TSpec rate, because higher rates
are assumed to reduce queuing delay. The slack term represents the difference between
the desired delay and the delay obtained by using a reservation level of R. The slack
term also can be used by the network to reduce its resource reservation for the flow.
Because of the end-to-end and hop-by-hop calculation of two error terms (C and D),
every node in the data path must implement the guaranteed service for this service class
to function. The first error term (C) provides a cumulative representation of the delay a
packet might experience because of rate parameters of a flow, also referred to as packet
serialization. The error term (C) is measured in bytes. The second error term (D) is a
rate-independent, per-element representation of delay imposed by time spent waiting for
transmission through a node. The error term (D) is measured in units of 1 microsecond.
The cumulative end-to-end calculation of these error terms (Ctot and Dtot) represents a
flow’s deviation from the fluid model, in which a flow behaves as though it were being served
on a dedicated line of bandwidth R, independent of any other flows in the network.
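One way to see how the error terms are used is the end-to-end queuing delay bound that the guaranteed service specification defines [IETF1997i]. The following sketch evaluates that bound from the TSpec, the reserved rate R, and the accumulated error terms, assuming consistent units (bytes and seconds) rather than the encodings actually carried in the protocol:

    # Sketch of the guaranteed service end-to-end queuing delay bound.
    # Rates are in bytes per second, sizes in bytes, d_tot in seconds.
    def guaranteed_delay_bound(r, b, p, M, R, c_tot, d_tot):
        """Upper bound on queuing delay for a TSpec (r, b, p, M) served at
        reserved rate R, given cumulative error terms c_tot and d_tot."""
        if R < r:
            raise ValueError("the reserved rate R must be at least the token rate r")
        if p > R:
            # Reserved rate below the peak rate: the initial burst drains
            # more slowly and contributes an additional term.
            return (b - M) * (p - R) / (R * (p - r)) + (M + c_tot) / R + d_tot
        # Reserved rate at or above the peak rate.
        return (M + c_tot) / R + d_tot

    # Example: r = 125 kB/s, b = 10 kB, p = 500 kB/s, M = 1,500 bytes,
    # reserved at R = 250 kB/s over a path with Ctot = 6 kB and Dtot = 2 ms.
    bound = guaranteed_delay_bound(125_000, 10_000, 500_000, 1_500,
                                   250_000, 6_000, 0.002)   # roughly 55 ms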
As with the controlled-load service, links on which the guaranteed service is run are not
allowed to fragment packets. Packets larger than the MTU of the link must be treated as
nonconformant with the TSpec.
Two types of traffic policing are associated with the guaranteed service: simple policing
and reshaping. Policing is done at the edges of the network, and reshaping is done at
intermediate nodes within the network. Simple policing is comparing traffic in a flow
against the TSpec for conformance. Reshaping consists of an attempt to restore the
flow’s traffic characteristics to conform to the TSpec. Reshaping delays the
forwarding of datagrams until they are in conformance with the TSpec. As described in
[IETF1997i], reshaping is done by combining a token bucket with a peak-rate regulator
and buffering a flow’s traffic until it can be forwarded in conformance with the token-
bucket (r) and peak-rate (p) parameters. Such reshaping may be necessary because of the
small levels of distortion introduced by the packet-by-packet nature of any transmission
path; it is this packet-level quantization of flows that reshaping addresses. In general,
reshaping adds a small amount to the total delay, but it can reduce the overall jitter of the
flow.
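A minimal sketch of that reshaping computation, assuming the reshaper simply holds each datagram until both the token bucket (r, b) and the peak-rate regulator (p) would permit it (the interface and state handling are illustrative):

    # Illustrative reshaper: compute the earliest conforming departure time
    # for each datagram rather than marking it nonconformant.
    class Reshaper:
        def __init__(self, r, b, p):
            self.r, self.b, self.p = r, b, p
            self.tokens = b              # token bucket starts full
            self.last_update = 0.0       # when `tokens` was last computed
            self.next_peak_time = 0.0    # earliest start allowed by peak rate

        def departure_time(self, size, arrival):
            """Earliest time a datagram of `size` bytes arriving at `arrival`
            (seconds) can be forwarded in conformance with the TSpec."""
            # Earliest time at which the bucket will hold `size` tokens.
            tokens_now = min(self.b,
                             self.tokens + self.r * (arrival - self.last_update))
            token_time = arrival + max(0.0, (size - tokens_now) / self.r)
            departure = max(token_time, self.next_peak_time, arrival)
            # Charge the bucket and remember state for the next datagram.
            self.tokens = min(self.b,
                              self.tokens + self.r * (departure - self.last_update)) - size
            self.last_update = departure
            self.next_peak_time = departure + size / self.p
            return departure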
Traffic Control
The Integrated Services model defines four mechanisms that comprise the traffic-control
functions at Layer 3 (the router) and above:
Packet scheduler. This manages the forwarding of the various packet streams by using
a set of queues, and possibly timers, to enforce the promised service at the output
interface.
Packet classifier. This maps each incoming packet to a specific class so that these
classes may be acted on individually to deliver traffic differentiation.
Admission control. This determines whether a flow can be granted the requested QoS
without affecting other established flows in the network.
Reservation setup protocol. This creates and maintains flow-specific state in the
end-systems and in the routers along the traffic path (in this case, RSVP).
Figure 7.1 shows a reference model illustrating the relationship of these functions.
Resource-Sharing Requirements
It is important to understand that the allocation of network resources is accomplished on a
flow-by-flow basis, and that although each flow is subject to admission-control criteria, many
flows share the available resources on the network, which is described as link sharing. With
link sharing, the aggregate bandwidth in the network is shared by various types of traffic.
These types of traffic generally can be different network protocols (e.g., IP, IPX, SNA),
different services within the same protocol suite (e.g., Telnet, FTP, SMTP), or simply
different traffic flows that are segregated and classified by sender. It is important that
different traffic types do not consume more than their fair share of network resources,
because that could result in a disruption of other traffic. The Integrated Services model also
focuses on link sharing by aggregate flows and link sharing with an additional admission-
control function—a fair queuing (i.e., Weighted Fair Queuing or WFQ) mechanism that
provides proportional allocation of network resources.
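As a small illustration of how such a fair queuing mechanism yields proportional allocation, the following sketch shows the virtual finish-time computation used by WFQ-style schedulers; the names and the simplified handling of virtual time are assumptions:

    # Sketch of the WFQ finish-time rule: the scheduler always transmits the
    # queued packet with the smallest virtual finish time, which divides the
    # link in proportion to the configured weights.
    def wfq_finish_time(virtual_time, last_finish, size, weight):
        """Virtual finish time of a packet of `size` bytes on a flow with
        `weight`; `last_finish` is the flow's previous finish time."""
        start = max(virtual_time, last_finish)
        return start + size / weight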
Packet-Dropping Allowances
The Integrated Services model outlines different scenarios in which traffic control is
implicitly provided by dropping packets. One concept is that some packets within a given
flow may be preemptable or subject to drop. This concept is based on situations in which
the network is in danger of reneging on established service commitments. A router
simply could discard traffic by acting on a particular packet’s preemptability option to
avoid disrupting established commitments. Another approach classifies packets that are
not subject to admission-control mechanisms.
Several other interesting approaches could be used, but naming each is beyond the
scope of this book. Just remember that it is necessary to drop packets in some cases to
control traffic in the network. Also, [IETF1997i] suggests that some guaranteed service
implementers may want to use preemptive packet dropping as a substitute for traffic
reshaping—if the result produces the same effect as reshaping at an intermediate
node—by using a combined token bucket and peak-rate regulator to buffer traffic until it
conforms to the TSpec. A preliminary proposal that provides guidelines for replacement
services was published in [ID1997e].
Tip Packet dropping can be compared to the Random Early Detection (RED) mechanism
for TCP (Transmission Control Protocol), as described in Chapter 4, “QoS and
TCP/IP: Finding the Common Denominator.” A common philosophy is that it is better
to reduce congestion in a controlled fashion at the onset of resource saturation, and to
keep doing so until the congestion event is cleared, than to wait until all resources are
fully consumed and complete discard results. This leads to the observation that
controlling the quality of a service often is the task of controlling the way in which a
service degrades in the face of congestion. The point here is that it is more stable to
degrade incrementally than to wait until the buffer resources are exhausted and the
service collapses in a single catastrophic event.
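A much-simplified sketch of this style of controlled early dropping, loosely modeled on the RED probability ramp (the thresholds are illustrative, and RED's queue averaging and count-based refinements are omitted):

    # Drop probability rises gradually between two queue thresholds, so the
    # service degrades incrementally rather than collapsing only when the
    # queue is completely exhausted.
    def early_drop_probability(avg_queue, min_th=5.0, max_th=15.0, max_p=0.1):
        if avg_queue < min_th:
            return 0.0
        if avg_queue >= max_th:
            return 1.0
        return max_p * (avg_queue - min_th) / (max_th - min_th)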
RSVP: The Resource Reservation Model
As described earlier, the Integrated Services architecture provides a framework for
applications to choose between multiple controlled levels of delivery services for their
traffic flows. Two basic requirements exist to support this framework. The first
requirement is for the nodes in the traffic path to support the QoS control mechanisms
defined earlier: the controlled-load and guaranteed services. The second requirement is
for a mechanism by which the applications can communicate their QoS requirements to
the nodes along the transit path, as well as for the network nodes to communicate
between one another the QoS requirements that must be provided for the particular
traffic flows. This could be provided in a number of ways, but as fate would have it, it is
provided by a resource reservation setup protocol called RSVP [IETF1997f].
In general terms, RSVP is used to provide QoS requests to all router nodes along the
transit path of the traffic flows and to maintain in each router the state necessary
to actually provide the requested services. RSVP requests generally result in resources
being reserved in each router in the transit path for each flow. RSVP establishes and
maintains a soft state in nodes along the transit path of a reservation data path. A hard
state is what other technologies provide when setting up virtual circuits for the duration of
a data-transfer session; the connection is torn down after the transfer is completed. A
soft state is maintained by periodic refresh messages sent along the data path to
maintain the reservation and path state. In the absence of these periodic messages,
which typically are sent every 30 seconds, the state is deleted as it times out. This soft
state is necessary, because RSVP is essentially a QoS reservation protocol and does
not associate the reservation with a specific static path through the network. As such, it is
entirely possible that the path will change, so that the reservation state must be refreshed
periodically.
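A minimal sketch of the soft-state idea, assuming the 30-second refresh period mentioned above and an illustrative lifetime of a few missed refreshes:

    import time

    # Soft state: entries are kept alive by periodic refreshes and silently
    # removed when refreshes stop arriving (the lifetime value is an assumption).
    class SoftStateTable:
        REFRESH_PERIOD = 30.0            # seconds between refresh messages
        LIFETIME = 3 * REFRESH_PERIOD    # survive a few lost refreshes

        def __init__(self):
            self.entries = {}            # session key -> time of last refresh

        def refresh(self, session):
            """Called whenever a Path or Resv message for `session` arrives."""
            self.entries[session] = time.monotonic()

        def expire_stale(self):
            """Run periodically to delete state whose refreshes have stopped."""
            now = time.monotonic()
            for session, last in list(self.entries.items()):
                if now - last > self.LIFETIME:
                    del self.entries[session]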
RSVP also provides dynamic QoS; the resources requested may be changed at any
given time for a number of reasons:
• An RSVP receiver may modify its requested QoS parameters at any time.
• A new sender can start sending to a multicast group with a larger traffic specification
than existing senders, thereby causing larger reservations to be requested by the
appropriate receivers.
• A new receiver in a multicast group may make a reservation request that is larger than
existing reservations.
The last two reasons are related inextricably to how reservations are merged in a
multicast tree. Figure 7.2 shows a simplified version of RSVP in hosts and routers.
Method of Operation
RSVP requires the receiver to be responsible for requesting specific QoS services
instead of the sender. This is an intentional design in the RSVP protocol that attempts to
provide for efficient accommodation of large groups (generally, for multicast traffic),
dynamic group membership (also for multicast), and diverse receiver requirements.
Figure 7.4: Traffic flow of the RSVP Resv message.
One option concerns the treatment of reservations for different senders within the same
RSVP session. The option has two modes: establish a distinct reservation for each
upstream sender or establish a shared reservation used for all packets of specified
senders.
Another option controls the selection of the senders. This option also has two modes: an
explicit list of all selected senders or a wildcard specification that implicitly selects all
senders for the session. In an explicit sender-selection reservation, each Filter Spec
must match exactly one sender. In a wildcard sender selection, no Filter Spec is needed.
As depicted in Figure 7.5 and outlined earlier, the Wildcard-Filter (WF) style implies a
shared reservation and a wildcard sender selection. A WF style reservation request
creates a single reservation shared by all flows from all upstream senders. The Fixed-
Filter (FF) style implies distinct reservations with explicit sender selection. The FF style
reservation request creates a distinct reservation for a traffic flow from a specific sender,
and the reservation is not shared with another sender’s traffic for the same RSVP
session. A Shared-Explicit (SE) style implies a shared reservation with explicit sender
selection. An SE style reservation request creates a single reservation shared by
selected upstream senders.
The RSVP specification does not allow the merging of shared and distinct style
reservations, because these modes are incompatible. The specification also does not
allow merging of explicit and wildcard style sender selection, because this most likely
would produce unpredictable results for a receiver that may specify an explicit style
sender selection. As a result, the WF, FF, and SE styles are all incompatible with one
another.
RSVP Messages
An RSVP message contains a message-type field in the header that indicates the
function of the message. Although seven types of RSVP messages exist, there are two
fundamental RSVP message types: the Resv (reservation) and Path messages, which
provide for the basic operation of RSVP. As mentioned earlier, an RSVP sender
transmits Path messages downstream along the traffic path provided by a discrete
routing protocol (e.g., OSPF). Path messages store path information in each node in the
traffic path, which includes at a minimum the IP address of each Previous Hop (PHOP) in
the traffic path. The IP address of the previous hop is used to determine the path in
which the subsequent Resv messages will be forwarded. The Resv message is
generated by the receiver and is transported back upstream toward the sender, creating
and maintaining reservation state in each node along the traffic path. It follows the
reverse of the path along which the Path messages previously were sent, which is the
same path that data packets will subsequently use.
RSVP control messages are sent as raw IP datagrams using protocol number 46.
Although raw IP datagrams are intended to be used between all end-systems and their
next-hop intermediate node router, the RSVP specification allows for end-systems that
cannot accommodate raw network I/O services to encapsulate RSVP messages in UDP
(User Datagram Protocol) packets.
Path, PathTear, and ResvConf messages must be sent with the Router Alert option
[IETF1997a] set in their IP headers. The Router Alert option signals nodes on the arrival
of IP datagrams that need special processing. Therefore, nodes that implement high-
performance forwarding designs can maximize forwarding rates in the face of normal
traffic and be alerted to situations in which they may have to interrupt this high-
performance forwarding mode to process special packets.
Path Messages
The RSVP Path message contains, in addition to the Previous Hop (PHOP) address,
information that characterizes the sender’s traffic. These additional information elements
are called the Sender Template, the Sender TSpec, and the Adspec.
The Path message is required to carry a Sender Template, which describes the format of
data traffic the sender will originate. The Sender Template contains information called a
Filter Spec (filter specification), which uniquely identifies the sender’s flow from other
flows present in the same RSVP session on the same link. The Path message is required
to contain a Sender TSpec, which characterizes the traffic flow the sender will generate.
The TSpec parameter characterizes the traffic the sender expects to generate; it is
transported along the intermediate network nodes and received by the intended
receiver(s). The Sender TSpec is not modified by the intermediate nodes.
The Path message also may carry an Adspec, which is used to determine whether a non-IS-aware
node lies in the transit path or whether a specific QoS control service is available at each router in the
transit path. The Adspec also provides default or service-specific information for the
characterization parameters for the guaranteed service class.
Information also can be generated or modified within the network and used by the
receivers to make reservation decisions. This information may include specifics on
available resources, delay and bandwidth estimates, and various parameters used by
specific QoS control services. This information also is carried in the Adspec object and is
collected from the various nodes as it makes its way toward the receiver(s). The
information in the Adspec represents a cumulative summary, computed and updated
each time the Adspec passes through a node. The RSVP sender also generates an initial
Adspec object that characterizes its QoS control capabilities. This forms the starting point
for the accumulation of the path properties; the Adspec is added to the RSVP Path
message created and transmitted by the sender.
As mentioned earlier, the information contained in the Adspec is divided into fragments;
each fragment is associated with a specific control service. This allows the Adspec to
carry information about multiple services and allows the addition of new service classes
in the future without modification to the mechanisms used to transport them. The size of
the Adspec depends on the number and size of individual per-service fragments
included, as well as the presence of nondefault parameters.
At each node, the Adspec is passed from the RSVP process to the traffic-control module.
The traffic-control process updates the Adspec by identifying the services specified in the
Adspec and calling each process to update its respective portion of the Adspec as
necessary. If the traffic-control process discovers a QoS service specified in the Adspec
that is unsupported by the node, a flag is set to report this to the receiver. The updated
Adspec then is passed from the traffic-control process back to the RSVP process for
delivery to the next node in the traffic path. After the RSVP Path message is received by
the receiver, the Sender TSpec and the Adspec are passed up to the RAPI (RSVP
Application Programming Interface).
The Adspec carries flag bits that indicate that a non-IS-aware (or non-RSVP-aware)
router lies in the traffic path between the sender and receiver. These bits are called break
bits and correspond to the NON_IS_HOP characterization parameter described earlier. A
set break bit indicates that at least one node in the traffic path did not fully process the
Adspec, so the remainder of the information in the Adspec is considered unreliable.
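The hop-by-hop accumulation of the characterization parameters described earlier can be sketched as follows; the field names are illustrative and do not correspond to the actual Adspec wire encoding:

    from dataclasses import dataclass

    @dataclass
    class Adspec:
        is_hops: int = 0                 # NUMBER_OF_IS_HOPS
        path_bw: float = float("inf")    # AVAILABLE_PATH_BANDWIDTH (bytes/s)
        min_latency: float = 0.0         # MINIMUM_PATH_LATENCY (seconds)
        path_mtu: int = 65535            # PATH_MTU (bytes)
        break_bit: bool = False          # NON_IS_HOP flag

    def update_adspec(adspec, node_bw, node_latency, node_mtu, node_is_aware):
        """Fold one node's local parameters into the cumulative Adspec."""
        if not node_is_aware:
            adspec.break_bit = True      # the rest of the Adspec is now unreliable
            return adspec
        adspec.is_hops += 1                             # one more IS-aware hop
        adspec.path_bw = min(adspec.path_bw, node_bw)   # bottleneck bandwidth
        adspec.min_latency += node_latency              # fixed latencies accumulate
        adspec.path_mtu = min(adspec.path_mtu, node_mtu)
        return adspec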
Resv Messages
The Resv message contains information about the reservation style, the appropriate
Flowspec object, and the Filter Spec that identifies the sender(s). The pairing of the
Flowspec and the Filter Spec is referred to as the Flow Descriptor. The Flowspec is used
to set parameters in a node’s packet-scheduling process, and the Filter Spec is used to
set parameters in the packet-classifier process. Data that does not match any of the
Filter Specs is treated as best-effort traffic.
Resv messages are sent periodically to maintain the reservation state along a particular
traffic path. This is referred to as soft state, because the reservation state is maintained
by using these periodic refresh messages.
In making a reservation request, the receiver specifies the traffic for which resources should be
reserved (the Receiver TSpec), together with the necessary parameters required to invoke the QoS service (the Receiver RSpec). This
information is contained in the Flowspec (flow specification) object carried in the Resv
messages. The information contained in the Flowspec object may be modified at any
intermediate node in the traffic path because of reservation merging and other factors.
The format of the Flowspec is different depending on whether the receiver is requesting
controlled-load or guaranteed service. When a receiver requests controlled-load service,
only a TSpec is contained in the Flowspec. When requesting guaranteed service, both a
TSpec and an RSpec are contained in the Flowspec object. (The RSpec element was
described earlier in relation to the guaranteed service QoS class.)
In RSVP version 1, all receivers in a particular RSVP session are required to choose the
same QoS control service. This restriction is due to the difficulty of merging reservations
that request different QoS control services and the lack of a service-replacement
mechanism. This restriction may be removed in future revisions of the RSVP
specification.
At each RSVP-capable router in the transit path, the Sender TSpecs arriving in Path
messages and the Flowspecs arriving in Resv messages are used to request the
appropriate resources from the appropriate QoS control service. State merging, message
forwarding, and error handling proceed according to the rules defined in the RSVP
specification. Also, the merged Flowspec objects arriving at each RSVP sender are
delivered to the application, informing the sender of the merged reservation request and
the properties of the data path.
The PathErr and ResvErr messages simply are sent back toward the node whose message
caused the error and do not modify the path state in the nodes through which they pass. A
PathErr message indicates an error in the processing of Path messages and is sent
upstream back to the sender. ResvErr messages indicate an error in the processing of Resv
messages and are sent downstream to the receiver(s).
RSVP teardown messages remove path or reservation state from nodes as soon as they
are received. It is not always necessary to explicitly tear down an old reservation,
however, because the reservation eventually times out if periodic refresh messages are
not received after a certain period of time. PathTear messages are generated explicitly
by senders or by the time-out of path state in any node along the traffic path and are sent
to all receivers. An explicit PathTear message is forwarded downstream from the node
that generated it; this message deletes path state and reservation state that may rely on
it in each node in the traffic path. A ResvTear message is generated explicitly by
receivers or any node in which the reservation state has timed out and is sent to all
pertinent senders. Basically, a ResvTear message has the opposite effect of a Resv
message.
ResvConf
A ResvConf message is sent by each node in the transit path that receives a Resv
message containing a reservation confirmation object. When a receiver wants to obtain a
confirmation for its reservation request, it can include a confirmation request
(RESV_CONFIRM) object in a Resv message. A reservation request with a Flowspec
larger than any already in place for a session normally results in a ResvErr or a
ResvConf message being generated and sent back to the receiver. Thus, the ResvConf
message acts as an end-to-end reservation confirmation.
Merging
The concept of merging is necessary for the interaction of multicast traffic and RSVP.
Merging of RSVP reservations is required because of the method multicast uses for
delivering packets—replicating packets that must be delivered to different next-hop
nodes. At each replication point, RSVP must merge reservation requests and compute
the maximum of their Flowspecs.
Flowspecs are merged when Resv messages, each originating from different RSVP
receivers and initially traversing diverse traffic paths, converge at a merge point node
and are merged prior to being forwarded to the next RSVP node in the traffic path (Figure
7.6). The largest Flowspec from all merged Flowspecs—the one that requests the most
stringent QoS reservation state—is used to define the single merged Flowspec, which is
forwarded to the next hop node. Because Flowspecs are opaque data elements to
RSVP, the methods for comparing them are defined outside of the base RSVP
specification.
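A simplified sketch of selecting the "largest" of a set of guaranteed service Flowspecs at a merge point; the field names are illustrative, and real implementations follow the comparison and merging rules defined in the individual service specifications:

    # Merge Flowspecs by taking, element by element, the most demanding value:
    # the larger TSpec parameters and reserved rate, and the smaller slack term.
    def merge_flowspecs(specs):
        merged = dict(specs[0])
        for s in specs[1:]:
            merged["r"] = max(merged["r"], s["r"])   # token rate
            merged["b"] = max(merged["b"], s["b"])   # bucket depth
            merged["p"] = max(merged["p"], s["p"])   # peak rate
            merged["R"] = max(merged["R"], s["R"])   # reserved rate (RSpec)
            merged["S"] = min(merged["S"], s["S"])   # slack term (RSpec)
        return merged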
As mentioned earlier, different reservation styles cannot be merged, because they are
fundamentally incompatible.
You can find specific ordering and merging guidelines for message parameters within the
scope of the controlled-load and guaranteed service classes in [IETF1997h] and
[IETF1997i], respectively.
Non-RSVP Clouds
RSVP still can function across intermediate nodes that are not RSVP-capable. End-to-
end resource reservations cannot be made, however, because non-RSVP-capable
devices in the traffic path cannot maintain reservation or path state in response to the
appropriate RSVP messages. Although intermediate nodes that do not run RSVP cannot
provide these functions, they may have sufficient capacity to be useful in accommodating
tolerant real-time applications.
When a Path message
arrives at the next RSVP-capable node after traversing an arbitrary non-RSVP cloud, it
carries with it the IP address of the previous RSVP-capable node. Therefore, the Resv
message then can be forwarded directly back to the next RSVP-capable node in the
path.
Although RSVP can function in this manner, the presence of a non-RSVP cloud may severely
distort the QoS requested by the receiver.
Low-Speed Links
Low-speed links, such as analog telephone lines, ISDN connections, and sub-T1 rate
lines, present unique problems with regard to providing QoS, especially when multiple
flows are present. It is problematic for a user to receive consistent performance, for
example, when different applications are active at the same time, such as a Web
browser, an FTP (File Transfer Protocol) transfer, and a streaming-audio application.
Although the Integrated Services model is designed implicitly for situations in which some
network traffic can be treated preferentially, it does not provide tailored service for low-
speed links such as those described earlier.
At least one proposal has been submitted to the IETF’s Integrated Services over Specific
Link Layers (ISSLL) working group [ID1997j] that proposes the combination of enhanced,
compressed, real-time transport protocol encapsulation; optimized header compression;
and extensions to the PPP (Point-to-Point Protocol) to permit fragmentation and a method
to suspend the transfer of large packets in favor of packets belonging to flows that require
QoS services. The interaction of this proposal and the IETF Integrated Services model is
outlined in [ID1997k].
Integrated Services and RSVP-over-ATM
The issues discussed in this section are based on information from a collection of
documents currently being drafted in the IETF (each is referenced individually in this
section). This discussion is based on the ATM Forum Traffic Management Specification
Version 4.0 [AF1996a].
A functional disconnect exists between IP and ATM services, not the least of which is their
difference in principal modes of operation: IP is connectionless and ATM is connection
oriented. An obvious contrast exists in how each delivers traffic. IP is a best-effort
delivery service, whereas ATM has underlying technical mechanisms to provide
differentiated levels of QoS for traffic on virtual connections. ATM uses point-to-point and
point-to-multipoint VCs. Point-to-multipoint VCs allow nodes to be added and removed
from VCs, providing a mechanism for supporting IP multicast.
Although several models exist for running IP-over-ATM networks [IETF1996a], any one
of these methods will function as long as RSVP control messages (IP protocol 46) and
data packets follow the same data path through the network. The RSVP Path messages
must follow the same path as data traffic so that path state may be installed and
maintained along the appropriate traffic path. With ATM, this means that the ingress and
egress points in the network must be the same in both directions (remember that RSVP
is only unidirectional) for RSVP control messages and data.
Background
The technical specification for running “Classical” IP-over-ATM is detailed in
[IETF1994c]. It is based on the concept of an LIS (Logical IP Subnetwork), where hosts
within an LIS communicate via the ATM network, and communication with hosts that
reside outside the LIS must be through an intermediate router. Classical IP-over-ATM
also provides a method for resolving IP host addresses to native ATM addresses called
an ATM ARP (Address Resolution Protocol) server. The ATM Forum provides similar
methods for supporting IP-over-ATM in its MPOA (Multi-Protocol Over ATM) [AF1997a]
and LANE (LAN Emulation) [AF1995b] specifications. By the same token, IP multicast
traffic and ATM interaction can be accommodated by a Multicast Address Resolution
Server (MARS) [IETF1996b].
The technical specifications for LANE, Classical IP, and NHRP (Next Hop Resolution
Protocol) [ID1997p] discuss methods of mapping best-effort IP traffic onto ATM SVCs
(Switched Virtual Connections). However, when QoS requirements are introduced, the
mapping of IP traffic becomes somewhat complex. Therefore, the industry recognizes
that ongoing examination and research is necessary to provide for the complete
integration of RSVP and ATM.
Using RSVP over ATM PVCs is rather straightforward. ATM PVCs emulate dedicated
point-to-point circuits in a network, so the operation of RSVP is no different than when
implemented on any point-to-point network model using leased lines. The QoS of the
PVCs, however, must be consistent with the Integrated Services classes being
implemented to ensure that RSVP reservations are handled appropriately in the ATM
network. Therefore, there is no apparent reason why RSVP cannot be successfully
implemented in an ATM network today that solely uses PVCs.
Using SVCs in the ATM network is more problematic. The complexity, cost, and
efficiency of setting up SVCs can affect their benefit when used in conjunction with RSVP.
Additionally, scaling issues can be introduced when a single VC is used for each RSVP
flow. The number of VCs in any ATM network is limited. Therefore, the number of RSVP
flows that can be accommodated by any one device is strictly limited by the number of
VCs available to that device.
ATM point-to-multipoint VCs provide an adequate mechanism for dealing with multicast
traffic. With ATM Forum 4.0, a new concept has been introduced
called Leaf Initiated Join (LIJ), which allows an ATM end-system to join an existing point-
to-multipoint VC without necessarily contacting the source of the VC. This reduces the
resource burden on the ATM source as far as setting up new branches, and it more
closely resembles the receiver-based model of RSVP and IP multicast. However, several
scaling issues still exist, and new branches added to an existing point-to-multipoint VC
will end up using the same QoS parameters as the existing branches, posing yet
another problem. Therefore, a method must be defined to provide better handling of
heterogeneous RSVP and multicast receivers with ATM SVCs.
By the same token, a major difference exists in how ATM and RSVP QoS negotiation is
accomplished. ATM is sender oriented and RSVP is receiver oriented. At first glance, this
might appear to be a major discrepancy. However, RSVP receivers actually determine
the QoS required by the parameters included in the sender’s TSpec, which is included in
received Path messages. Therefore, whereas the resources in the network are reserved
in response to receiver-generated Resv messages, the resource reservations actually
are initiated by the sender. This means that senders will establish ATM QoS VCs and
receivers must accept incoming ATM QoS VCs. This is consistent with how RSVP
operates and allows senders to use different RSVP flow-to-VC mappings for initiating
RSVP sessions.
Figure 7.7: ATM edge functions [ID1997o].
As depicted in Figure 7.8, the mapping of the Integrated Services QoS classes to ATM
service categories would appear to be straightforward. The ATM CBR and rt-VBR service
categories possess characteristics that make them prospective candidates for
guaranteed service, whereas the nrt-VBR and ABR (albeit with an MCR) service
categories provide characteristics that are the most compatible with the controlled-load
service. Best-effort traffic fits well into the UBR service class.
The practice of tagging nonconformant cells with CLP = 1, which designates cells as
lower priority, can have a special use with ATM and RSVP. As outlined previously, you
can determine whether cells are tagged as conformant by using a GCRA (Generic Cell
Rate Algorithm) leaky-bucket algorithm. Also recall that traffic in excess of controlled-load
or guaranteed service specifications must be transported as best-effort traffic. Therefore,
the practice of dropping cells with the CLP bit set should be exercised with caution for excess
guaranteed service or controlled-load traffic. Of course, this is an additional nuance you
should consider in an ATM/RSVP interworking implementation.
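For reference, a sketch of the virtual-scheduling form of the GCRA mentioned above, which decides whether each arriving cell conforms; nonconforming cells could then be tagged with CLP = 1 rather than discarded. The increment (nominal inter-cell spacing) and limit (tolerance) come from the traffic contract:

    # GCRA, virtual-scheduling formulation: a cell arriving earlier than its
    # theoretical arrival time minus the limit is nonconforming.
    class GCRA:
        def __init__(self, increment, limit):
            self.increment = increment   # I: nominal spacing between cells
            self.limit = limit           # L: permitted tolerance
            self.tat = 0.0               # theoretical arrival time

        def conforming(self, arrival):
            """Return True if a cell arriving at time `arrival` conforms."""
            if arrival < self.tat - self.limit:
                return False             # too early: nonconforming (could tag CLP=1)
            self.tat = max(arrival, self.tat) + self.increment
            return True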
Several ATM QoS parameters exist for which there are no IP layer equivalents.
Therefore, these parameters must be configured manually as a matter of local policy.
Among these parameters are CLR (Cell Loss Ratio), CDV (Cell Delay Variation), SECBR
(Severely Errored Cell Block Ratio), and CTD (Cell Transfer Delay).
The following section briefly outlines the mapping of the Integrated Services classes to
ATM service categories. This is discussed in much more detail in [ID1997o].
Therefore, rt-VBR is the most appropriate ATM service category for guaranteed service
traffic because of its inherent adaptive characteristics. The selection of the rt-VBR
service category, however, requires two specified rates to be quantified: the SCR
(Sustained Cell Rate) and PCR (Peak Cell Rate). These two parameters provide a burst
and average tolerance profile for traffic with bursty characteristics. The rt-VBR service
also should specify a low enough CLR for guaranteed service traffic so that cell loss is
avoided as much as possible.
When mapping guaranteed service onto an rt-VBR VC, [ID1997o] suggests that the ATM
traffic descriptor values for PCR, SCR, and MBS (Maximum Burst Size) should be set
within the following bounds:
R ≤ PCR ≤ min(p, line rate)
r ≤ SCR ≤ PCR
0 ≤ MBS ≤ b
where R = the RSpec rate, p = the TSpec peak rate, r = the Receiver TSpec token rate, and b = the token-bucket depth.
In other words, the RSpec should be less than or equal to the PCR, which in turn should
be less than or equal to the minimum peak rate, or alternatively, the minimum line rate.
The Receiver TSpec (assumed here to be identical to the Sender TSpec) should be less
than or equal to the SCR, which in turn should be less than or equal to the PCR. The
MBS should be greater than or equal to zero (generally, greater than zero, of course) but
less than or equal to the leaky-bucket depth defined for traffic shaping.
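Assuming rate units have already been converted appropriately (a real implementation would work in cells per second and account for cell and AAL5 overhead), a trivial sketch of choosing descriptor values consistent with these bounds:

    # Pick rt-VBR traffic descriptors that satisfy
    #   R <= PCR <= min(p, line rate),  r <= SCR <= PCR,  0 <= MBS <= b.
    def gs_to_rtvbr(R, r, p, b, line_rate):
        """R = RSpec rate; (r, p, b) from the TSpec; returns (PCR, SCR, MBS)."""
        upper = min(p, line_rate)
        if R > upper:
            raise ValueError("RSpec rate exceeds the peak or line rate of the VC")
        pcr = upper        # largest value permitted by the upper bound
        scr = r            # satisfies r <= SCR <= PCR, since r <= R <= PCR
        mbs = b            # any value up to the bucket depth is permitted
        return pcr, scr, mbs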
The ABR service category best aligns with the model for controlled-load service, which is
characterized as being somewhere between best-effort service and a service requiring
guarantees. Therefore, if the ABR service class is used for controlled-load traffic, it
requires that an MCR (Minimum Cell Rate) be specified to provide a lower bound for the
data rate. The TSpec rate should be used to determine the MCR.
The nrt-VBR service category also can be used for controlled-load traffic. However, the
maxCTD (Maximum Cell Transfer Delay) and CDV (Cell Delay Variation) parameters
must be chosen for the edge ATM device; this is done manually as a matter of policy.
When mapping controlled-load service onto an nrt-VBR VC, [ID1997o] suggests that the
ATM traffic descriptor values for PCR and MBS should be set within the following
bounds:
where r = the Receiver TSpec token rate and b = the token-bucket depth.
It has been suggested that RSVP heterogeneity can be supported over ATM by mapping
RSVP reservations onto ATM VCs by using one of four methods proposed in [ID1997m].
In the full heterogeneity model, a separate VC is provided for each distinct multicast QoS
level requested, including requests for best-effort traffic. In the limited heterogeneity
model, each ATM device participating in an RSVP session would require two VCs: one
point-to-multipoint VC for best-effort traffic and one point-to-multipoint QoS VC for RSVP
reservations. Both these approaches require what could be considered inefficient
quantities of network resources. The full heterogeneity model can provide users with the
QoS they require but makes the most inefficient use of available network resources. The
limited heterogeneity model requires substantially less network resources. However, it is
still somewhat inefficient, because packets must be duplicated at the network layer and
sent on two VCs.
The fourth model, the aggregation model, proposes that a single, large point-to-multipoint
VC be used for multiple RSVP reservations. This model is attractive for a number of
reasons, primarily because it solves the inefficiency problems associated with full
heterogeneity and negates concerns about the latency involved in setting up an
individual VC for each flow.
is that it may be difficult to determine the maximum QoS for the aggregate VC.
The term variegated VCs [ID1997m] has been created to describe point-to-multipoint
VCs that allow a different QoS on each branch. However, cell-drop mechanisms require
further research to retain the best-effort delivery characterization for nonconformant
packets that traverse certain branch topologies. Implementations of Early Packet Discard
(EPD) should be deployed in these situations so that all cells belonging to the same
packet can be discarded—instead of discarding only a few arbitrary cells from several
packets, making them useless to their receivers.
The concept of a subnetwork Bandwidth Manager first was described in [ID1997q] and
provides a mechanism to accomplish several things on a LAN subnet that otherwise
would be unavailable. Among these are admission control, traffic policing, flow
segregation, packet scheduling, and the capability to reserve resources (to include
maintaining soft state) on the subnet.
Figure 7.9: Conceptual operation of the Bandwidth Manager [ID1997q].
When a DSBM client sends or forwards a Path message over an interface attached to a
managed segment, it sends the message to its DSBM instead of to the RSVP session
destination address, as is done in conventional RSVP message processing. After
processing and possibly updating the Adspec, the DSBM forwards the Path message to
its destination address. As part of its processing, the DSBM builds and maintains a Path
state for the session and notes the Previous Hop (PHOP) of the node that sent the
message. When a DSBM client wants to make a reservation for an RSVP session, it
follows the standard RSVP message-processing rules and sends an RSVP Resv message
to the corresponding PHOP address specified in a received Path message. The DSBM
processes received Resv messages based on the bandwidth available and returns
ResvErr messages to the requester if the request cannot be granted. If sufficient
resources are available, and the reservation request is granted, the DSBM forwards the
Resv message to the PHOP based on the local Path state for the session. The DSBM
also merges and orders reservation requests in accordance with traditional RSVP
message-processing rules.
In the example in Figure 7.10, an “intelligent” LAN switch is designated as the DSBM for
the managed segment. The “intelligence” is only abstract, because all that is required is
that the switch implement the SBM mechanics as defined in [ID1997r]. As the DSBM
client, host A, sends a Path message upstream, it is forwarded to the DSBM, which in
turn forwards it to its destination session address, which lies outside the link-layer
domain of the managed segment. In Figure 7.11, the Resv message processing occurs
in exactly the same order, following the Path state constructed by the previously
processed Path message.
Figure 7.10: SBM-managed LAN segment: forwarding of Path messages.
The addition of the DSBM for admission control for managed segments results in some
additions to the RSVP message-processing rules at a DSBM client. In cases in which a
DSBM needs to forward a Path message to an egress router for further processing, the
DSBM may not have the Layer 3 routing information available to make the necessary
forwarding decision when multiple egress routers exist on the same segment. Therefore,
new RSVP objects have been proposed, called LAN_NHOP (LAN Next Hop) objects,
which keep track of the Layer 3 hop as the Path message traverses a Layer 2 domain
between two Layer 3 devices.
When a DSBM client sends out a Path message to its DSBM, it must include
LAN_NHOP information in the message. In the case of unicast traffic, the LAN_NHOP
address indicates the destination address or the IP address of the next hop router in the
path to the destination. As a result, when a DSBM receives a Path message, it can look
at the address specified in the LAN_NHOP object and forward the
message to the appropriate egress router. However, because the link-layer devices (LAN
switches) must act as DSBMs, the level of “intelligence” of these devices may not include
an ARP capability that enables them to resolve IP addresses to MAC (Media Access Control)
addresses. For this reason, [ID1997r] requires that LAN_NHOP information contain both
the IP address (LAN_NHOP_L3) and corresponding MAC address (LAN_NHOP_L2) for
the next Layer 3 device.
Because the DSBM may not be able to resolve IP addresses to MAC addresses, a
mechanism is needed to dispense with this translation requirement when processing
Resv messages. Therefore, the RSVP_HOP_L2 object is used to indicate the Layer 2
MAC address of the previous hop. This provides a mechanism for SBM-capable devices
to maintain the Path state necessary to accommodate forwarding Resv messages along
link-layer paths that cannot provide IP-address-to-MAC-address resolution.
There is at least one additional proposed new RSVP object, called the TCLASS (Traffic
Class) object. The TCLASS object is used with IEEE 802.1p user-priority values, which
can be used by Layer 2 devices to discriminate traffic based on these priority values.
These values are discussed in more detail in the following section. The priority value
assigned to each packet is carried in the new extended frame format defined by IEEE
802.1Q [IEEE-5]. When an SBM Layer 2 switch, which also functions as an 802.1p device,
receives a Path message, it inserts a TCLASS object. When a Layer 3 device (a router)
receives Path messages, it retrieves and stores the TCLASS object as part of the
process of building Path state for the session. When the same Layer 3 device needs to
forward a Resv message back toward the sender, it must include the TCLASS object in
the Resv message.
The Integrated Services model is implemented via an SBM client in the sender, as
depicted in Figure 7.12, and in the receiver, as depicted in Figure 7.13.
Figure 7.14 shows the SBM implementation in a LAN switch. The components of this
model are defined in the following summary [ID1997s]:
Figure 7.14: SBM in a LAN switch [ID1997s].
Local admission control. One local admission control module on each switch port
manages available bandwidth on the link attached to that port. For half-duplex links,
this involves accounting for resources allocated to both transmit and receive flows.
Input SBM module. One instance per port. This module performs the network
portion of the client-network peering relationship. This module also contains
information about the mapping of Integrated Service classes to IEEE 802.1p
user_priority, if applicable.
SBM propagation. Relays requests that have passed admission control at the input
port to the relevant output port’s SBM module(s). As indicated in Figure 7.14, this
requires access to the switch’s forwarding table and port spanning-tree states.
Output SBM module. Forwards messages to the next Layer 2 or Layer 3 network
hop.
Classifier, queuing, and scheduler. The classifier function identifies the relevant
QoS information from incoming packets and uses this, with information contained in
the normal bridge forwarding database, to determine the queue on the appropriate
output port to which the packet should be directed for transmission. The queuing and scheduling
functions manage the output queues and provide the algorithmic calculation for
servicing the queues to provide the promised service (controlled-load or guaranteed
service).
Ingress traffic class mapper and policing. This optional module may check
whether the data in the traffic classes conforms to specified behavior. The switch
may police this traffic and remap to another class or discard the traffic altogether. The
default behavior should be to allow traffic through unmodified.
Egress traffic class mapper. This optional module may apply remapping of traffic
classes on a per-output port basis. The default behavior should be to allow traffic
through unmodified.
The IEEE 802.1p specification defines how a user_priority traffic class can be carried on various LAN
media types using an extended frame format. Of course, this also implies that 802.1p-
compliant hardware may have to be deployed to fully realize these capabilities.
The current 802.1p draft defines the user_priority field as a 3-bit value, resulting in a
range of values between 0 and 7, with 7 indicating high priority and 0 indicating
low priority. The IEEE 802 specifications do not make any suggestions on how the
user_priority should be used by end-systems or by network elements. They suggest only
that packets may be queued by LAN devices based on their user_priority values.
A proposal submitted recently in the Integrated Services over Specific Link Layers
(ISSLL) working group of the IETF provides a suggestion on how to map the IEEE 802.1p
user_priority value to an Integrated Services class [ID1997s]. Because no practical
experience exists for mapping these parameters, the suggestions are somewhat arbitrary
and provide only a framework for further study. As shown in Table 7.1, two of the
user_priority values provide separate classifications for guaranteed service traffic with
different delay requirements. The less-than-best-effort category could be used by devices
that tag packets that are in nonconformance with a traffic commitment and may be dropped
elsewhere in the network.
Table 7.1: Suggested mapping of IEEE 802.1p user_priority values to service classes [ID1997s]
user_priority    Service
0                Less than best effort
1                Best effort
2                Reserved
3                Reserved
4                Controlled load
5                Guaranteed service (larger delay bound)
6                Guaranteed service (smaller delay bound)
7                Reserved
Because no explicit traffic class or user_priority field exists in Ethernet 802.3 [IEEE-3]
packets, the user_priority value must be regenerated at a downstream node or LAN
switch by some predefined default criterion, or by looking further into the higher-layer
protocol fields in the packet and matching some parameters against predefined criteria.
Another option is to use the IEEE 802.1Q encapsulation proposal [IEEE-5] tailored for
VLANs (Virtual Local Area Networks), which may be used to provide an explicit traffic
class field on top of the basic MAC format.
The token-ring standard [IEEE-4] does provide a priority mechanism, however, that can
be used to control the queuing of packets and access to the shared media. This priority
mechanism is implemented using bits from the Access Control (AC) and Frame Control
(FC) fields of an LLC frame. The first three bits (the token priority bits) and the last three
bits (the reservation bits) of the AC field dictate which stations get access to the ring. A
token-ring station theoretically is capable of separating traffic belonging to each of the
eight levels of requested priority and transmitting frames in the order of indicated priority.
The last three bits of the FC field (the user priority bits) are obtained from the higher layer
in the user_priority parameter when it requests transmission of a packet. This parameter
also establishes the access priority used by the MAC. This value usually is preserved as
the frame is passed through token-ring bridges; thus, the user_priority can be transported
end-to-end unmolested.
Observations
It has been suggested that the Integrated Services architecture and RSVP are
excessively complex and possess poor scaling properties. This suggestion is
undoubtedly prompted by the existence of the underlying complexity of the signaling
requirements. However, it also can be suggested that RSVP is no more complex than
some of the more advanced routing protocols, such as BGP (Border Gateway Protocol).
An alternative viewpoint might suggest that the underlying complexity is required
because of the inherent difficulty in establishing and maintaining path and reservation
state information along the transit path of data traffic. The suggestion that RSVP has
poor scaling properties deserves additional examination, however, because deployment
of RSVP has not been widespread enough to determine the scope of this assumption.
As discussed in [IETF1997k], there are several areas of concern about the wide-scale
deployment of RSVP. With regard to concerns of RSVP scalability, the resource
requirements (computational processing and memory consumption) for running RSVP on
routers increase in direct proportion to the number of separate RSVP reservations, or
sessions, accommodated. Therefore, supporting a large number of RSVP reservations
could introduce a significant negative impact on router performance. By the same token,
router-forwarding performance may be impacted adversely by the packet-classification
and scheduling mechanisms intended to provide differentiated services for reserved
flows. These scaling concerns tend to suggest that organizations with large, high-speed
networks will be reluctant to deploy RSVP in the foreseeable future, at least until these
concerns are addressed. The underlying implications of this concern also suggest that
without deployment by Internet service providers, who own and maintain the high-speed
backbone networks in the Internet, the deployment of pervasive RSVP services in the
Internet will not be forthcoming.
At least one interesting proposal has been submitted to the IETF [ID1997l] that suggests
a rationale for grouping similar guaranteed service flows to reduce the bandwidth
requirements that guaranteed service flows might consume individually. This proposal
does not suggest an explicit implementation method to provide this grouping but instead
provides the reasoning for identifying identical guaranteed service flows in an effort to
group them. The proposal does suggest offhand that some sort of tunneling mechanism
could be used to transport flow groups from one intermediate node to another, which
could conceivably reduce the amount of bandwidth required in the nodes through which a
flow group tunnel passes. Although it is well intentioned, the obvious flaw in this proposal
is that it only partially addresses the scaling problems introduced by the Integrated
Services model. The flow group still must be policed at each intermediate node to
provide traffic-conformance monitoring, and path and reservation state still must be
maintained at each intermediate node for individual flows. This proposal is in the
embryonic stages of review and investigation, however, and it is unknown at this time
how plausible the proposal might be in practice.
Another important concern expressed in [IETF1997k] deals with policy-control issues and
RSVP. Policy control addresses the issue of who is authorized to make reservations and
provisions to support access control and accounting. Although the current RSVP
specification defines a mechanism for transporting policy information, it does not define
the policies themselves, because the policy object is treated as an opaque element.
Some vendors have indicated that they will use this policy object to provide proprietary
mechanisms for policy control. At the time of this writing, the IETF RSVP working group
has been chartered to develop a simple policy-control mechanism to be used in
conjunction with RSVP. There is ongoing work on this issue in the IETF. Several
mechanisms already have been proposed to deal with policy issues [ID1997h, ID1997I,
ID1997v], in addition to the aforementioned vendor-proprietary policy-control
mechanisms. It is unclear at this time, however, whether any of these proposals will be
implemented or adopted as a standard.
The key recommendation contained in [IETF1997k] is that given the current form of the
RSVP specification, multimedia applications run within smaller, private networks are the
most likely to benefit from the deployment of RSVP. The inadequacies of RSVP scaling and
lack of policy control may be more manageable within the confines of a smaller, more
controlled network environment than in the expanse of the global Internet. It certainly is
possible that RSVP may provide genuine value and find legitimate deployment uses in
smaller networks, both in the peripheral Internet networks and in the private arena, where
these issues of scale are far less important. Therein lies the key to successfully delivering
quality of service using RSVP. After all, the purpose of the Integrated Services architecture
and RSVP is to provide a method to offer quality of service, not to degrade the service
quality.
Chapter 8: QoS and Dial Access
Overview
This chapter deals with the delivery of Quality of Service (QoS) mechanisms at the
demand dial-access level. Some would argue that discernible quality of any IP service is
impossible to obtain when using a telephone modem. Most of the millions of people who
use the Internet daily are visible only at the other end of a modem connection, however,
and QoS and dial access is a very real user requirement.
So what environment are we talking about here? Typically, this issue concerns dial-in
connections to an underlying switched network, commonly a Public Switched Telephone
Network (PSTN) or an Integrated Services Digital Network (ISDN), that is used as the
access mechanism to an Internet Service Provider (ISP). The end-user’s system is
connected by a dynamically activated circuit to the provider’s Internet network.
There are a number of variations on this theme, as indicated in Figure 8.1. The end-user
environment may be a single system (Panel A), or it may be a local network with a
gateway that controls the circuit dynamically (Panel B). The dynamics of the connectivity
may be a modem call made across the telephone network or an ISDN data call made
across an ISDN network. This is fast becoming a relatively cosmetic difference;
integration of ISDN and analog call-answer services finally has reached the stage where
the service provider can configure a single unit to answer both incoming ISDN and
analog calls, so no significant differences exist between analog modem banks and ISDN
Primary Rate Interface (PRI) systems. Finally, the logical connection between the user’s
environment and the ISP may be layered directly on the access connection. Alternatively,
the connectivity may use IP tunneling to a remote ISP so that the access service and
Internet service are operated by distinct entities in different locations on the network
(Panel C).
Answering the Dial-Access Call
Answering a call in the traditional telephony world does not appear to be a particularly
challenging problem. When the phone rings, you pick up the handset and answer the
call. Of course, it is possible to replicate this simple one-to-one model for dial access. If
the ISP is willing to provide dedicated access equipment for the exclusive use of each
subscriber, whenever the client creates a connection to the service, the dedicated
equipment answers the call and completes the connection.
Of course, when the client is not connected, the equipment remains idle. It is this state of
equipment idleness, however, that makes this exclusive provisioning strategy a high-cost
option for the service provider and a relatively expensive service for the subscriber.
Competitive market pressures in the service-provider arena typically dictate a different
approach to the provisioning of access ports to achieve a more efficient balance between
operating cost and service-access levels.
Dial-Access Pools
A refinement of this exclusivity model is sharing a pool of access ports among a set of
clients. As long as an adequate balance exists among the number of clients, the number
of access ports, the average connection period, and times when the connection is
established, the probability of a client not being able to make a connection because of
exhaustion of the pool of access ports can be managed to acceptably low levels. The
per-client service costs are reduced, because the cost of operation of each access port
can be defrayed over more than one client, and the resulting service business model can
sustain lower subscription fees. Accordingly, such a refinement of the access model has
a direct impact on the competitive positioning of the service provider.
Unlike a more traditional telephony model of service economics, where the capital
cost of infrastructure can be deferred over many years of operation, the Internet
has a more aggressive model of capital investment with operational lifetimes of less
than 18 months, particularly in the rapidly moving dial-access technology market.
Accordingly, the considerations of the efficiency of utilization of the equipment plant
are a major factor in the overall cost structure for access service operators.
Typically, in such environments, a pool of access ports is provided for a user community,
and each user dials a common access number. The access call is routed to the next
available service port on the ISP’s equipment (using a rotary access number), the call is
answered, and the initial authentication phase of the connection is completed (Figure
8.2).
Figure 8.2: A dial-access block diagram.
Such full servicing during peak periods is not always possible when using the shared-
pool approach. Although it may be possible to continually provision a system to meet all
peak demands within the available pool, this is an unnecessarily expensive mode of
operation. More typically, service providers attempt to provision the pool size so that
there are short periods in the week when the client may receive a busy signal when
attempting to access the pool. This signals that the access port pool is exhausted and
the subscriber simply should redial.
Queuing Analysis of Dial-Access Pools
Management of the dial-access pool form of congestion control is well understood, and
the discipline of queuing theory is a precise match to this problem. Using queuing theory
as a tool, it is possible to model the client behavior and dimension a pool of access ports
so that the delay to access an available port is bounded by time, and the delay can be
selected by dimensioning the number of individual access servers.
Such engineering techniques are intended to bound the overall quality of the access
service for all clients. The periods during which the access bank returns a busy signal
can be determined statistically, and the length of time before a client can establish a
connection during such periods also can be statistically determined from a queuing
model and can be observed by monitoring the occupancy level of the modem bank to a
sufficiently fine level of granularity. You should note that queuing theory predicts the
probability of access-server saturation when the call-arrival process and the
call-holding times both follow a Markovian (exponential) distribution model. You can
find a complete treatment of queuing theory in [Kleinrock1976], and an analysis of a
connection environment is available in [Tannenbaum1988].
Tip A.A. Markov published a paper in 1907 in which he defined and examined
various behaviors and properties that create interdependencies and
relationships among certain random variables, forming a stochastic process.
These processes are now known as Markovian processes. More detailed
information on Markovian theory can be found in Queuing Systems, Volume I:
Theory, by Leonard Kleinrock, published by John Wiley & Sons (1975).
If you are interested in queueing theory, its practice, and its application, Myron
Hlynka’s Queueing Theory Page at www2.uwindsor.ca/~hlynka/queue.html
contains a wealth of information on these topics.
The probability that an access-port configuration with n ports will saturate and an
incoming call will fail can be expressed by this relationship:
Where the traffic intensity, r, is a function of the call rate λ and the service interval μ:
An analysis of this model, putting in values for λ of 0.6 and μ of 0.59, and assuming that
each call will be held for a minimum of 5 units, appears in Figure 8.4.
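For reference, a standard closed form for these relationships, assuming the classic M/M/n loss (Erlang B) model consistent with the Markovian assumptions noted above, is:

\[
P_{\text{block}}(n, r) \;=\; \frac{r^{n}/n!}{\sum_{k=0}^{n} r^{k}/k!},
\qquad r \;=\; \frac{\lambda}{\mu}
\]

where r is the offered traffic intensity; if μ denotes a mean holding interval rather than a service rate, the intensity is instead r = λμ.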
Figure 8.4: Queuing theory model of modem-pool availability.
The conclusion from this very brief theoretical presentation is that in sufficiently large
client population sets and access pools, the behavior of the pool can be modeled quite
accurately by using queuing theory. This theoretical analysis yields a threshold function
of pool provisioning—there is a point at which the law of diminishing returns takes over
and the addition of another access port to the pool makes a very minor change to the
probability of denial of service at the busy period. The consequent observation is that the
“last mile” of provisioning an access pool that attempts to eliminate denial of service
access is the most expensive component of the access rack, because the utilization
rates for such access ports drop off sharply after the provisioning threshold is reached.
How can you introduce a scheme that prioritizes the allocation of access ports so that the
contention can be reduced for a subclass of the serviced client base? Does the
opportunity exist to differentiate clients, allowing preemptive allocation of access ports to
meet the requirements of a differentiated section of the client pool?
The current state of the art as far as dial access is concerned ends at this point. It is
simply the case that an integration of the two parts of dial access—the call-switching
environment and the Internet-access environment—does not exist to the extent
necessary to implement such QoS structures at the access level. However, such
observations should not prevent you from examining what you need to implement
differentiated classes of access service and the resulting efficiencies in the operation of
any such scheme.
Managing Call-Hunt Groups
The basic access model makes very few assumptions about the switching mechanisms
of the underlying access network substrate. If a free port of a compatible type exists
within the access bank, the port is assigned to the incoming call, and the call is
answered. For an analog modem configuration, this is a call-hunt group, in which a set of
service numbers is mapped by the PSTN switch to a primary hunt-group number. When
an incoming call to the hunt-group number is detected, the switch maps the next
available service number to the call, and the call is routed to the corresponding modem.
For ISDN, this is part of the functionality of Primary Rate Access (PRA), where incoming
switched calls made to the PRA address are passed to the next available B-channel
processor.
The problem of modem faults causing call blocking can be addressed with a simple
change to the behavior of the hunt-group switch, which searches for a free port at the
next port in sequence from where the last search terminated. If a client is connected to a
faulty port, hanging up the connection and redialing the hunt-group number causes the
hunt-group switch to commence the search for a free port at the modem following in
sequence from the faulty port. Although this is an effective workaround, it is still far from
ideal. The missing piece of the puzzle here is a feedback signal from the modem to the
call-hunt group, so that the modem manager agent can poll all modems for operational
integrity and pull out of the call-hunt group any modems that fail such an integrity self-
test.
The second problem, the lack of preemption of service, is not so readily handled by the
access equipment acting in isolation. The mechanism to allow such preemptive access
lies in a tighter integration of the call-management environment and the access
equipment. One potential model uses two hunt-group numbers: a basic and a premium
access number overlaid on a single access port bank. If the basic access number is
called, ports are allocated and incoming calls are answered up to a high-water mark
allocation point of the total access-port pool. Subsequent calls to the basic access
number are not answered until the number of busy ports falls below a low-water
allocation mark. All calls to a premium access service can be allocated
on any available port.
In this way, the differentiated premium access service can draw on the entire port pool,
which in effect reserves a block of ports (equal to the total pool size minus the
high-water mark point) for the differentiated access service, as shown in Figure 8.5. With
a differential number of effective servers for the two populations, where the smaller
population is accessing (in effect) a larger pool of access servers, queuing analysis of the
system results in significantly shorter busy wait times for the elevated priority client
group. To implement preemptive access, you must pass the information related to the
incoming call (caller’s number and number called) to the access equipment to allow the
access equipment to apply configuration rules to the two numbers to determine whether
to accept or block the call. Recording the called number for the session-accounting
access record allows the service provider to bill the client on a differential fee structure
for the higher-access service class.
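A minimal Python sketch of this dual hunt-group policy follows; the port counts, thresholds, and class names are illustrative assumptions rather than values taken from any particular access server.

class DialAccessPool:
    """Two-tier port allocation with high/low-water marks (illustrative)."""

    def __init__(self, total_ports, high_water, low_water):
        self.total_ports = total_ports
        self.high_water = high_water      # basic callers blocked above this
        self.low_water = low_water        # basic callers admitted again below this
        self.busy = 0
        self.basic_blocked = False        # hysteresis state for basic callers

    def answer(self, service_class):
        """Return True if the incoming call is answered, False if it hears busy."""
        if self.busy >= self.total_ports:
            return False                  # pool exhausted for everyone
        if service_class == "basic":
            if self.basic_blocked and self.busy >= self.low_water:
                return False              # wait for occupancy to drain
            if self.busy >= self.high_water:
                self.basic_blocked = True
                return False
            self.basic_blocked = False
        # premium calls may take any free port
        self.busy += 1
        return True

    def hang_up(self):
        self.busy = max(0, self.busy - 1)
        if self.busy < self.low_water:
            self.basic_blocked = False

pool = DialAccessPool(total_ports=96, high_water=80, low_water=70)
print(pool.answer("premium"))   # True: premium calls see the whole pool

A real access server would, of course, key this decision off the called hunt-group number passed from the switch, as described above.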
Further refinements are possible if the signaling between the call switch and the access
server is sufficiently robust to allow passback of the call. In such an environment,
the switch can implement overflow to other access groups in the event of saturation of
the local access port, by the access server sending a local congestion signal back to the
switch, which allows the call to be rerouted to an access server pool with available ports.
From the perspective of the client, there are two levels of service access: a basic rate-
access number, which provides a high probability of access in off-peak periods but with
reduced service levels during peak-use periods, and a premium-access number, which
provides high availability at all times, with presumably a premium fee to match.
The intended result of each of these schemes is a differentiated service. Such a service
would allow the service operator to increase the average utilization rate of the access
equipment while offering a number of higher-quality, differentiated access services, which
presumably would also be differentiated according to subscriber-fee structure and
business model.
Differentiated Access Services at the Network Layer
After the access call is assigned to an access port and data-link communications is
negotiated, the next step is to create an IP connection environment for the call. The most
critical component of this step is to authenticate the identity of the client in a secure
fashion. Several technologies have been developed to support this phase of creating a
remote-access session, most notably TACACS (and its subsequent refinements)
[IETF1993a], Kerberos [IETF1993b], CHAP [IETF1996d], PAP [IETF1993c], and
RADIUS [IETF1997d]. For the purpose of examining QoS structures in remote access,
the most interesting of these approaches is RADIUS, which not only can undertake the
authentication of the user’s identity, but also can upload a custom profile to the access
port based on the user’s identity. This second component allows the access server to
configure itself to deliver the appropriate service to the user for the duration of the user’s
access session.
Figure 8.6 shows the precise format of the RADIUS Access-Accept packet. The packet
has identification headers and a list of attribute values encoded in standard TLV (Type,
Length, Value) triplets. The RADIUS specification [IETF1997d] specifies attribute types 1
through 63, noting that the remainder of the types are available for future use and
reserved for experimentation and implementation-specific use.
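As a rough illustration of the TLV encoding described above, the following Python sketch packs a pair of attributes into the attribute list of an Access-Accept. The attribute type codes shown (27 for Session-Timeout, 28 for Idle-Timeout) are the values assigned in the RADIUS specification; the timeout values themselves are arbitrary examples.

import struct

def radius_attribute(attr_type, value):
    """Encode one RADIUS attribute as a Type, Length, Value triplet.

    The Length octet covers the Type and Length octets as well as the Value,
    and integer attributes are carried as 32-bit network-order values.
    """
    if isinstance(value, int):
        value = struct.pack("!I", value)
    return struct.pack("!BB", attr_type, 2 + len(value)) + value

SESSION_TIMEOUT = 27   # attribute type codes from the RADIUS specification
IDLE_TIMEOUT = 28

# A premium profile might allow a longer session and a longer idle period.
attributes = radius_attribute(SESSION_TIMEOUT, 14400) + \
             radius_attribute(IDLE_TIMEOUT, 1800)
print(attributes.hex())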
RADIUS allows the network service provider to determine a service profile specific to the
user for the duration of the access session. The Access-Accept packet allows the
RADIUS server to return a number of attribute-value pairs; the precise number and
interpretation is left to the RADIUS access server and client to negotiate. The implicit
assumption is that default values for these attributes are configured into the NAS, and
the RADIUS Access-Accept message has the capability to set values for attributes that
are not defaulted.
Within the defined attribute types, the attributes that can be used to provide QoS
structure are described in the following sections.
Session-Timeout
The Session-Timeout attribute defines the maximum session duration before the NAS
initiates session termination. Within a QoS environment, service differentiation can
include the capability to set longer (or unbounded) session times.
Idle-Timeout
The Idle-Timeout attribute sets the maximum idle period before the NAS initiates session
termination. Again, within a QoS environment, this attribute can vary according to the
service class.
Port-Limit
The Port-Limit attribute sets the maximum number of ports the NAS provides to the
user (when Multilink access is in use, for example). Within a QoS environment, a
higher service class can be granted a larger port limit.
The RADIUS specification also allows extensions to the RADIUS attribute set that are
implementation specific or experimental values. Interestingly enough, there is no further
negotiation between the RADIUS server and client, so if an attribute value is specified
that is unknown to the NAS, no mechanism exists for informing the server or the client
that the attribute has not been set to the specified value. Two potential uses of such
extended attributes for QoS suggest themselves.
The first possibility is to set the IP precedence field bit settings of all IP packets
associated with the access session to a specified value as they enter the network. This
could allow QoS mechanisms residing in the network to provide a level of service with
which the settings correlate. As noted previously, such settings may be used to activate
precedence-queuing operations, weighting of RED (Random Early Detection)
congestion-avoidance mechanisms, and similar service-differentiation functions.
Obviously, such a QoS profile can be set as an action that is marked on all user packets
as they enter the network. An alternative interpretation of such an attribute is to use the
field settings as a mask that defines the maximum QoS profile the user can request,
allowing the user’s application to request particular QoS profiles on a user-determined
basis.
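A Python sketch of these two interpretations, operating on the IP TOS octet, might look like this; the profile values and function names are purely illustrative.

PRECEDENCE_SHIFT = 5          # IP precedence occupies the top 3 bits of the TOS octet

def mark_precedence(tos_octet, profile_precedence):
    """Overwrite the packet's precedence with the value from the user profile."""
    return (tos_octet & 0b00011111) | (profile_precedence << PRECEDENCE_SHIFT)

def clamp_precedence(tos_octet, max_precedence):
    """Alternative interpretation: the profile caps what the user may request."""
    requested = tos_octet >> PRECEDENCE_SHIFT
    allowed = min(requested, max_precedence)
    return (tos_octet & 0b00011111) | (allowed << PRECEDENCE_SHIFT)

# A packet arriving with precedence 7 from a user whose profile allows at most 3:
print(clamp_precedence(0b11100000, 3) >> PRECEDENCE_SHIFT)   # prints 3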
The immediately obvious application of QoS profiles is on ingress traffic to the network.
Of possibly higher importance to QoS is the setting of relative levels of precedence to
egress traffic flows as they are directed to the dial-up circuit. For a dial-connected client
making a number of simultaneous requests to remote servers, the critical common point
of flow-control management of these flows is the output queue of the NAS driving the
dial-access circuit to the user.
A 1500-byte packet on a 28.8 Kbps modem link, for example, makes the link unavailable
for 400 ms. Any other packets from other flows arriving for delivery during this interval
must be queued at the output interface at the NAS. If the remote server is well connected
to the network (well connected in terms of high availability of bandwidth between the
NAS and the server at a level that exceeds the bandwidth of the dial-access circuit), as
the server attempts to open up the data-transfer rate to the user’s client, the bandwidth
rate attempts to exceed the dial-access circuit rate, causing the NAS output queue to
build. Continued pressure on the access bandwidth, brought about by multiple
simultaneous transfers attempting to undertake the same rate increase, leads to queue
saturation and nonselective tail-drop discard from the output queue. If QoS mechanisms
are used by the server and supported by the NAS, this behavior can be modified by the
server; it can set high precedence to effectively saturate the output circuit. However, if
QoS is a negotiated attribute in which the dial-access client also plays an active role, this
server-side effective assertion of control over the output queue still is somewhat
inadequate.
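The 400 ms figure cited above follows directly from the serialization delay of the packet on the access link:

\[
t \;=\; \frac{1500 \times 8\ \text{bits}}{28{,}800\ \text{bits/s}} \;\approx\; 0.417\ \text{s} \;\approx\; 400\ \text{ms}
\]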
An additional mechanism to apply QoS profiles is to allow the service provider to set the
policy profile of the output circuit according to the user’s RADIUS profile. This can allow
the user profile to define a number of filters and associated access-rate settings to be set
on the NAS output queue, allowing the behavior of the output circuit to respond
consistently with the user-associated policy described in the RADIUS profile.
Accordingly, the user’s RADIUS profile can include within the filter set whether or not the
output queue management is sensitive to packets with QoS headings. Equally, the profile
can specify source addresses, source ports, relative queuing priorities, and/or committed
bandwidth rates associated with each filter.
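A minimal sketch of such a profile-driven output queue, in Python, is shown below; the filter fields and priorities are hypothetical and stand in for whatever the RADIUS profile actually carries.

import heapq
from collections import namedtuple

Filter = namedtuple("Filter", "src_addr src_port priority")

# A hypothetical per-user policy downloaded from the RADIUS profile:
# interactive traffic from one server gets priority over everything else.
user_filters = [
    Filter(src_addr="192.0.2.10", src_port=23, priority=0),    # highest
    Filter(src_addr=None,         src_port=None, priority=7),  # default, lowest
]

def classify(packet):
    """Return the priority of the first matching filter."""
    for f in user_filters:
        if f.src_addr in (None, packet["src_addr"]) and \
           f.src_port in (None, packet["src_port"]):
            return f.priority
    return 7

outbound = []            # priority queue feeding the dial circuit
sequence = 0             # tie-breaker to keep FIFO order within a priority

def enqueue(packet):
    global sequence
    heapq.heappush(outbound, (classify(packet), sequence, packet))
    sequence += 1

enqueue({"src_addr": "203.0.113.5", "src_port": 80, "data": b"bulk"})
enqueue({"src_addr": "192.0.2.10", "src_port": 23, "data": b"telnet"})
print(heapq.heappop(outbound)[2]["data"])   # the telnet packet is sent first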
The general feature of dial access in an environment with limited bandwidth at the
access periphery of the network with a high-bandwidth network core is that the control
mechanism for QoS structures focuses on the outbound queue feeding the access line.
Although the interior of the network may exhibit congestion events, such congestion
events are intended to be transitory (with an emphasis on intended here), while the
access line presents to the application flow a constant upper limit on the data-transfer
rate. Accordingly, it is necessary in the dial-access environment to focus QoS-
management structures on providing reliable control over the management of the
dial circuit's outbound queue at the NAS.
These two potential extensions into the dial-access environment are highly capable in
controlling the effective relative quality of service given to various classes of traffic
passing to and from the access client. These extensions also exhibit relatively good
scaling properties, given that the packet rate of dial-access clients is relatively low, and
the subsequent per-access port-processing resources required by the NAS to implement
these structures is relatively modest. This is a good example of the general architectural
principle of pushing QoS traffic-shaping mechanisms to the edge of the network and, in
doing so, avoiding the scaling issues of attempting to impose extensive per-flow QoS
control conditions into the interior high-speed components of the network.
Layer 2 Tunnels
The model assumed in the discussion of this topic so far is that the dial-access service
and the network-access service are bundled into a single environment, and the logical
boundary between the network-access service and the user’s dial-access circuit is within
the NAS device. Within the Internet today, this model has posed a number of scaling
issues related to the provisioning of very large-scale analog or ISDN access services to
the service providers’ locations and has led to dramatic changes in the switching load
imposed on the underlying carrier network in routing a large number of access calls from
local call areas to a single site. The choice is either to significantly alter the
provisioning models used in the traditional PSTN environment, which in turn calls for
rapid expansion of PSTN capacity to service this new load pattern, or to examine more
efficient means to provide dial-access services within a model that is architecturally
closer to the existing PSTN model. Given that the latter approach can offer increased
scaling of the dial-access environment without the need for associated capital investment
in increased PSTN switching capacity, it is no surprise that this is being proposed as a
model to support future ubiquitous dial-access environments.
Such an approach attempts to unbundle the dial-access service and the network-access
service, allowing multiple network-access services to be associated with a single NAS.
The incoming call to one of a set of network-access service call numbers is terminated
on the closest common NAS that supports access to the called number, where the PSTN
uses intelligent PSTN network features to determine what is closest in terms of PSTN
topology and NAS access-port availability. The user-authentication phase then can use
the called number to determine the selected access service, and the NAS’s RADIUS
server can invoke, via a RADIUS proxy call, the RADIUS user-authentication process of
the called network access service. The user profile then can be specified by the remote
network access and downloaded into the local NAS. The desired functionality of this
model is one in which the network-access services can be a mixture of both competing
public Internet dial-access services and Virtual Private Network (VPN) access services,
so that the NAS must support multiple access domains associated with multiple network
access services. This is known as a Virtual Private Dial Network (VPDN).
The implementation of this functionality requires the use of Layer 2 tunneling features to
complete the access call, because the end-to-end logical connection profile being
established in this model is between the client device and the boundary of the network
dial-access service’s infrastructure. The user’s IP platform that initiates the call also must
initiate a Layer 2 peering with a virtual access port provided by the selected network-
access service for the access session to be logically coherent in terms of IP routing
structures. All IP packets from the dial-access client must be passed without alteration to
the remote network access server, and all IP packets from the remote network access
server must be passed to the NAS for delivery across the dial-access circuit to the client.
The local NAS is placed in the role of an intermediary Layer 2 repeater or bridge,
translating from the dial-access Layer 2 frames over the dial circuit to Layer 2 frames
over some form of Layer 2 tunnel. This process is illustrated in Figure 8.7. Such a
tunneling environment is described in the Layer 2 Tunneling Protocol (L2TP) [ID1997a1].
Figure 8.7: Layer 2 tunneling.
As a tunneling protocol, L2TP does not specify any particular transport substrate for the
tunnel, allowing the tunnel to be implemented over Frame Relay PVCs or as IP-
encapsulated tunnels, for example. The IP encapsulation mode of tunnel operation is
discussed here, with particular reference to QoS mechanisms that can be undertaken on
the tunnel itself.
The general nature of the problem is whether the IP transport level should signal per-
packet QoS actions to the underlying IP transport path elements and, if so, how the data-
link transport system can signal its dynamic capability to match the desired QoS to the
actual delivery.
As the environment of distributed dial-access services becomes more prolific, you can
anticipate further refinement of the tunneling model. It may be the case that tunnel
encapsulation would map to an RSVP-managed flow between the NAS and the remote
network access server and, instead of making the tunnel QoS transparent and relying on
best-effort QoS management in the tunnel path, the tunnel characteristics could be
specified at setup time and the clear data frames would be placed into a specific RSVP-
managed flow tunnel depending on the QoS characteristics of the frame. The use of this
type of traffic differentiation for voice, video, and non-real-time data applications is
immediately obvious. Although this mechanism does imply a proliferation of tunnels and
their associated RSVP flows across the network, the attraction of attempting to multiplex
traditional telephony traffic with other data streams is, in industry terms, extraordinarily
overwhelming. The major determinant of capability here is QoS structures.
One approach is to use a reduced maximum packet size on the link to bound the packet-
transmission latency. Such an approach filters back through the network to the server,
via MTU (Maximum Transmission Unit) discovery, and increases the overall network
packet-switching load. An alternative approach has been advocated to introduce
fragmentation within the PPP Multilink protocol using a multiclass extension to the PPP
Multilink protocol [ID1997a2]. This approach allows the transmission of a larger packet
to be fragmented, so that it can be preempted in favor of a flow with defined latency and jitter
characteristics. Extensions to the HDLC (High-Level Data Link Control) framing protocol
also could provide this preemption capability, although with an associated cost of byte-
level scanning and frame-context blocks.
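The arithmetic behind both approaches is straightforward. As a rough illustration, to bound the time a latency-sensitive packet can be blocked behind another frame on a link of rate R to a target t, the maximum fragment (or packet) size is:

\[
L_{\max} \;=\; \frac{R \times t}{8}\ \text{bytes},
\qquad \text{for example}\quad \frac{28{,}800 \times 0.1}{8} \;=\; 360\ \text{bytes}
\]

so a 28.8 Kbps link and a 100 ms blocking target imply fragments of no more than 360 bytes.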
However, there is a real issue with attempting to offer guaranteed access rates across
analog modem connections: The data-bit rate of the modem circuit is variable. First, the
carrier rate is adjusted dynamically by the modems themselves as they perform adaptive
carrier-rate detection throughout a session. Second, the extent of the compressibility of
the data causes the data-bit rate to vary across a constant carrier rate. The outcome of
this observation is that across the dial circuit, it is reasonable to expect, at most, two or
three RSVP-mediated flows to be established in a stable fashion at any time. As a
consequence, it still is a somewhat open issue whether RSVP or outbound traffic queue
management is the most effective means of managing QoS across dial-access circuits.
Like so many other parts of the QoS story on the Internet, the best that can be said today
is “watch this space.”
Chapter 9: QoS and Future Possibilities
Overview
There are, of course, several ongoing avenues of research and development on
technologies that may provide methods to deliver Quality of Service in the future. Current
methods of delivering QoS are wholly at the mercy of the underlying routing system,
whereas QoS Routing (QoSR) technologies offer the possibility of integrating QoS
distinctions into the network layer routing infrastructure. The second category is MPLS
(Multi Protocol Label Switching), which has been an ongoing effort in the IETF (Internet
Engineering Task Force) to develop an integrated Layer 2/Layer 3 forwarding paradigm.
Another category examined is a collection of proposed RSVP extensions that would allow
an integration of RSVP and the routing system. This chapter also examines the potential of
introducing QoS mechanisms into the IP multicast structure. Additionally, you’ll look at the
possibilities proposed by the adoption of congestion pricing structures and the potential
implementation issues of end-to-end QoS within the global Internet. And finally, you’ll look
briefly at the IPv6 protocol, which contains a couple of interesting possibilities of delivering
QoS capabilities in the future.
Tip You can find the IETF QoSR working group charter and related documents at
www.ietf.org/html.charters/qosr-charter.html.
QoSR presents several interesting problems, the least of which is determining whether
the QoS requirements of a flow can be accommodated on a particular link or along a
particular end-to-end path. Some might argue that this specific issue basically has been
reduced to a solved problem with the tools provided by the IETF Integrated Services
architecture and RSVP. However, no integral relationship exists between the routing
system and RSVP, which can be considered a shortcoming in this particular QoS
mechanism.
Because the current routing paradigm is destination-based, you can assume that any
reasonably robust QoSR mechanism must provide routing information based on both
source and destination, flow identification, and some form of flow profile. RSVP side-
steps the destination-based routing problem by making resource reservations in one
direction only. Although an RSVP host may be both a sender and receiver
simultaneously, RSVP senders and receivers are logically discrete entities. Path state is
established and maintained along a path from sender to receiver, and subsequently,
reservation state is established and maintained in the reverse direction from the receiver
to the sender. Once this tedious process is complete, data is transmitted from sender to
receiver. The efficiency concerns surrounding RSVP are compounded by its dependency
on the stability of the routing infrastructure, however.
When end-to-end paths change because of the underlying routing system, the path and
reservation state must be refreshed and reestablished for any given number of flows. The
result can be a horribly inefficient expenditure of time and resources. Despite this cost,
there is much to be said for this simple approach in managing the routing environment in
very large networks. Current routing architectures use a unicast routing paradigm in which
unicast traffic flows in the opposite direction of routing information. To make a path
symmetrical, the unicast routing information flow also must be symmetrical. If destination A
is announced to source B via path P, for example, destination B must be announced to
source A via the same path. Given the proliferation of asymmetric routing policies within the
global Internet, asymmetric paths are a common feature of long-haul paths. Accordingly,
when examining the imposition of a generic routing overlay, which attempts to provide QoS
support, it is perhaps essential for the technology to address the requirement for
asymmetry, where the QoS paths are strictly unidirectional.
Intradomain QoSR
A reasonably robust QoSR mechanism must provide a method to calculate and select
the most appropriate path based on a collection of metrics. These metrics should include
information about the bandwidth resources available along each segment in the available
path, end-to-end delay information, and resource availability and forwarding
characteristics of each node in the end-to-end path.
Resource Expenditure
As the degree of granularity with which a routing decision can be made increases, so
does the cost in performance. Although traditional destination-based routing protocols do
not provide robust methods of selecting the most appropriate paths based on available
path and intermediate node resources, for example, they do impact less on forwarding
performance, impose the least amount of computational overhead, and consume less
memory than more elaborate schemes. As the routing scheme increases in complexity
and granularity, so does the amount of resource expenditure.
The following ordering holds when considering the resources consumed by routing
paradigms of increasing granularity, where A represents the simplest (traditional
destination-based) scheme and D the most granular scheme:
A < B < C < D
To reduce the burden of these computational comparisons, it may be sufficient to consider the result of comparing
two or more primary metrics instead of a computationally intensive comparison of all
available metrics. If recursive computation is required, comparisons of secondary metrics
may then be performed.
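One simple way to picture this two-stage comparison is a lexicographic path comparison, sketched here in Python; the choice of which metrics count as primary and which as secondary is, of course, an assumption.

def better_path(a, b):
    """Compare two candidate paths, primary metrics first.

    Each path is a dict of metrics; secondary metrics are consulted only
    when the primary metrics tie, avoiding a full comparison of everything.
    """
    primary = ("available_bandwidth", "delay")      # assumed primary metrics
    secondary = ("hop_count", "jitter")             # consulted only on a tie

    for metric in primary + secondary:
        if a[metric] != b[metric]:
            # Higher bandwidth is better; lower values win for the others.
            if metric == "available_bandwidth":
                return a if a[metric] > b[metric] else b
            return a if a[metric] < b[metric] else b
    return a   # fully tied

path1 = {"available_bandwidth": 1536, "delay": 40, "hop_count": 5, "jitter": 3}
path2 = {"available_bandwidth": 1536, "delay": 40, "hop_count": 3, "jitter": 9}
print(better_path(path1, path2)["hop_count"])   # 3: decided by a secondary metric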
Don’t forget that the principal objective of QoSR is that, given traffic with a clearly defined
set of QoS parameters, routers must calculate an appropriate path based on available
QoS path metrics without degrading the overall performance of the network. Therefore,
administrative controls should be implemented to reduce, as much as possible, the
amount of traffic for which these types of path computations must be made. Of course, this
leaves room for creative interpretation. Various administrative mechanisms that provide
admission control, prejudicial packet drop with IP precedence, and partitioning of
available bandwidth may be candidates in this regard.
A modified link-state routing protocol would appear to be an ideal candidate for QoSR,
because the changes in the network topology state are quickly detected and
communicated to neighbors in the routing domain. However, given the increased amount of
information that would be contained in LSAs (Link-State Advertisements), the additional
propagation overhead may be inefficient. This particular aspect may be an architectural
design issue, however, and may require the number of peers within a routing domain to
be fewer as a result.
An immediate feedback loop exists between path selection and traffic flows. If a path is
selected because of available resources and traffic then is passed on this path, the
available resource level of the path deteriorates, potentially causing the next iteration of
the QoSR protocol to select an alternate path, which may in turn act as an attractor to the
traffic flows. As established in the ARPAnet many years ago, where the routing protocol
uses measured Round Trip Time (RTT) as the link metric, such routing algorithms with
similar feedback loops from traffic flows to the routing metric often suffer from the inability
to converge to a stable state. The challenge for similar QoSR protocols is to manage this
feedback loop so that some level of workable stability can be achieved.
QoSR should be scaleable, and the side-effect of having to reduce the number of peers
in a routing area or subdomain is somewhat disconcerting. Therefore, any QoSR
approach also should provide a method for hierarchical aggregation, so that scaling
properties are not significantly diminished by constraints imposed in smaller subdomains.
However, introducing aggregation into this equation may produce a problem in
maintaining accuracy in the path state information when senders and receivers reside in
different subdomains.
Depending on the mechanism used for path selection, there are two possibilities for route
processing at a first-hop router (the first router in the traffic path). A first-hop router simply
may forward traffic to the most appropriate next-hop router, as is traditionally done in
hop-by-hop routing schemes, or it may select each and every intermediate node through
which the traffic will traverse in the end-to-end path to its destination, also called explicit
routing.
PNNI (Private Network-to-Network Interface) may offer some useful insights into how to
address this topic, because it provides a form of QoS routing for Virtual Connection (VC)
establishment in ATM networks. Although PNNI does not actually perform routing
calculations as would a more traditional Layer 3 routing protocol such as OSPF (Open
Shortest Path First), QoS parameters (e.g., available node and path
resources, delay, and jitter) certainly are used in its path calculations.
The approach outlined in [ID1997u] assumes that a flow with QoS requirements will
convey these requirements to the QoSR protocol (OSPF, in this case) via parameters
contained in an RSVP Path message; in other words, the Sender TSpec is conveyed to
the routing protocol along with the destination address.
Before you look at this proposal in more detail, you should examine the OSPF Options
field and TOS semantics.
The OSPF Options field enables OSPF routers to support optional capabilities and to
communicate their capability level to other OSPF routers. When used in OSPF Hello
packets, the 8-bit Options field allows a router to reject a neighbor because of a
capability mismatch. The T-bit, depicted in Figure 9.1, describes the router’s TOS
capability. Because you looked at the concept of IP TOS in Chapter 4, “QoS and TCP/IP:
Finding the Common Denominator,” it is not revisited here in depth. If the T-bit is set to 0,
this is an indication that the router supports only a single TOS (TOS 0). This also
indicates that a router is incapable of TOS-routing and is referred to as a TOS-0-only
router. The absence of the T-bit in a router-links advertisement (Type 1 LSA) causes the
router to be skipped when building a nonzero TOS shortest-path topology calculation. In
other words, routers incapable of TOS routing are avoided as much as possible when
forwarding traffic requesting a nonzero TOS.
Tip Avoiding routers that are incapable of TOS routing may not be a very wise
decision. Many network architectures use a very high-speed backbone
network, with the design parameter of using the fastest possible routing
platforms available in order to ensure that the core of the network exhibits no
congestion. Such very high-speed routers may offer no TOS-based switching
capability simply because it adds additional delay to the router and impairs
performance. Hence, routers that support a single TOS may be doing so to
offer a minimal latency service to all traffic classes at very high speed.
All OSPF LSAs, with the exception of Type 2 LSAs (network-link advertisements), specify
metrics. In Type 1 LSAs, the metrics indicate the cost of the links associated with each
router interface. In Type 3, 4, and 5 LSAs, the metric indicates the cost of the end-to-end
path. In each of these LSAs, a separate metric can be specified for each IP TOS. Table
9.1 specifies the encoding of TOS in OSPF LSAs and correlates the OSPF TOS
encoding to the TOS field in the IP packet header (as defined in RFC1349). The OSPF
encoding is expressed as a decimal value, and the IP packet header’s TOS field is
expressed in the binary TOS values used in RFC1349.
OSPF Encoding RFC1349 TOS Values
0 0000 normal service
2 0001 minimize monetary cost
4 0010 maximize reliability
6 0011
8 0100 maximize throughput
10 0101
12 0110
14 0111
16 1000 minimize delay
18 1001
20 1010
22 1011
24 1100
26 1101
28 1110
30 1111
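As the table suggests, the OSPF encoding is simply the RFC1349 TOS value shifted left by one bit (that is, multiplied by two); a short Python check illustrates the correspondence.

# Each RFC1349 TOS value maps to its OSPF encoding by a one-bit left shift.
for tos in (0b0011, 0b0100, 0b1000, 0b1111):
    print(f"TOS {tos:04b} -> OSPF encoding {tos << 1}")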
In router links advertisements (Type 1 LSAs), the T-bit is set in the advertisement’s
Option field if and only if the router is able to calculate a separate set of routes for each
IP TOS. The # TOS field contains the number of different TOS metrics given for this link,
not counting the required metric for TOS 0. If no additional TOS metrics are given, this
field is set to 0. The TOS 0 metric field indicates the cost of using
this particular router link for TOS 0. For each link, separate metrics may be specified for
each Type of Service (TOS); however, the metric for TOS 0 always must be included.
The TOS field indicates the IP TOS this metric refers to; the encoding of TOS in OSPF
LSAs is shown in Table 9.1. The Metric field indicates the cost of using this outbound
router link for traffic of the specified TOS.
You can find a detailed description for the remainder of the fields in Figure 9.2 in
RFC1583 [IETF1993a].
The method outlined in [ID1997u] also proposes the addition of a Q-bit to the OSPF
Options field (also shown in Figure 9.1) to allow routers that support this option to
recognize OSPF Hello packets, OSPF database description packets, and packets
containing information flagged as QoS-capable (e.g., LSAs). Because the OSPF Options
field already is present in each of these packets, it is assumed that the addition of this bit
is not a major architectural modification of the OSPF protocol but an extension that may
selectively be supported by routers.
The Q-bit would be set for all routers and links that support QoS routing. A set Q-bit in an
OSPF packet indicates that the associated network described in the advertisement is
QoS-capable and that additional QoS fields must be processed in the packet. Because
the TOS field in OSPF packets are 8 bits and the existing TOS specification [IETF1992b]
only specifies values of 4 bits, this allows a substantial expansion to the encoding of TOS
values that may be expressed in the OSPF TOS field. Therefore, by using an additional
fifth bit, [ID1997u] proposes an additional 16 TOS values for a TOS field, as depicted in
Table 9.2.
OSPF Encoding Extended TOS Values
32 10000
34 10001
36 10010
38 10011
40 10100 bandwidth
42 10101
44 10110
46 10111
48 11000 delay
50 11001
52 11010
54 11011
56 11100
58 11101
60 11110
62 11111
Notice that the two specified parameters for bandwidth and delay (decimal 40 and 48,
respectively) overlay seamlessly onto the existing TOS values for maximize throughput
and minimize delay (decimal 8 and 16, respectively), thereby avoiding conflicts with
existing values.
The bandwidth and delay parameters in [ID1997u] are expressed in the Metric field.
However, because the Metric field is only 16 bits in length, and gigabit-per-second links
soon will become a reality, a linear expression of the available bandwidth is not practical.
Therefore, [ID1997u] proposes to use a 3-bit exponent and a 13-bit mantissa. The delay
value is encoded in a similar format and is expressed in microseconds.
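A minimal sketch of such an encoding is shown below; the base of the exponent, the bit ordering, and the units are assumptions made here for illustration, since [ID1997u] defines the authoritative format.

def encode_metric(value):
    """Pack a value into a 3-bit exponent and 13-bit mantissa (16 bits total).

    The value is represented approximately as mantissa * 8**exponent,
    trading precision for the range needed by gigabit-scale bandwidths.
    """
    exponent = 0
    while value >= (1 << 13) and exponent < 7:
        value //= 8
        exponent += 1
    return (exponent << 13) | (value & 0x1FFF)

def decode_metric(field):
    exponent = field >> 13
    mantissa = field & 0x1FFF
    return mantissa * (8 ** exponent)

# One gigabit per second, expressed in kilobits, survives the encoding roughly intact.
print(decode_metric(encode_metric(1_000_000)))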
It also is appropriate to define the concept of route pinning, which loosely defined means
that a particular hop-by-hop route is held in place for the flow duration so that changes in
the routing topology or network load do not cause consistent rerouting of the flow. This
proposal uses path pinning instead of route pinning, because what actually is “pinned” is
the path for a given flow—not a route computed by the routing protocol. The proposed
scheme asserts that the pinning is tied to the RSVP soft state and relies on RSVP
message refreshes and time-out mechanisms to “pin” and “unpin” paths selected by the
routing protocol.
Given this set of translation mechanics, it may indeed be possible to translate QoS
requirements from an incoming flow into useful information that can be used to determine
routing paths for various classes of traffic. However, its usefulness and efficiency remain
to be seen.
Interdomain QoSR
Interdomain routing can effectively be described as the exchange of routing information
between two distinct administrative routing domains or autonomous systems (ASs).
Likewise, the basic definition of interdomain QoSR factors in the exchange of routing
information, which includes QoS metrics. The principal concern in interdomain routing of
any sort is stability. This continues to be a crucial issue in the Internet, because stability
is a key factor in scalability. Instability affects the service quality of subscriber traffic.
For this reason, a link-state routing protocol may not be an ideal basis for an interdomain
QoSR protocol. First, LSAs are very useful within a single routing domain to quickly
communicate topology changes to other routers in the routing area. Flooding of LSAs
into another, adjacent routing domain (AS) most likely will inject instability into routing
exchanges between neighboring domains. By the same token, an AS may not want to
advertise details of its interior topology to a neighboring AS.
Also, link-state routing is desirable in intradomain routing because of the speed at which
topology changes are computed and the granularity of information that can be
disseminated quickly to peers within the routing domain. However, the utility of this
mechanism would be greatly diminished by the aggregation of state and flow information,
which normally is done in interdomain routing schemes as the routing information is
passed between ASs. It also would be prudent to assume that limiting the rate of
information exchanged between ASs is a good thing, because interdomain routing
scalability is directly related to the frequency at which information is exchanged between
routing domains. Therefore, a mechanism used to provide interdomain QoSR should
provide only infrequent, periodic updates of QoS routing information instead of frequently
flooding information needlessly into neighboring ASs.
It is unclear at this time what mechanism would provide dynamic interdomain QoSR.
Given the pervasive deployed base of BGP (Border Gateway Protocol), however, it is
not difficult to imagine that a defined set of BGP extensions
could be deployed to provide this functionality. By the same token, it may be
unnecessary to run a dynamic QoS-based routing protocol between different routing
domains if a common method exists to differentiate IP packets based on the values
carried in the TOS or IP precedence fields of the IP packet header.
There is, however, at least one proposal [ID1996gk] that might provide a mechanism with
which to communicate QoS requirements between ASs. The proposal, outlined briefly in
the following section, proposes a mechanism with which BGP might use RSVP to
calculate QoS-capable paths.
Tip You can find the IETF MPLS Working Group home page, along with any
associated documents at www.ietf.org/html.charters/mpls-charter.html.
The terms label switching or label swapping are an attempt to convey an amazingly
simple concept, although you do need to know a bit of the historical context and
background information to understand why the basic routing and forwarding paradigm of
today is somewhat less than ideal. To appreciate the label-swapping concept, you must
examine the current longest-match routing and forwarding paradigm used today. The
current method of routing and forwarding is referred to as longest match because a
router references a routing table of variable-length prefixes and installs the “longest,” or
most specific, prefix as the preference for subsequent forwarding mechanisms. Consider
an example of a router that receives a packet destined for a 199.1.1.1 host address.
Suppose that the router has routing table entries for both 199.1.1.0/24 and 199.1.0.0/16.
Assuming that no administrative controls are in place that would interfere with the basic
behavior of dynamic routing, the router will (based on an algorithmic lookup) choose to
forward the packet on an output interface toward the next-hop router from which it
received an announcement for 199.1.1.0/24, because it is a “longer” and more specific
prefix than 199.1.0.0/16.
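A toy Python illustration of this longest-match selection follows; it uses the standard library's ipaddress module and the two prefixes from the example above.

import ipaddress

routing_table = {
    ipaddress.ip_network("199.1.1.0/24"): "next-hop A",
    ipaddress.ip_network("199.1.0.0/16"): "next-hop B",
}

def longest_match(destination):
    """Return the next hop of the most specific prefix covering the destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [net for net in routing_table if addr in net]
    best = max(candidates, key=lambda net: net.prefixlen)   # "longest" prefix wins
    return routing_table[best]

print(longest_match("199.1.1.1"))   # next-hop A, via the more specific /24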
At first, the significance of this paradigm may not seem important. However, given the
number of prefixes in the global Internet (currently hovering in the neighborhood of about
50,000 variable-length unique prefixes) and the realization that the growth trend most
likely will continue to increase, it is important to note that the amount of time and
computational resources required to make path calculations is directly proportional to the
number of prefixes and possible paths. One of the expected results of label swapping is
to reduce the amount of time and computational resources required to make these
decisions. Given that at any routing point within the network, the number of paths always
is far fewer than the number of address prefixes, the intent here is to reduce the
cumulative switching decision in an n node configuration from a calculation in a space of
order (n times the number of unique prefixes) to a calculation in a space that can be
bounded approximately by order (n!). For closed systems in which the number of unique
end-to-end paths is well constrained (which implies relatively small values of n), this
labeling technique can facilitate a highly efficient forwarding process within the router.
MPLS has jokingly been referred to as Layer 2.5, because it is neither Layer 3 nor Layer
2 but is inserted somewhere in the middle between the network-layer and the link-layer.
The network-layer could be virtually any of the various network protocols in use today,
such as IP, IPX, AppleTalk, and so on—thus, the significance of multiprotocol in the
MPLS technology framework. Because of the huge deployed base in the global Internet
and the growing base of IP networks elsewhere, however, the first and foremost concern
and application for this technology is to accommodate the IP protocol.
Alternatively, MPLS can be implemented natively in ATM switching hardware, where the
labels are substituted for (and situated in) the VP/VC Identifiers.
Conceptually, the labels are distributed in the MPLS network by a dynamic label-
distribution protocol, and the labels are bound to a prefix or set of prefixes. The
association and prefix-to-tag binding are done by an MPLS network edge node—a router
that interfaces other nodes that are not MPLS-capable. The MPLS edge node exchanges
routing information with non-MPLS-capable nodes, locally associates and binds prefixes
learned via Layer 3 routing to MPLS labels, and distributes the labels to MPLS peers
(Figure 9.4).
QoS has several possibilities with MPLS. One of the most straightforward is a direct
mapping of the 3 bits carried in the IP precedence of the incoming IP packet headers to a
Label CoS field, as proposed in Cisco Systems’ contribution to the MPLS standardization
process, Tag Switching. For all intents and purposes, the terms tag and label can be
considered interchangeable. As IP packets enter an MPLS domain, the edge MPLS
router is responsible for—in addition to the functions mentioned earlier—mapping the bit
settings in the IP packet header into the CoS field in the MPLS header, as shown in
Figure 9.5.
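A rough Python sketch of that edge mapping might look like the following. The 32-bit label-entry layout used here (20-bit label, 3-bit CoS, bottom-of-stack bit, 8-bit TTL) follows the commonly described MPLS shim format and is an assumption for illustration; the label value is arbitrary.

def precedence_of(tos_octet):
    """IP precedence is the top three bits of the IPv4 TOS octet."""
    return tos_octet >> 5

def impose_label(tos_octet, label, ttl=255, bottom_of_stack=1):
    """Build a 32-bit label entry: 20-bit label, 3-bit CoS, stack bit, 8-bit TTL.

    The CoS bits are copied directly from the incoming packet's IP precedence,
    which is the straightforward mapping described in the text.
    """
    cos = precedence_of(tos_octet)
    return (label & 0xFFFFF) << 12 | cos << 9 | bottom_of_stack << 8 | ttl

# A packet entering the MPLS domain with precedence 5 is bound to label 77.
print(hex(impose_label(0b10100000, 77)))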
One of the most compelling uses for MPLS is the capability to build explicit label-
switched paths from one end of an MPLS network domain to another. It is envisioned
that label-switched paths could be determined, and perhaps modeled, with a collection of
traffic engineering tools that perhaps reside on a workstation and then downloaded to
network devices. This is an important aspect for traffic-engineering purposes; it gives
network administrators the capability to define explicit paths through an MPLS cloud
based on any arbitrary criteria. Traffic-engineering functions such as this may contribute
significantly to enhanced service quality. As a result, some method to offer differentiated
services may be possible.
Several parallel paths may exist from one end of an MPLS network domain to another,
for example, each of varying bandwidth and utilizations. It certainly is possible that
explicit ingress-to-egress paths could be chosen for each specific CoS type, each path
offering a distinct differentiated characteristic. Traffic labeled with higher CoS values
could be forwarded along a higher-speed, lower-delay path, whereas traffic labeled with
a lower CoS value could be forwarded on a lower-speed, higher-delay path. This
example illustrates one method of providing differentiated service with MPLS; different
approaches certainly exist. In fact, traffic engineering could be performed without
consideration of the CoS designation altogether. It could be based on other criteria, such
as source address, destination address, or per-flow characteristics; or it could be in
response to RSVP messages.
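As a minimal sketch of this idea, an ingress router could hold a static table mapping each
CoS value onto one of several preconfigured label-switched paths (the path names here are
entirely hypothetical):

    # Hypothetical mapping of CoS values to preconfigured explicit paths.
    PATH_BY_COS = {
        0: "lsp-low-speed",  1: "lsp-low-speed",
        2: "lsp-medium",     3: "lsp-medium",
        4: "lsp-high-speed", 5: "lsp-high-speed",
        6: "lsp-high-speed", 7: "lsp-high-speed",
    }

    def select_path(cos):
        # Higher CoS values are steered onto the lower-delay path.
        return PATH_BY_COS.get(cos, "lsp-low-speed")

    print(select_path(5))   # -> lsp-high-speed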
RSVP and QoS Routing Integration
As mentioned several times, there is a noticeable disconnect between RSVP and the
underlying routing system. Without some sort of interface between the two, the utility of a
QoS routing scheme or RSVP alone is less than ideal.
At least one proposal has been submitted to the IETF community that provides a method
for RSVP to request information and services from the local routing protocol process
[ID1996h]. This proposal recommends an RSVP-to-routing interface called RSRR
(Routing Support for Resource Reservations). The proposal describes the role of the
RSRR interface: providing for communication between RSVP and an underlying routing
protocol similar in operation to any other type of API (Application Programming
Interface). This task is accomplished via an exchange of asynchronous queries and
replies, with the RSRR interface as a conduit. Recall that traditionally, RSVP does not
perform its own routing; instead, it relies on the underlying routing system for path
selection. As outlined in this proposal, RSVP could obtain routing entries via the RSRR
interface, which then would allow it to send RSVP control messages (e.g., Path, Resv,
PathTear, ResvTear) hop-by-hop, as well as request notifications of route changes.
Using the RSRR interface, RSVP may obtain routing information by sending a
Route_Query to the routing process; RSVP uses the returned route entries
(Route_Response) to activate the transmission of Path messages. Additionally, RSVP
may ask the routing process to notify it explicitly of routing changes through the use of a
notification flag (Notify_Flag) in the queries, which results in the routing protocol
immediately communicating route changes to RSVP (Route_Change). The result of this
interaction allows RSVP to adapt to changes in paths through the routing system by
sending periodic Path or Resv messages to refresh the path or reservation state for a
flow.
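A schematic rendering of this interface is sketched below. The message names follow the
RSRR proposal's terminology, but the data structures and class layout are assumptions made
only for illustration.

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    @dataclass
    class RouteQuery:
        destination: str
        notify_flag: bool = False     # ask routing to report later route changes

    @dataclass
    class RouteResponse:
        destination: str
        next_hops: List[str]

    class RoutingProcess:
        # Stand-in for the local (QoS) routing protocol process.
        def __init__(self, table):
            self.table = table
            self.listeners = []

        def route_query(self, query: RouteQuery,
                        on_change: Optional[Callable] = None) -> RouteResponse:
            if query.notify_flag and on_change is not None:
                self.listeners.append(on_change)      # future Route_Change callbacks
            return RouteResponse(query.destination,
                                 self.table.get(query.destination, []))

    class RsvpProcess:
        # RSVP uses the returned route entries to drive Path message transmission.
        def __init__(self, routing):
            self.routing = routing

        def send_path(self, destination):
            resp = self.routing.route_query(
                RouteQuery(destination, notify_flag=True), on_change=self.route_change)
            for hop in resp.next_hops:
                print("Path message for %s forwarded to %s" % (destination, hop))

        def route_change(self, destination, next_hops):
            print("Route_Change for %s: refresh Path state via %s" % (destination, next_hops))

    rsvp = RsvpProcess(RoutingProcess({"10.1.1.0/24": ["192.0.2.1"]}))
    rsvp.send_path("10.1.1.0/24")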
This particular draft [ID1997x], along with another proposal [ID1997z], details how RSVP
could set up reservations based on explicit route support. Between these two proposals,
a suggested new object, Explicit_Route, could be added to the RSVP specification,
which contains specific information to support explicit routes. The proposed
Explicit_Route object contains a pointer, source IP address, destination IP address, and
a series of IP addresses that, when processed sequentially, indicate the explicit hop-by-
hop path with which the Path message should be forwarded. Because RSVP objects are
handled as opaque, RSVP transports only the Explicit_Route objects in Path messages.
The routing protocol then uses the information contained in the object to determine the
next hop to which the Path message should be forwarded by using the Route_Query and
Route_Response mechanisms. At each intermediate node, this identification is done by
extracting the Explicit_Route object and examining the pointer, which is updated at each
hop to the next sequential address in the object. This pointer determines what point in
the sequence of addresses is locally relevant and where to forward the message. After
making this determination, the pointer is updated, and the Path message is sent on its
way to the specified next hop.
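The hop-by-hop handling of such an object can be sketched as follows; the object layout is
an assumption based only on the description above.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ExplicitRoute:
        source: str
        destination: str
        hops: List[str]          # explicit hop-by-hop path
        pointer: int = 0         # index of the next locally relevant address

    def forward_path_message(route, local_address):
        # At an intermediate node: examine the pointer, advance it past the
        # local address, and return the next hop for the Path message.
        if route.hops[route.pointer] != local_address:
            raise ValueError("explicit route does not list this node at the pointer")
        route.pointer += 1
        if route.pointer >= len(route.hops):
            return route.destination       # last explicit hop; deliver toward the receiver
        return route.hops[route.pointer]

    er = ExplicitRoute("10.0.0.1", "10.0.9.9", ["10.0.1.1", "10.0.2.1", "10.0.3.1"])
    print(forward_path_message(er, "10.0.1.1"))   # -> 10.0.2.1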
The extended RSVP-to-routing interface mentioned earlier [ID1997x] also proposes that
the RSVP Sender TSpec information be contained in the Route_Query so that the
necessary QoS information be made available to a QoS-based routing protocol and can
be used in determining an appropriate path for a flow.
At this point, it is important to restate the concept of path pinning as it relates to QoS path
management and RSVP. This aspect of QoS path management is tricky. Simply stated,
QoS-based routing computes the path, and RSVP manages Path state via its normal
method of message processing. Paths are pinned during the processing of Path
messages. RSVP issues a Route_Query to the QoS routing process to determine the
appropriate next-hop for which to send the Path message. Paths may become unpinned
under several conditions. Paths may be unpinned when Path state for the flow is
removed, via a PathTear or refresh time-out. Paths also may become unpinned when
TSpec parameters in a Path message change, resulting in the receiver reinitiating a
reservation. Paths may become unpinned when a local admission-control failure is
detected after receipt of a Resv message, when a PathErr message is received, or when
a local link failure is detected. As mentioned earlier, if Route_Querys are issued with the
Notify_Flag set, the QoS routing process notifies RSVP of specific route changes or link
failures.
Given this set of criteria and characteristics, support for path pinning requires some basic
modifications to the way in which RSVP control messages are processed [ID1997y]. In
particular, when RSVP receives the initial Path message, it issues a Route_Query
to the QoS routing protocol, obtains the best path for expected traffic, and then stores the
next hop as part of the Path message with a pinned flag set. After Path refreshes are
received, RSVP checks the Path state for changes in the Previous Hop (PHOP), the IP
Time to Live (TTL), and the status of the pinned flag for the next hop. If there are no
changes in the Path state and the next hop is flagged as pinned, the indicated next hop
is used for forwarding the Path message. If the Path state changes and the next hop is
not flagged as pinned, RSVP again issues a Route_Query to determine the best path
and again sets the pinned flag for the returned next hop within the subsequent Path
refresh. Similarly, when a Route_Change is received by RSVP, it immediately unpins the
next hop.
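The pinning behavior described above can be condensed into a short sketch; the field and
function names are illustrative only.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PathState:
        phop: str                        # previous hop recorded from the Path message
        next_hop: Optional[str] = None
        pinned: bool = False

    def process_path_message(state, phop, query_route):
        # On the initial Path message, or whenever the next hop is not pinned
        # (or the Path state has changed), consult QoS routing and pin the result.
        if state is None or not state.pinned or state.phop != phop:
            return PathState(phop=phop, next_hop=query_route(), pinned=True)
        return state                     # pinned and unchanged: reuse the stored next hop

    def process_route_change(state):
        # A Route_Change, PathTear, refresh time-out, or admission failure unpins
        # the next hop; the next Path refresh will re-query the routing process.
        state.pinned = False
        return state

    st = process_path_message(None, phop="192.0.2.1", query_route=lambda: "192.0.2.7")
    st = process_path_message(st, phop="192.0.2.1", query_route=lambda: "unused")
    print(st.next_hop, st.pinned)        # -> 192.0.2.7 True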
RSVP and QoS Routing Observations
Given the importance of QoS path management, it is clear that relying on the dynamics
of an underlying routing protocol (including a QoS-based protocol) can be detrimental if
left to its own devices. The dynamics of the link-state QoS-based routing can interfere
with RSVP state and flow establishment when the network topology changes or when
network utilization changes. Therefore, support for explicit routes and path pinning
can be considered not merely desirable but mandatory. Even with this support, the
interaction between traffic that is so constrained and the flow levels of best-effort routed
traffic must be considered. The inescapable conclusion is that path-constrained flows
should be associated with some form of router-by-router resource-reservation scheme.
Remember that the concepts and proposals discussed here are, for the most part, simply
that—conceptual design plans and proposed mechanisms to enhance the delivery of QoS
in the network. It therefore is unclear at this point whether these mechanisms will be adopted
en masse, adopted in part, or not adopted at all. It is encouraging, however, that the
Internet community is beginning to discuss these issues, because a large portion of
that community feels that mechanisms similar to the ones described here are
necessary to allow for a more granular, controlled level of QoS delivery.
QoS and IP Multicast
The discussion of IP multicast is placed in this section on “futures” for a good reason.
Currently, the capability to impose any form of QoS structure on multicast traffic flows is
not well understood. Fred Baker, the current chair of the IETF, was heard to say of IP
multicast, “Multicast makes any problem harder. If you think you understand a problem,
repeat the problem statement using the word ‘multicast.’” Certainly this observation
appears to be well applied to QoS.
The multicast model is one of group communication, where any member of the group can
initiate data that is delivered to all other members of the group. To achieve this, the wide-
area network-transmission mode is one of controlled flooding, in which a single packet
may be replicated and forwarded onto multiple output interfaces simultaneously. Several
routing protocols are designed to facilitate this controlled flooding, including DVMRP (the
Distance Vector Multicast Routing Protocol), sparse and dense mode Protocol-
Independent Multicast (PIM), and most recently, a proposal for interdomain multicast
routing with extensions to the BGP (Border Gateway Protocol). This model of multiple
asynchronous receivers obviously makes any form of end-to-end flow control very
challenging, so QoS multicast flow management must be undertaken by the network as
an imposed profile on the multicast traffic flow.
Because multicast does not follow an adaptive end-to-end flow control model, selective
discard control structures, such as RED (Random Early Detection), have no significant
impact on multicast applications. To date, the major control facility is admission control,
using a leaky-bucket rate-control structure to impose a rate limitation on a multicast path
within a network. Other QoS control mechanisms that can be triggered by the TOS and
precedence fields of the packet header could be deployed by the router. Note that any
precedence queuing must be applied to all output interfaces uniformly if the QoS
structure is to be applied consistently to the entire multicast group.
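As a rough sketch of such an imposed rate limit, the following token-bucket variant of the
leaky-bucket idea admits or discards packets on a multicast path; the rate and burst values
are hypothetical.

    import time

    class LeakyBucket:
        # Impose a rate limitation on a multicast path: packets arriving faster
        # than the configured rate (plus a small burst allowance) are discarded.
        def __init__(self, rate_bps, burst_bytes):
            self.rate = rate_bps / 8.0          # bytes per second of refill
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def admit(self, packet_len):
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if packet_len <= self.tokens:
                self.tokens -= packet_len
                return True                     # forward (and replicate) the packet
            return False                        # rate exceeded: discard

    bucket = LeakyBucket(rate_bps=128_000, burst_bytes=4_000)
    print(bucket.admit(1500))                   # True while within the burst allowance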
As you can see, at this stage, the IP multicast QoS subject is largely speculative. A
significant body of further work must be undertaken to provide a well-structured platform
to support QoS traffic profiles.
The basis of QoS as a congestion-avoidance mechanism is that you can provide
permanent QoS measures undertaken by the network on behalf of the customer or
implement QoS structures that can be activated dynamically by traffic passed to the
network by the customer.
You can compare the first structure—a network-activated QoS mechanism activated on
behalf of the customer—to a form of performance insurance. Traffic that matches the
QoS criteria is marked at the network ingress device as being of an elevated
precedence. All such traffic would attract a premium tariff, so this could be considered
a premium-priced service. In an unloaded network, this marking triggers
negligible action by the network, given that the elevated precedence does not displace
any other traffic within the queues, which are located on the customer’s end-to-end paths
within the network.
The consequent observation is that within such conditions of an unloaded network, the
difference between the throughput of traffic flows that are marked with an elevated
precedence and traffic flows that can be considered normal is negligible. In the event of
congestion along a higher-precedence traffic path, notable comparative differences in
throughput levels will result. Of course, congestion is a localized phenomenon, because
each router is not synchronized to the flow state of its neighbors. Because the queue
structure on any given router is not linked to the queue structure of a neighboring router,
the differences activated by the entry-level elevated precedence are readily
distinguishable only on long-lived flows that traverse identical paths and similarly pass
across the same congestion point.
The outcome of this entry function is that the difference between traffic that matches the
entry conditions of elevated precedence and normal best-effort traffic can be
characterized not by the presence of positive metrics related to flow performance but the
selective absence of negative metrics. This is an interesting result: The less the
incidence of congestion within the underlying network, the less visible the value of this
QoS precedence elevation is to the user. This leads to the observation that
this style of QoS setting is a “just in case” option; the eventuality being insured against
with the permanent QoS setting is small-scale isolated burst congestion events or a more
systemic resource constraint within the network, which is activated by peak load patterns.
It is still unclear how this somewhat negative image of network-activated QoS can be
marketed selectively to a customer base that already is paying for normal best-effort
service. Although some segments of the customer base may be prepared to pay a
premium price for such an elevated service structure, bear in mind that the service is
described accurately as better effort, whereas the customer is more motivated to see a
service performance guarantee in exchange for this form of price premium.
If the network undertakes per-packet accounting based on packet precedence and packet
length, a congestion-based pricing structure can be
implemented. If the customer observes some form of performance degradation, the
option is available to increase the precedence level of transmitted packets to avoid
continued degradation, which is associated with an incremental cost. A simple
congestion-based premium pricing structure would use a single elevated precedence,
whereas more complex models can be constructed with a sequence of precedence
levels and an associated sequence of pricing levels.
A number of potential structures can assist the customer in making an informed decision
as to whether the option of precedence setting is a viable action. One is the use of a
precedence-discovery protocol, which is valuable when precedence structures may not
be deployed uniformly across the full extent of the network’s (or networks’) path. This is
possible by using a variant of the MTU discovery protocol, where ICMP (Internet Control
Message Protocol) packets are used to probe the network path; the required change
is the return of an ICMP error packet indicating "precedence not honored" if a router on the
path will not (or cannot) accommodate the desired precedence setting. Another approach
is perhaps more ambitious and implements more closely the congestion pricing model of
MacKie-Mason and Varian. Here, the packet accounting is undertaken on egress from
the network, but the usage is accounted against the packet's source address. The
additional requirement is that a single bit field of the header is reserved for precedence
activation (which could be 1 bit of the IP precedence field). The bit is cleared on ingress
to the network and is set by any router that exercises the precedence structure; this
usurps resources that would have been allocated to another packet. Egress accounting
is performed if the bit is set, which indicates that the precedence function has been
exercised within the packet’s transit within the network.
The result of this more sophisticated network structure is that the customer can indicate a
willingness to pay a congestion premium on entry to the network, and the network will
complete the pricing transaction only if the network had to exercise the precedence
because of congestion in the packet’s path across the network. The complete congestion
pricing structure can be implemented with multiple precedence levels.
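A schematic sketch of this egress-accounting model follows; the premiums, field names, and
packet representation are illustrative assumptions, not part of any published scheme.

    # Hypothetical per-byte premiums for each elevated precedence level.
    PREMIUM_PER_BYTE = {0: 0.0, 1: 0.0001, 2: 0.0002, 3: 0.0004}

    def ingress(packet):
        # On entry to the network the "precedence exercised" bit is cleared.
        packet["precedence_exercised"] = False
        return packet

    def interior_router(packet, congested):
        # Any router that exercises the precedence structure (displacing other
        # traffic because of congestion) sets the bit.
        if congested and packet["precedence"] > 0:
            packet["precedence_exercised"] = True
        return packet

    def egress_accounting(packet):
        # Charge, against the source address, only if the precedence was
        # actually exercised somewhere along the packet's path.
        if not packet["precedence_exercised"]:
            return 0.0
        return PREMIUM_PER_BYTE[packet["precedence"]] * packet["length"]

    pkt = ingress({"src": "10.0.0.1", "precedence": 2, "length": 1500})
    pkt = interior_router(pkt, congested=True)
    print(egress_accounting(pkt))   # -> 0.3, charged only because congestion occurred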
Interprovider Structures and End-to-End QoS
It would appear that QoS is going to be deployed in two fundamentally different
environments: the private network domain, where QoS is tied to particular performance
objectives for certain applications related fundamentally to business needs and drivers,
and the public Internet, where QoS is tied to competitive service offerings.
The public Internet deployment is particularly challenging. For QoS to be truly effective,
the mechanisms used across a multiprovider transit path require all transit elements to
accept the originating QoS signaling. The minimum condition for QoS to work is that the
QoS signals preserve the semantics of the originating customer, so that the QoS field
value in the customer’s packet is interpreted consistently as a request, for example, for
precedence elevation. The situation that should be avoided is one in which the
neighboring provider in the traffic path accepts the packet’s field value without
modification but places an entirely different semantic interpretation on the value. As an
example, provider A might interpret a field value as a
request for elevated queuing and scheduling priority, while the next provider (B)
interprets the same value as a discard eligibility flag. Accordingly, the first requirement for
interdomain QoS deployment is an operational consensus of the QoS signaling
semantics within the deployed global Internet.
It is a realistic prediction that QoS will not be deployed uniformly on the public Internet,
and accordingly, there will be situations in which QoS signaling will be ignored within an
end-to-end network path. In stateful mechanisms, this is addressed explicitly within the
implementation of path setup and maintenance, while in a stateless QoS environment,
where QoS signaling is undertaken by use of the packet header field values, this is an
open issue. Should an ingress network node that does not implement QoS services
respond to all QoS-labeled packets with ICMP destination unreachable messages? This
interprets the QoS fields as a mandatory condition, where if a network cannot honor the
request for elevated or distinguished service, the network must signal the originator of
this inability. Alternatively, the ingress node may choose to accept the packet, effectively
ignoring the implicit QoS request contained in the header. There are arguments for both
types of responses, although the prevailing philosophy of liberal acceptance tends to
favor the second response as the more appropriate here.
Another situation in which QoS signaling may or may not be honored in the end-to-end
network path is when the neighboring peer network implements QoS structures and
conforms to the semantics of the originating network. If the packet is accepted by the
peer network, it transits the network with elevated priority and is passed on with the QoS
fields intact. Although this is technically feasible, it is readily apparent that the major
issues here relate to the capability of the commercial agreement between the two
providers to accept and honor QoS settings from peer networks. If the commercial
agreement is not sufficiently robust to accommodate QoS interaction, the
provider has little choice but to clear the QoS header values to ensure that these are not
activated within the local transit and that the end-to-end QoS signaling mechanism is
cleared at an intermediate position within the transit path.
Current indications are that local QoS structures will indeed appear within the local
Internet with some limited pricing premium and some limited applicability to transit paths
within each participating ISP’s customer domain, so that end-to-end QoS will be
achievable, but only when both ends are attached to the same provider. This initial
deployment no doubt will use ingress filtering so that interprovider QoS structures will not
be a default feature of the initial deployment models. It therefore is likely that you will see
some refinement of the existing interprovider commercial agreements to admit bilateral
and ultimately transit QoS mechanisms, where the initial QoS agreement between the
customer and the local ISP can be reflected by these interprovider agreements to extend
the reachability of the QoS domain from the customer's perspective. Whether this
“bottom-up” method of QoS deployment will yield truly uniform end-to-end QoS
deployment in the global Internet remains very much a subject of speculation.
Interestingly enough, however, the nature of this speculation is more about the
robustness of the commercial models of interaction, the associated issues of ISP
business practices, and the policies of interprovider agreements than speculation about
the capabilities of QoS technology.
Although the telecommunications world has been able to craft an environment that deploys
a relatively uniform quality level across a multiprovider global network, one result is a
relatively consistent view of end-customer retail pricing models. Whether similar pressure
will produce consistent interprovider commercial agreements that encompass meaningful
QoS levels, rather than highly variable interpretations of what "better than best effort"
quantitatively translates to, and whether this in turn will result in increasing uniformity of
the retail pricing structure for the global Internet, remains to be seen.
Should QoS Be Bidirectional?
In much of the discussion about QoS mechanisms, it has been implied that QoS is a
unidirectional mechanism and that it applies to transmission flows that are injected into
the network. RSVP goes one step further and makes this “unidirectionality” explicit. The
question is whether this is adequate as a QoS mechanism, whether it will yield tangible
results in terms of elevated service, or whether bidirectionality is a necessary component
of the QoS picture. Of course, this discussion is predicated on the observation that it is
relevant only to end-to-end controlled traffic flows that are clocked by the network and
is most readily applicable to TCP traffic. Externally clocked, unidirectional UDP flows
explicitly fall outside consideration within this topic.
The intent of elevating the precedence of packets within a QoS structure is directed
toward removing jitter and loss the network may impose on a traffic flow. The intent is to
allow an end-to-end flow-control algorithm to develop an accurate and consistent view of
the network-propagation delay between the sender and the receiver. If this is obtained by
the flow, the optimal operating condition of the network-clocked flow control can be
achieved, and as the receiver peels data packets off the network, the sender pushes
another packet into the network. This is achieved by timing the sender’s data-
transmission events against the receipt of acknowledgment (ACK) packets.
In this optimal steady state, each ACK received by the sender advertises the availability of
receive window space, which then allows the sender to push another packet into
the network. Of course, the initial state is that the sender is unaware of the RTT delay
and the maximum available capacity of the path to the receiver, so the TCP flow-control
algorithm uses the slow-start mechanism to determine these parameters. Starting with
one packet, the number of packets the sender pushes into the network doubles with
every RTT, using the pacing of received ACKs to send two data packets for each
received ACK. This algorithm continues until either the buffer space of the sender is
exhausted (so that the flow is running at the maximum pace the sender can withstand) or
the network signals that maximum available capacity for the flow has been achieved. The
latter signaling is done by reception of an ACK, which indicates that the network has lost
data. Within the network, the condition that created this signal is caused by the output
queue of an intermediate node reaching a critical capacity state (saturation), and the data
packet being discarded. The packet discard is caused as a byproduct of queue
exhaustion, or possibly through the actions of a RED mechanism active on the router in
question. Thereafter, the sender moves into a state of congestion avoidance, where the
amount of capacity in the path is probed more gently by attempting to increase the
number of packets in flight within the network by one segment per RTT interval. Again,
packet loss signaled within the reverse ACK stream causes the sender to back off the
rate to the previous start point of the congestion-avoidance behavior.
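The window dynamics sketched in the preceding paragraphs can be approximated in a few lines;
this is a deliberately simplified model of the sender's congestion window, measured in
segments, and not a faithful TCP implementation.

    def next_cwnd(cwnd, ssthresh, loss_detected):
        # One RTT of simplified behavior: exponential growth in slow start,
        # one extra segment per RTT in congestion avoidance, and a halved
        # threshold with a restart from one segment when loss is signaled.
        if loss_detected:
            return 1.0, max(cwnd / 2.0, 2.0)
        if cwnd < ssthresh:
            return cwnd * 2.0, ssthresh          # slow start: double per RTT
        return cwnd + 1.0, ssthresh              # congestion avoidance: +1 per RTT

    cwnd, ssthresh = 1.0, 16.0
    for rtt in range(8):
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, loss_detected=(rtt == 5))
        print("RTT %d: cwnd=%.0f ssthresh=%.1f" % (rtt, cwnd, ssthresh))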
In the face of continued loss, the sender backs off further and attempts to reestablish the
new flow rate by again using slow-start probing. Elevating the priority of the sending
packets within this flow causes the queue on the intermediate node that is experiencing
congestion (or selective packet drop) to delay the advent of a packet-discard event for this
flow, because the elevated precedence is a signal to the router to attempt to discard
packets of some other nonelevated priority flow in preference. This has the effect of
allowing the elevated priority flow to take a greater proportion of the bandwidth resources
at the intermediate node in question, because as the data packets of the flow continue to
be transmitted, the intent of the sender is to continue to increase the transmission rate,
either aggressively (if it is in slow-start mode) or more temperately (as in congestion-
avoidance mode).
The ACK packets may not necessarily take a symmetrical path back from the receiver to
the sender. Taking into account the explicit or implicit assumption that QoS structures
are imposed unidirectionally on packet flows from a sender to an ingress intermediate
node, the ACKs of the flow may suffer a greater level of jitter and loss than the data
packets. Will this affect the flow rate of the sender, and will this affect the steady-state
objective of the flow-control mechanism to clock the sender’s rate at the level of
maximum availability of the sender’s data path?
The most immediate effect of ACK loss is to cause the data-transfer rate to slow down by
one segment per RTT for each lost ACK, which has an impact on the cumulative
flow throughput rate. The more subtle effect is caused by jitter imposed on the ACK
sequence in slow-start mode. Because ACK packets drive the sender’s data-
transmission clock, jitter in the ACK sequence creates consequent jitter in the sender’s
behavior, which in turn increases the bursting nature of the sender. In slow-start mode,
this jitter can be problematic, because the induced sender jitter can cause an overload of
the interior queuing structure. The steady state of slow-start mode (if steady state can be
used to describe exponential growth in a flow rate) already imposes load on the
network’s internal queuing structures by sending two data segments in the same time
frame where a single data segment was delivered in the previous RTT interval. This
sending overload rate is smoothed by interior queues on the critical intermediate-node
hops. ACK jitter increases this burst level, which consequently increases the burst level
of queuing on the interior queues. The risk here is that this induced ACK jitter will cause
packet discard, which in turn causes premature signaling of rate saturation, switching the
sender from slow-start mode to a more conservative level of rate growth via congestion-
avoidance management and, in the worst case, may cause a reinitiated slow-start with
an initial single segment per RTT.
Admittedly, this is a second-order result, and the effects of non-QoS reverse flow ACKs on
a QoS data flow can be quite subtle. From a technical perspective, TCP sessions should
reflect the precedence level of the incoming data stream in the reverse ACK flow to achieve
the best possible overall flow rate. When only a subset of incoming data packets in the flow
has the precedence field set, the correct behavior of the receiver is less obvious and is
probably a productive subject for further investigation. However, this must be placed into a
context of an overall QoS structure which, in the public Internet environment, probably will
include a premium pricing element for generating precedence-level packets. Here, the
receiver exercises an independent decision about whether to exercise precedence levels in
response to the sender, and the consequent possibility of achieving less-than-optimal flow
rates for this class of traffic must be recognized. Overriding this is the consideration that
such behavior (reflecting precedence parameters in ACK packets) is a function of the end-
system's TCP stack, so achieving it requires changes to the stack's default behavior with
respect to precedence.
QoS and IP Version 6
Also known as IPng or IP Next Generation, IPv6 provides a number of enhancements to
the current IPv4 protocol specification. However, for all of the reasons used to rationalize
an industry-wide migration to IPv6 (IP Version 6), integrated QoS is not one of the most
compelling arguments. It is a common misperception that the IPv6 protocol specification
[IETF1995c] somehow includes a magic knob that provides QoS support. Although the
IPv6 protocol structure is significantly different from its predecessor (IPv4), the basic
functional operation is still quite similar.
Two significant components of the IPv6 protocol may, in fact, provide a method to help
deliver differentiated Classes of Service (CoS). The first component is a 4-bit Priority field
in the IPv6 header, which is functionally equivalent to the IP precedence bits in the IPv4
protocol specification, with somewhat of an expanded scope. The Priority field can be
used in a similar fashion as described earlier when IPv4 was discussed, in an effort to
identify and discriminate traffic types based on contents of this field. The second
component is a Flow Label, which was added to enable the labeling of packets that
belong to particular traffic flows for which the sender might request special handling,
such as nondefault QoS or real-time traffic. Figure 9.6 shows these IPv6 header fields.
Figure 9.6: The priority and flow label in the IPv6 header.
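A small sketch of how these two fields sit in the first 32 bits of the RFC1883 header
(4-bit Version, 4-bit Priority, 24-bit Flow Label) follows; the helper names are illustrative.

    def ipv6_first_word(priority, flow_label):
        # Pack the first 32 bits of an RFC1883 IPv6 header:
        # 4-bit Version (6), 4-bit Priority, 24-bit Flow Label.
        assert 0 <= priority <= 0xF and 0 <= flow_label <= 0xFFFFFF
        return (6 << 28) | (priority << 24) | flow_label

    def unpack_first_word(word):
        return (word >> 28) & 0xF, (word >> 24) & 0xF, word & 0xFFFFFF

    word = ipv6_first_word(priority=7, flow_label=0x00ABCD)
    print(hex(word), unpack_first_word(word))   # -> 0x6700abcd (6, 7, 43981)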
Tip You can find more information on IPv6 by visiting the IETF IPng (IP—Next
Generation) Working Group Web page at www.ietf.org/html.charters/ipngwg-
charter.html as well as the IPv6 Web page hosted by Sun Microsystems,
located at playground.sun.com/pub/ipng/html/ipng-main.html You can find
interesting information related to the experimental deployment of IPv6 on the
6Bone Web site at www.6bone.net.
Figure 9.7: The IPv6 Priority field.
Value   Category
0       Uncharacterized traffic
1       "Filler" traffic (e.g., netnews)
2       Unattended data transfer (e.g., e-mail)
3       Reserved
4       Attended bulk transfer (e.g., FTP, NFS)
5       Reserved
6       Interactive traffic (e.g., Telnet, X)
7       Internet control traffic (e.g., routing protocols, SNMP)
For traffic that does not provide congestion control, the lowest Priority value, 8, should be
used for packets the sender is most willing to discard in the face of congestion. Likewise,
the highest Priority value, 15, should be used for packets the sender is least willing to
discard.
It is not clear what immediate benefit the separation of traffic types that do or do not
respond to congestion control (TCP versus non-TCP) provides in this priority scheme.
However, it may very well prove beneficial as more advanced packet-drop algorithms are
developed.
Of course, technology often changes in midstream, and IPv6 certainly is no exception. A
set of proposed modifications to the base IPv6 protocol specification has been proposed
[ID1997a3] that includes changing the semantics of the IPv6 Priority field as originally set
forth in RFC1883. This and other modifications were submitted to the IETF IPng Working
Group in July 1997 and initially discussed at the IETF working group meetings in Munich,
Germany, in mid-August 1997. Currently, it is unclear whether these semantics will be
adopted, but it is worthwhile to outline them here, because there is a very good chance
they will be advanced as standards track modifications to the base specification.
As illustrated in Figure 9.8, the name of the 4-bit field previously known in RFC1883 as
the Priority field has been changed to the Class field. The placement of this field has not
been changed in the IPv6 header, but the internal semantics have been changed
significantly. The Class field has been modified to include two subfields: the Delay
Sensitivity Flag subfield (or the D-bit) and the Priority subfield. The D-bit is a single bit
that identifies packets that are delay sensitive and may require special handling by
intermediate routers. This special handling might include special forwarding mechanics
(for example, forwarding packets with the D-bit set along a lower-latency path), whereas
packets without this bit set are forwarded along another, higher-latency path.
The 3-bit Priority subfield is similar in context and function to the IP precedence bit field
in the IPv4 header and indicates relative priority of the packet. The higher the value in the
field, the higher the priority. These modifications were made, in part, to facilitate a
cleaner mapping of the values contained in an IPv4 IP Precedence bit field into the IPv6
Priority field in an effort to simplify translation and transition schemes.
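Under these proposed (and, as noted, not yet adopted) semantics, the mapping might be
sketched as follows; the placement of the D-bit as the high-order bit of the Class field is
an assumption made only for illustration.

    def ipv4_tos_to_ipv6_class(tos_octet, delay_sensitive):
        # Copy the 3 IPv4 precedence bits into the 3-bit Priority subfield of
        # the proposed 4-bit IPv6 Class field; the D-bit marks delay sensitivity.
        precedence = (tos_octet >> 5) & 0x7
        d_bit = 1 if delay_sensitive else 0
        return (d_bit << 3) | precedence

    print(bin(ipv4_tos_to_ipv6_class(0xA0, delay_sensitive=True)))   # -> 0b1101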
In a manner similar to that described earlier with regard to the IPv4 IP precedence field,
the IPv6 Priority field setting can be used to identify and discriminate traffic based on
these values as packets travel through the network. The basis of this CoS model is that
during times of congestion, a prejudicial method of dropping packets is used, based on
the content of the Priority field. The lower the value of the contents of the Priority field,
the higher the chances are that the packets will be dropped. The higher the values in the
Priority field, the less susceptible the packets are to being dropped. Of course, if there is
no congestion, all packets, regardless of their Priority designation, are happily delivered
with equity. An enhanced congestion-avoidance mechanism, such as the eRED
(Enhanced RED) algorithm described earlier [FKSS1997], could be used on a hop-by-
hop basis through the network to provide the prejudicial method of packet drop.
There are clearly two benefits to using a packet-drop strategy such as eRED. The first
benefit is that the basic underlying RED [Floyd1993] functionality provides a method to
avoid global synchronization, where several hundred (or possibly even several hundred
thousand) TCP sessions experience packet loss at roughly the same time, react by
backing off at roughly the same time, and then begin to ramp up at roughly the same
time because of buffer overflow at some point in the network. This congestion-avoidance
mechanism is beneficial in a way that is more global in nature, providing a method to
ensure the overall health of the network.
The second benefit is a byproduct of the first and is one of the central themes in
providing for differentiated CoS. The eRED traffic-drop strategy provides for the
prejudicial distinction of packets to be dropped in times of congestion.
A second mechanism provides a more granular level of control. This mechanism
entails implementing a token bucket, or perhaps multiple token buckets, on a router
interface with predefined bit-rate thresholds, coupled with the source and/or destination
criteria mentioned earlier. When traffic matching the defined criteria exceeds the
configured bit-rate threshold, the Priority value of nonconforming packets can be set to a
lower, predefined value. Packets from flows that remain below the bit-rate threshold are
set with a higher predefined Priority value.
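A short sketch of this remarking behavior follows; the threshold, burst, and Priority values
are hypothetical.

    import time

    class RemarkingTokenBucket:
        # Traffic matching the configured criteria that exceeds the bit-rate
        # threshold has its Priority value reduced rather than being dropped.
        def __init__(self, rate_bps, burst_bytes, conforming, nonconforming):
            self.rate = rate_bps / 8.0
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()
            self.hi = conforming
            self.lo = nonconforming

        def mark(self, packet_len):
            # Return the Priority value to write into the packet.
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if packet_len <= self.tokens:
                self.tokens -= packet_len
                return self.hi          # flow remains below the threshold
            return self.lo              # nonconforming: remark to the lower value

    bucket = RemarkingTokenBucket(1_000_000, 10_000, conforming=6, nonconforming=2)
    print(bucket.mark(1500))            # -> 6 while the flow stays within profile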
Using either of these mechanisms provides the necessary control to contribute predictive
network behavior while allowing a method to discriminate traffic, effectively delivering
differentiated classes of service.
The Flow Label in the IPv6 header is a 24-bit value designed to uniquely identify any
traffic flow so that intermediate nodes can use this label to identify flows for special
handling—for example, with a resource-reservation protocol, possibly RSVPv6 or a
similar protocol. The Flow Label is assigned to a flow by the flow’s source node or the
flow originator. RFC1883 requires that hosts or routers that do not support the functions
of the Flow Label field set the field to 0 when originating a packet, pass the
field on unchanged when forwarding a packet, and ignore the field when receiving a
packet. It also specifies that flow labels must be chosen randomly and uniformly, ranging
from hexadecimal 0x000001 to 0xffffff, as depicted in Figure 9.9. Additionally, all packets
belonging to the same flow must be sent with the same source address, destination
address, Priority value, and Flow Label.
Currently, aside from the suggestion mentioned in RFC1883, there are still no clearly
articulated methods for using the Flow Label in the IPv6 header, apart from using it with a
resource-reservation protocol such as RSVP. However, at least one recommendation
has been published that provides a generalized proposal for using the Flow Label in the
IPv6 header. RFC1809 [IETF1995d] suggests that Flow Labels may be used on
individual sessions in which the flow originator may not have specified a Flow Label, and
an upstream service provider may want to impose a Flow Label on specific flows as they
enter their network, in an effort to differentiate these flows from others, possibly for
preferential treatment. Currently, it is unclear what intrinsic value the Flow Label actually
provides, because recently devised methods are available to build flow state in routers
without the need for a special label in each packet.
In a similar vein, the IPv6 Priority field does not offer any substantial improvement over
the utility of the IP precedence field in the IPv4 header. Granted, the IPv6 Priority field
offers eight more levels of distinction (16 possible values) than the IPv4 IP
precedence field (8 possible values). However, this does not prove to be of considerable
benefit. Conventional thinking suggests that because no one has yet commercially (or
otherwise) implemented even a simple two-level class of distinction for differentiating
service classes, increasing the possible number of service classes is not a compelling
reason to prefer IPv6 over IPv4—at least not for the foreseeable future. A rough
consensus of polled service providers indicates that if they deployed a service to provide
differentiated CoS, they would not be interested in offering any more than three levels for
matters of configuration simplicity and management complexity.
The IPv6 Flow Label, on the other hand, may prove to be quite useful.
However, the only immediately recognizable use for the Flow Label is in conjunction with
a resource-reservation protocol, such as RSVP for IPv6. The Flow Label possibly could
be used to associate a particular flow with a specific reservation. The presence of a Flow
Label in data packets also may help in expediting traffic through intermediate nodes that
have previously established path and reservation states for a particular set of flows.
Aside from using the IPv6 Flow Label with RSVP, its benefit is not immediately
determinable.
All in all, IPv6 does not offer any substantial QoS benefits above and beyond what
already is achievable with IPv4.
Chapter 10: QoS: Final Thoughts and
Observations
Overview
Quality of Service continues to remain a largely desirable property, a far-reaching
requirement, and an ever-increasing topic of discussion with respect to the Internet
today.
So far, QoS has been viewed as a wide-ranging solution set against a very broad
problem area. This fact often can be considered a liability. Ongoing efforts to provide
“perfect” solutions have illustrated that attempts to solve all possible problems result in
technologies that are far too complex, have poor scaling properties, or simply do not
integrate well into the diversity of the Internet. By the same token, and by close
examination of the issues and technologies available, some very clever mechanisms are
revealed under close scrutiny. Determining the usefulness of these mechanisms is
perhaps the most challenging aspect in assessing the merit of a particular QoS
approach. So, as we make these observations, we’ll try to assess their practical utility as
well.
One basic observation will be revisited here before we can begin to draw conclusions on
how to practically implement a QoS methodology, and several criteria need to be
quantified before launching into a dissertation on which one might be appropriate and in
what environment. One conclusion is that when a network is under severe stress, the
network operator can increase the available bandwidth (and scale up a simple switching
environment to match the additional bandwidth) or leave the base capacity levels
constant and attempt to ration access to the bandwidth according to some predetermined
policy framework.
A very supportable second conclusion is the fact that bandwidth is not unlimited in every
location where there is a need for data. Within this framework, the need to differentiate
services becomes a technical argument, not necessarily one of introducing new revenue-
generating products and services. Although the latter certainly may be a byproduct of the
former, unfortunately, new business services rarely are driven by technical requirements.
A Matter of Marketing?
The current model of the Internet marketplace is an undifferentiated one. Every
subscriber sees a constant best-effort Internet where, at any point within the network,
packets are treated uniformly. No distinction is made in this model for the source or
destination of the packet, no distinction is made in relation to any other packet header
field, and no distinction is made in regard to any attempt to impose a state-based
contextual “memory” within the network. The base best-effort model is remarkably
simple. A single routing generated topology is imposed on the network, which is used by
the switches to implement simple destination-based, hop-by-hop forwarding. At each
hop, the packet may be queued in the next hop’s FIFO (first in, first out) buffer, or if the
buffer is full, the packet is discarded. If queuing delay or buffer exhaustion occurs,
discarded packets are affected equally. Thus, although the Internet service model is
highly variable across a large network, the elements of the service model are imposed
uniformly on all traffic flows.
The challenge here is that although the price differential will be fixed within the fee
schedule, the service differential will be extremely difficult to determine from the
customer’s perspective. In an uncongested network, there will be no visible service
differential (unless the operator undertakes traffic degradation for non-QoS traffic during
unloaded periods, which in a competitive market appears to be a very short-sighted
move). As the network load increases, the differential will become greater as the QoS
traffic consumes a proportionately larger share of the congested resources. Of course,
this is not an infinite resource, and at some load point, the QoS traffic will congest within
the QoS category. If this happens, the service differential will then start to decline.
Is QoS a subscription service or a user-specified per-packet option? From a marketing
perspective, the subscription model offers many advantages, including some stability of
the QoS revenue stream, some capability to plan the QoS traffic levels and consequent
engineering load, and a resultant capability from the engineering perspective to be able
to deliver a stable QoS service. Packet-option models create a more variable service
environment with highly dynamic QoS loads that are visible only at periods of intense
network use. A per-packet option also entails extensive packet-level accounting and
verification at the entry to the network, because the consumer would anticipate that any
fee premium would apply only to packets marked with precedence directives.
Looking at QoS solely from the perspective of the transaction between the service provider
and the consumer of the service and taking the approach that QoS is a matter of marketing,
however, tends to ignore the larger issue of market economics, which nevertheless are very
critical here. So is QoS a matter of market and supply-side economics?
A Matter of Economics?
One of the more profound and intriguing aspects of examining what is possible in the
realm of QoS is that it can, in some regard, be considered a situation that simply boils
down to a set of compromises. Although some of the compromises are economic and
some are technical, they are related inextricably in the grand scheme of things. If
bandwidth were truly unlimited and access to installed fiber was readily available for a
pittance, for example, QoS probably would not be an issue at all, except perhaps as an
interesting research project to examine the possibility of changing the speed-of-light
propagation delay characteristics on transcontinental links. Even in an ideal world, you still
might choose to limit access to the network for certain types of traffic for a variety of
reasons. Admission control appears to be an eternal desire, regardless of the economic
or technical paradigm.
Of course, this is not the case, and bandwidth on a global scale is not as cheap or readily
available as we would like. This presents a delicate balance among network engineering,
network architectural design, and scales of economy, some of which are still not well
understood in the telecommunications industry. The brokering of transcontinental
bandwidth is a complex and convoluted game. The players are global
telecommunications industry giants who have been playing the game without
consideration of the traffic content, whether it be voice or data, but only within the realm
of capacity. The process was crafted to reflect the very high investment levels required
for such projects and the high risks of such investment, as well as the relatively slow
growth of the traditional consumer of the product—the voice market. The industry geared
itself to an artificial constraint of supply to ensure stability of pricing, which in turn was
intended to ensure that the return on investment remained high (those familiar with the
diamond industry no doubt will see some parallels in this situation). Thus, international
cable systems are geared to ensuring that at any stage, the supply of capacity onto the
market just matches the current level of demand, so that wholesale pricing does not
crash through dumping and the investment profile remains adequately attractive for the
associated investment risk.
Traditionally, Internet Service Providers (ISPs) simply have bought or leased circuits from
the telco (the local telecommunications entity or telephone company) with which they
construct their networks. This worked reasonably well in the earlier days of the Internet,
when traffic volumes were relatively low and high-speed circuits were relatively low-
speed by today’s standards. The circuit orders were provisioned from the margins of the
oversupply of a vastly greater voice network, where the supply model was one of
advance provisioning up to two decades in advance of consumption. The supply of
carriage for data leases can be seen as reducing the oversupply margin by some small
number of months two decades hence. In the intervening period, the telco has a revenue
stream for otherwise idle capacity.
A paradigm shift occurred over the course of the past 10 years, in which some of the telcos
got into the ISP business and began competing directly with the traditional ISP
businesses to which they also sell capacity. Coupled with the fact that the number
of ISPs has grown phenomenally during the same time frame, as has the demand for
bandwidth, capacity is not readily available for a variety of reasons. One reason is the
sheer growth in bandwidth consumption and a staggering lag in provisioning new
capacity. The provisioning systems and their associated capital investment structures still
are well entrenched within a traditional slow linear growth model. Satisfying the capital
demands of this recent explosive wave of data expansion will take some time, because a radical
change to the capital investment programs of the capacity providers (currently the telcos)
will not occur quickly.
Another significant reason for this shift is the telco’s emerging appreciation of an
apparent conflict of interest. The traditional high barriers for entry into the voice market
have been eroded by both deregulation and technology advance, so that telcos may
perceive their current ownership of cable as the last bulwark of protection for their
historical voice revenues. An additional factor is that the telcos are now becoming active
players in the “value-added” data business, either through direct business development
or by acquisition, and now are somewhat reluctant to sell significant levels of capacity
outside their immediate in-house interests in consideration of their own competitive
interests in a deregulated market.
Of course, in some geographic locations, bandwidth cannot be purchased for any price
because it simply does not exist. In this case, the deployment of QoS (in response to
bandwidth scarcity) is based on local infrastructure investment conditions instead of an
artifact of the current state of the industry.
A Matter of Technology?
This section approaches the technology viewpoint by using a top-down perspective. The
first issue is one for which the most common denominator must be identified, because
this is the place in which QoS is most easily implemented and least likely to produce
unpredictable or undesirable effects.
The simplistic answer to this conundrum is to dispense with TCP/IP and run native cell-
based applications from ATM-attached end-systems. This is certainly not a realistic
approach in the Internet, though, and chances are that it is not very realistic in a smaller
corporate network. Very little application support exists for native ATM. Of course, in
theory, the same could have been said of frame relay transport technologies in the recent
past and undoubtedly will be claimed of forthcoming transport technologies in the future.
In general, transport-level technologies are similar to viewing the world through plumber’s
glasses: Every communications issue is seen in terms of point-to-point bit pipes. Each
wave of transport technology attempts to add more features to the shape of the pipe, but
the underlying architecture is a constant perception of the communications world as a set
of one-on-one conversations, with each conversation supported by a form of singular
communications channel.
One of the major enduring aspects of the communications industry is that no such thing
as a ubiquitous single transport technology exists. Hence, there is an enduring need for
an internetworking end-to-end transport technology that can straddle a heterogeneous
transport substrate. Equally, there is a need for an internetworking technology that can
allow differing models of communications, including fragmentary transfer, unidirectional
data movement, multicast traffic, and adaptive data-flow management.
This is not to say that ATM itself, or any other link-layer technology for that matter, is not
an appropriate technology to install into a network. Surely, ATM offers high-speed
transport services, as well as the convenience of virtual circuits. However, what is
perhaps more appropriate to consider is that no particular link-layer technology, by itself, is
effective at providing end-to-end QoS, for reasons that have been discussed thus far.
The second technology issue is determining how, within the constructs of IP, QoS can be
provided.
Providing QoS
Before looking in detail at the way in which QoS can be provided, the underlying model of
the network itself is relevant here. To quote a work in progress from the Internet
Research Task Force, “The advantages of [the Internet Protocol’s] connectionless
design, flexibility and robustness, have been amply demonstrated. However, these
advantages are not without cost: careful design is required to provide good service under
heavy load” [IRTFa]. Careful design is not exclusively the domain of the end-system’s
protocol stack, although good end-system stacks are of significant benefit. Careful design
also includes consideration of the mechanisms within the routers that are intended to
avoid congestion collapse. Differentiation of services places further demands on this
design, because in attempting to allocate additional resources to certain classes of traffic,
it is essential to ensure that the use of resources remains efficient and that no class of
traffic is totally starved of resources to the extent that it suffers throughput and efficiency
collapse.
IRTF Overview
The IRTF is managed by the IRTF Chair in consultation with the Internet Research
Steering Group (IRSG). The IRSG membership includes the IRTF Chair, the chairs
of the various research groups, and possibly other individuals (“members at large”)
from the research community.
The IRTF Chair is appointed by the Internet Architecture Board (IAB), the research
group chairs are appointed as part of the formation of research groups, and the
IRSG members at large are chosen by the IRTF Chair in consultation with the rest
of the IRSG and on approval of the IAB. In addition to managing the research
groups, the IRSG may from time to time hold topical workshops focusing on
research areas of importance to the evolution of the Internet or more general
workshops to discuss research priorities from an Internet perspective, for example.
You can find more information on the IRTF at the IRTF Web site, located at
www.irtf.org.
In a traditional unicast environment, the router uses the destination IP address of the
datagram when determining which outbound interface to pass the packet to for
transmission. One course of possible action here is to make a forwarding decision based
on some QoS-related setting. Quality of Service routing can take a variety of factors into
consideration, as discussed earlier. One action is to overlay a number of distinct
topologies based on differing results of calculating end-to-end (or hop-by-hop) metrics.
These metric calculations might include such factors as estimates of propagation delay
or configured-link capacity. It certainly is theoretically possible to also use current idle
capacity as a metric, although this requires a very tight coupling between forwarding
decisions and the routing protocol (and, in consequence, a redefinition of the routing
protocol), which renders this option an unstable choice, at least for the short-term future. In fact, QoS-routing
technologies still are in the conceptual stages, and nothing appears to be viable in this
area for the near-term, especially when considering this as a candidate routing
mechanism for large-scale Internet environments. Alternatively, the forwarding decision
can be based on an imposed state, where the flow identification of the packet is matched
against a table of maintained state information. Such constructs are used within the
RSVP protocol mechanisms.
Queue Management
Conventionally, FIFO queues are used in routers. This preserves the ordering of packets
within the router, so that sequential packets of the same flow remain in order along a
particular end-to-end path within the network. Given the finite length of an output queue,
the router performs packet discard once the queue is exhausted, creating what
commonly is referred to as a tail-drop behavior. In this state, discard is undertaken as an
alternative to queuing. If space is available on the queue, the packet is added to the tail
end of the queue. Otherwise, the packet is discarded.
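In sketch form, tail-drop admission is simply the following (the queue depth is illustrative):

    from collections import deque

    MAX_QUEUE = 64                  # illustrative queue depth, in packets
    queue = deque()

    def enqueue(packet):
        # Classic tail drop: append to the tail if space remains, else discard.
        if len(queue) >= MAX_QUEUE:
            return False            # queue exhausted: the packet is dropped
        queue.append(packet)
        return True

    print(enqueue("pkt-1"))         # -> True while the queue has room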
QoS criteria can be admitted into the queue-admission policy through the selective
behavior of RED. The technique is to weight the preference for packet discard to lower-
precedence packets, which causes a rate-damp signal to be sent to lower-precedence
streams (assuming that all packets within a stream have identical precedence) before the
signal is sent to the higher-precedence streams.
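A minimal sketch of such a precedence-weighted RED admission test follows; the thresholds
and per-precedence drop ceilings are assumptions, not values taken from any particular
implementation.

    import random

    MIN_TH, MAX_TH = 16, 48                     # illustrative average-queue thresholds
    MAX_DROP_P = {0: 0.20, 1: 0.10, 2: 0.05}    # higher precedence, lower drop ceiling

    def red_admit(avg_queue_len, precedence):
        # Probabilistically discard before the queue is full, weighting the
        # discard preference toward lower-precedence packets.
        if avg_queue_len < MIN_TH:
            return True
        if avg_queue_len >= MAX_TH:
            return False
        fraction = (avg_queue_len - MIN_TH) / float(MAX_TH - MIN_TH)
        drop_p = fraction * MAX_DROP_P.get(precedence, MAX_DROP_P[0])
        return random.random() >= drop_p

    print(red_admit(avg_queue_len=32.0, precedence=2))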
The intrinsic behavior of RED is a relatively subtle form of service differentiation. Short-
lived TCP flows and nonflow-controlled UDP streams are relatively immune from the
effects of RED, in that while the packet drop may cause retransmission, the total flow
rate is not substantially altered. This is either because the flow is only of very short
duration in any case, or the end-to-end application protocol is not drop sensitive. Thus,
while RED can promote more efficient use of network resources by well-behaved TCP
flows, and while weighting of RED by some form of QoS precedence can allow some
level of QoS flow differentiation within the network, the overall result of visible
differentiation of service level is difficult to discern within today’s traffic profiles.
The queue-admission policy can be altered further by modifying the basic FIFO operation
to one in which QoS packets may be inserted at the head of the queue or in an
intermediate position that maintains an overall queue ordering by some QoS precedence
value. However, this particular model is best regarded not as a queue-management
issue, but as a use of multiple queues, with the consequent issue of scheduling packets
from each of the queues.
Uncontrolled UDP (User Datagram Protocol) flows are a relevant consideration here, and
with the increasing deployment of UDP traffic flows under the multimedia umbrella, this
issue will become critical in the near future. A long-lived UDP flow, operating at a rate in
excess of an intermediate hop’s resource capacity, places growth pressure on the
associated output queue, ultimately causing queue saturation and subsequently a queue
tail-drop condition that affects all flows across this path. Controlling the level of non-flow-
controlled UDP traffic through an ingress policy can be difficult to police effectively. An alternative is to
use distinct queuing disciplines for TCP and UDP traffic, attempting to minimize the
interaction between flow-controlled and non-flow-controlled data.
Because a number of ways exist to modify this single FIFO queue behavior by using
multiple output queues on a single outbound interface, the router behavior can be seen
from the perspective of scheduling packets from the set of associated output queues.
One such method includes varying the queue lengths and then adopting a scheduling
algorithm that schedules packets on some form of preferential basis. This could be done by
weighted round-robin queue selection or some other form of queue-selection algorithm.
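A weighted round-robin selector of the kind mentioned here can be sketched as follows; the class names and weights are assumptions chosen only to show how a higher weight drains a queue faster within each scheduling round.

```python
def weighted_round_robin(queues, weights):
    # `queues` maps a traffic class to its list of waiting packets, and
    # `weights` maps the same classes to the number of packets served per round.
    transmitted = []
    while any(queues.values()):
        for traffic_class, weight in weights.items():
            for _ in range(weight):
                if queues[traffic_class]:
                    transmitted.append(queues[traffic_class].pop(0))
    return transmitted

# Example: the "premium" queue is served three packets per round, "standard" one.
order = weighted_round_robin(
    {"premium": ["p1", "p2", "p3"], "standard": ["s1", "s2", "s3"]},
    {"premium": 3, "standard": 1},
)
print(order)   # ['p1', 'p2', 'p3', 's1', 's2', 's3']
```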
The capability of the QoS application is an outcome of the queue structure, the queue-
admission criteria, and the packet-scheduling mechanism. The use of QoS weighting on
the queues, where higher-precedence packets are scheduled at a higher priority, does
introduce grossly visible differentiation of service levels under load. As the network load
increases, the incremental congestion load is expressed in increased queue delay in
queues that surround the network load point. This queuing delay is visible in terms of
extended RTT estimates for traffic flows, which in turn reduces the TCP traffic rates. Both
slow-start and congestion-avoidance TCP algorithms are RTT sensitive. By using QoS
scheduling, the higher-precedence traffic consumes a greater share of the congested
segment(s), attempting to preserve the RTT estimates of the associated traffic flows. The
difference between a relatively constant RTT and one that exhibits relatively high variability
and a higher average value is eminently visible, both for short and long TCP traffic flows. It
also is highly visible to simple diagnostic probes such as PING and TRACEROUTE. It is
admittedly a relatively coarse differentiator of traffic and one that can be used to cause
high levels of differentiation even under relatively light levels of load. Of course, if QoS is
tied to a financial premium, one of the key market attributes of QoS will need to be highly
visible QoS differentiation. Weighting the queue scheduling is the mechanism that offers
the greatest promise here.
QoS Deployment
For QoS to be functional, it appears to be necessary that all the nodes in a given path
behave in a similar fashion with respect to QoS parameters, or at the very least, do not
impose additional QoS penalties other than conventional best effort into the end-to-end
traffic environment. The sender or network ingress point must be able to create some
form of signal associated with the data that can be used by down-flow routers to
potentially modify their default outbound interface selection, queuing behavior, and
discard behavior.
The insidious issue here is attempting to exert “control at a distance.” The objective in
this QoS methodology is for an end-system to generate a packet that can trigger a
differentiated handling of the packet by each node in the traffic path, so that the end-to-
end behavior exhibits performance levels in line with the end-user’s expectations and
perhaps even a contracted fee structure.
This control-at-a-distance model can take the form of a “guarantee” between the user
and the network. This guarantee is one in which, if the ingress traffic conforms to a
certain profile, the egress traffic maintains that profile state, and the network does not
distort the desired characteristics of the end-to-end traffic expected by the requester. To
provide such absolute guarantees, the network must maintain a transitive state along a
determined path, where the first router commits resources to honor the traffic profile and
passes this commitment along to a neighboring router that is closer to the nominated
destination and also capable of committing to honor the same traffic profile. This is done
on a hop-by-hop basis along the transit path between the sender and receiver, and yet
again from receiver to sender. This type of state maintenance is viable within small-scale
networks, but in the heart of large-scale public networks, the cost of state maintenance is
overwhelming. Because this is the mode of operation of RSVP, RSVP presents some
serious scaling considerations and is inappropriate for deployment in large networks.
The alternative to state maintenance and resource reservation schemes is the use of
mechanisms for preferential allocation of resources, essentially creating varying levels of
best-effort. Given the absence of end-to-end guarantees of traffic flows, this removes the
criteria for absolute state maintenance, so that better-than-best-effort traffic with classes
of distinction can be constructed inside larger networks. Currently, the most promising
direction for such better-than-best-effort systems appears to lie within the area of
modifying the queuing and discard algorithms. These mechanisms rely on an attribute
value within the packet’s header, so these queuing and discard preferences can be made
at each intermediate node. First, the ISP’s routers must be configured to handle packets
based on their IP precedence level. There are three aspects to this: first, using the IP
precedence field to determine the queuing behavior of the router, both in queuing the
packet to the forwarding process and in queuing the packet to the output interface;
second, using the IP precedence field to bias the packet-discard processes by selecting
the lowest-precedence packets to discard first; and third, mapping any priority scheme
used at Layer 2 to a particular IP precedence value.
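As a rough illustration of the classification step, the short sketch below extracts the three IP precedence bits from the IPv4 TOS octet and maps them to a queue name. The mapping and the queue names are assumptions made for the example, not values taken from the text.

```python
def ip_precedence(tos_byte):
    # The IP precedence value occupies the top three bits of the IPv4 TOS octet.
    return (tos_byte >> 5) & 0x07

def select_queue(tos_byte):
    # Hypothetical precedence-to-queue mapping, used only for illustration.
    prec = ip_precedence(tos_byte)
    if prec >= 5:
        return "priority"
    if prec >= 1:
        return "standard"
    return "best-effort"

print(select_queue(0xA0))   # precedence 5 -> "priority"
```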
The cumulative behavior of such stateless, local-context algorithms can yield the
capability of distinguished service levels and hold the promise of excellent scalability.
You still can mix best-effort and better-than-best-effort nodes, but all nodes in the latter
class should conform to the entire selected QoS profile or a compatible subset (an
example of the principle that it is better to do nothing than to do damage).
Each of the mechanisms discussed so far rely on the network to implement distinctions
of quality of service. The user also has the capability to implement distinguished service,
particularly in relation to TCP traffic.
Using TCP buffers and window advertisements commensurate with the delay-bandwidth
product of the end-to-end traffic path allows the end-system to effectively utilize available
bandwidth. The data rate is based on the amount of data that can be loaded into the end-
to-end path, divided by the propagation delay. Queuing behavior attempts to reduce the
propagation delay (which is the sum of the signal propagation delay across all
intermediate hops plus the queuing time) by reducing queuing delay within this equation.
If the system buffer is less than the delay-bandwidth product, no more data can be sent
until the signal of receipt is received. Therefore, the buffer size must be greater than two
times the bandwidth-delay product to increase data-transfer performance, so the only
limiting factor becomes the true bandwidth of the network and not inadequate buffering.
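As a worked example of the delay-bandwidth arithmetic, using illustrative numbers rather than figures from the text:

```python
# A 2-Mbps path with a 500-ms round-trip time (roughly a geostationary
# satellite hop) can hold 2,000,000 bits/s x 0.5 s = 1,000,000 bits in flight,
# or 125,000 bytes. A sender or receiver buffer smaller than this cannot keep
# the path full, regardless of how much bandwidth is actually available.
bandwidth_bps = 2_000_000
rtt_seconds = 0.5
delay_bandwidth_product_bytes = bandwidth_bps * rtt_seconds / 8
print(delay_bandwidth_product_bytes)   # 125000.0
```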
The use of improved implementations of TCP window-scaling options, along with very
large buffers, can provide significant improvements in end-to-end performance without
any changes to the underlying behavior of the network. This is especially true in some
cases where very long propagation delay is evident, such as in geostationary satellite
paths.
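On most end-systems, the buffer sizes are under application control through the sockets interface. The sketch below simply requests larger send and receive buffers; the 256-KB figure is an illustrative assumption, and whether window scaling is actually negotiated depends on the local TCP implementation.

```python
import socket

# Request large socket buffers so that TCP can advertise a window close to
# the path's delay-bandwidth product. The operating system may round or cap
# these values.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
```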
Even within such a model, some amount of queuing is inevitable. A TCP session can be
thought of as a quasiconstant delay loop. When a sender commences in slow-start
mode, it transmits one packet, and upon receipt of the acknowledgment (following a
delay of one RTT in time), immediately transmits two packets. This cycle of immediately
transmitting two data packets upon receipt of each ACK packet continues through the
slow-start mode. If, at any point in the traffic path, there are slower links than at the
sender’s end of the traffic path or there is the presence of other traffic, queuing of the
second packet occurs. The steady state of slow start is an exponential increase of
volume, placed into the end-to-end path using ACK pacing to double the data in each
ACK “slot.” The resulting behavior of this is naturally bursty traffic. At each RTT interval,
the packet train contains successive trains of 2, 4, 8, and so on packets, where the
pacing of the doubling is based on the bit rate of the slowest link in the end-to-end traffic
path.
Within the network, this sequencing of packets at a rate of twice the previous rate within
each RTT interval is smoothed out to the available bandwidth of the end-to-end path by
the use of queuing within the routers. This queuing requirement can grow to half the
delay-bandwidth product. For one-half of this RTT delay interval, the queuing
requirement is to store one segment for every segment it can transmit. This queuing
burst can be mitigated by the sender implementing bandwidth estimates using sender
packet back-off, where the initial RTT is used to pace the emission of the second of the
two packets, at one-half the RTT, for the second iteration at one-fourth the RTT, and so
on. Alternatively, ACK pacing can be done at the receiver or in the interior of the network
(although interior ACK pacing may require symmetric paths to allow automatic initiation
of ACK spacing by the router). The slow-start phase of doubling the data rate per RTT
continues to the onset of data loss; thereafter, further increases of data rate are linear.
This next phase, the congestion-avoidance algorithm, instead of doubling the amount of
data in the pipe each RTT, uses an algorithm intended to undertake a rate increase of
one MTU per RTT.
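The difference between the two phases can be seen in a toy model of the congestion window, measured here in MSS units per RTT; the threshold value is an arbitrary illustration, not a recommended setting.

```python
def congestion_window_growth(rtt_count, ssthresh, mss=1):
    # Slow start doubles the window each RTT until the threshold is reached;
    # congestion avoidance then adds roughly one MSS per RTT.
    cwnd, history = mss, []
    for _ in range(rtt_count):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2
        else:
            cwnd += mss
    return history

print(congestion_window_growth(10, ssthresh=16))
# [1, 2, 4, 8, 16, 17, 18, 19, 20, 21]
```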
How can the host optimize its behavior in such an environment? Consider these
solutions:
Use a good TCP protocol stack. Many of the performance pathologies that exist in
the network today are not necessarily the byproduct of oversubscribed networks and
consequent congestion. Many of these performance pathologies exist because of
poor implementations of TCP flow-control algorithms, inadequate buffers within the
receiver, poor use of MTU discovery (if used at all), and imprecise use of protocol-required
timers. It is unclear whether network ingress-imposed QoS structures will adequately
compensate for such implementation deficiencies, but the overall observation is that
attempting to address the symptoms is not the same as curing the disease.
A good protocol stack can be made even better in the right environment, and the
following suggestions are a combination of measures that are well studied, are known to
improve performance, and appear to be highly productive areas of further research and
investigation.
Implement larger buffers with TCP window-scaling options. The TCP flow
algorithm attempts to work at a data rate that is the minimum of the delay bandwidth
product of the end-to-end network path and the available buffer space of the sender.
Larger buffers at the sender assist the sender in adapting to a wider diversity of
network paths more efficiently by permitting a larger volume of traffic to be placed in
flight across the end-to-end path.
Use a higher initial TCP slow-start rate than the current 1 MSS (Maximum
Segment Size) per RTT. A sample size that appears feasible is an initial burst of 4
MSS segments. The assumption is that there will be adequate queuing capability to
manage this initial packet burst, while the provision to back off the send window to 1
MSS segment should remain to allow stable operation if the initial choice was too large
for the path. The result of a successful start with a transmission window of 4 MSS
units is that the initial data rate through the slow-start algorithm will move four times
the data in the same time interval!
Implement sender data pacing or receiver ACK spacing so that the burst nature
of slow-start increase is avoided. This is perhaps a more controversial
recommendation. The intent of such pacing is to reduce the inherent burst nature of
the slow-start TCP algorithm and, in so doing, relieve the queuing pressure placed on
the network where the end-to-end path traverses a relatively slower hop. However,
modern routers can use small, fast caches to detect and optimally switch packet
trains, and packet pacing breaks apart such trains. The advantage of such an
approach is to allow the network flow to quickly find the available end-to-end flow
speed without receiving transient load signals that may confuse the availability
calculation being performed.
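A crude sketch of the pacing idea follows: rather than emitting a window of packets back-to-back, the sender spreads them across one estimated RTT. The send function and RTT estimate are placeholders; a real stack would drive this from fine-grained timers rather than sleeping.

```python
import time

def paced_send(send_packet, packets, rtt_estimate_seconds):
    # Spread the window evenly across one RTT to avoid a back-to-back burst
    # that would otherwise queue up behind the slowest link in the path.
    interval = rtt_estimate_seconds / max(len(packets), 1)
    for packet in packets:
        send_packet(packet)
        time.sleep(interval)
```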
All these actions have one feature in common: They can be deployed incrementally and
individually, allowing end-systems to obtain better-than-best-effort service even in the
absence of the network provider attempting to distinguish traffic by class distinction.
Also, because HTTP traffic typically is greater than 50 percent of all public IP network
traffic, one additional point must be addressed.
Use HTTP Version 1.1 [IETF1997c]. HTTP 1.0 requires each client-server exchange
to open a corresponding TCP connection. HTTP 1.1 is not bound by this restriction,
though; it allows multiple client-server exchanges to be conducted over one (or more)
TCP connections. In other words, it uses persistent TCP connections wherever
possible. This behavior also negates the problem where TCP RSTs (Resets) are sent
instead of FINs (Finish flags), which results in excessive TCP ACKs being generated.
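As a small illustration of persistent-connection reuse, the sketch below issues several requests over a single TCP connection using Python's standard HTTP client; the host name and paths are placeholders.

```python
import http.client

# Several request/response exchanges ride over one TCP connection instead of
# opening a new connection (and a new slow-start phase) for each object.
conn = http.client.HTTPConnection("www.example.com")
for path in ("/index.html", "/logo.gif", "/style.css"):
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()   # drain the body so the connection can be reused
conn.close()
```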
The conclusion here is that if the performance of end-to-end TCP is the perceived problem,
it is not necessarily the case that the most effective answer lies in adding service
differentiation in the network. It is often the case that the greatest performance improvement
can be made by improving the way in which the hosts and the network interact through the
flow-control protocol.
Service Quality and the Network
The second part of this “good-housekeeping guide” list is intended to allow the network to
play its role in working within reasonable operating parameters.
Of course, there is no substitute for proper network engineering in the first place.
Network loads generally exhibit peak load conditions every day, and if the network
cannot handle these loads, the consequent overload conditions created by bandwidth
exhaustion, queue exhaustion, and switch saturation cannot be readily ameliorated by
QoS measures. The underlying resource-starvation issue must be addressed if any level
of service is to be delivered to the network’s clients. Additionally, the stability of the
routing environment is of paramount importance to ensure that the network platform
behaves predictably. Therefore, two primary prerequisites exist for effective network
management:
Network stability. This refers to the stability of the underlying transport substrate
and of the routing system layered above this fabric. Without an
environment where the routing system is allowed to converge (and then operate for
an extended period without further need to recompute the internal forwarding tables),
the network rapidly degenerates and degrades in such a way that no incremental
introduction of QoS structures can salvage it.
Major performance and efficiency gains can be made by allowing the network to signal to
the end-systems the likely onset of congestion conditions, so that the end-systems can
take action to reduce the traffic rate well before the network is forced into queue tail-drop
behavior. Three of the most effective steps a network operator may take to improve
network efficiency and end-user flow performance follow:
Implement Random Early Detection (RED). This ensures that the initial congestion
back-off signals are sent to the appropriate TCP senders (which are pushing hardest
at the network) before comprehensive packet discard occurs. RED is statistically
likely to signal those stacks that are operating with large transmission windows, and
the effect of this discard mechanism is to signal a reduction in the transmit window
size. This mechanism attempts to avoid first signaling those small-scale flows that
are not causing the overall congestion problem.
Separate TCP and UDP queues within the router. One such mechanism is to
implement class filters to bound the level of resources given to non-flow-controlled
UDP traffic, allowing the flow-controlled queues to behave more predictably. The
most direct way to implement this is to place UDP traffic and TCP traffic in different
output queues and use a weighted scheduling algorithm to select packets from each
queue according to a network-imposed policy constraint of relative resource
allocation. This method allows the TCP end-system stacks to oscillate faster, in order
to estimate the amount of total end-to-end traffic capacity, based on the behavior of
the flow-controlled traffic passing through the queues. Although UDP does not provide
this “network-clocking” function, it is assumed that the UDP application will be
intelligent enough to understand how to pace itself. Of course, this may not be a valid
assumption, but then again, the application may be fundamentally broken if it cannot
do so. The outcome of this recommendation is to limit the extent of damage a non-
network-clocked UDP flow can cause. By making UDP queues relatively short and
TCP queues longer in the router, there is a greater probability that the TCP queues
can behave in a way that attempts to avoid tail-drop congestion and therefore
increases network-clocked throughput efficiency.
Traffic shaping, using a token bucket at the network edges, to reduce the
burstiness of the data-traffic rates. This is somewhat controversial, because this
may be the result of a fee contract and not really a service-quality issue, per se. The
effect is a surrogate method of data-burst-rate limiting, using the queues at the
periphery of the network to reduce the level of queue load in the more critical central
interior of the network.
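The token-bucket shaping mentioned in the last item can be sketched as follows; the rate and depth parameters are illustrative, and a real implementation would decide whether non-conforming packets are queued, marked, or dropped according to the contracted policy.

```python
class TokenBucket:
    """Minimal token-bucket sketch for edge traffic shaping."""

    def __init__(self, rate_bytes_per_second, depth_bytes):
        self.rate = rate_bytes_per_second   # token refill rate
        self.depth = depth_bytes            # maximum burst size
        self.tokens = depth_bytes
        self.last_time = 0.0

    def conforms(self, packet_bytes, now):
        # Refill tokens for the elapsed interval, capped at the bucket depth.
        elapsed = now - self.last_time
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last_time = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True    # conforming: transmit immediately
        return False       # non-conforming: delay (shape), mark, or drop
```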
These actions alone have the potential to offer marked increases in the level of efficiency
of bandwidth usage in terms of actual data transfers. Random Early Detection provides
for the enhanced efficiency of bandwidth resource sharing, making best-effort more
uniformly available to all flows at any given instant, whereas queue segregation attempts
to limit the damage that high-volume real-time (externally clocked) flows may inflict on
network-clocked reliable data flows.
Weighted RED (WRED), with the weighting managed by IP precedence header field
values. This allows the network to signal lower-precedence TCP traffic flows to back
off their expanding window behavior gracefully, while higher-precedence traffic
continues to flow unimpeded.
These mechanisms constitute better-than-best-effort services, because there is no longer
a single best-effort paradigm within the network. These measures also are independent
of any dial-access trigger mechanism that may be used to complete the QoS picture.
Obviously, the access mechanism can be through a network-admission implementation,
which may combine other traffic-policy mechanisms with precedence settings, or it can
be a simple acceptance of user-defined precedence values, allowing the differentiation
function to be selected by the user rather than the network.
Deploying RSVP
One additional comment remains on the deployment of RSVP by the user and the
network. RSVP can be quite expensive to operate, and often the results may be no better
than a properly designed network. After implementing the mechanisms and principles
discussed here, you can cut down queue length, reduce excessive extraneous
congestion signaling, and control misbehaving flows so that propagation times more
accurately reflect physical signal-hop propagation.
It is not all bad news, however. The model used by the Internet is one of distributed
intelligence, where the functionality of the data flow is passed over the network fence to
the end-user’s platform. The interior of the network is reduced to its bare essentials of
basic transmission elements and simple switches. The result is a network of
unsurpassed cost efficiency, and it certainly is vastly different from the circuit-switching
models of previous communications systems, in which significant functionality and cost are
preserved within the network in a centralized functionality model. The real long-term
challenge is scaling this model efficiently in terms of switching, and in terms of consumer
models, so that the cost of the transmission and switching elements is fairly distributed to
the consumer. QoS is an intermediate step and, unfortunately, is perhaps a recognition
of intermediate failure to converge on this path.
Choosing QoS
QoS is not a uniform requirement in any network. If networks are engineered to the point
where there is never any congestion, the argument for QoS weakens considerably. This
is not to say that the argument for service quality is any weaker; this is clearly not the
case. However, differentiating services is somewhat of a non-sequitur. Network
administrators will continue to be faced with the task of provisioning networks that meet
customer demands for availability, speed, latency, and so on.
In fact, and as mentioned earlier, it has been suggested that this becomes nothing more
than a pricing argument. There has been some indication that the initial hordes flocking
to get connected to the Internet have been carried through on the margins of oversupply
in, and subsequent build-out of, the traditional telephone system infrastructure. Now that
Internet traffic has effectively chewed through that, there is no high-margin cross-
subsidization agent for further build-out of Internet infrastructure, and prices will inevitably
rise. However, service providers cannot raise prices uniformly in a competitive market.
Accordingly, QoS may be the leverage mechanism for price escalation. QoS allows the
service provider to undertake selective price escalation, offering differentiation within
congestion periods as a mechanism for pricing increases. The flat-rate pricing
mechanisms that dominate much of the current Internet landscape do not readily provide
the revenue needed to install the additional communications capacity necessary to keep
pace with the continued aggressive expansion of the Internet.
v v v
Again, things are not bad all over—it just pays to do your homework. Networking in the
global Internet complicates matters tremendously, simply because of the diversity of
administration. The Internet is the closest thing resembling true chaos on planet Earth.
Network operators who want to implement QoS within a single administrative domain
arguably will have much better success in providing differentiated services and QoS
compared to anyone who attempts to provide similar services across multiple
administrative domains, at least for the foreseeable future. However, the same technical
principles still hold true, and similar considerations must be given to network
performance, stability, scale, management, and control. Private networks that must
purchase or lease wide-area network capacity from a telco or local common carrier also
may face the same unavailability of wide-area capacity, and network administrators who
choose to use public switched wide-area services may face even more insidious
problems. Regardless of whether you are trying to implement QoS in a private network or
within a segment of the global Internet, differentiation may come at a cost. There’s no
magic here. The cost may not be expressed in economic terms, but there are certainly
other prices to pay that cannot be calculated in dollars and cents.
Caveat emptor.
A Brief Multiprotocol Historical Perspective
In a broad sense, the annals of networking history have somewhat of a convoluted, and
sometimes conflicting, family tree. It is fairly clear, however, that IBM was one of the first
companies to successfully build, develop, and deliver a networking environment in its
Systems Network Architecture (SNA) platforms. Traditionally (and still to this day), large-
scale SNA networks have provided network computing systems for everything from
large-scale accounting applications to warehouse-inventory systems. In fact, when
network computing was in its infancy, IBM was truly one of the only games in town.
The early IBM mainframes were quite large—monoliths, in fact, compared to today’s
computing platforms—and the initial investment and ongoing maintenance costs were
equally substantial. Companies that invested in IBM mainframes (and other vendors’
mainframe platforms as well) fully expected to use them for a given calculated lifetime.
Companies that used these mainframes generally used a convoluted depreciation
schedule that dictated how long the platform had to be used before they could consider
moving to another technology platform without losing a substantial portion of their initial
investment.
This same paradigm exists with similar technologies, such as Digital Equipment
Corporation’s DECnet and LAT (Local-Area Transport) protocols, as well as more recent
network protocols such as Novell Corporation’s IPX (Internet Packet Exchange) protocol.
Significant numbers of networks still use these older protocols, and this introduces
significant complications with regard to providing any sort of structured QoS within a single
multiprotocol network substrate.
Migrating to TCP/IP
In recent years, many companies that own legacy multiprotocol networks of this sort
have expressed a desire to migrate these network applications to TCP/IP for ease of
administration, management, and engineering maintenance. Generally, managing a two-
protocol network is not just double the operational cost of running a single-protocol
network; it often is much harder and much more expensive. Accordingly, the desire to
operate a single-protocol system is not just for the sake of simplicity of the network, but is
often an outcome of the objective to operate the network within tight bounds of cost-
effectiveness parameters. However, it is not as simple as you might imagine.
As stated earlier, in many cases, economic factors are involved. Many corporations have
invested millions of dollars in legacy technologies and equipment. Just because a simpler
or more elegant solution has been introduced is not a compelling enough reason to
immediately upgrade these legacy platforms. Often, an organization will look for better
mousetraps—newer and improved ways of using existing legacy technologies. Of
course, religious and political issues may tend to delay a migration, but these are not
germane to this particular discussion. In any event, migrations to newer technologies do
not happen overnight and in fact may take years. This sometimes results in a very
unfortunate situation; a company may begin planning a migration to the latest and most
current technology, but by the time it completes the migration, the technology is
outdated.
Routed versus Bridged Legacy Protocols
One of the more significant complications these older legacy protocols present is the fact
that some of them cannot be routed. They can only be transparently or source-route
bridged, so the entire network become a single MAC-layer broadcast domain or
conceptually equivalent to a single local area network. The most insidious aspect of this
point is that this significantly affects how many hosts can be accommodated in the
network, because all participating hosts share the same broadcast domain. Bridging
simply does not scale.
DEC's LAT is a specific example of a commonly deployed protocol whose original design
assumed a number of performance parameters that were
commonly seen in small-scale Ethernet networks in the late 1980s. As the network
environment became far more diverse, it became a significant challenge to explicitly
recreate these required conditions for LAT functionality in a wide-area multiprotocol
environment. A LAT implementation in the wide area, in this case, might suffer from poor
service quality, and at times, the parameters of the wide-area network are so far outside
the original protocol design that the protocol simply does not function at all. Nonetheless,
it is surprising how many networks still try to implement LAT services in a wide-area
network on low-speed, high-latency links where the 80-millisecond network transit time
simply is unachievable.
A lightweight real-time protocol. Such protocols typically have no concept of
adjustment to network congestion and simply can saturate the network with externally
clocked data sources. These applications can work well over small-scale, high-speed
local networks in which the underlying bandwidth resources are not under heavy
contention. But again, such protocols do not readily scale in larger and more diverse
corporate network environments. We can expect to see more of these protocols with
the onset of widespread multimedia applications.
It becomes apparent how QoS might be desired in cases such as this, using QoS
structures simply to provide priority to one protocol over another. In the example here, it
may be highly desirable to prioritize SNA traffic over all other protocols in the network,
especially if the company’s core business is impacted by the inability to conduct critical
SNA-based business transactions.
It is important to recognize the intrinsic difference between how traffic is forwarded when
it is bridged, as opposed to how it is forwarded when it is routed. When bridging,
decisions on how traffic is forwarded are determined by information contained in the link-
layer frame. When routing, the intermediate routers can look into information contained in
higher layers of the protocol stack information (namely, the network layer) to make more
intelligent decisions on how to forward traffic. Routing allows you to compartmentalize
information about the network topology so that excessive or irrelevant information is not
flooded onto every subnet in the network. When bridging, MAC (Media Access Control)
information about every host in the bridging domain also must be readily available to
every end-system; this consumes an inordinate amount of network bandwidth resources.
The major issue with wide-scale bridging is that it takes a non-routed LAN protocol into
areas that its designers never contemplated. Operational parameters relevant to
protocol design will change in a wide-area bridged network, especially the number of
attached devices and the end-to-end propagation time. More fundamentally, the packet-
delivery concept is changed so that the reliability of packet transmission is reduced
substantially.
In a local Ethernet broadcast configuration, for example, the sender’s protocol stack can
make the assumption that if the transmission attempt does not generate a collision
indication, there is a high probability that the receiver has received the packet into its
input buffer. The probability of the receiver then silently discarding the packet is relatively
low, so that in this environment, if the sender’s protocol stack receives a signal from the
Ethernet driver of successful transmission, the sender can assume a high probability that
the packet has been received. In this environment, undetected packet loss within the
network is a rare occurrence, and the protocol designer can afford to make loss recovery
a more lengthy process as an extraordinary event.
In the wide-area network, this is clearly not the case, and the number of events that can
cause packet drop (without direct real-time notification to the sender) increases
dramatically. Protocols that assume that the Local Area Network is indeed local make the
fatal assumption that broadcast is easy, packet delivery reliability is high, and end-to-end
propagation time is short. Wide-area bridging abuses all three assumptions and seriously
degrades the robustness of the original protocol design. QoS approaches can attempt to
ameliorate this by performing service differentiation within the combined bridge/router units
with a bias toward bridging, although this may not conform to other resource-allocation
objectives for the network.
Each of these queuing schemes will work with multiprotocol traffic, because the queue
does not make a distinction regarding what the packet it is given contains. It simply
accepts the packet and queues it for transmission. The task of distinguishing the traffic
type (what protocol the packet contains) is left to the classification process, which can be
used with any of these queuing schemes. Recall that default queuing behavior is single-
queue FIFO (First-In, First-Out). Although a simple priority queuing may also consist of a
singular queue, there must be a traffic classifier that identifies the incoming traffic and
performs packet reordering, moving higher-preference packets to the front of the queue.
Similarly, a traffic classifier is used to place packets into separate queues when CBQ
or WFQ is used.
Of course, all the caveats mentioned earlier still apply here, so we will not repeat them.
However, it is important to understand that these queuing schemes have limited
applicability. At slower link speeds, of course, traffic coming into the router is at a slower
rate, so the CPU has more time to perform packet classification, reordering, and queue
management. However, at higher speeds, the CPU may become burdened to the point
where these tasks become a liability and performance degradation results.
At least one additional option is available for providing differentiated services in a
multiprotocol network, but it requires the use of a switched wide-area network (such as
Frame Relay or ATM). This particular method consists of directing traffic belonging to
one protocol family across a distinctly different traffic path, or virtual circuit, than traffic
belonging to other protocol families. If an organization specifically wants to ensure that its
IPX traffic does not interfere with its TCP/IP traffic between Chicago and Los Angeles, for
example, it simply could provision two PVCs between the routers at each location and
route TCP/IP traffic across one PVC and IPX across the other. This task is relatively
simple to accomplish; it is only a matter of enabling IPX routing on one
PVC and only IP routing on the other.
Also, the PVCs could each have different CIRs (Committed Information Rates), so that
drop preferences in the Frame Relay network can be characteristically different for each
type of traffic. Granted, some downsides to this approach are obvious—namely, that
there is no redundant back-up path for either protocol in the event of PVC failure, and
there are certainly economic considerations in provisioning multiple PVCs for individual
protocol applications. Having said that, this method has proven to work quite well.
The capability that is readily feasible in either approach is to create a differentiated service
environment based on protocol bias using precedence-based mechanisms to allocate
network resources to various supported protocols within the network. Of course, the
ultimate objective is to provision the routers with a set of generic QoS actions and remotely
trigger those actions by setting protocol-specific header field values on an application-
specific basis. This is a compelling requirement for providing QoS fields within the packet
headers of various protocols—something that has not been a widespread feature of such
protocols to date. In the absence of such facilities, the pragmatic conclusion is that service
differentiation is feasible within multiprotocol networks on a protocol-by-protocol basis, but
finer levels of granularity are limited to protocols that have readily recognizable “hooks” that
can be used by the routers to treat specific applications, protocols, and user traffic with
preference.
SNA CoS: A Contradiction in Terms?
SNA—or, more appropriately, IBM networking—does indeed provide Classes of Services
(CoS) distinctions. However, it is imperative to understand how these classes work, when
they are relevant, and their potential applications in a corporate network. First, you
should briefly review the SNA architectural technology so that you have a basic
understanding of what SNA provides as far as QoS is concerned. This is in no way
intended to be an all-inclusive overview of SNA. On the contrary—entire volumes of
textbooks have been written on the topic, and we cannot hope to provide a synopsis of
the entire SNA architecture here. Here, we are simply illustrating the fact that IBM Class
of Service distinctions may not provide the desired QoS functionality in your network.
The mainframe (or host) performs all the computational tasks, and the FEP performs all
the traffic routing for data in the network. Cluster controllers control the I/O operations of
network-attached devices, and the terminals provide the user interface to the network.
SNA does provide the capability for Classes of Services (CoS) parameters to be defined
for each session, either upon session establishment or manually by the user when
logging into the network. The CoS parameters include virtual path information, called
VRNs (virtual route numbers) and transmission priorities, called TPRIs (transmission
priorities). Characteristics specified by a VRN profile include considerations such as
response time, security level, and availability of the path. The TPRI granularity allows
three priorities for each virtual route: 0, 1, and 2, where 0 is the lowest transmission
priority and 2 is the highest.
All data paths in the SNA network are statically configured. Each of these CoS templates
also must be manually configured and uniformly synchronized between the FEP and the
mainframe before they can be used effectively. A CoS template can be configured
statically for each client node, or it can be assigned based on client specification during
network login. There are no real dynamics in traditional legacy SNA networks to speak
of—everything is statically preconfigured.
APPN
One of the more important modifications APPN introduces is dynamic routing,
similar to that of other link-state routing protocols. Network nodes within the APPN network
domain maintain a topology database that contains information used for calculating paths
with a particular CoS profile. However, like its predecessor, the CoS values are explicitly
defined—only the granularity has been increased.
A few salient points need to be reiterated here. One pertinent point is that SNA
communications occur at the data-link layer. Therefore, SNA traffic must be bridged in a
network; it cannot be routed. APPN does provide routing capabilities, but these do not
seem to have integrated well into the traditional multiprotocol network, for reasons that
remain unclear. Perhaps the complexity and migration path for APPN is considered
too risky, but this is purely speculation.
A more recent alternative to RSRB (Remote Source-Route Bridging) is Data Link Switching (DLSw), which is similar to
RSRB. The primary difference between RSRB and DLSw is that DLSw provides local
termination of link-layer acknowledgments and keep-alive messages. These Data Link
Control (DLC) messages are handled on an end-to-end or peer-to-peer basis with RSRB,
whereas they are terminated locally with DLSw. This provides quicker acknowledgment
of these local messages and considerably lessens the possibility of an error condition
due to an acknowledgment time-out.
The inherent problem in this scheme is similar to what was discussed in the chapter on
ATM regarding the “distortion effect.” Assuming that the underlying CoS distinctions
within the SNA network are configured properly, fully functional, and appropriately
engineered, there is still a possibility that a session that traverses the wide-area network
encapsulated in TCP may experience degradation, or worse, disconnection, when there is
instability or congestion in the TCP/IP network.
SNA CoS distinctions alone are insufficient in a multiprotocol network. The addition of
several mechanisms is necessary to ensure that TCP-encapsulated SNA sessions are
given preferential treatment at the TCP layer. This includes things already discussed,
such as congestion control, congestion avoidance, preferential packet-drop schemes,
and perhaps a queue-management mechanism to preferentially schedule these packets
for transmission.
v v v
Economists often describe economic history in terms of cycles of boom and bust.
Although we have not yet seen networking decline, let alone bust, we are witnessing
some other cyclical trends of networking. The initial deployment of computer networks
generally was dominated by single-vendor Information Technology (IT) environments and
the associated exclusive use of the vendor’s proprietary networking technologies.
Multivendor IT environments, like multiprotocol computer networks, were a gradual
refinement to this original model and were heralded by broadcast-based local area
networks and wide-area bridges. Multiprotocol routers soon appeared, and many
networks appeared to operate with a chaotic mix of TCP, SNA, IPX, DECnet, AppleTalk,
and X.25 transport protocols, with an equally chaotic mix of PAD (Protocol Access
Device), LAT (Local-Area Transport), and Telnet support for remote access.
Glossary
AAL ATM Adaptation Layer. A collection of protocols that takes data traffic and frames
it into a sequence of 48-byte payloads for transmission over an ATM (Asynchronous
Transfer Mode) network. Currently, four AAL types are defined that support various
service categories. AAL1 supports constant bit-rate connection-oriented traffic. AAL2
supports time-dependent variable bit-rate traffic. AAL3/4 supports connectionless and
connection-oriented variable bit-rate traffic. AAL5 supports connection-oriented variable
bit-rate traffic.
ABR Available Bit Rate. One of the service categories defined by the ATM Forum. ABR
supports variable bit-rate traffic with flow control. The ABR service category supports a
minimum guaranteed transmission rate and peak data rates.
ANSI American National Standards Institute. One of the American technology standards
organizations.
ARP Address Resolution Protocol. The discovery protocol used by host computer
systems to establish the correct mapping of Internet layer addresses, also known as IP
addresses, to Media Access Control (MAC) layer addresses.
ARPA Advanced Research Projects Agency. A U.S. federal research funding
agency credited with initially deploying the network now known as the Internet. The
agency was referred to as DARPA (Defense Advanced Research Projects Agency)
in the past, indicating its administrative position as an agency of the U.S. Department of
Defense.
ATM Asynchronous Transfer Mode. A data-framing and transmission architecture that
features fixed-length data cells of 53 bytes, consisting of a fixed format of a 5-byte cell
header and a 48-byte cell payload. The small cell size is intended to support high-speed
switching of multiple traffic types. The architecture is asynchronous, so there is no
requirement for clock control of the switching and transmission.
BGP Border Gateway Protocol. An Internet routing protocol used to pass routing
information between different administrative routing domains or ASs (Autonomous
Systems). The BGP routing protocol does not pass explicit topology information. Instead,
it passes a summary of reachability between ASs. BGP is most commonly deployed as
an inter-AS routing protocol.
BRI Basic Rate Interface. A user interface to an ISDN (Integrated Services Digital
Network) that consists of two 64-Kbps data channels (B-Channels) and one 16-Kbps
signaling channel (D-channel) sharing a common physical access circuit.
CBQ Class Based Queuing. A queuing methodology by which traffic is classified into
separate classes and queued according to its assigned class in an effort to provide
differential forwarding behavior for certain types of network traffic.
CBR Constant Bit Rate. An ATM service category that corresponds to a constant
bandwidth allocation for a traffic flow. The CLP (Cell Loss Priority) bit is set to 0 in all
cells to ensure that they are not discard eligible in the event of switch congestion. The
service supports circuit emulation as well as continuous bitstream traffic sources (such
as uncompressed voice or video signals).
CDV Cell Delay Variation. An ATM QoS (Quality of Service) parameter that measures
the variation in transit time of a cell over a Virtual Connection (VC). For service classes
that are jitter sensitive, this is a critical service parameter.
CIDR Classless Inter Domain Routing. An Internet routing paradigm that passes both
the network prefix and a mask of significant bits in the prefix within the routing exchange.
This supersedes the earlier paradigm of classful routing, where the mask of significant
bits is inferred from the value of the prefix (Class A network prefixes imply a mask of
8 bits, Class B network prefixes imply a mask of 16 bits, and Class C network prefixes
imply a mask of 24 bits). CIDR commonly is used to denote an Internet environment in
which no implicit assumption exists of the Class A, B, and C network addresses. BGP
(Border Gateway Protocol) version 4 is used as the de facto method of providing CIDR
support in the Internet today.
CIR Committed Information Rate. A Frame Relay term describing the minimum
information rate that the service provider commits to provide to the customer for any given
Permanent Virtual Circuit (PVC).
CLP Cell Loss Priority. A single-bit field in the ATM cell header to indicate the discard
priority. A CLP value of 1 indicates that an ATM switch can discard this cell in a
congestion condition.
CLR Cell Loss Ratio. An ATM QoS metric defined as the ratio of lost cells to the number
of transmitted cells.
CPE Customer Premise Equipment. The equipment deployed on the customer’s site
when the customer subscribes (or simply connects) to a carrier’s service.
CPU Central Processing Unit. The arithmetic, logic, and control unit of a computer that
executes instructions.
CTD Cell Transfer Delay. An ATM QoS metric that measures the transit time for a cell to
traverse a Virtual Connection (VC). The time is measured from the source UNI (User-to-Network
Interface) to the destination UNI.
D-Channel Data Channel. A full-duplex control and signaling channel on an ISDN BRI
(Basic Rate Interface) or PRI (Primary Rate Interface). The D-Channel is 16 Kbps on an
ISDN BRI and 64 Kbps on a PRI.
DE Discard Eligible. A bit field defined within the Frame Relay header indicating that a
frame can be discarded within the Frame Relay switch when the local queuing load
exceeds a configured threshold.
Dijkstra algorithm Also commonly referred to as SPF (Shortest Path First). The Dijkstra
algorithm is a single-source, shortest-path algorithm that computes all shortest paths
from a single point of reference based on a collection of link metrics. This algorithm is
used to compute path preferences in both OSPF (Open Shortest Path First) and IS-IS
(Intermediate System to Intermediate System). See also SPF.
DLC Data Link Control. Refers to IBM data-link layer support, which supports various
types of media, including mainframe channels, SDLC (Synchronous Data Link Control),
X.25, and token ring.
DLCI Data Link Connection Identifier. A numerical identifier given to the local end of a
Frame Relay Virtual Circuit (VC). The local nature of the DLCI is that it spans only the
distance between the first-hop Frame Relay switch and the router, whereas a VC spans
the entire distance of an end-to-end connection between two routers that use the Frame
Relay network for link-layer connectivity.
DLSw Data Link Switching. Provides a standards-based method for forwarding SNA
(Systems Network Architecture) traffic over TCP/IP (Transmission Control
Protocol/Internet Protocol) networks using encapsulation. DLSw provides enhancements
to traditional RSRB (Remote Source-Route Bridging) encapsulation by eliminating hop-
count limitations, removing unnecessary broadcasts and acknowledgments, and providing
flow control.
DS0 Digital Signal Level 0. A circuit-framing specification for transmitting digital signals
over a single channel at 64 Kbps on a T1 facility.
DS1 Digital Signal Level 1. A circuit-framing specification for transmitting digital signals
at 1.544 Mbps on a T1 facility in the United States, or at 2.048 Mbps on an E1 facility
elsewhere.
DS3 Digital Signal Level 3. A circuit-framing specification for transmitting digital signals
at 44.736 Mbps on a T3 facility.
DTE Data Terminal Equipment. A device on the user side of a User-to-Network Interface
(UNI). Typically, this is a computer or a router.
E1 A WAN (Wide-Area Network) transmission circuit that carries data at a rate of 2.048
Mbps. Predominantly used outside the United States.
E3 A WAN transmission circuit that carries data at a rate of 34.368 Mbps. Predominantly
used outside the United States.
EPD Early Packet Discard. A congestion-avoidance mechanism generally found in ATM
networks. EPD uses a method to preemptively drop entire AAL5 (ATM Adaptation Layer
5) frames instead of individual cells in an effort to anticipate congestion situations and
make the most economic use of explicit signaling within the ATM network.
Ethernet A LAN (Local Area Network) specification invented by the Xerox Corporation
and then jointly developed by Xerox, Intel, and Digital Equipment Corporation. Ethernet
uses CSMA/CD (Carrier Sense Multiple Access/Collision Detection) and operates on
various media types. It is similar to the IEEE 802.3 series of protocols.
FAQ Frequently Asked Questions. Compiled lists of the most frequent questions and
their answers on a particular topic. An FAQ generally can be found in various formats,
such as HTML (Hyper Text Markup Language) Web pages, as well as traditional printed
material.
FDDI Fiber Distributed Data Interface. A LAN standard defined in ANSI (American
National Standards Institute) Standard X3T9.5 that operates at 100 Mbps, uses a token-
passing technology, and uses fiber-optic cabling for physical connectivity. FDDI has a
base transmission distance of up to 2 kilometers and uses a dual-ring architecture for
redundancy.
FEP Front-End Processor. Typically, FEP describes the function of an IBM 3745, which
provides a networking interface to the SNA (Systems Network Architecture) network for
downstream nodes that have no knowledge of network data forwarding paths. The IBM
3745 FEP functions as an intermediary networking arbiter.
FIFO First In, First Out. FIFO queuing is a strict method of transmitting packets that are
presented to a device for subsequent transmission. Packets are transmitted in the order
in which they are received.
FIN FINish flag. Used in the TCP header to signal the end of TCP data.
FTP File Transfer Protocol. A bulk, TCP-based, transaction-oriented file transfer protocol
used in TCP/IP networks, especially the Internet.
Gbps Gigabits per second. The data world avoided using the term billion, which
is variously interpreted as one thousand million or one million million, in favor of the term
giga as one thousand million. Of course, some confusion between the
telecommunications and data-storage worlds still exists as to whether a giga is really the
value 10^9 or 2^30.
GCRA Generic Cell Rate Algorithm. Defines traffic
conformance for ATM VBR (Variable Bit Rate) Virtual Connections (VC). The GCRA is
an algorithm that uses traffic parameters to characterize traffic that is conformant to
administratively defined admission criteria. The GCRA implementation commonly is
referred to as a leaky bucket.
HSSI High-Speed Serial Interface. The networking standard for high-speed serial
connections for wide-area networks (WANs), accommodating link speeds up to 52 Mbps.
HTTP Hyper Text Transfer Protocol. A TCP-based application-layer protocol used for
communicating between Web servers and Web clients, also known as Web browsers.
IAB Internet Architecture Board. A collection of individuals concerned with the ongoing
architecture of the Internet. IAB members are appointed by the trustees of the Internet
Society (ISOC). The IAB also appoints members to several other organizations, such as
the IESG (Internet Engineering Steering Group).
iBGP Internal BGP or Interior BGP. A method to carry exterior routing information within
the backbone of a single administrative routing domain, obviating the need to redistribute
exterior routing into interior routing. iBGP is a unique implementation of BGP (Border
Gateway Protocol) rather than a separate protocol unto itself.
I-D Internet Draft. A draft proposal in the IETF submitted as a collaborative effort by
members of a particular working group or by individual contributors. I-Ds may or may not
be subsequently published as IETF Requests for Comments (RFCs).
IESG Internet Engineering Steering Group. IESG members are appointed by the Internet
Architecture Board (IAB) and manage the operation of the IETF.
IETF Internet Engineering Task Force. An engineering and protocol standards body that
develops and specifies protocols and Internet standards, generally in the network layer
and above. These include routing, transport, application, and occasionally, session-layer
protocols. The IETF works under the auspices of the Internet Society (ISOC).
Integrated Services In a broad sense, this term encompasses the transport of audio,
video, real-time, and classical data traffic within a single network infrastructure. In a more
narrow focus, it also refers to the Integrated Services architecture (the focus of the
Integrated Services working group in the IETF), which consists of five key components:
QoS requirements, resource-sharing requirements, allowances for packet dropping,
provisions for usage feedback, and a resource-reservation model (RSVP).
Internet The global Internet. Commonly used as a reference for the loosely
administered collection of interconnected networks around the globe that share a
common addressing structure for the interchange of traffic.
intranet Generally used as a reference for the interior of a private network, either not
connected to the global Internet or partitioned so that access to some network resources
is limited to users within the administrative boundaries of the domain.
I/O Input/Output. The process of receiving and transmitting data, as opposed to the
actual processing of the data.
IP Internet Protocol. The network-layer protocol in the TCP/IP stack used in the Internet.
IP is a connectionless protocol that provides extensibility for host and subnetwork
addressing, routing, security, fragmentation and reassembly, and as far as QoS is
concerned, a method to differentiate packets with information carried in the IP packet
header.
IP precedence A bit value that can be indicated in the IP packet header and used to
designate the relative priority with which a particular packet should be handled.
IPv4 Internet Protocol version 4. The version of the Internet protocol that is widely used
today. This version number is encoded in the first 4 bits of the IP packet header and is
used to verify that the sender, receiver, and routers all agree on the precise format of the
packet and the semantics of the formatted fields.
IPv6 Internet Protocol version 6. The version number of the IETF standardized next-
generation Internet protocol (IPng) proposed as a successor to IPv4.
IPX Internet Packet eXchange. The predominant protocol used in NetWare networks.
IPX was derived from XNS (Xerox Networking Services), a similar protocol developed by
the Xerox Corporation.
IRTF Internet Research Task Force. Composed of a number of focused and long-term
research groups, working on topics related to Internet protocols, applications,
architecture, and technology. The chair of the IRTF is appointed by the Internet
Architecture Board (IAB). The IRTF is described more fully in RFC2014.
ISDN Integrated Services Digital Network. An early adopted protocol model currently
offered by many telephone companies for digital end-to-end connectivity for voice, video,
and data.
ISO International Standards Organization. The complete name for this body is the
International Organization for Standardization and International Electrotechnical
Committee. The members of this body are the national standards bodies, such as ANSI
(American National Standards Institute) in the United States and BSI (the British
Standards Institution) in the United Kingdom. The documents produced by this body are
termed International Standards.
ISOC Internet Society. An international society of Internet users and professionals who
share a common interest in the development of the Internet.
ISP Internet Service Provider. An organization that provides external transit for a client
network or individual user, supplying the connectivity and associated services needed to
access the Internet.
ISSLL Integrated Services over Specific Link Layers. An IETF working group that
defines specifications and techniques needed to implement Internet Integrated Services
capabilities within specific subnetwork technologies, such as ATM or IEEE 802.3z
Gigabit Ethernet.
jitter The distortion of a signal as it is propagated through the network, where the signal
varies from its original reference timing. In packet-switched networks, jitter is a distortion
of the interpacket arrival times compared to the interpacket times of the original signal
transmission. Also known as delay variance.
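To make the definition concrete, the following minimal Python sketch (with illustrative
timestamps of our own, not measured data) shows how delay variance can be computed from
per-packet send and arrival times:

# Estimating jitter as the variation of per-packet transit delay.
# The timestamps below are illustrative values, not measured data.
send_times = [0.00, 0.02, 0.04, 0.06, 0.08]      # packets sent every 20 ms
recv_times = [0.10, 0.121, 0.138, 0.165, 0.179]  # arrival times after the network

delays = [r - s for r, s in zip(recv_times, send_times)]
jitter = [abs(delays[i] - delays[i - 1]) for i in range(1, len(delays))]

print("one-way delays:", [round(d, 3) for d in delays])
print("delay variation between successive packets:", [round(j, 3) for j in jitter])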
Kbps Kilobits per second. A measure of data-transfer speed. Some confusion exists as
to whether this refers to a rate of 10^3 bits per second or 2^10 bits per second. The
telecommunications industry typically uses this term to refer to a rate of 10^3 bits per
second.
LANE LAN Emulation. A technique and ATM forum specification that defines how to
provide LAN-based communications across an ATM subnetwork. LANE specifies the
communications facilities that allow ATM to be interoperable with traditional LAN-based
protocols, so that among other things, address resolution and broadcast services will
function properly.
Layer 1 Commonly used to describe the physical layer in the OSI (Open Systems
Interconnection) reference model. Examples include the copper wiring or fiber-optic
cabling that interconnects electronic devices.
Layer 2 Commonly used to describe the data-link layer in the OSI reference model.
Examples include Ethernet and ATM (Asynchronous Transfer Mode).
Layer 3 Commonly used to describe the network layer in the OSI reference model.
Examples include IP (Internet Protocol) and IPX (Internet Packet eXchange).
leaky bucket Generally, a traffic-shaping mechanism in which the input side of the
shaping mechanism is an arbitrary size, and the output side of the mechanism is of a
smaller, fixed size. This implementation has a smoothing effect on bursty traffic, because
traffic is “leaked” into the network at a fixed rate. Contrast with token bucket.
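As a rough illustration of this smoothing behavior (a sketch only; the class name, drain
rate, and burst size below are assumptions, not drawn from any standard), a leaky bucket
can be modeled as an arbitrarily sized input queue drained at a fixed rate:

from collections import deque

class LeakyBucket:
    def __init__(self, rate_pps):
        self.rate = rate_pps          # fixed drain rate, packets per second
        self.queue = deque()          # arbitrarily sized input side

    def enqueue(self, packet):
        self.queue.append(packet)     # bursts simply accumulate here

    def drain(self, seconds):
        # Release at most rate * seconds packets, regardless of burst size.
        for _ in range(min(len(self.queue), int(self.rate * seconds))):
            yield self.queue.popleft()

bucket = LeakyBucket(rate_pps=10)
for i in range(25):                   # a 25-packet burst arrives at once
    bucket.enqueue("pkt%d" % i)
print(list(bucket.drain(1.0)))        # only 10 packets leak out in the first second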
LEC Local Exchange Carrier. Usually considered the local telephone company or any
local telephony entity that provides telecommunications facilities within a local tariffing
area. See also RBOC.
LIJ Leaf Initiated Join. A feature introduced in the ATM Forum Traffic Management 4.0
Specification in which any remote node in an ATM network can connect arbitrarily to a
point-to-multipoint Virtual Connection (VC) without explicitly signaling the VC originator.
LLC Logical Link Control. The higher of the two sublayers of the data-link layer defined by
the IEEE. The LLC sublayer handles flow control, error correction, framing, and MAC-
sublayer addressing. See also MAC.
MAC Media Access Control. The lower of the two sublayers of the data-link layer
defined by the IEEE. The MAC sublayer handles access to shared media, such as
Ethernet and token ring, using methods such as media contention or token passing.
See also LLC.
maxCTD Maximum Cell Transfer Delay. An ATM QoS metric that specifies the maximum transit
time for a cell to traverse a VC (Virtual Connection). The time is measured from the
source UNI (User-to-Network Interface) to the destination UNI.
Mbps Megabits per second. A unit of data transfer. The communications industry
typically refers to mega as the value 10^6, whereas the data-storage industry uses the
same term to refer to the value 2^20.
MBS Maximum Burst Size. An ATM QoS metric describing the number of cells that may
be transmitted at the peak rate while remaining within the Generic Cell Rate Algorithm
(GCRA) threshold of the service contract.
MCR Minimum Cell Rate. An ATM service parameter related to the ATM Available Bit
Rate (ABR) service. The allowed cell rate can vary between the Minimum Cell Rate
(MCR) and the Peak Cell Rate (PCR) to remain in conformance with the service.
MPOA Multi Protocol Over ATM. An ATM Forum standard specifying how multiple
network-layer protocols can operate over an ATM substrate.
MSS Maximum Segment Size. A TCP option in the initial TCP SYN (Synchronize
Sequence Numbers) three-way handshake that specifies the maximum size of a TCP
data packet that the remote end can send to the receiver. The resultant TCP data-packet
size is normally 40 bytes larger than the MSS: 20 bytes of IP header and 20 bytes of
TCP header.
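A small worked example of this arithmetic (using the common, though not universal,
Ethernet MTU of 1,500 bytes as an illustrative value) follows:

# The on-the-wire TCP/IPv4 packet is normally the MSS plus 20 bytes of IP header
# and 20 bytes of TCP header, so an MSS is typically derived from the link MTU by
# subtracting 40 bytes.
ETHERNET_MTU = 1500
IP_HEADER = 20
TCP_HEADER = 20

mss = ETHERNET_MTU - IP_HEADER - TCP_HEADER
packet_size = mss + IP_HEADER + TCP_HEADER

print("advertised MSS:", mss, "bytes")            # 1460
print("resulting packet size:", packet_size)      # 1500, i.e. 40 bytes larger than the MSS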
MTU Maximum Transmission Unit. The maximum size of a data frame that can be
carried across a data-link layer. Every host and router interface has an associated MTU
related to the physical media to which the interface is connected, and an end-to-end
network path has an associated MTU that is the minimum of the individual-hop MTUs
within the path.
NANOG North American Network Operators Group. A group of Internet operators who
share a mailing list. A subset of this group meets regularly in North America. The
conversation on the mailing list ranges from the pertinent to the inane. The defining
characteristic of the group remains the conspicuous absence of suits and ties.
NAS Network Access Server. A modernized and “kinder, gentler” form of its precursor,
the terminal server. In other words, a device used to terminate dial-up access to a
network. Predominantly used for analog or digital dial-up PPP (Point-to-Point Protocol)
access services.
NBMA Non Broadcast Multi Access. Describes a multiaccess network that does not
support broadcasting or on which broadcasting is not feasible.
NetWare A Novell, Inc. network operating system still largely popular in the corporate
enterprise. The use of NetWare is experiencing somewhat of a decline because of the
popularity and critical success of TCP/IP. IPX (Internet Packet eXchange) is the principal
protocol used in NetWare networks.
NGI Next Generation Internet. An obligatory inclusion in every current network research
proposal. Also used as a reference for a U.S. government sponsored advanced Internet
research initiative, called the Next Generation Internet Initiative, which is somewhat
controversial.
NHOP Next Hop. Referenced as an object within the Integrated Services architecture
protocol specifications.
NHRP Next Hop Resolution Protocol. A protocol used by systems in an NBMA (Non
Broadcast Multi Access) network to dynamically discover the MAC address of other
connected systems.
NLRI Network Layer Reachability Information. Information carried within BGP updates
that includes network-layer information about the routing-table entries and associated
previous hops, annotated as prefixes (IP addresses).
NMS Network Management System. The distant dream of many a network operations
manager: a computer system that understands the network so well that it can warn the
operator of impending disaster (humor implied).
NNI Network-to-Network Interface. An ATM Forum standard that defines the interface
between two ATM switches operated by the same public or private network operator. The
term also is used within Frame Relay to define the interface between two Frame Relay
switches in a common public or private network.
NNTP Network News Transfer Protocol. An application protocol used to support the
transfer of network news (Usenet) within the Internet. The protocol is used for bulk news
transfer and remote access from clients to a central server. NNTP uses TCP to support
reliable transfer. This protocol is a point-to-point transfer protocol. Efforts to move to a
reliable multicast structure for Usenet news are still an active area of protocol refinement
and research.
NOC Network Operations Center. The people you try to ring when your network is down.
Traditionally staffed 24 hours a day, 7 days a week, the NOC primarily logs network-
problem reports and attempts to redirect responsibility for a particular network problem to
the appropriate responsible party for resolution. The NOC is analogous to a Help Desk.
nrt-VBR Non-Real-Time Variable Bit Rate. One of two variable-bit rate ATM service
categories in which timing information is not crucial. Generally used for delay-tolerant
applications with bursty characteristics.
NSF National Science Foundation. A U.S. government agency that funds U.S. scientific
research programs. This agency funded the operation of the academic and research
NSFnet (a successor of the ARPAnet and a predecessor to the current commodity
Internet) network from 1986 until 1995.
OSPF Open Shortest Path First. An interior gateway routing protocol that uses a link-
state protocol coupled with a Shortest Path First (SPF) path-selection algorithm. The
OSPF protocol is widely deployed as an interior routing protocol within administratively
discrete routing domains.
PAP PPP Authentication Protocol. A protocol that allows peers connected by a PPP link
to authenticate each other using the simple exchange of a username and password.
PCR Peak Cell Rate. An ATM service parameter. PCR is the maximum value of the
transmission rate of traffic on an Available Bit Rate (ABR) service category Virtual
Connection (VC).
PPP Point-to-Point Protocol. A data-link framing protocol used to frame data packets on
point-to-point links. PPP is a variant of the HDLC (High-Level Data Link Control) data-link
framing protocol. The PPP specification also includes remote-end identification and
authentication (PAP and CHAP), a link-control protocol (to establish, configure, and test
the integrity of data transmitted on the link), and a family of network-control protocols
specific to different network-layer protocols.
PRA Primary Rate Access. Commonly used as an off-hand reference for ISDN PRI
network access.
PSTN Public Switched Telephone Network. A generic term referring to the public
telephone network architecture.
QoS Quality of Service. Read this book and find out. Better yet, buy your own copy and
then read it.
QoSR Quality of Service Routing. A dynamic routing protocol that has expanded its
path-selection criteria to consider issues such as available bandwidth, link and end-to-
end path utilization, node-resource consumption, delay and latency, and induced jitter.
RBOC Regional Bell Operating Company. Specific to the United States. Basically, the
terms LEC (Local Exchange Carrier) and RBOC are interchangeable. RBOCs were
formed in 1984 with the breakup of AT&T. RBOCs handle local telephone service, while
AT&T and other long-distance companies, such as MCI and Sprint, handle long-distance
and international calling. The seven original RBOCs after the AT&T breakup were Bell
Atlantic, Southwestern Bell (recently changed to SBC, which acquired Pacific Bell on
April 1, 1996), NYNEX (recently merged with Bell Atlantic), Pacific Bell (bought by SBC),
Bell South, Ameritech, and U.S. WEST. Independent telephone companies also exist,
such as GTE, that cover particular areas of the United States. The current landscape in
the United States is still evolving. The Telecommunications Act of 1996 now
allows both RBOCs and long-distance companies to sell local, long-distance, and
international services.
RFC Request For Comments. RFCs are documents produced by the IETF for the
purpose of documenting IETF protocols, operational procedures, and similarly related
technologies.
routing The process of calculating network topology and path information based on the
network-layer information contained in packets. Contrast with bridging.
RSRB Remote Source-Route Bridging. A method for encapsulating SNA traffic into TCP
for reliable transport, and the capability to be routed over a wide-area network (WAN).
RTT Round Trip Time. The time required for data traffic to travel from its origin to its
destination and back again.
rt-VBR Real-Time Variable Bit Rate. One of the two variable-bit rate ATM service
categories in which timing information is indeed critical. Generally used for delay-
intolerant applications with bursty transmission characteristics.
SBM Subnet Bandwidth Manager. A proposal in the IETF for handling resource
reservations on shared and switched IEEE 802-style local-area media. See also DSBM.
SCR Sustained Cell Rate. An ATM traffic parameter that specifies the average rate at
which ATM cells may be transmitted over a given Virtual Connection (VC).
SDH Synchronous Digital Hierarchy. The European standard that defines a set of
transmission and framing standards for transmitting optical signals over fiber-optic
cabling. Similar to the SONET standards developed by Bellcore.
SDLC Synchronous Data-Link Control. A serial, bit-oriented, full-duplex, SNA data-link
layer communications protocol. Precursor to several similar protocols, including HDLC
(High-Level Data Link Control).
SECBR Severely Errored Cell Block Rate. An ATM error parameter used to measure the
ratio of badly formatted cell blocks (or AAL frames) to blocks that have been received
error-free.
SLA Service Level Agreement. Generally, a service contract between a network service
provider and a subscriber guaranteeing a particular service’s quality characteristics.
SLAs vary from one provider to another and usually are concerned with network
availability and data-delivery reliability. A violation of an SLA by a service provider may
result, for example, in a prorated service rate for the subscriber’s next billing period as
compensation for the provider’s failure to meet the terms of the SLA.
SMTP Simple Mail Transfer Protocol. The Internet standard protocol for transferring
electronic mail.
SNA Systems Network Architecture. General reference for the large, complex network
systems architecture developed by IBM in the 1970s.
SPF Shortest Path First. Also commonly referred to as the Dijkstra algorithm. SPF is a
single-source, shortest-path algorithm that computes all shortest paths from a single
point of reference based on a collection of link metrics. This algorithm is used to compute
path preferences in both OSPF and IS-IS. See also Dijkstra algorithm.
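For readers who want to see the computation itself, here is a compact Python sketch of the
SPF (Dijkstra) calculation over a small topology of our own invention (the node names and
link metrics are illustrative assumptions):

import heapq

def spf(links, source):
    # links: {node: [(neighbor, metric), ...]}
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                          # stale queue entry
        for neighbor, metric in links.get(node, []):
            candidate = d + metric
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

topology = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(spf(topology, "A"))   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}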
SRB Source-Route Bridging. A method of bridging developed by IBM and used in token
ring networks, where the entire route to the destination is determined prior to the
transmission of the data. Contrast with transparent bridging.
SVC Switched Virtual Connection or Switched Virtual Circuit. A Virtual Circuit (VC)
dynamically established in response to UNI (User-to-Network Interface) signaling and
torn down in the same fashion.
SYN SYNchronize sequence numbers flag. A bit field in the TCP header used to
negotiate TCP session establishment.
T3 A WAN transmission circuit that carries DS3-formatted data at a rate of 44.736 Mbps.
Predominantly used within the United States.
TCP Transmission Control Protocol. TCP is a reliable, connection- and byte-oriented
transport layer protocol within the TCP/IP protocol suite. TCP packetizes data into
segments, provides for packet sequencing, and provides end-to-end flow control. TCP is
used by many of the popular application-layer protocols, such as HTTP, Telnet, and FTP.
TLV Type, Length, Value. A common encoding convention in IETF protocol formats in
which each object carries a field indicating its type, a field indicating its length, and
the value itself.
TOS Type of Service. A bit field in the IP packet header designed to contain values
indicating how each packet should be handled in the network. This particular field has
never been used much, though.
transparent bridging A method of bridging used in Ethernet and IEEE 802.3 networks
by which frames are forwarded along one hop at a time, based on forwarding information
at each hop. Transparent bridging gets its name from the fact that the bridges
themselves are transparent to the end-systems. Contrast with SRB (Source-Route
Bridging).
TTL Time To Live. A field in an IP packet header that indicates how long the packet is
valid. The TTL value is decremented at each hop, and when the TTL equals 0, the
packet no longer is considered valid, because it has exceeded its maximum hop count.
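A toy illustration of this rule (not actual router code; the hop names and starting TTL
value below are arbitrary assumptions) follows:

def forward(packet, hops):
    # Each hop decrements the TTL; a packet whose TTL reaches 0 is discarded.
    for hop in hops:
        packet["ttl"] -= 1
        if packet["ttl"] <= 0:
            print("packet dropped at %s: TTL expired" % hop)
            return False
        print("%s: forwarding, TTL now %d" % (hop, packet["ttl"]))
    return True

forward({"ttl": 3}, ["router1", "router2", "router3", "router4"])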
UBR Unspecified Bit Rate. An ATM service category used for best-effort traffic. The
UBR service category provides no QoS controls, and all cells are marked with the Cell
Loss Priority (CLP) bit set. This indicates that all cells may be dropped in the case of
network congestion.
UNI User-to-Network Interface. Commonly used to refer to the ATM Forum specification
for ATM signaling between a user-based device, such as a router or similar end-system,
and the ATM switch.
UPC Usage Parameter Control. A reference to the traffic policing done on ATM traffic at
the ingress ATM switch. UPC is performed at the ATM UNI level and in conjunction with
the GCRA (Generic Cell Rate Algorithm) implementation.
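The policing decision itself is commonly described in terms of the GCRA’s virtual-scheduling
form. The following Python sketch is illustrative only; the parameter names and cell arrival
times are our own assumptions:

def gcra(arrivals, T, tau):
    # T is the expected cell interarrival increment, tau the tolerance.
    # Cells arriving earlier than the theoretical arrival time minus tau
    # are tagged as non-conforming.
    tat = 0.0                                   # theoretical arrival time
    results = []
    for t in arrivals:
        if t < tat - tau:
            results.append((t, "non-conforming"))
        else:
            tat = max(t, tat) + T
            results.append((t, "conforming"))
    return results

# Cells expected every 10 time units with a tolerance of 2; the tight burst
# at the end exceeds the contract and is tagged non-conforming.
print(gcra([0, 10, 20, 21, 22, 23], T=10, tau=2))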
VBR Variable Bit Rate. An ATM service characterization for traffic that is bursty by
nature or is variable in the average, peak, and minimum rates in which data is
transmitted. There are two service categories for VBR traffic: Real-Time and Non-Real-
Time VBR. See also rt-VBR and nrt-VBR.
VCI Virtual Connection Identifier or Virtual Circuit Identifier. A numeric identifier used to
identify the local end of an ATM VC. The local nature of the VCI is that it spans only the
distance between the first-hop ATM switch and the end-system (e.g., router), whereas a
VC spans the entire distance of an end-to-end connection between two routers that use
the ATM network for link-layer connectivity.
VLAN Virtual Local Area Network or Virtual LAN. A networking architecture that allows
end-systems on topologically disconnected subnetworks to appear to be connected on the
same LAN. Predominantly used in reference to ATM networking. Similar in functionality
to bridging.
VP Virtual Path. A connectivity path between two end-systems across an ATM switching
fabric. Similar to a VC. However, a VP can carry several VCs within it. Contrast with VC.
VPDN Virtual Private Dial Network. A VPN tailored specifically for dial-up access. A
more recent example of this is L2TP (Layer 2 Tunneling Protocol), where tunnels are
created dynamically when subscribers dial into the network, and the subscriber’s initial
Layer 3 connectivity is terminated on an arbitrary tunnel end-point device that is
predetermined by the network administrator.
VPI Virtual Path Identifier. A numeric identifier used to identify the local end of an ATM
VP. The local nature of the VPI is that it spans only the distance between the first-hop
ATM switch and the end-system (e.g., router), whereas a VP itself spans the entire
distance of an end-to-end connection between two routers that use the ATM network for
link-layer connectivity.
VPN Virtual Private Network. A network that can exist discretely on a physical
infrastructure consisting of multiple VPNs, similar to the “ships in the night” paradigm.
There are many ways to accomplish this, but the basic concept is that many individual,
discrete networks may exist on the same infrastructure without knowledge of one
another’s existence.
WAN Wide-Area Network. A network environment where the elements of the network
are located at significant distances from each other, and the communications facilities
typically use carrier facilities rather than private wiring. Typically, the assistance of a
routing protocol is required to support communications between two distant host systems
on a WAN.
WFQ Weighted Fair Queuing. A combination of two distinct concepts: fair queuing and
preferential weighting. WFQ allows multiple queues to be defined for arbitrary traffic
flows, so that no one flow can starve other, lesser flows of network resources. The
weighting component in WFQ is that the administrator can set the size of each queue and
designate which traffic is identified with a particular-sized queue.
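A much-simplified sketch of the idea (illustrative only; real WFQ implementations differ in
important details, and the flow names and weights here are assumptions): each packet is
stamped with a virtual finish time of previous_finish + size / weight, and packets are
served in finish-time order, so a heavy flow cannot starve a lighter one.

import heapq
from itertools import count

class SimpleWFQ:
    def __init__(self, weights):
        self.weights = weights                  # {flow: weight}
        self.finish = {f: 0.0 for f in weights} # last virtual finish time per flow
        self.heap = []
        self.seq = count()                      # tie-breaker for equal finish times

    def enqueue(self, flow, size):
        f = self.finish[flow] + size / self.weights[flow]
        self.finish[flow] = f
        heapq.heappush(self.heap, (f, next(self.seq), flow, size))

    def dequeue(self):
        _, _, flow, size = heapq.heappop(self.heap)
        return flow, size

wfq = SimpleWFQ({"bulk": 1, "interactive": 4})
for _ in range(4):
    wfq.enqueue("bulk", 1500)           # large bulk packets
    wfq.enqueue("interactive", 100)     # small interactive packets
print([wfq.dequeue()[0] for _ in range(8)])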
WRED Weighted Random Early Detection or Weighted RED. A variant of the standard
RED mechanism for routers, in which the threshold for packet discard varies according to
the IP precedence level of the packet. The weighting is such that RED is activated at
higher queue-threshold levels for higher-precedence packets.
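As a rough sketch of how the weighting might work (the threshold values and maximum drop
probability below are illustrative assumptions, not taken from any particular
implementation), higher-precedence packets are subjected to random drop only at deeper
queue occupancies:

import random

THRESHOLDS = {0: (10, 30), 3: (20, 40), 5: (30, 50)}   # precedence: (min, max)

def wred_drop(queue_depth, precedence, max_drop_prob=0.1):
    min_th, max_th = THRESHOLDS[precedence]
    if queue_depth < min_th:
        return False                                    # below threshold: never drop
    if queue_depth >= max_th:
        return True                                     # above threshold: always drop
    # Between thresholds: drop probability rises linearly toward max_drop_prob.
    p = max_drop_prob * (queue_depth - min_th) / (max_th - min_th)
    return random.random() < p

for prec in (0, 3, 5):
    print(prec, wred_drop(queue_depth=35, precedence=prec))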
WWW World Wide Web. A global collection of Web servers interconnected by the
Internet that use HTTP (the Hyper Text Transfer Protocol).
Bibliography
[AF1995a] The ATM Forum Technical Committee, “BISDN Inter Carrier Interface (B-ICI)
Specification Version 2.0 (Integrated),” af-bici-0013.003, December 1995.
[AF1995b] The ATM Forum Technical Committee, “LAN Emulation over ATM Version 1.0
Specification,” af-lane-0021.000, January 1995.
[AF1997a] “ATM Service Categories: The Benefit to the User,” Editor: Livio Lambarelli,
CSELT, for the ATM Forum.
www.atmforum.com/atmforum/library/service_categories.html
[AF1997b] The ATM Forum Technical Committee, “Multi Protocol Over ATM (MPOA)
Specification Version 1.0,” af-mpoa-0087.000, July 1997.
[Cisco1995] “Internetworking Terms and Acronyms,” September 1995, Text Part Number
78-1419-02, Cisco Systems, Inc.
[Floyd1993] “Random Early Detection Gateways for Congestion Avoidance,” S. Floyd, V.
Jacobson, IEEE/ACM Transactions on Networking, v. 1, n. 4, August 1993.
[ID1996g] IETF Internet Draft, “BGP Route Calculation Support for RSVP, ” draft-
zappala-bgp-rsvp-01.txt, D. Zappala, June 1996.
[ID1996h] IETF Internet Draft, “RSRR: A Routing Interface for RSVP,” draft-ietf-rsvp-
routing-01.txt, D. Zappala, November 1996.
[ID1997e] IETF Internet Draft, “Partial Service Deployment in the Integrated Services
Architecture,” draft-ietf-rsvp-partial-service-00.txt, L. Breslau, S. Shenker, April 1997.
[ID1997h] IETF Internet Draft, “Open Outsourcing Policy Service (OOPS) for RSVP,”
draft-ietf-rsvp-policy-oops-01.txt, S. Herzog, D. Pendarakis, R. Rajan, R. Guerin, April
1997.
[ID1997i] IETF Internet Draft, “RSVP Extensions for Policy Control,” draft-ietf-rsvp-policy-
ext-02.txt, S. Herzog, April 1997.
[ID1997j] IETF Internet Draft, “Providing Integrated Services over Low-Bitrate Links,”
draft-ietf-issll-isslow-02.txt, C. Bormann, May 1997.
[ID1997k] IETF Internet Draft, “Network Element Service Specification for Low Speed
Networks,” draft-ietf-issll-isslow-svcmap-00.txt, S. Jackowski, May 1997.
[ID1997l] IETF Internet Draft, “Flow Grouping for Reducing Reservation Requirements
For Guaranteed Delay Service,” draft-rampal-flow-delay-service-01.txt, S. Rampal, R.
Guerin, July 1997.
[ID1997m] IETF Internet Draft, “A Framework for Integrated Services and RSVP over
ATM,”draft-ietf-issll-atm-framework-00.txt, E. Crawley, L. Berger, S. Berson, F. Baker, M.
Borden, J. Krawczyk, July 1997.
[ID1997n] IETF Internet Draft, “Issues for Integrated Services and RSVP over ATM,”
draft-ietf-issll-isatm-issues-00.txt, E. Crawley, L. Berger, S. Berson, F. Baker, M. Borden,
J. Krawczyk, July 1997.
[ID1997p] IETF Internet Draft, “NBMA Next Hop Resolution Protocol (NHRP),” draft-ietf-
rolc-nhrp-11.txt, D. Katz, D. Piscitello, B. Cole, J. Luciani, March 1997.
[ID1997q] IETF Internet Draft, “A Framework for Providing Integrated Services over
Shared Media and Switched LAN Technologies,” draft-ietf-issll-is802-framework-02.txt,
A. Ghanwani, J. W. Pace, V. Srinivasan, May 1997.
[ID1997r] IETF Internet Draft, “SBM (Subnet Bandwidth Manager): A Proposal for
Admission Control over IEEE 802-style networks,” draft-ietf-issll-is802-sbm-04.txt, R.
Yavatkar, D. Hoffman, Y. Bernet, F. Baker, July 1997.
[ID1997s] IETF Internet Draft, “Integrated Service Mappings on IEEE 802 Networks,”
draft-ietf-issll-is802-svc-mapping-00.txt, M. Seaman, A. Smith, E. Crawley, July 1997.
[ID1997t] IETF Internet Draft, “A Framework for QoS-based Routing in the Internet,”
draft-ietf-qosr-framework-01.txt, E. Crawley, R. Nair, B. Rajagopalan, H. Sandick, July
1997.
[ID1997u] IETF Internet Draft, “QoS Routing Mechanisms and OSPF Extensions,” draft-
guerin-qos-routing-ospf-01.txt, R. Guerin, S. Kamat, A. Orda, T. Przygienda, D. Williams,
March 1997.
[ID1997v] IETF Internet Draft, “Protocol for Exchange of PoliCy Information (PEPCI),” J.
Boyle, R. Cohen, L. Cunningham, D. Durham, A. Sastry, R. Yavatkar, July 1997.
[ID1997w] IETF Internet Draft, “A Framework for Multiprotocol Label Switching,” draft-ietf-
mpls-framework-00.txt, R. Callon, P. Doolan, N. Feldman, A. Fredette, G. Swallow, A.
Viswanathan, May 1997.
[ID1997y] IETF Internet Draft, “QoS Path Management with RSVP,” draft-guerin-qospath-
mgmt-rsvp-00.txt, R. Guerin, S. Kamat, S. Herzog, March 1997.
[ID1997z] IETF Internet Draft, “Setting up Reservations on Explicit Paths using RSVP,”
draft-guerin-expl-path-rsvp-00.txt, R. Guerin, S. Kamat, E. Rosen, July 1997.
[ID1997a1] IETF Internet Draft, “Layer Two Tunneling Protocol ‘L2TP,’” draft-ietf-pppext-
l2tp-04.txt, K. Hamzeh, T. Kolar, M. Littlewood, G. Singh Pall, J. Taarud, A. Valencia, W.
Verthein, June 1997.
[ID1997a2] IETF Internet Draft, “The Multi-Class Extension to Multi-Link PPP,” draft-ietf-
issll-isslow-mcml-02.txt, C. Bormann, May 1997.
[ID1997a3] IETF Internet Draft, “Internet Protocol, Version 6 (IPv6) Specification,” draft-
ietf-ipngwg-ipv6-spec-v2-00.txt, S. Deering, R. Hinden, July 1997.
[ID1997a4] IETF Internet Draft, “Soft State Switching: A Proposal to Extend RSVP for
Switching RSVP Flows,” draft-viswanathan-mpls-rsvp-00.txt, A. Viswanathan, V.
Srinivasan, S. Blake, August 1997.
[ID1997a5] IETF Internet Draft, “Use of Label Switching With RSVP,” draft-davie-mpls-
rsvp-00.txt, B. Davie, Y. Rekhter, E. Rosen, May 1997.
[IEEE-2] “Supplement to MAC Bridges: Traffic Class Expediting and Dynamic Multicast
Filtering,” IEEE P802.1p/D6, May 1997.
[IEEE-3] “Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access
Method and Physical Layer Specifications,” ANSI/IEEE Std. 802.3-1985.
[IEEE-4] “Token-Ring Access Method and Physical Layer Specifications,” ANSI/IEEE std.
802.5-1995.
[IEEE-5] “Draft Standard for Virtual Bridged Local Area Networks,” IEEE P802.1Q/D6,
May 1997.
[IETF1988] RFC1058, “Routing Information Protocol,” C. Hedrick, June 1988. See also:
RFC1923, “RIPv1 Applicability Statement for Historic Status,” J. Halpern, S. Bradner,
March 1996.
[IETF1990b] RFC1195, “Use of OSI IS-IS for Routing in TCP/IP and Dual Environments,”
R. Callon, December 1990.
[IETF1992b] RFC1349, “Type of Service in the Internet Protocol Suite,” P. Almquist, July
1992.
[IETF1993a] RFC1492, “An Access Control Protocol, Sometimes Called TACACS,” C.
Finseth, July 1993.
[IETF1994c] RFC1577, “Classical IP and ARP over ATM,” M. Laubach, January 1994.
[IETF1995d] RFC1809, “Using the Flow Label Field in IPv6,” C. Partridge, June 1995.
[IETF1996b] RFC2022, “Support for Multicast over UNI3.0/3.1 based ATM Networks,” G.
Armitage, November 1996.
[IETF1997c] RFC2068, “Hypertext Transfer Protocol—HTTP/1.1,” R. Fielding, J. Gettys,
J. Mogul, H. Frystyk, T. Berners-Lee, January 1997.
[IETF1997j] RFC2210, “The Use of RSVP with IETF Integrated Services,” J. Wroclawski,
September 1997.
[INTSERVa] The IETF Integrated Services Working Group charter can be found at
www.ietf.org/html.charters/intserv-charter.html.
[Nagle1985] RFC970, “On Packet Switches with Infinite Storage,” J. Nagle, December
1985.
[Partridge1994b] “Gigabit Networking,” by Craig Partridge, page 276, section 12.4,
“Weighted Fair Queuing,” Addison-Wesley Publishing, 1994, ISBN 0-201-56333-9.