Chapter 4: Resource Monitoring and Management
ii. The data produced from monitoring is analyzed and a course of action (normally performance
tuning and/or the procurement of additional hardware) is taken to resolve the problem
iii. Monitoring to ensure that the performance problem has been resolved
Because of this, performance monitoring tends to be relatively short-lived in duration and more
detailed in scope.
Note: System performance monitoring is an iterative process, with these steps
being repeated several times to arrive at the best possible system performance.
The primary reason for this is that system resources and their utilization tend
to be highly interrelated, meaning that often the elimination of one resource
bottleneck uncovers another one.
The following subsections explore the types of utilization information that would be helpful for each of
the major resource types.
4.1.1.1. Monitoring CPU Power
In its most basic form, monitoring CPU power can be no more difficult than determining if CPU
utilization ever reaches 100%. If CPU utilization stays below 100%, no matter what the system is
doing, there is additional processing power available for more work.
However, it is a rare system that does not reach 100% CPU utilization at least some of the time. At that
point it is important to examine more detailed CPU utilization data. By doing so, it becomes possible to
start determining where the majority of your processing power is being consumed. Here are some of
the more popular CPU utilization statistics:
➢ User Versus System
➢ Context Switches
➢ Interrupts
➢ Runnable Processes
A process may be in different states. For example, it may be:
➢ Waiting for an I/O operation to complete
➢ Waiting for the memory management subsystem to handle a page fault
In these cases, the process has no need for the CPU.
However, eventually the process state changes, and the process becomes runnable. As the name
implies, a runnable process is one that is capable of getting work done as soon as it is scheduled to
receive CPU time. However, if more than one process is runnable at any given time, all but one
(assuming a single-processor computer system) of the runnable processes must wait for their turn at the
CPU. By monitoring the number of runnable processes, it is possible to determine how CPU-bound
your system is.
Other performance metrics that reflect an impact on CPU utilization tend to include different services
the operating system provides to processes. They may include statistics on memory management, I/O
processing, and so on. These statistics also reveal that, when system performance is monitored, there
are no boundaries between the different statistics. In other words, CPU utilization statistics may end up
pointing to a problem in the I/O subsystem, or memory utilization statistics may reveal an application
design flaw.
Therefore, when monitoring system performance, it is not possible to examine any one statistic in
complete isolation; only by examining the overall picture is it possible to extract meaningful
information from any performance statistics you gather.
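The statistics above can also be sampled from a script rather than a GUI tool. The following is a minimal sketch, assuming the third-party Python psutil package is installed (an assumption, not something this chapter prescribes); it reports user versus system time, context switches, interrupts, and the number of runnable processes.

    import psutil

    # Share of CPU time spent in user code versus the operating system,
    # sampled over a one-second interval.
    times = psutil.cpu_times_percent(interval=1)
    print(f"user: {times.user}%  system: {times.system}%  idle: {times.idle}%")

    # Cumulative context switches and interrupts since boot.
    stats = psutil.cpu_stats()
    print(f"context switches: {stats.ctx_switches}  interrupts: {stats.interrupts}")

    # Runnable processes: those currently in the "running" state.
    runnable = [p.pid for p in psutil.process_iter(['status'])
                if p.info['status'] == psutil.STATUS_RUNNING]
    print(f"runnable processes: {len(runnable)}")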
4.1.1.2. Monitoring Bandwidth
Monitoring bandwidth is more difficult than monitoring the other resources described here because
performance statistics tend to be device-based, while most of the places where bandwidth is important
tend to be the buses that connect devices. In those instances where more than one device shares a
common bus, you might see reasonable statistics for each device, but the aggregate load those devices
place on the bus would be much greater.
Another challenge to monitoring bandwidth is that there can be circumstances where statistics for the
devices themselves may not be available. This is particularly true for system expansion buses and
datapaths. However, even though 100% accurate bandwidth-related statistics may not always be
available, there is often enough information to make some level of analysis possible, particularly when
related statistics are taken into account.
Some of the more common bandwidth-related statistics are:
➢ Bytes received/sent: Network interface statistics provide an indication of the bandwidth
utilization of one of the more visible buses -- the network.
➢ Interface counts and rates: These network-related statistics can give indications of excessive
collisions, transmit and receive errors, and more. Through the use of these statistics (particularly
if the statistics are available for more than one system on your network), it is possible to
perform a modicum of network troubleshooting even before the more common network
diagnostic tools are used.
➢ Transfers per Second: Normally collected for block I/O devices, such as disk and high-
performance tape drives, this statistic is a good way of determining whether a particular device's
bandwidth limit is being reached. Due to their electromechanical nature, disk and tape drives
can only perform so many I/O operations every second; their performance degrades rapidly as
this limit is reached.
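As a rough illustration of how such counters can be gathered, the sketch below again assumes the third-party Python psutil package; it computes per-interface byte rates and per-disk transfers per second from two snapshots taken one second apart.

    import time
    import psutil

    # Take two snapshots of the cumulative counters, one second apart.
    net_before = psutil.net_io_counters(pernic=True)
    disk_before = psutil.disk_io_counters(perdisk=True)
    time.sleep(1)
    net_after = psutil.net_io_counters(pernic=True)
    disk_after = psutil.disk_io_counters(perdisk=True)

    # Bytes received/sent per second and error counts for each network interface.
    for nic, after in net_after.items():
        before = net_before[nic]
        print(f"{nic}: {after.bytes_recv - before.bytes_recv} B/s in, "
              f"{after.bytes_sent - before.bytes_sent} B/s out, "
              f"errors in/out: {after.errin}/{after.errout}")

    # Transfers per second for each block device.
    for disk, after in disk_after.items():
        before = disk_before[disk]
        tps = (after.read_count - before.read_count) + (after.write_count - before.write_count)
        print(f"{disk}: {tps} transfers/s")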
4.1.1.3. Monitoring Memory
If there is one area where a wealth of performance statistics can be found, it is in the area of monitoring
memory utilization. Due to the inherent complexity of today's demand-paged virtual memory operating
systems, memory utilization statistics are many and varied. It is here that the majority of a system
administrator's work with resource management takes place.
The following statistics represent a cursory overview of commonly-found memory management
statistics:
➢ Page Ins/Page Outs: These statistics make it possible to gauge the flow of pages from system
memory to attached mass storage devices (usually disk drives). High rates for both of these
statistics can mean that the system is short of physical memory and is thrashing, or spending
more system resources on moving pages into and out of memory than on actually running
applications.
➢ Active/Inactive Pages: These statistics show how heavily memory-resident pages are used. A
lack of inactive pages can point toward a shortage of physical memory.
➢ Free, Shared, Buffered, and Cached Pages: These statistics provide additional detail over the
more simplistic active/inactive page statistics. By using these statistics, it is possible to
determine the overall mix of memory utilization.
➢ Swap Ins/Swap Outs: These statistics show the system's overall swapping behavior. Excessive
rates here can point to physical memory shortages.
Successfully monitoring memory utilization requires a good understanding of how demand-paged
virtual memory operating systems work, which alone could take up an entire book.
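To make these statistics concrete, here is a minimal sketch, once more assuming the third-party Python psutil package; it samples the memory and swap counters described above (the swap-in/swap-out figures are cumulative since boot, and some fields are only populated on Linux-style systems).

    import psutil

    # Overall memory mix: total, available, and percentage in use.
    vm = psutil.virtual_memory()
    print(f"total: {vm.total}  available: {vm.available}  used: {vm.percent}%")

    # Swap activity: cumulative bytes swapped in and out since boot.
    # High, steadily rising values can point to a physical memory shortage.
    swap = psutil.swap_memory()
    print(f"swap used: {swap.percent}%  swapped in: {swap.sin}  swapped out: {swap.sout}")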
The system stack continues below the software. Hardware issues can be prevented through hardware
monitoring: you will need to monitor servers, network devices, interface performance, and network
link capacity. In short, you need to monitor many different types of interacting system elements to keep
your IT services running smoothly.
You are probably familiar with pressing Ctrl+Alt+Delete on your keyboard. Before Windows Vista was
released, this shortcut took you directly to Task Manager. Starting with Windows Vista, pressing
Ctrl+Alt+Delete leads to the Windows Security interface, which provides options for locking your PC,
switching users, signing out, changing a password, and running Task Manager.
The quickest way to start Task Manager is to press Ctrl+Shift+Esc, which takes you directly to it.
If you prefer using a mouse over a keyboard, one of the quickest ways to launch Task Manager is to
right-click any blank area on the taskbar and select Task Manager; it takes just two clicks.
You can also run Task Manager by hitting Windows+R to open the Run box, typing taskmgr and
then hitting Enter or clicking OK.
In fact, you can also open Task Manager from the Start menu, from Windows Explorer, or by creating a shortcut.
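Task Manager can also be started from a script. The following trivial Python sketch simply launches the same taskmgr executable the Run box uses; it assumes a Windows system.

    import subprocess

    # Equivalent to typing "taskmgr" in the Run box (Windows+R).
    subprocess.Popen(["taskmgr"])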
The Performance tab, available in all versions of Windows, is a summary of what is going on, overall,
with your major hardware components, including CPU, memory, disk, Wi-Fi, and network usage. It
displays how much of the computer's available system resources are being used, so you can check this
valuable information at a glance.
For example, this tab makes it easy to see your CPU model and maximum speed, RAM slots in use,
disk transfer rate, your IP address, and more. Newer versions of Windows also display usage charts.
What's more, there is a quick link to Resource Monitor at the bottom of this tab.
App History
The App History tab displays the CPU usage and network utilization of each Windows app from the
date listed on the screen until the time you open Task Manager. App History is only available in Task
Manager in Windows 10 and Windows 8.
Startup
The Startup tab shows every program that is launched automatically each time you start your computer,
along with several important details about each program, including the Publisher, Status, and Startup
impact; the last is the most valuable, showing an impact rating of High, Medium, or Low.
This tab is great for identifying and then disabling programs that you don't need to run
automatically. Disabling Windows auto-start programs is a very simple way to speed up your computer.
The Startup tab is only available in Task Manager in Windows 10 and Windows 8.
Users
The Users tab shows the users currently signed in to the computer and the processes running within
each session. The Users tab is available in all Windows versions of Task Manager, but it only shows the
processes that each user is running in Windows 10 and Windows 8.
Details
The Details tab contains full details of each process running on your computer. The information
provided in this tab is useful during advanced troubleshooting. The Details tab is available in Task
Manager in Windows 10 and Windows 8; in earlier versions of Windows, similar features were found
on the Processes tab.
Services
The Services tab, available in Task Manager in Windows 10, 8, 7, and Vista, shows all of the Windows
services on the computer along with their Description and Status. The status is Running or Stopped,
and you can change it from this tab.
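Much of what the Details and Services tabs display can also be read programmatically. The sketch below is a rough Python equivalent, assuming a Windows system and the third-party psutil package; it lists processes with their PID and user (as on the Details tab) and Windows services with their status (as on the Services tab).

    import psutil

    # Roughly what the Details tab shows: PID, name, and owning user of each process.
    for proc in psutil.process_iter(['pid', 'name', 'username']):
        print(proc.info['pid'], proc.info['name'], proc.info['username'])

    # Roughly what the Services tab shows: Windows services and their current status.
    for svc in psutil.win_service_iter():
        print(svc.name(), svc.display_name(), svc.status())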
The sidebar displays graphs that highlight the CPU, Disk, Network, and Memory use over a period of
60 seconds.
You can hide and show elements with a click on the arrow icon in the title bars. Another way to
customize the interface is to drag the dividers between elements with the mouse; use this to increase or
decrease the visible area of an element.
You may want to hide the graphs, for instance, to make more room for more important data and to run
the Resource Monitor window at as large a size as possible.
The Overview tab is a good starting point, as it gives you an overview of resource usage. It
highlights CPU and memory usage, disk utilization, and network use in real time.
Each particular listing offers a wealth of information. The CPU box lists process names and IDs, the
network box IP addresses and data transfers, the memory box hard faults, and the disk box read and
write operations.
One interesting option you have here is to select one or more processes under CPU to apply filters to
the Disk, Network, and Memory sections.
If you select a particular process under CPU, Resource Monitor lists the disk, network, and memory
usage of that process only. This is one of the differences from Task Manager, which offers no
comparable per-process filtering.
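The per-process filtering that Resource Monitor offers can be approximated in code. The following sketch assumes the third-party Python psutil package and a hypothetical process ID of 1234; it pulls the CPU, memory, and disk I/O counters for just that one process.

    import psutil

    pid = 1234  # hypothetical process ID; in practice, pick one from the CPU listing
    proc = psutil.Process(pid)

    print("CPU %:", proc.cpu_percent(interval=1))      # CPU usage sampled over one second
    print("Resident memory:", proc.memory_info().rss)  # bytes of physical memory in use
    io = proc.io_counters()                            # read/write activity of this process only
    print("Reads:", io.read_count, "Writes:", io.write_count)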
The Storage listing at the bottom shows all available drives, the available and total space on each drive,
and the active time. The graphs visualize the disk queue length, which reflects the number of pending
requests for a particular disk and is a good indicator of whether disk performance can keep up with I/O
operations.
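On Windows, the same queue-length figure is exposed as a performance counter. A minimal sketch, assuming the standard typeperf utility and the PhysicalDisk counter set are present, samples it five times from Python:

    import subprocess

    # Sample the average disk queue length across all physical disks, five times.
    subprocess.run([
        "typeperf",
        r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
        "-sc", "5",
    ])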
Network bandwidth
Bandwidth use can also be tracked with a network bandwidth monitor. Network bandwidth is a
fixed commodity, and there are several ways to make better use of it. First, you can control the data
flow across your Internet connection, that is, streamline data from one point to another. Second, you
can optimize data so that it consumes less of the bandwidth that has been allocated.
In summary, bandwidth is the amount of information an Internet connection can handle in a given
period. An Internet connection operates much faster or slower depending on whether the bandwidth is
large or small: with a larger bandwidth, data transmission is much faster than over a connection with a
smaller bandwidth.
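As a quick worked example of that relationship, the short Python sketch below (illustrative figures only) estimates how long an ideal 100 MB transfer takes over links of different sizes.

    def transfer_seconds(size_megabytes: float, bandwidth_mbps: float) -> float:
        """Ideal transfer time: size in megabits divided by link speed in Mbit/s."""
        return (size_megabytes * 8) / bandwidth_mbps

    # A 100 MB file over a 10 Mbit/s link versus a 100 Mbit/s link.
    print(transfer_seconds(100, 10))   # 80.0 seconds
    print(transfer_seconds(100, 100))  # 8.0 seconds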
4.1.3. Network Printers
Network printing allows us to use printing resources efficiently. With network printing we first connect
all of our workstations to a network and then we implement a network printer. In general there are two
ways this can be done. In the first method we take a regular printer and plug it into the back of one of the
PCs. In the picture below that PC is named Workstation 1. Then we share that printer on the network
by going to the printer properties in Windows.
In the second method we implement the type of printer that has its own network interface installed (either
wired or wireless). This way we can connect our printer directly to the network so that print jobs can be
sent from the workstations directly to that network printer.
Figure 4.6. Shared printer with its own dedicated NIC (Network Interface Card)
The print job doesn't have to go through a workstation as in the first case. To connect to a
network-attached printer we can create a printer object using a TCP/IP port. We use the IP address and
port name information to connect to the printer.
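To illustrate what connecting by IP address and port means at the lowest level, here is a hedged Python sketch that sends a plain-text job straight to a printer's raw port. The address 192.168.1.50 is hypothetical, and the sketch assumes the printer accepts raw jobs on TCP port 9100, as many network printers with their own NIC do.

    import socket

    PRINTER_IP = "192.168.1.50"  # hypothetical address of the network printer
    RAW_PORT = 9100              # common "raw" printing port on printers with a built-in NIC

    job = b"Hello from the network!\r\n\f"  # form feed (\f) ejects the page on many printers

    with socket.create_connection((PRINTER_IP, RAW_PORT), timeout=5) as conn:
        conn.sendall(job)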
Print Port
When a client needs to send a print job to a network printer, the client application formats the print job
and sends it to the print driver. Just as with a traditional print job, it is saved in the spool on the local
workstation. The job is then sent from the spool to the printer. In a traditional setup the computer sends
the job through the parallel or USB cable to the printer. In a network printing setup the job is redirected:
it goes out through the network board, across the network, and then arrives at the destination network
printer.
Drivers
Each network host that wants to use the network printer must have the corresponding printer driver
installed. When we share a printer in Windows, the current printer driver is automatically delivered to
clients that connect to the shared printer. If the client computers run a different version of Windows, we
can add the necessary printer drivers to the printer object. To add drivers for network users we can use
the 'Advanced' and 'Sharing' tabs in the printer properties.
Print Server
An important component of any network printer that we have is the print server. The print server
manages the flow of documents sent to the printer. Using a print server lets us customize when and how
documents print. There are different types of print servers. In the first scenario, where we have attached
an ordinary printer to our workstation, the printer has no print server hardware built in. In this case the
operating system running on Workstation 1 functions as the print server. It receives the jobs from the
other clients, saves them locally in a directory on the hard drive and spools them off to the printer one
at a time as the printer becomes ready. The computer can fill other roles on the network in addition to
being the print server. Most operating systems include print server software.
Some printers, like our printer from the second scenario, have a built-in print server that's integrated
into the hardware of the printer itself. It receives the print jobs from the various clients, queues them
up, assigns them priority, and sends them on through the printing mechanism as it becomes available. We
often refer to this type of print server as an internal print server. We use special management software to
connect to this kind of print server and manage print jobs.
Print servers can also be implemented in another way: we can purchase an external print server. The
external print server has one interface that connects to the printer (parallel or USB), and it also has a
network jack that plugs into our hub or switch. It provides all the print server functions, but they are
all built into the hardware of the print server itself. So, when clients send a job to the printer, the jobs
are sent through the network to the hardware print server, which then formats, prioritizes, and saves them
in the queue, and then spools them off to the printer one at a time as the printer becomes available.
Different operating systems implement servers in different ways, and different external or internal print
servers also function in different ways. Because of that we need to check our documentation to see how
to set it up with our specific hardware or software.
Remember: We can share our existing printers on the network or we can set up a
printer which has its own NIC and which is then directly connected to the
network. Print server formats, prioritizes, queues and then spools print jobs.
Connecting
When the client connects to the host computer, a window showing the Desktop of the host usually appears.
The client may then control the host as if he or she were sitting right in front of it. Windows has a built-in
remote administration package called Remote Desktop Connection. A free cross-platform alternative
is VNC, which offers similar functionality.
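For reference, Remote Desktop Connection can also be started against a specific host from a script. A minimal sketch, assuming a Windows client and a hypothetical host name of fileserver01:

    import subprocess

    # Launch the built-in Remote Desktop Connection client against one host.
    subprocess.Popen(["mstsc", "/v:fileserver01"])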
The best-known application of SSH is access to shell accounts on Unix-like operating systems
(GNU/Linux, OpenBSD, FreeBSD), but it can also be used in a similar fashion for accounts on Windows.
SSH is generally used to log into a remote machine and execute commands. It also supports tunneling,
forwarding TCP ports, and X11 connections, and it can transfer files using the associated SSH File
Transfer Protocol (SFTP) or Secure Copy (SCP) protocols. SSH uses the client-server model.
SSH is important in cloud computing to solve connectivity problems, avoiding the security issues of
exposing a cloud-based virtual machine directly on the Internet. An SSH tunnel can provide a secure
path over the Internet, through a firewall to a virtual machine.
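One common way to script the log-in-and-execute-commands use of SSH from Python is the third-party paramiko library. The sketch below is illustrative only; the host name, user name, and key path are hypothetical.

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # fine for a sketch; verify host keys in practice
    client.connect("cloud-vm.example.com", username="admin",
                   key_filename="/home/admin/.ssh/id_rsa")

    # Run a single command on the remote machine and print its output.
    stdin, stdout, stderr = client.exec_command("uptime")
    print(stdout.read().decode())

    client.close()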
OpenSSH (OpenBSD Secure Shell)
OpenSSH is a tool providing encrypted communication sessions over a computer network using the
SSH protocol. It was created as an open source alternative to the proprietary Secure Shell software
suite offered by SSH Communications Security.
Telnet
Telnet is used to connect to a remote computer over a network. It provides a bidirectional, interactive,
text-oriented communication facility using a virtual terminal connection over the Internet or local area
networks. Telnet provides a command-line interface on a remote host. Most network equipment and
operating systems with a TCP/IP stack support a Telnet service for remote configuration (including
systems based on Windows NT). Telnet establishes a connection to Transmission Control Protocol (TCP)
port number 23, where a Telnet server application (telnetd) is listening.
Experts in computer security recommend that the use of Telnet for remote logins be discontinued
under all normal circumstances, for the following reasons:
➢ Telnet, by default, does not encrypt any data sent over the connection (including passwords),
and so it is often practical to eavesdrop on the communications and use the password later for
malicious purposes; anybody who has access to a router, switch, hub or gateway located on the
network between the two hosts where Telnet is being used can intercept the packets passing by
and obtain login, password and whatever else is typed with a packet analyzer.
➢ Most implementations of Telnet have no authentication that would ensure communication is
carried out between the two desired hosts and not intercepted in the middle.
➢ Several vulnerabilities have been discovered over the years in commonly used Telnet daemons.
rlogin
rlogin is a utility for Unix-like operating systems that allows users to log in to another host remotely over
the network, communicating over TCP port 513. It has several security problems: all information,
including passwords, is transmitted unencrypted, so it is vulnerable to interception. Therefore, it is rarely
used across untrusted networks (like the public Internet) and even in closed networks.
rsh
The remote shell (rsh) can connect to a remote host across a computer network. The remote system to
which rsh connects runs the rsh daemon (rshd). The daemon typically uses the well-known Transmission
Control Protocol (TCP) port number 514. From a security point of view, it is not recommended.
FreeNX
FreeNX allows you to access a desktop from another computer over the Internet. One can use it to log in
graphically to a desktop from a remote location. One example of its use would be to set up a FreeNX
server on a home computer and graphically log in to the home computer from a work computer
using a FreeNX client.
Remote Admin allows system administrators or support personnel to remotely access Officelinx Admin
from their own workstation, eliminating the need to be in front of the server in order to perform
administrative functions.
4.2.4. Disadvantages of Remote Administration
Remote administration has several disadvantages alongside its advantages. The first and foremost
disadvantage is security. Generally, certain ports must be open at the server level to allow remote
administration, and hackers or attackers can take advantage of these open ports to compromise the
system. It is advisable to use remote administration only in emergency or essential situations. In normal
situations, it is best to block these ports so that remote administration is not possible.
4.3. Performance
4.3.1. Redundant Array of Inexpensive (or Independent) Disks (RAID)
RAID is a data storage virtualization technology that combines multiple physical disk drive
components into one or more logical units for the purposes of data redundancy, performance
improvement, or both. This was in contrast to the previous concept of highly reliable mainframe disk
drives referred to as Single Large Expensive Disk (SLED).
Data is distributed across the drives in one of several ways, referred to as RAID levels, depending on
the required level of redundancy and performance. The different schemes, or data distribution layouts,
are named by the word "RAID" followed by a number, for example RAID 0 or RAID 1. Each scheme,
or RAID level, provides a different balance among the key goals: reliability, availability,
performance, and capacity. RAID levels greater than RAID 0 provide protection against
unrecoverable sector read errors, as well as against failures of whole physical drives.
4.3.1.1. Standard levels
Originally, there were five standard levels of RAID, but many variations have evolved, including
several nested levels and many non-standard levels (mostly proprietary). RAID levels and their
associated data formats are standardized by the Storage Networking Industry Association (SNIA) in the
Common RAID Disk Drive Format (DDF) standard:
RAID 0 consists of striping, but no mirroring or parity. Compared to a spanned
volume, the capacity of a RAID 0 volume is the same; it is the sum of the capacities
of drives in the set. But because striping distributes contents of each file among all
drives in the set, the failure of any drive causes the entire RAID 0 volume and all
files to be lost. In comparison, a spanned volume preserves the files on the unfailing
drives. The benefit of RAID 0 is that the throughput of read and write operations to
any file is multiplied by the number of drives because, unlike spanned volumes,
reads and writes are done concurrently. The cost is increased vulnerability to drive
failures: since the failure of any drive in a RAID 0 setup causes the entire volume to be lost,
the average failure rate of the volume rises with the number of attached drives.
Figure 4.6. RAID 0 setup
NOTES:
In data storage, data striping is the technique of segmenting logically sequential data, such as a file, so
that consecutive segments are stored on different physical storage devices. It is useful when a processor
requests data more quickly than a single storage device can provide it. By spreading segments across
multiple devices that can be accessed concurrently, total data throughput is increased.
In data storage, disk mirroring is the replication of logical disk volumes onto separate physical hard
disks in real time to ensure continuous availability. It is most commonly used in RAID 1. A mirrored
volume is a complete logical representation of separate volume copies.
A parity stripe or parity disk in a RAID array provides error correction. Parity bits are written at the
rate of one parity bit per n bits, where n is the number of disks in the array. When a read error occurs,
each bit in the error region is recalculated from its set of n bits. In this way, using one parity bit
creates "redundancy" for a region ranging from the size of one bit to the size of one disk.
RAID 2 consists of bit-level striping with dedicated Hamming-code parity. All disk spindle rotation
is synchronized and data is striped such that each sequential bit is on a different drive. Hamming-code
parity is calculated across corresponding bits and stored on at least one parity drive. This level is of
historical significance only; as of 2014 it is not used by any commercially available system.
RAID 3 consists of byte-level striping with dedicated parity. All disk spindle rotation is
synchronized and data is striped such that each sequential byte is on a different drive. Parity is
calculated across corresponding bytes and stored on a dedicated parity drive. Although
implementations exist, RAID 3 is not commonly used in practice. The following figure shows a RAID
3 setup of six-byte blocks and two parity bytes; blocks of data are shown in different colors.
RAID 4 consists of block-level striping with dedicated parity. The main advantage of RAID 4 over
RAID 2 and 3 is I/O parallelism: in RAID 2 and 3, a single read I/O operation requires reading the
whole group of data drives, while in RAID 4 one I/O read operation does not have to spread across all
data drives. As a result, more I/O operations can be executed in parallel, improving the performance of
small transfers. The figure below shows a RAID 4 setup with a dedicated parity disk, with each color
representing the group of blocks covered by the respective parity block (a stripe).
RAID 5 consists of block-level striping with distributed parity. Unlike RAID 4, parity information
is distributed among the drives, requiring all drives but one to be present to operate. Upon failure of a
single drive, subsequent reads can be calculated from the distributed parity such that no data is lost.
RAID 5 requires at least three disks. Like all single-parity concepts, large RAID 5 implementations
are susceptible to system failures because of trends regarding array rebuild time and the chance of drive
failure during rebuild. Rebuilding an array requires reading all data from all disks, opening a chance for
a second drive failure and the loss of the entire array. The figure below shows a RAID 5 layout
with each color representing the group of data blocks and the associated parity block (a stripe).
RAID 6 consists of block-level striping with double distributed parity. Double parity provides fault
tolerance up to two failed drives. This makes larger RAID groups more practical, especially for high-
availability systems, as large-capacity drives take longer to restore. RAID 6 requires a minimum of
four disks. As with RAID 5, a single drive failure results in reduced performance of the entire array
until the failed drive has been replaced. With a RAID 6 array, using drives from multiple sources and
manufacturers, it is possible to mitigate most of the problems associated with RAID 5. The larger the
drive capacities and the larger the array size, the more important it becomes to choose RAID 6 instead
of RAID 5. RAID 10 also minimizes these problems. The figure below shows a RAID 6 setup, which is
identical to RAID 5 other than the addition of a second parity block.
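To summarize the capacity trade-offs numerically, the Python sketch below computes the usable capacity of an array of equally sized drives for the levels discussed here (a simplification that ignores metadata and formatting overhead).

    def usable_capacity(level: int, drives: int, drive_tb: float) -> float:
        """Usable capacity in TB for an array of equally sized drives."""
        if level == 0:                  # striping only: every drive holds data
            return drives * drive_tb
        if level == 1:                  # mirroring: one drive's worth of space
            return drive_tb
        if level == 5 and drives >= 3:  # one drive's worth consumed by parity
            return (drives - 1) * drive_tb
        if level == 6 and drives >= 4:  # two drives' worth consumed by parity
            return (drives - 2) * drive_tb
        raise ValueError("unsupported level or too few drives")

    # Four 4 TB drives: RAID 0 gives 16 TB, RAID 5 gives 12 TB, RAID 6 gives 8 TB.
    for lvl in (0, 5, 6):
        print(f"RAID {lvl}: {usable_capacity(lvl, 4, 4.0)} TB")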