UNIT - IV
STORAGE VIRTUALIZATION
Memory Virtualization - Types of Storage Virtualization - Block, File - Address Space
Remapping- Risks of Storage Virtualization – SAN – NAS – RAID.
4.1. Memory Virtualization
• Memory virtualization refers to the abstraction of physical memory resources from
the perspective of software running on a system.
• It allows for more efficient and flexible management of memory resources by
decoupling the way programs use memory from the underlying hardware
implementation.
• In a virtualized memory environment, each program or process operates under the
illusion that it has access to a contiguous block of memory, known as virtual
memory, which is typically larger than the physical memory available on the system.
This illusion is maintained by the operating system through various techniques such
as paging and segmentation.
4.1.1. How Does Memory Virtualization Work?
Virtual memory virtualization is similar to the virtual memory support provided by
modern operating systems.
In a traditional execution environment, the operating system maintains mappings
of virtual memory to machine memory using page tables, which is a one-stage mapping from
virtual memory to machine memory. All modern x86 CPUs include a memory management
unit (MMU) and a translation lookaside buffer (TLB) to optimize virtual memory
performance.
However, in a virtual execution environment, virtual memory virtualization involves
sharing the physical system memory in RAM and dynamically allocating it to the physical
memory of the VMs. That means a two-stage mapping process should be maintained by
the guest OS and the VMM, respectively: virtual memory to physical memory and
physical memory to machine memory. Furthermore, MMU virtualization should be
supported, which is transparent to the guest OS.
The guest OS continues to control the mapping of virtual addresses to the physical
memory addresses of VMs. But the guest OS cannot directly access the actual machine
memory. The VMM is responsible for mapping the guest physical memory to the actual
machine memory. Figure 4.1 shows the two-level memory mapping procedure.
Figure 4.1. Two-level memory mapping procedure
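To make the two-stage lookup concrete, here is a minimal Python sketch of the translation just described. The table contents and names are hypothetical illustrations; a real MMU walks hardware page-table structures rather than dictionaries.

```python
# Illustrative sketch of two-stage address translation (names hypothetical).
PAGE_SIZE = 4096

# Stage 1: guest OS page table -- guest virtual page -> guest physical page
guest_page_table = {0: 7, 1: 3, 2: 9}

# Stage 2: VMM table -- guest physical page -> machine page
vmm_page_table = {3: 120, 7: 55, 9: 81}

def translate(guest_virtual_addr: int) -> int:
    """Translate a guest virtual address to a machine address."""
    vpn, offset = divmod(guest_virtual_addr, PAGE_SIZE)
    guest_physical_page = guest_page_table[vpn]          # stage 1 (guest OS)
    machine_page = vmm_page_table[guest_physical_page]   # stage 2 (VMM)
    return machine_page * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # virtual page 1 -> guest physical 3 -> machine 120
```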
Since each page table of the guest OSes has a separate page table in the VMM
corresponding to it, the VMM page table is called the shadow page table. Nested page
tables add another layer of indirection to virtual memory. The MMU already handles virtual-
to-physical translations as defined by the OS. Then the physical memory addresses are
translated to machine addresses using another set of page tables defined by the hypervisor.
Since modern operating systems maintain a set of page tables for every process, the number of shadow page tables grows very large. Consequently, the performance overhead and memory cost can be very high.
VMware uses shadow page tables to perform virtual-memory-to-machine-
memory address translation. Processors use TLB hardware to map the virtual memory
directly to the machine memory to avoid the two levels of translation on every access. When
the guest OS changes the virtual memory to a physical memory mapping, the VMM updates
the shadow page tables to enable a direct lookup. The AMD Barcelona processor has featured
hardware-assisted memory virtualization since 2007. It provides hardware assistance to the
two-stage address translation in a virtual execution environment by using a technology called
nested paging.
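The shadow-page-table idea can be sketched in the same style: the VMM keeps a direct virtual-to-machine table that the hardware can consult in one step, and rebuilds the affected entry whenever it traps a guest page-table update. This is a simplified illustration with hypothetical names, not VMware's actual implementation.

```python
# Simplified sketch of a shadow page table (hypothetical, for illustration).
guest_page_table  = {0: 7, 1: 3}            # guest virtual -> guest physical page
vmm_page_table    = {3: 120, 7: 55, 9: 81}  # guest physical -> machine page
shadow_page_table = {}                      # guest virtual -> machine page

def rebuild_shadow_entry(vpn: int) -> None:
    """Collapse the two-stage mapping into one direct entry."""
    shadow_page_table[vpn] = vmm_page_table[guest_page_table[vpn]]

def guest_updates_mapping(vpn: int, new_guest_physical_page: int) -> None:
    # The VMM traps this guest write and keeps the shadow table consistent,
    # so the hardware can keep doing direct one-step lookups.
    guest_page_table[vpn] = new_guest_physical_page
    rebuild_shadow_entry(vpn)

for vpn in guest_page_table:
    rebuild_shadow_entry(vpn)
guest_updates_mapping(0, 9)
print(shadow_page_table)  # {0: 81, 1: 120} -- direct virtual -> machine entries
```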
4.1.2. Techniques for memory virtualization
1. Virtual Memory: Virtual memory is a fundamental form of memory virtualization
that abstracts physical memory by providing each process with its own virtual address
space. It allows processes to operate as if they have a contiguous memory address
space larger than the actual physical memory available. When a process accesses
memory, the virtual memory system translates virtual addresses to physical addresses
using techniques like paging or segmentation.
2. Paging: Paging is a memory management scheme that divides both physical and
virtual memory into fixed-size blocks called pages. The operating system maintains a
page table to map virtual pages to physical pages. When a process accesses a virtual
address, the page table is consulted to determine the corresponding physical address.
If the required page is not currently in physical memory, a page fault occurs, and the
operating system fetches the required page from disk into RAM.
3. Segmentation: Segmentation divides the virtual address space into variable-sized
segments, each representing a logical unit of the program (e.g., code segment, data
segment, stack segment). Unlike paging, segmentation does not use fixed-size blocks.
Each segment has its own base address and size, and the operating system maintains a
segment table to map virtual addresses to physical addresses.
4. Memory Overcommitment: Memory overcommitment, also known as
overcommitting or oversubscribing memory, is a technique used in virtualized
environments to allocate more virtual memory to guest VMs than is physically
available on the host. It relies on the assumption that not all VMs will actively use all
of their allocated memory at the same time. Memory overcommitment requires
sophisticated memory management techniques, such as memory ballooning and
memory swapping, to handle situations where memory demands exceed physical
capacity.
5. Memory Ballooning: Memory ballooning is a technique used in virtualized
environments to dynamically adjust the memory allocation of guest VMs based on
demand. It involves installing a balloon driver within the guest operating system,
which can inflate or deflate to claim or release memory from the VM. Memory
ballooning helps optimize memory usage across VMs and prevents resource
contention.
6. Memory Compression: Memory compression is a technique used to reduce memory
usage by compressing memory pages in RAM instead of swapping them out to disk
when physical memory is low. This allows more data to fit into physical memory,
improving performance by reducing the need for disk I/O.
7. Memory Deduplication: Memory deduplication identifies duplicate memory pages across multiple VMs or processes and consolidates them into a single shared copy to reduce memory consumption. This technique is particularly useful in virtualized environments where many VMs are running similar operating system and application images (a sketch follows below).
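As a rough illustration of the deduplication idea, the sketch below hashes page contents and keeps one physical copy per distinct page. The function names and use of SHA-256 are assumptions for illustration only; real implementations (e.g., Linux KSM) also byte-compare pages on a hash match before sharing them.

```python
# Illustrative sketch of hash-based memory page deduplication (hypothetical).
import hashlib

def deduplicate(pages: list[bytes]) -> tuple[list[int], list[bytes]]:
    """Map each page to a slot in a shared pool, storing identical pages once."""
    pool: list[bytes] = []          # unique "physical" pages
    index: dict[str, int] = {}      # content hash -> slot in pool
    mapping: list[int] = []         # page number -> pool slot
    for page in pages:
        digest = hashlib.sha256(page).hexdigest()
        if digest not in index:     # first copy of this content: keep it
            index[digest] = len(pool)
            pool.append(page)
        mapping.append(index[digest])   # later copies just point at it
    return mapping, pool

# Two VMs booted from the same image share their "kernel" page.
mapping, pool = deduplicate([b"kernel", b"libc", b"kernel", b"vm2-data"])
print(mapping, len(pool))  # [0, 1, 0, 2] 3 -- four pages stored as three
```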
4.1.3. Advantages of memory virtualization:
1. Efficient Resource Utilization
2. Isolation
3. Scalability
4. Flexibility
5. Performance Optimization
4.1.4. Disadvantages of memory virtualization:
1. Overhead
2. Complexity
3. Potential for Resource Contention
4. Security Concerns
4.2. Storage Virtualization:
• Storage virtualization is the process of presenting a logical view of the physical
storage resources to a host. This logical storage appears and behaves as physical
storage directly connected to the host.
• Some examples of storage virtualization are host-based volume management, LUN
creation, tape storage virtualization, and disk addressing (CHS to LBA).
• The key benefits of storage virtualization include increased storage utilization,
adding or deleting storage without affecting an application’s availability, and
nondisruptive data migration (access to files and storage while migrations are in
progress).
• Figure 4.2 illustrates a virtualized storage environment.
Figure 4.2. Storage Virtualization
• At the top are four servers, each of which has one virtual volume assigned, which is
currently in use by an application. These virtual volumes are mapped to the actual
storage in the arrays, as shown at the bottom of the figure.
• When I/O is sent to a virtual volume, it is redirected through the virtualization at the
storage network layer to the mapped physical array.
4.2.1. How Does Storage Virtualization Work?
Storage virtualization abstracts the underlying physical storage resources and presents
them as virtualized storage to the applications, operating systems, and other components
within a computing environment. As a result, this allows for centralized management and
logical grouping of storage resources, providing a unified view and control over the storage
infrastructure.
By leveraging storage virtualization, organizations can achieve improved storage
utilization, simplified management, enhanced scalability, better data mobility, and increased
flexibility in deploying and managing storage resources.
Additionally, virtualization allows for decoupling storage from physical hardware,
providing a more efficient and agile approach to storage management in modern data center
environments.
4.2.2. How Is Storage Virtualization Applied?
Storage virtualization can be applied in the following ways:
● Host-Based
● Network-Based
● Array-Based
i. Host-Based Storage Virtualization
Here, all the virtualization and management is done at the host level with the help of software; the physical storage can be almost any device or array.
One or more hosts present a set of virtual drives to the guest machines, whether those are VMs in an enterprise or PCs.
ii. Network-Based Storage Virtualization
Network-based storage virtualization is the most common form in use today. A device such as a smart switch or a purpose-built server connects to all the storage devices in a Fibre Channel storage network and presents the storage as a single virtual pool.
iii. Array-Based Storage Virtualization
Here the storage array itself provides the virtualization, presenting different types of physical storage as storage tiers. Software on the array manages the tiers, which may be made up of solid-state drives and hard disk drives.
4.2.3. Advantages of Storage Virtualization
Below are some Advantages of Storage Virtualization.
• Advanced features like redundancy, replication, and disaster recovery become possible across the pooled storage devices.
• It enables organizations to establish their own business prospects.
• Data is kept in more practical places that are farther from the particular host, so the data is not necessarily compromised in the event of a host failure.
• By abstracting the storage layer, IT operations can provision, partition, and secure storage in a more flexible way.
4.2.4. Disadvantages of Storage Virtualization
Below are some Disadvantages of Storage Virtualization.
• Storage Virtualization still has limitations which must be considered.
• Data security is still a concern. Although some may contend that virtual machines and servers are more secure than physical ones, virtual environments can attract new types of cyberattacks.
• The deployment of storage virtualization is not always easy, and technological obstacles such as scalability remain.
• Virtualization breaks the end-to-end view of your data, so the virtualized storage solution must be integrated with existing tools and systems.
4.3. Types of Storage Virtualization
Virtual storage is about providing logical storage to hosts and applications
independent of physical resources. Virtualization can be implemented in both SAN and NAS
storage environments. In a SAN, virtualization is applied at the block level, whereas in NAS,
it is applied at the file level.
There are majorly two types of storage virtualization, which are:
1. Block level storage virtualization
2. File level storage virtualization
4.3.1. Block-Level Storage Virtualization
Block-level storage virtualization provides a translation layer in the SAN, between the hosts and the storage arrays, as shown in Figure 4.3. Instead of being directed to the LUNs on the individual storage arrays, the hosts are directed to the virtual LUNs on the virtualization device.
Figure 4.3. Block-Level Storage Virtualization
The virtualization device translates between the virtual LUNs and the physical LUNs
on the individual arrays. This facilitates the use of arrays from different vendors
simultaneously, without any interoperability issues. For a host, all the arrays appear like a
single target device and LUNs can be distributed or even split across multiple arrays.
Block-level storage virtualization extends storage volumes online, resolves
application growth requirements, consolidates heterogeneous storage arrays, and enables
transparent volume access. It also provides the advantage of nondisruptive data migration.
In traditional SAN environments, LUN migration from one array to another was an
offline event because the hosts needed to be updated to reflect the new array configuration. In
other instances, host CPU cycles were required to migrate data from one array to the other,
especially in a multi-vendor environment.
With a block-level virtualization solution in place, the virtualization engine handles
the back-end migration of data, which enables LUNs to remain online and accessible while
data is being migrated. No physical changes are required because the host still points to the
same virtual targets on the virtualization device. However, the mappings on the virtualization
device should be changed. These changes can be executed dynamically and are transparent to
the end user.
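A minimal sketch of this idea, with a simple in-memory mapping table and hypothetical array and LUN names: the host always addresses the same virtual LUN, and a migration only updates the mapping behind it.

```python
# Illustrative sketch of block-level LUN virtualization (names hypothetical).
virtual_lun_map = {
    "vLUN-1": ("ArrayA", "LUN-17"),   # virtual LUN -> (array, physical LUN)
    "vLUN-2": ("ArrayB", "LUN-03"),
}

def resolve(virtual_lun: str) -> tuple[str, str]:
    """Translate the host-visible virtual LUN to its physical location."""
    return virtual_lun_map[virtual_lun]

def migrate(virtual_lun: str, new_array: str, new_lun: str) -> None:
    # In a real engine the back-end data copy happens first; the host keeps
    # pointing at the same virtual target, so the move is nondisruptive.
    virtual_lun_map[virtual_lun] = (new_array, new_lun)

print(resolve("vLUN-1"))               # ('ArrayA', 'LUN-17')
migrate("vLUN-1", "ArrayC", "LUN-42")
print(resolve("vLUN-1"))               # ('ArrayC', 'LUN-42'); host path unchanged
```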
Deploying heterogeneous arrays in a virtualized environment facilitates an information lifecycle management (ILM) strategy, enabling significant cost and resource
optimization. Low-value data can be migrated from high- to low-performance arrays or disks.
4.3.2. File-Level Virtualization
File-level virtualization addresses the NAS challenges by eliminating the
dependencies between the data accessed at the file level and the location where the files are
physically stored. This provides opportunities to optimize storage utilization and server
consolidation and to perform nondisruptive file migrations.
Figure 4.4 illustrates a NAS environment before and after the implementation of file-level virtualization.
Figure 4.4. NAS device before and after file-level virtualization
Before virtualization, each NAS device or file server is physically and logically
independent. Each host knows exactly where its file-level resources are located.
Underutilized storage resources and capacity problems result because files are bound to a
specific file server. It becomes necessary to move files from one server to another for performance reasons or when the file server fills up. Moving files across the environment is not easy and requires downtime for the file servers.
Moreover, hosts and applications need to be reconfigured with the new path, making
it difficult for storage administrators to improve storage efficiency while maintaining the
required service level.
File-level virtualization simplifies file mobility. It provides user or application
independence from the location where the files are stored. File-level virtualization creates a
logical pool of storage, enabling users to use a logical path, rather than a physical path, to
access files.
File-level virtualization facilitates the movement of file systems across online file servers. This means that while the files are being moved, clients can access their files nondisruptively.
Clients can also read their files from the old location and write them back to the new
location without realizing that the physical location has changed. Multiple clients connected
to multiple servers can perform online movement of their files to optimize utilization of their
resources. A global namespace can be used to map the logical path of a file to the physical
path names.
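A global namespace can be sketched as a small lookup table. The server and path names below are hypothetical; a real implementation lives in the file-virtualization layer rather than in application code.

```python
# Illustrative sketch of a global namespace for file-level virtualization.
namespace = {
    "/projects/report.doc": ("fileserver1", "/vol1/report.doc"),
    "/projects/data.csv":   ("fileserver2", "/vol7/data.csv"),
}

def resolve_logical(path: str) -> tuple[str, str]:
    """Resolve a logical path to (file server, physical path)."""
    return namespace[path]

def move_file(path: str, new_server: str, new_physical_path: str) -> None:
    # After the back-end copy completes, only the namespace entry changes;
    # clients keep using the same logical path throughout.
    namespace[path] = (new_server, new_physical_path)

move_file("/projects/report.doc", "fileserver3", "/vol2/report.doc")
print(resolve_logical("/projects/report.doc"))  # ('fileserver3', '/vol2/report.doc')
```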
4.3.3. Comparison between File Level Storage Virtualization and Block Level Storage
Virtualization
Basis | File-Level Storage Virtualization | Block-Level Storage Virtualization
Environment | Implemented in NAS environments | Implemented in SAN environments
Unit of access | Files, accessed over an Ethernet network | Blocks (virtual LUNs), accessed like local disks over Fibre Channel or iSCSI
Flexibility | Files can be moved across file servers nondisruptively | Volumes can be extended, split, or distributed across arrays
Typical benefits | Improved storage utilization, server consolidation, nondisruptive file migration | Online volume extension, consolidation of heterogeneous arrays, nondisruptive data migration
4.4. Address Space Remapping
4.4.1. Introduction
• Address space remapping is a technique used to map logical addresses to physical
storage locations in a way that provides flexibility and efficiency in managing
memory or storage resources.
• In storage virtualization, address space remapping may be employed to manage
virtual storage volumes and provide features such as thin provisioning and dynamic
resizing. This allows storage resources to be allocated and managed flexibly, without
being tied to specific physical storage devices or locations.
• Virtualization of storage helps achieve location independence by abstracting the
physical location of the data. The virtualization system presents to the user a logical
space for data storage and handles the process of mapping it to the actual physical
location.
• It is possible to have multiple layers of virtualization or mapping, in which the output of one layer of virtualization is used as the input for a higher layer of virtualization.
• Virtualization maps space from back-end resources to front-end resources. In this instance, "back-end" refers to a logical unit number (LUN) that is not presented to a computer or host system for direct use, while a "front-end" LUN or volume is presented to a host or computer system for use.
• The actual form of the mapping will depend on the chosen implementation.
• Some implementations may limit the granularity of the mapping which may limit the
capabilities of the device.
• Typical granularities range from a single physical disk down to some small subset
(multiples of megabytes or gigabytes) of the physical disk.
• In a block-based storage environment, a single block of information is addressed using a LUN identifier and an offset within that LUN, known as a logical block address (LBA); a sketch of such remapping follows below.
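The sketch below combines LBA remapping with thin provisioning: a (LUN, logical block) pair is remapped to a physical extent, and physical space is allocated only on first write. The extent size and names are assumptions, not any real controller's logic.

```python
# Illustrative sketch of LBA remapping with thin provisioning (hypothetical).
EXTENT_BLOCKS = 2048                      # remapping granularity (assumed)

mapping: dict[tuple[str, int], int] = {}  # (LUN, logical extent) -> phys extent
next_free_extent = 0

def write_block(lun: str, lba: int) -> int:
    """Return the physical extent backing this LBA, allocating lazily."""
    global next_free_extent
    key = (lun, lba // EXTENT_BLOCKS)
    if key not in mapping:                # thin provisioning: allocate on demand
        mapping[key] = next_free_extent
        next_free_extent += 1
    return mapping[key]

write_block("LUN-0", 0)       # first touch of extent 0 -> allocates
write_block("LUN-0", 100)     # same extent, no new allocation
write_block("LUN-1", 4096)    # different LUN/extent -> allocates
print(mapping)                # {('LUN-0', 0): 0, ('LUN-1', 2): 1}
```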
4.4.2. Benefits of Address Space Remapping
Address space remapping in storage virtualization provides several significant
benefits:
1. Optimized Storage Utilization: Address space remapping allows for dynamic
allocation of physical storage resources based on demand. This ensures that storage
capacity is efficiently utilized, with resources allocated as needed rather than being
statically provisioned. Remapping enables the system to allocate storage resources
more effectively, reducing wasted space and optimizing storage utilization.
2. Improved Performance: By remapping logical addresses to different physical
storage locations, storage virtualization can enhance performance. Frequently
accessed data can be placed on faster storage tiers, such as solid-state drives (SSDs),
while less frequently accessed data can be stored on lower-cost, higher-capacity
storage tiers, such as hard disk drives (HDDs). This optimization improves overall
system performance by ensuring that data is stored on the most appropriate storage
medium.
3. Flexibility and Scalability: Address space remapping enables seamless scalability of
storage resources. New storage devices or arrays can be added to the virtualized
storage environment, and logical addresses can be remapped to include these
additional resources without disrupting existing applications or users. This flexibility
allows organizations to easily expand their storage infrastructure to meet growing
storage demands.
4. Data Mobility and Migration: Address space remapping facilitates data mobility
and migration within the storage virtualization environment. Data can be moved or
migrated between different storage systems, arrays, or technologies, and logical
addresses are remapped to the new physical locations transparently to applications or
users. This capability simplifies data management tasks and allows for more efficient
storage resource utilization.
5. Abstraction and Simplification: Storage virtualization abstracts the underlying
physical storage infrastructure, providing a unified view of storage resources to
applications or users. Address space remapping ensures that applications or users can
access storage resources using logical addresses without needing to know the details
of the underlying physical storage configuration. This abstraction simplifies storage
management and reduces complexity for administrators.
6. Enhanced Data Protection and Redundancy: Address space remapping can
contribute to data protection and redundancy in storage virtualization environments.
The virtualization layer can remap logical addresses to redundant or mirrored copies
of data stored on alternative physical devices, providing resilience against hardware
failures or disruptions. This redundancy helps ensure data availability and reliability
in the event of storage device failures.
4.4.3. Working of Address Space Remapping
Address space remapping, specifically in the context of storage virtualization,
involves dynamically associating logical addresses used by applications or file systems with
physical storage locations managed by the storage virtualization layer. Here's how it typically
works:
1. Logical-to-Physical Mapping: When an application or file system requests access to
data, it uses logical addresses to specify the location of the data. These logical
addresses are abstract representations that do not directly correspond to physical
storage locations.
2. Virtualization Layer: The storage virtualization layer intercepts requests from
applications or file systems and translates logical addresses to physical addresses.
This translation process involves mapping logical addresses to specific physical
storage locations managed by the storage virtualization layer.
3. Mapping Table: The storage virtualization layer maintains a mapping table that
associates logical addresses with corresponding physical storage locations. This
mapping table is dynamic and can be updated as needed to reflect changes in the
storage environment, such as the addition or removal of storage devices.
4. Dynamic Allocation: Address space remapping enables dynamic allocation of
physical storage resources based on the logical addresses requested by applications or
file systems. The storage virtualization layer allocates physical storage from a pool of
available resources and maps logical addresses to these physical storage locations.
5. Optimization and Load Balancing: The storage virtualization layer may remap logical addresses to different physical storage locations to optimize performance and balance load. For example, frequently accessed data may be moved to faster storage tiers, while less frequently accessed data may be moved to lower-cost, higher-capacity storage tiers (see the sketch after this list).
6. Data Migration and Mobility: Address space remapping facilitates data migration
and mobility within the storage virtualization environment. Data can be moved or
migrated between different storage systems, arrays, or technologies, and logical
addresses are remapped to the new physical locations transparently to applications or
file systems.
7. Abstraction and Transparency: Storage virtualization abstracts the underlying
physical storage infrastructure, providing a unified view of storage resources to
applications or file systems. Address space remapping ensures that applications or file
systems can access storage resources using logical addresses without needing to know
the details of the underlying physical storage configuration.
8. Fault Tolerance and Redundancy: Address space remapping may also contribute to
fault tolerance and redundancy in storage virtualization environments. The storage
virtualization layer can remap logical addresses to redundant or mirrored copies of
data stored on alternative physical devices, providing resilience against hardware
failures or disruptions.
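As a toy illustration of step 5, the sketch below remaps a frequently read extent to a faster tier while callers keep using the same logical name. The tier names and threshold are assumptions for illustration.

```python
# Illustrative sketch of access-based tier remapping (hypothetical names).
location = {"ext-A": "hdd-tier", "ext-B": "hdd-tier"}  # logical extent -> tier
access_count = {"ext-A": 0, "ext-B": 0}
HOT_THRESHOLD = 3                                      # assumed policy

def read(extent: str) -> str:
    access_count[extent] += 1
    if access_count[extent] >= HOT_THRESHOLD and location[extent] != "ssd-tier":
        location[extent] = "ssd-tier"   # data copied, then the mapping updated
    return location[extent]             # callers never see the move

for _ in range(3):
    read("ext-A")
print(location)  # {'ext-A': 'ssd-tier', 'ext-B': 'hdd-tier'}
```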
4.4.4. Major Challenges of Memory Address Remapping
The major challenges associated with memory address remapping:
1. Limited Applicability:
The primary challenge is that address space remapping primarily targets memory
management within a computer system. Its direct impact on storage virtualization,
which deals with physical storage allocation and presentation, is minimal.
2. Increased Complexity:
Introducing another layer of remapping within storage controllers can add complexity
to the storage virtualization environment. This complexity can make troubleshooting
and debugging issues more challenging for administrators.
3. Potential Performance Overhead:
Remapping introduces an additional translation step between logical addresses used
by virtual machines and the physical locations on storage devices. This can lead to
slight performance overhead in I/O operations, especially for random access patterns.
4. Security Considerations:
While remapping with encryption can add some obfuscation, it's not a security
solution in itself. A sophisticated attacker could potentially exploit vulnerabilities in
the remapping process to gain access to encrypted data. Strong encryption algorithms
and proper key management practices remain essential for data security.
5. Limited Visibility and Control:
Since remapping typically happens within storage controller firmware, IT
administrators might have limited visibility and control over the specific remapping
mechanisms employed. This can make it difficult to fine-tune performance or
implement specific security policies related to remapping.
4.5. Risks of Storage Virtualization
Storage virtualization offers numerous benefits, but it also comes with several risks and
challenges. Here are some of the main risks associated with storage virtualization:
1. Data Security: Virtualizing storage means that multiple logical storage units can
reside on the same physical storage infrastructure. If proper security measures are not
in place, there is a risk of unauthorized access to sensitive data. Malicious actors
could potentially compromise the virtualization layer and gain access to data from
multiple sources.
2. Performance Degradation: While storage virtualization can improve resource
utilization and flexibility, it can also introduce performance overhead. The additional
layer of abstraction between the physical storage and the applications accessing it can
lead to latency issues and reduced I/O performance, especially if not properly
managed.
3. Single Point of Failure: The centralization of storage management in a virtualized
environment means that there is a single point of failure—the storage virtualization
layer itself. If this layer experiences a failure or becomes unavailable, it can result in
widespread data loss or downtime for multiple applications and services.
4. Vendor Lock-In: Adopting a specific storage virtualization solution may lead to
vendor lock-in, where organizations become dependent on a particular vendor's
technology and find it challenging to switch to alternative solutions in the future. This
can limit flexibility and hinder the organization's ability to adapt to changing business
requirements or take advantage of emerging technologies.
5. Complexity and Management Overhead: Storage virtualization introduces
additional complexity to the storage infrastructure, requiring specialized skills and
knowledge to manage effectively. Administrators need to understand the
virtualization technology, as well as the underlying physical storage systems, to
troubleshoot issues and optimize performance. This can result in increased
management overhead and training costs.
6. Data Migration Challenges: Moving data between different storage systems or
migrating from one virtualization platform to another can be complex and time-
consuming. Data migration processes may disrupt normal operations and require
careful planning to minimize downtime and ensure data integrity.
7. Compatibility and Interoperability Issues: Integrating storage virtualization with
existing IT infrastructure and applications can be challenging, particularly if there are
compatibility or interoperability issues between different systems and components.
Ensuring seamless communication and data exchange between virtualized storage and
other IT resources may require additional configuration and testing.
4.6. Storage Area Network (SAN)
A Storage Area Network (SAN) is a specialized, high-speed network that provides
network access to storage devices. SANs are typically composed of hosts, switches, storage
elements, and storage devices that are interconnected using a variety of technologies,
topologies, and protocols. SANs may span multiple sites.
A SAN presents storage devices to a host such that the storage appears to be locally
attached. This simplified presentation of storage to a host is accomplished through the use of
different types of virtualization.
Figure 4.5. Storage Area Network
4.6.1. SAN Protocols
The most frequent SAN protocols are:
● FCP: Fibre Channel Protocol is the most widely used SAN protocol, deployed in 70% to 80% of the total SAN market. FCP transports SCSI commands embedded in Fibre Channel frames.
● iSCSI: Internet Small Computer System Interface is the next biggest SAN or block protocol, with roughly 10% to 15% of the marketplace. iSCSI encapsulates SCSI commands within an Ethernet frame and uses an IP Ethernet network for transport.
● FCoE: Fibre Channel over Ethernet accounts for less than 5% of the SAN marketplace. It is similar to iSCSI in that it encapsulates an FC frame within an Ethernet datagram and carries it over an Ethernet network.
● NVMe: Non-Volatile Memory Express is an interface protocol for accessing flash storage over a PCI Express (PCIe) bus, and it can also be carried over Fibre Channel. Unlike conventional all-flash architectures, which can be restricted to a single serial command queue, NVMe supports tens of thousands of parallel queues, each with the capability to carry thousands of concurrent commands.
4.6.2. Components of SAN
Each Fibre Channel device, such as a server, storage array, or tape library, is known as a node, and each connection point on a node is a node port. The figure below shows the components of a SAN:
Figure 4.6. SAN Components
● Node: Every node may be either a source or a destination for another node.
● Cables: Cabling of the system is performed using fibre optic and copper cables. Copper cable is used to cover short distances, for example, for back-end connectivity.
● Interconnect Devices: Hubs, switches, and directors are the interconnect devices used in a SAN.
● Storage Arrays: Massive storage arrays are utilized to supply hosts with access to storage resources.
● SAN Management Software: The SAN management software is used to control the interfaces between storage arrays, interconnect devices, and hosts.
4.6.3. How Does SAN Work?
SAN storage solutions are block storage-based, meaning data is split into storage volumes
that can be formatted with different protocols, such as iSCSI or Fibre Channel Protocol
(FCP). A SAN can also include virtual storage nodes and cloud resources instead of physical hard disks, in which case it is known as a virtual SAN (vSAN).
SAN configurations are made up of three distinct layers:
● Storage layer: The physical data storage resources, such as drives in a data center,
are organized into storage pools and tiers. Because the data is stored using block-level storage, with built-in redundancy and automatic traffic rerouting, it remains available even if a server is down.
● Fabric layer: The fabric layer is how the storage connects to the user, such as via
network devices and cables. This connectivity could be via Fibre Channel or Fibre
Channel over Ethernet (FCoE). Both take pressure off the local area network (LAN)
by moving storage and associated data traffic to its own high-speed network.
● Host layer: The servers and applications that facilitate accessing the storage. Because
this layer recognizes the SAN storage as a local hard drive, it ensures quick
processing speeds and data transfers.
4.6.4. SAN use cases:
Low latency and scalability make SANs the preferred choice in these cases:
● Video editing: Large files require high throughput and low latency. SANs can
connect directly to the video editing desktop client, without the need for an extra
server layer, offering high-performance capabilities.
● Ecommerce: Today’s consumers expect shopping online to go smoothly and quickly.
Ecommerce companies need high-performance functionality, which makes SANs a
good choice.
● Backup/disaster recovery: Backups of networked devices can be executed quickly
and directly to SAN storage because traffic does not travel over the LAN.
Virtualization accelerates the processing and scalability of SANs with virtual
machines and cloud storage.
4.6.5. Overview of SAN Benefits
• High-speed data access
• Highly expandable
• OS-level access to storage, as if locally attached
• A committed network for storage alleviates pressure on the LAN
4.6.6. Limitations of SAN
• The major limitation of SAN lies in the cost and management.
• Even though they provide high-speed data access, separate networks must be maintained: the Fibre Channel network that carries the storage traffic and an Ethernet network that handles the metadata and client requests.
4.7. Network Attached Storage (NAS)
A Network Attached Storage (NAS) device is a computer attached to a network that offers file-based data storage to other devices on that network. The NAS installation and deployment process is comparatively straightforward. Network Attached Storage volumes appear to end users as network-mounted volumes.
The data to be served is usually contained on one or multiple storage drives, frequently arranged into logical, Redundant Arrays of Independent Disks (RAID). The device itself is a network node, much like computers and other TCP/IP devices, in that it maintains its own IP address and communicates efficiently with other networked devices.
NAS devices offer an easier way for many users in different locations to access data,
which can be valuable when working on the same project or sharing information.
Network Attached Storage (NAS) Architecture
Figure 4.7. Network Attached Storage (NAS)
4.7.1. NAS Protocols
● SMB or CIFS: Server Message Block or Common Internet File System is the protocol that Windows typically uses.
● NFS: Network File System was initially developed to be used with UNIX servers,
and it’s also a frequent Linux protocol.
4.7.2. Components of NAS
● NIC: Network Interface Card that provides connectivity to the network.
● Optimized Operating System: An optimized operating system that controls the functioning of the NAS.
● Protocols: Protocols for sharing files, such as NFS and CIFS.
● Storage Protocols: Storage protocols such as ATA, SCSI, or FC are used to connect to and manage the physical disk drives.
Figure 4.8. NAS Components
4.7.3. How Does NAS Work?
NAS storage systems are file storage-based, meaning the data is stored in files that are
organized in folders under a hierarchy of directories and subdirectories. Unlike direct
attached storage — which can be accessed by one device — the NAS file system provides
file storage and sharing capabilities between devices.
A NAS system is built using the following elements:
● Network: One or multiple networked NAS devices are connected to a local area
network (LAN) or an Ethernet network with an assigned IP address.
● NAS box: This hardware device with its own IP address includes a network interface
card (NIC), a power supply, processor, memory and drive bay for two to five disk
drives. A NAS box, or head, connects and processes requests between the user’s
computer and the NAS storage.
● Storage: The disk drives within the NAS box that store the data. Often storage uses a
RAID configuration, distributing and copying data across multiple drives. This
provides data redundancy as a fail-safe, and it improves performance and storage
capacity.
● Operating system: Unlike local storage, NAS storage is self-contained. It includes an operating system to run data management software and grant file-level access to authorized users.
● Software: Preconfigured software within the NAS box manages the NAS device and
handles data storage and file-sharing requests.
4.7.4. NAS use cases:
There are times when NAS is the better choice, depending on the company’s needs and
application:
● File collaboration and storage: This is the primary use case for NAS in mid- to
large-scale enterprises. With NAS storage in place, IT can consolidate multiple file
servers for ease of management and to save space.
● Archiving: NAS is a good choice for storing a large number of files, especially if you
want to create a searchable and accessible active archive.
● Big data: NAS is a common choice for storing and processing large unstructured
files, running analytics and using ETL (extract, transform, load) tools for integration.
4.7.5. Overview of NAS Benefits
● Relatively inexpensive
● 24/7 remote data accessibility
● Very Good expandability
● Redundant storage structure (RAID)
● Automatic backups to additional cloud and devices
● Flexibility
4.7.6. Limitations of NAS
• The area where NAS limits itself is in scalability and performance. Once users' access to files over NAS crosses a certain limit, the horsepower of the server must be scaled up.
• Another major limitation of NAS lies in the Ethernet. Data over the Ethernet is shared in the form of packets, meaning one source or file is divided into a number of packets. If even one packet arrives late or goes out of sequence, the user cannot access that file until each and every packet has arrived and been reassembled into sequence.
Comparison Table: SAN vs NAS:
Basis | NAS | SAN
Target audience | Smaller business organizations | Larger business enterprises
Data access | Data (files) is accessed as from a network-attached drive | Data (blocks) is accessed by the server like a local hard drive
Management | Easy to manage | Management is complex
Scalability | Cannot be scaled up | Can be scaled up as per the needs by the admins
Standard protocols | NFS, SMB, CIFS, HTTP | iSCSI, SCSI, or FCoE
Speed dependency | Depends on the local TCP/IP or Ethernet network (100 megabits to 1 gigabit per second) | High speeds due to Fibre Channel (2 gigabits to 128 gigabits per second)
Bottlenecks | Can have network bottlenecks | No traffic bottlenecks are experienced
Backups | More storage space is required for file backups | Backups are possibly cost-effective
Point of failure | Power supply is the only point of failure for NAS | Network is fault tolerant
4.8. RAID (Redundant Arrays of Independent Disks)
4.8.1. Introduction
RAID is a technique that makes use of a combination of multiple disks instead of
using a single disk for increased performance, data redundancy, or both. The term was coined
by David Patterson, Garth A. Gibson, and Randy Katz at the University of California,
Berkeley in 1987.
Why Data Redundancy?
Data redundancy, although taking up extra space, adds to disk reliability. This means
that in case of disk failure, if the same data is also backed up onto another disk, we can
retrieve the data and go on with the operation. On the other hand, if the data is spread across
multiple disks without the RAID technique, the loss of a single disk can affect the entire
data.
Key Evaluation Points for a RAID System
● Reliability: How many disk faults can the system tolerate?
● Availability: What fraction of the total session time is a system in uptime mode,
i.e. how available is the system for actual use?
● Performance: How good is the response time? How high is the throughput (rate
of processing work)? Note that performance contains a lot of parameters and not
just the two.
● Capacity: Given a set of N disks each with B blocks, how much useful capacity is
available to the user?
RAID is transparent to the host system. This means that, to the host, a RAID array appears as a single big disk presenting itself as a linear array of blocks. This allows older technologies to be replaced by RAID without making too many changes to the existing code.
4.8.2. Terms used in RAID:
Here are some common terms used in RAID (Redundant Array of Independent Disks):
1. Striping: A technique used in RAID 0 and some other RAID levels where data is
divided into blocks and distributed across multiple disks. It improves performance by
allowing multiple disks to work in parallel.
2. Mirroring: Also known as RAID 1, mirroring involves creating an exact duplicate of
data on multiple disks. This provides redundancy and fault tolerance, as data remains
accessible even if one disk fails.
3. Parity: In RAID 5 and RAID 6 configurations, parity is a method used to provide
fault tolerance by generating and storing parity information. Parity data allows the
RAID array to reconstruct data in the event of disk failure.
4. Hot Spare: A spare disk drive that is kept in reserve and can automatically replace a
failed disk in a RAID array. Hot spares help minimize downtime and maintain data
redundancy.
5. RAID Level: Refers to the specific configuration or layout of a RAID array,
determining how data is distributed, duplicated, or parity is calculated across the
disks. Common RAID levels include RAID 0, RAID 1, RAID 5, RAID 6, RAID 10,
etc.
6. RAID Controller: A hardware or software component responsible for managing the
operation of a RAID array. Hardware RAID controllers are dedicated devices, while
software RAID controllers are implemented in software.
7. RAID Array: The logical grouping of multiple physical disk drives configured in a
RAID configuration. The RAID array appears as a single storage device to the
operating system.
4.8.3. Levels of RAID:
There are several levels of RAID, each with its own characteristics and benefits. Here
are some most common RAID levels:
1. RAID-0 (Striping)
2. RAID-1 (Mirroring)
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
5. RAID-4 (Block-Level Striping with Dedicated Parity)
6. RAID-5 (Block-Level Striping with Distributed Parity)
7. RAID-6 (Block-Level Striping with Two Parity Bits)
Figure 4.9. RAID Controller
4.8.3.1. RAID-0 (Striping)
● Blocks are “striped” across disks.
RAID 0
Disk 0 | Disk 1 | Disk 2 | Disk 3
0 | 1 | 2 | 3
4 | 5 | 6 | 7
8 | 9 | 10 | 11
12 | 13 | 14 | 15
● In the figure, blocks “0,1,2,3” form a stripe.
● Instead of placing just one block into a disk at a time, we can work with two (or
more) blocks placed into a disk before moving on to the next one.
Disk 0 | Disk 1 | Disk 2 | Disk 3
0 | 2 | 4 | 6
1 | 3 | 5 | 7
8 | 10 | 12 | 14
9 | 11 | 13 | 15
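The placement rule behind both layouts can be written as a small function. This is an illustrative sketch, where chunk_size is the number of consecutive blocks placed on a disk before moving to the next disk.

```python
# Illustrative sketch of RAID-0 block placement.
def locate(block: int, num_disks: int, chunk_size: int = 1) -> tuple[int, int]:
    """Return (disk index, offset on that disk) for a logical block number."""
    chunk, within = divmod(block, chunk_size)  # which chunk, position inside it
    stripe, disk = divmod(chunk, num_disks)    # which stripe row, which disk
    return disk, stripe * chunk_size + within

print(locate(5, num_disks=4, chunk_size=1))  # (1, 1): matches the first layout
print(locate(5, num_disks=4, chunk_size=2))  # (2, 1): matches the second layout
```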
Evaluation:
● Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
● Capacity: N*B
The entire space is being used to store data. Since there is no duplication, N disks
each having B blocks are fully utilized.
Advantages:
1. It is easy to implement.
2. It utilizes the storage capacity in a better way.
Disadvantages:
1. A single drive loss can result in the complete failure of the system.
2. Not a good choice for a critical system.
4.8.3.2. RAID-1 (Mirroring)
● More than one copy of each block is stored in a separate disk. Thus, every block
has two (or more) copies, lying on different disks.
RAID 1
Disk 0 | Disk 1 | Disk 2 | Disk 3
0 | 0 | 1 | 1
2 | 2 | 3 | 3
4 | 4 | 5 | 5
6 | 6 | 7 | 7
● The above figure shows a RAID-1 system with mirroring level 2.
● RAID 0 was unable to tolerate any disk failure, but RAID 1 is capable of providing reliability through redundancy.
Evaluation:
Assume a RAID system with mirroring level 2.
● Reliability: 1 to N/2
1 disk failure can be handled for certain because blocks of that disk would have
duplicates on some other disk. If we are lucky enough and disks 0 and 2 fail, then
again this can be handled as the blocks of these disks have duplicates on disks 1
and 3. So, in the best case, N/2 disk failures can be handled.
● Capacity: N*B/2
Only half the space is being used to store data. The other half is just a mirror of
the already stored data.
Advantages:
1. It covers complete redundancy.
2. It can increase data security and speed.
Disadvantages:
1. It is highly expensive.
2. Storage capacity is less.
4.8.3.3. RAID-2 (Bit-Level Striping with Dedicated Parity)
● In RAID-2, errors in the data are checked at the bit level, using the Hamming code parity method to detect and correct them.
● It uses designated drives to store the parity.
● The structure of RAID-2 is complex, as it uses two groups of disks: one group stores the bits of each data word, and the other stores the error-correction code.
● It is not commonly used.
Advantages
1. It uses Hamming code for error correction.
2. It uses designated drives to store parity.
Disadvantages
1. It has a complex structure and high cost due to the extra drives.
2. It requires extra drives for error detection.
4.8.3.4. RAID-3 (Byte-Level Striping with Dedicated Parity)
● It consists of byte-level striping with dedicated parity.
● At this level, parity information is stored on a dedicated parity drive.
● Whenever a drive fails, the parity drive is accessed to reconstruct the lost data.
RAID 3
Disk 0 | Disk 1 | Disk 2 | Disk 3
15 | 16 | 17 | P(15, 16, 17)
18 | 19 | 20 | P(18, 19, 20)
21 | 22 | 23 | P(21, 22, 23)
24 | 25 | 26 | P(24, 25, 26)
● Here Disk 3 contains the parity bits for Disk 0, Disk 1, and Disk 2. If data loss occurs, the lost data can be reconstructed using Disk 3.
Advantages:
1. Data can be transferred in bulk.
2. Data can be accessed in parallel.
Disadvantages:
1. It requires an additional drive for parity.
2. In the case of small-size files, it performs slowly.
4.8.3.5. RAID-4 (Block-Level Striping with Dedicated Parity)
● Instead of duplicating data, this adopts a parity-based approach.
RAID 4
Disk 0 | Disk 1 | Disk 2 | Disk 3 | Disk 4
0 | 1 | 2 | 3 | P0
4 | 5 | 6 | 7 | P1
8 | 9 | 10 | 11 | P2
12 | 13 | 14 | 15 | P3
● In the figure, we can observe one column (disk) dedicated to parity.
● Parity is calculated using a simple XOR function. If the data bits are 0,0,0,1 the
parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0 the parity bit is
XOR(0,1,1,0) = 0. A simple approach is that an even number of ones results in
parity 0, and an odd number of ones results in parity 1.
C1 | C2 | C3 | C4 | P
0 | 1 | 1 | 0 | 0
1 | 1 | 0 | 1 | 1
Each parity bit P is the XOR of the four data bits in its row.
● Assume that in the above figure, C3 is lost due to some disk failure. Then, we can
recompute the data bit stored in C3 by looking at the values of all the other
columns and the parity bit. This allows us to recover lost data.
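The parity computation and recovery just described can be checked with a few lines of Python; the block values here are arbitrary examples.

```python
# Worked example of XOR parity and single-block recovery.
from functools import reduce

def parity(blocks: list[int]) -> int:
    """XOR all blocks together to form (or re-form) the parity block."""
    return reduce(lambda a, b: a ^ b, blocks)

data = [0b0110, 0b1010, 0b1111]          # C1, C2, C3
p = parity(data)                         # parity block = 0b0011

# The disk holding C3 fails: XOR the survivors with the parity to rebuild it.
recovered_c3 = parity([data[0], data[1], p])
print(bin(p), bin(recovered_c3))         # 0b11 0b1111 -- C3 recovered
```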
Evaluation:
● Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way parity
works). If more than one disk fails, there is no way to recover the data.
● Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are
made available for data storage, each disk having B blocks.
Advantages:
1. It helps in reconstructing the data if at most one disk is lost.
Disadvantages:
1. It can’t help in reconstructing the data when more than one disk is lost.
4.8.3.6. RAID-5 (Block-Level Striping with Distributed Parity)
● This is a slight modification of the RAID-4 system where the only difference is
that the parity rotates among the drives.
RAID 5
Disk 0 | Disk 1 | Disk 2 | Disk 3 | Disk 4
0 | 1 | 2 | 3 | P0
5 | 6 | 7 | P1 | 4
10 | 11 | P2 | 8 | 9
15 | P3 | 12 | 13 | 14
P4 | 16 | 17 | 18 | 19
● In the figure, we can notice how the parity bit “rotates”.
● This was introduced to make the random write performance better.
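One common placement rule (assumed here; actual layouts vary by implementation) rotates the parity disk one position per stripe, which reproduces the figure above:

```python
# Illustrative sketch of RAID-5 rotating parity placement.
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk index that holds the parity block for a given stripe."""
    return (num_disks - 1 - stripe) % num_disks

for stripe in range(5):
    print(stripe, parity_disk(stripe, num_disks=5))
# stripe 0 -> disk 4 (P0), stripe 1 -> disk 3 (P1), ..., stripe 4 -> disk 0 (P4)
```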
Evaluation:
● Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way parity
works). If more than one disk fails, there is no way to recover the data. This is
identical to RAID-4.
● Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity. Hence, (N-1)
disks are made available for data storage, each disk having B blocks.
Advantages:
1. Data can be reconstructed using parity bits.
2. It makes random write performance better than RAID-4.
Disadvantages:
1. Its technology is complex and extra space is required.
2. If more than one disk fails, the data will be lost forever.
4.8.3.7. RAID-6 (Block-Level Striping with Two Parity Bits)
● RAID-6 helps when there is more than one disk failure. A pair of independent parities is generated and stored on multiple disks at this level. Ideally, you need at least four disk drives for this level.
● There are also hybrid RAIDs, which make use of more than one RAID level
nested one after the other, to fulfill specific requirements.
RAID 6
Disk 1 | Disk 2 | Disk 3 | Disk 4
A1 | B1 | P1 | Q1
A2 | P2 | Q2 | B2
P3 | Q3 | A3 | B3
Q4 | A4 | B4 | P4
Here A and B are data blocks, while P and Q are the two independent parity blocks, rotated across the disks.
Advantages:
1. Very high data availability.
2. Fast read data transactions.
Disadvantages:
1. Due to double parity, it has slow write data transactions.
2. Extra space is required.
4.8.4. Advantages of RAID
● Data redundancy: By keeping numerous copies of the data on many disks, RAID
can shield data from disk failures.
● Performance enhancement: RAID can enhance performance by distributing data
over several drives, enabling the simultaneous execution of several read/write
operations.
● Scalability: RAID is scalable, therefore by adding more disks to the array, the
storage capacity may be expanded.
● Versatility: RAID is applicable to a wide range of devices, such as workstations, servers, and personal computers.
4.8.5. Disadvantages of RAID
● Cost: RAID implementation can be costly, particularly for arrays with large
capacities.
● Complexity: The setup and management of RAID might be challenging.
● Decreased performance: The parity calculations necessary for some RAID
configurations, including RAID 5 and RAID 6, may result in a decrease in speed.
● Single point of failure: While RAID offers data redundancy, it is not a comprehensive backup solution. The array’s whole contents could be lost if the RAID controller malfunctions.