CC Unit-1

1) Cloud computing definition, characteristics and principles:

Cloud Computing is a technology that allows you to store and access data and applications over the internet instead of
using your computer’s hard drive or a local server. It enables users to access shared resources—including networks,
servers, storage, applications, and services—on demand, without direct active management by the user.

It builds on virtualization, distributed systems, and service-oriented architectures (SOA) to deliver scalable and reliable
computing as a service.

Characteristics:
1. Agility for organizations.
2. Cost reductions through centralization of infrastructure in locations with lower costs.
3. Device and location independence, meaning no local maintenance is required.
4. Pay-per-use means utilization and efficiency improvements for systems that are often only 10–20% utilized.
5. Performance is monitored by IT experts on the service provider's side.
6. Productivity increases because multiple users can work on the same data simultaneously.
7. Time may be saved, as information does not need to be re-entered when fields are matched.
8. Availability improves with the use of multiple redundant sites.
9. Scalability and elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis in near real-time, without users having to engineer for peak loads.
10. Self-service interface.
11. Resources that are abstracted or virtualized.
12. Security can improve due to centralization of data.
Cloud characteristics, their descriptions, and example applications:

On-demand self-service: Users can automatically provision and manage computing resources without human intervention. (Application: server, time, network, storage)
Broad network access: Services are available over standard networks and accessible through various devices. (Application: smartphones, tablets, PCs)
Resource pooling: Provider’s resources are shared among multiple users using a multi-tenant model. (Application: physical and virtual resources with dynamic provisioning)
Rapid elasticity: Resources can be quickly scaled up or down based on demand. (Application: adding or removing nodes, servers, or instances)
Measured service (pay as you go): Resource usage is monitored and billed on a pay-per-use basis. (Application: metering, billing, monitoring)
Multi-tenancy: Multiple users share the same infrastructure securely. (Application: shared servers, databases)
Virtualization: Hardware resources are abstracted and provided as virtual machines. (Application: virtual servers, storage, networks)
Resilient computing: Systems are designed for fault tolerance and high availability. (Application: backup, load balancing, disaster recovery)
Flexible pricing models: Users pay only for the resources they consume. (Application: pay-per-use, subscription, spot pricing)
Security: Strong authentication, encryption, and access control are ensured. (Application: data protection, privacy, access control)
Automation: Tasks like deployment and scaling are automated with minimal manual effort. (Application: auto-scaling, self-healing systems)
Sustainability: Uses energy-efficient and eco-friendly data centers. (Application: green IT, renewable-energy data centers)

Principles:
1. Federation: Enables collaboration and resource sharing between different clouds securely and transparently.
2. Independence: Users can access required virtual resources independent of provider type or tools.
3. Isolation: Ensures each user’s data and processes are separated and protected from others.
4. Elasticity: Resources can scale up or down automatically based on user demand.
5. Business Orientation: Focuses on building efficient platforms that ensure service quality and SLA compliance.
6. Trust: Establishes secure and reliable interaction between providers and users.
7. Service Orientation: Follows a service-based model (IaaS, PaaS, SaaS) instead of delivering physical hardware.
8. Virtualization: Abstracts hardware and runs multiple virtual machines on one physical host efficiently.
9. Autonomy and Automation: Cloud systems self-manage, self-configure, and recover automatically with minimal human control.
10. Scalability and Parallelism: Supports horizontal and vertical scaling and parallel execution of distributed workloads.
11. Dynamic Resource Management: Allocates and releases computing resources dynamically according to workload demand.
12. Utility Computing: Operates like public utilities, charging users only for what they consume.
13. Security and Isolation: Protects multi-tenant environments through encryption, virtualization, and access control.

Ex: Company and Cloud Service Name

1. Amazon: AWS (Amazon Web Services)
2. Microsoft: Azure
3. Google: Google Cloud Platform (GCP)
4. Oracle: Oracle Cloud
5. IBM: IBM Cloud
6. Salesforce: Salesforce Cloud


Advantages of cloud computing:

1. Cost: It reduces the huge capital costs of buying hardware and software.
2. Speed: Resources can be accessed in minutes, typically within a few clicks.
3. Scalability: We can increase or decrease the requirement of resources according to the business requirements.
4. Productivity: Cloud computing requires less operational effort; there is no need to apply patches or maintain
hardware and software, so the IT team can be more productive and focus on achieving business goals.
5. Reliability: Backup and recovery of data are less expensive and extremely fast for business continuity.
6. Security: Many cloud vendors offer a broad set of policies, technologies, and controls that strengthen our data
security.

Advantages of CC over the Internet: The cloud delivers more flexibility and reliability, increased performance and
efficiency, and helps to lower IT costs. It also improves innovation, allowing organizations to achieve faster time to
market and incorporate AI and machine learning use cases into their strategies.

On-Demand Self-Service:
● Users can provision computing resources (e.g., storage, processing power) as needed without requiring human
interaction with service providers.
● This is faster and more flexible than traditional Internet services where provisioning may require manual setup.
Broad Network Access:
● Cloud services are accessible over the Internet from a wide range of devices (laptops, smartphones, tablets).
● The service is device-independent, unlike some Internet applications tied to specific platforms.
Resource Pooling / Multi-Tenancy:
● Computing resources are pooled to serve multiple users dynamically.
● Internet-based applications often dedicate resources to individual users, leading to underutilization.
Rapid Elasticity / Scalability:
● Cloud resources can be scaled up or down quickly to handle varying workloads.
● Traditional Internet hosting often has fixed capacity, making sudden spikes difficult to handle.
Measured Service / Pay-Per-Use:
● Cloud providers monitor and charge based on actual resource usage.
● In typical Internet setups, users often pay for fixed resources regardless of usage.
Reliability and Redundancy:
● Clouds provide fault tolerance and data replication across multiple data centers.
● Standard Internet services may not offer the same level of reliability.
Automation and Management:
● Cloud platforms handle maintenance, updates, and monitoring automatically.
● Internet applications often require manual administration.
Cloud Computing Advantages ☁️

├── 1. Cost Savings and Economic Benefits


│ ├── Shift from Capital Expenditures to Operational Expenditures
│ ├── Pay-per-Use (Measured Service)
│ └── Economies of Scale

├── 2. Operational Agility and Scalability
│ ├── Rapid Elasticity
│ ├── On-Demand Self-Service
│ └── Reduced Management Overhead

├── 3. Reliability and Global Reach
│ ├── High Availability and Reliability
│ ├── Disaster Recovery
│ └── Global Deployment

└── 4. Innovation and Technology Access
├── Access to Advanced Services
├── Ubiquitous Access
└── Continuous Improvement

Advantage Category: Why it is an Advantage over Traditional IT

Cost Savings: Traditional IT requires large CapEx investment, regardless of usage. Cloud's Pay-per-Use model (Measured Service) and OpEx structure avoid over-provisioning and initial capital outlay.
Scalability/Agility: Traditional IT scaling is slow (procuring, installing, configuring hardware). Cloud's Rapid Elasticity is nearly instant and automated, matching variable demand.
Reliability/Reach: Traditional IT disaster recovery is complex and expensive. Cloud provides built-in High Availability and Global Deployment via vast, redundant data centers.
Innovation/Access: Traditional IT requires internal expertise and time to integrate new tech. Cloud offers instant Access to Advanced Services (AI, ML) as managed utilities.
What makes cloud computing so interesting to IT stakeholders and research practitioners:
IT Stakeholders: Individuals or groups (like managers, investors, or employees) who have a vested interest in the
performance, decisions, and outcomes of IT systems.

Developers: Professionals who design, build, and maintain software applications or systems using programming,
frameworks, and tools.

Researchers: Individuals who systematically investigate and experiment to generate new knowledge, technologies, or
solutions in a specific domain.

Cloud Computing Advantages



├── 1. Cost Efficiency and Financial Flexibility
├── 2. Scalability and Agility
├── 3. Access to Advanced Technologies
├── 4. Global Reach and Collaboration
├── 5. Reliability, Security, and Disaster Recovery
└── 6. Innovation and Experimentation

Advantage Area: Why it's Interesting to Stakeholders

Financial Model Shift: Cost Savings: Trading high, fixed Capital Expenditure (CapEx) for low, variable Operational Expenditure (OpEx) allows for more predictable budgeting and frees up capital for core business investments. The Pay-per-Use model (Measured Service) ensures they only pay for what they consume.
Agility & Time-to-Market: Rapid Deployment: Spinning up new servers, databases, or complex environments in minutes instead of weeks or months allows the business to test new ideas and launch new products much faster than competitors.
Elastic Scalability: Matching Demand: The ability to instantly scale resources up or down (Rapid Elasticity) to handle sudden spikes (like a holiday sale or viral event) without over-provisioning infrastructure, and then shrinking back down to save cost.
Focus on Core Business: Reduced Management Overhead: Outsourcing the mundane, complex tasks of managing physical hardware, patching operating systems, and maintaining data centers allows the in-house IT team to focus on strategic, value-generating activities.
Global Reach: Instant Globalization: Deploying applications to data centers all over the world in minutes allows a business to reach new international markets quickly and improve user experience by reducing latency.
Enhanced Resilience: Built-in Disaster Recovery: Cloud providers offer superior resilience, reliability, and automated backup services, drastically reducing the risk and cost associated with business continuity planning.

Advantage Area: Why it's Interesting to Researchers/Developers

Massive Compute Power: Big Data & HPC: Access to massive, on-demand compute clusters (High-Performance Computing, HPC), petabytes of storage, and high-speed networking that no single institution could afford or maintain themselves. This enables research into Big Data, climate modeling, and genomics.
Access to Advanced Services: Innovation Acceleration: Immediate access to pre-built, fully managed PaaS and SaaS tools for Artificial Intelligence (AI), Machine Learning (ML), serverless computing, and quantum computing simulators. Researchers can skip infrastructure setup and go straight to writing code and running experiments.
Collaboration & Reproducibility: Shared Environments: The ability to easily share code, data sets, and entire computing environments with collaborators across the globe, enhancing research transparency and reproducibility.
Cost-Effective Experimentation: Low Barrier to Entry: Researchers can spin up a powerful, high-cost environment, run a simulation for a few hours, and shut it down, only paying for the minimal time it was used. This encourages low-risk experimentation.
Ubiquitous Access: Work from Anywhere: Data and applications are accessible from any location with an internet connection, allowing for seamless work between the lab, office, and home.
Cloud service models:

Infrastructure as a Service (IaaS)


Provides virtualized computing resources—VMs, storage, and networking—over the internet. Users get full control to
configure and manage the infrastructure.
Example: AWS EC2 offers customizable virtual machines similar to physical servers.
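To make the IaaS self-service model concrete, here is a minimal sketch using the AWS boto3 SDK to provision and later release an EC2 virtual machine. The region, AMI ID, and instance type are placeholder assumptions, and configured AWS credentials are assumed.

```python
# Minimal IaaS provisioning sketch using the AWS boto3 SDK.
# Assumes AWS credentials are configured; region and AMI ID are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # hypothetical region

# Request one small virtual machine on demand.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched:", instance_id)

# ...use the VM, then release it so pay-per-use billing stops.
ec2.terminate_instances(InstanceIds=[instance_id])
```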

Platform as a Service (PaaS)


Provides a ready platform for developers to build, deploy, and manage applications without handling underlying
hardware. Includes tools like frameworks, databases, and middleware.
Example: Google App Engine provides a scalable environment to develop and host applications using supported
languages like Python or Java.
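As a PaaS illustration, below is a minimal sketch of an application for Google App Engine's standard environment; the runtime version and deployment files named in the comments are assumptions. The point is that the platform, not this code, handles servers and scaling.

```python
# main.py: a minimal app for Google App Engine's standard environment.
# Deployment also needs an app.yaml (e.g. the single line "runtime: python39")
# and a requirements.txt listing Flask; both details are assumptions here.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # App Engine routes HTTP requests to this handler; provisioning and
    # scaling of the underlying instances is done by the platform.
    return "Hello from a PaaS-managed app!"
```

Deploying is then a single platform command (gcloud app deploy) rather than configuring servers.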

Software as a Service (SaaS)


Delivers complete software applications over the internet. No installation or maintenance is needed; users access
software through a browser.
Example: Salesforce provides CRM applications hosted entirely on their servers, accessible on demand.
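A SaaS consumer never installs the application; it only calls the hosted application's web API. The sketch below illustrates this with a hypothetical CRM endpoint and token, not a real vendor API.

```python
# SaaS consumption sketch: the application runs entirely on the provider's
# servers; the user just calls its web API. Endpoint and token are
# hypothetical placeholders, not a real vendor API.
import requests

BASE_URL = "https://crm.example.com/api/v1"           # hypothetical endpoint
headers = {"Authorization": "Bearer <access-token>"}  # placeholder credential

resp = requests.get(f"{BASE_URL}/contacts", headers=headers, timeout=10)
resp.raise_for_status()
for contact in resp.json():
    print(contact["name"])  # assumes the API returns a JSON list of contacts
```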
Comparison of IaaS, PaaS, and SaaS by feature/aspect:

Definition:
- IaaS: Provides virtualized computing resources such as servers, storage, and networks.
- PaaS: Provides a development environment to build, deploy, and manage applications without handling underlying infrastructure.
- SaaS: Delivers software applications over the Internet via web browsers.

Customer Control:
- IaaS: High; users manage OS, applications, and middleware.
- PaaS: Moderate; users manage applications and data, while the provider manages OS and infrastructure.
- SaaS: Low; the provider manages everything, and users just access the application.

Provider Responsibility:
- IaaS: Maintains hardware, networking, storage, and virtualization.
- PaaS: Maintains hardware, networking, storage, OS, runtime, and development tools.
- SaaS: Maintains hardware, networking, OS, runtime, and the application itself.

User Responsibility:
- IaaS: Install and maintain OS, applications, and patches.
- PaaS: Develop, deploy, and maintain applications and data.
- SaaS: Use the application; minimal technical management.

Scalability:
- IaaS: Flexible; users can scale virtual machines and storage as needed.
- PaaS: Supports automatic scaling of applications developed on the platform.
- SaaS: Scales automatically for multiple users; the provider handles load.

Examples:
- IaaS: Amazon EC2, Amazon S3, Joyent.
- PaaS: Google App Engine, Microsoft Azure App Service, Heroku.
- SaaS: Salesforce, Google Docs, Zoho Office.

Cost Structure:
- IaaS: Pay for computing resources used; upfront OS/application licensing required by the user.
- PaaS: Pay for platform usage; reduces the cost of managing hardware/software.
- SaaS: Subscription-based; low upfront cost; multi-tenant architecture reduces cost.

Use Case:
- IaaS: Developers who need full control over infrastructure for custom solutions.
- PaaS: Application developers focusing on building apps without managing infrastructure.
- SaaS: End-users accessing ready-to-use applications for business or personal use.

Advantages:
- IaaS: On-demand self-service, broad network access, measured service, flexibility.
- PaaS: Reduces infrastructure investment; collaborative development, security, adaptability.
- SaaS: No hardware costs, automated updates, accessible from anywhere, pay-per-use.

Application:
- IaaS: Custom applications requiring full infrastructure control.
- PaaS: Applications built on the platform for deployment.
- SaaS: Ready-to-use software applications accessed by end-users.

Disadvantages:
- IaaS: Limited control over physical infrastructure; security management responsibility lies with the user.
- PaaS: Limited control over infrastructure and configurations.
- SaaS: Limited customization options; possible data security concerns; dependence on provider reliability.

Distributed computing, grid computing, and cloud computing:

Feature-by-feature comparison of distributed, grid, and cloud computing:

Definition:
- Distributed computing: A system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages. The foundational concept.
- Grid computing: A specialized type of distributed computing that coordinates resources (CPU, storage) across multiple administrative domains to achieve a single, massive computational goal.
- Cloud computing: A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (IaaS, PaaS, SaaS), which can be rapidly provisioned and released with minimal management effort.

Primary Goal:
- Distributed: To improve performance, reliability, and fault tolerance by running components concurrently.
- Grid: To solve complex computational problems that require a massive amount of CPU power (a "virtual supercomputer").
- Cloud: To provide computing resources as a service on a pay-per-use basis (utility computing).

Service Model:
- Distributed: An architectural framework, not a commercial service model.
- Grid: Focuses on High-Performance Computing (HPC), often provided through specialized middleware.
- Cloud: Provides services across layers: IaaS, PaaS, and SaaS.

Architecture:
- Distributed: Generally decentralized or peer-to-peer; resources may be homogeneous or heterogeneous.
- Grid: Decentralized and collaborative; resources are typically heterogeneous and geographically distributed.
- Cloud: Generally centralized management by a service provider (client-server architecture) in data centers, but the underlying infrastructure is distributed.

Resource Ownership:
- Distributed: Resources are often owned by a single organization (e.g., an internal corporate network).
- Grid: Resources are owned by multiple, collaborating organizations (multiple administrative domains).
- Cloud: Resources are owned by a third-party vendor (the cloud provider).

Key Features:
- Distributed: Concurrency, fault tolerance, transparency (users see a single system).
- Grid: Resource sharing across administrative boundaries, High Throughput Computing (HTC), specialized grid middleware.
- Cloud: On-demand self-service, rapid elasticity, measured service, broad network access, resource pooling; uses heavy virtualization (VMs, containers).

Advantages:
- Distributed: Highly reliable, highly scalable, and robust due to redundancy.
- Grid: Efficiently utilizes idle CPU cycles across the globe; capable of running massive parallel jobs.
- Cloud: Cost-effective (OpEx, pay-as-you-go), elastic (scale up/down instantly), low/no capital expenditure.

Disadvantages:
- Distributed: Complex to design and debug due to the lack of a global clock; high dependency on network communication.
- Grid: High complexity due to heterogeneous resources and security boundaries; requires specific middleware for access.
- Cloud: Security/compliance concerns (lack of control), dependency on Internet connectivity, vendor lock-in risk.

Examples:
- Distributed: Blockchain, peer-to-peer networks (e.g., BitTorrent), multi-tier web applications.
- Grid: Large Hadron Collider (LHC) Computing Grid (WLCG), Folding@home (volunteer computing), weather modeling.
- Cloud: Amazon Web Services (AWS) EC2, Microsoft Azure, Google Cloud Platform (GCP), and SaaS apps like Gmail or Salesforce.

1. Centralized Computing
A computing model where all processing is done on a single central server/mainframe,
and users access it through terminals or thin clients.
Features:
1. Single control point – one central system manages processing and resources.
2. High data security – data is stored in one location.
3. Low client hardware needs – clients only interact; no heavy processing.
4. Easier maintenance – updates and backups handled at the central server.

2. Distributed Computing
A model where processing is spread across multiple independent computers that communicate over a network to
achieve a common goal.

Features:
1. Resource sharing – multiple systems share CPU, storage, and applications.
2. Scalability – nodes can be added easily to increase performance.
3. Fault tolerance – failure of one node doesn’t stop the whole system.
4. Parallel processing – tasks can be executed concurrently on different nodes.

3. Cloud Computing
A model that delivers on-demand computing resources (servers, storage, applications) over the internet as a service,
with pay-as-you-go usage.
Features:
1. On-demand self-service – users provision resources automatically.
2. Elasticity & scalability – resources scale up/down based on needs.
3. Measured service – pay for what you use.
4. Broad network access – accessed via internet from any device.

4. Grid Computing
A distributed computing model that combines geographically distributed, heterogeneous
resources to solve large-scale scientific or technical problems.
Features
1. Heterogeneous resource pooling – uses many different types of systems.
2. High-performance computing – suitable for scientific simulations.
3. Decentralized management – resources owned by different organizations.
4. Virtual organizations – users share resources across institutions.

Cloud Deployment Models, as defined by NIST (National Institute of Standards and Technology), describe the nature,
location, and ownership of the cloud infrastructure.

Public Cloud:
- Definition: Services offered to the general public over the internet; owned and managed by third-party providers.
- Features: multi-tenant, highly scalable, internet-based access, managed by vendors.
- Advantages: low cost, no maintenance, high reliability.
- Disadvantages: less control, security concerns.
- Examples: AWS, Google Cloud, Microsoft Azure.

Private Cloud:
- Definition: Cloud used exclusively by one organization; managed internally or by a vendor with high security.
- Features: single-tenant, high privacy, customizable, on-premises or hosted.
- Advantages: maximum control, better security, good compliance.
- Disadvantages: costly to maintain, limited scalability.
- Examples: VMware Private Cloud, OpenStack.

Hybrid Cloud:
- Definition: Combination of public and private cloud with data/application portability; enables flexible workload distribution.
- Features: cloud bursting, workload portability, mix of secure and scalable, balanced cost.
- Advantages: high flexibility, high scalability, secure for sensitive data.
- Disadvantages: complex setup, needs a strong network.
- Examples: private cloud + AWS combination.

Community Cloud:
- Definition: Shared by multiple organizations with similar needs; costs and resources are shared equally.
- Features: shared infrastructure, good security, community-based, managed internally or externally.
- Advantages: cheaper than private, good collaboration, higher security.
- Disadvantages: not highly scalable, higher cost than public.
- Examples: government community cloud, research cloud.

On-Demand Computing
On-demand computing is a cloud model where computing resources such as storage, servers, applications, and
processing power are provided whenever the user needs them, without human intervention.
Users can provision and release resources automatically based on demand.

Key Features:
1. Self-Service Provisioning – Users can create, run, or delete resources without contacting the provider.
2. Elastic Scaling – Resources automatically scale up or down based on workload.
3. Pay-per-Use Model – Users pay only for the amount of computing they consume.
4. Rapid Deployment – New applications or servers can be launched within minutes.
5. Automation – Resource allocation is handled by automated cloud systems.
6. Resource Pooling – Multiple users share a centralized resource pool efficiently.

Example:
1. A user instantly creates a virtual machine on AWS only when needed and shuts it down after use.
2. Netflix automatically adds more servers during peak streaming hours and reduces them when demand drops.
How internet computing has transformed traditional computing models and allowed for scalable solutions:

Traditional computing :
Traditional computing models refer to the on-premise IT infrastructure where organizations buy, install, and manage
their own hardware, software, servers, networking, and storage.
All computation and data processing occur within the organization’s physical premises.

Parts / Components of Traditional Computing Models


1. Hardware Infrastructure – Servers, desktops, storage devices, routers, switches.
2. Software & Operating Systems – Installed locally on company machines.
3. Networking Setup – LAN, intranet, firewalls, and physical cables.
4. IT Maintenance Team – Handles updates, repairs, monitoring.
5. Data Centers (On-Premise) – Physical rooms with power, cooling, security.

Network Based Systems:


Network-based systems are computing systems in which multiple devices, computers, or servers are connected through
a network (LAN, WAN, or Internet) to share resources, data, and services.
They enable communication, distributed processing, and remote access across interconnected systems.
Examples: PAN, LAN, WAN.

Three main technologies that enabled the development of network-based systems:


1. Virtualization Technology
Virtualization allows multiple virtual machines to run on a single physical machine by abstracting hardware resources.
Why it enabled network-based systems:
● Efficient use of hardware
● Easy creation, deployment, and migration of servers
● Supports scalability and multi-tenancy

2. High-Speed Networking (Broadband, Fiber, TCP/IP)


Advances in network technologies like TCP/IP protocol, high-speed internet, optical fiber, and wireless networks.
Why it enabled network-based systems:
● Fast data transfer across long distance
● Real-time communication between distributed systems
● Reliable internet connectivity for cloud applications

3. Web Technologies & Distributed Computing Frameworks


Technologies like the World Wide Web, web services (SOAP/REST), APIs, distributed systems, and service-oriented
architecture.
Why it enabled network-based systems:
● Standard ways to access resources over the internet
● Platforms to build scalable, distributed applications
● Interoperability between different systems

What is effective workload distribution, and how does it contribute in distributed and cloud systems?
Effective workload distribution is the process of dividing tasks and computational loads across multiple systems or
servers in a balanced way so that no single machine is overloaded.
It ensures optimal performance, faster processing, and efficient resource utilization.

Key Points:


● Distributes work evenly across all available nodes/servers.
● Avoids overload on any single machine.
● Improves speed, reliability, and system throughput.
● Uses algorithms like load balancing, scheduling, and task allocation.
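A minimal sketch of two of the dispatch policies just mentioned, round-robin and least-connections; server names and loads are illustrative.

```python
# Workload-distribution sketch: two classic dispatch policies.
from itertools import cycle

servers = ["node-a", "node-b", "node-c"]

# 1) Round-robin: hand requests to servers in a fixed rotation.
rr = cycle(servers)
for request_id in range(5):
    print(f"request {request_id} -> {next(rr)}")

# 2) Least-connections: send each request to the currently lightest node.
active = {s: 0 for s in servers}

def dispatch(request_id):
    target = min(active, key=active.get)  # node with fewest active requests
    active[target] += 1
    print(f"request {request_id} -> {target} (load now {active[target]})")

for request_id in range(5, 10):
    dispatch(request_id)
```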

Contribution in Distributed Systems


● Improves Performance
● Enhances Fault Tolerance
● Better Resource Utilization
● Supports Scalability
● Prevents Bottlenecks

Contribution in Cloud Systems


● Elastic Scaling
● Auto Load Balancing
● High Availability
● Cost Efficiency
● Global Performance

1. Key Technologies in Network-Based Systems


These foundational technologies enabled modern networked services (Cloud, Distributed, Internet):
● (a) Virtualization Technology: Enables resource pooling, isolation, and efficient hardware use by allowing
multiple virtual machines on a single physical system.
● (b) High-Speed Networking: Provides fast, reliable communication and data transfer via TCP/IP, Fiber, and
Broadband/Wireless.
● (c) Web Technologies & APIs: Enable applications to communicate over the web and allow platform-
independent access (using REST, SOAP, HTTP, JSON).
● (d) Distributed Computing Frameworks: Support parallel processing and large-scale task execution (e.g.,
Hadoop, MapReduce, Microservices).
● (e) Cloud Management Platforms: Tools for orchestration, deployment, scaling, and monitoring of resources
(e.g., AWS, Azure, Kubernetes).

2. Challenges in Network-Based Systems


Network-based architectures face inherent risks and technical limitations:
● (a) Security & Privacy Issues: Requires strong encryption to mitigate threats like unauthorized access, data loss,
and DDoS attacks.
● (b) Network Latency & Bandwidth Limitations: Performance is constrained by internet speed and can affect real-
time applications.
● (c) Reliability & Fault Tolerance: Failure of nodes or networks requires redundancy and backup strategies to
prevent service disruption.
● (d) Interoperability: Need to resolve compatibility issues between systems, APIs, and data formats built on
different platforms.
● (e) Scalability & Load Balancing: Requires adaptive resource management to handle sudden surges in users or
workload.
● (f) Cost Control: Pay-per-use models require constant monitoring to prevent unintended escalation of costs.

3. Important Considerations for Network-Based Systems


Effective management requires focus on these key operational areas:
● (a) Security Architecture: Implementing authentication, authorization, encryption, and firewall setup.
● (b) Performance Optimization: Using caching, load balancing, CDNs, and bandwidth management.
● (c) Reliability Measures: Deploying backups, replication, failover systems, and guaranteeing SLAs.
● (d) Scalability Planning: Defining policies for horizontal/vertical scaling and auto-scaling.
● (e) Resource Management: Ensuring efficient allocation of CPU, memory, and storage.
● (f) Compliance & Legal Concerns: Adhering to data location laws, privacy regulations, and industry standards.

Contribution of Network Monitoring and Management to Efficiency:


Network Monitoring: Continuous observation of network devices, traffic, and performance using tools.
Network Management: The process of controlling, configuring, and maintaining network resources.

● Ensures High Performance: Detects congestion and bottlenecks early to maintain smooth operations.
● Improves Reliability & Availability: Identifies failures quickly, enabling fast troubleshooting and minimizing
downtime.
● Supports Load Balancing: Monitors traffic across nodes to ensure efficient workload distribution and prevent
overloads.
● Enhances Security: Detects abnormal traffic (like DDoS) and helps apply security policies promptly.
● Capacity Planning: Tracks data usage trends to inform hardware upgrades and resource additions.
● Optimizes Resource Utilization: Ensures all network devices and resources operate within optimal limits,
preventing waste.
● Service Quality (QoS) Assurance: Maintains performance of critical applications by setting traffic priorities.

Client–Server Model & Its Role in Distributed Computing:


The client–server model is a network architecture where:
● Client → requests services (e.g., data, files, processing).
● Server → provides services and resources.
They communicate over a network using protocols.

Key Parts
● Client: User device/application sending requests.
● Server: Central machine providing data, storage, computation.
● Network: Medium enabling communication between them.
● Protocols: Rules (HTTP, FTP, TCP/IP) enabling data exchange.
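A minimal client–server sketch using Python sockets: the server thread provides an echo service over TCP, and the client requests it across the network; the port number is an arbitrary choice.

```python
# Client-server sketch: the server provides an echo service over TCP,
# the client requests it. Port 5050 is arbitrary.
import socket
import threading
import time

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 5050))
    srv.listen()
    conn, _addr = srv.accept()        # wait for a client request
    data = conn.recv(1024)
    conn.sendall(b"echo: " + data)    # serve the response
    conn.close()
    srv.close()

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                       # give the server a moment to start

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 5050))   # client sends a request...
client.sendall(b"hello")
print(client.recv(1024).decode())     # ...and receives "echo: hello"
client.close()
```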

Benefits
● Centralized control and management
● High security and data integrity
● Easy to scale by upgrading servers
● Simple to maintain and manage

Drawbacks
● Single point of failure (server)
● Server overload if too many clients
● Higher cost (powerful servers required)
● Limited fault tolerance unless replicated

Role in Distributed Computing


● (a) Enables Resource Sharing
● (b) Foundation for Distributed Systems
● (c) Supports Scalability
● (d) Simplifies Management
● (e) Improves Performance
● (f) Enhances Security
● (g) Fault Tolerance
● (h) Supports Multi-user Environment

Peer-to-Peer (P2P) Model


A distributed network model where every node acts as both client and server, sharing resources directly without central
control.
Components
● Peers: Each node stores data, sends/receives resources.
● Overlay Network: Logical connection between peers.
● Resource-Sharing Protocols: BitTorrent-like protocols.

Benefits
● No central point of failure
● Highly scalable (peers increase → capacity increases)
● Cost-effective (no central server needed)
● Balanced load (each peer contributes resources)

Drawbacks
● Security is weaker (no central control)
● Unreliable peers reduce performance
● Hard to manage and coordinate
● Data consistency is difficult to maintain

How Virtualization Enables Resource Management in Distributed Systems:


Virtualization creates virtual versions of hardware/resources, allowing multiple isolated environments to run on the same
physical machine.
How It Enables Resource Management
1. Resource Pooling: CPU, memory, storage from multiple machines are pooled into a single logical resource.
2. Isolation: Each VM runs independently → faults do not affect others.
3. Dynamic Allocation: Resources can be allocated or removed based on workload.
4. Load Balancing: VMs can be migrated to less-loaded machines.
5. Efficient Utilization: Prevents hardware underutilization → improves performance.

Impact on Scalability
● New VMs can be created instantly → quick expansion.
● Workloads can be moved/migrated → balanced scaling.
● Physical resources are used efficiently → supports large-scale distributed systems.

2. How Containers Differ from Traditional Virtual Machines & Their Impact on Scalability
Differences (simple points):
- OS: each VM has a full OS; containers share the host OS kernel.
- Size: VMs are heavy (GBs); containers are lightweight (MBs).
- Startup time: VMs are slow (minutes); containers are fast (seconds).
- Isolation: VMs provide strong hardware-level isolation; containers provide process-level isolation.
- Resource usage: VMs have high overhead; containers have low overhead.

Impact on Scalability
● Containers scale faster (seconds) → more responsive to changing demand.
● Higher density: More containers can run on one machine than VMs.
● Lower cost scaling: Uses fewer resources.
● Ideal for microservices → each service scaled independently.
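A small sketch using the Docker SDK for Python to illustrate container start-up speed; it assumes a running local Docker daemon and the `docker` pip package, and uses the public alpine image.

```python
# Container sketch using the Docker SDK for Python ("docker" package).
# Assumes a local Docker daemon; "alpine" is a small public image.
import time
import docker

client = docker.from_env()

t0 = time.time()
# Containers share the host kernel, so (with the image already pulled) this
# typically starts in well under a second, versus minutes to boot a full VM.
output = client.containers.run(
    "alpine", ["echo", "hello from a container"], remove=True
)
print(output.decode().strip(), f"(took {time.time() - t0:.2f}s)")
```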

3. How Virtualization Supports Cloud Computing to Scale Dynamically


Key Points
1. On-Demand Provisioning
Cloud providers can create new VMs/containers instantly when workload increases.
2. Auto-Scaling
Cloud platforms monitor load (CPU/requests) and automatically add/remove VMs.
3. Elasticity
Virtual machines expand or shrink resources (CPU, RAM, storage) without downtime.
4. Live Migration
VMs can be shifted to another physical server without stopping—keeps services available during scale-out
operations.
5. Multi-Tenancy Support
Many users share the same hardware securely through virtual isolation → efficient large-scale operation.
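The auto-scaling behavior described above can be sketched as a simple threshold controller; `get_avg_cpu`, `scale_out`, and `scale_in` are hypothetical stand-ins for a provider's monitoring and provisioning APIs.

```python
# Threshold-based auto-scaling sketch. The three helper functions are
# hypothetical stand-ins for a cloud provider's monitoring/provisioning APIs.
import random

MIN_NODES, MAX_NODES = 2, 10
nodes = 2

def get_avg_cpu():
    # Stand-in for a monitoring API: simulated cluster-wide CPU utilization.
    return random.uniform(0, 100)

def scale_out():
    print("provisioning one more node")   # hypothetical provider call

def scale_in():
    print("releasing one node")           # hypothetical provider call

def autoscale_once():
    global nodes
    cpu = get_avg_cpu()
    if cpu > 80 and nodes < MAX_NODES:    # overloaded: add capacity
        scale_out()
        nodes += 1
    elif cpu < 20 and nodes > MIN_NODES:  # idle: shed capacity, save cost
        scale_in()
        nodes -= 1

for _ in range(5):   # a real controller would loop forever with a delay
    autoscale_once()
```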

Performance Metrics of Distributed Systems:


Metric (short meaning; why important):
- Latency: time taken to respond to a request; lower latency = faster system response.
- Throughput: work/data processed per second; shows system capacity under load.
- Scalability: ability to handle growth by adding resources; ensures smooth performance as users increase.
- Availability: percentage of time the system is up (uptime); critical for reliability and continuous service.
- Fault tolerance: the system works even when parts fail; prevents service interruptions.
- Consistency: all nodes have the same correct data; avoids errors and data mismatches.
- Load balancing: even distribution of workload; prevents overload and improves performance.
- Resource utilization: how efficiently resources are used; avoids wastage and ensures optimal usage.
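Three of these metrics (latency, throughput, availability) can be computed directly from measurements; the numbers below are illustrative data.

```python
# Computing basic distributed-system metrics from illustrative measurements.
latencies_ms = [120, 95, 210, 87, 150]        # per-request response times

avg_latency = sum(latencies_ms) / len(latencies_ms)
print(f"average latency: {avg_latency:.1f} ms")

window_s = 10                                  # observation window
throughput = len(latencies_ms) / window_s
print(f"throughput: {throughput:.1f} requests/s")

uptime_h, downtime_h = 8757, 3                 # hours over a year
availability = uptime_h / (uptime_h + downtime_h) * 100
print(f"availability: {availability:.2f}%")    # about 99.97%
```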

Performance Overheads with Virtualization & How to Minimize Them:

Performance overhead (meaning; how to minimize):
- CPU overhead: extra CPU cycles used by the hypervisor; minimize with hardware-assisted virtualization (Intel VT-x, AMD-V) and optimized hypervisors.
- Memory overhead: extra RAM used for multiple VMs; minimize with memory ballooning, deduplication, and more physical RAM.
- I/O overhead: slower disk and network operations; minimize with paravirtualized drivers, SSD storage, SR-IOV, and faster networking.
- Boot/startup delay: VMs take longer to start; minimize by using lightweight VMs or containers.
- Context-switching overhead: switching between VMs causes delay; minimize by limiting excessive VM creation and using CPU pinning.

Network Function Virtualization (NFV) & Its Role in Enhancing Cloud Computing Systems:
Network Function Virtualization (NFV) replaces hardware-based network devices (routers, firewalls, load balancers)
with software-based virtual network functions (VNFs) running on standard servers.
It is an architectural concept that virtualizes entire classes of network node functions—such as firewalls, load balancers,
intrusion detection systems, and DNS—moving them off dedicated proprietary hardware and onto industry-standard
commodity servers.

Enhancements (with short explanations):
1. Improves Scalability: VNFs (Virtual Network Functions) can be scaled up/down instantly without installing new hardware.
2. Reduces Costs: Uses general commodity servers instead of expensive dedicated network appliances, lowering CapEx.
3. Faster Deployment: New network services (firewalls, VPNs, load balancers) can be launched in minutes as software.
4. Flexibility & Agility: Network functions can be moved, replicated, or updated via software without downtime or physical reconfiguration.
5. Automation: Integrates with orchestration tools for auto-scaling and creating self-healing networks.
6. Better Resource Utilization: Multiple virtual network functions efficiently share the same underlying physical hardware resources.

Clustering is the technique of connecting multiple servers/computers to work together as a single system.
If one node fails or is overloaded, another node takes over—improving performance and reliability.

Various Clustering Techniques in Cloud Computing

Clustering technique (short meaning):
- Load-balancing cluster: Distributes user requests across multiple nodes to avoid overload.
- High-availability (HA) cluster: Provides failover—if one node fails, another replaces it instantly.
- High-performance (HPC) cluster: Multiple nodes work in parallel for heavy computation tasks.
- Storage cluster: Multiple storage nodes combine to form scalable, redundant storage systems.
- Grid/distributed cluster: Nodes located in different locations coordinate to process large jobs.
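A client-side failover sketch showing how replicated cluster nodes keep a service available despite node failures; the node URLs are hypothetical placeholders.

```python
# High-availability sketch: try each replicated node in turn so a single
# node failure does not stop service. URLs are hypothetical.
import requests

REPLICAS = [
    "http://node1.example.com",
    "http://node2.example.com",
    "http://node3.example.com",
]

def fetch_with_failover(path):
    for node in REPLICAS:
        try:
            resp = requests.get(node + path, timeout=2)
            resp.raise_for_status()
            return resp.text            # first healthy node serves the request
        except requests.RequestException:
            continue                    # node down/overloaded: fail over
    raise RuntimeError("all replicas failed")

# Example (would work against real nodes): fetch_with_failover("/status")
```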

Scalability: The ability of a system to handle an increasing amount of work by adding resources, typically measured
by how performance is sustained under heavier load.
System Reliability: The probability that a system or component will perform its required function without failure for
a specified period of time under stated conditions.
Availability: The percentage of time a system is operational and accessible to users when needed.

Contribution to Scalability
● Horizontal Scalability: Clustering allows adding more nodes (servers) to increase capacity without changing the
existing system.
● Resource Pooling: The CPU, memory, and network resources of many servers are combined, enabling the
system to handle very high traffic.
● Near-Linear Performance Growth: As demand increases, new servers can be added, and the load is evenly
distributed, giving almost proportional performance improvement.
● Elasticity: Auto-scaling tools can automatically add or remove virtual/container nodes when workload crosses
thresholds, ensuring smooth performance during traffic spikes.

Contribution to System Reliability


1. Fault Tolerance: If a node or component fails, another node immediately takes over, preventing system breakdown.
2. Redundancy of Nodes: Multiple nodes store and run the same services, so even if one node fails, the system continues working.
3. Error Isolation: Failures are contained within a single node and do not affect others, improving overall system stability.
4. Reliable Performance Under Failure: Workload is redistributed automatically to healthy nodes, ensuring continuous operation.

Contribution to System Availability


1. High Availability (HA): Services run on multiple active nodes so users always have access even during failures.
2. Automatic Failover: When a node goes down, the system instantly routes requests to another node without
interrupting service.
3. No Single Point of Failure: Distributed deployment ensures the system stays online even if a node, storage
device, or network link fails.
4. Zero-Downtime Maintenance: Rolling updates allow upgrading or repairing one node at a time while other nodes
continue serving users.

An internet-based system is a software platform that uses the internet to allow users to access and interact
with applications and data without needing to install software locally.

Technical Challenges That Impact Scalability of Internet-Based Systems:


Challenge (short explanation):

1. Single Point of Failure (SPOF): If a critical component (database, firewall, load balancer) has no redundancy, its failure stops the entire system. This reduces uptime, and without high availability, true scaling is impossible.
2. Database Bottlenecks: Databases are hard to scale horizontally because of consistency issues. As traffic grows, the database becomes the slowest part, limiting application scaling.
3. Network Latency & Bandwidth Limits: As nodes spread across regions, data travel time increases. Higher latency and limited bandwidth slow response times and restrict the benefits of adding more servers.
4. Inter-Service Communication Overhead: Microservices require many internal API calls. Serialization, network delays, and connection overhead accumulate and reduce throughput as the system grows.
5. Cache Invalidation Complexity: Distributed caches must stay consistent. Updating cached data across many nodes is complex and resource-heavy, slowing write operations and limiting scaling.
6. Resource Contention: CPU, RAM, and disk I/O become overloaded when many services share the same hardware, restricting horizontal scaling.
7. Load Balancing Challenges: Poor or uneven distribution of user traffic overwhelms some nodes while others stay idle, preventing effective scaling.

Three Main Security Risks & How They Impact Scalability


1. DDoS Attacks (Distributed Denial of Service)
● Risk: Attackers flood servers with fake traffic.
● Impact on Scalability:

○ System resources get exhausted


○ Legitimate users get blocked
○ Auto-scaling may waste resources by adding more nodes to handle attack traffic

2. Data Breaches / Unauthorized Access


● Risk: Attackers steal or manipulate user data.
● Impact on Scalability:
○ Systems must add stronger authentication, encryption, monitoring → increases overhead
○ More security checks reduce performance under high load
○ Trust issues may force redesign of architecture

3. Malware / Injection Attacks (SQL injection, XSS, Ransomware)


● Risk: Attackers exploit vulnerabilities in applications or databases.
● Impact on Scalability:
○ Compromised nodes spread malware across scaled infrastructure
○ Services must use extra validation layers → increases latency
○ System may become unstable or require shutdown of multiple nodes

How Scalable Computing Over the Internet Improves Fault Tolerance & Reliability in Cloud Computing:
Scalable computing over the Internet refers to the on-demand provisioning and de-provisioning of computing resources
(such as CPU, memory, storage, and networking) delivered over a broad network (the Internet), which allows the system
to automatically and efficiently handle a massive, variable workload while maintaining performance and controlling costs.

1. Enhanced Fault Tolerance


Fault tolerance means the system keeps running even when components fail. Cloud computing achieves this through:
● Redundancy & Replication: Data and applications are copied across multiple disks, servers, and availability
zones. If one fails, replicas take over instantly.
● Load Balancer Support: When an application server fails, traffic is automatically redirected to healthy servers.
● Decentralization: Resources are spread across global data centers, protecting the system from regional failures.
● Automatic Failover: Cloud systems monitor health and instantly shift workloads to standby or healthy nodes
without interruption.

2. Improved Reliability
Reliability means the system operates correctly for long periods. Cloud scalability improves reliability through:
● No Single Point of Failure: Distributed and replicated components ensure the system continues even if one part
breaks.
● Resource Pooling & Elasticity: Workloads move automatically to stable servers when resources fail or overload.
● Software-Defined Infrastructure: Automated, standardized deployments reduce human errors and keep systems
stable.
● High Availability SLAs: Cloud providers guarantee high uptime (like 99.99%) due to their scalable, reliable
architecture.

Best Practices for Securing VMs & Containers (with Energy Efficiency):

Containers are lightweight, isolated environments that package an application along with all its dependencies (libraries,
runtime, configuration) so it can run consistently across different systems.
They share the host OS kernel, making them faster, smaller, and more portable than virtual machines.

Energy efficiency in cloud computing refers to using cloud resources and data centers in a way that reduces power
consumption while maintaining high performance.
It focuses on optimizing hardware usage, improving cooling systems, using renewable energy, and deploying scalable
techniques (like virtualization, auto-scaling, and containers) to minimize wasted energy.

A. For Virtual Machines (VMs)

1. Minimal OS & Patch Management


2. Strong Access Controls
3. Network Security
4. Resource Right-Sizing
5. VM Isolation & Hardening

B. For Containers

1. Use Trusted & Minimal Base Images


2. Implement RBAC & Least Privilege
3. Image Scanning & Signing
4. Limit Resource Usage
5. Disable Unused Capabilities

C. Practices for Both (VMs + Containers)


1. Auto-Scaling & Load Balancing
● Scale up during demand, scale down during low usage.
● Reduces unnecessary running instances → major energy savings.

2. Continuous Monitoring (Lightweight)


● Use low-overhead monitoring tools.
● Detects anomalies without heavy CPU usage.

3. Network Encryption (Optimized)


● Use TLS 1.3 and hardware-accelerated encryption.
● Strong security with minimal power overhead.

4. Use Renewable-Powered Cloud Regions


● Many CSPs offer carbon-neutral data centers.
● Improves sustainability without affecting performance.

5. Automated Patching & Policy Enforcement


● Ensures consistent security while reducing manual, energy-wasting rework.

Energy efficiency in cloud computing involves using less energy to perform the same computational work by
optimizing hardware, software, and operational practices in data centers.

Why Energy Efficiency is Important in Cloud Computing


1. Reduces Operational Costs
2. Improves System Performance
3. Environmental Sustainability
4. Better Resource Utilization

Techniques to Improve Energy Efficiency in CC:


Technique (description; impact):

Virtualization & Consolidation: Running many Virtual Machines (VMs) or containers on a single physical server (host). Impact: maximizes host utilization (less idle power draw) and reduces the total number of physical servers needed.

Dynamic Voltage and Frequency Scaling (DVFS): Automatically adjusting the CPU clock speed and voltage based on the current workload. Impact: reduces energy consumption during periods of low usage while maintaining high performance when needed.

Power-Aware Load Balancing: Distributing workloads not just by CPU usage, but by power consumption: consolidating workloads onto fewer servers and putting the idle servers into a low-power sleep state (hibernation or deep sleep). Impact: saves significant power by de-powering idle hardware rather than letting it sit active.

Efficient Cooling Technologies: Using free cooling (outside air when temperatures allow), liquid cooling (more effective than air), and optimizing data center temperature setpoints (operating at higher ambient temperatures). Impact: reduces the massive power required for cooling, which can account for 40% or more of total data center power consumption.

Server Hardware Optimization: Using high-efficiency power supply units (PSUs) and low-power hardware components (e.g., solid-state drives (SSDs) instead of spinning disks, or specialized, lower-power CPU architectures). Impact: reduces wasted power in conversion and operation.
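The consolidation idea can be sketched as a packing problem: place VM loads onto as few hosts as possible so idle hosts can be powered down. Below is a first-fit-decreasing sketch with illustrative load numbers, a simplification of real placement algorithms.

```python
# Consolidation sketch: first-fit-decreasing packing of VM CPU demands
# (as fractions of one host) onto as few hosts as possible.
vm_loads = [0.6, 0.3, 0.5, 0.2, 0.4, 0.1]   # illustrative demands
HOST_CAPACITY = 1.0

hosts = []                                   # each entry = used capacity
for load in sorted(vm_loads, reverse=True):  # place big VMs first
    for i, used in enumerate(hosts):
        if used + load <= HOST_CAPACITY:
            hosts[i] += load                 # fits on an existing host
            break
    else:
        hosts.append(load)                   # power on a new host

print(f"{len(vm_loads)} VMs packed onto {len(hosts)} hosts:",
      [round(h, 2) for h in hosts])
# Unpacked, 6 VMs might occupy 6 lightly loaded hosts; packed they need 3,
# so the other hosts can sleep and draw (almost) no power.
```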

Applying Energy-Efficient Practices Without Compromising Security

● Use Hardware Acceleration: Enable CPU features like AES-NI to handle encryption in hardware, reducing
CPU usage and energy consumption.
● Efficient Monitoring & Logging: Use lightweight or agentless monitoring and filter logs at the source so only
important security events are processed.
● Minimal/Secure OS Images: Use minimal, hardened operating systems (e.g., Alpine, Distroless) to reduce
attack surface and avoid unnecessary background processes.
● Right-Sizing & Auto-Scaling Security: Deploy security tools (firewalls, IDS) as virtualized functions and apply
auto-scaling so they run at full capacity only when needed, saving energy during low-traffic periods.

How Energy-Efficient Data Centers Improve Security

1. Less Overheating = Fewer Failures: Efficient cooling reduces hardware crashes and downtime, improving
system stability and availability.
2. Stable Power = Higher Availability: Optimized power use lowers outages, strengthening overall cloud
reliability.
3. Reduced Attack Surface: Consolidation means fewer active servers, giving attackers fewer targets.
4. More Budget for Security: Savings from lower energy costs can be reinvested into advanced security tools and
monitoring.
5. Stronger Physical Security: Modern energy-efficient data centers include better access control, surveillance,
and fire protection.
Unit 2
● Virtualization is a computer architecture technology by which multiple virtual machines (VMs) are multiplexed in
the same hardware machine.
● The purpose of a VM is to enhance resource sharing by many users and improve computer performance in
terms of resource utilization and application flexibility.
● Hardware resources (CPU, memory, I/O devices, etc.) or software resources (operating system and software
libraries) can be virtualized in various functional layers.
● Virtualization is a technology that creates virtual versions of computing resources (servers, storage, networks)
using a hypervisor. It allows multiple virtual machines (VMs) to run on a single physical machine while remaining
isolated from each other.

Role of Virtualization in Cloud Computing:

1. Resource Pooling – Combines multiple physical resources (CPU, RAM, storage) into a shared pool that can be
allocated dynamically to many users.
2. Isolation – Each VM runs independently and securely, ensuring that faults or security breaches in one virtual
environment do not affect others.
3. Scalability – Virtual machines or containers can be created or removed quickly (elasticity) based on real-time
demand, allowing for rapid scaling.
4. Cost Efficiency – Reduces hardware requirements by consolidating many workloads onto fewer physical
servers, leading to lower capital and operational expenditure (OpEx).
5. Support for Multi-Tenancy – Enables multiple users or organizations (tenants) to securely share the same
physical infrastructure, which is foundational to the public cloud model.
6. On-Demand Self-Service – Allows users to provision resources automatically through APIs or web portals
without manual intervention from the service provider.
7. Measured Service – Provides the ability to track and monitor the consumption of virtualized resources,
enabling the pay-per-use and utility billing models.
8. Disaster Recovery (DR) and Business Continuity – Enables the easy creation of replicas and snapshots of
entire virtual systems, facilitating quick backup and recovery processes.
9. Live Migration – Allows a running virtual machine to be moved between different physical servers without
interrupting the service, enhancing maintenance efficiency and load balancing.
10. Hardware Abstraction – Decouples the software/OS from the underlying physical hardware, making
applications highly portable across different data centers or hardware vendors.

Significance of Virtualization in Data Centers


A. Automation
1. Automated Deployment – Automatically deploys virtual machines without manual setup.
2. Streamlined Management – Simplifies monitoring and maintenance of VMs.
3. Scheduled Tasks – Supports automated backups, updates, and patching.
4. Reduced Human Error – Minimizes mistakes in configuration and provisioning.
5. Rapid Provisioning – Quickly creates or removes resources as needed.

B. Efficiency
1. Hardware Optimization – Runs multiple workloads on the same physical server.
2. Energy Saving – Reduces power consumption by consolidating resources.
3. Cost Reduction – Lowers maintenance and operational expenses.
4. Performance Improvement – Balances workloads for better server performance.
5. Space Utilization – Minimizes need for extra physical hardware.

C. Scalability
1. On-Demand Resource Scaling – Increases or decreases resources as needed.
2. Elastic Infrastructure – Supports cloud elasticity for fluctuating workloads.
3. Rapid Expansion – Easily adds new virtual machines without new hardware.
4. Flexible Workload Management – Allocates resources dynamically to tasks.
5. Future-Proofing – Adapts to growing business or application requirements.

D. Resource Utilization
1. Maximum CPU Usage – Optimizes processor capacity across VMs.
2. Memory Optimization – Shares RAM efficiently among multiple virtual machines.
3. Storage Efficiency – Uses available storage effectively.
4. Reduced Idle Resources – Minimizes unused hardware to improve performance.
5. Balanced Workloads – Ensures all resources are utilized proportionally.

A Virtual Machine (VM) is a software-based emulation of a physical computer. It runs an operating system and
applications just like a physical machine, but is isolated from the underlying hardware and other VMs.
Key point: VMs are created and managed by a hypervisor or virtual machine monitor (VMM).
Working: The hypervisor abstracts hardware and allocates CPU, memory, storage, and network to each VM. Multiple
VMs can run on a single physical host.
Benefits:
● Isolation between VMs
● Efficient resource utilization
● Supports testing, development, and multi-OS environments

The Instruction Set Architecture (ISA) Level is the lowest form of virtualization implementation, focusing on
translating or emulating CPU instructions.

At the ISA level, virtualization is performed by emulating a given Instruction Set Architecture (ISA) by the ISA of the
host machine. This means the system runs software compiled for one type of CPU on a computer with a different CPU.
● Working: This approach uses software translation (like binary translation or interpretation) to convert the
instructions of the guest architecture into instructions that the host processor can understand and execute. This
process creates a Virtual ISA (V-ISA).
Benefits
● High Portability: It enables running a large amount of legacy binary code written for various processors on any
given new hardware host machine.
● Architecture Agnostic: It allows the software to be entirely independent of the underlying physical processor
architecture.

Drawbacks/Overheads
● Extremely High Performance Overhead: The constant, instruction-by-instruction interpretation and translation
process consumes significant CPU resources, leading to very slow execution compared to running the software
natively.
● Slower Execution: The time required to translate instructions adds considerable latency, severely impacting
performance.
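A toy sketch of ISA-level emulation: a software interpreter fetches, decodes, and executes "guest" instructions for a made-up instruction set, which shows why each guest instruction costs many host instructions.

```python
# ISA-level virtualization in miniature: a software interpreter runs a
# "guest" program written for a toy instruction set on the host CPU.
def run_guest(program):
    regs = {"r0": 0, "r1": 0}
    for op, *args in program:        # fetch each guest instruction...
        if op == "LOAD":             # ...then decode and execute in software
            regs[args[0]] = args[1]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "PRINT":
            print(args[0], "=", regs[args[0]])
    return regs

guest_program = [                    # a tiny "binary" for the toy ISA
    ("LOAD", "r0", 2),
    ("LOAD", "r1", 3),
    ("ADD", "r0", "r0", "r1"),
    ("PRINT", "r0"),                 # prints: r0 = 5
]
run_guest(guest_program)
```

Each one-line guest instruction triggers dozens of host operations (dictionary lookups, branches), which is the performance overhead described above.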

The Hardware Abstraction Layer (HAL) Level is the classic method of system virtualization, where a hypervisor
creates virtual hardware environments for VMs.

Hardware-level virtualization is performed right on top of the bare hardware (Type 1) or on top of a host OS (Type 2).
The virtualization layer (Hypervisor) creates a uniform, virtual hardware environment for each Virtual Machine (VM).
● Working: The hypervisor manages the underlying physical resources, such as processors, memory, and I/O
devices. It presents a virtualized view of the hardware to the Guest OS, which then runs as if it were on a
dedicated machine.

Benefits
● Strong Isolation: Each VM is fully isolated from others, providing a high level of security and fault tolerance.
● Hardware Independence: It facilitates the easy migration of entire VMs between different physical hosts.
● Consolidation: It efficiently consolidates multiple virtual workloads onto fewer physical machines.

Drawbacks/Overheads
● Memory Overhead: Each VM must include its own full copy of the operating system (Guest OS), consuming
significant RAM.
● I/O Overhead: Network and storage Input/Output operations often require interception and translation by the
hypervisor, adding latency.

The Operating System (OS) Level is the basis of containerization, where environments are isolated using OS kernel
features.

This refers to an abstraction layer between the traditional OS and user applications. OS-level virtualization creates
isolated containers on a single shared physical server that all use the same underlying OS kernel instance.
● Working: It utilizes OS features (like Linux namespaces and cgroups) to isolate containers, giving them their
own file system, process tree, and network interface. The containers behave like real servers but without the
overhead of a full separate kernel.

Benefits
● Lightweight and Fast: Containers start up extremely quickly (seconds/milliseconds) and have minimal
overhead since they don't require a full Guest OS.
● High Density: Allows for the allocation of hardware resources among a large number of mutually distrusting
users in a very efficient manner.
● Resource Utilization: Maximizes resource utilization on a single physical server.

Drawbacks/Overheads
● Limited OS Types: All containers must share the same host OS kernel; you cannot run a Linux container on a
Windows host without an intermediate translation layer.
● Weaker Isolation: Isolation is maintained at the kernel level, which is less secure than the full hardware
isolation provided by HAL-level virtualization.

The Library Support Level (or User-Level API Level) virtualizes the communication link between applications and the
OS.

Most applications use APIs exported by user-level libraries. This level of virtualization involves controlling the
communication link between applications and the rest of the system through API hooks.
● Working: An intermediate layer intercepts the application's API calls and translates them into compatible calls
recognized by the host operating system. This makes the application believe it is running in its intended
environment.

Benefits
● Application Compatibility: Enables applications written for one OS to run on another without requiring full OS-
level or hardware virtualization.
● Lower Overhead: Generally incurs less overhead than running a full VM.

Drawbacks/Overheads
● Limited OS Feature Access: The translation layer may not support all complex system calls or OS features.
● Compatibility Issues: The translation may not be 100% accurate or complete for all applications.
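
A minimal sketch of the API-hook idea in Python (the path mapping and function names are hypothetical, assuming a Unix-like host where /tmp exists): an intermediate layer intercepts a familiar library call and remaps it for the host environment, exactly as the working description above outlines.

```python
# Minimal sketch of library-level API interception via a hook (hypothetical
# mapping; assumes a Unix-like host where /tmp exists). The "application"
# calls the familiar API; the hook remaps the call for the host environment.
import os

_real_listdir = os.listdir                 # keep a reference to the real API

def hooked_listdir(path="."):
    # Translate a guest-style path convention into the host's layout,
    # then forward the call to the real implementation.
    return _real_listdir(path.replace("/guest-root", "/tmp"))

os.listdir = hooked_listdir                # install the API hook

print(os.listdir("/guest-root"))           # transparently lists /tmp
```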

The User-Application Level (or Process-Level Virtualization) runs applications within a controlled runtime environment.

Application-level virtualization is also known as process-level virtualization and often involves deploying High-Level
Language (HLL) VMs.
● Working: The application code is compiled into an intermediate bytecode (e.g., Java bytecode). The
virtualization layer (like the JVM or .NET CLR) sits as an application program on top of the host OS and
executes this bytecode. Other forms involve wrapping the application in an isolated layer (sandboxing).

Benefits
● Maximum Portability: The application code (bytecode) is entirely OS and hardware-independent ("Write Once,
Run Anywhere").
● Security and Sandboxing: The runtime environment provides inherent security by isolating the application
process from the host OS and other applications.

Drawbacks/Overheads
● Performance Overhead: The interpretation or Just-In-Time (JIT) compilation of bytecode by the runtime
environment adds some processing overhead.
● Requires Runtime: The host system must have the specific runtime environment (e.g., JVM) installed.

Virtualization Support at the OS Level

● Cloud computing faces two major challenges:


1. The ability to use a variable number of physical machines and VM instances depending on workload
requirements.
2. The slow process of instantiating new virtual machines.
● Currently, new VMs are created either as fresh boots or as clones of a template VM, and they are unaware of
the current application state.

Why OS-Level Virtualization

● OS-level virtualization inserts a virtualization layer inside the operating system to partition a machine’s physical
resources.
● It allows multiple isolated virtual machines (VMs) to run within a single operating system kernel.
● These VMs are also called Virtual Execution Environments (VEEs), Virtual Private Systems (VPS), or simply
containers
● To users, these VEs appear as real servers.
● Each VE has its own processes, file system, user accounts, network interfaces with IP addresses, routing tables,
firewall rules, and other configuration settings.
● Although VEs can be customized individually, they all share the same OS kernel.
● Hence, OS-level virtualization is known as single-OS-image virtualization.

Advantages of OS Extensions

● Compared to hardware-level virtualization, OS-level virtualization has two major advantages:


1. Minimal startup/shutdown overhead, low resource usage, and high scalability.
2. Ability for a VM and its host environment to synchronize state changes when required.
● These advantages are enabled by two mechanisms:
1. All OS-level VMs on a physical machine share the same operating system kernel.
2. The virtualization layer allows processes inside VMs to access many host resources but prevents them
from modifying those resources

Disadvantages of OS Extensions

● The primary disadvantage is that all OS-level VMs on a single container (i.e., sharing one OS kernel) must use the
same type of guest operating system.
● Although VMs may use different distributions, they must belong to the same OS family.
● For example, a Windows XP VM cannot run inside a Linux-based container.
● Cloud users often prefer different operating systems (Windows, Linux, etc.), which limits the applicability of OS-level
virtualization.

Virtualization on Linux or Windows Platforms

● Most Linux platforms are not tied to a special kernel, allowing multiple VMs to run simultaneously on the same
hardware.
● Two OS-level tools—Linux vServer and OpenVZ—allow Linux platforms to run other platform-based
applications via virtualization.
● A third tool, FVM, was developed specifically for OS-level virtualization on Windows NT platforms.

Middleware Support for Virtualization


● Library-level virtualization (also known as user-level ABI or API emulation) creates execution environments for
running alien programs on a platform without running a complete OS-level VM.

● It works mainly through API call interception and remapping.


● Examples of library-level virtualization systems include:
○ Windows Application Binary Interface (WABI)
○ lxrun
○ WINE
○ Visual MainWin
○ vCUDA

vCUDA (CUDA – Compute Unified Device Architecture)

● vCUDA uses a client–server model for CUDA virtualization.


● It contains three user-space components:
1. vCUDA library
2. Virtual GPU (vGPU) in the guest OS — client
3. vCUDA stub in the host OS — server
● The vCUDA library is placed in the guest OS as a replacement for the standard CUDA library.
● It intercepts and forwards API calls from the guest to the host stub.
● It also creates and manages virtual GPUs (vGPUs).

| Feature | Linux Virtualization | Windows Virtualization |
| --- | --- | --- |
| Primary hypervisor | KVM (built-in) | Hyper-V (built-in), or third-party software |
| Popular tools | KVM, QEMU, VirtualBox | Hyper-V, VirtualBox, VMware |
| Setup complexity | Can be more complex with command-line interfaces, but KVM is integrated into the kernel | Easier with a GUI for built-in options like Hyper-V and user-friendly third-party tools |
| Use case | Server consolidation, cloud environments, development environments | Running other OSes on a desktop, testing, development |
How VM and VMM Work Together to Achieve Virtualization:
The Virtual Machine (VM) and Virtual Machine Monitor (VMM)/Hypervisor work together in a layered architecture to
virtualize hardware, isolate resources, and allow multiple operating systems to run on one physical machine.
Roles
● Virtual Machine (VM):
A software-based computer with virtual CPU, memory, and storage. It runs its own guest OS, which believes it
controls real hardware.
● Virtual Machine Monitor (VMM) / Hypervisor:
The software layer between the VMs and the physical hardware. It creates, manages, and monitors VMs and
handles all hardware access on their behalf.
● Host Hardware:
The actual physical server whose resources (CPU, RAM, disk, network) are controlled by the VMM.

How They Work Together


1. Resource Allocation
● VMM allocates and manages CPU, memory, and I/O resources.
● VMs use the virtual resources provided by the VMM.
● Ensures fair sharing and prevents any VM from overusing resources.

2. Instruction Interception
● Guest OS issues privileged instructions assuming full hardware control.
● VMM intercepts these instructions and safely executes or emulates them.
○ Full virtualization: VMM traps and translates instructions.
○ Paravirtualization: Guest OS uses hypercalls to request services from the VMM.

3. Isolation and Security


● VMM enforces strong isolation so VMs cannot access or affect each other.
● Faults or attacks inside one VM do not impact other VMs or the host.
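
To make the instruction-interception step (point 2 above) concrete, here is a toy trap-and-emulate simulation. The instruction names and the VMM's behavior are invented for illustration; this is not how a real hypervisor is implemented.

```python
# Toy trap-and-emulate simulation (illustrative only, not a real hypervisor).
# Unprivileged "instructions" execute directly; privileged ones are trapped
# by the VMM, which emulates their effect on behalf of the guest.
PRIVILEGED = {"HLT", "OUT", "LOAD_CR3"}   # hypothetical privileged opcodes

class VMM:
    def run(self, vm_name, instructions):
        for instr in instructions:
            if instr in PRIVILEGED:
                self.trap(vm_name, instr)             # control passes to VMM
            else:
                print(f"[{vm_name}] direct execution: {instr}")

    def trap(self, vm_name, instr):
        # Emulate the privileged operation safely so the guest never
        # touches real hardware state directly.
        print(f"[VMM] trapped {instr} from {vm_name}; emulating it safely")

VMM().run("VM1", ["ADD", "LOAD_CR3", "MUL", "HLT"])
```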

Structures / Tools for Implementing Virtualization:


1. Types of Virtualization Architectures
Depending on where the virtualization layer is placed, three VM architectures exist:
● Hypervisor-based virtualization
● Paravirtualization
● Host-based virtualization
A hypervisor (VMM) performs the main virtualization operations.

2. Hypervisor and Xen Architecture


Hypervisor
● Provides hypercalls for guest OS and applications.
● Two types:
○ Microkernel hypervisor (e.g., Microsoft Hyper-V): contains only core functions like memory management
and CPU scheduling; device drivers remain outside.
○ Monolithic hypervisor (e.g., VMware ESX): includes drivers and all virtualization functions.
Xen Hypervisor
● Open-source microkernel hypervisor.
● Very small because it has no native device drivers.
● Allows guest OSes direct access to hardware through a safe mechanism.
● Used in Citrix XenServer and Oracle VM.

3. Full Virtualization (Binary Translation)


Concept
● Hardware virtualization can be:
○ Full virtualization
○ Host-based virtualization
Full Virtualization
● Guest OS runs unmodified
● Uses binary translation to trap sensitive instructions.
● Noncritical instructions execute directly on hardware; critical ones are trapped to the VMM.
● Guest OS is unaware of virtualization.

Binary Translation Process
● VMM scans the instruction stream.
● Privileged instructions are trapped and emulated.
● Combines direct execution + binary translation.

4. Host-Based Virtualization
● Virtualization software installs on top of the host OS (no OS modification required).
● Host OS provides device drivers and low-level services.
● Easy to deploy, flexible, but lower performance due to multiple layers.
● Requires binary translation when guest and host ISAs differ.

5. Paravirtualization
● Guest OS is modified to work with the VMM.
● Uses hypercalls instead of privileged instructions.
● Reduces overhead and improves performance.
● Issues: lower compatibility, harder OS maintenance.
● Used by Xen, KVM, and VMware ESX.

Compiler-Supported Paravirtualization
● Sensitive instructions replaced at compile time with hypercalls.
● Faster than runtime binary translation.

6. KVM (Kernel-Based Virtual Machine)


● Uses hardware-assisted paravirtualization.
● Supports unmodified OSes (Windows, Linux, Solaris, UNIX variants).
● Integrated directly into the Linux kernel.

| Feature | Hypervisor Architecture | Full Virtualization | Para-Virtualization | Hardware-Assisted Virtualization |
| --- | --- | --- | --- | --- |
| Concept | Software (VMM) that sits directly on hardware or host OS to run VMs | Guest OS runs unmodified using binary translation + direct execution | Guest OS is modified to communicate with VMM using hypercalls | Hardware provides virtualization support (Intel VT-x, AMD-V) to improve performance |
| Cloud Implementation | Forms base layer for cloud data centers; manages multiple VMs efficiently | Allows running legacy OSes in cloud without modification | Used to optimize VM performance in cloud platforms; lower overhead | Used in modern clouds to achieve near-native VM performance |
| Guest OS Requirement | Can run unmodified OS | Runs unmodified OS | Guest OS must be modified | Runs unmodified OS (hardware handles traps) |
| Performance | High (especially Type-1 hypervisors) | Lower than para-virtualization due to binary translation | Higher performance; reduced overhead | Very high performance (hardware assists virtualization) |
| VMM Role | VMM creates, manages VMs; schedules CPU, memory, device access | VMM traps and translates privileged instructions | VMM receives hypercalls instead of traps | VMM uses hardware instructions to handle traps efficiently |
| Instruction Handling | Hypervisor intercepts all privileged instructions | Binary translation replaces sensitive instructions with safe sequences | Privileged instructions replaced by hypercalls in the OS | CPU directly supports virtualization instructions (VMX/SVM mode) |
| Hardware Dependence | Not mandatory | Not mandatory | Not mandatory | Strong hardware support required |
| Complexity | Medium to high | High (due to translation engine) | High (requires OS modification) | Lower (hardware simplifies VMM) |
| Examples | VMware ESXi, Microsoft Hyper-V, Xen, KVM | VMware Workstation, VirtualBox (software mode), QEMU full virtualization | Xen (PV mode), KVM with VirtIO drivers, VMware ESX PV drivers | KVM, Xen HVM, Hyper-V (with VT-x/AMD-V) |
| Use in Cloud | Widely used in IaaS platforms | Used for compatibility and OS support | Used for improving VM performance | Used in modern cloud infrastructures (AWS, GCP, Azure) |

Virtualization in Multi-Core Processors

1. Virtualizing a multi-core processor is relatively more complicated than virtualizing a uni-core processor.
2. Although multicore processors promise higher performance by integrating multiple processor cores on a single chip,
multi-core virtualization raises new challenges for computer architects, compiler constructors, system designers, and
application programmers.
3. There are two main difficulties: application programs must be parallelized to use all cores fully, and software must
explicitly assign tasks to the cores, which is a very complex problem.
Virtualization on multi-core processors is achieved by distributing virtual machines (VMs) across multiple cores and
using a Virtual Machine Monitor (VMM)/Hypervisor to manage hardware sharing.
Key Points:
1. Core-Level Parallelism
● Each VM can be assigned a dedicated core or share multiple cores.
● Multiple VMs run simultaneously without blocking each other.

2. Hypervisor Scheduling
● The hypervisor schedules VMs across cores using techniques like:
○ Time-slicing
○ Load balancing
○ Core affinity
● Ensures efficient use of CPU resources.

3. Hardware-Assisted Virtualization
● Modern multi-core CPUs include virtualization extensions (Intel VT-x, AMD-V).
● These extensions:
○ Speed up VM execution
○ Reduce hypervisor overhead
○ Enable safe execution of privileged instructions

4. Isolation of VMs
● Each VM runs independently on assigned cores.
● Faults in one VM do not affect VMs running on other cores.
5. Improved Performance
● VMs benefit from true parallel execution.
● Ideal for cloud computing where multiple customers’ workloads run concurrently.
6. Support for SMP (Symmetric Multi-Processing) VMs
● Multi-core processors allow creating multi-CPU virtual machines.
● A single VM can use multiple cores to run heavy workloads.
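
A toy sketch of the hypervisor scheduling idea from point 2 above (core affinity plus round-robin placement of vCPUs onto physical cores); the core names, pinning, and the policy itself are simplified assumptions, not any particular hypervisor's scheduler.

```python
# Toy vCPU-to-core placement with core affinity (a simplification of real
# hypervisor schedulers): pinned vCPUs keep their core; the rest are spread
# round-robin over the remaining physical cores, so a core may be
# time-shared when vCPUs outnumber cores.
from itertools import cycle

physical_cores = ["core0", "core1", "core2", "core3"]
affinity = {"vm1-vcpu0": "core0"}               # pinned vCPU (core affinity)
vcpus = ["vm1-vcpu0", "vm1-vcpu1", "vm2-vcpu0", "vm2-vcpu1", "vm3-vcpu0"]

free_cores = cycle([c for c in physical_cores if c not in affinity.values()])
placement = {v: affinity.get(v) or next(free_cores) for v in vcpus}

for vcpu, core in placement.items():
    print(f"{vcpu} -> {core}")
```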

| Feature | Hardware-Based Virtualization | Software-Based Virtualization |
| --- | --- | --- |
| Definition | Virtualization supported directly by CPU hardware features (Intel VT-x, AMD-V). | Virtualization achieved using software techniques like binary translation or host OS support. |

A. Hardware-Based Virtualization in Cloud Computing


● Used by modern cloud providers (AWS, Azure, GCP).
● Multi-core processors with Intel VT-x/AMD-V support allow fast VM creation.
● Hypervisors like KVM, Xen, Hyper-V use hardware extensions for:
○ Fast VM switching
○ Direct execution of guest OS
○ Secure isolation

B. Software-Based Virtualization in Cloud Computing


● Used when hardware assist is not available or for compatibility.
● Implemented using:
○ Binary translation (Full virtualization)
○ Host-based virtualization
○ Paravirtualization (Guest OS modified to use hypercalls)
● Tools like VirtualBox, VMware Workstation, and QEMU emulate hardware using software.

Physical and Virtual Clusters:

A physical cluster is a group of physically interconnected computers (nodes) that work together as a single system.
Each node has its own hardware, OS, and network interface.

A virtual cluster is a group of virtual machines (VMs) that behave like a cluster but run on top of physical servers
using virtualization technology. Multiple VMs can be created on the same physical machine and grouped logically as a
cluster.

Benefits Provided by Virtual Clusters


● High scalability (new VMs can be created instantly).
● Low cost (no need to buy more physical hardware).
● Efficient resource utilization using CPU, memory, and disk sharing.
● High availability through VM replication and snapshots.
● Portable and flexible (VMs can run on any physical machine).
● Energy-efficient because fewer physical servers are used.
● Easy to manage through hypervisors and cloud platforms.
| Feature | Physical Cluster | Virtual Cluster |
| --- | --- | --- |
| Basic Unit | Physical machines | Virtual machines |
| Scalability | Limited (requires adding more physical hardware) | Highly scalable (can create more VMs instantly from a pool) |
| Cost | High (due to dedicated physical hardware cost) | Low cost (uses existing hardware, high utilization) |
| Flexibility | Low (static hardware configuration) | Very flexible (VMs can be created, moved, and destroyed quickly) |
| Deployment Time | Slow (requires hardware installation and OS setup) | Fast (VM creation typically within minutes) |
| Resource Utilization | Low to moderate | High (better sharing of CPU, memory, and storage) |
| Fault Tolerance | Hardware dependent (requires separate hardware redundancy) | VM snapshots, live migration support (software-defined failover) |
| Management | Harder (needs physical maintenance and access) | Easier (centralized management via the hypervisor) |

Live Migration in Virtual Clusters & Its Importance


Live migration is the process of moving a running VM from one physical server to another without shutting it down.

Steps:
Pre-Migration: Select target host and check resource availability.
Memory Pre-Copy: Copy all VM memory pages from source to target while VM runs.
Iterative Copy: Re-copy dirty (modified) pages until they become minimal.
Stop-and-Copy: Pause VM briefly and transfer remaining memory + CPU/device state.
Switchover: Resume VM on target host.
Post-Migration: Clean up resources on the source.
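
A minimal simulation of the iterative pre-copy loop described in the steps above; the page counts, dirtying rate, and stop threshold are made-up values for illustration only.

```python
# Minimal simulation of iterative pre-copy (all numbers are made up):
# copy everything once, then keep re-copying pages dirtied during the
# previous round until the dirty set is small enough to stop-and-copy.
import random

TOTAL_PAGES = 10_000
STOP_THRESHOLD = 50                       # small enough for a brief pause
dirty = set(range(TOTAL_PAGES))           # round 1: all pages must be sent
round_no = 0

while len(dirty) > STOP_THRESHOLD:
    round_no += 1
    copied = len(dirty)
    # While this round is being copied, the still-running VM dirties a
    # fraction of its pages (the 10% rate is an arbitrary assumption).
    dirty = {random.randrange(TOTAL_PAGES) for _ in range(copied // 10)}
    print(f"round {round_no}: copied {copied} pages, {len(dirty)} dirtied")

print(f"stop-and-copy: pause VM, send last {len(dirty)} pages + CPU/device state")
```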
Importance
● Zero downtime during maintenance or upgrades.
● Load balancing — VMs moved from overloaded servers to underloaded ones.
● Fault tolerance — prevents service disruption if hardware fails.
● Energy saving — lightly loaded servers can be turned off after migrating VMs.
● Improves cloud reliability and performance.

Live VM Migration Steps and Performance Effects

● In a cluster built with mixed nodes of host and guest systems, the normal method of operation is to run everything on
the physical machine.
● When a VM fails, its role can be replaced by another VM on a different node, as long as they both run the same
guest OS.
● In other words, a physical node can fail over to a VM on another host.
● This is different from physical-to-physical failover in a traditional physical cluster.
● The advantage is enhanced failover flexibility.
● The potential drawback is that a VM must stop playing its role if its residing host node fails.
● However, this problem can be mitigated with VM live migration.
● The figure shows the process of live migration of a VM from host A to host B.
● The migration copies the VM state file from the storage area to the host machine.
● As shown in the figure, live migration of a VM consists of the following six steps:

Steps 0 and 1: Start migration.
● This step makes preparations for the migration, including determining the migrating VM and the destination host.
● Although users could manually make a VM migrate to an appointed host, in most circumstances the migration is
started automatically by strategies such as load balancing and server consolidation.

Step 2: Transfer memory.
● Since the whole execution state of the VM is stored in memory, sending the VM's memory to the destination node
ensures continuity of the service provided by the VM.
● All of the memory data is transferred in the first round, and then the migration controller recopies the memory data
changed in the last round.
● These steps keep iterating until the dirty portion of the memory is small enough to handle the final copy.
● Although precopying memory is performed iteratively, the execution of programs is not obviously interrupted.

Step 3: Suspend the VM and copy the last portion of the data.
● The migrating VM's execution is suspended when the last round's memory data is transferred.
● Other nonmemory data such as CPU and network states should be sent as well.
● During this step, the VM is stopped and its applications no longer run.
● This "service unavailable" time is called the "downtime" of migration, which should be as short as possible so that it
is negligible to users.

Steps 4 and 5: Commit and activate the new host.
● After all the needed data is copied, the VM reloads its state on the destination host, recovers the execution of its
programs, and the service provided by this VM continues.
● Then the network connection is redirected to the new VM and the dependency on the source host is cleared.
● The whole migration process finishes by removing the original VM from the source host.

Key Considerations for Designing & Deploying Virtual Clusters in Cloud Computing
1. Resource Provisioning
● Ensure proper allocation of CPU, RAM, storage, and network.
● Support elastic scaling based on workload demand.

2. VM Placement Strategy
● Optimal placement of VMs on host machines to avoid hotspots.
● Balance performance, energy consumption, and fault tolerance.

3. Performance Isolation
● Prevent one VM’s workload from affecting others.
● Use hypervisor controls (CPU quotas, memory caps).

4. Network Configuration
● Design virtual networks (VLANs, SDN) for high throughput and low latency.
● Ensure proper routing, isolation, and bandwidth allocation.

5. Storage Management
● Use shared storage systems for VM images and migration support.
● Ensure high I/O performance and redundancy.

6. Security & Isolation


● Apply VM hardening, firewalls, and access control.
● Ensure hypervisor security to protect all VMs.

7. Live Migration Support


● Prepare shared storage and compatible hypervisor versions.
● Maintain low downtime and network consistency.

8. Fault Tolerance & High Availability


● Use redundancy, failover mechanisms, and auto-recovery.
● Ensure cluster consistency even when hosts fail.

9. Monitoring & Management Tools


● Track VM performance, health, and resource usage.
● Automate scaling, placement, and failure detection.

10. Cost Optimization


● Optimize VM density without over-provisioning.
● Use pay-as-you-go models to reduce operational costs.

Resource Management Techniques in Virtual Clusters


Resource management decides how CPU, memory, storage, and network bandwidth are allocated to VMs.
Techniques:
1. Resource Allocation Policies
○ Fixed allocation (static)
○ Dynamic allocation (changes based on workload)
2. Load Balancing
○ Distributes VMs across hosts to avoid overload.
3. Capacity Planning
○ Predicting future resource needs and provisioning accordingly.
4. Admission Control
○ Deciding when to allow/deny creation of new VMs.
5. Resource Monitoring
○ Tracking CPU, memory, storage usage to ensure efficiency.
6. VM Migration
○ Used to balance load and improve utilization.

6. How Resource Scheduling and Allocation Work in a Virtual Cluster


Step-by-Step Process
1. Monitoring
○ Hypervisor monitors VM usage (CPU, memory, I/O).
2. Scheduling
○ VMM decides which VM runs on which physical core.
○ Uses CPU scheduling (time-slicing, fair share, priority scheduling).
3. Allocation
○ VMM assigns specific CPU cores, RAM, and disk to VMs based on policies.
4. Dynamic Adjustment
○ Resources can be increased or reduced depending on workload.
5. Load Balancing
○ VMs may be moved via live migration to balance resource usage.
6. Enforcement
○ Hypervisor ensures a VM cannot exceed its allocated resources.
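
A small sketch of steps 2–4 above as a greedy "least-loaded" placement policy with admission control; the host names, capacities, and the heuristic itself are illustrative assumptions, not how any particular hypervisor schedules.

```python
# Sketch of scheduling + allocation as a greedy least-loaded placement
# policy (hosts, capacities, and the heuristic are illustrative).
hosts = {"hostA": 16, "hostB": 16, "hostC": 8}    # free vCPUs per host

def place_vm(vm_name, vcpus_needed):
    # Admission control: only hosts that can still fit the VM qualify.
    candidates = {h: free for h, free in hosts.items() if free >= vcpus_needed}
    if not candidates:
        raise RuntimeError("admission control: no host can fit this VM")
    # Scheduling decision: pick the host with the most free capacity.
    host = max(candidates, key=candidates.get)
    hosts[host] -= vcpus_needed                    # allocation + enforcement
    print(f"{vm_name} ({vcpus_needed} vCPUs) -> {host}; {hosts[host]} vCPUs free")

for vm, need in [("vm1", 8), ("vm2", 6), ("vm3", 8), ("vm4", 4)]:
    place_vm(vm, need)
```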

7. Advantages of Using Virtual Clusters for Resource Management in Cloud


● Elastic scaling (resources can grow or shrink automatically).
● Better utilization of processors and memory.
● Cost savings because fewer physical servers are needed.
● High availability using clustering + live migration.
● Improved performance through dynamic allocation and load balancing.
● Easy isolation of workloads, improving security.
● Fast deployment of new cluster nodes (VMs).
● Energy efficiency by consolidating workloads.

Serverless computing is a cloud execution model where a cloud provider manages the underlying servers,
allowing developers to build and run applications without provisioning or managing infrastructure.
Benefits/Features of Serverless Computing in the Context of Virtualization
● No server management: Cloud provider handles provisioning, scaling, and maintenance.
● Auto-scaling: Functions scale instantly based on demand.
● Pay-per-use: Charges only for execution time, improving cost efficiency.
● High availability: Built-in fault tolerance and distributed execution.
● Fast deployment: Developers only upload code; infrastructure setup is removed.
● Improved resource utilization: Provider runs functions in highly optimized, multi-tenant virtualized environments.
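
A minimal sketch of a serverless function in the AWS Lambda Python handler style (the event shape and logic are illustrative); note that no server-management code appears anywhere in the function.

```python
# Minimal serverless function in the AWS Lambda Python handler style
# (the event shape and logic are illustrative assumptions). The provider
# provisions the runtime, calls the handler once per event, and scales
# back to zero afterwards.
import json

def lambda_handler(event, context):
    name = event.get("name", "world")       # payload from the trigger/event
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local test call (in the cloud, the platform invokes the handler):
print(lambda_handler({"name": "cloud"}, None))
```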

Storage virtualization is the process of abstracting physical storage devices (HDDs/SSDs) into a single unified logical
storage pool that applications and VMs can use.
How it Facilitates Data Management in Cloud
● Centralized management: All storage appears as one pool.
● Improved scalability: New storage can be added seamlessly.
● Better utilization: Dynamic allocation avoids unused space.
● High availability: Supports replication, snapshots, and failover.
● Simplified backup & recovery: Logical storage makes data migration easier.
● Efficient VM operations: VM migration, cloning, and snapshotting depend on
virtualized storage.

Tools for Storage Virtualization


● VMware vSAN
● OpenStack Cinder
● Ceph Storage
● IBM SAN Volume Controller (SVC)
● NetApp ONTAP
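
A toy illustration of the pooling idea described above: several physical disks are abstracted into one logical pool from which volumes are carved, without the caller knowing which disks back them. The disk names, sizes, and first-fit policy are assumptions for the sketch.

```python
# Toy storage pool: physical disks are abstracted into one logical pool,
# and volumes are carved from the pool transparently (disk names/sizes
# and the first-fit spreading policy are invented for illustration).
physical_disks = {"ssd0": 500, "ssd1": 500, "hdd0": 2000}   # capacity in GB

class StoragePool:
    def __init__(self, disks):
        self.free = dict(disks)

    def allocate(self, volume, size_gb):
        remaining, extents = size_gb, []
        for disk in self.free:                  # simplified first-fit spread
            take = min(self.free[disk], remaining)
            if take:
                self.free[disk] -= take
                extents.append((disk, take))
                remaining -= take
            if remaining == 0:
                break
        if remaining:
            raise RuntimeError("pool exhausted")
        print(f"volume {volume}: {size_gb} GB mapped to {extents}")

pool = StoragePool(physical_disks)
pool.allocate("vm-image-1", 700)     # transparently spans ssd0 and ssd1
print("free capacity left:", sum(pool.free.values()), "GB")
```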

Virtualization uses a hypervisor to create full virtual machines (VMs), each with its own operating system,
making them resource-intensive but offering strong isolation and the ability to run different OSs on one
machine. Containerization, on the other hand, virtualizes the operating system to run applications in
lightweight, isolated containers that share the host OS kernel, resulting in faster startup times and lower
resource usage, but less security and the limitation that all containers on a host must use the same OS.

Containerization virtualizes the operating-system level, allowing multiple isolated applications to run on the same
kernel. It enables:
● Lightweight virtualization (faster than VMs).
● Portable application deployment across environments.
● Rapid scaling of microservices.
● Isolation of applications through namespaces & cgroups.
● Consistent runtime environment (same image everywhere).
● Efficient resource utilization on cloud VMs.
Popular Containerization Tools
● Docker
● Kubernetes (orchestration)
● Podman
● LXC/LXD
● Containerd
● OpenShift
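
A minimal sketch of the kernel mechanism these tools build upon, using the standard Linux `unshare` utility from Python (assumes a Linux host with unprivileged user namespaces enabled; the namespaces shown are the same isolation features listed above).

```python
# Minimal sketch of OS-level isolation using the Linux `unshare` utility
# (assumes a Linux host with user namespaces enabled); the same kernel
# machinery -- namespaces -- that engines like Docker build upon.
import subprocess

subprocess.run([
    "unshare",
    "--map-root-user",   # map the caller to root inside a new user namespace
    "--uts",             # private hostname
    "--pid", "--fork",   # private process tree (the shell becomes PID 1)
    "--mount",           # private mount table
    "sh", "-c",
    "hostname container-demo && hostname && echo shell PID=$$",
])
# The hostname change and the PID shown are visible only inside the new
# namespaces; the host's hostname and process tree are untouched.
```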

How Application Virtualization Enables Deployment Across Multiple OS/Devices


Application virtualization separates an application from the underlying OS by encapsulating it into a virtual container or
runtime environment.
It provides:
● Cross-OS compatibility: Apps can run on different operating systems without modification.
● No installation required: Apps execute in isolated virtual environment.
● Mobility: Apps can be streamed or deployed over cloud to any device.
● Conflict-free execution: Avoids DLL conflicts, registry issues, or OS-specific dependencies.
● Centralized updates: Only need to update the virtualized application once.

This enables cloud providers to deliver apps to Windows, Linux, macOS, mobile devices, etc., via virtualized runtimes.

CPU Virtualization & Its Role in VM Management


CPU virtualization allows a physical CPU to be abstracted into multiple virtual CPUs (vCPUs) so different VMs believe
they have full access to the processor.
Role in Virtual Machine Management
● Sharing CPU resources: Multiple VMs run concurrently on the same processor.
● Scheduling: Hypervisor schedules CPU time slices among VMs.
● Isolation: Each VM gets its own virtual CPU, protected from others.
● Privileged instruction handling: Sensitive instructions are trapped and handled by the VMM.
● Performance improvement: Hardware-assisted virtualization (Intel VT-x, AMD-V) reduces overhead.
● Load balancing: Hypervisor maps vCPUs to physical cores for optimal performance.
● Support for multi-core VMs: Multiple vCPUs simulate multi-core processors inside a VM.

Important Points
● A CPU is virtualizable if the VM’s privileged and unprivileged instructions can run in user mode, while the VMM
runs in supervisor mode.
● When a VM executes privileged, control-sensitive, or behavior-sensitive instructions, they are trapped by the
VMM.
● The VMM acts as a unified mediator to manage and validate all CPU accesses from multiple VMs, ensuring
correctness and system stability.
● RISC-based CPUs are naturally virtualizable because all sensitive instructions are privileged.
● x86 architectures were originally not designed for virtualization, requiring techniques like binary translation or
hardware extensions (VT-x, AMD-V).

Software Virtualization Tools


Software virtualization tools create virtual machines, virtual applications, or virtual environments using software-based
techniques. These tools abstract hardware, OS, or application layers.
Common Software Virtualization Tools
● VMware Workstation / VMware Fusion – Desktop virtualization.
● Oracle VirtualBox – Open-source desktop VM tool.
● Microsoft Hyper-V – Type-1 hypervisor built into Windows.
● KVM (Kernel-Based Virtual Machine) – Linux kernel virtualization module.
● Xen / XenServer – Paravirtualization and full virtualization support.
● QEMU – Hardware emulator and virtualizer.
● Parallels Desktop – Virtualization for macOS.
● VMware ThinApp – Application virtualization.
● Microsoft App-V – Application streaming/virtualization.
● Docker, LXC, Containerd – OS-level container virtualization.
● OpenVZ – Linux container-based virtualization.
Data Virtualization refers to abstracting and presenting data from multiple sources as a unified view without moving
or copying the data.
It enables real-time integration, centralized access, and easier data management in cloud environments.

1. Tools Used for Data Virtualization


➤ IBM Cloud Pak for Data / IBM Infosphere Federation Server
➤ Cisco Data Virtualization
➤ Oracle Data Service Integrator
➤ Microsoft SQL Server PolyBase


Data Virtualization: Key Functions


1. Abstraction Layer: Creates a virtual data layer above heterogeneous data sources (databases, cloud storage,
files, APIs). Applications interact with this unified layer.
2. Data Integration Without Physical Movement: Data is not copied or migrated (unlike ETL). Queries are sent to
source systems in real-time.
3. Query Federation: Breaks a user query into multiple sub-queries, sends them to corresponding sources, and
then combines the results into a single unified response.
4. Metadata Management: Maintains unified metadata catalogs for all sources and presents data in a standardized
format (tables, views, APIs).
5. Data Security & Governance: Provides centralized control for security (authentication, authorization, masking,
encryption) and rule-based data access.
6. Caching & Optimization: Tools use caching, indexing, and query optimization techniques to improve
performance and reduce the load on underlying physical data sources.
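
A toy illustration of query federation (point 3 above): two in-memory structures stand in for heterogeneous sources, and the "virtual layer" joins them at query time without copying data anywhere. All names and data are invented for the sketch.

```python
# Toy query federation: a user query is split into sub-queries against two
# heterogeneous "sources" (plain Python structures standing in for a
# database and an API), and the results are joined into one unified answer.
orders_db = [  # pretend relational source
    {"order_id": 1, "customer_id": 10, "amount": 250},
    {"order_id": 2, "customer_id": 11, "amount": 90},
]
crm_api = {  # pretend REST/API source keyed by customer id
    10: {"name": "Asha"},
    11: {"name": "Ravi"},
}

def federated_orders_with_names():
    # Sub-query 1 hits the orders source; sub-query 2 hits the CRM source;
    # the virtualization layer joins them on customer_id at query time.
    for order in orders_db:
        customer = crm_api[order["customer_id"]]
        yield {**order, "customer_name": customer["name"]}

for row in federated_orders_with_names():
    print(row)
```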

Steps for ensuring security in virtualized data centers:


1. Harden the Hypervisor
2. Secure VM Images and Templates
3. Implement Strong Access Control & Authentication
4. Use Virtual Firewalls and IDS/IPS
5. Encrypt Data at Rest and in Transit
6. Monitor and Audit VM Activities
7. Encrypted Live Migration
8. Employ Role-Based Access Control (RBAC)
9. Backup and Disaster Recovery Planning

Impact of Virtualization on CPU, Memory, and I/O Device Performance


CPU Impact
● Overhead from instruction trapping: Privileged instructions are intercepted by the VMM, causing additional
processing time.
● Context-switch overhead: Switching between VMs increases CPU scheduling latency.
● vCPU to pCPU mapping challenges: If many vCPUs share limited physical CPUs, performance drops.
● Improvement with hardware virtualization: Intel VT-x and AMD-V reduce CPU overhead significantly.

Memory Impact
● Memory overcommitment: Hypervisor allocates more virtual memory than physically available, causing
swapping.
● Additional metadata: Page tables, shadow paging, and mapping structures increase memory overhead.
● NUMA effects: Poor placement of VM memory across NUMA nodes reduces throughput.
● Ballooning and compression: Techniques may introduce latency during memory reclamation.

I/O Device Impact


● I/O bottleneck: All device access must pass through the VMM or device emulator.
● High latency: Emulated devices cause slower disk and network operations.

● Interrupt handling overhead: Additional layers increase interrupt and DMA processing delays.
● Hardware-assisted I/O: SR-IOV and VT-d significantly reduce overhead by enabling direct device access.

2. How I/O Virtualization Enables Better Resource Utilization


Key Benefits
● Shared I/O devices: Multiple VMs can share a single NIC, disk, or GPU, reducing hardware cost.
● Direct device assignment: Technologies like SR-IOV and PCI passthrough allow VMs to access hardware
directly, improving throughput.
● Load balancing: Hypervisor distributes I/O operations across available physical devices.
● Isolation: Each VM gets separate virtual functions (VFs) preventing interference.
● Efficient multiplexing: The VMM efficiently handles I/O requests from many VMs, improving utilization of
network/bus bandwidth.

How It Works
● VMM intercepts, queues, and schedules I/O requests.
● Uses buffering and caching to reduce latency.
● Employs device emulation or paravirtualized I/O drivers (like VirtIO) for faster performance.

I/O Virtualization

● I/O virtualization involves managing the routing of I/O requests between virtual devices and the shared physical
hardware.
● At the time of this writing, there are three ways to implement I/O virtualization: full device emulation,
para-virtualization, and direct I/O.
● Full device emulation is the first approach for I/O virtualization. Generally, this approach emulates well-known,
real-world devices.
● Direct I/O virtualization lets the VM access devices directly. It can achieve close-to-native performance without
high CPU costs.
● However, current direct I/O virtualization implementations focus on networking for mainframes, and there are many
challenges for commodity hardware devices.
● For example, when a physical device is reclaimed (required by workload migration) for later reassignment, it may
have been set to an arbitrary state (e.g., DMA to some arbitrary memory locations) that can function incorrectly or
even crash the whole system.
● Since software-based I/O virtualization requires a very high overhead of device emulation, hardware-assisted I/O
virtualization is critical.

1. Virtualization for Data-Center Automation


● Automates allocation of hardware, software, and databases to millions of users.
● Driven by virtualization + cloud computing.
● Supports high availability, workload balancing, backups, and scaling.

2. Server Consolidation in Data Centers


● Workloads are heterogeneous: chatty (bursty) and non-interactive (HPC).
● Traditional static allocation → servers remain underutilized.
● Virtualization-based consolidation reduces number of physical servers.
● Benefits:
○ Better hardware utilization.
○ Faster provisioning (VM cloning).
○ Reduced cost, footprint, power, cooling.
○ Improved availability + easy VM mobility.
● Requires efficient multi-level scheduling (VM, server, data center levels).
3. Virtual Storage Management
● The meaning of storage virtualization changes in VM environments.
● Two data types: VM images and application data.
● Key concepts: encapsulation + isolation of VMs.
● Storage becomes a bottleneck because many VMs share the same disk.
● VMM handles storage requests, making storage management more complex than traditional OS.

4. Cloud OS for Virtualized Data Centers


● VI managers used to create and manage VMs + virtual clusters.
● Tools: Nimbus, Eucalyptus, OpenNebula (open-source), vSphere 4 (proprietary).
● All support virtual networking; only vSphere supports virtual storage + strong data protection.
● Hypervisors used: Xen, KVM (open-source tools), ESX/ESXi (VMware).

5. Trust Management in Virtualized Data Centers


● VMM provides isolation but becomes a security-critical component.
● Management VM is high-privilege → if compromised, whole system is at risk.
● VM rollback threatens randomness → session key reuse → cryptographic vulnerabilities.
● Replay can break TCP sequence numbers, enabling TCP hijacking.

6. VM-Based Intrusion Detection


● IDS types: HIDS (host-based) and NIDS (network-based).
● Virtualization-enhanced IDS isolates VMs to limit damage.
● IDS can be:
○ Running inside each VM, or
○ Integrated with the VMM at high privilege.
● Combines advantages of HIDS + NIDS.
● Includes policy engine + policy module for monitoring events.
● Logs must remain untampered even if OS is compromised.
● Honeypots/honeynets used to trap attackers; can be virtual or physical.
Unit 3
Service-Oriented Architecture (SOA) is an architectural model in which application components provide services to
other components through standardized, network-accessible interfaces.
These services are loosely coupled, reusable, and platform-independent.

| Aspect | SOA (Service-Oriented Architecture) | Traditional Software Architecture (Monolithic / Tightly Coupled) |
| --- | --- | --- |
| Design Approach | Service-based, modular, loosely coupled | Application-based, tightly integrated |
| Integration | Uses standard protocols (SOAP/REST/XML/JSON) | Uses proprietary or tightly coupled APIs |
| Interoperability | High—works across platforms & languages | Low—requires same platform or vendor |
| Scalability | Highly scalable—services scale independently | Limited—entire application must scale |
| Reusability | Services can be reused across applications | Low reusability; logic tied to application |
| Deployment | Independent deployment of services | Entire system must be deployed together |
| Maintenance | Easy to update/replace individual services | Complex—requires modifying the whole system |
| Communication | Through service bus or API calls | Direct function calls within program |
| Cloud Suitability | Ideal for cloud (IaaS/PaaS/SaaS) | Not well suited for cloud environments |
| Flexibility | High—supports dynamic binding & composition | Low—hard to modify once built |

Service provider: The service provider is the maintainer of the service and the organization that makes available one or
more services for others to use. To advertise services, the provider can publish them in a registry, together with a
service contract that specifies the nature of the service, how to use it, the requirements for the service, and the fees
charged.

Service consumer: The service consumer can locate the service metadata in the registry and develop the required client
components to bind and use the service.

Key Properties / Characteristics of SOA


1. Loose Coupling – Services are independent of each other.
2. Reusability – Services can be reused across multiple applications.
3. Interoperability – Works across different platforms, languages, and systems.
4. Discoverability – Services can be dynamically discovered via registries.
5. Standardized Interfaces – Uses open standards like SOAP, REST, WSDL, XML, JSON.
6. Composability – Multiple services can be combined to form complex applications.
7. Autonomy – Each service controls its own logic and resources.
8. Statelessness – Services generally do not store client state.

How SOA Is Supported in Cloud Computing (Expanded Points)


SOA fits naturally with cloud computing because cloud delivers everything as a service, which is exactly the foundation
of SOA.
Cloud computing supports SOA through the following mechanisms:

1. Standardized Web Service Interfaces


● Cloud platforms expose functionality using REST, SOAP, JSON, XML, gRPC.
● These standard interfaces ensure interoperability across different systems.

2. API Gateways
● API gateways manage access to cloud services.
● They provide:
○ Authentication & authorization
○ Rate limiting
○ Traffic management
○ API versioning
○ Service routing
● This makes SOA implementation more secure and controlled.

3. Service Registries & Catalogs


● Cloud platforms maintain service directories where services are:
○ Published
○ Discovered
○ Managed
● Helps support SOA concepts like discoverability and service contracts.

4. Microservices Architecture (Cloud-Native SOA)


● Modern cloud apps are built as microservices, which are essentially fine-grained SOA services.
● Each microservice is independently deployable, scalable, and replaceable.

5. Multi-Tenant Service Delivery


● Cloud environments allow multiple users or organizations to access the same services without interference.
● This aligns with SOA goals of:
○ Resource sharing
○ Loose coupling
○ Scalability

6. Orchestration & Workflow Management


Cloud services provide:
● AWS Step Functions
● Azure Logic Apps
● Google Cloud Workflows
These tools orchestrate multiple services—just like SOA’s composability principle.

7. Enterprise Service Bus (ESB) & Cloud Messaging


Cloud platforms provide messaging and integration tools:
● AWS SQS, SNS
● Azure Service Bus
● Google Pub/Sub
These act like cloud-based ESBs supporting routing, mediation, and transformation of messages.

8. Service-Level Agreements (SLAs)


● Cloud providers define SLAs for each service—uptime, latency, throughput.
● This matches SOA’s requirement for QoS (Quality of Service).

9. Elastic Scalability
● Cloud automatically scales services based on demand (auto-scaling).
● SOA-based services benefit from dynamic scaling and load balancing.

10. DevOps & CI/CD Support


Cloud platforms support:
● Automated deployments
● Continuous integration
● Container orchestration (Kubernetes)

Architecture:
Consumer Layer: This is the top layer where end-users or applications interact with the system. The consumer can be
a mobile app, a web application, or another software that requests services. The main role of this layer is to send
requests and display results back to the user.
Example: A shopping mobile app where the customer searches for a product or makes a payment request.

Business Process Layer: This layer defines the sequence of steps or workflow to complete a business task. It does not
perform the task itself but organizes services in the right order. It ensures that multiple services work together to
achieve a complete business function.
Example: In an online shopping system → first search for a product → add it to the cart → make the payment →
confirm the order → initiate delivery.

Service Layer: The service layer contains small, independent, and reusable services. Each service performs a specific
function and can be used in multiple applications. Services here are loosely coupled, meaning they work independently
but can be combined for larger tasks.
Example: Login service for authentication, Payment service for transactions, and Delivery service for shipping details.

Integration Layer: Different services may be developed using different technologies and may use different data formats
like JSON, XML, or SOAP messages. The integration layer works like a connector that makes sure all services can talk
to each other smoothly. It handles message transformation, routing, and communication.
Example: A shopping application using a third-party payment gateway and courier service needs integration so that
they all work together without compatibility issues.

Resource Layer: This is the bottom layer where actual data and resources are stored. It includes databases, files, and
backend systems that services depend on. The resource layer provides the required data to services when requested.
Example: A product database with item details, a customer database with personal information, and a transaction
database with order/payment records.

Components of SOA :
1. Service Provider
o The one who creates and offers the service.
o Publishes service details in a registry.
o Example: Bank providing fund transfer service.
2. Service Consumer (Client)
o The one who uses the service.
o Finds the service in the registry and then invokes it.
o Example: Mobile app using the fund transfer service.
3. Service Registry (Broker)
o A directory that stores all available services.
o Helps consumers to discover services.
o Example: UDDI (Universal Description, Discovery, and Integration).
4. Service Contract
o Rules and description about how to use the service.
o Includes input, output, and protocols like SOAP/REST.
o Example: WSDL (Web Services Description Language).
5. Enterprise Service Bus (ESB)
o Middleware that connects different services.
o Handles message passing, routing, and transformation.
o Example: Mule ESB (Mule Enterprise Service Bus).

How SOA Is Supported in Cloud Computing


SOA aligns naturally with cloud computing because cloud services are delivered as standardized, modular units.
Cloud supports SOA through:
● Standardized Web Services (REST, SOAP)
● API Gateways for service access
● Service registries and service catalogs
● Microservices-based cloud applications
● Multi-tenant service delivery models

Interoperability refers to the ability of different cloud services, platforms, or applications to communicate, exchange
data, and work together smoothly, even if they are built using different technologies, programming languages, or
vendors.

Integration is the process of connecting multiple cloud services, applications, or systems so they can function as a
unified solution by sharing data, workflows, and business processes.

How SOA Enables Interoperability in Cloud


● Uses standard protocols like HTTP, XML, JSON, SOAP for communication.
● Ensures platform and language independence (Java, .NET, Python can all access the same service).
● Supports loose coupling, allowing systems built with different technologies to interact smoothly.
● Employs standard service contracts (WSDL, REST APIs) so all services follow common rules.
● Avoids vendor lock-in by using open, platform-neutral communication formats.
● Enables cross-cloud communication between AWS, Azure, GCP, etc.

How SOA Enables Integration in Cloud


● Uses ESB (Enterprise Service Bus) for message routing, transformation, and mediation
● Connects disparate systems (legacy applications, databases, cloud microservices).
● Allows composition of multiple services into larger business workflows.
● Supports integration across on-premise + cloud environments (hybrid cloud).
● Simplifies connecting third-party services such as payment gateways, identity providers, and analytics tools.
● Enables service orchestration, ensuring multiple services work together to complete tasks.

Role of SOA in Cloud Computing


● Enables service-based delivery models (IaaS, PaaS, SaaS).
● Supports scalability by allowing services to run and scale independently.
● Facilitates integration across distributed cloud environments.
● Enhances agility because services can be reused, modified, or replaced without affecting entire systems.
● Improves cloud application portability and flexibility.
● Provides a standardized framework for building and exposing cloud services.
● Enables loose coupling, making cloud systems easier to maintain and update.
● Supports interoperability between different platforms, languages, and cloud vendors.
● Helps implement multi-tenant architectures by isolating services for different users.
● Facilitates rapid deployment of applications using reusable service components.
● Allows cloud providers to offer on-demand self-service through service APIs.
● Enhances monitoring and governance of cloud services through service contracts and registries.
● Improves fault tolerance, as independent services can fail without bringing down the entire application.
● Helps cloud applications adopt microservices architecture, which is built on SOA principles.
● Simplifies migration to cloud by wrapping legacy systems as services.
How Software Communicates Using SOA
Software communicates in SOA through standardized services. The process is:
1. Service Request
One software (consumer) sends a request to another software (service provider).
2. Uses Standard Protocols
Communication happens through HTTP, REST, SOAP, etc.
3. Common Data Formats
Messages are exchanged using XML or JSON, so different systems can understand each other.
4. Service Contract
A contract (like WSDL or API spec) defines how to call the service and what data it needs.
5. Service Registry
Services can be searched and discovered using a registry (e.g., UDDI).
6. ESB (Middleware)
The Enterprise Service Bus helps in routing, converting, and connecting different services.
7. Response Back
The service processes the request and sends back the result in the same standard format.
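
A minimal consumer-side sketch of steps 1–3 and 7 above, using only Python's standard library; the endpoint URL and message fields are hypothetical, standing in for any REST service contract.

```python
# Consumer-side sketch of a standards-based service call: JSON payload over
# HTTP using only the standard library. The endpoint URL and fields are
# hypothetical; substitute any real REST service contract.
import json
import urllib.request

request = urllib.request.Request(
    "https://api.example.com/funds/transfer",       # hypothetical provider
    data=json.dumps({"from": "A", "to": "B", "amount": 100}).encode(),
    headers={"Content-Type": "application/json"},   # common data format
    method="POST",
)

# The provider processes the request and replies in the same standard
# format, which any platform or language can parse (interoperability).
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```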

A public cloud is a cloud environment where computing resources (servers, storage, applications) are owned and
operated by a third-party provider and delivered over the internet to multiple customers on a shared infrastructure.
Examples: AWS, Microsoft Azure, Google Cloud Platform (GCP).

2. Benefits of Public Cloud


● Low cost / Pay-as-you-go model – no hardware investment.
● High scalability – resources increase or decrease automatically.
● High availability – built-in redundancy, global data centers.
● Easy accessibility – access from anywhere via internet.
● Rapid deployment – services available instantly.
● No maintenance overhead – provider manages hardware and updates.
● Wide range of services – compute, storage, AI/ML, databases, etc.
● Global reach – services accessible across multiple geographic regions.

3. Limitations of Public Cloud


● Security and privacy concerns (shared environment).
● Less control over infrastructure – provider manages backend.
● Vendor lock-in – difficult to migrate between providers.
● Network dependency – requires stable internet.
● Less Customizable – Limited control over configuration compared to on-premises systems.
● Compliance challenges – strict industries (finance, healthcare).

4. Key Security Risks in Public Cloud


1. Data Breaches
Sensitive data stored in shared infrastructure may be exposed.
2. Unauthorized Access
Weak authentication or misconfigured access policies.
3. Insecure APIs
Poorly secured service interfaces can be abused.
4. Data Loss
Accidental deletion, corruption, or provider issues.
5. Multi-tenancy Risks
Multiple users share the same physical hardware → isolation failure risks.
6. Misconfigurations
Incorrect storage bucket/VM/firewall settings lead to vulnerabilities.

5. Security Measures to Address These Risks


● Strong Authentication & MFA (Multi-Factor Authentication).
● Identity & Access Management (IAM) – least-privilege permissions.
● Encryption – encrypt data at rest and in transit.
● Regular Backups and disaster recovery plans.
● Network Security Tools – firewalls, VPNs, security groups.
● Monitoring & Logging – CloudTrail, CloudWatch, Azure Monitor.
● Compliance Standards – follow GDPR, HIPAA, ISO 27001.
● Secure APIs – API gateways, throttling, authentication.
● Tenant isolation techniques – hypervisor security, VPCs.

6. Concept of Multi-Tenancy in Public Cloud


Multi-tenancy means multiple users (tenants) share the same underlying physical infrastructure, while keeping their data
and applications isolated.
Implications
● Better resource utilization
● Lower costs due to sharing
● Need strong isolation between tenants
● Risk of noisy neighbors (performance interference)
● Stronger VM/container security is required

Multi-Tenancy in Public Cloud – Short Process


1. Tenant Account Creation
User registers and gets a separate, logically isolated environment.
2. Identity & Access Setup
Cloud assigns unique IAM roles, permissions, and security policies for each tenant.
3. Logical Resource Isolation
Tenants get isolated VMs, VPCs, storage, and databases using virtualization and SDN.
4. Shared Physical Infrastructure
All tenants share the same hardware, but see only their own virtual resources.
5. Workload Deployment
Tenants deploy apps on their VMs, containers, or serverless environments.
6. Isolation by Hypervisor / Orchestrator
Ensures tenants cannot access each other and keeps CPU, memory, and I/O separated.
7. Resource Scheduling
Cloud automatically balances resources among tenants for efficiency.
8. Monitoring & Metering
Tracks each tenant’s usage (CPU, storage, network) for billing.
9. Security Enforcement
Applies network isolation, encryption, firewalls, and access controls per tenant.
10. Pay-As-You-Go Billing
Each tenant is billed individually based on actual usage.

7. How Public Cloud Platforms Support Different Programming Languages & Frameworks
Public clouds support all major languages through:
● SDKs and APIs (Python, Java, C#, Go, Node.js, PHP, Ruby)
● Managed runtime environments (Java, .NET, Python, PHP, Node.js)
● Containers (Docker, Kubernetes support)
● Serverless computing (AWS Lambda, Azure Functions, GCP Cloud Functions)
● PaaS platforms (App Engine, Azure App Services, Elastic Beanstalk)
● Database drivers for all languages
● DevOps tools (CI/CD pipelines, Cloud Build, GitHub Actions, Azure DevOps)

8. Security Measures & Compliance Certifications Offered by Public Cloud Providers


Security Measures
● Data encryption
● Firewalls & network isolation
● DDoS protection
● IAM and role-based access
● Security monitoring and threat detection
● Key Management Services (KMS)
● Secure APIs and API gateways

Common Compliance Certifications


● ISO 27001 / 27017 / 27018
● SOC 1, SOC 2, SOC 3 (System and Organization Controls)

9. How Public Cloud Platforms Handle Scalability & High Availability


Scalability
● Auto-scaling – automatically increases or decreases resources.
● Elastic Load Balancers distribute traffic across servers.
● On-demand resource provisioning for peak usage.
● Serverless computing automatically scales functions with load.

Cloud-native services such as:


● AWS Auto Scaling
● Azure VM Scale Sets
● GCP Instance Groups

High Availability
● Multiple availability zones (AZs) – redundant data centers.
● Data replication across zones and regions.
● Synchronous replication (within region)
● Asynchronous replication (across regions)
● Failover mechanisms for apps and databases.
● Content Delivery Networks (CDNs) for global delivery.
● Backup and disaster recovery services.

How Elasticity Supports Cloud Service Models


Elasticity allows cloud resources to expand and shrink automatically based on workload demand. It supports cloud
service models like:
1. IaaS (Infrastructure as a Service)
● Automatically adds/removes VMs, storage, and network resources.
● Ensures infrastructure matches the exact load without manual intervention.
2. PaaS (Platform as a Service)
● Automatically scales application runtimes, databases, and middleware.
● Developers don’t manage servers; the platform adjusts capacity itself.
3. SaaS (Software as a Service)
● Adjusts user sessions, backend processing, and storage as more users join.
● Ensures apps like Gmail, Netflix, and Office 365 stay responsive.

4. Function as a Service (FaaS) / Serverless
This model is the ultimate expression of elasticity. The platform instantly provisions and executes a containerized function in response to an event, then scales down to zero when the function completes.
Example: a serverless image-processing app using AWS Lambda.

Dynamic scaling is an automation technique that adjusts the amount of provisioned resources (such as virtual machines, CPU, or memory) in real time based on actual, fluctuating workload demands (a minimal decision sketch follows the list below).
Importance of Elasticity in Dynamic Scaling
● Automatically adjusts resources based on real-time demand.
● Prevents performance drops during traffic spikes by adding more resources instantly.
● Avoids resource wastage by removing extra capacity when demand decreases.
● Reduces operational cost through pay-as-you-use scaling.
● Ensures high availability by keeping applications responsive under varying loads.
● Supports modern cloud models like serverless, microservices, and container orchestration.
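
A minimal sketch in Python of the scale-out/scale-in decision behind dynamic scaling; the thresholds, instance limits, and metric values are illustrative assumptions, not any provider's real API:

```python
# Minimal fixed-threshold autoscaling decision (illustrative only).
# Metric collection and the actual scaling call are provider-specific;
# here they are stand-ins.

SCALE_OUT_CPU = 75.0   # add capacity above this average CPU %
SCALE_IN_CPU = 25.0    # remove capacity below this average CPU %
MIN_INSTANCES, MAX_INSTANCES = 1, 10

def decide_instance_count(avg_cpu: float, current: int) -> int:
    """Return the desired instance count for the observed load."""
    if avg_cpu > SCALE_OUT_CPU and current < MAX_INSTANCES:
        return current + 1          # scale out on sustained high load
    if avg_cpu < SCALE_IN_CPU and current > MIN_INSTANCES:
        return current - 1          # scale in to avoid paying for idle capacity
    return current                  # within the target band: do nothing

print(decide_instance_count(82.0, 3))  # -> 4 (scale out)
print(decide_instance_count(12.0, 3))  # -> 2 (scale in)
```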

Infrastructure as a Service (IaaS) providers – examples and working:


Examples of IaaS Providers
● Amazon Web Services (AWS EC2)
● Microsoft Azure Virtual Machines
● Google Cloud Compute Engine
● IBM Cloud Infrastructure

How IaaS Works


1. Cloud provider hosts physical infrastructure (servers, storage, networking).
2. Users create virtual machines, storage volumes, and networks on demand.
3. Resources are metered and billed based on consumption (pay-as-you-go).
4. Users manage:
○ OS
○ Applications
○ Runtime
○ Middleware
5. Cloud provider manages:
○ Physical hardware
○ Virtualization
○ Data center security & networking

Benefits and considerations of using SaaS for enterprise applications, with use cases:
Benefits of Using SaaS
● No installation or maintenance (runs fully in the cloud).
● Low cost — pay only for subscription, no hardware needed.
● Automatic updates & patches handled by provider.
● Accessible from anywhere via internet.
● Highly scalable — add/remove users easily.
● Improved collaboration (multi-user access, shared data).
● Fast deployment — ready to use immediately.
● Security & backup handled by provider.
● Supports multi-tenancy, reducing overall cost.

Considerations / Challenges When Using SaaS


● Data security & privacy issues (data stored on provider’s cloud).
● Vendor lock-in — difficult to migrate to another service.
● Limited customization compared to in-house software.
● Dependency on internet connectivity.
● Compliance issues for sensitive industries.
● Performance latency if data centers are far.

General Use Cases:


● Email services
● Online storage (Google Drive, Dropbox)
● E-commerce platforms (Shopify)

How Platform as a Service enables developers to build and deploy applications in the cloud, with examples:
Platform as a Service (PaaS) provides a complete cloud-based environment for developing, testing, deploying, and
managing applications. It removes the need for developers to manage hardware, servers, operating systems, or
infrastructure.
How PaaS Enables Development & Deployment
● Provides preconfigured development environments (runtime, OS, frameworks).
● Offers built-in tools for coding, debugging, testing, and deployment.
● Supports automatic scaling of applications.
● Manages databases, storage, and networking behind the scenes.
● Enables continuous integration & continuous delivery (CI/CD) pipelines.
● Allows developers to focus only on writing code, not infrastructure.
● Ensures faster time-to-market with ready-made services and APIs.
● Supports multi-language and multi-framework development.

Examples of PaaS Providers


● Google App Engine
● Microsoft Azure App Service
● AWS Elastic Beanstalk

SOA Technologies (Short Notes)


1. Web Services (SOAP & REST)
Used for communication between services.
● SOAP: XML-based, secure, used in enterprises.
● REST: Lightweight, uses HTTP + JSON, widely used in cloud APIs.
2. WSDL (Web Services Description Language)
A service contract that describes what a service does, inputs, outputs, and how to call it (mainly for SOAP services).
3. UDDI (Universal Description, Discovery & Integration)
A directory where services are published and discovered. In cloud, similar to service catalogs.
4. ESB (Enterprise Service Bus)
Middleware that connects and integrates services. Handles routing, message transformation, and communication.
5. BPEL (Business Process Execution Language)
Used to orchestrate multiple services into one workflow (e.g., payment + order + delivery services).
6. Governance Tools
Apply rules, policies, and monitoring to ensure services are secure, standardized, reusable, and compliant in cloud
environments.

SOAP (Simple Object Access Protocol): SOAP is a protocol used to exchange data between applications using XML messages over transport protocols such as HTTP or SMTP. It follows a stateless, one-way message exchange model, and these exchanges can be combined to form request–response patterns.
Key Points
● Uses XML for structured messaging.
● Works over HTTP/SMTP through bindings.
● Supports reliable, platform-independent communication.
● Two versions: SOAP 1.1 and SOAP 1.2

SOAP Nodes
1. SOAP Sender – Creates and sends SOAP messages.
2. SOAP Receiver – Receives and processes messages (may send a response or fault).
3. SOAP Intermediary – Acts as both sender and receiver; forwards messages after processing header blocks.

How SOAP Communication Works


● SOAP Client sends a SOAP Request.
● Message travels over the Internet via HTTP/SMTP.
● SOAP Server receives and forwards it to the SOAP Service.
● SOAP Service executes the operation and returns a SOAP Response.
● Response is delivered back to the client.
SOAP ensures secure, structured, and reliable communication between distributed applications

Types of SOAP Messaging Requests


1. Remote Procedure Call (RPC):
RPC lets a client execute a procedure on a remote server as if it were local. It follows a client-server model using
synchronous request/response in XML.
RPC Components:
● Client: Initiates request and receives reply.
● Client Stub: Packages request, sends to server, unpacks reply.
● RPC Runtime: Manages message transmission between client and server.
● Server Stub: Unpacks request, sends to server, packages response.
● Server: Executes request and sends reply.

2. Document Requests:
Instead of parameters, an XML document is sent in the SOAP message body. For example, a Purchase Order service
receives an XML purchase order document and returns an XML response after processing.

REST (Representational State Transfer)


REST is a design style for building web services that allows applications to communicate over the internet easily and
efficiently.
REST is a lightweight, fast design style for web services that uses standard HTTP protocols. It’s widely used in social
media, e-commerce, cloud, and mobile apps, emphasizing simplicity, reusability, and scalability.
Key Concepts:
1. Stateless Communication:
Each client request contains all needed data; servers don’t store session info, enabling easy scaling.
2. Resources:
Everything (users, products, data) is a resource identified by a unique URI.
Example: /users/101 represents user with ID 101.
3. HTTP Methods:
| Method | Purpose | Example |
|--------|----------------------|-----------------------|
| GET | Retrieve data | /books/123 |
| POST | Create new resource | /books |
| PUT | Update resource | /books/123 |
| DELETE | Delete resource | /books/123 |
4. Data Formats:
Supports JSON (most popular), XML, HTML, and
plain text.

Advantages:
● Simple and easy to learn
● Fast and lightweight compared to SOAP
● Platform-independent
● Scalable for large systems
● Flexible for web, mobile, and cloud apps
● Cacheable for better performance

Example:
In an online bookstore, REST manages books via HTTP methods: POST to add, GET to fetch details, PUT to update,
and DELETE to remove a book, making resource management straightforward.
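
A minimal sketch of the bookstore example using Python's requests library; the base URL and JSON fields are hypothetical:

```python
import requests

BASE = "https://bookstore.example.com/api"  # hypothetical endpoint

# POST: create a new book resource
r = requests.post(f"{BASE}/books", json={"title": "Cloud Computing", "price": 450})
book_id = r.json()["id"]                    # assumes the API returns the new ID

# GET: retrieve the book
print(requests.get(f"{BASE}/books/{book_id}").json())

# PUT: update the book
requests.put(f"{BASE}/books/{book_id}", json={"title": "Cloud Computing", "price": 399})

# DELETE: remove the book
requests.delete(f"{BASE}/books/{book_id}")
```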

Main Pricing Strategies for Compute, Storage, and Networking Resources


1. Compute Pricing:
○ On-Demand: Pay per usage (e.g., per second/minute/hour). Flexible but usually more expensive.
○ Reserved Instances: Commit to use for a long period (1-3 years) at a discounted rate.
○ Spot Instances: Use spare capacity at lower prices but with the risk of interruption.
○ Auto-scaling: Dynamically adjusts compute resources based on demand to avoid over-provisioning.

2. Storage Pricing:
○ Pay-as-you-go: Charged based on actual data stored.
○ Tiered Storage: Different costs for hot (frequent access), cool (infrequent), and archive (rarely accessed)
storage
○ Data Transfer Costs: Charges for data moving out of the cloud or between regions.
○ I/O Operations: Pricing based on number of read/write operations in some storage types.

3. Networking Pricing:
○ Data Transfer: Costs for outbound data to the internet or between regions
○ Bandwidth: Charges based on the network throughput used.
○ Private Connections: Costs for dedicated connections like Direct Connect.
○ Load Balancer Usage: Charges based on the number of processed requests or traffic.

Influence on Cost Optimization


● Right-Sizing Resources: Choosing appropriate instance types and storage tiers reduces waste and lowers bills.
● Using Reserved and Spot Instances: Cuts compute costs significantly by leveraging long-term commitment or
spare capacity
● Data Transfer Minimization: Optimizing data flows and using CDNs reduce expensive outbound data charges.
● Auto-Scaling: Automatically matches resource supply to demand, avoiding paying for idle resources.
● Storage Tiering: Moving less accessed data to cheaper tiers saves storage costs.
● Network Traffic Optimization: Efficient routing and caching reduce bandwidth usage and associated costs.
Networking options and capabilities provided by public cloud platforms for efficient data transfer and communication:

Networking Options in Public Cloud Platforms


1. Virtual Private Cloud (VPC):
○ Isolated virtual networks within the cloud.
○ Customizable IP address ranges, subnets, routing tables, and gateways.
2. Load Balancers:
○ Distribute incoming traffic across multiple servers to ensure availability and performance.
○ Types: Application Load Balancer (Layer 7), Network Load Balancer (Layer 4).
3. Content Delivery Network (CDN):
○ Caches content closer to users globally to reduce latency and improve load times.
○ Example: AWS CloudFront, Azure CDN, Google Cloud CDN.
4. Direct Connect / ExpressRoute / Dedicated Interconnect:
○ Private, high-bandwidth connections between on-premises data centers and the cloud provider,
bypassing the public internet.
○ Offers lower latency and increased security.
5. VPN Gateway:
○ Secure IPsec VPN connections to extend on-premises networks into the cloud securely.
6. Peering:
○ Connects different VPCs or cloud networks either within the same region or across regions for low-
latency communication.
7. Multi-Region Networking:
○ Supports global deployment of applications with fast inter-region data transfer and failover capabilities.

Key Networking Capabilities for Efficient Data Transfer


● High Throughput and Low Latency: Cloud networks offer optimized routing and high bandwidth to support large
data transfers.
● Network Security: Built-in firewalls, security groups, and network access control lists (ACLs) ensure secure
communication.
● Auto-Scaling Network Resources: Automatically adapts bandwidth and connections based on traffic demands.
● Advanced Traffic Management: Routing policies, geo-routing, and load balancing optimize network paths and
distribute traffic efficiently.
● Monitoring & Analytics: Tools to track network performance, bottlenecks, and optimize usage.

AWS (Amazon Web Services):

● Cloud computing platform offering IaaS, PaaS, and SaaS.


● Services: Compute (EC2, Lambda), Storage (S3, EBS), Databases (RDS, DynamoDB), Networking (VPC,
Route 53), AI/ML, Analytics, Security, IoT, etc.
● AWS was introduced in 2006.

Microsoft Azure:

● Cloud platform providing a broad set of cloud services for building, deploying, and managing apps.
● Services: Compute (VMs, Azure Functions), Storage (Blob Storage, Disk Storage), Databases (SQL Database,
Cosmos DB), Networking (Virtual Network, Azure DNS), AI, DevOps, Security, IoT, etc.
● Azure was introduced in 2010.

Benefits and Capabilities of Azure App Services and AWS Services

● Azure App Service: Managed platform for building, deploying, and scaling web apps and APIs quickly with
integrated DevOps and CI/CD pipelines. Supports multiple languages (.NET, Java, Node.js, Python).

● AWS Services (like Elastic Beanstalk, Lambda): Easy deployment, scaling, and management of apps without
managing infrastructure. Elastic Beanstalk handles provisioning and scaling; Lambda offers serverless compute.

| Category | AWS Service Examples | Microsoft Azure Service Examples |
|----------|----------------------|----------------------------------|
| Compute (IaaS/PaaS) | EC2 (VMs), Lambda (Serverless), ECS/EKS (Containers), Lightsail (Simple VPS) | Virtual Machines, Azure Functions (Serverless), Azure Kubernetes Service (AKS), Azure App Service |
| Storage | S3 (Object Storage), EBS (Block), EFS (File), Glacier (Archive) | Blob Storage, Disk Storage, Azure Files, Azure Archive Storage |
| Database | RDS (Managed SQL), Aurora (Cloud-Native SQL), DynamoDB (NoSQL), Redshift (Data Warehouse) | Azure SQL Database, Azure Cosmos DB (NoSQL), Azure Database for MySQL/PostgreSQL, Azure Synapse Analytics |
| Networking | VPC, Route 53 (DNS), API Gateway, CloudFront (CDN) | Azure Virtual Network, Azure DNS, Azure API Management, Azure Content Delivery Network |

Serverless Computing on AWS and Azure

AWS Lambda:

● Runs your code in response to events (like HTTP requests, file uploads, database changes).
● Automatically provisions and scales the compute resources as needed.
● You only pay for the actual compute time your code runs (no idle cost).
● Supports multiple programming languages (Node.js, Python, Java, C#, Go, Ruby).
● Integrates with many AWS services (S3, DynamoDB, API Gateway, CloudWatch).
● Handles all infrastructure management: patching, scaling, fault tolerance.
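
A minimal AWS Lambda handler in Python: lambda_handler(event, context) is Lambda's standard Python entry point, while the event fields below assume an S3 upload trigger (the bucket/key layout shown follows the standard S3 notification format):

```python
import json

def lambda_handler(event, context):
    """Triggered by an event (here, an S3 upload); Lambda scales this
    function automatically and bills only for execution time."""
    # For an S3 trigger, the event carries the bucket name and object key.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    print(f"New object uploaded: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps({"processed": key})}
```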

Azure Functions:

● Executes small pieces of code (functions) triggered by events (HTTP requests, timers, queues).
● Automatically scales based on the workload.
● Billing is based on the number of executions and compute time.
● Supports various languages (.NET, JavaScript, Python, Java, PowerShell).
● Easily integrates with Azure services (Blob Storage, Event Grid, Cosmos DB).
● Manages infrastructure, so developers focus on code only

Web Application Deployment on AWS


1. Choose Services:
Use services like Amazon EC2 (virtual servers), Elastic Beanstalk (PaaS for easy deployment), or AWS
Lambda + API Gateway (serverless).
2. Prepare Application:
Develop your app locally or in your IDE, package your code (e.g., ZIP file, Docker container).
3. Upload & Deploy:
○ For Elastic Beanstalk: Upload your code via AWS Console or CLI, and Beanstalk handles provisioning
servers, load balancing, scaling, and monitoring automatically.
○ For EC2: Launch EC2 instances, configure environment, deploy app manually or with automation tools.
○ For Serverless: Upload function code to Lambda, configure triggers.
4. Set Up Networking:
Configure Amazon VPC, security groups, load balancers, and DNS with Route 53.
5. Monitor & Manage:
Use CloudWatch for logging, performance metrics, and alarms.

Web Application Deployment on Azure


1. Choose Services:
Use Azure App Service (PaaS for web apps), Azure Virtual Machines (IaaS), or Azure Functions (serverless).
2. Prepare Application:
Develop your app, then package it or containerize
3. Upload & Deploy:
○ For Azure App Service: Deploy directly from Visual Studio, Azure CLI, or via CI/CD pipelines (GitHub,
Azure DevOps). Azure App Service handles infrastructure, scaling, and patches.
○ For VMs: Create VMs, configure environment, deploy app manually or automated.
○ For Azure Functions: Upload function code and configure triggers.
4. Configure Networking:
Set up Virtual Networks, Application Gateway, Azure DNS, and firewall rules.
5. Monitor & Manage:
Use Azure Monitor and Application Insights to track app performance and logs.

AWS Architecture and Components


● Regions and Availability Zones (AZs):
AWS divides its infrastructure into Regions (geographically isolated areas). Each Region has multiple
Availability Zones, which are physically separated data centers for high availability and fault tolerance.
● Key Components:
○ EC2: Virtual servers
○ S3: Object storage
○ RDS: Managed databases
○ VPC: Virtual private cloud for networking
○ IAM: Identity and access management
○ Lambda: Serverless computing
○ CloudFront: CDN for content delivery
○ CloudWatch: Monitoring and logging
● Regional Resource Management:
AWS manages resources independently in each Region and AZ. Users deploy resources in specific
Regions/AZs to reduce latency and improve fault tolerance. Data sovereignty and compliance needs are met by
choosing the right Region.
● Global Resource Management:
AWS provides Global Services like Route 53 (DNS), CloudFront (CDN), and IAM that operate across Regions
to ensure a unified global experience.

Azure Architecture and Components


● Regions and Availability Zones:
Azure organizes its infrastructure into Regions, each containing multiple Availability Zones (physically separate
data centers) to provide resilience and high availability.
● Key Components:
○ Azure Virtual Machines: Compute resources
○ Azure Blob Storage: Object storage
○ Azure SQL Database: Managed relational databases
○ Azure Virtual Network (VNet): Isolated networks
○ Azure Active Directory (AAD): Identity and access management
○ Azure Functions: Serverless computing
○ Azure CDN: Content delivery network
○ Azure Monitor: Monitoring and diagnostics
● Regional Resource Management:
Azure users deploy resources within Regions and Availability Zones to ensure compliance, reduce latency, and
maintain service availability.
● Global Resource Management:
Azure provides global services such as Traffic Manager (DNS load balancing), Azure Front Door (global HTTP
load balancing), and Azure AD for unified identity management across Regions.

Troubleshooting Techniques and Resources in AWS and Azure


Introduction
Cloud platforms like AWS and Azure support app deployment but can face issues like deployment failures, config
errors, network or permission problems, and performance bottlenecks. Troubleshooting ensures smooth app operation
using various tools and logs.

AWS Troubleshooting Tools:


● CloudWatch: Logs and monitors CPU, memory, and errors.
● X-Ray: Traces requests to find latency and failures.
● CLI Debug Mode: Use --debug for detailed command info.
● Elastic Beanstalk Dashboard: Checks app health and deployment status.
● IAM Policy Simulator: Troubleshoots permission errors.
● VPC Flow Logs: Monitors network traffic and connectivity.

Azure Troubleshooting Tools:


● Azure Monitor: Tracks performance metrics.
● Application Insights: Detects exceptions and slow responses.
● Service Health: Alerts for outages and service issues.
● Diagnostics & Log Analytics: Analyzes logs for errors.
● Visual Studio/VS Code Debugger: Step-by-step app debugging.
● Resource Health: Checks status of Azure resources.

Common Resources
● Official docs: AWS Docs, Azure Docs
● Community forums and Q&A sites
● Paid enterprise support plans
● SDK/CLI verbose logging
● Automation scripts for monitoring and fixing issues

Examples:
● AWS: EC2 can’t access S3 → Check CloudWatch, IAM roles, network ACLs.
● Azure: Slow Web App → Use Application Insights, Azure Monitor, optimize DB.
● Cross-platform: Deployment script fails → Use CLI debug flags (--debug or --verbose).

Authentication and Access Control Process on AWS and Azure


1. Identity Management
● AWS: Use AWS Identity and Access Management (IAM) to create and manage users, groups, roles, and
policies.
● Azure: Use Azure Active Directory (Azure AD) for managing identities and access.

2. User Authentication
● Authenticate users via credentials, federated identity (e.g., Google, Facebook), or enterprise SSO integrated
with IAM or Azure AD.
● Support multi-factor authentication (MFA) for added security.

3. Role-Based Access Control (RBAC)


● Assign permissions based on roles instead of individual users.
● AWS: Define IAM roles with specific policies granting access to AWS resources.
● Azure: Use RBAC in Azure AD to assign roles to users, groups, or applications.

4. Policy Definition
● Define fine-grained access policies specifying what resources can be accessed and what actions are allowed
(read, write, delete)
● Policies are JSON documents in AWS; in Azure, they are managed via Azure Role Definitions.
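
A sketch of an AWS-style JSON policy, written as a Python dict for illustration; Version, Statement, Effect, Action, and Resource are standard IAM policy fields, while the bucket name is hypothetical:

```python
import json

# Least-privilege policy: read-only access to a single (hypothetical) bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-app-data",
                "arn:aws:s3:::example-app-data/*",
            ],
        }
    ],
}
print(json.dumps(policy, indent=2))
```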

5. Token-Based Authentication
● Use temporary security tokens for applications or services to access resources securely.
● AWS uses AWS Security Token Service (STS) to provide temporary credentials.
● Azure uses OAuth 2.0 tokens issued by Azure AD for API and resource access.

6. Application Integration
● Applications call AWS or Azure APIs, passing tokens or credentials to authenticate and authorize each request.
● Use SDKs provided by AWS and Azure for simplified authentication handling.

7. Monitoring and Auditing


● Monitor access with AWS CloudTrail and Azure Monitor / Azure AD logs to track authentication events and
access patterns.
● Audit permissions regularly to enforce least privilege.

| Aspect | AWS | Azure |
|--------|-----|-------|
| SDKs | AWS SDKs for Java, Python (Boto3), JavaScript, .NET, Ruby, PHP, Go, C++ | Azure SDKs for .NET, Java, Python, JavaScript, Node.js, Go, PHP |
| CLI Tools | AWS CLI (command-line interface) | Azure CLI, Azure PowerShell |
| IDEs & Extensions | AWS Toolkit for Visual Studio, VS Code, JetBrains IDEs | Azure Tools for Visual Studio, VS Code, JetBrains IDEs |
| Serverless Development | AWS SAM (Serverless Application Model), AWS Lambda console, Amplify | Azure Functions Core Tools, Azure Portal, Azure Logic Apps |
| Container Tools | AWS ECS, EKS integrations with Docker, Kubernetes | Azure Kubernetes Service (AKS), Azure Container Instances |
| Infrastructure as Code | AWS CloudFormation, CDK (Cloud Development Kit) | Azure Resource Manager (ARM) Templates, Bicep |
| CI/CD Services | AWS CodePipeline, CodeBuild, CodeDeploy | Azure DevOps, GitHub Actions (integrated) |
| Monitoring & Debugging | AWS CloudWatch, X-Ray | Azure Monitor, Application Insights |
| Security Tools | IAM, AWS Secrets Manager, Cognito | Azure AD, Azure Key Vault, Azure Security Center |
| Mobile Development | AWS Amplify, Device Farm | Azure Mobile Apps, App Center |

GCP:
Google Cloud Platform (GCP) began with the introduction of Google App Engine in 2008.
GCP Services & Offerings
● Compute:

○ Compute Engine (VMs)

○ Google Kubernetes Engine (GKE)

○ App Engine (PaaS)

○ Cloud Functions (Serverless)

● Storage:

○ Cloud Storage (object storage)

○ Persistent Disk (block storage)

○ Filestore (managed file storage)

● Databases:

○ Cloud SQL (managed relational DB)

○ Cloud Spanner (global-scale relational DB)

○ Bigtable (NoSQL wide-column DB)

○ Firestore (NoSQL document DB)

● Networking:

○ Virtual Private Cloud (VPC)

○ Cloud Load Balancing

○ Cloud CDN
● Big Data & Analytics:

○ BigQuery (data warehouse)

○ Dataflow (stream and batch data processing)

○ Dataproc (managed Spark/Hadoop)

● AI & Machine Learning:

○ AI Platform

○ AutoML

○ Vision API, Natural Language API, Translation API

● Security & Identity:

○ Cloud IAM (Identity and Access Management)

○ Cloud Identity

○ Security Command Center

Key Features
● Global network with high-speed fiber optic backbone

● Auto-scaling and load balancing

● Strong focus on AI/ML integration

● Live migration of VMs for high availability

● Integrated logging and monitoring (Cloud Logging & Cloud Monitoring)

● Pay-as-you-go pricing model

● Multi-layer security, compliance certifications

Development Tools & SDKs


● Cloud SDK (gcloud CLI): Command-line interface to manage GCP resources

● Client Libraries / SDKs: Available for multiple languages including:

○ Python

○ Java

○ Node.js / JavaScript

○ Go

○ .NET

○ Ruby
○ PHP

● IDEs & Plugins:

○ Cloud Code plugin for VS Code and JetBrains (smart IDE support)

○ Integration with popular IDEs (Eclipse, IntelliJ)

● Serverless Framework Support:

○ Cloud Functions CLI and SDKs

○ Integration with Cloud Run (serverless containers)

● Infrastructure as Code:

○ Deployment Manager (GCP’s infrastructure management tool)

○ Terraform support (third-party)

● CI/CD:

○ Cloud Build (build, test, deploy)

○ Integration with GitHub Actions and other CI/CD tools

Fault Tolerance in Cloud Platforms


● Redundant Infrastructure
● Automatic Failover
● Load Balancing
● Data Replication
● Health Monitoring and Self-Healing
● Distributed Architecture
● Auto Scaling

Disaster Recovery in Cloud Platforms


● Backup and Restore
● Geo-Redundancy and Multi-Region Deployment
● Snapshot and Image Management
● Disaster Recovery as a Service (DRaaS)
● Continuous Data Protection (CDP)
● Failover Testing and Drills
● Compliance and Security Controls
Unit 4
Resource Management: In cloud computing, resource management refers to the process of managing the usage of
resources in a cloud environment. This includes compute resources (like CPU and memory), storage resources (like
disk space), and network resources (like bandwidth).

Resource management is crucial for optimizing the performance and cost of a cloud environment.
It involves monitoring resource usage, allocating resources, and managing resource capacity.

Cloud resource management policies can be loosely grouped into five classes:
• Admission control.
• Capacity allocation.
• Load balancing.
• Energy optimization.
• Quality-of-service (QoS) guarantees.

1. Admission Control

Admission control decides whether to accept or reject incoming workloads based on available resources to prevent
system overload.
Key Features
● Prevents resource over-commitment
● Ensures QoS levels for accepted workloads
● Protects system stability
● Matches demand with available capacity
Procedure
1. Check current resource usage (CPU, RAM, I/O).
2. Compare requirements of incoming request with available capacity.
3. If resources are sufficient → Accept.
4. If insufficient or violating QoS → Reject or queue the request.
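
A minimal sketch of this admission-control check in Python, with made-up capacity figures:

```python
# Illustrative admission control: accept a request only if the remaining
# capacity can satisfy it; otherwise queue it.

capacity = {"cpu": 16, "ram_gb": 64}      # total resources (made up)
in_use = {"cpu": 12, "ram_gb": 40}        # current usage
queue = []

def admit(request: dict) -> bool:
    """Accept the request if every resource fits; otherwise queue it."""
    fits = all(in_use[r] + request[r] <= capacity[r] for r in capacity)
    if fits:
        for r in capacity:
            in_use[r] += request[r]       # reserve the resources
        return True
    queue.append(request)                 # insufficient capacity: queue/reject
    return False

print(admit({"cpu": 2, "ram_gb": 8}))     # True  (fits)
print(admit({"cpu": 4, "ram_gb": 32}))    # False (CPU would exceed capacity)
```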

2. Capacity Allocation
Capacity allocation is the process of assigning CPU, memory, storage, and bandwidth to virtual machines or
applications.
Key Features
● Ensures fair and efficient resource distribution
● Supports dynamic scaling
● Prevents resource starvation
● Policy-driven allocation (priority, quotas, limits)
Procedure
1. Identify resource demand of each VM/application.
2. Apply allocation policies (priority, weights, quotas).
3. Distribute CPU, RAM, and storage accordingly.
4. Continuously monitor and adjust allocations based on workload changes.
3. Load Balancing

Load balancing distributes workloads across servers or VMs to prevent overload and increase performance.
Key Features
● Even workload distribution
● Improved throughput and response time
● Fault tolerance (redirects traffic on failure)
● Supports auto-scaling
Procedure
1. Monitor traffic and resource usage on servers.
2. Detect overloaded or underutilized resources.
3. Redirect requests to healthy or lightly loaded servers.
4. Continuously balance load using algorithms (Round Robin, Least Connections, etc.).
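
Minimal sketches of the two algorithms named in step 4 (Round Robin and Least Connections), assuming an in-memory view of per-server connection counts:

```python
from itertools import cycle

servers = ["srv-a", "srv-b", "srv-c"]

# Round Robin: rotate through the servers in order.
rr = cycle(servers)
print([next(rr) for _ in range(5)])   # srv-a, srv-b, srv-c, srv-a, srv-b

# Least Connections: pick the server with the fewest active connections.
active = {"srv-a": 12, "srv-b": 4, "srv-c": 9}   # made-up counts

def least_connections() -> str:
    return min(active, key=active.get)

target = least_connections()          # -> "srv-b"
active[target] += 1                   # the new request is now in flight
print(target)
```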

4. Energy Optimization
Energy optimization reduces power consumption in cloud data centers while maintaining performance.
Key Features
● Server consolidation (pack workloads onto fewer servers)
● Power-aware scheduling
● Turning off idle servers
● Use of low-power hardware and cooling techniques
Procedure
1. Monitor workload utilization.
2. Consolidate VMs onto fewer physical machines.
3. Turn off or put idle servers in sleep mode.
4. Use thermal and power-aware scheduling for future workloads.

5. Quality of Service (QoS) Guarantees


QoS guarantees ensure that cloud services meet specific performance requirements such as latency, throughput, and
availability.
Key Features
● SLA-based performance assurance
● Prioritization of critical workloads
● Bandwidth and resource reservation
● Monitoring and violation detection
Procedure
1. Define QoS requirements in SLA (e.g., 99.9% uptime).
2. Reserve required resources (CPU, memory, bandwidth).
3. Prioritize traffic based on QoS levels.
4. Continuously monitor performance and trigger corrective actions if SLA violations occur.
Main Resource Management Policies
1. Admission Control Policy
Decides which requests or workloads are allowed to enter the system based on available resources to prevent overload
and ensure stability.
2. Capacity Allocation Policy
Defines how CPU, memory, storage, and bandwidth are assigned to applications or VMs based on priority, quotas, or
demand.
3. Load Balancing Policy
Ensures even distribution of workloads across servers or VMs to avoid bottlenecks and improve overall performance.
4. Energy Optimization Policy
Aims to reduce power consumption in cloud data centers by consolidating workloads, powering down idle servers, and
using energy-efficient scheduling.
5. Quality of Service (QoS) Policy
Guarantees required performance levels (latency, throughput, availability) by prioritizing critical workloads and reserving
necessary resources.
6. Fairness Policy
Ensures equal or proportionate resource access for all users or processes to prevent starvation and maintain system
fairness.
7. SLA-Based Policy
Allocates resources according to Service Level Agreements, ensuring the cloud provider meets promised performance
guarantees.

Challenges in Achieving Stability Across Multiple Levels


Cloud management happens at three levels:
Level 1: Task Scheduling → Level 2: VM Allocation → Level 3: Physical Resource Control
Primary Challenges
1. Interference Between Levels
Scheduling at one level may conflict with resource allocation at another.
2. Oscillation
Rapid up-down scaling creates instability.
3. Multi-tenant Interference
Resource sharing causes unpredictable performance.
4. Latency in Control Loops
Slow feedback and delayed measurements upset stability.
5. Overreaction to load spikes
Scaling too aggressively leads to cost increase.

How They Are Addressed


● Feedback Control Theory (PID controllers) for stable resource scaling
● Multi-level coordinated resource management
● Adaptive thresholds rather than fixed thresholds
● Predictive scaling using ML
● SLA-driven policies to avoid aggressive resizing

Monitoring & Resource Utilization Mechanisms in Cloud


Monitoring is essential for detecting overload, failures, or inefficient resources.
Common Monitoring Mechanisms
1. Performance Metrics Monitoring
○ CPU, RAM, I/O, network bandwidth
○ Services: AWS CloudWatch, Azure Monitor, GCP Cloud Monitoring
2. Log Monitoring
○ System logs, access logs, error logs
○ Services: CloudWatch Logs, Azure Log Analytics
3. Tracing & Profiling
○ Tracks request paths
○ AWS X-Ray, Azure Application Insights
4. Health Checks
○ Liveness and readiness probes
○ Used in Kubernetes, auto-scaling groups
5. Resource Utilization Dashboards
○ Real-time graphs for VM usage, disk I/O, memory
6. Auto-Scaling Monitors
○ Trigger scale actions based on thresholds.
7. Network Monitoring
○ VPC flow logs, Azure Network Watcher

Specialized Autonomic Performance Management refers to a set of self-managing controllers in cloud systems where
each controller focuses on a specific performance goal, such as:
● performance optimization,
● power/energy saving,
● load balancing,
● fault tolerance,
● QoS maintenance

How Coordination of Specialized APMs (Autonomic Performance Managers) Improves Resource Management:
● Prevents conflicting actions
● Maintains system stability
● Enables global optimization
● Ensures efficient and fair resource allocation
● Supports an autonomic, self-managing cloud
● Handles complex workloads
● Reduces operational costs

Stability of a two-level resource allocation architecture:


Two-level resource allocation is a cloud resource management architecture where resources are managed in two
layers:
Level 1: Global Resource Manager (Top Level)
● Manages total cloud resources across datacenters or clusters.
● Decides how much resource each application, VM, or tenant gets (quotas, limits, shares).
● Ensures fairness, SLA satisfaction, and high-level policies.

Level 2: Local Resource Managers (Bottom Level)


● Operate inside each VM, container, host OS, or application runtime.
● Perform local scheduling of CPU, memory, I/O for tasks/threads inside the allocated quota.
● Examples: Linux CFS scheduler, Kubernetes pod scheduler, JVM thread scheduler.

In short:
Top level → allocates resources.
Bottom level → schedules and uses those resources.

1. Global Resource Manager (GRM)


● Runs across the entire cloud/datacenter.
● Performs:
○ Admission control
○ High-level scheduling
○ Capacity allocation
○ Load balancing
Examples:
Kubernetes master, Mesos master, Azure/AWS resource managers.
2. Local Resource Managers (LRM)
● Run on each host/VM/container.
● Perform:
○ CPU scheduling
○ Memory/page allocation
○ I/O throttling
○ Local queuing and prioritization
Examples:
Docker/Kubernetes kubelet, Linux kernel scheduler, JVM runtime scheduler.
3. Feedback Loops
● Both levels monitor metrics (CPU%, latency, queue length).
● Information flows upward; commands flow downward.
● Ensures dynamic adjustment.
Why Stability Matters in Effectiveness
Stability is critical because:
1. Prevents Resource Oscillations
If allocations fluctuate too fast, both global and local managers may repeatedly adjust, causing:
● Increased latency
● Dropped requests
● Over-provisioning or starvation
2. Ensures Predictable Performance
Applications require stable CPU/memory to meet SLAs.
3. Avoids Thrashing
Unstable systems may frequently migrate VMs or restart tasks.
4. Reduces Cost
Stable allocations prevent unnecessary scaling or provisioning.
5. Enables Efficient Multitenancy
Tenants receive consistent, fair resource shares.

5. Factors Contributing to Stability

1. Accurate Monitoring
Reliable metrics (CPU%, queue length, latency) prevent incorrect decisions.
2. Proper Feedback Loop Timing
● Too fast → oscillation
● Too slow → poor responsiveness
3. Clear Separation of Responsibilities
Global manager: long-term allocation
Local manager: short-term scheduling
→ minimizes conflict.
4. Workload Predictability
More predictable loads are easier to stabilize.
5. Quota, Limit, and Priority Controls
Avoids resource contention and starvation.
6. Efficient Load Balancing
Distributes loads evenly, reducing stress on local managers.

6. Key Challenges in Maintaining Stability


1. Interference Between Levels
Local schedulers may override global decisions.
Example:
Global allocates 2 CPU cores → local scheduler overschedules tasks → contention.
2. Conflicting Objectives
● Top-level wants fairness
● Local-level wants performance
Conflicts can create instability.
3. Inaccurate or Delayed Feedback
Late monitoring signals cause over-corrections.
4. Highly Variable Workloads
Bursty workloads (e.g., web traffic) disrupt stability.
5. Multi-Tenant Environments
Each tenant behaves differently, affecting others.

7. Ensuring Stability Using Feedback Control Mechanisms


Feedback control uses sensors, controllers, and actuators to keep resources within desired limits.
Feedback Control Steps
1. Monitor (Sensor)
Collect metrics: CPU%, queue length, response time.
2. Analyze (Controller)
Compare actual performance with target SLA.
3. Plan Adjustment
Determine scaling up/down or quota changes.
4. Execute (Actuator)
Enforce new resource limits or schedules.
How Feedback Control Ensures Stability
1. Avoids Overreaction
Uses smoothing, thresholds, and dampening:
● only act when thresholds are exceeded
● exponential smoothing to avoid swings
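
A short sketch of the exponential smoothing mentioned above: acting on a smoothed signal s(k) = α·m(k) + (1−α)·s(k−1) instead of raw samples damps short spikes; the weight α and the CPU samples here are made up:

```python
def smooth(samples, alpha=0.3):
    """Exponentially smoothed series: s(k) = alpha*m(k) + (1-alpha)*s(k-1)."""
    s = samples[0]
    out = [s]
    for m in samples[1:]:
        s = alpha * m + (1 - alpha) * s
        out.append(s)
    return out

cpu = [40, 42, 95, 41, 43, 44]            # one short spike at index 2
print([round(v) for v in smooth(cpu)])    # spike damped to ~57: no overreaction
```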

2. Keeps Resource Use Near Targets


Maintains a stable operating point.
3. Coordinates Top and Bottom Levels
● Level 1 adjusts long-term quotas
● Level 2 uses short-term scheduling
This prevents both levels from making fast conflicting changes.
4. Predicts Future Load
Predictive controllers (e.g., model-based, ML-based) prevent instability.

Two-Level Resource Allocation Concept


● Cloud uses two controllers:
Local controller (application-level) + Global controller (service-provider level).
● Based on control theory with a closed-loop feedback system.

2. Components of the Control System


● Inputs: workload + policies (admission control, capacity allocation, load balancing, energy optimization, QoS).
● System Components:
○ Sensors → measure performance
○ Controllers → make resource decisions
● Output: resource allocations to applications.

3. Role of Feedback
● Sensors provide feedback to controllers to maintain stability.
● If outputs change too much → system becomes unstable → thrashing and wasted resources.

4. Sources of Instability
1. Delay in system response to a control action.
2. Granularity issue → small control change causes large output change.
3. Oscillations → frequent, large input changes + weak control.

5. Types of Autonomic Policies


Threshold-based policies trigger actions (like scaling up/down) when performance crosses predefined upper
or lower limits.
Sequential decision (Markov-based) policies make step-by-step decisions by predicting future system states
and choosing the optimal action for long-term performance.

6. Lessons for Ensuring Stability


● Control actions must follow a steady rhythm—avoid rapid consecutive changes.
● Adjust only after the system stabilizes.
● Controller must understand application stabilization time.
● Thresholds must be properly spaced—too close → instability.
● Adding/removing even one VM can push system past another threshold → more instability.

Feedback control based on dynamic thresholds:


Dynamic threshold-based feedback control provides a solution by continuously adjusting thresholds according to
system behavior and workload, allowing the system to respond proactively and maintain optimal performance
A threshold defines a boundary or limit for a system metric (e.g., CPU utilization, memory usage, response time)
beyond which corrective action is triggered.

In dynamic thresholds, this limit is not fixed; it is updated periodically based on feedback from the system metrics.
This approach enables the system to adapt to fluctuating workloads and maintain desired performance levels.
Example:
Target CPU utilization: 70%
Current CPU utilization: 80%
The feedback mechanism adjusts the threshold downwards to prevent overload.
Threshold update formula (proportional control):
T_new = T_old − K × (M_current − M_target)
where K is the gain factor controlling how fast the threshold adapts.
Feedback Control Loop with Dynamic Thresholds
The dynamic threshold system works in a closed-loop feedback manner:
1. Monitor Metrics: Continuously measure system parameters like CPU load, memory, task queue length, or response time.
2. Compute Error: Compare the current value with the target (desired) value: e = M_current − M_target.
3. Update Threshold: Adjust the threshold using proportional, integral, derivative, or PID control based on the error.
4. Trigger Action: If the system metric crosses the updated threshold, corrective actions are taken (e.g., scaling resources, throttling tasks, load redistribution).
5. Repeat: The loop continuously monitors and adapts to maintain stability and optimal performance.
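
A minimal simulation of this loop in Python, using the proportional threshold update; the workload samples, gain K, and bounds are invented for illustration:

```python
# Closed-loop dynamic threshold (proportional control), illustrative values.
TARGET = 70.0   # desired CPU utilization (%)
K = 0.5         # gain factor: how fast the threshold adapts

threshold = 85.0
for cpu in [60, 72, 80, 88, 75, 65]:      # monitored samples (made up)
    error = cpu - TARGET                  # compute error vs. target
    threshold -= K * error                # update threshold: T = T - K*e
    threshold = max(50.0, min(95.0, threshold))   # keep within sane bounds
    action = "scale out" if cpu > threshold else "no action"
    print(f"cpu={cpu:>3}%  threshold={threshold:5.1f}%  -> {action}")
```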

Advantages of Feedback Control with Dynamic Thresholds


Adaptability: Automatically adjusts to changing workloads.
Better Resource Utilization: Reduces idle resources and prevents overloading.
Improved QoS: Minimizes response time violations and prevents system instability.
Flexibility: Can be integrated with different control strategies for precise management.
Scalability: Useful in cloud environments where resources and tasks scale dynamically.

Applications in Cloud Computing


Task Scheduling: Dynamically adjust task acceptance thresholds to balance load across servers.
Load Balancing: Monitor server performance and redistribute tasks based on dynamic thresholds.
Autoscaling: Adjust VM or container thresholds for CPU/memory usage to scale resources up or down.
QoS Management: Ensure response times and throughput meet the desired targets under fluctuating demand.

Autonomic Manager Reference Architecture (MAPE-K):


The Autonomic Manager Reference Architecture (MAPE-K) represents the self-management process used in
autonomic computing systems. It consists of four main phases — Monitor, Analyze, Plan, and Execute — all
connected through a shared Knowledge base.
MAPE-K Autonomic Manager – Important Points
● Monitor: Collects system and environment data using sensors.
● Analyze: Processes data to detect problems, trends, or predict workload behavior.
● Plan: Decides corrective actions or strategies to maintain/improve performance.
● Execute: Applies actions through effectors (e.g., scaling, configuration changes).
● Knowledge Base: Stores policies, historical data, models, and rules used by all stages.
● Outcome: Forms a feedback loop enabling systems to self-manage and adapt automatically.
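
A skeleton of one MAPE-K iteration in Python; every function and the knowledge dict are illustrative stand-ins for real sensors, analysis models, policies, and effectors:

```python
# One pass of a MAPE-K loop (illustrative skeleton).
knowledge = {"cpu_target": 70.0, "history": []}   # shared Knowledge base

def monitor() -> dict:
    return {"cpu": 88.0}                  # stand-in for real sensor data

def analyze(metrics: dict) -> float:
    knowledge["history"].append(metrics)  # record data for future predictions
    return metrics["cpu"] - knowledge["cpu_target"]   # deviation from goal

def plan(deviation: float) -> str:
    if deviation > 10:
        return "scale_out"
    if deviation < -10:
        return "scale_in"
    return "none"

def execute(action: str) -> None:
    print(f"effector action: {action}")   # stand-in for a real scaling API call

execute(plan(analyze(monitor())))         # -> effector action: scale_out
```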

Role of Coordination Among Autonomic Performance Managers (APMs) – Key Points


1. Conflict Resolution
● Different APMs have different goals (performance vs energy).
● Coordination prevents contradictory actions and avoids oscillations.
2. Maintaining System Stability
● Prevents managers from triggering conflicting or rapid changes.
● Avoids resource thrashing and performance drops.
3. Global Optimization
● Ensures decisions consider overall goals: performance, energy, cost, reliability.
● Avoids local-only optimization.
4. Communication & Information Sharing
● APMs share state, metrics, and planned actions.
● Enables smarter cooperative decisions.
5. Modularity & Scalability
● Managers remain specialized and independent.
● New managers/resources can be added easily through coordination mechanisms.
6. Conflict-Free Resource Allocation
● Ensures fair, efficient distribution of resources among multiple APMs
● Prevents one manager from overriding or starving another.

Benefits of Coordination
● Higher system stability and better QoS.
● Improved resource utilization and energy efficiency.
● Automatic and adaptive resource management.
● Scalable to large, multi-node cloud systems.
● Reduces human effort and simplifies management.

Challenges in Coordination
● Maintaining stability and preventing oscillations.
● Communication overhead between managers.
● Designing effective policies and conflict-resolution rules.
● Managing heterogeneous workloads and mixed resources.

How Autonomic Managers Interact With Cloud Components


Autonomic Managers communicate with different cloud layers:
1. With Virtual Machines (VMs)
● Monitor VM load
● Scale VMs up/down
● Migrate VMs to balance clusters
2. With Hypervisors
● Obtain resource usage
● Enforce caps/quotas
● Manage VM placement
3. With Cloud Controllers (Global Managers)
● Receive policies (pricing, SLAs, capacity limits)
● Report performance status
4. With Network & Storage Systems
● Optimize data placement
● Route traffic for low latency
● Balance I/O load
5. With Auto-scaling Services
● Trigger scale-out/scale-in actions
● Predict resource needs
Interaction is automatic, using APIs, sensors, telemetry, and event-based triggers.

Why Coordination Between Autonomic Managers Is Necessary


Cloud has multiple managers:
● Load Balancing Manager
● Energy Optimization Manager
● Capacity Allocation Manager
● Admission Control Manager
● QoS Manager
If each worked independently → conflicts occur.
Examples of Conflicts Without Coordination
● Energy Manager tries to consolidate VMs to reduce power
● Load Balancer spreads VMs across hosts → opposite action
● QoS Manager demands high resources → Capacity Manager tries to reduce usage
Thus coordination ensures harmony among objectives.

How Coordination Contributes to Efficient Resource Management


1. Avoids Conflicting Decisions
Managers share state and align objectives so one manager’s decision does not cancel out another’s.
2. Ensures Global Optimization
Instead of optimizing locally (per VM), managers work together to optimize:
● cluster load
● energy consumption
● SLA compliance
● resource cost
3. Reduces Oscillations and Instability
Proper coordination prevents constant scaling up/down or VM migrations (thrashing).
4. Enables Predictable Performance (QoS)
QoS manager stabilizes latency and throughput while other managers ensure enough resources.
5. Improves Resource Utilization
Capacity and load-balancing managers coordinate to avoid:
● over-provisioning
● underutilization
● hot spots
6. Enhances Autonomic Decision Quality
Shared knowledge improves prediction and intelligent planning.
7. Faster Adaptation to Workload Changes
Managers collectively respond to spikes and failures using a unified control loop.

How Autonomic Managers Coordinate (Mechanisms)


1. Shared Knowledge Base (K in MAPE-K)
Managers use the same policy database, workload history, and thresholds.
2. Policy Hierarchy
Global Manager sets overall constraints.
Local Managers optimize within those boundaries.
3. Event-Driven Communication
Managers send events like:
● overload detected
● energy threshold crossed
● SLA violation predicted
4. Control Loops Synchronization
Manager loops operate at different time scales to avoid conflicts:
● Global Manager → slow, long-term
● Local Manager → fast, short-term
5. Resource Orchestration Platforms
Kubernetes, OpenStack, VMware vSphere coordinate managers via orchestration engines.

Control theory:
Control theory is the study of monitoring a system and adjusting it to reach the desired state or goal. Using control theory, the system continuously monitors CPU, RAM, and network usage, compares them with the desired state, and adjusts task allocation or balances the load. This ensures that the system gradually and safely moves towards the goal state, completing all tasks on time while keeping resources stable and efficiently used.
An optimal controller manages a queueing system by using feedback and prediction. External traffic λ(k) enters the
queue and is also sent to a predictive filter, which forecasts future load and disturbances. The controller receives three
inputs:
● the forecast signal (s),
● the reference target (r),
● the current queue state q(k) from feedback.
Using these, it computes the optimal control action u*(k) to regulate how the queue processes traffic. The queueing
dynamics combine λ(k) and u*(k) to produce the output ω(k). Continuous feedback helps the controller adjust decisions
and maintain stability even when disturbances occur.
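
A toy simulation of this loop in Python: arrivals λ(k) feed a queue q(k), and a proportional controller chooses the service rate u(k) to steer the queue toward the reference r (all numbers are invented; the predictive filter is omitted for brevity):

```python
# Toy queue regulation: q(k+1) = max(0, q(k) + lambda(k) - u(k)).
r = 10.0          # reference: desired queue length
K = 0.6           # controller gain
base_rate = 5.0   # nominal service rate

q = 0.0
for lam in [8, 9, 12, 15, 7, 4, 4]:          # external traffic (made up)
    u = max(0.0, base_rate + K * (q - r))    # serve faster when queue > r
    q = max(0.0, q + lam - u)                # queue dynamics
    print(f"arrivals={lam:>2}  service={u:4.1f}  queue={q:5.1f}")
# The queue rises under the burst, then feedback pulls it back toward r.
```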

Applications of Control Theory in Cloud Computing


1. Task Scheduling
● Dynamically allocates CPU, memory, and storage based on workload.

● Prevents overload by shifting tasks from busy to idle servers.

2. QoS (Quality of Service) Control


● Monitors response time, delay, throughput.

● Adjusts resources when performance drops to maintain required service levels.


3. Energy Optimization
● Reduces power by lowering CPU frequency or turning off idle servers.

● Saves energy without breaking QoS targets.

4. Load Balancing
● Distributes incoming tasks using feedback from server load.

● Avoids bottlenecks and keeps resource usage efficient.

5. Autonomic (Self-Managing) Systems


● Automatically monitors, analyzes, and adjusts resources.

● Reduces human intervention and enhances stability in distributed systems.

6. Performance–Energy Tradeoff
● Balances performance needs with energy savings.

● Ensures both efficiency and service quality.

Key Insights (Short)


1. Dynamic Real-Time Adaptation
● Resources adjust continuously as workload changes.

2. Optimal Resource Utilization


● Prevents both underuse (idle servers) and overload.

3. System Stability
● Feedback control avoids oscillations and service disruptions.

4. Better QoS
● Ensures low delay, smooth performance, and timely task completion.

5. Scalability & Autonomy


● Enables automatic scaling and large-scale cloud management without manual effort.

Role of Control Theory in Optimizing Task Scheduling in Cloud Computing

1. Ensures Stable and Efficient Task Scheduling

Control theory treats the cloud as a closed-loop system with feedback.


Schedulers use real-time metrics (CPU, queue length, latency) to adjust task assignments dynamically, preventing
overload and underutilization.

2. Maintains Service Quality (QoS)

Controllers measure deviation from target performance (response time, throughput, deadlines).
If QoS drops, the controller automatically:

● allocates more CPU/memory


● reschedules tasks
● triggers scaling

This maintains SLA compliance.

3. Prevents System Instability

Without control, rapid scheduling changes cause thrashing.


Control theory ensures adjustments are smooth, stable, and proportional, avoiding oscillations.

How Control Theory Handles Varying Workload Demands and Prioritizes Tasks
1. Real-Time Monitoring of Workload
Sensors track:
● workload intensity
● number of incoming tasks
● queue length
● resource usage
This data feeds back into the controller.
2. Adaptive Allocation Based on Workload Change
If workload spikes → controller increases scheduling rate, allocates more VMs/containers.
If workload drops → controller scales down to save cost/energy.
3. Priority-Based Scheduling
Controllers use weighted feedback rules to assign:
● more resources to high-priority tasks
● limited resources to lower-priority jobs
This ensures fairness and SLA-driven priority handling.
4. Predictive Behavior
Model-based or predictive controllers can anticipate workload changes and adjust scheduling before bottlenecks occur.

How Control Theory Enables Dynamic Adaptation and Real-Time Decisions


1. Continuous Feedback Loop
The MAPE-K loop (Monitor → Analyze → Plan → Execute) acts as a real-time control system.
Schedulers continuously adapt based on updated performance metrics.
2. Automatic Scaling and Load Adjustment
Controllers automatically:
● scale resources up/down
● migrate tasks
● rebalance loads
● throttle or accelerate task execution
All in real time, without human intervention.
3. Corrective Actions Based on Deviations
If deviation from target (e.g., deadline miss) is detected:
● increase concurrency
● reassign tasks to idle nodes
● modify scheduling time slice
These adjustments happen instantly.

4. Stability Even Under Rapid Changes


Control mechanisms avoid:
● overshooting
● oscillation
● uncontrolled scaling
ensuring smooth operation at all load levels.

Resource Bundling
Resource bundling means grouping multiple cloud resources (compute, storage, network, software) into a single
package or “bundle” that can be allocated as one unit. This helps providers manage resources efficiently and meet user
performance or SLA requirements.
Purpose
● Simplifies allocation and management.
● Improves utilization by combining underused resources.
● Helps meet user-specific performance/SLA needs.

Key Components of a Resource Bundle


1. Compute Resources
● Definition: Provide processing power for running applications.
● Examples:
○ CPU: General computing tasks.
○ GPU: AI/ML, graphics, high-performance tasks.
○ VMs: Virtualized systems running apps independently.
● Purpose: Ensures applications execute efficiently with the required processing capability

2. Storage Resources
● Definition: Store and manage application data.
● Examples:
○ Block Storage: Fast access; suitable for databases/filesystems.
○ Object Storage: Scalable; used for media, backups, cloud-native apps.
● Purpose: Provides reliable, accessible data storage.

3. Network Resources
● Definition: Enable communication between cloud components.
● Examples:
○ Bandwidth: Speed of data transfer.
○ Virtual Network Interfaces (VNI): Allow VMs/containers to connect securely.
● Purpose: Ensures smooth data exchange within the cloud and over the internet.

4. Software Resources
● Definition: Pre-installed tools, systems, or services within the bundle.
● Examples:
○ OS packages (Linux/Windows)
○ Middleware (web servers, databases)
○ Frameworks / pre-configured apps
● Purpose: Reduces setup time and simplifies deployment with ready-to-use environments.

Resource Bundling Techniques


1. Static Bundling
● Resources are fixed and pre-defined; cannot change once created.
● Example: A VM bundle with 2 CPUs, 8 GB RAM, 100 GB storage, Linux OS—always the same configuration.
● Pros: Simple, predictable.
● Cons: Not flexible for changing workloads.

2. Dynamic Bundling
● Resources change automatically based on demand; taken from a shared pool.
● Example: A web app starts with 2 VMs, and when traffic increases, the cloud automatically adds more
VMs/storage.
● Pros: Efficient, adapts to workload changes.
● Cons: More complex to manage.

3. User-Defined Bundling
● Users custom-select the resources for their own bundle.
● Example: A user chooses 4 VMs, 200 GB storage, 2 Gbps network, and a pre-installed database for their app.
● Pros: Highly flexible, avoids resource wastage.
● Cons: Requires user knowledge; wrong choices may cause inefficiency.

Combinatorial Auctions
Combinatorial auctions let users bid on bundles of resources instead of single items. Cloud systems use them for fair
and efficient resource allocation.

1. ASCA – Ascending Clock Auction


● Resource prices increase gradually over rounds.
● If demand > supply, the price for that resource rises next round.
● Auction stops when demand = supply.
● Example: If many users bid for CPUs, the CPU price keeps rising until only the highest-valuing users stay.
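
A minimal sketch of the ascending clock idea for a single resource in Python: the price rises each round while demand at the current price exceeds supply; the valuations and demand rule are made up:

```python
# Ascending clock auction for one resource (e.g., CPUs), illustrative only.
supply = 4
valuations = {"u1": 9, "u2": 7, "u3": 5, "u4": 3, "u5": 2}  # per-unit value
price, step = 1, 1

def demand(p):
    # Each user wants 2 units while the price is below their valuation.
    return {u: 2 for u, v in valuations.items() if v > p}

bids = demand(price)
while sum(bids.values()) > supply:   # excess demand -> raise the clock price
    price += step
    bids = demand(price)

print(f"clearing price={price}, winners={bids}")  # demand now equals supply
```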

2. SCA – Simultaneous Clock Auction


● All resources have simultaneous price clocks.
● Users bid on bundles while all prices rise in parallel whenever demand exceeds supply.
● Example: A user wanting 2 CPUs + 50 GB storage bids while both CPU and storage prices increase together.

3. SCVA – Simultaneous Clock with Virtual Auction


● Users place bids on virtual resource bundles, even if not physically allocated yet.
● System computes the best overall allocation from all virtual bids.
● Example: A user needing 3 CPUs + 50 GB storage + high-speed bandwidth gets an optimized bundle chosen
from many virtual combinations.
4. CPA – Clock Proxy Auction
● Hybrid approach: users specify desired bundles; a proxy agent auto-bids for them.
● Reduces manual effort and ensures efficient final allocation.
Unit 5

Evolution of Storage Technology

The evolution of storage technology is the gradual improvement of data-storage methods—from punch cards to cloud
storage—aimed at increasing capacity, speed, reliability, and reducing cost.

| Era | Storage Type / Device | Key Points | Example |
|-----|----------------------|------------|---------|
| 1950s–1970s (Early Mechanical & Magnetic Storage) | Punch Cards | Paper cards with punched holes; slow, bulky, very low capacity | Used in early tabulating machines |
| | Magnetic Tape | Sequential data storage; suitable for backup and archival | Tape reels used for archival storage |
| 1970s–1990s (Magnetic Storage Era) | Hard Disk Drive | Introduced in 1956; random access, large storage capacity | Hard disks used in early computers |
| | Floppy Disk | Portable storage with small capacity (80 KB–1.44 MB) | Used for software installation and file transfer |
| 1980s–2000s (Optical Storage Era) | Compact Disc | Stores around 700 MB; used for music and software | Audio CDs, software discs |
| | Digital Versatile Disc | Stores about 4.7 GB; suitable for movies and multimedia | Movie discs, software distribution |
| | Blu-ray Disc | High-definition media; stores up to 50 GB | Blu-ray movie discs |
| 2000s–Present (Solid-State Storage) | Flash Drive (USB Drive) | Portable, durable, fast data transfer | USB pen drives |
| | Solid-State Drive | No moving parts; very high speed, low power consumption | SSDs in modern laptops and servers |
| 2000s–Present (Network & Cloud Storage) | Network Attached Storage | Storage accessible over a network for file sharing | NAS devices in homes/offices |
| | Storage Area Network | High-speed network storage for enterprises | SAN used in data centers |
| | Cloud Storage | Remote, scalable, pay-as-you-go storage | Google Drive, Amazon Web Services Simple Storage Service |

Storage Models

A storage model explains how data is stored, organized, and accessed in computer systems or databases. It decides
the structure, efficiency, and performance of data storage.

File-Based Storage Model


• Data is stored as files and folders on storage devices.
• Each file contains user data such as documents, images, or videos.
• Example: Data stored in C:\Documents on your computer.
Advantages: Simple and easy to use.
Disadvantages: Difficult to manage large data; data duplication possible.

Block-Based Storage Model


• Data is divided into blocks of fixed size and stored on disks.
• Each block has a unique address for quick access.
• Used in: Hard Disk Drives (HDDs), Solid State Drives (SSDs).
Advantages: Fast data access; suitable for databases.

Disadvantages: Complex to manage.

Object-Based Storage Model


• Data is stored as objects, not files or blocks.
• Each object contains:
– Data (actual content)
– Metadata (information about the data)
– Unique ID (used to locate the object)
• Used in cloud storage systems such as Amazon S3, Google Drive, Dropbox.
Advantages: Scalable, easy to manage, supports big data.
Disadvantages: Needs internet and is not compatible with older systems.
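
A tiny in-memory Python sketch of the object idea (the store is a plain dict, purely for illustration): every object carries its data, its metadata, and a generated unique ID that is the only handle used to locate it.

    import uuid

    store = {}  # object_id -> {"data": ..., "metadata": ...}

    def put_object(data: bytes, metadata: dict) -> str:
        object_id = str(uuid.uuid4())     # unique ID locates the object
        store[object_id] = {"data": data, "metadata": metadata}
        return object_id

    def get_object(object_id: str) -> dict:
        return store[object_id]           # flat lookup: no folder hierarchy

    oid = put_object(b"<jpeg bytes>", {"content-type": "image/jpeg"})
    print(get_object(oid)["metadata"]["content-type"])   # image/jpeg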

Database Storage Models

Hierarchical Model

• Data is stored in a tree-like structure (parent–child relationship).
• Each child record has only one parent.
• Used in early mainframe databases.
• Example: Company → Department → Employee (Company is the parent, Employee is the child).
Advantages: Fast access for one-to-many relationships.
Disadvantages: Complex for many-to-many relationships.

Network Model
• Data is stored as records connected by links (pointers).
• A record can have multiple parents and children.
• More flexible than the hierarchical model.
• Example: A student can take many courses, and each course can have many students.
Advantages: Handles complex relationships.
Disadvantages: Difficult to design and maintain.

Relational Model
• Data is stored in tables (rows and columns).
• Each table represents one entity (such as Student or Employee).
• Relationships are created using primary and foreign keys.
• Most widely used model.
• Example: A Student table and a Course table linked through Student_ID.
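
The Student–Course link can be demonstrated with Python's built-in sqlite3 module; this is a minimal sketch in which the Enrollment table is an added assumption used to carry the Student_ID key between the two tables named above.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Student (Student_ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Course  (Course_ID  INTEGER PRIMARY KEY, Title TEXT);
        CREATE TABLE Enrollment (
            Student_ID INTEGER REFERENCES Student(Student_ID),
            Course_ID  INTEGER REFERENCES Course(Course_ID)
        );
    """)
    con.execute("INSERT INTO Student VALUES (1, 'Asha')")
    con.execute("INSERT INTO Course VALUES (10, 'Cloud Computing')")
    con.execute("INSERT INTO Enrollment VALUES (1, 10)")

    # Primary and foreign keys let us join rows across tables.
    print(con.execute("""
        SELECT s.Name, c.Title
        FROM Student s
        JOIN Enrollment e ON e.Student_ID = s.Student_ID
        JOIN Course c ON c.Course_ID = e.Course_ID
    """).fetchone())   # ('Asha', 'Cloud Computing')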

GPFS (IBM Spectrum Scale)


Definition:
GPFS, also called IBM Spectrum Scale, is a high-performance distributed parallel file system used in supercomputing,
research, and enterprise environments. It allows fast, simultaneous file access across many servers.
Background
• Developed by IBM in the early 2000s (successor to TigerShark).
• Designed for large clusters, supporting file systems of up to 4 PB (4,096 disks of up to 1 TB each).
• Behaves like a POSIX file system but works in a distributed manner.
Architecture & Working (Short)
• Data is striped across disks (16 KB–1 MB blocks) for parallel access.
• Metadata (file attributes, block addresses) stored in inodes & indirect blocks.
• Disks connected via a SAN; compute nodes spread across LANs.
• Metadata servers manage directory and file structure; clients read/write directly from storage nodes.
• Supports parallel reads/writes, so multiple clients can access the same file at once.
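
A minimal Python sketch of the striping arithmetic (block size and disk count are illustrative; real GPFS placement is more sophisticated): consecutive blocks of a file land on different disks, which is exactly what makes parallel access possible.

    BLOCK_SIZE = 256 * 1024   # 256 KB, inside GPFS's 16 KB–1 MB range
    NUM_DISKS = 8             # illustrative disk count

    def locate(offset: int):
        """Map a byte offset to (disk index, block number on that disk)."""
        block = offset // BLOCK_SIZE   # which logical block of the file
        disk = block % NUM_DISKS       # round-robin striping across disks
        return disk, block // NUM_DISKS

    # Blocks 0..7 land on disks 0..7, so eight reads can run in parallel.
    for b in range(8):
        print(b, locate(b * BLOCK_SIZE))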

Reliability & Fault Tolerance


• Uses write-ahead logging: metadata updates are logged before being committed, helping recovery after failures.
• Each I/O node keeps its own log file and can help recover a failed node.
• RAID + dual-attached controllers used to mask disk failures.
• Data and metadata are replicated across two disks to avoid data loss.
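
A toy Python sketch of the write-ahead idea (the log and metadata here are plain in-memory structures; real GPFS keeps a log per I/O node): the intended update is appended to the log before the structure is changed, so a crash between the two steps can be repaired by replaying the log.

    log = []        # write-ahead log (toy version)
    metadata = {}   # the structure being protected

    def update(key, value):
        log.append(("set", key, value))   # 1) record the intent first
        metadata[key] = value             # 2) then apply the change

    def recover():
        # After a crash, replay the log to rebuild a consistent state.
        for op, key, value in log:
            if op == "set":
                metadata[key] = value

    update("inode-17", {"size": 4096})
    recover()        # replay is idempotent
    print(metadata)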

Key Features (Short)


• Parallel Access: Multiple users read/write simultaneously.
• High Scalability: Petabyte-level storage across hundreds of nodes.
• Fault Tolerant: Logs, replication, RAID, and recovery mechanisms.
• POSIX-Compliant: Works like a traditional file system.
• Prefetching & Disk Parallelism: Faster I/O through concurrent reads.

Applications (Examples)
• Weather and physics simulations in supercomputers.
• Genome analysis and scientific computing.
• Large enterprise databases and big-data analytics.
• High-performance data centers.

Google File System (GFS)


Definition:
Google File System (GFS) is a scalable distributed file system designed by Google to store and process very large files
across thousands of commodity servers. It provides high throughput, fault tolerance, and reliability for applications like
Search, Maps, and MapReduce.
Key Concepts
● Built for very large files (GB–TB) and bulk processing, not small-file workloads.
● Most operations are sequential reads and appends, not random writes.
● Uses cheap commodity hardware, so failures are expected and handled automatically.
● Offers relaxed consistency to simplify design without burdening developers.

Major Design Decisions


1. Files are split into large 64 MB chunks stored on chunk servers.
2. Each chunk is replicated (default 3 copies) for fault tolerance.
3. A single master server manages metadata, chunk locations, and namespace.
4. Clients read/write directly from chunk servers, not through the master (avoids bottlenecks).
5. Atomic append allows many applications to write to the same file safely.
6. No client-side caching to avoid consistency problems.
7. High-bandwidth network design with pipelined data transfer.
8. Efficient checkpointing, logging, and garbage collection for fast recovery.

How GFS Works


● Client asks master for chunk location.
● Master returns chunk handle + list of replicas.
● Client sends data to all replicas → the primary replica orders the updates → secondaries apply the same order → the client receives confirmation.
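
A minimal Python sketch of this read path (chunk handles and server names are invented; only the 64 MB arithmetic and the master/chunk-server split follow the steps above):

    CHUNK_SIZE = 64 * 1024 * 1024   # GFS chunk size

    # Hypothetical master metadata: (file, chunk index) -> (handle, replicas).
    master = {
        ("/logs/web.log", 0): ("h-001", ["cs1", "cs2", "cs3"]),
        ("/logs/web.log", 1): ("h-002", ["cs2", "cs4", "cs5"]),
    }

    def read(path: str, offset: int) -> str:
        chunk_index = offset // CHUNK_SIZE              # client computes the chunk
        handle, replicas = master[(path, chunk_index)]  # one small RPC to master
        replica = replicas[0]                           # data comes from a chunk server
        return f"read chunk {handle} at {offset % CHUNK_SIZE} from {replica}"

    print(read("/logs/web.log", 70 * 1024 * 1024))      # falls in chunk 1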

Reliability Features
● Master keeps metadata in an operation log for crash recovery.
● Periodic heartbeats exchange chunk lists and detect failures.
● Checksums verify data integrity.
● Garbage collection cleans deleted files lazily.

SAN:
A Storage Area Network (SAN) is a high-speed, dedicated network that connects cloud servers to a centralized pool of
storage devices. It provides block-level storage that can be accessed as if it were locally attached to the server.
Reason for Introduction
● To handle growing data storage needs in data centers and cloud environments.
● To provide high performance, scalable, and reliable storage independent of servers.
● To overcome limitations of traditional direct-attached storage (low scalability, difficult management).
Properties / Features
● High-speed data transfer using Fibre Channel or iSCSI.
● Block-level storage (appears as local disk to servers).
● Centralized management and easy scalability.
● Supports data redundancy and backup.
● Low latency and high throughput.
● Supports virtualization and cloud workloads.

Advantages
● Very fast access to large volumes of data.
● Highly scalable and easy to expand.
● Improves reliability through redundancy.
● Centralized storage simplifies management and backup.
● Enables server clustering and virtualization in cloud.

Disadvantages
● Expensive to deploy and maintain.
● Requires specialized hardware and expertise.
● Complex setup and configuration.
● Can have single-point failures if not properly designed.
● Fibre Channel networks can be costly.

Applications
● Cloud data centers and enterprise storage environments.
● Virtualized servers (VMware, Hyper-V).
● High-performance databases (Oracle, SQL Server).
● Backup, disaster recovery, and archival systems.
● Large-scale applications requiring fast I/O (ERP, analytics).
Parallel File System (PFS)
Definition
A Parallel File System is a high-performance file system that splits data across multiple storage servers and allows
many clients to read/write data simultaneously in parallel.
It is used in cloud computing, HPC, and big data environments to achieve very high throughput.

Advantages
● High throughput due to parallel read/write operations.
● Scales easily by adding more storage and compute nodes.
● Supports large files and massive datasets.
● Fault-tolerant with data replication/striping.
● Efficient for HPC, AI, big data workloads.

Disadvantages
● Complex setup and management.
● Requires high-speed interconnect (InfiniBand, high-bandwidth network).
● Expensive infrastructure.
● Application must be designed to take advantage of parallelism.

Applications
● High-Performance Computing (HPC).
● Scientific simulations and research computing.
● Big data analytics.
● Machine learning / AI training.
● Cloud computing platforms (AWS, Azure, GCP) for large-scale workloads.

How PFS Improves Throughput


● Data is striped across multiple storage nodes.
● Many clients can access different blocks at the same time.
● Parallel I/O eliminates bottlenecks of single-server file systems.
● High-bandwidth interconnects ensure fast data movement.
● Metadata servers optimize access to file locations.
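
A small Python sketch of why striping raises throughput (the per-stripe read is simulated with a sleep; node names are invented): stripes held by different storage nodes are fetched concurrently instead of one after another.

    from concurrent.futures import ThreadPoolExecutor
    import time

    STRIPES = {f"node{i}": f"stripe-{i}" for i in range(4)}  # node -> stripe it holds

    def fetch(node: str) -> str:
        time.sleep(0.1)         # simulate one network/disk read
        return STRIPES[node]

    start = time.time()
    with ThreadPoolExecutor(max_workers=4) as pool:
        data = list(pool.map(fetch, STRIPES))   # all four stripes in parallel
    print(data, f"{time.time() - start:.2f}s")  # ~0.1 s rather than ~0.4 s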

How PFS Supports Complex Computing Workloads


● Handles large datasets required for simulations, AI, modeling.
● Allows concurrent access by thousands of nodes.
● Provides low-latency and fast I/O needed for scientific computations.
● Supports distributed computing frameworks like MPI and MapReduce.
● Ensures data consistency and reliability even under heavy load.

Examples of Parallel File Systems


● Google File System (GFS)
● Hadoop HDFS
● IBM GPFS / Spectrum Scale (discussed earlier in this unit)

Distributed File System (DFS)


A Distributed File System allows files to be stored across multiple networked machines but accessed as if they were on
a single system. It supports sharing, scalability, and reliable access to large datasets in cloud and distributed
environments.

Challenges in DFS
1. Data Consistency
● Ensuring all replicas of a file remain up-to-date and synchronized.
● Difficult due to concurrent access, network delays, and update conflicts.
● Requires consistency protocols like locking, versioning, or quorum mechanisms (see the quorum sketch after this list).
2. Fault Tolerance
● Nodes, disks, or networks can fail anytime.
● DFS must detect failures, recover lost data, and continue operation without stopping.
● Achieved through replication, logging, checkpointing, and self-healing mechanisms.
3. Scalability
● System must handle growing data, more clients, and more nodes.
● Metadata management becomes a bottleneck as file count increases.
● Load balancing, distributed metadata servers, and efficient indexing are required.
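
As mentioned under data consistency, a common protocol is the quorum rule: with N replicas, requiring W acknowledgements per write and R replies per read guarantees that read and write sets overlap whenever R + W > N. A minimal sketch:

    def quorum_consistent(n: int, r: int, w: int) -> bool:
        """Read and write sets must overlap in at least one replica."""
        return r + w > n

    print(quorum_consistent(n=3, r=2, w=2))   # True: every read sees the last write
    print(quorum_consistent(n=3, r=1, w=1))   # False: a read may miss the write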

Data Replication in DFS


Data replication means storing multiple copies of the same data on different nodes.
It is used to improve fault tolerance, availability, and read performance.
How Replication Works
● Every file/chunk is stored on 2–3 replicas (or more).
● When one replica fails, data is served from another.
● Replicas are kept consistent using update propagation and versioning.
● Systems like GFS, HDFS use master–chunk server model for managing replicas.

Benefits of Replication
● Protects against node/server failures.
● Ensures high availability and durability.
● Allows faster read operations (nearest replica serves the request).
● Supports load balancing by spreading reads across replicas.
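
A minimal Python sketch of replica-aware reads (node names and distances are invented): the client skips failed replicas and picks the closest live one, which is the failover and faster-reads benefit in code form.

    # Hypothetical replica map: block -> list of (node, distance, alive?).
    replicas = {
        "blk-42": [("nodeA", 3, True), ("nodeB", 1, True), ("nodeC", 2, False)],
    }

    def read_block(block_id: str) -> str:
        live = [(n, d) for n, d, alive in replicas[block_id] if alive]
        if not live:
            raise IOError("all replicas failed")
        node, _ = min(live, key=lambda x: x[1])   # nearest live replica serves it
        return f"served {block_id} from {node}"

    print(read_block("blk-42"))   # nodeB (distance 1); nodeC is down, so skipped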

Hadoop Distributed File System (HDFS)


Definition
HDFS is the distributed file system of the Apache Hadoop project, designed to store and process very large datasets across multiple machines. It provides high throughput, scalability, and fault tolerance, making it ideal for big data applications.

HDFS Architecture


HDFS follows a master–slave architecture with:
1. NameNode (Master)
● Stores metadata: filenames, directory structure, block locations, replication level.
● Manages the file namespace and controls access to files.
● Does not store actual data.

2. DataNodes (Slaves)
● Store actual data blocks.
● Periodically send heartbeat + block reports to the NameNode.
● Handle read/write requests from clients.

3. Client
● Interacts with NameNode to get metadata and DataNode locations.
● Reads/writes data directly to DataNodes (not through NameNode), improving performance.

How HDFS Works (Read & Write)


Write Operation
1. Client requests NameNode to create a file.
2. NameNode returns list of DataNodes for storing replicas.
3. Client writes the block to DataNode1 → DataNode1 replicates to DataNode2 → to DataNode3 (pipeline).
4. NameNode updates metadata.

Read Operation
1. Client asks NameNode for block locations.
2. NameNode returns nearest DataNode.
3. Client reads block directly from that DataNode.
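
A minimal Python sketch of the write pipeline from steps 2–3 (DataNode names are invented; the chain simply forwards the block down the replica list the NameNode returned):

    def write_block(block: bytes, pipeline: list) -> list:
        """Simulate pipelined replication: client -> DN1 -> DN2 -> DN3."""
        acks = []
        for datanode in pipeline:   # each node stores the block, then forwards it
            print(f"{datanode} stored {len(block)} bytes, forwarding downstream")
            acks.append(datanode)
        return acks                 # acknowledgements flow back up the pipeline

    # Replica list as returned by the NameNode in step 2 of the write.
    write_block(b"\x00" * 128, ["DataNode1", "DataNode2", "DataNode3"])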
