Cloud Computing (CC)
Cloud Computing is a technology that allows you to store and access data and applications over the internet instead of
using your computer's hard drive or a local server. It enables users to access shared resources (networks,
servers, storage, applications, and services) on demand, without direct active management by the user.
It builds on virtualization, distributed systems, and service-oriented architectures (SOA) to deliver scalable and reliable
computing as a service.
Characteristics:
1. Agility for organizations
2. Cost reductions through centralization of infrastructure in locations with lower costs.
3. Device and location independence, which means users can access systems from any device and location without maintaining local infrastructure.
4. Pay-per-use means utilization and efficiency improvements for systems that are often only 10–20% utilized.
5. Performance is monitored by IT experts on the service provider's side.
6. Productivity increases because multiple users can work on the same data simultaneously.
7. Time may be saved because information does not need to be re-entered when fields are matched.
8. Availability improves with the use of multiple redundant sites
9. Scalability and elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis in
near real-time without users having to engineer for peak loads.
10. Self-service interface.
11. Resources that are abstracted or virtualized.
12. Security can improve due to centralization of data
Key cloud characteristics, their descriptions, and example applications:
On-demand self-service: Users can automatically provision and manage computing resources without human intervention. (Applications: server time, network, storage)
Broad network access: Services are available over standard networks and accessible through various devices. (Applications: smartphones, tablets, PCs)
Resource pooling: The provider's resources are shared among multiple users using a multi-tenant model. (Applications: physical and virtual resources with dynamic provisioning)
Rapid elasticity: Resources can be quickly scaled up or down based on demand. (Applications: adding or removing nodes, servers, or instances)
Measured service (pay as you go): Resource usage is monitored and billed on a pay-per-use basis. (Applications: metering, billing, monitoring)
Multi-tenancy: Multiple users share the same infrastructure securely. (Applications: shared servers, databases)
Resilient computing: Systems are designed for fault tolerance and high availability. (Applications: backup, load balancing, disaster recovery)
Flexible pricing models: Users pay only for the resources they consume. (Applications: pay-per-use, subscription, spot pricing)
Sustainability: Uses energy-efficient and eco-friendly data centers. (Applications: green IT, renewable-energy data centers)
Principles:
1. Federation: Enables collaboration and resource sharing between different clouds securely and transparently.
2. Independence: Users can access required virtual resources independent of provider type or tools.
3. Isolation: Ensures each user's data and processes are separated and protected from others.
4. Elasticity: Resources can scale up or down automatically based on user demand.
5. Business Orientation: Focuses on building efficient platforms that ensure service quality and SLA compliance.
6. Trust: Establishes secure and reliable interaction between providers and users.
7. Service Orientation: Follows a service-based model (IaaS, PaaS, SaaS) instead of delivering physical hardware.
8. Virtualization: Abstracts hardware and runs multiple virtual machines on one physical host efficiently.
9. Autonomy and Automation: Cloud systems self-manage, self-configure, and recover automatically with minimal human control.
10. Scalability and Parallelism: Supports horizontal and vertical scaling and parallel execution of distributed workloads.
11. Dynamic Resource Management: Allocates and releases computing resources dynamically according to workload demand.
12. Utility Computing Principle: Operates like public utilities, charging users only for what they consume.
13. Security and Isolation: Protects multi-tenant environments through encryption, virtualization, and access control.
Examples of cloud providers: Amazon Web Services (AWS), Microsoft Azure, Google Cloud.
Benefits of Cloud Computing:
1. Cost: It reduces the huge capital costs of buying hardware and software.
2. Speed: Resources can be accessed in minutes, typically within a few clicks.
3. Scalability: We can increase or decrease the requirement of resources according to the business requirements.
4. Productivity: While using cloud computing, we put less operational effort. We do not need to apply patching, as well
as no need to maintain hardware and software. So, in this way, the IT team can be more productive and focus on
achieving business goals.
5. Reliability: Backup and recovery of data are less expensive and extremely fast for business continuity.
6. Security: Many cloud vendors offer a broad set of policies, technologies, and controls that strengthen our data
security.
Advantages of cloud computing over traditional Internet services: The cloud delivers more flexibility and reliability, increased performance and
efficiency, and helps to lower IT costs. It also improves innovation, allowing organizations to achieve faster time to
market and incorporate AI and machine learning use cases into their strategies.
On-Demand Self-Service:
● Users can provision computing resources (e.g., storage, processing power) as needed without requiring human
interaction with service providers.
● This is faster and more flexible than traditional Internet services where provisioning may require manual setup.
Broad Network Access:
● Cloud services are accessible over the Internet from a wide range of devices (laptops, smartphones, tablets).
● The service is device-independent, unlike some Internet applications tied to specific platforms.
Resource Pooling / Multi-Tenancy:
● Computing resources are pooled to serve multiple users dynamically.
● Internet-based applications often dedicate resources to individual users, leading to underutilization.
Rapid Elasticity / Scalability:
● Cloud resources can be scaled up or down quickly to handle varying workloads.
● Traditional Internet hosting often has fixed capacity, making sudden spikes difficult to handle.
Measured Service / Pay-Per-Use:
● Cloud providers monitor and charge based on actual resource usage.
● In typical Internet setups, users often pay for fixed resources regardless of usage.
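To make the pay-per-use idea concrete, here is a small, purely illustrative Python calculation comparing metered billing with an always-on server; the hourly rate and usage hours are made-up assumptions, not real provider prices.

```python
# Hypothetical illustration of pay-per-use vs. fixed provisioning costs.
# Rates and hours are invented assumptions, not real provider prices.

hourly_rate = 0.10          # assumed cost per instance-hour (USD)
hours_used_per_month = 120  # instance only runs during nightly batch jobs

pay_per_use_cost = hourly_rate * hours_used_per_month

# A fixed (always-on) server billed for every hour of a 30-day month:
fixed_cost = hourly_rate * 24 * 30

print(f"Pay-per-use: ${pay_per_use_cost:.2f}")   # $12.00
print(f"Always-on:   ${fixed_cost:.2f}")         # $72.00
```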
Reliability and Redundancy:
● Clouds provide fault tolerance and data replication across multiple data centers.
● Standard Internet services may not offer the same level of reliability.
Automation and Management:
● Cloud platforms handle maintenance, updates, and monitoring automatically.
● Internet applications often require manual administration.
Cloud Computing Advantages ☁️
Cost Savings: Traditional IT requires large CapEx investment, regardless of usage. Cloud's pay-per-use model (measured service) and OpEx structure avoid over-provisioning and initial capital outlay.
Scalability/Agility: Traditional IT scaling is slow (procuring, installing, configuring hardware). Cloud's rapid elasticity is nearly instant and automated, matching variable demand.
Reliability/Reach: Traditional IT disaster recovery is complex and expensive. Cloud provides built-in high availability and global deployment via vast, redundant data centers.
Innovation/Access: Traditional IT requires internal expertise and time to integrate new tech. Cloud offers instant access to advanced services (AI, ML) as managed utilities.
What makes cloud computing so interesting to IT stakeholders and research practitioners:
IT Stakeholders: Individuals or groups (like managers, investors, or employees) who have a vested interest in the
performance, decisions, and outcomes of IT systems.
Developers: Professionals who design, build, and maintain software applications or systems using programming,
frameworks, and tools.
Researchers: Individuals who systematically investigate and experiment to generate new knowledge, technologies, or
solutions in a specific domain.
Financial Model Shift (Cost Savings): Trading high, fixed Capital Expenditure (CapEx) for low, variable Operational Expenditure (OpEx) allows for more predictable budgeting and frees up capital for core business investments. The Pay-per-Use model (Measured Service) ensures they only pay for what they consume.
Agility & Time-to-Market (Rapid Deployment): Spinning up new servers, databases, or complex environments in minutes instead of weeks or months allows the business to test new ideas and launch new products much faster than competitors.
Elastic Scalability (Matching Demand): The ability to instantly scale resources up or down (Rapid Elasticity) to handle sudden spikes (like a holiday sale or viral event) without over-provisioning infrastructure, and then shrinking back down to save cost.
Focus on Core Business (Reduced Management Overhead): Outsourcing the mundane, complex tasks of managing physical hardware, patching operating systems, and maintaining data centers allows the in-house IT team to focus on strategic, value-generating activities.
Global Reach (Instant Globalization): Deploying applications to data centers all over the world in minutes allows a business to reach new international markets quickly and improve user experience by reducing latency.
Enhanced Resilience (Built-in Disaster Recovery): Cloud providers offer superior resilience, reliability, and automated backup services, drastically reducing the risk and cost associated with business continuity planning.
Massive Compute Power (Big Data & HPC): Access to massive, on-demand compute clusters (High-Performance Computing, HPC), petabytes of storage, and high-speed networking that no single institution could afford or maintain itself. This enables research into Big Data, climate modeling, and genomics.
Access to Advanced Services (Innovation Acceleration): Immediate access to pre-built, fully managed PaaS and SaaS tools for Artificial Intelligence (AI), Machine Learning (ML), serverless computing, and quantum computing simulators. Researchers can skip infrastructure setup and go straight to writing code and running experiments.
Collaboration & Reproducibility (Shared Environments): The ability to easily share code, data sets, and entire computing environments with collaborators across the globe, enhancing research transparency and reproducibility.
Cost-Effective Experimentation (Low Barrier to Entry): Researchers can spin up a powerful, high-cost environment, run a simulation for a few hours, and shut it down, only paying for the minimal time it was used. This encourages low-risk experimentation.
Ubiquitous Access (Work from Anywhere): Data and applications are accessible from any location with an internet connection, allowing for seamless work between the lab, office, and home.
Cloud service models (IaaS vs. PaaS vs. SaaS):
Customer Control: IaaS is high (users manage OS, applications, and middleware); PaaS is moderate (users manage applications and data while the provider manages OS and infrastructure); SaaS is low (the provider manages everything and users just access the application).
User Responsibility: IaaS users install and maintain the OS, applications, and patches; PaaS users develop, deploy, and maintain applications and data; SaaS users simply use the application with minimal technical management.
Scalability: IaaS is flexible (users can scale virtual machines and storage as needed); PaaS supports automatic scaling of applications developed on the platform; SaaS scales automatically for multiple users, with the provider handling load.
Examples: IaaS includes Amazon EC2, Amazon S3, and Joyent; PaaS includes Google App Engine, Microsoft Azure App Service, and Heroku; SaaS includes Google Docs and Zoho Office.
Cost Structure: With IaaS, users pay for the computing resources used, with upfront OS/application licensing required by the user; with PaaS, users pay for platform usage, which reduces the cost of managing hardware/software; SaaS is subscription-based with low upfront cost, and its multi-tenant architecture reduces cost.
Use Case: IaaS suits developers who need full control over infrastructure for custom solutions; PaaS suits application developers focusing on building apps without managing infrastructure; SaaS suits end-users accessing ready-to-use applications for business or personal use.
Disadvantages: IaaS offers limited control over physical infrastructure, and security management responsibility lies with the user; PaaS offers limited control over infrastructure and configurations and depends on provider reliability; SaaS offers limited customization options and raises possible data security concerns.
Comparison of distributed, grid, and cloud computing:
Resource Ownership: In distributed computing, resources are often owned by a single organization (e.g., an internal corporate network); in grid computing, resources are owned by multiple, collaborating organizations (multiple administrative domains); in cloud computing, resources are owned by a third-party vendor (the cloud provider).
Advantages: Distributed systems are highly reliable, highly scalable, and robust due to redundancy; grids efficiently utilize idle CPU cycles across the globe and are capable of running massive parallel jobs; clouds are cost-effective (OpEx, pay-as-you-go), elastic (scale up/down instantly), and require little or no capital expenditure.
Disadvantages: Distributed systems are complex to design and debug due to the lack of a global clock and depend heavily on network communication; grids have high complexity due to heterogeneous resources and security boundaries and require specific middleware for access; clouds raise security/compliance concerns (lack of control), depend on Internet connectivity, and carry vendor lock-in risk.
Examples: Distributed computing includes blockchain, peer-to-peer networks (e.g., BitTorrent), and multi-tier web applications; grid computing includes the Large Hadron Collider (LHC) Computing Grid (WLCG), Folding@home (volunteer computing), and weather modeling; cloud computing includes Amazon Web Services (AWS) EC2, Microsoft Azure, Google Cloud Platform (GCP), and SaaS apps like Gmail or Salesforce.
1. Centralized Computing
A computing model where all processing is done on a single central server/mainframe,
and users access it through terminals or thin clients.
Features:
1. Single control point – one central system manages processing and resources.
2. High data security – data is stored in one location.
3. Low client hardware needs – clients only interact; no heavy processing.
4. Easier maintenance – updates and backups handled at the central server.
2. Distributed Computing
A model where processing is spread across multiple independent computers that communicate over a network to
achieve a common goal.
Features:
1. Resource sharing – multiple systems share CPU, storage, and applications.
2. Scalability – nodes can be added easily to increase performance.
3. Fault tolerance – failure of one node doesn’t stop the whole system.
4. Parallel processing – tasks can be executed concurrently on different nodes.
3. Cloud Computing
A model that delivers on-demand computing resources (servers, storage, applications) over the internet as a service,
with pay-as-you-go usage.
Features:
1. On-demand self-service – users provision resources automatically.
2. Elasticity & scalability – resources scale up/down based on needs.
3. Measured service – pay for what you use.
4. Broad network access – accessed via internet from any device.
4. Grid Computing
A distributed computing model that combines geographically distributed, heterogeneous
resources to solve large-scale scientific or technical problems.
Features
1. Heterogeneous resource pooling – uses many different types of systems.
2. High-performance computing – suitable for scientific simulations.
3. Decentralized management – resources owned by different organizations.
4. Virtual organizations – users share resources across institutions.
Cloud Deployment Models, as defined by NIST (National Institute of Standards and Technology), describe the nature,
location, and ownership of the cloud infrastructure.
Public Cloud: Services offered to the general public over the internet, owned and managed by third-party providers. Features: multi-tenant, highly scalable, internet-based access, managed by vendors. Advantages: low cost, no maintenance, high reliability. Disadvantages: less control, security concerns. Examples: AWS, Google Cloud, Microsoft Azure.
Hybrid Cloud: A combination of public and private cloud with data/application portability, enabling flexible workload distribution. Features: cloud bursting, workload portability, a mix of secure and scalable resources, balanced cost. Advantages: high flexibility, high scalability, secure for sensitive data. Disadvantages: complex setup, needs a strong network. Example: a private cloud combined with AWS.
Community Cloud: Shared by multiple organizations with similar needs; costs and resources are shared equally. Features: shared infrastructure, good security, community-based, managed internally or externally. Advantages: cheaper than private cloud, good collaboration, higher security. Disadvantages: not highly scalable, higher cost than public cloud. Examples: government community cloud, research cloud.
On-Demand Computing
On-demand computing is a cloud model where computing resources such as storage, servers, applications, and
processing power are provided whenever the user needs them, without human intervention.
Users can provision and release resources automatically based on demand.
Key Features:
1. Self-Service Provisioning – Users can create, run, or delete resources without contacting the provider.
2. Elastic Scaling – Resources automatically scale up or down based on workload.
3. Pay-per-Use Model – Users pay only for the amount of computing they consume.
4. Rapid Deployment – New applications or servers can be launched within minutes.
5. Automation – Resource allocation is handled by automated cloud systems.
6. Resource Pooling – Multiple users share a centralized resource pool efficiently.
Example:
1. A user instantly creates a virtual machine on AWS only when needed and shuts it down after use.
2. Netflix automatically adds more servers during peak streaming hours and reduces them when demand decreases.
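As a hedged sketch of the AWS example above, the following Python snippet uses boto3 to start an EC2 instance on demand and terminate it after use; the AMI ID, region, and instance type are placeholders, and it assumes AWS credentials are already configured.

```python
# Sketch of on-demand provisioning with AWS boto3 (assumes credentials are
# configured; the AMI ID, region, and instance type are placeholders).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Provision a VM only when it is needed...
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Started", instance_id)

# ...and release it after use so billing stops (pay-per-use).
ec2.terminate_instances(InstanceIds=[instance_id])
```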
How internet computing has transformed traditional computing models and allowed for scalable solutions:
Traditional computing :
Traditional computing models refer to the on-premise IT infrastructure where organizations buy, install, and manage
their own hardware, software, servers, networking, and storage.
All computation and data processing occur within the organization’s physical premises.
What is effective workload distribution, and what is its contribution in distributed and cloud systems?
Effective workload distribution is the process of dividing tasks and computational loads across multiple systems or
servers in a balanced way so that no single machine is overloaded.
It ensures optimal performance, faster processing, and efficient resource utilization.
● Ensures High Performance: Detects congestion and bottlenecks early to maintain smooth operations.
● Improves Reliability & Availability: Identifies failures quickly, enabling fast troubleshooting and minimizing
downtime.
● Supports Load Balancing: Monitors traffic across nodes to ensure efficient workload distribution and prevent
overloads.
● Enhances Security: Detects abnormal traffic (like DDoS) and helps apply security policies promptly.
● Capacity Planning: Tracks data usage trends to inform hardware upgrades and resource additions.
● Optimizes Resource Utilization: Ensures all network devices and resources operate within optimal limits,
preventing waste.
● Service Quality (QoS) Assurance: Maintains performance of critical applications by setting traffic priorities.
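To illustrate the basic idea of workload distribution, here is a minimal round-robin dispatcher in Python; the server names and requests are hypothetical, and real load balancers also consider health checks and current load.

```python
# Minimal round-robin workload distribution sketch (illustrative only).
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]
next_server = cycle(servers)

def dispatch(request):
    """Send each incoming request to the next server in rotation,
    so no single machine receives all of the load."""
    target = next(next_server)
    print(f"routing {request} -> {target}")
    return target

for req in ["req-1", "req-2", "req-3", "req-4"]:
    dispatch(req)   # req-4 wraps back around to server-a
```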
2. Key Parts
● Client: User device/application sending requests.
● Server: Central machine providing data, storage, computation.
● Network: Medium enabling communication between them.
● Protocols: Rules (HTTP, FTP, TCP/IP) enabling data exchange.
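A tiny, self-contained Python sketch of these parts: a server and client exchanging data over TCP on localhost (the port number is arbitrary).

```python
# Minimal client-server exchange over TCP, showing the four parts above:
# a client, a server, the network (here, localhost), and a protocol (raw TCP).
import socket, threading, time

def server():
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", 5050))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)          # receive the client's request
            conn.sendall(b"echo: " + data)  # respond over the same connection

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                             # give the server time to start

with socket.socket() as cli:                # the client side
    cli.connect(("127.0.0.1", 5050))
    cli.sendall(b"hello server")
    print(cli.recv(1024).decode())          # -> "echo: hello server"
```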
Benefits
● Centralized control and management
● High security and data integrity
● Easy to scale by upgrading servers
● Simple to maintain and manage
Drawbacks
● Single point of failure (server)
● Server overload if too many clients
● Higher cost (powerful servers required)
● Limited fault tolerance unless replicated
Benefits
● No central point of failure
● Highly scalable (peers increase → capacity increases)
● Cost-effective (no central server needed)
● Balanced load (each peer contributes resources)
Drawbacks
● Security is weaker (no central control)
● Unreliable peers reduce performance
● Hard to manage and coordinate
● Data consistency is difficult to maintain
Impact on Scalability
● New VMs can be created instantly → quick expansion.
● Workloads can be moved/migrated → balanced scaling.
● Physical resources are used efficiently → supports large-scale distributed systems.
2. How Containers Differ from Traditional Virtual Machines & Their Impact on Scalability
Difference (simple points): VMs virtualize the hardware and each run a full guest OS via a hypervisor, making them heavier but strongly isolated; containers virtualize at the OS level and share the host kernel, packaging only the application and its dependencies, which makes them lightweight and fast to start.
Impact on Scalability
● Containers scale faster (seconds) → more responsive to changing demand.
● Higher density: More containers can run on one machine than VMs.
● Lower cost scaling: Uses fewer resources.
● Ideal for microservices → each service scaled independently.
Throughput: work/data processed per second; shows system capacity under load.
Fault Tolerance: the system keeps working even when parts fail; prevents service interruptions.
Consistency: all nodes have the same correct data; avoids errors and data mismatches.
Resource Utilization: how efficiently resources are used; avoids wastage and ensures optimal usage.
CPU Overhead: extra CPU cycles used by the hypervisor. Mitigation: use hardware-assisted virtualization (Intel VT-x, AMD-V) and optimized hypervisors.
Memory Overhead: extra RAM used for multiple VMs. Mitigation: use memory ballooning and deduplication, or increase physical RAM.
I/O Overhead: slower disk and network operations. Mitigation: use paravirtualized drivers, SSD storage, SR-IOV, and faster networking.
Boot/Startup Delay: VMs take longer to start. Mitigation: use lightweight VMs or containers.
Context Switching Overhead: switching between VMs causes delay. Mitigation: limit excessive VM creation and use CPU pinning.
Network Function Virtualization (NFV) & Its Role in Enhancing Cloud Computing Systems:
Network Function Virtualization (NFV) replaces hardware-based network devices (routers, firewalls, load balancers)
with software-based virtual network functions (VNFs) running on standard servers.
It is an architectural concept that virtualizes entire classes of network node functions, such as firewalls, load balancers, intrusion detection systems, and DNS, away from dedicated proprietary hardware and onto industry-standard commodity servers.
1. Improves Scalability: VNFs (Virtual Network Functions) can be scaled up or down instantly without installing new hardware.
2. Reduces Costs: Uses general commodity servers instead of expensive dedicated network appliances, lowering CapEx.
3. Faster Deployment: New network services (firewalls, VPNs, load balancers) can be launched in minutes as software.
4. Flexibility & Agility: Network functions can be moved, replicated, or updated via software without downtime or physical reconfiguration.
5. Automation: Integrates with orchestration tools for auto-scaling and creating self-healing networks.
6. Better Resource Utilization: Multiple virtual network functions efficiently share the same underlying physical hardware resources.
Clustering is the technique of connecting multiple servers/computers to work together as a single system.
If one node fails or is overloaded, another node takes over—improving performance and reliability.
Load-Balancing Cluster Distributes user requests across multiple nodes to avoid overload.
High-Availability (HA) Cluster Provides failover—if one node fails, another replaces it instantly.
High-Performance (HPC) Cluster Multiple nodes work in parallel for heavy computation tasks.
Storage Cluster Multiple storage nodes combine to form scalable, redundant storage
systems.
Grid/Distributed Cluster Nodes located in different locations coordinate to process large jobs.
Scalability: The ability of a system to handle an increasing amount of work by adding resources, typically measured
by how performance is sustained under heavier load.
System Reliability: The probability that a system or component will perform its required function without failure for
a specified period of time under stated conditions.
Availability: The percentage of time a system is operational and accessible to users when needed
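These terms are commonly quantified with the following standard formulas, shown here as a brief math note: availability in terms of mean time between failures (MTBF) and mean time to repair (MTTR), and reliability as the survival probability under a constant failure rate λ.

```latex
\[
  \text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}},
  \qquad
  R(t) = e^{-\lambda t}
\]
```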
Contribution to Scalability
● Horizontal Scalability: Clustering allows adding more nodes (servers) to increase capacity without changing the
existing system.
● Resource Pooling: The CPU, memory, and network resources of many servers are combined, enabling the
system to handle very high traffic.
● Near-Linear Performance Growth: As demand increases, new servers can be added, and the load is evenly
distributed, giving almost proportional performance improvement.
● Elasticity: Auto-scaling tools can automatically add or remove virtual/container nodes when workload crosses
thresholds, ensuring smooth performance during traffic spikes.
An internet-based system is a software platform that uses the internet to allow users to access and interact
with applications and data without needing to install software locally.
1. Single Point of Failure (SPOF): If a critical component (database, firewall, load balancer) has no redundancy, its failure stops the entire system. This reduces uptime, and without high availability, true scaling is impossible.
2. Database Bottlenecks: Databases are hard to scale horizontally because of consistency issues. As traffic grows, the database becomes the slowest part, limiting application scaling.
3. Network Latency & Bandwidth Limits: As nodes spread across regions, data travel time increases. Higher latency and limited bandwidth slow response times and restrict the benefits of adding more servers.
4. Inter-Service Communication Overhead: Microservices require many internal API calls. Serialization, network delays, and connection overhead accumulate and reduce throughput as the system grows.
5. Cache Invalidation Complexity: Distributed caches must stay consistent. Updating cached data across many nodes is complex and resource-heavy, slowing write operations and limiting scaling.
6. Resource Contention: CPU, RAM, and disk I/O become overloaded when many services share the same hardware, restricting horizontal scaling.
7. Load Balancing Challenges: Poor or uneven distribution of user traffic overwhelms some nodes while others stay idle, preventing effective scaling.
How Scalable Computing Over the Internet Improves Fault Tolerance & Reliability in Cloud Computing:
Scalable Computing over the Internet refers to the on-demand provisioning and de-provisioning of computing resources
(like CPU, memory, storage, and networking) delivered over a broad network (the Internet), which allows the system to
automatically and efficiently handle a massive, variable workload while maintaining performance and controlling costs
2. Improved Reliability
Reliability means the system operates correctly for long periods. Cloud scalability improves reliability through:
● No Single Point of Failure: Distributed and replicated components ensure the system continues even if one part
breaks.
● Resource Pooling & Elasticity: Workloads move automatically to stable servers when resources fail or overload.
● Software-Defined Infrastructure: Automated, standardized deployments reduce human errors and keep systems
stable.
● High Availability SLAs: Cloud providers guarantee high uptime (like 99.99%) due to their scalable, reliable
architecture.
Best Practices for Securing VMs & Containers (with Energy Efficiency):
Containers are lightweight, isolated environments that package an application along with all its dependencies (libraries,
runtime, configuration) so it can run consistently across different systems.
They share the host OS kernel, making them faster, smaller, and more portable than virtual machines.
Energy efficiency in cloud computing refers to using cloud resources and data centers in a way that reduces power
consumption while maintaining high performance.
It focuses on optimizing hardware usage, improving cooling systems, using renewable energy, and deploying scalable
techniques (like virtualization, auto-scaling, and containers) to minimize wasted energy.
B. For Containers
Virtualization & Consolidation: Running many virtual machines (VMs) or containers on a single physical server (host) maximizes host utilization (less idle power draw) and reduces the total number of physical servers needed.
Dynamic Voltage and Frequency Scaling (DVFS): Automatically adjusting the CPU clock speed and voltage based on the current workload reduces energy consumption during periods of low usage while maintaining high performance when needed.
Power-Aware Load Balancing: Distributing workloads not just by CPU usage but by power consumption, consolidating workloads onto fewer servers and putting the idle servers into a low-power sleep state (hibernation or deep sleep), saves significant power by de-powering idle hardware rather than letting it sit active.
Efficient Cooling Technologies: Using free cooling (outside air when temperatures allow), liquid cooling (more effective than air), and optimizing data center temperature setpoints (operating at higher ambient temperatures) reduces the massive power required for cooling, which can account for 40% or more of total data center power consumption.
Server Hardware Optimization: Using high-efficiency power supply units (PSUs) and low-power hardware components (e.g., solid-state drives (SSDs) instead of spinning disks, or specialized, lower-power CPU architectures) reduces wasted power in conversion and operation.
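The energy savings from DVFS follow from the standard dynamic-power relation below (C is the switched capacitance, V the supply voltage, f the clock frequency); lowering voltage and frequency together at light load therefore reduces power much faster than linearly.

```latex
\[
  P_{\text{dynamic}} \approx C \, V^{2} f
\]
```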
● Use Hardware Acceleration: Enable CPU features like AES-NI to handle encryption in hardware, reducing
CPU usage and energy consumption.
● Efficient Monitoring & Logging: Use lightweight or agentless monitoring and filter logs at the source so only
important security events are processed.
● Minimal/Secure OS Images: Use minimal, hardened operating systems (e.g., Alpine, Distroless) to reduce
attack surface and avoid unnecessary background processes.
● Right-Sizing & Auto-Scaling Security: Deploy security tools (firewalls, IDS) as virtualized functions and apply
auto-scaling so they run at full capacity only when needed, saving energy during low-traffic periods.
1. Less Overheating = Fewer Failures: Efficient cooling reduces hardware crashes and downtime, improving
system stability and availability.
2. Stable Power = Higher Availability: Optimized power use lowers outages, strengthening overall cloud
reliability.
3. Reduced Attack Surface: Consolidation means fewer active servers, giving attackers fewer targets.
4. More Budget for Security: Savings from lower energy costs can be reinvested into advanced security tools and
monitoring.
5. Stronger Physical Security: Modern energy-efficient data centers include better access control, surveillance,
and fire protection.
Unit 2
● Virtualization is a computer architecture technology by which multiple virtual machines (VMs) are multiplexed in
the same hardware machine.
● The purpose of a VM is to enhance resource sharing by many users and improve computer performance in
terms of resource utilization and application flexibility.
● Hardware resources (CPU, memory, I/O devices, etc.) or software resources (operating system and software
libraries) can be virtualized in various functional layers.
● Virtualization is a technology that creates virtual versions of computing resources (servers, storage, networks)
using a hypervisor. It allows multiple virtual machines (VMs) to run on a single physical machine while remaining
isolated from each other.
1. Resource Pooling – Combines multiple physical resources (CPU, RAM, storage) into a shared pool that can be
allocated dynamically to many users.
2. Isolation – Each VM runs independently and securely, ensuring that faults or security breaches in one virtual
environment do not affect others.
3. Scalability – Virtual machines or containers can be created or removed quickly (elasticity) based on real-time
demand, allowing for rapid scaling.
4. Cost Efficiency – Reduces hardware requirements by consolidating many workloads onto fewer physical
servers, leading to lower capital and operational expenditure (OpEx).
5. Support for Multi-Tenancy – Enables multiple users or organizations (tenants) to securely share the same
physical infrastructure, which is foundational to the public cloud model.
6. On-Demand Self-Service – Allows users to provision resources automatically through APIs or web portals
without manual intervention from the service provider.
7. Measured Service – Provides the ability to track and monitor the consumption of virtualized resources,
enabling the pay-per-use and utility billing models.
8. Disaster Recovery (DR) and Business Continuity – Enables the easy creation of replicas and snapshots of
entire virtual systems, facilitating quick backup and recovery processes.
9. Live Migration – Allows a running virtual machine to be moved between different physical servers without
interrupting the service, enhancing maintenance efficiency and load balancing.
10. Hardware Abstraction – Decouples the software/OS from the underlying physical hardware, making
applications highly portable across different data centers or hardware vendors.
B. Efficiency
1. Hardware Optimization – Runs multiple workloads on the same physical server.
2. Energy Saving – Reduces power consumption by consolidating resources.
3. Cost Reduction – Lowers maintenance and operational expenses.
4. Performance Improvement – Balances workloads for better server performance.
5. Space Utilization – Minimizes need for extra physical hardware.
C. Scalability
1. On-Demand Resource Scaling – Increases or decreases resources as needed.
2. Elastic Infrastructure – Supports cloud elasticity for fluctuating workloads.
3. Rapid Expansion – Easily adds new virtual machines without new hardware.
4. Flexible Workload Management – Allocates resources dynamically to tasks.
5. Future-Proofing – Adapts to growing business or application requirements.
D. Resource Utilization
1. Maximum CPU Usage – Optimizes processor capacity across VMs.
2. Memory Optimization – Shares RAM efficiently among multiple virtual machines.
3. Storage Efficiency – Uses available storage effectively.
4. Reduced Idle Resources – Minimizes unused hardware to improve performance.
5. Balanced Workloads – Ensures all resources are utilized proportionally.
A Virtual Machine (VM) is a software-based emulation of a physical computer. It runs an operating system and
applications just like a physical machine, but is isolated from the underlying hardware and other VMs.
Key point: VMs are created and managed by a hypervisor or virtual machine monitor (VMM).
Working: The hypervisor abstracts hardware and allocates CPU, memory, storage, and network to each VM. Multiple
VMs can run on a single physical host.
Benefits:
● Isolation between VMs
● Efficient resource utilization
● Supports testing, development, and multi-OS environments
The Instruction Set Architecture (ISA) Level is the lowest form of virtualization implementation, focusing on
translating or emulating CPU instructions.
At the ISA level, virtualization is performed by emulating a given Instruction Set Architecture (ISA) by the ISA of the
host machine. This means the system runs software compiled for one type of CPU on a computer with a different CPU.
● Working: This approach uses software translation (like binary translation or interpretation) to convert the
instructions of the guest architecture into instructions that the host processor can understand and execute. This
process creates a Virtual ISA (V-ISA).
Benefits
● High Portability: It enables running a large amount of legacy binary code written for various processors on any
given new hardware host machine.
● Architecture Agnostic: It allows the software to be entirely independent of the underlying physical processor
architecture.
Drawbacks/Overheads
● Extremely High Performance Overhead: The constant, instruction-by-instruction interpretation and translation
process consumes significant CPU resources, leading to very slow execution compared to running the software
natively.
● Slower Execution: The time required to translate instructions adds considerable latency, severely impacting
performance.
The Hardware Abstraction Layer (HAL) Level is the classic method of system virtualization, where a hypervisor
creates virtual hardware environments for VMs.
Hardware-level virtualization is performed right on top of the bare hardware (Type 1) or on top of a host OS (Type 2).
The virtualization layer (Hypervisor) creates a uniform, virtual hardware environment for each Virtual Machine (VM).
● Working: The hypervisor manages the underlying physical resources, such as processors, memory, and I/O
devices. It presents a virtualized view of the hardware to the Guest OS, which then runs as if it were on a
dedicated machine.
Benefits
● Strong Isolation: Each VM is fully isolated from others, providing a high level of security and fault tolerance.
● Hardware Independence: It facilitates the easy migration of entire VMs between different physical hosts.
● Consolidation: It efficiently consolidates multiple virtual workloads onto fewer physical machines.
Drawbacks/Overheads
● Memory Overhead: Each VM must include its own full copy of the operating system (Guest OS), consuming
significant RAM.
● I/O Overhead: Network and storage Input/Output operations often require interception and translation by the
hypervisor, adding latency.
The Operating System (OS) Level is the basis of containerization, where environments are isolated using OS kernel
features.
This refers to an abstraction layer between the traditional OS and user applications. OS-level virtualization creates
isolated containers on a single shared physical server that all use the same underlying OS kernel instance.
● Working: It utilizes OS features (like Linux namespaces and cgroups) to isolate containers, giving them their
own file system, process tree, and network interface. The containers behave like real servers but without the
overhead of a full separate kernel.
Benefits
● Lightweight and Fast: Containers start up extremely quickly (seconds/milliseconds) and have minimal
overhead since they don't require a full Guest OS.
● High Density: Allows for the allocation of hardware resources among a large number of mutually distrusting
users in a very efficient manner.
● Resource Utilization: Maximizes resource utilization on a single physical server.
Drawbacks/Overheads
● Limited OS Types: All containers must share the same host OS kernel; you cannot run a Linux container on a
Windows host without an intermediate translation layer.
● Weaker Isolation: Isolation is maintained at the kernel level, which is less secure than the full hardware
isolation provided by HAL-level virtualization.
The Library Support Level (or User-Level API Level) virtualizes the communication link between applications and the
OS.
Most applications use APIs exported by user-level libraries. This level of virtualization involves controlling the
communication link between applications and the rest of the system through API hooks.
● Working: An intermediate layer intercepts the application's API calls and translates them into compatible calls
recognized by the host operating system. This makes the application believe it is running in its intended
environment.
Benefits
● Application Compatibility: Enables applications written for one OS to run on another without requiring full OS-
level or hardware virtualization.
● Lower Overhead: Generally incurs less overhead than running a full VM.
Drawbacks/Overheads
● Limited OS Feature Access: The translation layer may not support all complex system calls or OS features.
● Compatibility Issues: The translation may not be 100% accurate or complete for all applications.
The User-Application Level (or Process-Level Virtualization) runs applications within a controlled runtime environment.
Application-level virtualization is also known as process-level virtualization and often involves deploying High-Level
Language (HLL) VMs.
● Working: The application code is compiled into an intermediate bytecode (e.g., Java bytecode). The
virtualization layer (like the JVM or .NET CLR) sits as an application program on top of the host OS and
executes this bytecode. Other forms involve wrapping the application in an isolated layer (sandboxing).
Benefits
● Maximum Portability: The application code (bytecode) is entirely OS and hardware-independent ("Write Once,
Run Anywhere").
● Security and Sandboxing: The runtime environment provides inherent security by isolating the application
process from the host OS and other applications.
Drawbacks/Overheads
● Performance Overhead: The interpretation or Just-In-Time (JIT) compilation of bytecode by the runtime
environment adds some processing overhead.
● Requires Runtime: The host system must have the specific runtime environment (e.g., JVM) installed.
● OS-level virtualization inserts a virtualization layer inside the operating system to partition a machine’s physical
resources.
● It allows multiple isolated virtual machines (VMs) to run within a single operating system kernel.
● These VMs are also called Virtual Execution Environments (VEEs), Virtual Private Systems (VPS), or simply
containers
● To users, these VEs appear as real servers.
● Each VE has its own processes, file system, user accounts, network interfaces with IP addresses, routing tables,
firewall rules, and other configuration settings.
● Although VEs can be customized individually, they all share the same OS kernel.
● Hence, OS-level virtualization is known as single-OS-image virtualization.
Advantages of OS Extensions
Disadvantages of OS Extensions
● The primary disadvantage is that all OS-level VMs on a container must use the same type of guest operating
system.
● Although VMs may use different distributions, they must belong to the same OS family.
● For example, a Windows XP VM cannot run inside a Linux-based container.
● Cloud users often prefer different operating systems (Windows, Linux, etc.), creating limitations for OS-level
virtualization
● Most Linux platforms are not tied to a special kernel, allowing multiple VMs to run simultaneously on the same
hardware.
● Two OS tools—Linux vServer and OpenVZ—support Linux platforms in running other platform-based
applications via virtualization.
● A third tool, FVM, was developed specifically for OS-level virtualization on Windows NT platforms.
Setup complexity: Linux virtualization (KVM) can be more complex, with command-line interfaces, but KVM is integrated into the kernel; Windows virtualization is easier, with a GUI for built-in options like Hyper-V and user-friendly third-party tools.
Use case: Linux virtualization is typically used for server consolidation, cloud environments, and development environments; Windows virtualization is typically used for running other OSes on a desktop, testing, and development.
How VM and VMM Work Together to Achieve Virtualization :
The Virtual Machine (VM) and Virtual Machine Monitor (VMM)/Hypervisor work together in a layered architecture to
virtualize hardware, isolate resources, and allow multiple operating systems to run on one physical machine.
Roles
● Virtual Machine (VM):
A software-based computer with virtual CPU, memory, and storage. It runs its own guest OS, which believes it
controls real hardware.
● Virtual Machine Monitor (VMM) / Hypervisor:
The software layer between the VMs and the physical hardware. It creates, manages, and monitors VMs and
handles all hardware access on their behalf.
● Host Hardware:
The actual physical server whose resources (CPU, RAM, disk, network) are controlled by the VMM.
2. Instruction Interception
● Guest OS issues privileged instructions assuming full hardware control.
● VMM intercepts these instructions and safely executes or emulates them.
○ Full virtualization: VMM traps and translates instructions.
○ Paravirtualization: Guest OS uses hypercalls to request services from the VMM.
4. Host-Based Virtualization
● Virtualization software installs on top of the host OS (no OS modification required).
● Host OS provides device drivers and low-level services.
● Easy to deploy, flexible, but lower performance due to multiple layers.
● Requires binary translation when guest and host ISAs differ.
5. Paravirtualization
● Guest OS is modified to work with the VMM.
● Uses hypercalls instead of privileged instructions.
● Reduces overhead and improves performance.
● Issues: lower compatibility, harder OS maintenance.
● Used by Xen, KVM, and VMware ESX.
Compiler-Supported Paravirtualization
● Sensitive instructions replaced at compile time with hypercalls.
● Faster than runtime binary translation.
Comparison of virtualization approaches (hypervisor-based, full virtualization, paravirtualization, hardware-assisted virtualization):
Concept: A hypervisor is software (the VMM) that sits directly on hardware or on a host OS to run VMs; in full virtualization, the guest OS runs unmodified using binary translation plus direct execution; in paravirtualization, the guest OS is modified to communicate with the VMM using hypercalls; in hardware-assisted virtualization, the hardware provides virtualization support (Intel VT-x, AMD-V) to improve performance.
Cloud Implementation: The hypervisor forms the base layer for cloud data centers and manages multiple VMs efficiently; full virtualization allows running legacy OSes in the cloud without modification; paravirtualization is used to optimize VM performance on cloud platforms with lower overhead; hardware-assisted virtualization is used in modern clouds to achieve near-native VM performance.
Guest OS Requirement: A hypervisor can run an unmodified OS; full virtualization runs an unmodified OS; paravirtualization requires the guest OS to be modified; hardware-assisted virtualization runs an unmodified OS (hardware handles traps).
Performance: Hypervisors give high performance (especially Type-1); full virtualization is slower than paravirtualization due to binary translation; paravirtualization gives higher performance with reduced overhead; hardware-assisted virtualization gives very high performance (hardware assists virtualization).
VMM Role: The VMM creates and manages VMs and schedules CPU, memory, and device access; in full virtualization, the VMM traps and translates privileged instructions; in paravirtualization, the VMM receives hypercalls instead of traps; in hardware-assisted virtualization, the VMM uses hardware instructions to handle traps efficiently.
Instruction Handling: The hypervisor intercepts all privileged instructions; full virtualization replaces sensitive instructions with safe sequences via binary translation; paravirtualization replaces privileged instructions with hypercalls in the OS; hardware-assisted virtualization has the CPU directly support virtualization instructions (VMX/SVM mode).
Examples: Hypervisors include VMware ESXi, Microsoft Hyper-V, Xen, and KVM; full virtualization includes VMware Workstation, VirtualBox (software mode), and QEMU; paravirtualization includes Xen (PV mode), KVM with VirtIO drivers, and VMware ESX PV drivers; hardware-assisted virtualization includes KVM, Xen HVM, and Hyper-V (with VT-x/AMD-V).
Use in Cloud: Hypervisors are widely used in IaaS platforms; full virtualization is used for compatibility and OS support; paravirtualization is used for improving VM performance; hardware-assisted virtualization is used in modern cloud infrastructures (AWS, GCP, Azure).
1. Virtualizing a multi-core processor is relatively more complicated than virtualizing a uni-core processor.
2. Though multicore processors are claimed to have higher performance by integrating multiple processor cores in
a single chip, multi-core virtualization has raised some new challenges for computer architects, compiler
constructors, system designers, and application programmers.
3. There are mainly two difficulties: Application programs must be parallelized to use all cores fully, and software
must explicitly assign tasks to the cores, which is a very complex problem.
Virtualization on multi-core processors is achieved by distributing virtual machines (VMs) across multiple cores and
using a Virtual Machine Monitor (VMM)/Hypervisor to manage hardware sharing.
Key Points:
1. Core-Level Parallelism
● Each VM can be assigned a dedicated core or share multiple cores.
● Multiple VMs run simultaneously without blocking each other.
2. Hypervisor Scheduling
● The hypervisor schedules VMs across cores using techniques like:
○ Time-slicing
○ Load balancing
○ Core affinity
● Ensures efficient use of CPU resources.
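A small illustration of core affinity from user space (Linux-only, and it assumes the machine has at least two cores); a hypervisor applies the same idea when pinning a VM's virtual CPUs to physical cores.

```python
# Illustration of core affinity (Linux-only): pin the current process to
# specific CPU cores, similar in spirit to how a hypervisor can pin a
# VM's virtual CPUs to chosen physical cores.
import os

print("Allowed cores before:", os.sched_getaffinity(0))
os.sched_setaffinity(0, {0, 1})          # restrict to cores 0 and 1
print("Allowed cores after: ", os.sched_getaffinity(0))
```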
3. Hardware-Assisted Virtualization
● Modern multi-core CPUs include virtualization extensions (Intel VT-x, AMD-V).
● These extensions:
○ Speed up VM execution
○ Reduce hypervisor overhead
○ Enable safe execution of privileged instructions
4. Isolation of VMs
● Each VM runs independently on assigned cores.
● Faults in one VM do not affect VMs running on other cores.
5. Improved Performance
● VMs benefit from true parallel execution.
● Ideal for cloud computing where multiple customers’ workloads run concurrently.
6. Support for SMP (Symmetric Multi-Processing) VMs
● Multi-core processors allow creating multi-CPU virtual machines.
● A single VM can use multiple cores to run heavy workloads.
A physical cluster is a group of physically interconnected computers (nodes) that work together as a single system.
Each node has its own hardware, OS, and network interface.
A virtual cluster is a group of virtual machines (VMs) that behave like a cluster but run on top of physical servers
using virtualization technology. Multiple VMs can be created on the same physical machine and grouped logically as a
cluster.
Scalability: A physical cluster is limited (requires adding more physical hardware); a virtual cluster is highly scalable (more VMs can be created instantly from a pool).
Cost: A physical cluster is expensive (due to dedicated physical hardware cost); a virtual cluster is low cost (it uses existing hardware with high utilization).
Flexibility: A physical cluster has low flexibility (static hardware configuration); a virtual cluster is very flexible (VMs can be created, moved, and destroyed quickly).
Deployment Time: A physical cluster is slow to deploy (requires hardware installation and OS setup); a virtual cluster is fast (VM creation typically takes minutes).
Fault Tolerance: A physical cluster is hardware dependent (requires separate hardware redundancy); a virtual cluster supports VM snapshots and live migration (software-defined failover).
Management: A physical cluster is harder to manage (needs physical maintenance and access); a virtual cluster is easier (centralized management via the hypervisor).
Steps:
Pre-Migration: Select target host and check resource availability.
Memory Pre-Copy: Copy all VM memory pages from source to target while VM runs.
Iterative Copy: Re-copy dirty (modified) pages until they become minimal.
Stop-and-Copy: Pause VM briefly and transfer remaining memory + CPU/device state.
Switchover: Resume VM on target host.
Post-Migration: Clean up resources on the source.
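A toy Python simulation of the iterative pre-copy phase described above; the page counts, dirtying rate, and threshold are invented purely to show how the dirty set shrinks until the brief stop-and-copy.

```python
# Toy simulation of the iterative pre-copy phase of live migration:
# keep re-copying pages that were dirtied during the previous round until
# the remaining dirty set is small enough for a brief stop-and-copy.
import random

total_pages = 10_000
dirty = set(range(total_pages))      # round 1: everything must be copied
threshold = 50                       # acceptably small final copy
round_no = 0

while len(dirty) > threshold:
    round_no += 1
    copied = len(dirty)
    # While we copy, the still-running VM dirties a (shrinking) fraction
    # of its pages; model that as ~10% of what was just transferred.
    dirty = set(random.sample(range(total_pages), max(copied // 10, 1)))
    print(f"round {round_no}: copied {copied} pages, {len(dirty)} dirtied")

print(f"stop-and-copy: pause VM, transfer final {len(dirty)} pages + CPU state")
```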
Importance
● Zero downtime during maintenance or upgrades.
● Load balancing — VMs moved from overloaded servers to underloaded ones.
● Fault tolerance — prevents service disruption if hardware fails.
● Energy saving — lightly loaded servers can be turned off after migrating VMs.
● Improves cloud reliability and performance.
● In a cluster built with mixed nodes of host and guest systems, the normal method of operation is to run everything on the physical machine.
● When a VM fails, its role can be replaced by another VM on a different node, as long as they both run the same guest OS.
● The potential drawback is that a VM must stop playing its role if its residing host node fails.
● The migration copies the VM state file from the storage area to the host machine.
● Live migration of a VM consists of the following six steps:
● The first step makes preparations for the migration, including determining the migrating VM and the destination host.
● Although users could manually make a VM migrate to an appointed host, in most circumstances the migration is automatically started by strategies such as load balancing and server consolidation.
● Since the whole execution state of the VM is stored in memory, sending the VM's memory to the destination node ensures continuity of the service provided by the VM.
● All of the memory data is transferred in the first round, and then the migration controller recopies the memory data that changed during the last round.
● These steps keep iterating until the dirty portion of the memory is small enough to handle the final copy.
● Although precopying memory is performed iteratively, the execution of programs is not noticeably interrupted.
Step 3: Suspend the VM and copy the last portion of the data.
● The migrating VM's execution is suspended when the last round's memory data is transferred.
● Other non-memory data such as CPU and network states should be sent as well.
● During this step, the VM is stopped and its applications no longer run.
● This "service unavailable" time is called the "downtime" of migration, which should be as short as possible so that it is negligible to users.
● Then the network connection is redirected to the new VM and the dependency on the source host is cleared.
● The whole migration process finishes by removing the original VM from the source host.
Key Considerations for Designing & Deploying Virtual Clusters in Cloud Computing
1. Resource Provisioning
● Ensure proper allocation of CPU, RAM, storage, and network.
● Support elastic scaling based on workload demand.
2. VM Placement Strategy
● Optimal placement of VMs on host machines to avoid hotspots.
● Balance performance, energy consumption, and fault tolerance.
3. Performance Isolation
● Prevent one VM’s workload from affecting others.
● Use hypervisor controls (CPU quotas, memory caps).
4. Network Configuration
● Design virtual networks (VLANs, SDN) for high throughput and low latency.
● Ensure proper routing, isolation, and bandwidth allocation.
5. Storage Management
● Use shared storage systems for VM images and migration support.
● Ensure high I/O performance and redundancy.
Serverless computing is a cloud execution model where a cloud provider manages the underlying servers,
allowing developers to build and run applications without provisioning or managing infrastructure
Benefits/Features of Serverless Computing in the Context of Virtualization
● No server management: Cloud provider handles provisioning, scaling, and maintenance.
● Auto-scaling: Functions scale instantly based on demand.
● Pay-per-use: Charges only for execution time, improving cost efficiency.
● High availability: Built-in fault tolerance and distributed execution.
● Fast deployment: Developers only upload code; infrastructure setup is removed.
● Improved resource utilization: Provider runs functions in highly optimized, multi-tenant virtualized environments.
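A minimal serverless function sketch in the style of an AWS Lambda Python handler (the event fields are hypothetical); the provider invokes the handler on demand, so no servers are provisioned or managed by the developer and billing covers only execution time.

```python
# Minimal AWS Lambda-style function (Python runtime): the platform calls
# lambda_handler on demand with an event and a context object.
import json

def lambda_handler(event, context):
    name = event.get("name", "world")          # input from the trigger/event
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local test call (in the cloud, the platform supplies event and context):
print(lambda_handler({"name": "cloud"}, None))
```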
Storage virtualization is the process of abstracting physical storage devices (HDDs/SSDs) into a single unified logical
storage pool that applications and VMs can use.
How it Facilitates Data Management in Cloud
● Centralized management: All storage appears as one pool.
● Improved scalability: New storage can be added seamlessly.
● Better utilization: Dynamic allocation avoids unused space.
● High availability: Supports replication, snapshots, and failover.
● Simplified backup & recovery: Logical storage makes data migration easier.
● Efficient VM operations: VM migration, cloning, and snapshotting depend on
virtualized storage.
Virtualization uses a hypervisor to create full virtual machines (VMs), each with its own operating system,
making them resource-intensive but offering strong isolation and the ability to run different OSs on one
machine. Containerization, on the other hand, virtualizes the operating system to run applications in
lightweight, isolated containers that share the host OS kernel, resulting in faster startup times and lower
resource usage, but weaker security and the limitation that all containers on a host must use the same OS kernel.
Containerization virtualizes the operating-system level, allowing multiple isolated applications to run on the same
kernel. It enables:
● Lightweight virtualization (faster than VMs).
● Portable application deployment across environments.
● Rapid scaling of microservices.
● Isolation of applications through namespaces & cgroups.
● Consistent runtime environment (same image everywhere).
● Efficient resource utilization on cloud VMs.
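As a small illustration (assuming the Docker CLI is installed locally, and the alpine image tag is just an example), a container can be launched from Python in roughly a second because it shares the host kernel:

```python
# Sketch: launching a lightweight, isolated container from Python by
# shelling out to the Docker CLI (assumes Docker is installed locally).
import subprocess

result = subprocess.run(
    ["docker", "run", "--rm", "alpine:3.19", "echo", "hello from a container"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # the container shares the host kernel,
                               # so it starts quickly compared to a full VM
```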
Popular Containerization Tools
● Docker
● Kubernetes (orchestration)
● Podman
● LXC/LXD
● Containerd
● OpenShift
This enables cloud providers to deliver apps to Windows, Linux, MacOS, mobile devices, etc., via virtualized runtimes.
Important Points
● A CPU is virtualizable if the VM’s privileged and unprivileged instructions can run in user mode, while the VMM
runs in supervisor mode.
● When a VM executes privileged, control-sensitive, or behavior-sensitive instructions, they are trapped by the
VMM.
● The VMM acts as a unified mediator to manage and validate all CPU accesses from multiple VMs, ensuring
correctness and system stability.
● RISC-based CPUs are naturally virtualizable because all sensitive instructions are privileged.
● x86 architectures were originally not designed for virtualization, requiring techniques like binary translation or
hardware extensions (VT-x, AMD-V).
I/O Device Impact
● Interrupt handling overhead: Additional layers increase interrupt and DMA processing delays.
● Hardware-assisted I/O: SR-IOV and VT-d significantly reduce overhead by enabling direct device access.
How It Works
● VMM intercepts, queues, and schedules I/O requests.
● Uses buffering and caching to reduce latency.
● Employs device emulation or paravirtualized I/O drivers.
I/O Virtualization
● I/O virtualization involves managing the routing of I/O requests between virtual devices and the shared physical hardware.
● There are three ways to implement I/O virtualization: full device emulation, para-virtualization, and direct I/O.
● Full device emulation generally emulates well-known, real-world devices, while direct I/O virtualization lets the VM access devices directly.
● However, current direct I/O virtualization implementations focus on networking for mainframes.
● For example, when a physical device is reclaimed (required by workload migration) for later reassignment, it may have been set to an arbitrary state (e.g., DMA to some arbitrary memory locations) that can function incorrectly or even crash the whole system.
● Since software-based I/O virtualization requires a very high overhead of device emulation, hardware-assisted I/O virtualization is critical.
Impact of Virtualization on CPU, Memory, and I/O Device Performance:
CPU Impact
● Overhead from instruction trapping: Privileged instructions are intercepted by the VMM, causing additional
processing time.
● Context-switch overhead: Switching between VMs increases CPU scheduling latency.
● vCPU to pCPU mapping challenges: If many vCPUs share limited physical CPUs, performance drops.
● Improvement with hardware virtualization: Intel VT-x and AMD-V reduce CPU overhead significantly.
Memory Impact
● Memory overcommitment: Hypervisor allocates more virtual memory than physically available, causing
swapping.
● Additional metadata: Page tables, shadow paging, and mapping structures increase memory overhead.
● NUMA effects: Poor placement of VM memory across NUMA nodes reduces throughput.
● Ballooning and compression: Techniques may introduce latency during memory reclamation.
Interoperability: SOA is high (works across platforms and languages); the traditional/monolithic approach is low (requires the same platform or vendor).
Reusability: SOA services can be reused across applications; traditional applications have low reusability, with logic tied to the application.
Cloud Suitability: SOA is ideal for cloud (IaaS/PaaS/SaaS); traditional architectures are not well suited for cloud environments.
Service provider: The service provider is the maintainer of the service and the organization that makes available one or
more services for others to use. To advertise services, the provider can publish them in a registry, together with a
service contract that specifies the nature of the service, how to use it, the requirements for the service, and the fees
charged.
Service consumer: The service consumer can locate the service metadata in the registry and develop the required client
components to bind and use the service.
2. API Gateways
● API gateways manage access to cloud services.
● They provide:
○ Authentication & authorization
○ Rate limiting
○ Traffic management
○ API versioning
○ Service routing
● This makes SOA implementation more secure and controlled.
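For illustration, a hypothetical client call through an API gateway might look like the sketch below (the gateway URL, API key, and bearer token are placeholders; the requests library is assumed):
    # Illustrative sketch: a client calling a backend service through an API gateway.
    # The gateway URL, API key, and token below are hypothetical placeholders.
    import requests

    GATEWAY_URL = "https://api.example.com/v1/orders"   # gateway route to a backend service
    headers = {
        "Authorization": "Bearer <access-token>",        # authentication/authorization
        "x-api-key": "<api-key>",                        # per-client key used for rate limiting
    }
    resp = requests.get(GATEWAY_URL, headers=headers, timeout=10)
    resp.raise_for_status()
    print(resp.json())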
9. Elastic Scalability
● Cloud automatically scales services based on demand (auto-scaling).
● SOA-based services benefit from dynamic scaling and load balancing.
Architecture:
Consumer Layer : This is the top layer where end-users or applications interact with the system. The consumer can be
a mobile app, a web application, or another software that requests services. The main role of this layer is to send
requests and display results back to the user.
Example: A shopping mobile app where the customer searches for a product or makes a payment request.
Business Process Layer : This layer defines the sequence of steps or workflow to complete a business task. It does not
perform the task itself but organizes services in the right order. It ensures that multiple services work together to
achieve a complete business function.
Example: In an online shopping system → first search for a product → add it to the cart → make the payment →
confirm the order → initiate delivery.
Service Layer : The service layer contains small, independent, and reusable services. Each service performs a specific
function and can be used in multiple applications. Services here are loosely coupled, meaning they work independently
but can be combined for larger tasks.
Example: Login service for authentication, Payment service for transactions, and Delivery service for shipping details.
Integration Layer : Different services may be developed using different technologies and may use different data formats
like JSON, XML, or SOAP messages. The integration layer works like a connector that makes sure all services can talk
to each other smoothly. It handles message transformation, routing, and communication.
Example: A shopping application using a third-party payment gateway and courier service needs integration so that
they all work together without compatibility issues.
Resource Layer : This is the bottom layer where actual data and resources are stored. It includes databases, files, and
backend systems that services depend on. The resource layer provides the required data to services when requested.
Example: A product database with item details, a customer database with personal information, and a transaction
database with order/payment records.
Components of SOA :
1. Service Provider
o The one who creates and offers the service.
o Publishes service details in a registry.
o Example: Bank providing fund transfer service.
2. Service Consumer (Client)
o The one who uses the service.
o Finds the service in the registry and then invokes it.
o Example: Mobile app using the fund transfer service.
3. Service Registry (Broker)
o A directory that stores all available services.
o Helps consumers to discover services.
o Example: UDDI (Universal Description, Discovery, and Integration).
4. Service Contract
o Rules and description about how to use the service.
o Includes input, output, and protocols like SOAP/REST.
o Example: WSDL (Web Services Description Language).
5. Enterprise Service Bus (ESB)
o Middleware that connects different services.
o Handles message passing, routing, and transformation.
o Example: Mule ESB (Mule Enterprise Service Bus).
Interoperability refers to the ability of different cloud services, platforms, or applications to communicate, exchange
data, and work together smoothly, even if they are built using different technologies, programming languages, or
vendors.
Integration is the process of connecting multiple cloud services, applications, or systems so they can function as a
unified solution by sharing data, workflows, and business processes.
A public cloud is a cloud environment where computing resources (servers, storage, applications) are owned and
operated by a third-party provider and delivered over the internet to multiple customers on a shared infrastructure.
Examples: AWS, Microsoft Azure, Google Cloud Platform (GCP).
7. How Public Cloud Platforms Support Different Programming Languages & Frameworks
Public clouds support all major languages through:
● SDKs and APIs (Python, Java, C#, Go, [Link], PHP, Ruby)
● Managed runtime environments (Java, .NET, Python, PHP, [Link])
● Containers (Docker, Kubernetes support)
● Serverless computing (AWS Lambda, Azure Functions, GCP Cloud Functions)
● PaaS platforms (App Engine, Azure App Services, Elastic Beanstalk)
● Database drivers for all languages
● DevOps tools (CI/CD pipelines, Cloud Build, GitHub Actions, Azure DevOps)
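For example, a hedged sketch of SDK-based access using AWS's Python SDK (boto3), assuming credentials are already configured in the environment:
    # Sketch: using a cloud provider SDK (here AWS's boto3) from Python.
    # Assumes credentials are already configured (e.g., environment variables or a profile).
    import boto3

    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])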
High Availability
● Multiple availability zones (AZs) – redundant data centers.
● Data replication across zones and regions.
● Synchronous replication (within region)
● Asynchronous replication (across regions)
● Failover mechanisms for apps and databases.
● Content Delivery Networks (CDNs) for global delivery.
● Backup and disaster recovery services.
Function as a Service (FaaS) / Serverless: This model is the ultimate expression of elasticity. The platform instantly provisions and executes a containerized function in response to an event, and then scales down to zero when the function is complete.
Example: a serverless image-processing app using AWS Lambda.
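A minimal sketch of what such a Lambda handler could look like in Python (the event shape follows the standard S3 notification format; the actual image processing is only indicated, not implemented):
    # Sketch of an AWS Lambda handler triggered by an S3 "object created" event.
    def lambda_handler(event, context):
        record = event["Records"][0]                      # standard S3 event structure
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # ... download the image, resize it, and upload the result (not shown) ...
        return {"statusCode": 200, "body": f"processed {key} from {bucket}"}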
Dynamic scaling is an automation technique that adjusts the amount of provisioned resources (such as virtual machines, CPU, or memory) in real time, based on actual, fluctuating workload demands.
Importance of Elasticity in Dynamic Scaling
● Automatically adjusts resources based on real-time demand.
● Prevents performance drops during traffic spikes by adding more resources instantly.
● Avoids resource wastage by removing extra capacity when demand decreases.
● Reduces operational cost through pay-as-you-use scaling.
● Ensures high availability by keeping applications responsive under varying loads.
● Supports modern cloud models like serverless, microservices, and container orchestration.
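As a hedged example, elasticity can be expressed as a target-tracking policy; the boto3 sketch below attaches one to an assumed EC2 Auto Scaling group named web-app-asg, with an illustrative 50% CPU target:
    # Sketch: attaching a target-tracking scaling policy to an EC2 Auto Scaling
    # group with boto3, so capacity follows demand automatically.
    # The group name, policy name, and target value are illustrative.
    import boto3

    autoscaling = boto3.client("autoscaling")
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-app-asg",
        PolicyName="keep-cpu-near-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,          # add/remove instances to hold roughly 50% CPU
        },
    )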
Benefits and considerations of using SaaS for enterprise applications (use case):
● No installation or maintenance (runs fully in the cloud).
● Low cost — pay only for subscription, no hardware needed.
● Automatic updates & patches handled by provider.
● Accessible from anywhere via internet.
● Highly scalable — add/remove users easily.
● Improved collaboration (multi-user access, shared data).
● Fast deployment — ready to use immediately.
● Security & backup handled by provider.
● Supports multi-tenancy, reducing overall cost.
How Platform as a Service enables developers to build and deploy applications in the cloud (example):
Platform as a Service (PaaS) provides a complete cloud-based environment for developing, testing, deploying, and
managing applications. It removes the need for developers to manage hardware, servers, operating systems, or
infrastructure.
How PaaS Enables Development & Deployment
● Provides preconfigured development environments (runtime, OS, frameworks).
● Offers built-in tools for coding, debugging, testing, and deployment.
● Supports automatic scaling of applications.
● Manages databases, storage, and networking behind the scenes.
● Enables continuous integration & continuous delivery (CI/CD) pipelines.
● Allows developers to focus only on writing code, not infrastructure.
● Ensures faster time-to-market with ready-made services and APIs.
● Supports multi-language and multi-framework development.
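For example (illustrative only), the code a developer hands to a PaaS can be as small as the Flask app below; the platform supplies the runtime, scaling, and networking around it:
    # Sketch: the kind of code a developer gives to a PaaS (e.g., App Engine,
    # Azure App Service, Elastic Beanstalk): only application logic, no server
    # or OS management. Flask is used purely as an illustration.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def home():
        return "Hello from a PaaS-hosted app!"

    if __name__ == "__main__":
        app.run()   # run locally; in the cloud the platform runs and scales the app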
SOAP (Simple Object Access Protocol): SOAP is a protocol used to exchange data between applications using XML messages over transport protocols such as HTTP or SMTP. It follows a stateless, one-way message exchange model; these exchanges can be combined to form request–response patterns.
Key Points
● Uses XML for structured messaging.
● Works over HTTP/SMTP through bindings.
● Supports reliable, platform-independent communication.
● Two versions: SOAP 1.1 and SOAP 1.2
SOAP Nodes
1. SOAP Sender – Creates and sends SOAP messages.
2. SOAP Receiver – Receives and processes messages
(may send a response or fault).
3. SOAP Intermediary – Both sender and receiver;
forwards messages after processing header blocks.
2. Document Requests:
Instead of parameters, an XML document is sent in the SOAP message body. For example, a Purchase Order service receives an XML purchase order document and returns an XML response after processing.
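An illustrative sketch of such a document-style request in Python, posting a hypothetical purchase-order XML inside a SOAP 1.2 envelope with the requests library (endpoint and element names are made up):
    # Illustrative document-style SOAP request: an XML purchase order is placed
    # in the SOAP body and POSTed over HTTP. Endpoint and elements are hypothetical.
    import requests

    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
      <soap:Body>
        <PurchaseOrder>
          <Item>Book</Item>
          <Quantity>2</Quantity>
        </PurchaseOrder>
      </soap:Body>
    </soap:Envelope>"""

    resp = requests.post(
        "https://example.com/orders",
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "application/soap+xml"},
        timeout=10,
    )
    print(resp.status_code, resp.text)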
REST Advantages:
● Simple and easy to learn
● Fast and lightweight compared to SOAP
● Platform-independent
● Scalable for large systems
● Flexible for web, mobile, and cloud apps
● Cacheable for better performance
Example:
In an online bookstore, REST manages books via HTTP methods: POST to add, GET to fetch details, PUT to update,
and DELETE to remove a book, making resource management straightforward.
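A short Python sketch of that bookstore example (the base URL and response fields are hypothetical; the requests library is assumed):
    # Sketch of the bookstore example using HTTP methods against a
    # hypothetical REST API; base URL and JSON fields are illustrative.
    import requests

    BASE = "https://example.com/api/books"

    new = requests.post(BASE, json={"title": "Cloud Computing", "price": 450}).json()
    book_id = new["id"]

    print(requests.get(f"{BASE}/{book_id}").json())           # fetch details
    requests.put(f"{BASE}/{book_id}", json={"price": 399})     # update the resource
    requests.delete(f"{BASE}/{book_id}")                       # remove the book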
2. Storage Pricing:
○ Pay-as-you-go: Charged based on actual data stored.
○ Tiered Storage: Different costs for hot (frequent access), cool (infrequent), and archive (rarely accessed)
storage
○ Data Transfer Costs: Charges for data moving out of the cloud or between regions.
○ I/O Operations: Pricing based on number of read/write operations in some storage types.
3. Networking Pricing:
○ Data Transfer: Costs for outbound data to the internet or between regions
○ Bandwidth: Charges based on the network throughput used.
○ Private Connections: Costs for dedicated connections like Direct Connect.
○ Load Balancer Usage: Charges based on the number of processed requests or traffic.
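A small worked example of how such pay-as-you-go charges add up; the unit prices below are assumptions for illustration, not actual provider prices:
    # Worked example of pay-as-you-go billing with illustrative (not real) unit
    # prices: storage per GB-month, egress per GB, and per-request I/O charges.
    storage_gb     = 500         # hot-tier data stored this month
    egress_gb      = 120         # data transferred out of the cloud
    requests_count = 2_000_000   # read/write operations

    PRICE_STORAGE_PER_GB = 0.02      # $/GB-month (assumed)
    PRICE_EGRESS_PER_GB  = 0.09      # $/GB (assumed)
    PRICE_PER_10K_REQS   = 0.005     # $ per 10,000 requests (assumed)

    bill = (storage_gb * PRICE_STORAGE_PER_GB
            + egress_gb * PRICE_EGRESS_PER_GB
            + requests_count / 10_000 * PRICE_PER_10K_REQS)
    print(f"Estimated monthly bill: ${bill:.2f}")   # 10.00 + 10.80 + 1.00 = 21.80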
Microsoft Azure:
● Cloud platform providing a broad set of cloud services for building, deploying, and managing apps.
● Services: Compute (VMs, Azure Functions), Storage (Blob Storage, Disk Storage), Databases (SQL Database,
Cosmos DB), Networking (Virtual Network, Azure DNS), AI, DevOps, Security, IoT, etc.
● Azure was introduced in 2010
● Azure App Service: Managed platform for building, deploying, and scaling web apps and APIs quickly with
integrated DevOps and CI/CD pipelines. Supports multiple languages (.NET, Java, [Link], Python).
● AWS Services (like Elastic Beanstalk, Lambda): Easy deployment, scaling, and management of apps without
managing infrastructure. Elastic Beanstalk handles provisioning and scaling; Lambda offers serverless compute.
Compute (IaaS/PaaS): AWS offers EC2 (VMs), Lambda (Serverless), ECS/EKS (Containers), Lightsail (Simple VPS); Azure offers Virtual Machines, Azure Functions (Serverless), Azure Kubernetes Service (AKS), Azure App Service.
Storage: AWS offers S3 (Object Storage), EBS (Block), EFS (File), Glacier (Archive); Azure offers Blob Storage, Disk Storage, Azure Files, Azure Archive Storage.
Database: AWS offers RDS (Managed SQL), Aurora (Cloud-Native SQL), DynamoDB (NoSQL), Redshift (Data Warehouse); Azure offers Azure SQL Database, Azure Cosmos DB (NoSQL), Azure Database for MySQL/PostgreSQL, Azure Synapse Analytics.
Networking: AWS offers VPC, Route 53 (DNS), API Gateway, CloudFront (CDN); Azure offers Azure Virtual Network, Azure DNS, Azure API Management, Azure Content Delivery Network.
AWS Lambda:
● Runs your code in response to events (like HTTP requests, file uploads, database changes).
● Automatically provisions and scales the compute resources as needed.
● You only pay for the actual compute time your code runs (no idle cost).
● Supports multiple programming languages ([Link], Python, Java, C#, Go, Ruby).
● Integrates with many AWS services (S3, DynamoDB, API Gateway, CloudWatch).
● Handles all infrastructure management: patching, scaling, fault tolerance.
Azure Functions:
● Executes small pieces of code (functions) triggered by events (HTTP requests, timers, queues).
● Automatically scales based on the workload.
● Billing is based on the number of executions and compute time.
● Supports various languages (.NET, JavaScript, Python, Java, PowerShell).
● Easily integrates with Azure services (Blob Storage, Event Grid, Cosmos DB).
● Manages infrastructure, so developers focus on code only
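A minimal sketch of an HTTP-triggered Azure Function in Python (assumes the function.json-based programming model; the binding configuration is not shown):
    # Sketch of an HTTP-triggered Azure Function (Python).
    # The HTTP trigger binding is declared separately in function.json (not shown).
    import azure.functions as func

    def main(req: func.HttpRequest) -> func.HttpResponse:
        name = req.params.get("name", "world")            # query parameter, if provided
        return func.HttpResponse(f"Hello, {name}!", status_code=200)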
Common Resources
● Official docs: AWS Docs, Azure Docs
● Community forums and Q&A sites
● Paid enterprise support plans
● SDK/CLI verbose logging
● Automation scripts for monitoring and fixing issues
Examples:
● AWS: EC2 can’t access S3 → Check CloudWatch, IAM roles, network ACLs.
● Azure: Slow Web App → Use Application Insights, Azure Monitor, optimize DB.
● Cross-platform: Deployment script fails → Use CLI debug flags (--debug or --verbose).
2. User Authentication
● Authenticate users via credentials, federated identity (e.g., Google, Facebook), or enterprise SSO integrated
with IAM or Azure AD.
● Support multi-factor authentication (MFA) for added security.
4. Policy Definition
● Define fine-grained access policies specifying what resources can be accessed and what actions are allowed
(read, write, delete)
● Policies are JSON documents in AWS; in Azure, they are managed via Azure Role Definitions.
5. Token-Based Authentication
● Use temporary security tokens for applications or services to access resources securely.
● AWS uses AWS Security Token Service (STS) to provide temporary credentials.
● Azure uses OAuth 2.0 tokens issued by Azure AD for API and resource access.
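A hedged boto3 sketch of this token-based flow: assume a role via STS to obtain temporary credentials, then use them for a request (the role ARN and account ID are placeholders):
    # Sketch of token-based access with AWS STS: assume a role to obtain
    # temporary credentials, then use them for an S3 call. Role ARN is a placeholder.
    import boto3

    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/app-read-only",
        RoleSessionName="demo-session",
    )["Credentials"]

    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],   # temporary token; expires automatically
    )
    print(s3.list_buckets()["Buckets"])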
6. Application Integration
● Applications call AWS or Azure APIs, passing tokens or credentials to authenticate and authorize each request.
● Use SDKs provided by AWS and Azure for simplified authentication handling.
SDKs: AWS provides SDKs for Java, Python (Boto3), JavaScript, .NET, Ruby, PHP, Go, C++; Azure provides SDKs for .NET, Java, Python, JavaScript, [Link], Go, PHP.
CLI Tools: AWS CLI (command-line interface); Azure CLI and Azure PowerShell.
IDEs & Extensions: AWS Toolkit for Visual Studio, VS Code, JetBrains IDEs; Azure Tools for Visual Studio, VS Code, JetBrains IDEs.
Serverless Development: AWS SAM (Serverless Application Model), AWS Lambda console, Amplify; Azure Functions Core Tools, Azure Portal, Azure Logic Apps.
Container Tools: AWS ECS and EKS integrations with Docker and Kubernetes; Azure Kubernetes Service (AKS), Azure Container Instances.
Infrastructure as Code: AWS CloudFormation, CDK (Cloud Development Kit); Azure Resource Manager (ARM) Templates, Bicep.
CI/CD Services: AWS CodePipeline, CodeBuild, CodeDeploy; Azure DevOps, GitHub Actions (integrated).
Monitoring & Debugging: AWS CloudWatch, X-Ray; Azure Monitor, Application Insights.
Security Tools: AWS IAM, AWS Secrets Manager, Cognito; Azure AD, Azure Key Vault, Azure Security Center.
Mobile Development: AWS Amplify, Device Farm; Azure Mobile Apps, App Center.
GCP:
Google Cloud Platform (GCP) began with the introduction of Google App Engine in 2008.
GCP Services & Offerings
● Compute:
● Storage:
● Databases:
● Networking:
○ Cloud CDN
● Big Data & Analytics:
○ AI Platform
○ AutoML
○ Cloud Identity
Key Features
● Global network with high-speed fiber optic backbone
○ Python
○ Java
○ [Link] / JavaScript
○ Go
○ .NET
○ Ruby
○ PHP
○ Cloud Code plugin for VS Code and JetBrains (smart IDE support)
● Infrastructure as Code:
● CI/CD:
Resource management is crucial for optimizing the performance and cost of a cloud environment.
It involves monitoring resource usage, allocating resources, and managing resource capacity.
Cloud resource management policies can be loosely grouped into five classes:
• Admission control.
• Capacity allocation.
• Load balancing.
• Energy optimization.
• Quality-of-service (QoS) guarantees.
1. Admission Control
Admission control decides whether to accept or reject incoming workloads based on available resources to prevent
system overload.
Key Features
● Prevents resource over-commitment
● Ensures QoS levels for accepted workloads
● Protects system stability
● Matches demand with available capacity
Procedure
1. Check current resource usage (CPU, RAM, I/O).
2. Compare requirements of incoming request with available capacity.
3. If resources are sufficient → Accept.
4. If insufficient or violating QoS → Reject or queue the request.
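A minimal Python sketch of this admission-control check (resource names, capacities, and requests are illustrative):
    # Minimal admission-control check: accept a request only if every resource
    # it needs still fits within the remaining capacity.
    def admit(request, usage, capacity):
        return all(usage[r] + request[r] <= capacity[r] for r in request)

    capacity = {"cpu": 16, "ram_gb": 64}
    usage    = {"cpu": 12, "ram_gb": 40}

    print(admit({"cpu": 2, "ram_gb": 8}, usage, capacity))   # True  -> accept
    print(admit({"cpu": 8, "ram_gb": 8}, usage, capacity))   # False -> reject or queue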
2. Capacity Allocation
Capacity allocation is the process of assigning CPU, memory, storage, and bandwidth to virtual machines or
applications.
Key Features
● Ensures fair and efficient resource distribution
● Supports dynamic scaling
● Prevents resource starvation
● Policy-driven allocation (priority, quotas, limits)
Procedure
1. Identify resource demand of each VM/application.
2. Apply allocation policies (priority, weights, quotas).
3. Distribute CPU, RAM, and storage accordingly.
4. Continuously monitor and adjust allocations based on workload changes.
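A small Python sketch of policy-driven (weighted) capacity allocation; the VM names and weights are illustrative, and quotas/limits are omitted:
    # Sketch of weighted capacity allocation: share CPU among VMs in proportion
    # to their priority weights.
    def allocate(total_cpu, weights):
        total_w = sum(weights.values())
        return {vm: total_cpu * w / total_w for vm, w in weights.items()}

    print(allocate(32, {"db-vm": 4, "web-vm": 2, "batch-vm": 1}))
    # db-vm gets ~18.3 vCPUs, web-vm ~9.1, batch-vm ~4.6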
3. Load Balancing
Load balancing distributes workloads across servers or VMs to prevent overload and increase performance.
Key Features
● Even workload distribution
● Improved throughput and response time
● Fault tolerance (redirects traffic on failure)
● Supports auto-scaling
Procedure
1. Monitor traffic and resource usage on servers.
2. Detect overloaded or underutilized resources.
3. Redirect requests to healthy or lightly loaded servers.
4. Continuously balance load using algorithms (Round Robin, Least Connections, etc.).
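A toy Python sketch of two of these policies, round robin and least connections, over an assumed server pool:
    # Sketch of two common load-balancing policies over a fixed server pool.
    import itertools

    servers = ["s1", "s2", "s3"]
    rr = itertools.cycle(servers)                 # round robin rotation
    active = {"s1": 5, "s2": 2, "s3": 7}          # current connections per server

    def round_robin():
        return next(rr)

    def least_connections():
        return min(active, key=active.get)

    print(round_robin(), round_robin())           # s1 s2
    print(least_connections())                    # s2 (fewest active connections)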
4. Energy Optimization
Energy optimization reduces power consumption in cloud data centers while maintaining performance.
Key Features
● Server consolidation (pack workloads onto fewer servers)
● Power-aware scheduling
● Turning off idle servers
● Use of low-power hardware and cooling techniques
Procedure
1. Monitor workload utilization.
2. Consolidate VMs onto fewer physical machines.
3. Turn off or put idle servers in sleep mode.
4. Use thermal and power-aware scheduling for future workloads.
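A toy Python sketch of consolidation as first-fit-decreasing bin packing (VM loads and host capacity are illustrative):
    # Sketch of server consolidation via first-fit-decreasing bin packing:
    # pack VM loads onto as few hosts as possible so idle hosts can be powered down.
    def consolidate(vm_loads, host_capacity):
        hosts = []                                   # each host holds a list of VM loads
        for load in sorted(vm_loads, reverse=True):
            for host in hosts:
                if sum(host) + load <= host_capacity:
                    host.append(load)
                    break
            else:
                hosts.append([load])                 # power on a new host
        return hosts

    print(consolidate([0.5, 0.2, 0.7, 0.3, 0.4], host_capacity=1.0))
    # e.g. [[0.7, 0.3], [0.5, 0.4], [0.2]] -> remaining hosts can be turned off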
Specialized Autonomic Performance Management refers to a set of self-managing controllers in cloud systems where
each controller focuses on a specific performance goal, such as:
● performance optimization,
● power/energy saving,
● load balancing,
● fault tolerance,
● QoS maintenance
In short:
Top level → allocates resources.
Bottom level → schedules and uses those resources.
1. Accurate Monitoring
Reliable metrics (CPU%, queue length, latency) prevent incorrect decisions.
2. Proper Feedback Loop Timing
● Too fast → oscillation
● Too slow → poor responsiveness
3. Clear Separation of Responsibilities
Global manager: long-term allocation
Local manager: short-term scheduling
→ minimizes conflict.
4. Workload Predictability
More predictable loads are easier to stabilize.
5. Quota, Limit, and Priority Controls
Avoids resource contention and starvation.
6. Efficient Load Balancing
Distributes loads evenly, reducing stress on local managers.
3. Role of Feedback
● Sensors provide feedback to controllers to maintain stability.
● If outputs change too much → system becomes unstable → thrashing and wasted resources.
4. Sources of Instability
1. Delay in system response to a control action.
2. Granularity issue → small control change causes large output change.
3. Oscillations → frequent, large input changes + weak control.
In dynamic thresholds, this limit is not fixed; it is updated periodically based on feedback from the system metrics.
This approach enables the system to adapt to fluctuating workloads and maintain desired performance levels.
Example:
Target CPU utilization: 70%
Current CPU utilization: 80%
The feedback mechanism adjusts the threshold downwards to prevent overload.
Threshold update formula (proportional control):
T_new = T_old + K × (Target − Current)
where K is the gain factor controlling how fast the threshold adapts.
Feedback Control Loop with Dynamic Thresholds
The dynamic threshold system works in a closed-loop feedback manner:
1. Monitor Metrics: Continuously measure system parameters like CPU load, memory, task queue length, or response time.
2. Compute Error: Compare the current value with the target (desired) value: Error = Target − Current.
3. Update Threshold: Adjust the threshold using proportional, integral, derivative, or PID control based on the error.
4. Take Action: If the system metric crosses the updated threshold, corrective actions are taken (e.g., scaling resources, throttling tasks, load redistribution).
5. Repeat: The loop continuously monitors and adapts to maintain stability and optimal performance.
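A toy Python sketch of this loop using only the proportional term; the gain K, target, and sampled utilizations are illustrative:
    # Sketch of the proportional-control loop described above: the scaling
    # threshold is nudged each cycle based on the error between the target
    # and measured CPU utilization. K, target, and samples are illustrative.
    K = 0.5
    target = 70.0
    threshold = 75.0

    for measured in [80.0, 78.0, 72.0, 69.0]:        # sampled CPU utilization (%)
        error = target - measured                    # negative when overloaded
        threshold += K * error                       # proportional update
        if measured > threshold:
            print(f"util {measured}% > threshold {threshold:.1f}% -> scale out")
        else:
            print(f"util {measured}% within threshold {threshold:.1f}%")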
Challenges in Coordination
● Maintaining stability and preventing oscillations.
● Communication overhead between managers.
● Designing effective policies and conflict-resolution rules.
● Managing heterogeneous workloads and mixed resources.
Control theory:
Control theory is the study of monitoring a system and adjusting it to reach a desired state or goal. Using control theory, the system continuously monitors CPU, RAM, and network usage, compares them with the desired state, and adjusts task allocation or balances the load. This ensures that the system gradually and safely moves towards the goal state, completing all tasks on time while keeping resources stable and efficiently used.
An optimal controller manages a queueing system by using feedback and prediction. External traffic λ(k) enters the
queue and is also sent to a predictive filter, which forecasts future load and disturbances. The controller receives three
inputs:
● the forecast signal (s),
● the reference target (r),
● the current queue state q(k) from feedback.
Using these, it computes the optimal control action u*(k) to regulate how the queue processes traffic. The queueing
dynamics combine λ(k) and u*(k) to produce the output ω(k). Continuous feedback helps the controller adjust decisions
and maintain stability even when disturbances occur.
4. Load Balancing
● Distributes incoming tasks using feedback from server load.
6. Performance–Energy Tradeoff
● Balances performance needs with energy savings.
3. System Stability
● Feedback control avoids oscillations and service disruptions.
4. Better QoS
● Ensures low delay, smooth performance, and timely task completion.
Controllers measure deviation from target performance (response time, throughput, deadlines). If QoS drops, the controller automatically takes corrective action, for example by allocating more resources, rescheduling tasks, or redistributing load.
How Control Theory Handles Varying Workload Demands and Prioritizes Tasks
1. Real-Time Monitoring of Workload
Sensors track:
● workload intensity
● number of incoming tasks
● queue length
● resource usage
This data feeds back into the controller.
2. Adaptive Allocation Based on Workload Change
If workload spikes → controller increases scheduling rate, allocates more VMs/containers.
If workload drops → controller scales down to save cost/energy.
3. Priority-Based Scheduling
Controllers use weighted feedback rules to assign:
● more resources to high-priority tasks
● limited resources to lower-priority jobs
This ensures fairness and SLA-driven priority handling.
4. Predictive Behavior
Model-based or predictive controllers can anticipate workload changes and adjust scheduling before bottlenecks occur.
Resource Bundling
Resource bundling means grouping multiple cloud resources (compute, storage, network, software) into a single
package or “bundle” that can be allocated as one unit. This helps providers manage resources efficiently and meet user
performance or SLA requirements.
Purpose
● Simplifies allocation and management.
● Improves utilization by combining underused resources.
● Helps meet user-specific performance/SLA needs.
2. Storage Resources
● Definition: Store and manage application data.
● Examples:
○ Block Storage: Fast access; suitable for databases/filesystems.
○ Object Storage: Scalable; used for media, backups, cloud-native apps.
● Purpose: Provides reliable, accessible data storage.
3. Network Resources
● Definition: Enable communication between cloud components.
● Examples:
○ Bandwidth: Speed of data transfer.
○ Virtual Network Interfaces (VNI): Allow VMs/containers to connect securely.
● Purpose: Ensures smooth data exchange within the cloud and over the internet.
4. Software Resources
● Definition: Pre-installed tools, systems, or services within the bundle.
● Examples:
○ OS packages (Linux/Windows)
○ Middleware (web servers, databases)
○ Frameworks / pre-configured apps
● Purpose: Reduces setup time and simplifies deployment with ready-to-use environments.
2. Dynamic Bundling
● Resources change automatically based on demand; taken from a shared pool.
● Example: A web app starts with 2 VMs, and when traffic increases, the cloud automatically adds more
VMs/storage.
● Pros: Efficient, adapts to workload changes.
● Cons: More complex to manage.
3. User-Defined Bundling
● Users custom-select the resources for their own bundle.
● Example: A user chooses 4 VMs, 200 GB storage, 2 Gbps network, and a pre-installed database for their app.
● Pros: Highly flexible, avoids resource wastage.
● Cons: Requires user knowledge; wrong choices may cause inefficiency.
Combinatorial Auctions
Combinatorial auctions let users bid on bundles of resources instead of single items. Cloud systems use them for fair
and efficient resource allocation.
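A toy Python sketch of winner determination in such an auction: pick the non-overlapping set of bundle bids with the highest total value (brute force for illustration; real allocators use optimized solvers, and the bids are made up):
    # Toy winner determination for a combinatorial auction: choose the set of
    # non-overlapping bundle bids that maximizes total payment (brute force).
    from itertools import combinations

    bids = [                                         # (bundle of resources, price)
        ({"vm", "storage"}, 10),
        ({"vm", "bandwidth"}, 8),
        ({"storage", "bandwidth"}, 7),
        ({"vm"}, 6),
    ]

    best_value, best_set = 0, []
    for r in range(1, len(bids) + 1):
        for combo in combinations(bids, r):
            items = [b for bundle, _ in combo for b in bundle]
            if len(items) == len(set(items)):        # accepted bundles must not overlap
                value = sum(price for _, price in combo)
                if value > best_value:
                    best_value, best_set = value, combo

    print(best_value, [sorted(b) for b, _ in best_set])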
The evolution of storage technology is the gradual improvement of data-storage methods—from punch cards to cloud
storage—aimed at increasing capacity, speed, reliability, and reducing cost.
1950s–1970s (Early Mechanical & Magnetic Storage):
● Punch Cards: Paper cards with punched holes; slow, bulky, very low capacity. Used in early tabulating machines.
● Magnetic Tape: Sequential data storage; suitable for backup and archival. Tape reels used for archival storage.
1970s–1990s (Magnetic Storage Era):
● Hard Disk Drive: Introduced in 1956; random access, large storage capacity. Hard disks used in early computers.
● Floppy Disk: Portable storage with small capacity (80 KB–1.44 MB). Used for software installation and file transfer.
1980s–2000s (Optical Storage Era):
● Compact Disc: Stores around 700 MB; used for music and software. Audio CDs, software discs.
● Digital Versatile Disc: Stores about 4.7 GB; suitable for movies and multimedia. Movie discs, software distribution.
2000s–Present (Solid-State Storage):
● Flash Drive (USB Drive): Portable, durable, fast data transfer. USB pen drives.
● Solid-State Drive: No moving parts; very high speed, low power consumption. SSDs in modern laptops and servers.
2000s–Present (Network & Cloud Storage):
● Network Attached Storage: Storage accessible over a network for file sharing. NAS devices in homes/offices.
● Storage Area Network: High-speed network storage for enterprises. SAN used in data centers.
Storage model
A storage model explains how data is stored, organized, and accessed in computer systems or databases. It decides
the structure, efficiency, and performance of data storage.
Disadvantages: Complex to manage.
Hierarchical Model
Network Model
• Data is stored as records connected by links (pointers).
• A record can have multiple parents and children.
• More flexible than the hierarchical model.
• Example: A student can take many courses, and each course can have many students.
Advantages: Handles complex relationships.
Disadvantages: Difficult to design and maintain.
Relational Model
• Data is stored in tables (rows and columns).
• Each table represents one entity (such as Student or
Employee).
• Relationships are created using primary and foreign keys.
• Most widely used model.
• Example: A Student table and a Course table linked through
Student_ID.
Applications (Examples)
• Weather and physics simulations in supercomputers.
• Genome analysis and scientific computing.
• Large enterprise databases and big-data analytics.
• High-performance data centers.
Reliability Features
● Master keeps metadata in an operation log for crash recovery.
● Periodic heartbeats exchange chunk lists and detect failures.
● Checksums verify data integrity.
● Garbage collection cleans deleted files lazily.
SAN:
A Storage Area Network (SAN) is a high-speed, dedicated network that connects cloud servers to a centralized pool of
storage devices. It provides block-level storage that can be accessed as if it were locally attached to the server.
Reason for Introduction
● To handle growing data storage needs in data centers and cloud environments.
● To provide high performance, scalable, and reliable storage independent of servers.
● To overcome limitations of traditional direct-attached storage (low scalability, difficult management).
Properties / Features
● High-speed data transfer using Fibre Channel or iSCSI.
● Block-level storage (appears as local disk to servers).
● Centralized management and easy scalability.
● Supports data redundancy and backup.
● Low latency and high throughput.
● Supports virtualization and cloud workloads.
Advantages
● Very fast access to large volumes of data.
● Highly scalable and easy to expand.
● Improves reliability through redundancy.
● Centralized storage simplifies management and backup.
● Enables server clustering and virtualization in cloud.
Disadvantages
● Expensive to deploy and maintain.
● Requires specialized hardware and expertise.
● Complex setup and configuration.
● Can have single-point failures if not properly designed.
● Fibre Channel networks can be costly.
Applications
● Cloud data centers and enterprise storage environments.
● Virtualized servers (VMware, Hyper-V).
● High-performance databases (Oracle, SQL Server).
● Backup, disaster recovery, and archival systems.
● Large-scale applications requiring fast I/O (ERP, analytics).
Parallel File System (PFS)
Definition
A Parallel File System is a high-performance file system that splits data across multiple storage servers and allows
many clients to read/write data simultaneously in parallel.
It is used in cloud computing, HPC, and big data environments to achieve very high throughput.
Advantages
● High throughput due to parallel read/write operations.
● Scales easily by adding more storage and compute nodes.
● Supports large files and massive datasets.
● Fault-tolerant with data replication/striping.
● Efficient for HPC, AI, big data workloads.
Disadvantages
● Complex setup and management.
● Requires high-speed interconnect (InfiniBand, high-bandwidth network).
● Expensive infrastructure.
● Application must be designed to take advantage of parallelism.
Applications
● High-Performance Computing (HPC).
● Scientific simulations and research computing.
● Big data analytics.
● Machine learning / AI training.
● Cloud computing platforms (AWS, Azure, GCP) for large-scale workloads.
Challenges in DFS
1. Data Consistency
● Ensuring all replicas of a file remain up-to-date and synchronized.
● Difficult due to concurrent access, network delays, and update conflicts.
● Requires consistency protocols like locking, versioning, or quorum mechanisms.
2. Fault Tolerance
● Nodes, disks, or networks can fail anytime.
● DFS must detect failures, recover lost data, and continue operation without stopping.
● Achieved through replication, logging, checkpointing, and self-healing mechanisms.
3. Scalability
● System must handle growing data, more clients, and more nodes.
● Metadata management becomes a bottleneck as file count increases.
● Load balancing, distributed metadata servers, and efficient indexing are required.
Benefits of Replication
● Protects against node/server failures.
● Ensures high availability and durability.
● Allows faster read operations (nearest replica serves the request).
● Supports load balancing by spreading reads across replicas.
2. DataNodes (Slaves)
● Store actual data blocks.
● Periodically send heartbeat + block reports to the NameNode.
● Handle read/write requests from clients.
3. Client
● Interacts with NameNode to get metadata and DataNode locations.
● Reads/writes data directly to DataNodes (not through NameNode), improving performance.
Read Operation
1. Client asks NameNode for block locations.
2. NameNode returns nearest DataNode.
3. Client reads block directly from that DataNode.
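A toy Python simulation of this read path (all metadata and block contents are made up for illustration):
    # Toy simulation of the HDFS read path: the client asks the NameNode for
    # block locations, then reads each block directly from a DataNode.
    namenode = {                                     # file -> ordered block IDs
        "/logs/app.log": ["blk_1", "blk_2"],
    }
    block_locations = {                              # block -> DataNodes holding replicas
        "blk_1": ["dn1", "dn3"],
        "blk_2": ["dn2", "dn3"],
    }
    datanodes = {                                    # DataNode -> stored block contents
        "dn1": {"blk_1": b"first part "},
        "dn2": {"blk_2": b"second part"},
        "dn3": {"blk_1": b"first part ", "blk_2": b"second part"},
    }

    def read_file(path):
        data = b""
        for blk in namenode[path]:                   # 1) get metadata from the NameNode
            dn = block_locations[blk][0]             # 2) pick the nearest replica (here: first)
            data += datanodes[dn][blk]               # 3) read directly from that DataNode
        return data

    print(read_file("/logs/app.log"))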