Performance Tuning Guidelines for Windows Server 2012

April 12, 2013

Abstract

This guide describes important tuning parameters and settings that
you can adjust to improve the performance and energy efficiency of
the Windows Server 2012 operating system. It describes each setting
and its potential effect to help you make an informed decision about
its relevance to your system, workload, and performance goals.
The guide is for information technology (IT) professionals and system
administrators who need to tune the performance of a server that is
running Windows Server 2012.
For the most current version of this guide, see Performance Tuning
Guidelines for Windows Server 2012.

Disclaimer: This document is provided as-is. Information and views expressed in this document, including
URL and other Internet website references, may change without notice. Some information relates to pre-released product which may be substantially modified before it's commercially released. Microsoft makes no
warranties, express or implied, with respect to the information provided here. You bear the risk of using it.
Some examples depicted herein are provided for illustration only and are fictitious. No real association or
connection is intended or should be inferred.
This document does not provide you with any legal rights to any intellectual property in any Microsoft
product. You may copy and use this document for your internal, reference purposes.
© 2013 Microsoft. All rights reserved.


Document History

Date: April 12, 2013
Change: Added note in the Performance Tuning for TPC-E workload
section that the tunings are specifically for OLTP benchmarking and
should not be perceived as general SQL tuning guidance.

Date: October 12, 2012
Change: Updated Server Core Installation Option, Correct Memory
Sizing for Child Partitions, and Correct Memory Sizing for Root
Partition.

Contents
Introduction
In This Guide
Choosing and Tuning Server Hardware
Choosing Server Hardware: Performance Considerations
Choosing Server Hardware: Power Considerations
Processor Terminology
Power and Performance Tuning
Calculating Server Energy Efficiency
Measuring System Energy Consumption
Diagnosing Energy Efficiency Issues
Using Power Plans in Windows Server
Tuning Processor Power Management Parameters
Performance Tuning for the Networking Subsystem
Choosing a Network Adapter
Offload Capabilities
Receive-Side Scaling (RSS)
Receive-Segment Coalescing (RSC)
Network Adapter Resources
Message-Signaled Interrupts (MSI/MSI-X)
Interrupt Moderation
Tuning the Network Adapter
Enabling Offload Features
Increasing Network Adapter Resources
Workload Specific Tuning
System Management Interrupts
Tuning TCP
Network-Related Performance Counters
Performance Tools for Network Workloads
Tuning for NTttcp
TCP/IP Window Size
Server Performance Advisor 3.0
Performance Tuning for the Storage Subsystem
Choosing Storage
Estimating the Amount of Data to Be Stored
Choosing a Storage Solution
Hardware Array Capabilities
Choosing the Right Resiliency Scheme
Selecting a Stripe Unit Size
Determining the Volume Layout
Choosing and Designing Storage Tiers
Storage Spaces
Storage Spaces Configuration Options
Deployment Elements: A New Unit of Scale
Storage-Related Parameters and Performance Counters
I/O Priorities
Logical Disks and Physical Disks
Processor Information
Power Protection and Advanced Performance Option
Block Alignment (DISKPART)
Solid-State Drives
Trim and Unmap Capabilities
Response Times
Queue Lengths
Performance Tuning for Web Servers
Selecting the Proper Hardware for Performance
Operating System Practices
Tuning IIS 8.0
Kernel-Mode Tunings
Cache Management Settings
Request and Connection Management Settings
User-Mode Settings
User-Mode Cache Behavior Settings
Compression Behavior Settings
Tuning the Default Document List
Central Binary Logging
Application and Site Tunings
Managing IIS 8.0 Modules
Classic ASP Settings
ASP.NET Concurrency Setting
Worker Process and Recycling Options
Secure Sockets Layer Tuning Parameters
ISAPI
Managed Code Tuning Guidelines
Other Issues that Affect IIS Performance
NTFS File System Setting
Networking Subsystem Performance Settings for IIS
Performance Tuning for File Servers
Selecting the Proper Hardware for Performance
Server Message Block Model
SMB Model Overview
SMB Configuration Considerations
Tuning Parameters for SMB File Servers
SMB Server Tuning Example
Services for NFS Model
Services for NFS Model Overview
Tuning Parameters for NFS File Servers
General Tuning Parameters for Client Computers
File Client Tuning Example
Performance Tuning for a File Server Workload (FSCT)
Registry Tuning Parameters for Servers
Registry Tuning Parameters for Client Computers
Performance Counters for SMB 3.0
Performance Tuning for File Server Workload (SPECsfs2008)
Registry-Tuning Parameters for NFS File Servers
Performance Tuning for Active Directory Servers
Considerations for Read-Heavy Scenarios
Considerations for Write-Heavy Scenarios
Using Indexing to Improve Query Performance
Optimizing Trust Paths
Active Directory Performance Counters
Performance Tuning for Remote Desktop Session Host (Formerly Terminal Server)
Selecting the Proper Hardware for Performance
CPU Configuration
Processor Architecture
Memory Configuration
Disk
Network
Tuning Applications for Remote Desktop Session Host
Remote Desktop Session Host Tuning Parameters
Page file
Antivirus and Antispyware
Task Scheduler
Desktop Notification Icons
RemoteFX data compression
Device redirection
Client Experience Settings
Desktop Size
Windows System Resource Manager
Performance Tuning for Remote Desktop Virtualization Host
General Considerations
Storage
Memory
CPU
Virtual GPU
RemoteFX GPU Processing Power
Performance Optimizations
Dynamic Memory
Tiered Storage
CSV Cache
Pooled Virtual Desktops
Performance Tuning for Remote Desktop Gateway
Monitoring and Data Collection
Performance Tuning Remote Desktop Services Workload for Knowledge Workers
Recommended Tunings on the Server
Monitoring and Data Collection
Performance Tuning for Virtualization Servers
Terminology
Hyper-V Architecture
Server Configuration
Hardware Selection
Server Core Installation Option
Dedicated Server Role
Guest Operating Systems
CPU Statistics
Processor Performance
Virtual Machine Integration Services
Enlightened Guests
Virtual Processors
Background Activity
Weights and Reserves
Tuning NUMA Node Preference
Memory Performance
Enlightened Guests
Correct Memory Sizing for Child Partitions
Correct Memory Sizing for Root Partition
Storage I/O Performance
Virtual Controllers
Virtual Disks
Block Size Considerations
Sector Size Implications
Block Fragmentation
Pass-through Disks
Advanced Storage Features
NUMA I/O
Offloaded Data Transfer Integration
Unmap Integration
Network I/O Performance
Hyper-V-specific Network Adapter
Install Multiple Hyper-V-specific Network Adapters on Multiprocessor virtual machines
Offload Hardware
Network Switch Topology
VLAN Performance
Dynamic VMQ
MAC Spoofing Guidance
Single Root I/O Virtualization
Live Migration
Performance Tuning for SAP Sales and Distribution
Operating System Tunings on the Server
Tunings on the Database Server
Tunings on SAP Application Server
Monitoring and Data Collection
Performance Tuning for OLTP Workloads
Server Under Test Tunings
SQL Server Tunings for OLTP Workloads
Disk Storage Tunings
TPC-E Database Size and Layout
Client Systems Tunings
Monitoring and Data Collection
Root Counters
Resources


Introduction
When you run a server system in your organization, you might have
business needs that are not met by using the default settings. For
example, you might need the lowest possible energy consumption,
or the lowest possible latency, or the maximum possible throughput
on your server. This guide describes how you can tune the server
settings in Windows Server 2012 and obtain incremental
performance or energy efficiency gains, especially when the nature
of the workload varies little over time.
To have the most impact, your tuning changes should consider the
hardware, the workload, the power budgets, and the performance
goals of your server. This guide describes important tuning
considerations and settings that can result in improved performance
or energy efficiency. This guide describes each setting and its
potential effect to help you make an informed decision about its
relevance to your system, workload, performance, and energy usage
goals.
Since the release of Windows Server 2008, customers have become
increasingly concerned about energy efficiency in the datacenter. To
address this need, Microsoft and its partners invested a large
amount of engineering resources to develop and optimize the
features, algorithms, and settings in Windows Server 2012 and
Windows Server 2008 R2 to maximize energy efficiency with minimal
effects on performance. This guide describes energy consumption
considerations for servers and provides guidelines for meeting your
energy usage goals. Although power consumption is a more
commonly used term, energy consumption is more accurate
because power is an instantaneous measurement (Energy = Power *
Time). Power companies typically charge datacenters for both the
energy consumed (megawatt-hours) and the peak power draw
required (megawatts).
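
As a rough illustration of the Energy = Power * Time relationship, the
following Python sketch sums evenly spaced power samples into energy
and also reports the peak power draw. The sample readings, the
one-hour sampling interval, and the function name are hypothetical and
chosen only for illustration.

```python
def energy_and_peak(power_watts, interval_seconds):
    """Return (energy in kWh, peak power in W) for evenly spaced samples.

    Each sample is treated as the average power over one interval,
    so energy is the sum of power * time across all intervals.
    """
    joules = sum(p * interval_seconds for p in power_watts)  # W * s = J
    kwh = joules / 3_600_000  # 1 kWh = 3.6 million joules
    return kwh, max(power_watts)


# Hypothetical readings: one sample per hour over four hours.
samples = [200.0, 250.0, 400.0, 150.0]
energy_kwh, peak_w = energy_and_peak(samples, interval_seconds=3600)
print(f"Energy: {energy_kwh:.2f} kWh, peak power: {peak_w:.0f} W")
```

The two returned values correspond to the two quantities that power
companies typically bill for: energy consumed and peak power draw.
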
Note: Registry settings and tuning parameters changed significantly
from Windows Server 2003, Windows Server 2008, and Windows
Server 2008 R2 to Windows Server 2012. Be sure to use the latest
tuning guidelines to avoid unexpected results.
As always, be careful when you directly manipulate the registry. If
you must edit the registry, back it up before you make any changes.


In This Guide
This guide contains key performance recommendations for the
following components:

Server Hardware

Networking Subsystem

Storage Subsystem

This guide also contains performance tuning considerations for the
following server roles:

Web Servers

File Servers

Active Directory Servers

Remote Desktop Session Host

Remote Desktop Virtualization Host

Remote Desktop Gateway

Virtualization Servers (Hyper-V)

Performance Tools for Network Workloads

SAP Sales and Distribution

TPC-E Workload


Choosing and Tuning Server Hardware

It is important to select the proper hardware to meet your expected
performance and power goals. Hardware bottlenecks limit the
effectiveness of software tuning. This section provides guidelines for
hardware to provide a good foundation for the role that a server will
play.
It is important to note that there is a tradeoff between power and
performance when choosing hardware. For example, faster
processors and more disks will yield better performance, but they
can also consume more energy.
See Choosing Server Hardware: Power Considerations later in this
guide for more details about these tradeoffs. Later sections of this
guide provide tuning guidelines that are specific to a server role and
include diagnostic techniques for isolating and identifying
performance bottlenecks for certain server roles.

Choosing Server Hardware: Performance Considerations

Table 1 lists important items that you should consider when you
choose server hardware. Following these guidelines can help remove
performance bottlenecks that might impede the server's
performance.
Table 1. Server Hardware Recommendations

Component: Processors
Recommendation: Choose 64-bit processors for servers. 64-bit
processors have significantly more address space, and are required
for Windows Server 2012. No 32-bit editions of the operating system
will be provided, but 32-bit applications will run on the 64-bit
Windows Server 2012 operating system.
To increase the computing resources in a server, you can use a
processor with higher-frequency cores, or you can increase the number
of processor cores. If CPU is the limiting resource in the system, a
core with 2x frequency typically provides a greater performance
improvement than two cores with 1x frequency. Multiple cores are not
expected to provide a perfect linear scaling, and the scaling factor
can be even less if hyperthreading is enabled because hyperthreading
relies on sharing resources of the same physical core.
It is important to match and scale the memory and I/O subsystem with
the CPU performance and vice versa.
Do not compare CPU frequencies across manufacturers and generations
of processors because the comparison can be a misleading indicator of
speed.

Component: Cache
Recommendation: Choose large L2 or L3 processor caches. The larger
caches generally provide better performance, and they often play a
bigger role than raw CPU frequency.

Component: Memory (RAM) and paging storage
Recommendation: Increase the RAM to match your memory needs.
When your computer runs low on memory and it needs more immediately,
modern operating systems use hard disk space to supplement system RAM
through a procedure called paging. Too much paging degrades the
overall system performance.
You can optimize paging by using the following guidelines for page
file placement:

Isolate the page file on its own storage device(s), or at least make
sure it doesn't share the same storage devices as other frequently
accessed files. For example, place the page file and operating system
files on separate physical disk drives.

Place the page file on a drive that is not fault-tolerant. Note that,
if the disk fails, a system crash is likely to occur. If you place
the page file on a fault-tolerant drive, remember that fault-tolerant
systems are often slower to write data because they write data to
multiple locations.

Use multiple disks or a disk array if you need additional disk
bandwidth for paging. Do not place multiple page files on different
partitions of the same physical disk drive.

Component: Peripheral bus
Recommendation: In Windows Server 2012, it is highly recommended that
the primary storage and network interfaces are PCI Express (PCIe),
and that servers with PCIe buses are chosen. Also, to avoid bus speed
limitations, use PCIe x8 and higher slots for 10 Gigabit Ethernet
adapters.

Component: Disks
Recommendation: Choose disks with higher rotational speeds to reduce
random request service times (~2 ms on average when you compare
7,200- and 15,000-RPM drives) and to increase sequential request
bandwidth. However, there are cost, power, and other considerations
associated with disks that have high rotational speeds.
2.5-inch enterprise-class disks can service a significantly larger
number of random requests per second compared to equivalent 3.5-inch
drives.
Store frequently accessed data (especially sequentially accessed
data) near the beginning of a disk because this roughly corresponds
to the outermost (fastest) tracks.
Be aware that consolidating small drives into fewer high-capacity
drives can reduce overall storage performance. Fewer spindles mean
reduced request service concurrency; and therefore, potentially lower
throughput and longer response times (depending on the workload
intensity).
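
The ~2 ms figure cited in Table 1 for 7,200- versus 15,000-RPM drives
is consistent with simple rotational-latency arithmetic: on average, a
random request waits half a revolution for the target sector to reach
the head. The following Python sketch is a simplified model under that
assumption (it ignores seek time, transfer time, and caching), and the
function name is illustrative.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: half of one revolution."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms per minute
    return ms_per_revolution / 2


slow = avg_rotational_latency_ms(7_200)
fast = avg_rotational_latency_ms(15_000)
print(f"7,200 RPM:  {slow:.2f} ms")
print(f"15,000 RPM: {fast:.2f} ms")
print(f"Difference: {slow - fast:.2f} ms")
```

Under this model the difference comes out slightly above 2 ms, which
matches the approximate value quoted in the table.
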
Table 2 lists the recommended characteristics for network and
storage adapters for high-performance servers. These settings can
help prevent your networking or storage hardware from being a
bottleneck when they are under heavy load.
Table 2. Networking and Storage Adapter Recommendations

Recommendation: WHQL certified
Description: The adapter has passed the Windows Hardware Quality Labs
(WHQL) certification test suite.

Recommendation: 64-bit capability
Description: Adapters that are 64-bit-capable can perform direct
memory access (DMA) operations to and from high physical memory
locations (greater than 4 GB). If the driver does not support DMA
greater than 4 GB, the system double-buffers the I/O to a physical
address space of less than 4 GB.

Recommendation: Copper and fiber (glass) adapters
Description: Copper adapters generally have the same performance as
their fiber counterparts, and both copper and fiber are available on
some Fibre Channel adapters. Certain environments are better suited
to copper adapters, whereas other environments are better suited to
fiber adapters.

Recommendation: Dual- or quad-port adapters
Description: Multiport adapters are useful for servers that have a
limited number of PCI slots.
To address SCSI limitations on the number of disks that can be
connected to a SCSI bus, some adapters provide two or four SCSI buses
on a single adapter card. Fibre Channel disks generally have no
limits to the number of disks that are connected to an adapter unless
they are hidden behind a SCSI interface.
Serial Attached SCSI (SAS) and Serial ATA (SATA) adapters also have a
limited number of connections because of the serial nature of the
protocols, but you can attach more disks by using switches.
Network adapters have this feature for load-balancing or failover
scenarios. Using two single-port network adapters usually yields
better performance than using a single dual-port network adapter for
the same workload.
PCI bus limitation can be a major factor in limiting performance for
multiport adapters. Therefore, it is important to consider placing
them in a high-performing PCIe slot that provides enough bandwidth.

Recommendation: Interrupt moderation
Description: Some adapters can moderate how frequently they interrupt
the host processors to indicate activity or its completion.
Moderating interrupts can often result in reduced CPU load on the
host, but unless interrupt moderation is performed intelligently, the
CPU savings might come at the cost of increased latency.

Recommendation: Receive Side Scaling (RSS) support
Description: RSS is a technology that enables packet receive-
processing to scale with the number of available computer processors.
RSS is particularly important with faster Ethernet (10 Gb or more).

Recommendation: Offload capability and other advanced features such
as message-signaled interrupt (MSI)-X
Description: Offload-capable adapters offer CPU savings that yield
improved performance. For more information, see Choosing a Network
Adapter later in this guide.

Recommendation: Dynamic interrupt and deferred procedure call (DPC)
redirection
Description: Windows Server 2012 has functionality that enables PCIe
storage adapters to dynamically redirect interrupts and DPCs. This
capability, originally called NUMA I/O, can help any multiprocessor
system by improving workload partitioning, cache hit rates, and
on-board hardware interconnect usage for I/O-intensive workloads.
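
To gauge whether a PCIe slot provides enough bandwidth for a multiport
adapter, it can help to compare the slot's usable bandwidth with the
adapter's aggregate line rate. The following Python sketch uses the
published per-lane signaling rates and encoding schemes for PCIe 2.0
and PCIe 3.0. It ignores packet and protocol overhead, so real
throughput is lower than the figures it prints. The function names and
the dual-port 10 Gigabit Ethernet scenario are illustrative
assumptions, not part of the original guidance.

```python
# Per-lane raw rate (GT/s) and encoding efficiency for each PCIe generation.
PCIE_LANE = {
    2: (5.0, 8 / 10),     # PCIe 2.0: 5 GT/s with 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s with 128b/130b encoding
}


def slot_gbps(generation, lanes):
    """Usable bandwidth per direction in Gbit/s, before protocol overhead."""
    rate, efficiency = PCIE_LANE[generation]
    return rate * efficiency * lanes


def slot_is_sufficient(generation, lanes, ports, port_gbps=10.0):
    """True if the slot can carry every port at line rate in one direction."""
    return slot_gbps(generation, lanes) >= ports * port_gbps


for gen, lanes in [(2, 4), (2, 8), (3, 4)]:
    ok = slot_is_sufficient(gen, lanes, ports=2)
    print(f"PCIe {gen}.0 x{lanes}: {slot_gbps(gen, lanes):.1f} Gbit/s, "
          f"dual-port 10 GbE at line rate: {ok}")
```

In this simplified model, a PCIe 2.0 x4 slot falls short of two 10 Gb
ports at line rate, whereas a PCIe 2.0 x8 slot has headroom, which is
in line with the recommendation to use x8 or wider slots for 10
Gigabit Ethernet adapters.
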

Choosing Server Hardware: Power Considerations

Although much of this guide focuses on how to obtain the best
performance from Windows Server 2012, it is also important to
recognize the increasing importance of energy efficiency in
enterprise and data center environments. High performance and low-energy usage are often conflicting goals, but by carefully selecting
server components, you can achieve the correct balance between
them.
Table 3 contains guidelines for power characteristics and capabilities
of server hardware components.
Table 3. Server Hardware Energy Saving Recommendations

| Component | Recommendation |
|---|---|
| Processors | Frequency, operating voltage, cache size, and process technology affect the energy consumption of processors. Processors have a thermal design point (TDP) rating that gives a basic indication of energy consumption relative to other models. In general, opt for the lowest TDP processor that will meet your performance goals. Also, newer generations of processors are generally more energy efficient, and they may expose more power states for the Windows power management algorithms, which enables better power management at all levels of performance. Or they may use some of the new cooperative power management techniques that Microsoft has developed in partnership with hardware manufacturers. (See footnote 1.) |
| Memory (RAM) | Memory accounts for an increasing fraction of the total system power. Many factors affect the energy consumption of a memory DIMM, such as memory technology, error correction code (ECC), bus frequency, capacity, density, and number of ranks. Therefore, it is best to compare expected power ratings before purchasing large quantities of memory. Low-power memory is now available, but you must consider the performance and cost trade-offs. If your server will be paging, you should also factor in the energy cost of the paging disks. |
| Disks | Higher RPM means increased energy consumption. Also, 2.5-inch drives generally require less power than 3.5-inch drives. For more information about the energy costs for different RAID configurations, see Performance Tuning for Storage Subsystem later in this guide. |
| Network and storage adapters | Some adapters decrease energy consumption during idle periods. This is an important consideration for 10 Gb networking adapters and high-bandwidth (4-8 Gb) storage links. Such devices can consume significant amounts of energy. |
| Power supplies | Increasing power supply efficiency is a great way to reduce energy consumption without affecting performance. High-efficiency power supplies can save many kilowatt-hours per year, per server. |
| Fans | Fans, like power supplies, are an area where you can reduce energy consumption without affecting system performance. Variable-speed fans can reduce RPM as the system load decreases, eliminating otherwise unnecessary energy consumption. |
| USB devices | Windows Server 2012 enables selective suspend for USB devices by default. However, a poorly written device driver can still disrupt system energy efficiency by a sizeable margin. To avoid potential issues, disconnect USB devices, disable them in the BIOS, or choose servers that do not require USB devices. |
| Remotely managed power strips | Power strips are not an integral part of server hardware, but they can make a large difference in the data center. Measurements show that volume servers that are plugged in, but have been ostensibly powered off, may still require up to 30 watts of power. To avoid wasting electricity, you can deploy a remotely managed power strip for each rack of servers to programmatically disconnect power from specific servers. |

1 See Collaborative Processor Performance Control in the Advanced
Configuration and Power Interface.

Processor Terminology
The processor terminology used throughout this guide reflects the
hierarchy of components available in Figure 1. Terms used from
largest to smallest granularity of components are the following:
Processor Socket
NUMA node
Core
Logical Processor

Figure 1. Processor terminology (diagram labels: Processor, NUMA
Nodes, Cores, LPs)

Power and Performance Tuning

Energy efficiency is increasingly important in enterprise and data
center environments, and it adds another set of tradeoffs to the mix
of configuration options.
Windows Server 2012 is optimized for excellent energy efficiency
with minimum performance impact across a wide range of customer
workloads. This section describes energy-efficiency tradeoffs to help
you make informed decisions if you need to adjust the default power
settings on your server. However, the majority of server hardware
and workloads should not require administrator power tuning when
running Windows Server 2012.

Calculating Server Energy Efficiency

When you tune your server for energy savings, you must also
consider performance. Tuning affects performance and power,
sometimes in disproportionate amounts. For each possible
adjustment, consider your power budget and performance goals to
determine whether the trade-off is acceptable.
You can calculate your server's energy efficiency ratio for a useful
metric that incorporates power and performance information. Energy
efficiency is the ratio of work that is done to the average power that
is required during a specified amount of time. In equation form:

$$\text{Energy Efficiency} = \frac{\text{Rate of Work Done}}{\text{Average Watts of Power Required}}$$

You can use this metric to set practical goals that respect the
tradeoff between power and performance. In contrast, a goal of 10
percent energy savings across the data center fails to capture the
corresponding effects on performance and vice versa. Similarly, if
you tune your server to increase performance by 5 percent, and that
results in 10 percent higher energy consumption, the total result
might or might not be acceptable for your business goals. The
energy efficiency metric allows for more informed decision making
than power or performance metrics alone.
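
As a minimal sketch of the metric, the following Python snippet computes the ratio for a baseline server and for the tuned server from the example (5 percent more work at 10 percent more power). The function name and the baseline figures (1,000 requests per second at 200 watts) are illustrative assumptions, not measurements.

```python
def energy_efficiency(work_rate, average_watts):
    """Return work done per watt: rate of work divided by average power."""
    if average_watts <= 0:
        raise ValueError("average_watts must be positive")
    return work_rate / average_watts

# Hypothetical baseline: 1,000 requests/s at an average of 200 W.
baseline = energy_efficiency(1000.0, 200.0)

# Tuned server from the example: +5% performance, +10% power.
tuned = energy_efficiency(1000.0 * 1.05, 200.0 * 1.10)

# Relative change in efficiency caused by the tuning.
change = (tuned - baseline) / baseline
print(f"baseline: {baseline:.3f} requests/s per watt")
print(f"tuned:    {tuned:.3f} requests/s per watt")
print(f"efficiency change: {change:+.1%}")
```

Under these assumptions the tuned server is about 4.5 percent less efficient, because 1.05 / 1.10 is roughly 0.955. Whether that is acceptable still depends on your performance goals.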

Measuring System Energy Consumption

You should establish a baseline power measurement before you tune
your server for energy efficiency.
If your server has the necessary support, you can use the power
metering and budgeting features in Windows Server 2012 to view
system-level energy consumption through Performance Monitor
(Perfmon). One way to determine whether your server has support
for metering and budgeting is to review the Windows Server Catalog.
If your server model qualifies for the new Enhanced Power
Management qualification in the Windows Logo Program, it is
guaranteed to support the metering and budgeting functionality.
Another way to check for metering support is to manually look for
the counters in Performance Monitor. Open Performance Monitor,
select Add Counters, and locate the Power Meter counter group.
If named instances of power meters appear in the box labeled
Instances of Selected Object, your platform supports metering.
The Power counter that shows power in watts appears in the
selected counter group. The exact derivation of the power data value
is not specified. For example, it could be an instantaneous power
draw or an average power draw over some time interval.
If your server platform does not support metering, you can use a
physical metering device connected to the power supply input to
measure system power draw or energy consumption.
To establish a baseline, you should measure the average power
required at various system load points, from idle to 100 percent
(maximum throughput). Such a baseline generates a load line.
Figure 2 shows load lines for three sample configurations.

Figure 2. Sample load lines (x-axis: Workload (% of max); y-axis:
Power (% of max watts); series: Configuration 1, Configuration 2,
Configuration 3)

You can use load lines to evaluate and compare the performance and
energy consumption of configurations at all load points. In this
particular example, it is easy to see which configuration is best.
However, there can easily be scenarios where one configuration
works best for heavy workloads and one works best for light
workloads. You need to thoroughly understand your workload
requirements to choose an optimal configuration. Don't assume that
when you find a good configuration, it will always remain optimal.
You should measure system utilization and energy consumption on a
regular basis and after changes in workloads, workload levels, or
server hardware.
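
The comparison of load lines can be sketched in a few lines of Python. The configuration names and wattages below are hypothetical, and the sketch assumes that both configurations deliver the same throughput at a given load percentage; if they do not, normalize to absolute work first.

```python
# Hypothetical load lines: workload (% of max) -> average power (watts).
load_lines = {
    "Configuration A": {0: 120, 25: 150, 50: 190, 75: 240, 100: 300},
    "Configuration B": {0: 90, 25: 135, 50: 195, 75: 260, 100: 330},
}

def best_per_load_point(lines):
    """For each measured load point, return the configuration drawing the least power."""
    load_points = sorted(next(iter(lines.values())))
    return {
        load: min(lines, key=lambda name: lines[name][load])
        for load in load_points
    }

best = best_per_load_point(load_lines)
for load, name in best.items():
    print(f"{load:>3}% load: {name}")
```

In this made-up data set, one configuration draws less power at light load and the other at heavy load, which is the situation the text warns about.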

Diagnosing Energy Efficiency Issues

The Windows PowerCfg tool supports a command-line option that
you can use to analyze the idle energy efficiency of your server.
When you run the powercfg command with the /energy option, the
tool performs a 60-second test to detect potential energy efficiency
issues. The tool generates a simple HTML report in the current
directory. To ensure an accurate analysis, make sure that all local
applications are closed before you run the powercfg command.
Note Windows PowerCfg is not available in operating systems
earlier than Windows 7 and Windows Server 2008 R2.
Shortened timer tick rates, drivers that lack power management
support, and excessive CPU utilization are a few of the behavioral
issues that are detected by the powercfg /energy command. This
tool provides a simple way to identify and fix power management
issues, potentially resulting in significant cost savings in a large
datacenter.
For more information on the powercfg /energy option, see
Resources later in this guide.
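
If you want to script the idle analysis, a sketch along the following lines may help. It builds the powercfg argument list with the /output and /duration switches and runs it only on Windows. The helper name and the default file name are assumptions for illustration; the command typically needs an elevated prompt.

```python
import platform
import subprocess

def energy_report_command(output_path="energy-report.html", duration_seconds=60):
    """Build the powercfg argument list for an idle energy-efficiency report."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return ["powercfg", "/energy",
            "/output", output_path,
            "/duration", str(duration_seconds)]

if __name__ == "__main__" and platform.system() == "Windows":
    # Close local applications first so the trace reflects an idle system.
    # The exit code is not inspected in this sketch.
    subprocess.run(energy_report_command())
```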

Using Power Plans in Windows Server

Windows Server 2012 has three built-in power plans designed to
meet different sets of business needs. These plans provide a simple
way for an administrator to customize a server to meet power or
performance goals. Table 4 describes the plans, lists common
scenarios in which to use each plan, and gives some implementation
details for each plan.
Table 4. Built-in Server Power Plans

| Plan | Description | Common applicable scenarios | Implementation highlights |
|---|---|---|---|
| Balanced (recommended) | Default setting. Targets good energy efficiency with minimal performance impact. | General computing | Matches capacity to demand. Energy-saving features balance power and performance. |
| High Performance | Increases performance at the cost of high energy consumption. Power and thermal limitations, operating expenses, and reliability considerations apply. | Low latency applications; application code that is sensitive to processor performance changes | Processors are always locked at the highest performance state (including turbo frequencies). All cores are unparked. |
| Power Saver | Limits performance to save energy and reduce operating cost. | Deployments with limited power budgets; thermal constraints | Caps processor frequency at a percentage of maximum (if supported), and enables other energy-saving features. |

These plans exist in the Windows operating system for alternating
current (AC) and direct current (DC) powered systems, but in this
guide we assume that servers are using an AC power source.
For more information on power plans, power policies, and power
policy configurations, see Resources later in this guide.


Tuning Processor Power Management Parameters

Each power plan shown in Table 4 represents a combination of
numerous underlying power management parameters. The built-in
plans are three collections of recommended settings that cover a
wide variety of workloads and scenarios. However, we recognize that
these plans will not meet every customer's needs.
The following sections describe ways to tune some specific processor
power management parameters to meet goals not addressed by the
three built-in plans. If you need to understand a wider array of power
parameters, see Power Policy Configuration and Deployment in
Windows. This document provides a detailed explanation of power
plans and parameters, and it includes instructions for adjusting
parameter values by using the PowerCfg tool.
Processor Performance Boost Mode

Intel Turbo Boost and AMD Turbo CORE technologies are features that
allow processors to achieve additional performance when it is most
useful (that is, at high system loads). However, this feature increases
CPU core energy consumption, so Windows Server 2012 configures
Turbo technologies based on the power policy that is in use and the
specific processor implementation.
Turbo is enabled for High Performance power plans on all Intel and
AMD processors and it is disabled for Power Saver power plans. For
Balanced power plans on systems that rely on traditional
P-state-based frequency management, Turbo is enabled by default only
if the platform supports the EPB register.
Note At the time of writing this guide, the EPB register is only
supported in Intel Westmere and later processors.
For Intel Nehalem and AMD processors, Turbo is disabled by default
on P-state-based platforms. However, if a system supports
Collaborative Processor Performance Control (CPPC), which is a new
alternative mode of performance communication between the
operating system and the hardware (defined in ACPI 5.0), Turbo may
be engaged if the Windows operating system dynamically requests
the hardware to deliver the highest possible performance levels.
To enable or disable the Turbo Boost feature, you must configure the
Processor Performance Boost Mode parameter. Processor
Performance Boost Mode has five allowable values, as shown in
Table 5. For P-state-based control, the choices are Disabled, Enabled
(Turbo is available to the hardware whenever nominal performance is
requested), and Efficient (Turbo is available only if the EPB register is
implemented). For CPPC-based control, the choices are Disabled,
Efficient Enabled (Windows specifies the exact amount of Turbo to
provide), and Aggressive (Windows asks for maximum
performance to enable Turbo). In Windows Server 2012, the default
value for Boost Mode is 3.
Table 5. Processor Performance Boost Mode parameter values

| Value (Name) | P-state-based Behavior | CPPC Behavior |
|---|---|---|
| 0 (Disabled) | Disabled | Disabled |
| 1 (Enabled) | Enabled | Efficient Enabled |
| 2 (Aggressive) | Enabled | Aggressive |
| 3 (Efficient Enabled) | Efficient | Efficient Enabled |
| 4 (Efficient Aggressive) | Efficient | Aggressive |


The following commands set Processor Performance Boost Mode
to Enabled on the current power plan (specify the policy by using a
GUID alias):
Powercfg -setacvalueindex scheme_current sub_processor PERFBOOSTMODE 1
Powercfg -setactive scheme_current

Note You must run the powercfg -setactive command to enable
the new settings. You do not need to reboot the server.
To set this value for power plans other than the currently selected
plan, you can use aliases such as SCHEME_MAX (Power Saver),
SCHEME_MIN (High Performance), and SCHEME_BALANCED
(Balanced) in place of SCHEME_CURRENT. Replace scheme_current
in the powercfg -setactive commands previously shown with the
desired alias to enable that power plan. For example, to adjust the
Boost Mode in the Power Saver plan and make Power Saver the
current plan, run the following commands:
Powercfg -setacvalueindex scheme_max sub_processor PERFBOOSTMODE 1
Powercfg -setactive scheme_max
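
When you apply the same setting to several servers or plans, it can help to generate the command pair from the value names in Table 5. The following Python sketch only builds the command strings and does not run them; the helper name and dictionary names are illustrative assumptions.

```python
# Value names from Table 5 (Processor Performance Boost Mode).
BOOST_MODES = {
    "Disabled": 0,
    "Enabled": 1,
    "Aggressive": 2,
    "Efficient Enabled": 3,
    "Efficient Aggressive": 4,
}

# Power plan aliases mentioned in this guide.
PLAN_ALIASES = {"scheme_current", "scheme_balanced", "scheme_min", "scheme_max"}

def boost_mode_commands(mode_name, plan="scheme_current"):
    """Return the two powercfg command lines that set the boost mode on a plan."""
    plan = plan.lower()
    if plan not in PLAN_ALIASES:
        raise ValueError(f"unknown plan alias: {plan}")
    value = BOOST_MODES[mode_name]
    return [
        f"powercfg -setacvalueindex {plan} sub_processor PERFBOOSTMODE {value}",
        f"powercfg -setactive {plan}",
    ]

for line in boost_mode_commands("Efficient Enabled", "scheme_balanced"):
    print(line)
```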
Minimum and Maximum Processor Performance State

Processors change between performance states (P-states) very
quickly to match supply to demand, delivering performance where
necessary and saving energy when possible. If your server has
specific high-performance or minimum-power-consumption
requirements, you might consider configuring the Minimum
Processor Performance State parameter or the Maximum
Processor Performance State parameter.
The values for the Minimum and Maximum Processor
Performance State parameters are expressed as a percentage of
maximum processor frequency, with a value in the range 0 to 100.
If your server requires ultra-low latency, invariant CPU frequency, or
the highest performance levels, you might not want the processors
switching to lower-performance states. For such a server, you can
set the minimum processor performance state to 100 percent by
using the following commands:
Powercfg -setacvalueindex scheme_current sub_processor PROCTHROTTLEMIN 100
Powercfg -setactive scheme_current

If your server requires lower energy consumption, you might want to
cap the processor performance state at a percentage of maximum.
For example, you can restrict the processor to 75 percent of its
maximum frequency by using the following commands:
Powercfg -setacvalueindex scheme_current sub_processor PROCTHROTTLEMAX 75
Powercfg -setactive scheme_current

Note Capping processor performance at a percentage of maximum
requires processor support. Check the processor documentation to
determine whether such support exists, or view the Perfmon counter
% of maximum frequency in the Processor group to see if any
frequency caps were applied.
Processor Performance Core Parking Maximum and Minimum Cores

Core parking is a feature that was introduced in Windows
Server 2008 R2. The processor power management (PPM) engine
and the scheduler work together to dynamically adjust the number
of cores that are available to run threads. The PPM engine chooses a
minimum number of cores for the threads that will be scheduled.
Cores that are chosen to park generally do not have any threads
scheduled, and they will drop into very low power states when they
are not processing interrupts, DPCs, or other strictly affinitized work.
The remaining set of unparked cores are responsible for the
remainder of the workload. Core parking can potentially increase
energy efficiency during lower usage periods on the server because
parked cores can drop into deep low-power states.
For most servers, the default core-parking behavior provides a
reasonable balance of throughput and energy efficiency. On
processors where core parking may not show as much benefit on
generic workloads, it can be disabled by default. If your server has
specific core parking requirements, you can control the number of
cores that are available to park by using the Processor
Performance Core Parking Maximum Cores parameter or the
Processor Performance Core Parking Minimum Cores
parameter in Windows Server 2012.
One scenario that core parking has difficulty with is when there are
one or more active threads affinitized to a non-trivial subset of CPUs
in a NUMA node (that is, more than 1 CPU, but fewer than the entire
set of CPUs on the node). When the core parking algorithm is picking
cores to unpark (assuming an increase in workload intensity occurs),
it does not know to pick the cores within the active affinitized subset
(or subsets) to unpark, and thus may end up unparking cores that
won't actually be utilized.
The values for these parameters are percentages in the range 0 to
100. The Processor Performance Core Parking Maximum Cores
parameter controls the maximum percentage of cores that can be
unparked (available to run threads) at any time, while the
Processor Performance Core Parking Minimum Cores
parameter controls the minimum percentage of cores that can be
unparked. To turn off core parking, set the Processor Performance
Core Parking Minimum Cores parameter to 100 percent by using
the following commands:
Powercfg -setacvalueindex scheme_current sub_processor CPMINCORES 100
Powercfg -setactive scheme_current

To reduce the number of schedulable cores to 50 percent of the
maximum count, set the Processor Performance Core Parking
Maximum Cores parameter to 50 as follows:
Powercfg -setacvalueindex scheme_current sub_processor CPMAXCORES 50
Powercfg -setactive scheme_current
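
To reason about what a percentage means on a given server, you can translate the two parameters into core counts. The following Python sketch does that under two labeled assumptions: fractional results round up, and at least one core always stays unparked. The guide does not specify how Windows rounds, so treat the output as an estimate.

```python
import math

def unparked_core_bounds(total_cores, min_percent, max_percent):
    """Translate CPMINCORES/CPMAXCORES percentages into core counts.

    Rounding up, and keeping at least one core unparked, are assumptions
    made for this sketch; the guide does not specify how Windows rounds.
    """
    for pct in (min_percent, max_percent):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be in the range 0 to 100")
    if min_percent > max_percent:
        raise ValueError("min_percent must not exceed max_percent")
    low = max(1, math.ceil(total_cores * min_percent / 100))
    high = max(1, math.ceil(total_cores * max_percent / 100))
    return low, high

# 16 cores with CPMINCORES=100: every core stays unparked (parking is off).
print(unparked_core_bounds(16, 100, 100))
# 16 cores with CPMAXCORES=50: at most half of the cores are schedulable.
print(unparked_core_bounds(16, 0, 50))
```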
Processor Performance Core Parking Utility Distribution

Utility Distribution is an algorithmic optimization in Windows Server
2012 that is designed to improve power efficiency for some
workloads. It tracks unmovable CPU activity (that is, DPCs,
interrupts, or strictly affinitized threads), and it predicts the future
work on each processor based on the assumption that any movable
work can be distributed equally across all unparked cores. Utility
Distribution is enabled by default for the Balanced power plans for
some processors. It can reduce processor power consumption by
lowering the requested CPU frequencies of workloads that are in a
reasonably steady state. However, Utility Distribution is not
necessarily a good algorithmic choice for workloads that are subject
to high activity bursts or for programs where the workload quickly
and randomly shifts across processors. For such workloads, we
recommend disabling Utility Distribution by using the following
commands:
Powercfg -setacvalueindex scheme_current sub_processor DISTRIBUTEUTIL 0
Powercfg -setactive scheme_current
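
The prediction step that Utility Distribution performs can be illustrated with a simplified model. The following Python sketch is not the Windows algorithm; it only applies the stated assumption that movable work spreads equally across unparked cores. The function name, the utility units (percent of one core), and the sample values are hypothetical.

```python
def predicted_utility(unmovable, movable_total, unparked):
    """Predict per-core work assuming movable work spreads evenly.

    unmovable:     per-core utility that cannot migrate (DPCs, interrupts,
                   strictly affinitized threads), indexed by core number
    movable_total: total utility of work the scheduler is free to move
    unparked:      core numbers currently available to run threads
    """
    if not unparked:
        raise ValueError("at least one core must be unparked")
    share = movable_total / len(unparked)
    return {
        core: unmovable[core] + (share if core in unparked else 0.0)
        for core in range(len(unmovable))
    }

# Hypothetical 4-core system: core 0 handles interrupts, cores 2-3 are parked.
prediction = predicted_utility([30.0, 5.0, 0.0, 0.0], 40.0, unparked={0, 1})
print(prediction)
```

A model like this also shows why bursty or rapidly shifting workloads are a poor fit: the even-spread assumption stops matching where the work actually runs.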


Performance Tuning for the Networking Subsystem

Figure 3 shows the network architecture, which includes many
components, interfaces, and protocols. The following sections
discuss tuning guidelines for some of the components involved in
server workloads.
Figure 3. Network stack components (diagram labels: User-Mode
Applications, System Drivers, Protocol Stack, NDIS, Network
Interface; components shown include WMS, DNS, IIS, AFD.SYS,
HTTP.SYS, TCP/IP, UDP/IP, VPN, and Network Driver)

The network architecture is layered, and the layers can be broadly
divided into the following sections:

* The network driver and Network Driver Interface Specification
  (NDIS)
  These are the lowest layers. NDIS exposes interfaces for the
  driver below it and for the layers above it, such as TCP/IP.

* The protocol stack
  This implements protocols such as TCP/IP and UDP/IP. These
  layers expose the transport layer interface for layers above them.

* System drivers
  These are typically clients that use a transport data extension
  (TDX) or Winsock Kernel (WSK) interface to expose interfaces to
  user-mode applications. The WSK interface was introduced in
  Windows Server 2008 and Windows Vista, and it is exposed by
  AFD.sys. The interface improves performance by eliminating the
  switching between user mode and kernel mode.

* User-mode applications
  These are typically Microsoft solutions or custom applications.

Tuning for network-intensive workloads can involve each layer. The
following sections describe some tuning recommendations.


Choosing a Network Adapter

Network-intensive applications require high-performance network
adapters. This section explores some considerations for choosing
network adapters.

Offload Capabilities
Offloading tasks can reduce CPU usage on the server, which
improves the overall system performance. The network stack in
Microsoft products can offload one or more tasks to a network
adapter if you choose one that has the appropriate offload
capabilities. Table 6 provides details about each offload capability.
Table 6. Offload Capabilities for Network Adapters

| Offload type | Description |
|---|---|
| Checksum calculation | The network stack can offload the calculation and validation of Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) checksums on send and receive code paths. It can also offload the calculation and validation of IPv4 and IPv6 checksums on send and receive code paths. |
| IP security authentication and encryption | The TCP/IP transport layer can offload the calculation and validation of encrypted checksums for authentication headers and Encapsulating Security Payloads (ESPs). The TCP/IP transport layer can also offload the encryption and decryption of ESPs. |
| Segmentation of large TCP packets | The TCP/IP transport layer supports Large Send Offload v2 (LSOv2). With LSOv2, the TCP/IP transport layer can offload the segmentation of large TCP packets to the hardware. |
| Receive Segment Coalescing (RSC) | RSC is the ability to group packets together to minimize the header processing that is necessary for the host to perform. A maximum of 64 KB of received payload can be coalesced into a single larger packet for processing. |
| Receive-Side Scaling (RSS) | Receive-side scaling (RSS) is a network driver technology that enables the efficient distribution of network receive processing across multiple CPUs in multiprocessor systems. |


Receive-Side Scaling (RSS)

Windows Server 2012, Windows Server 2008 R2, and Windows
Server 2008 support Receive Side Scaling (RSS). A server may have
multiple logical processors that share hardware resources (such as a
physical core) and are treated as Simultaneous Multi-Threading
(SMT) peers. Intel Hyper-Threading Technology is an example. RSS
directs network processing to up to one logical processor per core.
For example, given a server with Intel Hyper-Threading and 4 cores
(8 logical processors), RSS will use no more than 4 logical processors
for network processing.
RSS distributes incoming network I/O packets among logical
processors so that packets that belong to the same TCP connection
are processed on the same logical processor, which preserves
ordering. Starting with Windows Server 2012, RSS also load balances
UDP unicast and multicast traffic, and it routes related flows (as
determined by hashing the source and destination addresses) to the
same logical processor, thereby preserving the order of related
arrivals. This helps improve scalability and performance for
receive-intensive scenarios that have fewer network adapters than
eligible logical processors.
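
The two ideas in this section (one RSS processor per core, and hash-based steering that keeps a flow on one processor) can be sketched as follows. This is an illustration only: SHA-256 stands in for the adapter's real hash function, the flow key format is invented, and the sketch assumes that the logical processors of a core are numbered consecutively.

```python
import hashlib

def rss_processors(logical_processors_per_core, cores):
    """Pick one logical processor per core (assumes consecutive SMT numbering)."""
    return [core * logical_processors_per_core for core in range(cores)]

def processor_for_flow(src_ip, src_port, dst_ip, dst_port, processors):
    """Map a flow to a processor by hashing its addresses and ports.

    SHA-256 is a stand-in here, not the hash that RSS hardware uses.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(processors)
    return processors[index]

# 4 cores with Hyper-Threading (8 logical processors): RSS uses at most 4.
processors = rss_processors(logical_processors_per_core=2, cores=4)
print(processors)

# Every packet of the same flow lands on the same processor, preserving order.
first = processor_for_flow("10.0.0.5", 50000, "10.0.0.1", 443, processors)
again = processor_for_flow("10.0.0.5", 50000, "10.0.0.1", 443, processors)
print(first, again)
```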
Windows Server 2012 provides the following ways to tune RSS
behavior:

* Windows PowerShell cmdlets: Get-NetAdapterRss,
  Set-NetAdapterRss, Enable-NetAdapterRss, Disable-NetAdapterRss
  For more information, see Network Adapter Cmdlets in Windows
  PowerShell in the Windows Server Library.
  These cmdlets allow you to see and modify RSS parameters per
  network adapter. Pass the cmdlet name to Get-Help for details.

* RSS Profiles: One of the parameters that is available is the RSS
  Profile, which is used to determine which logical processors are
  assigned to which network adapter. Possible profiles include:

  o Closest. Logical processor numbers near the network
    adapter's base RSS processor are preferred. Windows may
    rebalance logical processors dynamically based on load.

  o ClosestStatic. Logical processor numbers near the
    network adapter's base RSS processor are preferred.
    Windows will not rebalance logical processors dynamically
    based on load.

  o NUMA. Logical processor numbers will tend to be selected
    on different NUMA nodes to distribute the load. Windows
    may rebalance logical processors dynamically based on
    load.

  o NUMAStatic. This is the default profile. Logical
    processor numbers will tend to be selected on different
    NUMA nodes to distribute the load. Windows will not
    rebalance logical processors dynamically based on load.

  o Conservative. RSS uses as few processors as possible to
    sustain the load. This option helps reduce the number of
    interrupts.

Depending on the scenario and the workload characteristics, you can
use the following Windows PowerShell cmdlet parameters to choose
how many logical processors can be used for RSS on a per-network
adapter basis, the starting offset for the range of logical processors,
and which node the network adapter allocates memory from:

* MaxProcessors: Sets the maximum number of RSS
  processors to be used. This ensures that application traffic is
  bound to a maximum number of processors on a given interface.
  Set-NetAdapterRss -Name "Ethernet" -MaxProcessors <value>

* BaseProcessorGroup: Sets the base processor group of a
  NUMA node. This impacts the processor array that is used by RSS.
  Set-NetAdapterRss -Name "Ethernet" -BaseProcessorGroup <value>

* MaxProcessorGroup: Sets the maximum processor group of a
  NUMA node. This impacts the processor array that is used by RSS.
  Setting this would restrict a maximum processor group so that
  load balancing is aligned within a k-group.
  Set-NetAdapterRss -Name "Ethernet" -MaxProcessorGroup <value>

* BaseProcessorNumber: Sets the base processor number
  of a NUMA node. This impacts the processor array that is used by
  RSS. This allows partitioning processors across network adapters.
  This is the first logical processor in the range of RSS processors
  that is assigned to each adapter.
  Set-NetAdapterRss -Name "Ethernet" -BaseProcessorNumber <Byte Value>

* NumaNode: The NUMA node that each network adapter
  can allocate memory from. This can be within a k-group or from
  different k-groups.
  Set-NetAdapterRss -Name "Ethernet" -NumaNode <value>

* NumberOfReceiveQueues: If your logical processors
  seem to be underutilized for receive traffic (for example, as
  viewed in Task Manager), you can try increasing the number
  of RSS queues from the default of 2 to the maximum that is
  supported by your network adapter. Your network adapter may
  have options to change the number of RSS queues as part of the
  driver.
  Set-NetAdapterRss -Name "Ethernet" -NumberOfReceiveQueues <value>

For more information, see Scalable Networking: Eliminating
the Receive Processing Bottleneck - Introducing RSS.
Understanding RSS Performance

Tuning RSS requires understanding the configuration and the
load-balancing logic. To verify that the RSS settings have taken
effect, use the Get-NetAdapterRss Windows PowerShell cmdlet and
review its output.
PS C:\Users\Administrator> get-netadapterrss

Name                              : testnic 2
InterfaceDescription              : Broadcom BCM5708C NetXtreme II GigE (NDIS VBD Client) #66
Enabled                           : True
NumberOfReceiveQueues             : 2
Profile                           : NUMAStatic
BaseProcessor: [Group:Number]     : 0:0
MaxProcessor: [Group:Number]      : 0:15
MaxProcessors                     : 8
IndirectionTable: [Group:Number]  : 0:0 0:4 0:0 0:4 0:0 0:4 0:0 0:4
(# indirection table entries are a power of 2 and based on # of processors)
                                    0:0 0:4 0:0 0:4 0:0 0:4 0:0 0:4

In addition to echoing parameters that were set, the key aspect of
the output is the indirection table. The indirection table displays
the hash table buckets that are used to distribute incoming traffic.
In this example, the n:c notation designates the NUMA k-group:CPU
index pair that is used to direct incoming traffic. We see exactly
2 unique entries (0:0 and 0:4), which represent k-group 0/CPU 0 and
k-group 0/CPU 4, respectively.
We further see only one k-group for this system (k-group 0) and an
indirection table with n entries (where n <= 128). Because the
number of receive queues is set to 2, only 2 processors (0:0, 0:4)
are chosen even though maximum processors is set to 8. In effect,
the indirection table is hashing incoming traffic to only use 2 CPUs
out of the 8 that are available.
To fully utilize the CPUs, the number of RSS Receive Queues should
be equal to or greater than Max Processors. For the previous
example, the Receive Queue should be set to 8 or greater.
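
The same check can be automated once you have the indirection table as text. The following Python sketch counts the distinct Group:Number pairs and compares the count with MaxProcessors. The input values are copied from the sample Get-NetAdapterRss output; the function name and the way the table text is obtained are assumptions.

```python
def unique_rss_processors(indirection_table):
    """Return the distinct (group, number) pairs in an indirection table string."""
    pairs = set()
    for entry in indirection_table.split():
        group, number = entry.split(":")
        pairs.add((int(group), int(number)))
    return sorted(pairs)

# Values taken from the sample Get-NetAdapterRss output.
table = "0:0 0:4 0:0 0:4 0:0 0:4 0:0 0:4"
max_processors = 8
receive_queues = 2

in_use = unique_rss_processors(table)
print(f"processors receiving traffic: {in_use}")
if len(in_use) < max_processors:
    print(f"only {len(in_use)} of {max_processors} processors are used with "
          f"{receive_queues} receive queues; consider setting "
          f"NumberOfReceiveQueues to at least {max_processors}")
```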
RSS and virtualization

RSS provides hashing and scalability to the host interface only. RSS
does not provide any interaction with virtual machines. Instead,
users can configure VMQ in those scenarios.
RSS can be enabled for guest virtual machines in the case of SR-IOV
because the virtual function driver supports RSS capability. In this
case, the guest and the host will have the benefit of RSS. Note that
the host does not get RSS capability because the virtual switch is
enabled with SR-IOV.


LBFO and RSS

RSS can be enabled on a network adapter that is teamed. In this
scenario, only the underlying physical network adapter can be
configured to use RSS. A user cannot set RSS cmdlets on the teamed
network adapter.

Receive-Segment Coalescing (RSC)

Receive Segment Coalescing (RSC) helps performance in Windows
Server 2012 by reducing the number of IP headers that are
processed for a given amount of received data. It should be used to
help scale the performance of received data by grouping (or
coalescing) the smaller packets into larger units. This approach can
affect latency with benefits mostly seen in throughput gains. RSC is
recommended to increase throughput for receive-heavy workloads.
Consider deploying network adapters that support RSC. On these
network adapters, ensure that RSC is on (this is the default setting),
unless you have specific workloads (for example, low latency, low
throughput networking) that show benefit from RSC being off.
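
To see why coalescing reduces per-packet work, consider the following simplified Python sketch. It groups consecutive segment payloads into units of at most 64 KB, the limit cited in Table 6. Real RSC applies further conditions that are not modeled here (for example, segments must belong to the same connection), and the segment size and count are hypothetical.

```python
MAX_COALESCED_BYTES = 64 * 1024  # limit cited in Table 6

def coalesce(segment_sizes, limit=MAX_COALESCED_BYTES):
    """Group consecutive segment payload sizes into units of at most `limit` bytes."""
    units, current = [], 0
    for size in segment_sizes:
        # Start a new unit when adding this segment would exceed the limit.
        if current and current + size > limit:
            units.append(current)
            current = 0
        current += size
    if current:
        units.append(current)
    return units

# 100 in-order segments of 1,460 bytes from one TCP connection (hypothetical).
segments = [1460] * 100
units = coalesce(segments)
print(f"{len(segments)} segments -> {len(units)} coalesced units: {units}")
```

In this example the host processes 3 larger units instead of 100 separate packets, which is where the throughput benefit comes from. The added buffering is also why latency-sensitive workloads might prefer RSC off.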
In Windows Server 2012, the following Windows PowerShell cmdlets
allow you to configure RSC capable network adapters:
Enable-NetAdapterRsc, Disable-NetAdapterRsc,
Get-NetAdapterAdvancedProperty, and Set-NetAdapterAdvancedProperty.
Understanding RSC diagnostics

RSC can be diagnosed through the following cmdlets.

PS C:\Users\Administrator> Get-NetAdapterRsc

Name      IPv4Enabled  IPv6Enabled  IPv4Operational  IPv6Operational  IPv4FailureReason  IPv6FailureReason
----      -----------  -----------  ---------------  ---------------  -----------------  -----------------
Ethernet  True         False        True             False            NoFailure          NicProperties

The Get-NetAdapterRsc cmdlet shows whether RSC is enabled on the
interface and whether TCP has placed RSC in an operational state.
The failure reason explains why RSC could not be enabled on that
interface.
In the previous scenario, IPv4 RSC is supported and operational on
the interface. To diagnose coalescing issues, examine the coalesced
bytes and the coalescing exceptions in the adapter statistics.

PS C:\Users\Administrator> $x = Get-NetAdapterStatistics myAdapter
PS C:\Users\Administrator> $x.rscstatistics

CoalescedBytes       : 0
CoalescedPackets     : 0
CoalescingEvents     : 0
CoalescingExceptions : 0

RSC and virtualization

RSC is only supported in the physical host when the host network
adapter is not bound to the virtual switch. The operating system
disables RSC when the host network adapter is bound to the virtual
switch. Also, virtual machines do not get the benefit of RSC because
virtual network adapters do not support RSC.
RSC can be enabled for a virtual machine when SR-IOV is enabled. In
this case, virtual functions will support RSC capability; hence, virtual
machines will also get the benefit of RSC.

Network Adapter Resources

A few network adapters actively manage their resources to achieve
optimum performance. Several network adapters let the
administrator manually configure resources by using the Advanced
Networking tab for the adapter. For such adapters, you can set the
values of a number of parameters including the number of receive
buffers and send buffers.
In Windows Server 2012, configuration has been simplified by the
use of the following Windows PowerShell cmdlets:

Get-NetAdapterAdvancedProperty
Set-NetAdapterAdvancedProperty
Enable-NetAdapter
Enable-NetAdapterBinding
Enable-NetAdapterChecksumOffload
Enable-NetAdapterLso
Enable-NetAdapterIPSecOffload
Enable-NetAdapterPowerManagement
Enable-NetAdapterQos
Enable-NetAdapterRDMA
Enable-NetAdapterSriov

Message-Signaled Interrupts (MSI/MSI-X)

Network adapters that support MSI/MSI-X can target their interrupts
to specific logical processors. If the adapters also support RSS, then
a logical processor can be dedicated to servicing interrupts and
deferred procedure calls (DPCs) for a given TCP connection. This
preserves the cache locality of TCP structures and greatly improves
performance.

Interrupt Moderation
To control interrupt moderation, some network adapters expose
different interrupt moderation levels, or buffer coalescing
parameters (sometimes separately for send and receive buffers), or
both. You should consider buffer coalescing or batching when the
network adapter does not perform interrupt moderation. Interrupt
moderation helps reduce overall CPU utilization by minimizing the
per-buffer processing cost, but the moderation of interrupts and
buffer batching can have a negative impact on latency-sensitive
scenarios.
Suggested Network Adapter Features for Server Roles

Table 7 lists high-performance network adapter features that can
improve performance in terms of throughput, latency, or scalability
for some server roles.
Table 7. Benefits from Network Adapter Features for Different
Server Roles
Columns: Server role; Checksum offload; Large Send Offload (LSO);
Receive-side scaling (RSS); Receive Segment Coalescing (RSC)
Server roles (rows):
File server
Web server
Mail server (short-lived connections)
Database server
FTP server
Media server

Disclaimer The recommendations in Table 7 are intended to serve
as guidance only for choosing the most suitable technology for
specific server roles under a predetermined traffic pattern. The
user's experience can be different, depending on workload
characteristics and the hardware that is used.

Tuning the Network Adapter

You can optimize network throughput and resource usage by tuning
the network adapter, if any tuning options are exposed by the
adapter. Remember that the correct tuning settings depend on the
network adapter, the workload, the host computer resources, and
your performance goals.

Enabling Offload Features

Turning on network adapter offload features is usually beneficial.
Sometimes, however, the network adapter is not powerful enough to
handle the offload capabilities with high throughput. For example,
enabling segmentation offload can reduce the maximum sustainable
throughput on some network adapters because of limited hardware
resources. However, if the reduced throughput is not expected to be
a limitation, you should enable offload capabilities, even for such
network adapters.
Note Some network adapters require offload features to be
independently enabled for send and receive paths.
Enabling RSS for Web Scenarios

RSS can improve web scalability and performance when there are
fewer network adapters than logical processors on the server. When
all the web traffic is going through the RSS-capable network
adapters, incoming web requests from different connections can be
simultaneously processed across different CPUs. It is important to
note that due to the logic in RSS and HTTP for load distribution,
performance can be severely degraded if a non-RSS-capable network
adapter accepts web traffic on a server that has one or more
RSS-capable network adapters. We recommend that you use RSS-capable
network adapters or disable RSS from the Advanced Properties
tab. To determine whether a network adapter is RSS-capable, view
the RSS information on the Advanced Properties tab for the
device.

RSS Profiles and RSS Queues

RSS Profiles are new in Windows Server 2012. The default profile is
NUMA Static, which changes the default behavior from previous
versions of Windows. We suggest reviewing the available profiles and
understanding when they are beneficial. If your logical processors
seem to be underutilized for receive traffic, for example, as viewed
in Task Manager, you can try increasing the number of RSS queues
from the default of 2 to the maximum that is supported by your
network adapter. Your network adapter may have options to change
the number of RSS queues as part of the driver.

Increasing Network Adapter Resources

For network adapters that allow manual configuration of resources,
such as receive and send buffers, you should increase the allocated
resources. Some network adapters set their receive buffers low to
conserve allocated memory from the host. The low value results in
dropped packets and decreased performance. Therefore, for
receive-intensive scenarios, we recommend that you increase the receive
buffer value to the maximum. If the adapter does not expose manual
resource configuration, it dynamically configures the resources, or it
is set to a fixed value that cannot be changed.
Enabling Interrupt Moderation

To control interrupt moderation, some network adapters expose
different interrupt moderation levels, buffer coalescing parameters
(sometimes separately for send and receive buffers), or both. You
should consider interrupt moderation for CPU-bound workloads, and
weigh the trade-off: moderation saves host CPU at the cost of higher
latency, whereas disabling it lowers latency at the cost of increased
host CPU usage because of more interrupts. If the network adapter
does not perform interrupt moderation, but it does expose buffer
coalescing, increasing the number of coalesced buffers allows more
buffers per send or receive, which improves performance.

Workload Specific Tuning

Tuning for Low Latency Packet Processing within the operating system

The network adapter has a number of options to optimize operating
system-induced latency. This is the elapsed time between the
network driver processing an incoming packet and the network
driver sending the packet back. This time is usually measured in
microseconds. For comparison, the transmission time for packet
transmissions over long distances is usually measured in
milliseconds (about three orders of magnitude larger). This tuning
will not reduce the time a packet spends in transit.
Some tuning suggestions for microsecond-sensitive networks
include:

Set the computer BIOS to High Performance, with C-states
disabled. However, note that this is system and BIOS
dependent, and some systems will provide higher
performance if the operating system controls power
management. You can check and adjust your power
management settings from Control Panel or by using the
powercfg command.

Set the operating system power management profile to High
Performance System. Note that this will not work properly
if the system BIOS has been set to disable operating system
control of power management.

Enable Static Offloads, for example, UDP Checksums, TCP
Checksums, and Large Send Offload (LSO).

Enable RSS if the traffic is multi-streamed, such as
high-volume multicast receive.

Disable the Interrupt Moderation setting for network card
drivers that require the lowest possible latency. Remember,
this can use more CPU time and it represents a tradeoff.

Handle network adapter interrupts and DPCs on a core
processor that shares CPU cache with the core that is being
used by the program (user thread) that is handling the
packet. CPU affinity tuning can be used to direct a process to
certain logical processors in conjunction with RSS
configuration to accomplish this. Using the same core for the
interrupt, DPC, and user mode thread exhibits worse
performance as load increases because the ISR, DPC, and
thread contend for the use of the core.

System Management Interrupts

Many hardware systems use System Management Interrupts (SMI)
for a variety of maintenance functions, including reporting of error
correction code (ECC) memory errors, legacy USB compatibility, fan
control, and BIOS controlled power management. The SMI is the
highest priority interrupt on the system and places the CPU in a
management mode, which preempts all other activity while it runs
an interrupt service routine, typically contained in BIOS.
Unfortunately, this can result in latency spikes of 100 microseconds
or more. If you need to achieve the lowest latency, you should
request a BIOS version from your hardware provider that reduces
SMIs to the lowest degree possible. These are frequently referred to
as low-latency BIOS or SMI-free BIOS. In some cases, it is not
possible for a hardware platform to eliminate SMI activity altogether
because it is used to control essential functions (for example, cooling
fans).
Note The operating system can exert no control over SMIs because
the logical processor is running in a special maintenance mode,
which prevents operating system intervention.

Tuning TCP
TCP Receive Window Auto-Tuning

Prior to Windows Server 2008, the network stack used a fixed-size
receive-side window that limited the overall potential throughput for
connections. One of the most significant changes to the TCP stack is
TCP receive window auto-tuning. With the fixed-size default, you can
calculate the total throughput of a single connection as:
Total achievable throughput in bytes per second = TCP window size in
bytes * (1 / connection latency in seconds)
For example, the total achievable throughput is only 51 Mbps on a
1 Gbps connection with 10 ms latency (a reasonable value for a large
corporate network infrastructure). With auto-tuning, however, the
receive-side window is adjustable, and it can grow to meet the
demands of the sender. It is entirely possible for a connection to
achieve the full line rate of a 1 Gbps connection. Network usage
scenarios that might have been limited in the past by the total
achievable throughput of TCP connections can now fully use the
network.
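
The fixed-window limit described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original guide: the function name is hypothetical, and the 64,000-byte window is an assumption chosen because it reproduces the 51 Mbps figure (the passage does not state the window size it used).

```python
def max_throughput_mbps(window_bytes: int, latency_seconds: float) -> float:
    """Upper bound on single-connection throughput with a fixed receive window.

    Implements: throughput (bytes/sec) = window * (1 / latency),
    then converts bytes per second to megabits per second.
    """
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    bytes_per_second = window_bytes * (1.0 / latency_seconds)
    return bytes_per_second * 8 / 1_000_000


# Assumption: a 64,000-byte fixed window and the 10 ms latency from the text.
FIXED_WINDOW_BYTES = 64_000
LATENCY_SECONDS = 0.010

if __name__ == "__main__":
    limit = max_throughput_mbps(FIXED_WINDOW_BYTES, LATENCY_SECONDS)
    print(f"Fixed-window limit: {limit:.1f} Mbps on a 1000 Mbps link")
```

Under these assumptions the limit is about 51 Mbps, roughly 5 percent of the link rate, which is why a window that can grow is needed to reach line rate.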

Windows Filtering Platform

The Windows Filtering Platform (WFP) that was introduced in
Windows Vista and Windows Server 2008 provides APIs to
non-Microsoft independent software vendors (ISVs) to create packet
processing filters. Examples include firewall and antivirus software.
Note A poorly written WFP filter can significantly decrease a
server's networking performance.
For more information, see Windows Filtering Platform in the Windows
Dev Center.
TCP Parameters

The following registry keywords from Windows Server 2003 are no
longer supported, and they are ignored in Windows Server 2012,
Windows Server 2008 R2, and Windows Server 2008:

TcpWindowSize
    HKLM\System\CurrentControlSet\Services\Tcpip\Parameters

NumTcbTablePartitions
    HKLM\system\CurrentControlSet\Services\Tcpip\Parameters

MaxHashTableSize
    HKLM\system\CurrentControlSet\Services\Tcpip\Parameters

Network-Related Performance Counters

This section lists the counters that are relevant to managing network
performance.
Resource Utilization

IPv4, IPv6
    Datagrams Received/sec
    Datagrams Sent/sec

TCPv4, TCPv6
    Segments Received/sec
    Segments Sent/sec
    Segments Retransmitted/sec

Network Interface(*), Network Adapter(*)
    Bytes Received/sec
    Bytes Sent/sec
    Packets Received/sec
    Packets Sent/sec
    Output Queue Length
        This counter is the length of the output packet queue (in
        packets). If this is longer than 2, delays occur. You should
        find the bottleneck and eliminate it if you can. Because
        NDIS queues the requests, this length should always be 0.

Processor Information
    % Processor Time
    Interrupts/sec
    DPCs Queued/sec
        This counter is an average rate at which DPCs were added
        to the logical processor's DPC queue. Each logical
        processor has its own DPC queue. This counter measures
        the rate at which DPCs are added to the queue, not the
        number of DPCs in the queue. It displays the difference
        between the values that were observed in the last two
        samples, divided by the duration of the sample interval.
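
The way a rate counter such as DPCs Queued/sec is derived can be illustrated with a short sketch. This is an illustration of the arithmetic described above, not the performance counter implementation; the function name and the sample values are hypothetical.

```python
def rate_per_second(previous_count: int, current_count: int,
                    interval_seconds: float) -> float:
    """Rate counter: difference between the last two samples of a
    running total, divided by the duration of the sample interval."""
    if interval_seconds <= 0:
        raise ValueError("sample interval must be positive")
    return (current_count - previous_count) / interval_seconds


if __name__ == "__main__":
    # Hypothetical running totals of DPCs queued on one logical processor,
    # sampled 2 seconds apart.
    first_sample = 120_000
    second_sample = 123_000
    rate = rate_per_second(first_sample, second_sample, 2.0)
    print(f"DPCs queued per second: {rate:.0f}")
```

The result describes how quickly DPCs arrive, and it says nothing about how many are waiting in the queue at any instant.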
Potential Network Problems

Network Interface(*), Network Adapter(*)
    Packets Received Discarded
    Packets Received Errors
    Packets Outbound Discarded
    Packets Outbound Errors

WFPv4, WFPv6
    Packets Discarded/sec

UDPv4, UDPv6
    Datagrams Received Errors

TCPv4, TCPv6
    Connection Failures
    Connections Reset

Network QoS Policy
    Packets dropped
    Packets dropped/sec

Per Processor Network Interface Card Activity
    Low Resource Receive Indications/sec
    Low Resource Received Packets/sec

Microsoft Winsock BSP
    Dropped Datagrams
    Dropped Datagrams/sec
    Rejected Connections
    Rejected Connections/sec

Receive Side Coalescing (RSC) performance

Network Adapter(*)
    TCP Active RSC Connections
    TCP RSC Average Packet Size
    TCP RSC Coalesced Packets/sec
    TCP RSC Exceptions/sec

Performance Tools for Network Workloads

Tuning for NTttcp
NTttcp is a Winsock-based port of ttcp to Windows. It helps measure
network driver performance and throughput on different network
topologies and hardware setups. It provides the customer with a
multithreaded, asynchronous performance workload for measuring
an achievable data transfer rate on an existing network setup.
For more information, see How to Use NTttcp to Test Network
Performance in the Windows Dev Center.
When setting up NTttcp, consider the following:

A single thread should be sufficient for optimal throughput.

Multiple threads are required only for single-to-many clients.

Posting enough user receive buffers (by increasing the value
passed to the -a option) reduces TCP copying.

You should not excessively post user receive buffers because
the first buffers that are posted would return before you need
to use other buffers.

It is best to bind each set of threads to a logical processor
(the second delimited parameter in the -m option).

Each thread creates a socket that connects to (listens on) a
different port.

Table 8. Example Syntax for NTttcp Sender and Receiver

Example syntax for a sender
    Syntax: NTttcps -m 1,0,10.1.2.3 -a 2
    Details:
        Single thread.
        Bound to CPU 0.
        Connects to a computer that uses IP 10.1.2.3.
        Posts two send-overlapped buffers.
        Default buffer size: 64 K.
        Default number of buffers to send: 20 K.

Example syntax for a receiver
    Syntax: NTttcpr -m 1,0,10.1.2.3 -a 6 -fr
    Details:
        Single thread.
        Bound to CPU 0.
        Binds on local computer to IP 10.1.2.3.
        Posts six receive-overlapped buffers.
        Default buffer size: 64 K.
        Default number of buffers to receive: 20 K.
        Posts full-length (64 K) receive buffers.

Note Make sure that you enable all offloading features on the
network adapter.

TCP/IP Window Size

For 1 Gbps adapters, the settings shown in Table 8 should provide good
throughput because NTttcp sets the default TCP window size to 64 K
through a specific socket option (SO_RCVBUF) for the
connection. This provides good performance on a low-latency
network. In contrast, for high-latency networks or for 10 Gbps
adapters, the default TCP window size value for NTttcp yields less
than optimal performance. In both cases, you must adjust the TCP
window size to allow for the larger bandwidth delay product. You can
statically set the TCP window size to a large value by using the -rb
option. This option disables TCP Window Auto-Tuning, and we
recommend using it only if you fully understand the resultant
change in TCP/IP behavior. By default, the TCP window size is set at a
sufficient value and adjusts only under heavy load or over
high-latency links.
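
The bandwidth delay product mentioned above is the link rate multiplied by the round-trip time, and it gives the window needed to keep the link full. The sketch below is illustrative only: the function names and the round-trip times are assumptions, not measurements, and the 64 K window is taken as 65,536 bytes.

```python
def bandwidth_delay_product_bytes(link_bits_per_second: int,
                                  round_trip_microseconds: int) -> int:
    """Bytes that must be in flight to keep the link busy.

    Integer arithmetic: bits/sec * microseconds / 1e6 gives bits,
    and dividing by 8 gives bytes.
    """
    return link_bits_per_second * round_trip_microseconds // 8_000_000


def window_covers_link(window_bytes: int, link_bits_per_second: int,
                       round_trip_microseconds: int) -> bool:
    """True if the window is at least the bandwidth delay product."""
    needed = bandwidth_delay_product_bytes(link_bits_per_second,
                                           round_trip_microseconds)
    return window_bytes >= needed


if __name__ == "__main__":
    window = 64 * 1024  # assumed size of the 64 K NTttcp default window
    # Hypothetical scenarios: (label, link rate in bits/sec, RTT in microseconds)
    scenarios = [
        ("1 Gbps, 0.5 ms RTT", 1_000_000_000, 500),
        ("10 Gbps, 0.5 ms RTT", 10_000_000_000, 500),
        ("1 Gbps, 10 ms RTT", 1_000_000_000, 10_000),
    ]
    for label, rate, rtt in scenarios:
        bdp = bandwidth_delay_product_bytes(rate, rtt)
        verdict = "sufficient" if window_covers_link(window, rate, rtt) else "too small"
        print(f"{label}: BDP = {bdp} bytes, 64 K window is {verdict}")
```

With these assumed round-trip times, the 64 K window covers only the low-latency 1 Gbps case, which matches the guidance that high-latency links and 10 Gbps adapters need a larger window.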

Server Performance Advisor 3.0

Microsoft Server Performance Advisor (SPA) 3.0 helps IT
administrators collect metrics to identify, compare, and diagnose
potential performance issues in a Windows Server 2012, Windows
Server 2008 R2, or Windows Server 2008 deployment. SPA generates
comprehensive diagnostic reports and charts, and it provides
recommendations to help you quickly analyze issues and develop
corrective actions.
For more information, see Server Performance Advisor 3.0 in the
Windows Dev Center.

Performance Tuning for the Storage Subsystem

Decisions about how to design or configure storage software and
hardware usually consider performance. Performance is improved or
degraded as a result of trade-offs between multiple factors such as
cost, reliability, availability, power, or ease-of-use. There are many
components involved in handling storage requests as they work their
way through the storage stack to the hardware, and trade-offs are
made between such factors at each level. File cache management,
file system architecture, and volume management translate
application calls into individual storage access requests. These
requests traverse the storage driver stack and generate streams of
commands that are presented to the disk storage subsystem. The
sequence and quantity of calls and the subsequent translation can
improve or degrade performance.
Figure 4 shows the storage architecture, which includes many
components in the driver stack.
File System Drivers: NTFS, FASTFAT
Volume Snapshot and Management Drivers: VOLSNAP, VOLMGR, VOLMGRX
Partition and Class Drivers: PARTMGR, CLASSPNP, DISK
Port Driver: STORPORT, SPACEPORT
Adapter Interface: Miniport Driver

Figure 4. Storage driver stack

The layered driver model in Windows sacrifices some performance
for maintainability and ease-of-use (in terms of incorporating drivers
of varying types into the stack). The following sections discuss
tuning guidelines for storage workloads.

Choosing Storage
The most important considerations in choosing storage systems
include:

Understanding the characteristics of current and future
storage workloads.

Understanding that application behavior is essential for
storage subsystem planning and performance analysis.

Providing necessary storage space, bandwidth, and latency
characteristics for current and future needs.

Selecting a data layout scheme (such as striping),
redundancy architecture (such as mirroring), and backup
strategy.

Using a procedure that provides the required performance
and data recovery capabilities.

Using power guidelines; that is, calculating the expected
average power required in total and per-unit volume (such as
watts per rack).

For example, when compared to 3.5-inch disks, 2.5-inch disks
have greatly reduced power requirements, but they can also be
packed more compactly into racks or servers, which can increase
cooling requirements per rack or per server chassis.
The better you understand the workloads on a specific server or set
of servers, the more accurately you can plan. The following are some
important workload characteristics:

Read vs. write ratio

Sequential vs. random access

Typical request sizes

Request concurrency, interarrival rates, and patterns of
request arrival rates

Estimating the Amount of Data to Be Stored

When you estimate how much data will be stored on a new server,
consider these issues:

How much data you will move to the new server from existing
servers

How much data you will store on the server in the future

A general guideline is to assume that growth will be faster in the
future than it was in the past. Investigate whether your organization
plans to hire many employees, whether any groups in your
organization are planning large projects that will require additional
storage, and so on.
You must also consider how much space is used by operating system
files, applications, redundancy, log files, and other factors. Table 9
describes some factors that affect server storage capacity.

Table 9. Factors that Affect Server Storage Capacity

Factor
    Required storage capacity

Operating system files
    At least 15 GB.
    To provide space for optional components, future
    service packs, and other items, plan for an additional
    3 to 5 GB for the operating system volume. A
    Windows Server installation can require even more
    space for temporary files.

Page file
    For smaller servers, 1.5 times the amount of RAM, by
    default.
    For servers that have hundreds of gigabytes of
    memory, you might be able to eliminate the page
    file; otherwise, the page file might be limited
    because of space constraints (available disk
    capacity). The benefit of a page file of larger than
    50 GB is unclear.

Memory dump
    Depending on the memory dump file option that you
    have chosen, use an amount as large as the physical
    memory plus 1 MB.
    On servers that have very large amounts of memory,
    full memory dumps become intractable because of
    the time that is required to create, transfer, and
    analyze the dump file.

Applications
    Varies according to the application.
    Example applications include backup and disk quota
    software, database applications, and optional
    components.

Log files
    Varies according to the applications that create the
    log file.
    Some applications let you configure a maximum log
    file size. You must make sure that you have enough
    free space to store the log files.

Data layout and redundancy
    Varies depending on cost, performance, reliability,
    availability, and power goals.
    For more information, see Choosing the Raid Level
    later in this guide.

Shadow copies
    10 percent of the volume, by default, but we
    recommend increasing this size based on frequency
    of snapshots and rate of disk data updates.
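
The fixed rules in Table 9 can be combined into a rough first estimate. The following is a sketch under stated assumptions, not a sizing tool: the function names are hypothetical, the 5 GB headroom is the upper end of the 3 to 5 GB range, and using 50 GB as a page file cap is an assumption drawn from the remark that larger page files have unclear benefit. Application, log, and redundancy space vary by deployment and are not modeled.

```python
def estimate_system_volume_gb(ram_gb: float,
                              full_memory_dump: bool = True,
                              os_base_gb: float = 15.0,
                              os_headroom_gb: float = 5.0,
                              page_file_cap_gb: float = 50.0) -> dict:
    """Rough space estimate (in GB) for the items in Table 9 that have fixed rules."""
    os_files = os_base_gb + os_headroom_gb
    # Default page file is 1.5 times RAM; the cap is an assumption (see lead-in).
    page_file = min(1.5 * ram_gb, page_file_cap_gb)
    # A full memory dump needs physical memory plus 1 MB.
    memory_dump = ram_gb + 1.0 / 1024.0 if full_memory_dump else 0.0
    return {
        "os_files": os_files,
        "page_file": page_file,
        "memory_dump": memory_dump,
        "total": os_files + page_file + memory_dump,
    }


def shadow_copy_reserve_gb(volume_gb: float, fraction: float = 0.10) -> float:
    """Default shadow copy reservation: 10 percent of the volume."""
    return volume_gb * fraction


if __name__ == "__main__":
    estimate = estimate_system_volume_gb(ram_gb=16)
    for item, size in estimate.items():
        print(f"{item}: {size:.2f} GB")
    print(f"shadow copies on a 500 GB volume: {shadow_copy_reserve_gb(500):.1f} GB")
```

For a server with a large amount of memory, passing full_memory_dump=False reflects the point in the table that full dumps become impractical at that scale.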

Choosing a Storage Solution

There are many considerations in choosing a storage solution that
matches the expected workload. The range of storage solutions that
are available to enterprises is immense.
Some administrators will choose to deploy a traditional storage
array, backed by SAS or SATA hard drives and directly attached or
accessed through a separately managed Fibre Channel or iSCSI
fabric. The storage array typically manages the redundancy and
performance characteristics internally. Figure 5 illustrates some
storage deployment models that are available in Windows Server
2012.

Figure 5. Storage deployment models

Alternatively, Windows Server 2012 introduces a new technology
called Storage Spaces, which provides platform storage
virtualization. This enables customers to deploy storage solutions
that are cost-efficient, highly-available, resilient, and performant by
using commodity SAS/SATA hard drives and JBOD enclosures. For
more information, see Storage Spaces later in this guide.
Table 10 describes some of the options and considerations for a
traditional storage array solution.
Table 10. Options for Storage Array Selection

Option
    Description

SAS or SATA
    These serial protocols improve performance, reduce
    cable length limitations, and reduce cost. SAS and
    SATA drives are replacing much of the SCSI market.
    In general, SATA drives are built with higher capacity
    and lower cost targets than SAS drives. The
    premium benefit associated with SAS is typically
    attributed to performance.

Hardware RAID capabilities
    For maximum performance and reliability, the
    enterprise storage controllers should offer resiliency
    capabilities. RAID levels 0, 1, 0+1, 5, and 6 are
    described in Table 11.

Maximum storage capacity
    Total usable storage space.

Storage bandwidth
    The maximum peak and sustained bandwidths at
    which storage can be accessed are determined by
    the number of physical disks in the array, the speed
    of the controllers, the type of bus protocol (such as
    SAS or SATA), the hardware-managed or
    software-managed RAID, and the adapters that are used to
    connect the storage array to the system. The more
    important values are the achievable bandwidths for
    the specific workloads to be run on servers that
    access the storage.

Hardware Array Capabilities

Most storage solutions provide some resiliency and
performance-enhancing capabilities. In particular, storage arrays may contain
varying types and capacities of caches that can serve to boost
performance by servicing reads and writes at memory speeds rather
than storage speeds. In some cases, the addition of uninterruptible
power supplies or batteries is required to keep the additional
performance from coming at a reliability cost.
A hardware-managed array is presented to the operating system as
a single drive, which can be termed a logical unit number (LUN),
virtual disk, or any number of other names for a single contiguously
addressed block storage device.
Table 11 lists some common options for the storage arrays.
Table 11. Storage Array Performance and Resiliency Options (RAID
levels)

Option
    Description

Just a bunch of disks (JBOD)
    This is not a RAID level. It provides a baseline for
    measuring the performance, reliability, availability,
    cost, capacity, and energy consumption of various
    resiliency and performance configurations. Individual
    disks are referenced separately, not as a combined
    entity.
    In some scenarios, a JBOD configuration actually
    provides better performance than striped data layout
    schemes. For example, when serving multiple lengthy
    sequential streams, performance is best when a
    single disk services each stream. Also, workloads that
    are composed of small, random requests do not
    experience performance improvements when they
    are moved from a JBOD configuration to a striped
    data layout.
    A JBOD configuration is susceptible to static and
    dynamic hot spots (frequently accessed ranges of
    disk blocks) that reduce available storage bandwidth
    due to the resulting load imbalance between the
    physical drives.
    Any physical disk failure results in data loss in a JBOD
    configuration. However, the loss is limited to the
    failed drives. In some scenarios, a JBOD configuration
    provides a level of data isolation that can be
    interpreted as offering greater reliability than striped
    configurations.

Spanning
    This is not a RAID level. It is the concatenation of
    multiple physical disks into a single logical disk. Each
    disk contains one continuous set of sequential logical
    blocks. Spanning has the same performance and
    reliability characteristics as a JBOD configuration.

Striping (RAID 0)
    Striping is a data layout scheme in which sequential
    logical blocks of a specified size (the stripe unit) are
    distributed in a circular fashion across multiple disks.
    It presents a combined logical disk that stripes disk
    accesses over a set of physical disks.
    For most workloads, a striped data layout provides
    better performance than a JBOD configuration if the
    stripe unit is appropriately selected based on server
    workload and storage hardware characteristics. The
    overall storage load is balanced across all physical
    drives.
    This is the least expensive RAID configuration
    because all of the disk capacity is available for storing
    the single copy of data.
    Because no capacity is allocated for redundant data,
    striping does not provide data recovery mechanisms
    such as those provided in the other resiliency
    schemes. Also, the loss of any disk results in data loss
    on a larger scale than a JBOD configuration because
    the entire file system or raw volume spread across n
    physical disks is disrupted; every nth block of data in
    the file system is missing.

Mirroring (RAID 1)
    Mirroring is a data layout scheme in which each
    logical block exists on multiple physical disks
    (typically two, but sometimes three in mission-critical
    environments). It presents a virtual disk that consists
    of a set of two or more mirrored disks.
    Mirroring often has worse bandwidth and latency for
    write operations when compared to striping or JBOD.
    This is because data from each write request must be
    written to a pair of physical disks. Request latency is
    based on the slowest of the two (or more) write
    operations that are necessary to update all copies of
    the updated data blocks. In more complex
    implementations, write latencies may be reduced by
    write logging or battery-backed write caching, or by
    relaxing the requirement for dual write completions
    before returning the I/O completion notification.
    Mirroring has the potential to provide faster read
    operations than striping because it can (with a
    sufficiently intelligent controller) read from the least
    busy physical disk of the mirrored pair, or the disk
    that will experience the shortest mechanical
    positioning delays.
    Mirroring is the most expensive resiliency scheme in
    terms of physical disks because half (or more) of the
    disk capacity stores redundant data copies. A
    mirrored array can survive the loss of any single
    physical disk. In larger configurations, it can survive
    multiple disk failures if the failures do not involve all
    the disks of a specific mirrored disk set.
    Mirroring has greater power requirements than a
    non-mirrored storage configuration. It doubles the number
    of disks; therefore, it doubles the required amount of
    idle power. Also, mirroring performs duplicate write
    operations that require twice the power of
    non-mirrored write operations.
    In the simplest implementations, mirroring is the
    fastest of the resiliency schemes in terms of recovery
    time after a physical disk failure. Only a single disk
    (the other part of the broken mirror pair) must
    participate in bringing up the replacement drive. The
    second disk is typically still available to service data
    requests throughout the rebuilding process. In more
    complex implementations, multiple drives may
    participate in the recovery phase to help spread out
    the load for the duration of the rebuild.

Striped
mirroring
(RAID 0+1 or
10)

The combination of striping and mirroring is intended
to provide the performance benefits of striping and
the redundancy benefits of mirroring.
The cost and power characteristics are similar to
those of mirroring.

Rotated
parity or
parity disks
(RAID 5)

An array with rotated parity (denoted as RAID 5 for
expediency) presents a logical disk that is composed
of multiple physical disks that have data striped
across the disks in sequential blocks (stripe units) in a
manner similar to simple striping (RAID 0). However,
the underlying physical disks have parity information
spread throughout the disk array, as in the example
shown in Figure 6.
For read requests, RAID 5 has characteristics that
resemble those of striping. However, small RAID 5
writes are much slower than those of other resiliency
schemes because each parity block that corresponds
to the modified data block(s) must also be updated.
This process requires three additional disk requests in
the simplest implementation, regardless of the size of
the array. Each small write requires two reads (old
data and old parity) and two writes (new data and
new parity). Because multiple physical disk requests
are generated for every logical write, bandwidth is
reduced by up to 75 percent.
RAID 5 arrays provide data recovery capabilities
because data can be reconstructed from the parity.
Such arrays can survive the loss of any one physical
disk, as opposed to mirroring, which can survive the
loss of multiple disks if the mirrored pair (or triplet) is
not lost.
RAID 5 requires additional time to recover from a lost
physical disk compared to mirroring because the data
and parity from the failed disk can be re-created only
by reading all the other disks in their entirety. In a
basic implementation, performance during the
rebuilding period is severely reduced due to the
rebuilding traffic and because the reads and writes
that target the data that was stored on the failed disk
must read all the disks (an entire stripe) to recreate the missing data. More complex
implementations incorporating multiple arrays may
take advantage of more parallelism from other disks
to help speed up recovery time.
RAID 5 is more cost efficient than mirroring because it
requires only an additional single disk per array,
instead of double (or more) the total number of disks
in an array.
Power guidelines: RAID 5 might consume more or less
energy than a mirrored configuration, depending on
the number of drives in the array, the characteristics
of the drives, and the characteristics of the workload.
RAID 5 might use less energy if it uses significantly
fewer drives. The additional disk adds to the required
amount of idle power as compared to a JBOD array,
but it requires less additional idle power versus a full
mirrored set of drives. However, RAID 5 requires four
accesses for every random write request (in the basic
implementation) to read the old data, read the old
parity, compute the new parity, write the new data,
and write the new parity.
This means that the power needed beyond idle to
perform the write operations is up to four times that
of a JBOD configuration or two times that of a
mirrored configuration.

(Depending on the workload, there may be only two
seeks, not four, that require moving the disk
actuator.) Thus, although unlikely in most
configurations, RAID 5 might have greater energy
consumption. This might happen if a heavy workload
is being serviced by a small array or an array of disks
with idle power that is significantly lower than their
active power.

Double
rotated
parity, or
double
parity disks
(RAID 6)

Traditional RAID 6 is basically RAID 5 with additional
redundancy built in. Instead of a single block of parity
per stripe of data, two blocks of redundancy are
included. The second block uses a different
redundancy code (instead of parity), which enables
data to be reconstructed after the loss of any two
disks. More complex implementations may take
advantage of algorithmic or hardware optimizations
to reduce the overhead that is associated with
maintaining the extra redundant data.
For power and performance, the same general
statements that were made for RAID 5 apply to
RAID 6, but to a greater degree.

Rotated redundancy schemes (such as RAID 5 and RAID 6) are the
most difficult to understand and plan for. Figure 6 shows a RAID 5
example, where the sequence of logical blocks presented to the host
is A0, B0, C0, D0, A1, B1, C1, E1, and so on.

Figure 6. RAID 5 overview
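
The read-modify-write sequence described for small RAID 5 writes, and the way a lost block is reconstructed from parity, can be illustrated with a minimal Python sketch. It assumes simple XOR parity over one stripe of a hypothetical five-disk array; the block contents and the helper name xor_blocks are illustrative only and do not come from any particular controller implementation.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# One stripe on a hypothetical five-disk array: four data blocks plus parity.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
parity = xor_blocks(*data)

# Small write to block 1: read old data and old parity (two reads),
# then write new data and new parity (two writes).
old_data, new_data = data[1], b"\xaa\xbb"
new_parity = xor_blocks(parity, old_data, new_data)
data[1] = new_data

# The shortcut yields the same parity as recomputing over the whole stripe.
assert new_parity == xor_blocks(*data)

# Rebuild after losing the disk that held block 2: XOR the parity with
# every surviving data block in the stripe.
survivors = [block for index, block in enumerate(data) if index != 2]
rebuilt = xor_blocks(new_parity, *survivors)
assert rebuilt == data[2]
```

The rebuild step touches every surviving block in the stripe, which is why reconstruction requires reading all the other disks.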

Choosing the Right Resiliency Scheme

Each RAID level involves a trade-off between the following factors:

Performance

Reliability

Availability

Cost

Capacity

Power

To determine the best array configuration for your servers, evaluate
the Read and Write loads of all data types and then decide how
much you can spend to achieve the performance, availability, and
reliability that your organization requires. Table 12 describes
common configurations and their relative performance, reliability,
availability, cost, capacity, and energy consumption.
Table 12. RAID Trade-Offs
Configuration: JBOD

Performance
Pros:
Concurrent sequential streams to separate disks
Cons:
Susceptibility to load imbalance

Reliability
Pros:
Data isolation; single loss affects one disk
Cons:
Data loss after one failure

Availability
Pros:
Single loss does not prevent access to other disks

Cost, capacity, and power
Pros:
Minimum cost
Minimum power

Configuration: Striping (RAID 0)
Requirements: Two-disk minimum

Performance
Pros:
Balanced load
Potential for better response times, throughput, and concurrency
Cons:
Difficult stripe unit size choice

Reliability
Cons:
Data loss after one failure
Single loss affects the entire array

Availability
Cons:
Single loss prevents access to entire array

Cost, capacity, and power
Pros:
Minimum cost
Minimum power

Configuration: Mirroring (RAID 1)
Requirements: Two-disk minimum

Performance
Pros:
Two data sources for every read request (up to 100% performance improvement)
Cons:
Writes must update all mirrors (simplest implementation)

Reliability
Pros:
Single loss and often multiple losses (in large configurations) are survivable

Availability
Pros:
Single loss and often multiple losses (in large configurations) do not prevent access

Cost, capacity, and power
Cons:
Twice the cost of RAID 0 or JBOD
Up to twice the power

Configuration: Striped mirroring (RAID 0+1 or 10)
Requirements: Four-disk minimum

Performance
Pros:
Two data sources for every read request (up to 100% performance improvement)
Balanced load
Potential for better response times, throughput, and concurrency
Cons:
Writes must update mirrors
Difficult stripe unit size choice

Reliability
Pros:
Single loss and often multiple losses (in large configurations) are survivable

Availability
Pros:
Single loss and often multiple losses (in large configurations) do not prevent access

Cost, capacity, and power
Cons:
Twice the cost of RAID 0 or JBOD
Up to twice the power

Configuration: Rotated parity or parity disks (RAID 5)
Requirements: One additional disk required; three-disk minimum

Performance
Pros:
Balanced load
Potential for better read response times, throughput, and concurrency
Cons:
Up to 75% write performance reduction because of Read-Modify-Write
Decreased read performance in failure mode
All sectors must be read for reconstruction; potential major slowdown
Danger of data in invalid state after power loss and recovery if not carefully implemented

Reliability
Pros:
Single loss survivable; active write requests might still become corrupted
Cons:
Multiple losses affect entire array
After a single loss, array is vulnerable until reconstructed

Availability
Pros:
Single loss does not prevent access
Cons:
Multiple losses prevent access to entire array
To speed reconstruction, application access might be slowed or stopped

Cost, capacity, and power
Pros:
Only one more disk to power
Cons:
Up to four times the power for write requests (excluding the idle power)

Configuration: Multiple rotated parity or double parity disks (RAID 6)
Requirements: Two additional disks required; five-disk minimum

Performance
Pros:
Balanced load
Potential for better read response times, throughput, and concurrency
Cons:
Up to 83% write performance reduction because of multiple RMW
Decreased read performance in failure mode
All sectors must be read for reconstruction; potential for major slowdown
Danger of data in invalid state after power loss and recovery if not carefully implemented

Reliability
Pros:
Single loss survivable; active write requests might still be corrupted
Cons:
More than two losses affect entire array
After two losses, an array is vulnerable until reconstructed

Availability
Pros:
Single loss does not prevent access
Cons:
More than two losses prevent access to entire array
To speed reconstruction, application access might be slowed or stopped

Cost, capacity, and power
Pros:
Only two more disks to power
Cons:
Up to six times the power for write requests (excluding the idle power)
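
The write-performance figures in Table 12 for the parity schemes follow from the number of physical disk requests that each small logical write generates. The Python sketch below is a rough model only: it assumes the basic read-modify-write implementation (four requests for RAID 5, six for RAID 6) and that every physical request costs the same. The function name write_reduction is hypothetical.

```python
def write_reduction(disk_requests_per_write: int) -> float:
    """Fraction of small-write bandwidth lost when one logical write
    expands into the given number of physical disk requests."""
    return 1 - 1 / disk_requests_per_write


# Basic read-modify-write: RAID 5 reads old data and old parity, then
# writes both (4 requests); RAID 6 maintains two redundancy blocks (6).
for scheme, requests in [("RAID 5", 4), ("RAID 6", 6)]:
    print(f"{scheme}: {requests} disk requests per small write, "
          f"up to {write_reduction(requests):.0%} reduction")
```

The same request counts account for the "up to four times" and "up to six times" power figures for write requests.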

The following are sample uses for various RAID levels:

JBOD configuration: Concurrent video streaming

Striping (RAID 0): Temporary or reconstructable data,
workloads that can develop hot spots in the data, and
workloads with high degrees of unrelated concurrency

Mirroring (RAID 1): Database logs, critical data, and
concurrent sequential streams

Striped mirroring (RAID 0+1): A general purpose combination
of performance and reliability for critical data, workloads with
hot spots, and high-concurrency workloads

Rotated parity or parity disks (RAID 5): Web pages,
semicritical data, workloads without small writes, scenarios in
which capital and operating costs are an overriding factor,
and read-dominated workloads

Multiple rotated parity or double parity disks (RAID 6): Data
mining, critical data (assuming quick replacement or hot
spares), workloads without small writes, scenarios in which
cost or power is a major factor, and read-dominated
workloads. RAID 6 might also be appropriate for massive
datasets, where the cost of mirroring is high and double-disk
failure is a real concern (due to the time required to complete
an array parity rebuild for disk drives greater than 1 TB).

If you use more than two disks, striped mirroring is usually a better
solution than only mirroring.
To determine the number of physical disks that you should include in
an array, consider the following information:

Bandwidth (and often response time) improves as you add
disks.

Reliability (in terms of mean time to failure for the array)
decreases as you add disks.

Usable storage capacity increases as you add disks, but so
does cost.

For striped arrays, the trade-off is between data isolation
(small arrays) and better load balancing (large arrays). For
mirrored arrays, the trade-off is between better cost per
capacity (for basic mirrors, which is a depth of two physical
disks) and the ability to withstand multiple disk failures (for
depths of three or four physical disks). Read and Write
performance issues can also affect mirrored array size. For
arrays with rotated parity (RAID 5), the trade-off is between
better data isolation and mean time between failures (MTBF)
for small arrays, versus better cost, capacity, and power for
large arrays.

Because hard disk failures are not independent, array sizes
must be limited when the array is made up of actual physical
disks (that is, a bottom-tier array). The exact amount of this
limit is very difficult to determine.
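
The capacity side of these trade-offs can be estimated with a short Python sketch. It assumes equal-size disks and ignores hot spares and metadata overhead; the function name usable_disks and the scheme labels are hypothetical and chosen only for this illustration.

```python
def usable_disks(scheme: str, disks: int, copies: int = 2) -> int:
    """Number of disks' worth of usable capacity in one array."""
    if scheme in ("jbod", "raid0"):
        return disks                 # no capacity spent on redundancy
    if scheme in ("raid1", "raid10"):
        return disks // copies       # every block is stored `copies` times
    if scheme == "raid5":
        return disks - 1             # one disk's worth of parity
    if scheme == "raid6":
        return disks - 2             # two disks' worth of redundancy
    raise ValueError(f"unknown scheme: {scheme}")


# Compare the schemes for a hypothetical 8-disk array.
for scheme in ("raid0", "raid10", "raid5", "raid6"):
    print(f"{scheme}: {usable_disks(scheme, 8)} of 8 disks usable")
```

Larger parity arrays spend a smaller share of their disks on redundancy, which is the cost and capacity advantage weighed against data isolation and MTBF.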

The following is the array size guideline when no hardware
reliability data is available:

Bottom-tier RAID 5 arrays should not extend beyond a single
desk-side storage tower or a single row in a rack-mount
configuration. This means approximately 8 to 14 physical
disks for 3.5-inch storage enclosures. Smaller 2.5-inch disks
can be racked more densely; therefore, they might require
being divided into multiple arrays per enclosure.

Bottom-tier mirrored arrays should not extend beyond two
towers or rack-mount rows, with data being mirrored between
towers or rows when possible. These guidelines help avoid or
reduce the decrease in time between catastrophic failures
that is caused by using multiple buses, power supplies, and
so on from separate storage enclosures.

Selecting a Stripe Unit Size

Hardware-managed arrays allow stripe unit sizes ranging from 4 KB
to more than 1 MB. The ideal stripe unit size maximizes the disk
activity without unnecessarily breaking up requests by requiring
multiple disks to service a single request. For example, consider the
following:

One long stream of sequential requests on a JBOD
configuration uses only one disk at a time. To keep all striped
disks in use for such a workload, the stripe unit should be no
larger than 1/n of the request size, where n is the number of
disks in the array.

For n streams of small serialized random requests, if n is
significantly greater than the number of disks and if there are
no hot spots, striping does not increase performance over a
JBOD configuration. However, if hot spots exist, the stripe unit
size must maximize the possibility that a request will not be
split while it minimizes the possibility of a hot spot falling
entirely within one or two stripe units. You might choose a low
multiple of the typical request size, such as five times or ten
times, especially if the requests are aligned on some
boundary (for example, 4 KB or 8 KB).

If requests are large, and the average or peak number of
outstanding requests is smaller than the number of disks, you
might need to split some requests across disks so that all
disks are being used. You can interpolate an appropriate
stripe unit size from the previous two examples. For example,
if you have 10 disks and 5 streams of requests, split each
request in half (that is, use a stripe unit size equal to half the
request size). Note that this assumes some consistency in
alignment between the request boundaries and the stripe
unit boundaries.

Optimal stripe unit size increases with concurrency and
typical request sizes.

Optimal stripe unit size decreases with sequentiality and with
good alignment between data boundaries and stripe unit
boundaries.
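
The interpolation described for large requests with low concurrency can be sketched in Python. This is only a starting-point heuristic, assuming that requests are aligned to stripe unit boundaries; the function name suggested_stripe_unit is hypothetical. It does not cover the small random request case, where a low multiple of the request size is suggested instead.

```python
def suggested_stripe_unit(request_bytes: int, disks: int, outstanding: int) -> int:
    """Split each request across just enough disks to keep all disks busy."""
    disks_per_request = max(1, disks // outstanding)
    return request_bytes // disks_per_request


ONE_MIB = 1024 * 1024

# 10 disks and 5 streams: each request is split in half.
print(suggested_stripe_unit(ONE_MIB, disks=10, outstanding=5))

# One sequential stream on 8 disks: each request spans all 8 disks.
print(suggested_stripe_unit(ONE_MIB, disks=8, outstanding=1))
```

When the number of outstanding requests meets or exceeds the number of disks, the function returns the full request size, meaning that requests are not split.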

Determining the Volume Layout

Placing individual workloads into separate volumes has advantages.
For example, you can use one volume for the operating system or
paging space and one or more volumes for shared user data,
applications, and log files. The benefits include fault isolation, easier
capacity planning, and easier performance analysis.
You can place different types of workloads into separate volumes on
different physical disks. Using separate disks is especially important
for any workload that creates heavy sequential loads (such as log
files), where a single set of physical disks can be dedicated to
handling the updates to the log files. Placing the page file on a
separate virtual disk might provide some improvements in
performance during periods of high paging.
There is also an advantage to combining workloads on the same
physical disks, if the disks do not experience high activity over the
same time period. This is basically the partnering of hot data with
cold data on the same physical drives.
The first partition on a volume that is utilizing hard disks usually
uses the outermost tracks of the underlying disks, and therefore it
provides better performance. Obviously, this guidance does not
apply to solid-state storage.

Choosing and Designing Storage Tiers

With the cost of solid-state devices dropping, it is important to
consider including multiple tiers of devices in a storage
deployment to achieve a better balance between performance, cost,
and energy consumption. Traditional storage arrays offer the ability
to aggregate and tier heterogeneous storage, but Storage Spaces
provides a more robust implementation.

Storage Spaces
Windows Server 2012 introduces a new technology called Storage
Spaces, which provides flexible configuration options and supports a
range of hardware choices, while providing sophisticated storage
features similar to those that were previously available only
with more traditional storage solutions.

Figure 7. Storage Spaces deployment model

Storage Spaces Configuration Options

Storage Spaces provides multiple configuration options that can
enhance performance when a storage space is created. These
include the stripe unit size and number of columns to utilize in a
storage space.
Storage Spaces allows a stripe unit size of 16 KB to 16 MB, with the
default being 256 KB. Storage Spaces also provides tiering support.
It gives administrators control over data placement or tiering at the
Space granularity. If a workload requires high-performance storage,
the space can be allocated from solid-state drives or 15 K RPM disks.
Similarly, if the workload will have less intensive access (such as
archiving), the space can be allocated from near-line disks. This level
of tiering allows organizations to easily lower costs by only
purchasing hardware that is necessary for the workload.
A storage space can also be configured at creation time with the
number of columns that constitute the space. The number of
columns corresponds to the number of physical disks among which
to stripe data. For a mirrored storage space, the number of physical
disks to use is equal to the number of copies (2 or 3) multiplied by
the number of columns specified. For example, 4 columns on a
two-way mirror will utilize 8 physical disks. By default, Storage
Spaces will try to use 4 columns per copy in a two-way mirrored
space if sufficient disks are available, but it will reduce that
number as necessary. The number of columns can range from 1 to 128
per copy. The following Windows PowerShell command can be used to
configure the stripe unit size and number of columns when creating a
storage space:
New-VirtualDisk -Interleave (XKB) -NumberOfColumns Y
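
The relationship between columns, copies, and physical disks can be checked with a small Python sketch. The function physical_disks_used restates the rule given above. The function default_columns is only a simplified assumption about how the default of 4 columns might be reduced when fewer disks are available; it is not the actual Storage Spaces algorithm.

```python
def physical_disks_used(columns: int, copies: int = 1) -> int:
    """Disks that a space stripes across: columns multiplied by data copies."""
    if not 1 <= columns <= 128:
        raise ValueError("columns must be between 1 and 128 per copy")
    return columns * copies


def default_columns(available_disks: int, copies: int = 2, preferred: int = 4) -> int:
    """Assumed fallback: use the preferred column count if enough disks exist."""
    return max(1, min(preferred, available_disks // copies))


# 4 columns on a two-way mirror stripe across 8 physical disks.
print(physical_disks_used(columns=4, copies=2))

# With only 6 disks, this model falls back to 3 columns per copy.
print(default_columns(available_disks=6, copies=2))
```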

Figure 8. Tiered Storage Spaces on different types of media

Deployment Elements: A New Unit of Scale

Storage Spaces also enables a new unit of scale in Windows Server
2012, called Deployment Elements. Combining the virtualization
capabilities of Storage Spaces, failover clustering, and cluster shared
volumes (CSV), enterprises can deploy a resilient, performant, and
elastic solution that can scale easily and quickly by using simple,
cost-effective building blocks.

Figure 9. Deployment elements

Deployment Elements consist of 2 to 4 servers attached to shared
SAS JBOD configurations. When connected to each other by a
high-speed, low-latency network, Deployment Elements can be
combined into a CSV cluster consisting of up to 64 servers,
supporting up to 4,000 virtual machines.

Figure 10. Components and configurations for deployment elements

Storage Spaces provides resiliency options similar to those of
standalone array solutions, as described in the following table.
Option
Description

Striped spaces

Striping is a data layout scheme in which sequential
logical blocks of a specified size (the stripe unit) are laid
out in a circular fashion across multiple disks. It presents
a combined logical disk that stripes access over a set of
physical disks. The overall storage load is balanced
across all physical drives.

Mirrored spaces

Mirroring is a data layout scheme in which each logical
block exists on multiple physical disks. It presents a
logical virtual disk that consists of a set of two or more
mirrored disks. Mirrored spaces can be configured to be
resilient to at least one (two-way mirror) or two (three-way
mirror) concurrent physical disk failures. A mirrored space
may survive the loss of even more disks if all copies of a
stripe unit are not simultaneously lost.

Parity spaces

Parity spaces present a logical disk that is composed of
multiple physical disks that have data striped across the
disks in stripe units. However, the underlying physical
disks have parity information spread throughout the disk
array.
Parity spaces tend to have lower write performance than
mirrored spaces, because each parity block that
corresponds to the modified data block must also be
updated. This process requires additional disk requests;
however, parity spaces tend to be more space-efficient
than mirrored spaces.
Parity spaces provide data recovery capabilities because
data can be reconstructed from the parity. Parity spaces
can survive the loss of any one physical disk. Parity is
more cost efficient than mirroring because it requires only
one additional disk per virtual disk, instead of double (or
triple) the total number of disks in an array as with
mirroring.

Storage-Related Parameters and Performance Counters

This section describes performance counters that you can use for
workload characterization and capacity planning and to identify
potential bottlenecks.

I/O Priorities
Windows Server 2012, Windows Server 2008 R2, and Windows
Server 2008 can specify an internal priority level on individual I/Os.
The Windows Server operating system primarily uses this ability to
lower the priority of background I/O activity and to give precedence
to response-sensitive I/Os (for example, multimedia). However,
extensions to file system APIs let applications specify I/O priorities
per handle. The storage stack logic to sort out and manage I/O
priorities has overhead, so if some disks will be targeted by only a
single priority of I/Os (such as SQL Server database disks), you can
improve performance by disabling the I/O priority management for
those disks by setting the following registry entry to zero:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\DeviceClasses
\{Device GUID}\DeviceParameters\Classpnp\IdlePrioritySupported

Logical Disks and Physical Disks

On servers that have heavy I/O workloads, you should enable the
disk counters on a sampling basis or in specific scenarios to diagnose
storage-related performance issues. Continuously monitoring disk
counters can incur up to a few percent CPU overhead penalty.
The same counters are valuable in the logical disk and the physical
disk counter objects. Logical disk statistics are tracked at the volume
level, and physical disk statistics are tracked by the partition
manager.
The following counters are exposed through volume and partition
managers:

% Idle Time

This counter is of little value when multiple physical drives are
behind logical disks. Imagine a subsystem of 100 physical drives
presented to the operating system as five disks, each backed by
a 20-disk RAID 0+1 array. Now imagine that the administrator
spans the five disks to create one logical disk (volume x). One
can assume that any serious system that needs that many
physical disks has at least one outstanding request to volume x
at any given time. This makes the volume appear to be 0% idle,
when in fact the 100-disk array could be up to 99% idle with only
a single request outstanding.
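
The arithmetic behind this example can be written as a small Python sketch. It assumes that each outstanding request keeps at most one physical disk busy at a time; the function name max_idle_fraction is hypothetical.

```python
def max_idle_fraction(physical_disks: int, outstanding_requests: int) -> float:
    """Upper bound on the share of physical disks that are idle."""
    busy = min(physical_disks, outstanding_requests)
    return (physical_disks - busy) / physical_disks


# One outstanding request on a 100-disk subsystem: the volume reports
# 0% idle, yet up to 99% of the physical disks are doing nothing.
print(f"{max_idle_fraction(100, 1):.0%}")
```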

% Disk Time, % Disk Read Time, % Disk Write Time

The % Disk Time counter is nothing more than the Avg. Disk
Queue Length counter multiplied by 100. It is the same value
displayed in a different scale.
If the Avg. Disk Queue Length is equal to 1, the % Disk Time will
equal 100. If the Avg. Disk Queue Length is 0.37, then the % Disk
Time will be 37. So if the Avg. Disk Queue length value is greater
than 1, the % Disk Time will be greater than 100%. The same
logic applies to the % Disk Read Time and % Disk Write Time
counters. Their data comes from the Avg. Disk Read Queue
Length and Avg. Disk Write Queue Length counters, respectively.

Average Disk Bytes / { Read | Write | Transfer }

This counter collects average, minimum, and maximum request
sizes. If possible, you should observe individual or sub-workloads
separately. You cannot differentiate multimodal distributions by
using average values if the request types are consistently
interspersed.

Average Disk Queue Length, Average Disk { Read | Write } Queue Length

These counters collect concurrency data, including peak loads
and workloads that contain significant bursts. These counters
represent the number of requests that are active below the driver
that takes the statistics. This means that the requests are not
necessarily queued; they could actually be in service or
completed and on the way back up the path. Possible active
locations include the following:

Waiting in an ATA port queue or a Storport queue

Waiting in a queue in a miniport driver

Waiting in a disk controller queue

Waiting in an array controller queue

Waiting in a hard disk queue (that is, on board a physical disk)

Actively receiving service from a physical disk

Completed, but not yet back up the stack to where the
statistics are collected

It is important to note that these values are not strictly accurate;
rather, they are derived values. Avg. Disk Queue Length is equal
to (Disk Transfers/sec) * (Disk sec/Transfer). This is based on
Little's Law from the mathematical theory of queues. The same
derivation is performed for the Read and Write versions of this
counter. The main concern for interpreting these values is that
they make the assumption that the number of outstanding
requests is the same at the start and the end of each sampling
interval.
For guidelines, see Queue Lengths later in this guide.
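
The derivation above, together with the % Disk Time scaling described earlier, can be reproduced with a few lines of Python. The sample values (500 transfers per second at 4 ms each) are made up for illustration, and the function names are hypothetical.

```python
def avg_disk_queue_length(transfers_per_sec: float, sec_per_transfer: float) -> float:
    """Little's Law: average requests in flight = arrival rate * time in system."""
    return transfers_per_sec * sec_per_transfer


def pct_disk_time(avg_queue_length: float) -> float:
    """% Disk Time is the average queue length shown on a different scale."""
    return avg_queue_length * 100


# Hypothetical sample: 500 transfers/sec with 4 ms average response time.
queue = avg_disk_queue_length(500, 0.004)
print(f"Avg. Disk Queue Length: {queue:.2f}")
print(f"% Disk Time: {pct_disk_time(queue):.0f}")
```

Because the queue length in this sample is greater than 1, the derived % Disk Time exceeds 100.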

Average Disk second / {Read | Write | Transfer}

These counters collect disk request response time data and
possibly extrapolate service time data. They are probably the
most straightforward indicators of storage subsystem
bottlenecks. If possible, you should observe individual or
subworkloads separately. You cannot differentiate multimodal
distributions by using Perfmon if the requests are consistently
interspersed.
For guidelines, see Response Times later in this guide.

Current Disk Queue Length

This counter measures the instantaneous number of active requests;
therefore, it is subject to extreme variance. This counter is of
limited use except to check for the existence of many short
bursts of activity or to validate specific instances of the Average
Disk Queue Length counter values. (As described earlier, these
values are derived rather than measured, and they rely on the
number of outstanding requests being equal at the start and end
of each sampling interval.)

Disk Bytes / second, Disk {Read | Write } Bytes / second

This counter collects throughput data. If the sample time is long
enough, a histogram of the array's response to specific loads
(queues, request sizes, and so on) can be analyzed. If possible,
you should observe individual or subworkloads separately.

Disk {Reads | Writes | Transfers } / second

This counter collects throughput data. If the sample time is long
enough, a histogram of the array's response to specific loads
(queues, request sizes, and so on) can be analyzed. If possible,
you should observe individual or subworkloads separately.

Split I/O / second

This counter measures the rate of high-level I/Os split into
multiple low-level I/Os due to file fragmentation. The value is a
concern only if it is statistically significant in comparison to the
disk I/O rate. If it becomes significant, in terms of split I/Os per
second per physical disk, further investigation could be needed to
determine the size of the original requests that are being split
and the workload that is generating them.
Note If the standard stacked drivers scheme in Windows is
circumvented for a controller, monolithic drivers can assume
the role of partition manager or volume manager. If so, the
monolithic driver writer must supply the counters listed earlier
through the Windows Management Instrumentation (WMI)
interface, or the counters will not be available.

Processor Information

% DPC Time, % Interrupt Time, % Privileged Time

If the interrupt time and deferred procedure call (DPC) time are a
large part of privileged time, the kernel is spending a long time
processing I/Os. Sometimes, it is best to keep interrupts and DPCs
affinitized to only a few CPUs on a multiprocessor system to
improve cache locality. At other times, it is best to distribute the
interrupts and DPCs among many CPUs to prevent the interrupt
and DPC activity from becoming a bottleneck on individual CPUs.

DPCs Queued / second

This counter is another measurement of how DPCs are using CPU
time and kernel resources.

Interrupts / second

This counter is another measurement of how interrupts are using
CPU time and kernel resources. Modern disk controllers often
combine or coalesce interrupts so that a single interrupt
processes multiple I/O completions. Of course, there is a trade-off
between delaying interrupts (and therefore completions) and
amortizing CPU processing time.

Power Protection and Advanced Performance Option

The following two performance-related options for every disk are
located under Disk > Properties > Policies:

Enable write caching

Enable an advanced performance mode that assumes the
storage is protected against power failures

Enable write caching means that the storage hardware can indicate
to the operating system that a write request is complete, even
though the data has not been flushed from the volatile intermediate
hardware cache(s) to its final nonvolatile storage location. As a
result, there is a period of time during which a power failure or other
catastrophic event could result in data loss. However, this period is
typically fairly short because write caches in the storage hardware
are usually flushed during any idle period. Cache flushes
are also requested frequently by the operating system, NTFS, or
some applications, to explicitly force writes to be written to the final
storage medium in a specific order. Alternatively, hardware time-outs
at the cache level might force dirty data out of the caches.
Other than cache flush requests, the only means of synchronizing
writes is to tag them as write-through. Storage hardware is
supposed to guarantee that write-through request data has reached
nonvolatile storage (such as magnetic media on a disk platter)
before it indicates a successful request completion to the operating
system. Some commodity disks or disk controllers may not honor
write-through semantics. In particular, SATA and USB storage
components may not support the ForceUnitAccess flag that is used
to tag write-through requests in the hardware. Enterprise storage
subsystems typically use battery-backed write caching or use
SAS/SCSI/FC hardware to correctly maintain write-through semantics.
In Windows Server 2012, NTFS exclusively uses cache flushes to
protect its metadata.

The advanced performance disk policy option is available only
when write caching is enabled. This option strips all write-through
flags from disk requests and removes all flush-cache commands from
the request stream. If you have power protection (such as an
uninterruptible power supply, or UPS) for all hardware write caches
along the I/O path, you do not need to worry about write-through
flags and cache flushes. By definition, any dirty data that resides in a
power-protected write cache is safe, and the writes appear to the
software to have occurred in order. If power is lost to the final storage
location while the data is being flushed from a write cache, the
cache manager can retry the write operation after power has been
restored to the relevant storage components.

Block Alignment (DISKPART)

NTFS aligns its metadata and data clusters to partition boundaries in
increments of the cluster size (which is selected during file system
creation or set by default to 4 KB). In releases of the Windows Server
operating system earlier than Windows Server 2008, the partition
boundary offset for a specific disk partition could be misaligned
when it was compared to array disk stripe unit boundaries. This
caused small requests to be unintentionally split across multiple
disks. To force alignment, you were required to use diskpar.exe or
DiskPart.exe at the time the partition was created.
In Windows Server 2012, Windows Server 2008 R2, and Windows
Server 2008, partitions are created by default with a 1 MB offset,
which provides good alignment for the power-of-two stripe unit sizes
that are typically found in hardware. If the stripe unit size is set to a
size that is greater than 1 MB, alignment is much less of a concern
because small requests rarely cross large stripe unit
boundaries.
Note Windows Server 2012, Windows Server 2008 R2, and
Windows Server 2008 default to a 64 KB offset if the disk is smaller
than 4 GB.
If alignment is still an issue, even with the default offset, you can use
DiskPart.exe to force alternative alignments when you create a
partition.
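
The effect of the partition offset on request splitting can be checked with simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the 63-sector (31.5 KB) legacy offset and the 64 KB stripe unit are example values that this guide does not specify.

```python
KB = 1024
MB = 1024 * KB


def is_aligned(partition_offset: int, stripe_unit: int) -> bool:
    """True if the partition starts on a stripe unit boundary (bytes)."""
    return partition_offset % stripe_unit == 0


def stripe_units_touched(partition_offset: int, stripe_unit: int,
                         request_offset: int, request_size: int) -> int:
    """Count the stripe units that one request spans.

    request_offset is relative to the start of the partition. A result
    of 1 means the request is serviced by a single disk; a larger value
    means the request is split across stripe units.
    """
    first_byte = partition_offset + request_offset
    last_byte = first_byte + request_size - 1
    return last_byte // stripe_unit - first_byte // stripe_unit + 1


if __name__ == "__main__":
    legacy_offset = 63 * 512  # example misaligned offset (31.5 KB)
    default_offset = 1 * MB   # default offset described in this section
    for offset in (legacy_offset, default_offset):
        print(offset, is_aligned(offset, 64 * KB),
              stripe_units_touched(offset, 64 * KB, 32 * KB, 4 * KB))
```

In this example, a 4 KB request at partition-relative offset 32 KB touches two stripe units with the misaligned offset and only one with the 1 MB offset.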

Solid-State Drives
Previously, the cost of large quantities of nonvolatile memory used
as block storage was prohibitive for most server configurations.
Exceptions included aerospace or military applications in which the
high shock and vibration tolerance of solid-state storage is highly
desirable.
As the cost of flash memory continues to decrease, new hierarchies
of storage become feasible, where nonvolatile memory (NVM) is
used to improve the storage subsystem response time on servers.
The typical vehicle for incorporating NVM in a server is the solid-state
drive (SSD). One cost-effective strategy is to place only the
hottest data of a workload into nonvolatile memory. In Windows
Server 2012, as in previous versions of Windows Server, this partitioning
of data can be performed only by applications that store data on an SSD;
Windows operating systems do not try to dynamically determine
what data should optimally be stored on SSDs versus rotating media.
There are emerging non-Microsoft storage management products, in
software and hardware+software combinations, which allow data
placement and migration to be performed without human
intervention. This is called tiering.
Choosing suitable SSDs to complement the hierarchy is a tradeoff
between the cost of the additional storage layer, endurance of the
media (with associated servicing costs), improved responsiveness of
the system, and improved energy efficiency. Current server SSDs are
designed around one or more types of flash memory. Some
important flash memory characteristics include:

Cost per capacity is orders of magnitude higher than for
rotational media, while SSD access times are 2-3 orders of
magnitude better for random I/Os.

Read latency is substantially lower than write latency.

Media lifetime is limited by the number of erase and write
operations.

SSDs are inherently parallel devices, and they operate better
with longer I/O queues.

Power consumption of an SSD may spike with load, especially
for random workloads. Contrary to many claims, an SSD's
power efficiency needs to be evaluated relative to the load,
and it is not always superior to that of a hard disk drive.

SSDs are commonly grouped by their designers along the axes of
performance (especially latency), price per capacity, and endurance.
Server grade SSDs are designed to withstand substantially higher I/O
pressure by overprovisioning flash media, which increases cost per
capacity. Therefore, server SSDs can support much higher random
write IOPS rates, while being marginally better at read rates than
desktop SSDs. Throughput characteristics of SSDs are improving with
time, with the rapid rate of changes currently in the industry making
it crucial to consult up-to-date performance, power, and reliability
comparison data.
Due to superior media performance, the interconnect to the SSD
plays an important role. While most of the client and entry-level
server designs settled on using SATA interconnect (SATA version 3.0
becoming dominant recently) as the most compatible, better
operation characteristics may be achieved by using newer, PCIe-based
interconnect schemes. PCIe SSDs bring substantial
advantages, like more scalable throughput, lower latency, and, in
some cases, improved energy efficiency relative to throughput. One
of the drawbacks of this technology is its relative immaturity,
creating vertically integrated silos based on proprietary driver
stacks. To evaluate PCIe SSDs, it is important to test complete
combinations of the storage device, server platform, and software
layer, including the impact of reliability and maintainability on the
total cost of ownership (TCO).

Trim and Unmap Capabilities

Windows Server 2012 provides storage allocation transparency to
storage devices, including traditional storage arrays, hard disk
drives, SSDs, and Storage Spaces. Although this transparency is
critical for reducing capacity utilization in thinly provisioned
environments, it can also have an important impact on performance
and power consumption. Providing greater visibility into what's
allocated for storage devices from a holistic view enables the devices
to make better resource utilization decisions that result in higher
performance. In addition, because the storage footprint for a
deployment is reduced, power consumption can be reduced also.
The storage stack in Windows Server 2012 will issue standards-based
trims or unmaps for any space that becomes unallocated,
even within virtualized environments. Further, the new Storage
Optimizer runs automatically to help further reduce the physical
footprint of the data by consolidating data from sparsely populated
slabs to more densely populated slabs.
Together, these technologies can help improve performance and
power consumption. Administrators should investigate whether their
storage devices support Trim or Unmap commands to ensure an
efficient and performant deployment.

Response Times
You can use tools such as Perfmon to obtain data on disk request
response times. Write requests that enter a write-back hardware
cache often have very low response times (less than 1 ms) because
completion depends on dynamic RAM (DRAM) speeds instead of
rotating disk speeds. The data is lazily written to disk media in the
background. As the workload begins to saturate the cache, response
times increase until the write cache's only potential benefit is a
better ordering of requests to reduce positioning delays.
For JBOD arrays, Reads and Writes have approximately the same
performance characteristics. Writes can be slightly longer due to
additional mechanical settling delays. With modern hard disks,
positioning delays for random requests are 5 to 15 ms. Smaller 2.5-inch
drives have shorter positioning distances and lighter actuators,
so they generally provide faster seek times than comparable larger
3.5-inch drives. Positioning delays for sequential requests should be
insignificant except for streams of write-through requests, where
each positioning delay should approximate the required time for a
complete disk rotation. (Write-through requests are typically
identified by the ForceUnitAccess (FUA) flag on the disk request.)
Transfer times are usually less significant when they are compared to
positioning delays, except for large or sequential requests, which are
instead dominated by disk media access speeds as the requests
become larger or more sequential. Modern enterprise disks access
their media at 50 to 150 MB/s depending on rotation speed and
sectors per track, which varies across a range of blocks on a specific
disk model. The outermost tracks can have up to twice the
sequential throughput of innermost tracks.
If the stripe unit size is well chosen, each request is serviced by a
single disk, except for low-concurrency workloads. So, the same
general positioning and transfer times still apply.
For simple implementations of mirrored arrays, a write completion
must wait for both disks to complete the request. Depending on how
the requests are scheduled, the two completions of the requests
could take a long time. Although writes for mirrored arrays generally
should not take twice the time to complete, they are typically slower
than a JBOD configuration. Reads can experience a performance
increase if the array controller is dynamically load balancing or
factoring in spatial locality. More complex implementations may use
logging or battery-backed write caching or other means to improve
write latencies.
For RAID 5 arrays (rotated parity), small writes become four separate
requests in the basic read-modify-write scenario. In the best case,
this is approximately the equivalent of two mirrored reads plus a full
rotation of the two disks that hold the data and corresponding parity,
if you assume that the read/write pairs continue in parallel.
You must consider the performance effect of redundancy on Read
and Write requests when you plan subsystems or analyze
performance data. For example, Perfmon might show that 50 writes
per second are being processed by volume x, but in reality, this
could mean 100 requests per second for a mirrored array virtual
disk, 200 requests per second for a RAID 5 array or parity virtual
disk, or even more than 200 requests per second if the requests are
split across stripe units.
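
The arithmetic in the preceding example can be written out as a short Python sketch. The function and layout names are hypothetical. The multipliers follow the text: two physical requests per write for a mirrored array, and four for a basic RAID 5 read-modify-write. Splitting across stripe units is not modeled.

```python
# Physical requests generated by one small front-end write.
WRITE_MULTIPLIER = {
    "jbod": 1,    # one disk, one request
    "mirror": 2,  # the write goes to both disks
    "raid5": 4,   # read data, read parity, write data, write parity
}


def physical_write_requests(front_end_writes_per_sec: float,
                            layout: str) -> float:
    """Estimate physical requests per second for a front-end write rate."""
    return front_end_writes_per_sec * WRITE_MULTIPLIER[layout]


if __name__ == "__main__":
    for layout in WRITE_MULTIPLIER:
        print(layout, physical_write_requests(50, layout))
```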
Use the following response-time guidelines if no workload details are
available:

For a lightly loaded system, average write response times
should be less than 25 ms on RAID 5 or RAID 6, and less than
15 ms on non-RAID 5 or non-RAID 6 disks. Average read
response times should be less than 15 ms regardless.

For a heavily loaded system that is not saturated, average
write response times should be less than 75 ms on RAID 5 or
RAID 6, and less than 50 ms on non-RAID 5 or non-RAID 6
disks. Average read response times should be less than
50 ms.
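
The guidelines above can be encoded as a small lookup for use when reviewing Perfmon data. This Python sketch is only one possible encoding: the function name, the "light" and "heavy" labels, and the parity_raid flag (True for RAID 5 or RAID 6) are hypothetical.

```python
# Upper limits in milliseconds, taken from the guidelines above.
READ_LIMIT_MS = {"light": 15, "heavy": 50}
WRITE_LIMIT_MS = {
    ("light", True): 25,   # lightly loaded, RAID 5 or RAID 6
    ("light", False): 15,  # lightly loaded, other disks
    ("heavy", True): 75,   # heavily loaded, RAID 5 or RAID 6
    ("heavy", False): 50,  # heavily loaded, other disks
}


def check_response_times(load: str, parity_raid: bool,
                         avg_read_ms: float, avg_write_ms: float) -> dict:
    """Compare average response times with the guideline limits."""
    return {
        "read_ok": avg_read_ms < READ_LIMIT_MS[load],
        "write_ok": avg_write_ms < WRITE_LIMIT_MS[(load, parity_raid)],
    }


if __name__ == "__main__":
    # Hypothetical sample: 10 ms reads and 20 ms writes, light load.
    print(check_response_times("light", True, 10, 20))
    print(check_response_times("light", False, 10, 20))
```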

Queue Lengths
Several opinions exist about what constitutes excessive disk request
queuing. This guide assumes that the boundary between a busy disk
subsystem and a saturated subsystem is a persistent average of two
requests per physical disk. A disk subsystem is near saturation when
every physical disk is servicing a request and has at least one
queued-up request to maintain maximum concurrency, that is, to
keep the data pipeline flowing. In this guideline, disk requests that
split into multiple requests (because of striping or redundancy
maintenance) are considered multiple requests.
This rule has caveats, because most administrators do not want all
physical disks constantly busy. Because disk activity often occurs in
bursts, this rule is more likely applied over shorter periods of peak
time. Requests are typically not uniformly spread among all hard
disks at the same time, so the administrator must consider
deviations between queues, especially for workloads that contain
significant bursts. Conversely, a longer queue provides more
opportunity for disk request schedulers to reduce positioning delays
or to optimize for full stripe RAID 5 writes or mirrored read selection.
Because hardware has an increased capability to queue requests
(either through multiple queuing agents along the path or through
agents with more queuing capability), increasing the multiplier
threshold might allow more concurrency within the hardware. This
creates a potential increase in response time variance, however.
Ideally, the additional queuing time is balanced by increased
concurrency and reduced mechanical positioning times.
Use the following queue length targets when few workload details
are available:

For a lightly loaded system, the average queue length should
be less than one per physical disk, with occasional spikes of
10 or less. If the workload is write heavy, the average queue
length above a mirrored array or virtual disk should be less
than 0.6 per physical disk and the average queue length
above a RAID 5 or RAID 6 array or a parity virtual disk should
be less than 0.3 per physical disk.

For a heavily loaded system that is not saturated, the
average queue length should be less than 2.5 per physical
disk, with infrequent spikes up to 20. If the workload is write
heavy, the average queue length above a mirrored array or
virtual disk should be less than 1.5 per physical disk, and the
average queue length above a RAID 5 array or parity virtual
disk should be less than 1.0 per physical disk.

For workloads of sequential requests, larger queue lengths
can be tolerated because service times, and therefore
response times, are much shorter than those for a random
workload.
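
Because the targets above are stated per physical disk, they must be scaled by the number of disks behind the volume that is being measured. The following Python sketch shows that scaling. The function names and the "light", "heavy", "mirror", and "parity" labels are hypothetical, and spikes are not modeled.

```python
# Average queue length targets per physical disk, from the list above.
# "any" is the general target; "mirror" and "parity" apply to
# write-heavy workloads on mirrored and parity layouts.
TARGET_PER_DISK = {
    "light": {"any": 1.0, "mirror": 0.6, "parity": 0.3},
    "heavy": {"any": 2.5, "mirror": 1.5, "parity": 1.0},
}


def queue_length_target(load: str, physical_disks: int,
                        write_heavy_layout: str = "any") -> float:
    """Return the average queue length target for the whole array."""
    return TARGET_PER_DISK[load][write_heavy_layout] * physical_disks


def within_target(avg_queue_length: float, load: str, physical_disks: int,
                  write_heavy_layout: str = "any") -> bool:
    """True if the measured average is below the scaled target."""
    target = queue_length_target(load, physical_disks, write_heavy_layout)
    return avg_queue_length < target


if __name__ == "__main__":
    # Hypothetical 8-disk array with a measured average queue length of 5.
    print(within_target(5.0, "light", 8))
    print(within_target(5.0, "light", 8, "mirror"))
```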

For more information about storage performance in Windows
Server 2012, see Resources later in this guide.

Performance Tuning for Web Servers

Selecting the Proper Hardware for Performance
It is important to select the proper hardware to satisfy the expected
web load, considering average load, peak load, capacity, growth
plans, and response times. Hardware bottlenecks limit the
effectiveness of software tuning.
Choosing and Tuning Server Hardware earlier in this guide provides
recommendations for hardware to avoid the following performance
constraints:

Slow CPUs offer limited processing power for ASP, ASP.NET,
and SSL scenarios.

A small L2 processor cache might adversely affect
performance.

A limited amount of memory affects the number of sites that
can be hosted, how many dynamic content scripts (such as
ASP.NET) can be stored, and the number of application pools
or worker processes.

Networking becomes a bottleneck because of an inefficient
network adapter.

The file system becomes a bottleneck because of an
inefficient disk subsystem or storage adapter.

Operating System Practices

If possible, perform a clean installation of the operating system
software. Upgrading the software can leave outdated, unwanted, or
suboptimal registry settings and previously installed services and
applications that consume resources if they are started
automatically. If another operating system is installed and you must
keep it, you should install the new operating system on a different
partition. Otherwise, the new installation overwrites the settings
under Program Files\Common Files.
To reduce disk access interference, place the system page file,
operating system, web data, ASP template cache, and the Internet
Information Services (IIS) log on separate physical disks if possible.
To reduce contention for system resources, install SQL Server and IIS
on different servers if possible.
Avoid installing nonessential services and applications. In some
cases, it might be worthwhile to disable services that are not
required on a system.

Tuning IIS 8.0

Internet Information Services (IIS) 8.0 is the version that ships as
part of Windows Server 2012. It uses a process model similar to that
of IIS 6.0. A kernel-mode web driver (http.sys) receives and routes
HTTP requests, and it can satisfy requests from its response cache.
Worker processes register for URL subspaces, and http.sys routes the
request to the appropriate process (or set of processes for
application pools).
The IIS 8.0 process relies on the kernel-mode web driver, http.sys.
Http.sys is responsible for connection management and request
handling. The request can be served from the http.sys cache or
passed to a worker process for further handling (see Figure 11).
Multiple worker processes can be configured, which provides
isolation at a reduced cost.
Http.sys includes a response cache. When a request matches an
entry in the response cache, http.sys sends the cached response
directly from kernel mode. Figure 11 shows the request flow from the
network through http.sys and potentially up to a worker process.
Some web application platforms, such as ASP.NET, provide
mechanisms to enable any dynamic content to be cached in the
kernel-mode cache. The static file handler in IIS 8.0 automatically
caches frequently requested files in http.sys.

Figure 11. Request handling in IIS 8.0

Because a web server has kernel-mode and user-mode components,
both components must be tuned for optimal performance. Therefore,
tuning IIS 8.0 for a specific workload includes configuring the
following:

Http.sys (the kernel-mode web driver) and the associated
kernel-mode cache

Worker processes and user-mode IIS, including the application
pool configuration

Certain tuning parameters that affect performance

The following sections discuss how to configure the kernel-mode and
user-mode aspects of IIS 8.0.

Kernel-Mode Tunings
Performance-related http.sys settings fall into two broad categories:
cache management and connection and request management. All
registry settings are stored under the following registry entry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters

Note If the HTTP service is already running, you must restart it for
the changes to take effect.

Cache Management Settings

One benefit that http.sys provides is a kernel-mode cache. If the
response is in the kernel-mode cache, you can satisfy an HTTP
request entirely from the kernel mode, which significantly lowers the
CPU cost of handling the request. However, the kernel-mode cache
of IIS 8.0 is based on physical memory, and the cost of an entry is
the memory that it occupies.
An entry in the cache is helpful only when it is used. However, the
entry always consumes physical memory, whether or not the entry is
being used. You must evaluate the usefulness of an item in the cache
(the savings from being able to serve it from the cache) and its cost
(the physical memory occupied) over the lifetime of the entry by
considering the available resources (CPU and physical memory) and
the workload requirements. Http.sys tries to keep only useful,
actively accessed items in the cache, but you can increase the
performance of the web server by tuning the http.sys cache for
particular workloads.
The following are some useful settings for the http.sys kernel-mode
cache:

UriEnableCache. Default value: 1.

A nonzero value enables kernel-mode response and fragment
caching. For most workloads, the cache should remain enabled.
Consider disabling the cache if you expect very little use of
response and fragment caching.

UriMaxCacheMegabyteCount. Default value: 0.

A nonzero value specifies the maximum memory that is available
to the kernel-mode cache. The default value, 0, enables the
system to automatically adjust how much memory is available to
the cache.
Note Specifying the size sets only the maximum, and the
system might not let the cache grow to the specified size.

UriMaxUriBytes. Default value: 262144 bytes (256 KB).

This is the maximum size of an entry in the kernel-mode cache.
Responses or fragments larger than this are not cached. If you
have enough memory, consider increasing the limit. If memory is
limited and large entries are crowding out smaller ones, it might
be helpful to lower the limit.

UriScavengerPeriod. Default value: 120 seconds.

The http.sys cache is periodically scanned by a scavenger, and
entries that are not accessed between scavenger scans are
removed. Setting the scavenger period to a high value reduces
the number of scavenger scans. However, the cache memory
usage might increase because older, less frequently accessed
entries can remain in the cache. Setting the period too low
causes more frequent scavenger scans, and it can result in too
many flushes and cache churn.
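
One way to apply these settings consistently across servers is to generate the commands from a script and review them before running them. The following Python sketch builds reg add command lines for the values listed above. It only produces strings and does not modify the registry. The helper names are hypothetical, and the REG_DWORD value type is an assumption that is not stated in this guide.

```python
HTTP_PARAMETERS_KEY = r"HKLM\System\CurrentControlSet\Services\Http\Parameters"

# Setting names and default values as listed in this section.
CACHE_DEFAULTS = {
    "UriEnableCache": 1,
    "UriMaxCacheMegabyteCount": 0,
    "UriMaxUriBytes": 262144,
    "UriScavengerPeriod": 120,
}


def reg_add_command(name: str, value: int) -> str:
    """Build a reg add command line for one http.sys cache setting."""
    if name not in CACHE_DEFAULTS:
        raise KeyError(f"Not a cache setting from this section: {name}")
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("Value does not fit in a 32-bit DWORD")
    # REG_DWORD is assumed here; confirm the type before use.
    return (f'reg add "{HTTP_PARAMETERS_KEY}" '
            f"/v {name} /t REG_DWORD /d {value} /f")


def commands_for(settings: dict) -> list:
    """Build one command per requested setting."""
    return [reg_add_command(name, value) for name, value in settings.items()]


if __name__ == "__main__":
    # Hypothetical change: raise the largest cacheable entry to 512 KB.
    for command in commands_for({"UriMaxUriBytes": 512 * 1024}):
        print(command)
```

As noted earlier in this section, the HTTP service must be restarted before changed values take effect.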

Request and Connection Management Settings

In Windows Server 2012, http.sys manages connections
automatically. The following registry keys that were used in earlier
releases are considered deprecated and are not necessary in
Windows Server 2012:

MaxConnections
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\MaxConnections

IdleConnectionsHighMark
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\IdleConnectionsHighMark

IdleConnectionsLowMark
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\IdleConnectionsLowMark

IdleListTrimmerPeriod
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\IdleListTrimmerPeriod

RequestBufferLookasideDepth
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\RequestBufferLookasideDepth

InternalRequestLookasideDepth
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Http\Parameters\InternalRequestLookasideDepth

User-Mode Settings
The settings in this section affect the IIS 8.0 worker process behavior.
Most of these settings can be found in the following XML
configuration file:
%SystemRoot%\system32\inetsrv\config\applicationHost.config
Use Appcmd.exe or the IIS 8.0 Management Console to change
them. Most settings are automatically detected, and they do not
require a restart of the IIS 8.0 worker processes or web application
server.

User-Mode Cache Behavior Settings

This section describes the settings that affect caching behavior in
IIS 8.0. The user-mode cache is implemented as a module that
listens to the global caching events that are raised by the integrated
pipeline. To completely disable the user-mode cache, remove the
FileCacheModule (cachfile.dll) module from the list of installed
modules in the system.webServer/globalModules configuration
section in applicationHost.config.
system.webServer/caching

enabled. Default value: True.
Disables the user-mode IIS cache when set to False. When the cache
hit rate is very small, you can disable the cache completely to
avoid the overhead that is associated with the cache code path.
Disabling the user-mode cache does not disable the kernel-mode
cache.

enableKernelCache. Default value: True.
Disables the kernel-mode cache when set to False.

maxCacheSize. Default value: 0.
Limits the IIS user-mode cache size to the specified size in
megabytes. IIS adjusts the default depending on available memory.
Choose the value carefully based on the size of the set of
frequently accessed files versus the amount of RAM or the IIS
process address space, which is limited to 2 GB on 32-bit systems.

maxResponseSize. Default value: 262144.
Caches files up to the specified size. The actual value depends on
the number and size of the largest files in the data set versus the
available RAM. Caching large, frequently requested files can reduce
CPU usage, disk access, and associated latencies. The default value
is 256 KB.

Compression Behavior Settings

In Windows Server 2012, IIS 8.0 compresses static content by
default. Also, compression of dynamic content is enabled by default
when the DynamicCompressionModule is installed. Compression
reduces bandwidth usage but increases CPU usage. Compressed
content is cached in the kernel-mode cache if possible. IIS 8.0 lets
compression be controlled independently for static and dynamic
content. Static content typically refers to content that does not
change, such as GIF or HTM files. Dynamic content is typically
generated by scripts or code on the server, for example, ASP.NET pages.
You can customize the classification of any particular extension as
static or dynamic.
To completely disable compression, remove
StaticCompressionModule and DynamicCompressionModule from the
list of modules in the system.webServer/globalModules section in
applicationHost.config.
system.webServer/httpCompression

staticCompressionEnableCpuUsage, staticCompressionDisableCpuUsage,
dynamicCompressionEnableCpuUsage, dynamicCompressionDisableCpuUsage.
Default values: 50, 100, 50, and 90 respectively.
Enables or disables compression if the current percentage CPU usage
goes above or below specified limits. IIS 8.0 automatically disables
compression if steady-state CPU increases above the Disable
threshold. Compression is enabled if CPU drops below the Enable
threshold.

directory. Default value: %SystemDrive%\inetpub\temp\IIS Temporary Compressed Files.
Specifies the directory in which compressed versions of static files
are temporarily stored and cached. Consider moving this directory
off the system drive if it is accessed frequently.

doDiskSpaceLimiting. Default value: True.
Specifies whether a limit exists for how much disk space all
compressed files can occupy. Compressed files are stored in the
compression directory that is specified by the directory attribute.

maxDiskSpaceUsage. Default value: 100 MB.
Specifies the number of bytes of disk space that compressed files
can occupy in the compression directory. This setting might need to
be increased if the total size of all compressed content is too
large.

system.webServer/urlCompression

doStaticCompression. Default value: True.
Specifies whether static content is compressed.

doDynamicCompression. Default value: True.
Specifies whether dynamic content is compressed.

Note For IIS 8.0 servers that have low average CPU usage, consider
enabling compression for dynamic content, especially if responses
are large. This should first be done in a test environment to assess
the effect on the CPU usage from the baseline.
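
The Enable and Disable CPU thresholds described in this section form a simple hysteresis: compression turns off above one limit and turns back on only below a lower one. The following Python sketch models that behavior for illustration. It is not the IIS implementation, the function names and CPU samples are hypothetical, and the 50 and 90 values are the dynamic compression defaults listed above.

```python
def next_compression_state(enabled: bool, cpu_percent: float,
                           enable_below: float,
                           disable_above: float) -> bool:
    """Return whether compression is enabled after one CPU sample."""
    if enabled and cpu_percent > disable_above:
        return False  # CPU rose above the Disable threshold.
    if not enabled and cpu_percent < enable_below:
        return True   # CPU dropped below the Enable threshold.
    return enabled    # Between the thresholds, keep the current state.


def simulate(cpu_samples, enable_below=50, disable_above=90):
    """Apply the rule to a series of samples, starting enabled."""
    enabled = True
    states = []
    for cpu in cpu_samples:
        enabled = next_compression_state(enabled, cpu,
                                         enable_below, disable_above)
        states.append(enabled)
    return states


if __name__ == "__main__":
    print(simulate([40, 95, 70, 45]))
```

In this model, a sample of 70 percent after compression was disabled leaves it disabled, because the value is still above the Enable threshold. With a Disable threshold of 100, the model never turns compression off, because a percentage cannot exceed 100.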

Tuning the Default Document List

The default document module handles HTTP requests for the root of
a directory and translates them into requests for a specific file, such
as Default.htm or Index.htm. On average, around 25 percent of all
requests on the Internet go through the default document path. This
varies significantly for individual sites. When an HTTP request does
not specify a file name, the default document module searches the
file system for each name in the list of allowed default documents.
This can adversely affect performance, especially if reaching the
content requires making a network round trip or touching a disk.
You can avoid the overhead by selectively disabling default
documents and by reducing or ordering the list of documents. For
websites that use a default document, you should reduce the list to
only the default document types that are used. Additionally, order
the list so that it begins with the most frequently accessed default
document file name.
You can selectively set the default document behavior on particular
URLs by customizing the configuration inside a location tag in
applicationHost.config or by inserting a web.config file directly in the
content directory. This allows a hybrid approach, which enables
default documents only where they are necessary and sets the list to
the correct file name for each URL.

To disable default documents completely, remove
DefaultDocumentModule from the list of modules in the
system.webServer/globalModules section in applicationHost.config.
system.webServer/defaultDocument

enabled. Default value: True.
Specifies that default documents are enabled.

<files> element. Default value: Default.htm, Default.asp, Index.htm,
Index.html, Iisstart.htm, and Default.aspx.
Specifies the file names that are configured as default documents.
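
The benefit of shortening and ordering the default document list can be illustrated by counting file system lookups. The following Python sketch assumes that names are tried in list order and that the search stops at the first match. That lookup model and the function name are assumptions made for illustration.

```python
# Default list as given in this section.
DEFAULT_LIST = ["Default.htm", "Default.asp", "Index.htm",
                "Index.html", "Iisstart.htm", "Default.aspx"]


def lookups_until_match(default_documents, files_in_directory) -> int:
    """Count the file system lookups made before a match is found.

    If no listed name exists, every name in the list is tried.
    """
    present = {name.lower() for name in files_in_directory}
    for lookups, name in enumerate(default_documents, start=1):
        if name.lower() in present:
            return lookups
    return len(default_documents)


if __name__ == "__main__":
    site_files = ["Default.aspx", "web.config"]  # hypothetical content
    print(lookups_until_match(DEFAULT_LIST, site_files))
    print(lookups_until_match(["Default.aspx"], site_files))
```

For a site whose only default document is Default.aspx, the default list costs six lookups per request in this model, and a list reduced to that one name costs one.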

Central Binary Logging

Binary IIS logging reduces CPU usage, disk I/O, and disk space usage.
Central binary logging is directed to a single file in binary format,
regardless of the number of hosted sites. Parsing binary-format logs
requires a post-processing tool.
You can enable central binary logging by setting the
centralLogFileMode attribute to CentralBinary and setting the
enabled attribute to True. Consider moving the location of the central
log file off the system partition and onto a dedicated logging
partition to avoid contention between system activities and logging
activities.
system.applicationHost/log

centralLogFileMode. Default value: Site.
Specifies the logging mode for a server. Change this value to
CentralBinary to enable central binary logging.

system.applicationHost/log/centralBinaryLogFile

enabled. Default value: False.
Specifies whether central binary logging is enabled.

directory. Default value: %SystemDrive%\inetpub\logs\LogFiles.
Specifies the directory where log entries are written.

Application and Site Tunings

The following settings relate to application pool and site tunings.
system.applicationHost/applicationPools/applicationPoolDefaults

Attribute: queueLength
Description: Indicates to the kernel-mode web driver, http.sys, how
many requests are queued for an application pool before future
requests are rejected. When the value for this property is exceeded,
IIS rejects subsequent requests with a 503 error. Consider increasing
this for applications that communicate with high-latency back-end
data stores if 503 errors are observed.
Default: 1000

Attribute: enable32BitAppOnWin64
Description: When True, enables a 32-bit application to run on a
computer that has a 64-bit processor. Consider enabling 32-bit mode
if memory consumption is a concern. Because pointer sizes and
instruction sizes are smaller, 32-bit applications use less memory
than 64-bit applications. The drawback to running 32-bit applications
on a 64-bit computer is that user-mode address space is limited to
4 GB.
Default: False
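
The effect of queueLength can be pictured with a toy model. The Python sketch below is not how http.sys is implemented; the class BoundedRequestQueue and its methods are hypothetical names used only to show that once the configured number of requests is waiting, further requests are answered with a 503 until a slot frees up.

```python
from collections import deque


class BoundedRequestQueue:
    """Toy model of an application pool request queue (illustration only)."""

    def __init__(self, queue_length=1000):
        self.queue_length = queue_length
        self.pending = deque()

    def submit(self, request_id):
        """Queue the request, or return 503 if the queue is already full."""
        if len(self.pending) >= self.queue_length:
            return 503
        self.pending.append(request_id)
        return None

    def dequeue(self):
        """Hand the oldest waiting request to a worker, freeing one slot."""
        return self.pending.popleft() if self.pending else None


# A deliberately tiny queue so the rejection is easy to see.
queue = BoundedRequestQueue(queue_length=3)
results = [queue.submit(i) for i in range(5)]
print(results)
```

A slow back end keeps requests in the queue longer, so the same arrival rate fills the queue sooner; that is the situation in which a larger queueLength is worth considering.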

system.applicationHost/sites/virtualDirectoryDefaults

Attribute: allowSubDirConfig
Description: Specifies whether IIS looks for web.config files in
content directories lower than the current level (True) or does not
look for web.config files in content directories lower than the
current level (False). By imposing a simple limitation, which allows
configuration only in virtual directories, IIS 8.0 can know that
unless /<name>.htm is a virtual directory it should not look for a
configuration file. Skipping the additional file operations can
significantly improve performance of websites that have a very large
set of randomly accessed static content.
Default: True

Managing IIS 8.0 Modules

IIS 8.0 has been factored into multiple, user-extensible modules to
support a modular structure. This factorization has a small cost. For
each module the integrated pipeline must call the module for every
event that is relevant to the module. This happens regardless of
whether the module must do any work. You can conserve CPU cycles
and memory by removing all modules that are not relevant to a
particular website.

A web server that is tuned for simple static files might include only
the following five modules: UriCacheModule, HttpCacheModule,
StaticFileModule, AnonymousAuthenticationModule, and
HttpLoggingModule.
To remove modules from applicationHost.config, remove all
references to the module from the system.webServer/handlers and
system.webServer/modules sections in addition to the module
declaration in system.webServer/globalModules.
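
The removal steps above can be sketched in Python. This is a minimal illustration under stated assumptions: SAMPLE_CONFIG is a trimmed, hypothetical stand-in for applicationHost.config (the real file is larger and places some sections inside location elements), the image values are placeholders, and remove_module is a helper invented for this example. It drops the module declaration, the module entry, and any handler mapping that names the module.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed configuration used only for this example.
SAMPLE_CONFIG = """
<configuration>
  <system.webServer>
    <globalModules>
      <add name="StaticFileModule" image="static.dll" />
      <add name="CgiModule" image="cgi.dll" />
    </globalModules>
    <modules>
      <add name="StaticFileModule" />
      <add name="CgiModule" />
    </modules>
    <handlers>
      <add name="StaticFile" path="*" verb="*" modules="StaticFileModule" />
      <add name="CGI-exe" path="*.exe" verb="*" modules="CgiModule" />
    </handlers>
  </system.webServer>
</configuration>
"""


def remove_module(root, module_name):
    """Remove every reference to module_name; return how many were removed."""
    web_server = next(root.iter("system.webServer"))
    removed = 0
    # Module declaration (globalModules) and module list (modules).
    for section in ("globalModules", "modules"):
        parent = web_server.find(section)
        for entry in list(parent.findall("add")):
            if entry.get("name") == module_name:
                parent.remove(entry)
                removed += 1
    # Handler mappings name their modules in a comma-separated attribute.
    handlers = web_server.find("handlers")
    for entry in list(handlers.findall("add")):
        if module_name in entry.get("modules", "").split(","):
            handlers.remove(entry)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE_CONFIG.strip())
removed = remove_module(root, "CgiModule")
print(removed, "references removed")
```

Work on a copy of the configuration file and verify the site still serves its content before applying a change like this to a production server.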

Classic ASP Settings

The following settings apply only to classic ASP pages and do not
affect ASP.NET settings. For performance recommendations for
ASP.NET, see 10 Tips for Writing High-Performance Web Applications.
system.webServer/asp/cache

Attribute: diskTemplateCacheDirectory
Description: Contains the name of the directory that ASP uses to
store compiled templates when the in-memory cache overflows.
Recommendation: If possible, set to a directory that is not heavily
used, for example, a drive that is not shared with the operating
system, IIS log, or other frequently accessed content.
The default directory is:
%SystemDrive%\inetpub\temp\ASP Compiled Templates
Default: See Description column

Attribute: maxDiskTemplateCacheFiles
Description: Specifies the maximum number of compiled ASP templates
that can be stored.
Recommendation: Set to the maximum value of 0x7FFFFFFF.
Default: 2000

Attribute: scriptFileCacheSize
Description: This attribute specifies the number of precompiled
script files to cache.
Recommendation: Set to as many ASP templates as memory limits allow.
Default: 500

Attribute: scriptEngineCacheMax
Description: Specifies the maximum number of scripting engines that
ASP pages will keep cached in memory.
Recommendation: Set to as many script engines as the memory limit
allows.
Default: 250

system.webServer/asp/limits

Attribute: processorThreadMax
Description: Specifies the maximum number of worker threads per
processor that ASP can create. Increase if the current setting is
insufficient to handle the load, which can cause errors when it is
serving requests or cause under-usage of CPU resources.
Default: 25

system.webServer/asp/comPlus

Attribute: executeInMta
Description: Set to True if errors or failures are detected while IIS
is serving ASP content. This can occur, for example, when hosting
multiple isolated sites in which each site runs under its own worker
process. Errors are typically reported from COM+ in the Event Viewer.
This setting enables the multithreaded apartment model in ASP.
Default: False

ASP.NET Concurrency Setting

By default, ASP.NET limits request concurrency to reduce
steady-state memory consumption on the server. High-concurrency
applications might need to adjust some settings to improve overall
performance. These settings are stored under the following registry
entry:
HKEY_LOCAL_MACHINE\Software\Microsoft\ASP.NET\4.0.30319.0\Parameters

The following setting is useful to fully use resources on a system:

MaxConcurrentRequestPerCpu. Default value: 5000.

This setting limits the maximum number of concurrently executing
ASP.NET requests on a system. The default value is
conservative to reduce memory consumption of ASP.NET
applications. Consider increasing this limit on systems that run
applications that perform long, synchronous I/O operations.
Otherwise, users can experience high latency because of queuing
or request failures due to exceeding queue limits under a high
load when the default setting is used.

Worker Process and Recycling Options

In the IIS Administrator user interface, the options for recycling IIS
worker processes provide practical solutions to acute situations or
events without requiring intervention or resetting a service or
computer. Such situations and events include memory leaks,
increasing memory load, or unresponsive or idle worker processes.
Under ordinary conditions, recycling options might not be needed
and recycling can be turned off or the system can be configured to
recycle very infrequently.
You can enable process recycling for a particular application by
adding attributes to the recycling/periodicRestart element. The
recycle event can be triggered by several events including memory
usage, a fixed number of requests, and a fixed time period. When a
worker process is recycled, the queued and executing requests are
drained, and a new process is simultaneously started to service new
requests. The recycling/periodicRestart element is per-application,
which means that each attribute in the following table is
partitioned on a per-application basis.
system.applicationHost/applicationPools/applicationPoolDefaults/recycling/periodicRestart

Attribute: memory
Description: Enable process recycling if virtual memory consumption
exceeds the specified limit in kilobytes. This is a useful setting
for 32-bit computers that have a small, 2 GB address space. It can
help avoid failed requests due to out-of-memory errors.
Default: 0

Attribute: privateMemory
Description: Enable process recycling if private memory allocations
exceed a specified limit in kilobytes.
Default: 0

Attribute: requests
Description: Enable process recycling after a certain number of
requests.
Default: 0

Attribute: time
Description: Enable process recycling after a specified time period.
Default: 29:00:00
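
To make the interaction of the four periodicRestart attributes concrete, here is a small Python model. It is an illustration only, not the IIS implementation: the function recycle_reasons and the stats dictionary keys are hypothetical, and the sketch assumes that a limit of 0 means the corresponding trigger is turned off and that a trigger fires once its limit is reached or exceeded.

```python
from datetime import timedelta

# Defaults taken from the table above (memory values are in kilobytes).
DEFAULT_LIMITS = {
    "memory": 0,
    "privateMemory": 0,
    "requests": 0,
    "time": timedelta(hours=29),
}


def recycle_reasons(limits, stats):
    """Return the names of the limits that the worker process has reached."""
    reasons = []
    for name in ("memory", "privateMemory", "requests"):
        limit = limits.get(name, 0)
        # A limit of 0 is treated as disabled in this sketch.
        if limit and stats.get(name, 0) >= limit:
            reasons.append(name)
    time_limit = limits.get("time", timedelta(0))
    if time_limit and stats.get("uptime", timedelta(0)) >= time_limit:
        reasons.append("time")
    return reasons


# Hypothetical worker process that has been up for 30 hours.
stats = {"memory": 900_000, "requests": 120_000, "uptime": timedelta(hours=30)}
print(recycle_reasons(DEFAULT_LIMITS, stats))
```

With the defaults, only the time trigger is active, which matches the advice that recycling can be left infrequent under ordinary conditions.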

Secure Sockets Layer Tuning Parameters

The use of Secure Sockets Layer (SSL) imposes additional CPU cost.
The most expensive component of SSL is the session establishment
cost (which involves a full handshake). Reconnection, encryption,
and decryption also add to the cost. For better SSL performance, do
the following:

Enable HTTP keep-alives for SSL sessions. This eliminates the
session establishment costs.

Reuse sessions when appropriate, especially with non-keep-alive
traffic.

Notes

Larger keys provide more security, but they also use more
CPU time.

All components might not need to be encrypted. However,
mixing plain HTTP and HTTPS might result in a pop-up
warning that not all content on the page is secure.

ISAPI
No special tuning parameters are needed for the Internet Server
Application Programming Interface (ISAPI) applications. If you write a
private ISAPI extension, make sure that you code it efficiently for
performance and resource use. For more information, see Other
Issues that Affect IIS Performance later in this guide.

Managed Code Tuning Guidelines

The integrated pipeline model in IIS 8.0 enables a high degree of
flexibility and extensibility. Custom modules that are implemented in
native or managed code can be inserted into the pipeline, or they
can replace existing modules. Although this extensibility model
offers convenience and simplicity, you should be careful before you
insert new managed modules that hook into global events. Adding a
global managed module means that all requests, including static file
requests, must touch managed code. Custom modules are
susceptible to events such as garbage collection. In addition, custom
modules add significant CPU cost due to marshaling data between
native and managed code. If possible, you should implement global
modules in native (C/C++) code.
Before you deploy an ASP.NET website, make sure that you compile
all scripts. You can do this by calling one .NET script in each
directory. Reset IIS after the compilation is complete. Recompile the
scripts after you make changes to machine.config, web.config, or
any .aspx scripts.
If session state is not needed, make sure that you turn it off for each
page.
When you run multiple hosts that contain ASP.NET scripts in isolated
mode (one application pool per site), monitor the memory usage.
Make sure that the server has enough RAM for the expected number
of concurrently running application pools. Consider using multiple
application domains instead of multiple isolated processes.
For performance recommendations on ASP.NET, see 10 Tips for
Writing High-Performance Web Applications.

Other Issues that Affect IIS Performance

The following issues affect IIS performance:

Installation of filters that are not cache-aware

The installation of a filter that is not HTTP-cache-aware causes IIS
to completely disable caching, which results in poor performance.
ISAPI filters that were written before IIS 6.0 can cause this
behavior.

Common Gateway Interface (CGI) requests

For performance reasons, the use of CGI applications to serve
requests is not recommended with IIS. Frequently creating and
deleting CGI processes involves significant overhead. Better
alternatives include using ISAPI application scripts and ASP or
ASP.NET scripts. Isolation is available for each of these options.

NTFS File System Setting

The system-global switch NtfsDisableLastAccessUpdate (REG_DWORD) is
located under HKLM\System\CurrentControlSet\Control\FileSystem\.
Setting it to 1 reduces disk I/O load and latencies by disabling date
and time stamp updating for the last file or directory access. Clean
installations of Windows Server 2012, Windows Server 2008 R2, and
Windows Server 2008 set this value to 1 by default, and you do not
need to adjust it. Earlier versions of Windows operating systems did
not set this value. If your server is running an earlier version of
Windows, or it was upgraded to Windows Server 2012, Windows
Server 2008 R2, or Windows Server 2008, you should set this value
to 1.
Disabling the updates is effective when you are using large data sets
(or many hosts) that contain thousands of directories. We
recommend that you use IIS logging instead if you maintain this
information only for web administration.
Caution Some applications such as incremental backup utilities
rely on this update information, and they do not function correctly
without it.
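
For servers where the value has to be set by hand, one option is to generate a registration (.reg) file and import it with administrative rights. The Python sketch below only produces the file text; the helper name reg_dword_file is an assumption of this example. Note that .reg files use the full root key name HKEY_LOCAL_MACHINE rather than the HKLM abbreviation.

```python
def reg_dword_file(key_path, value_name, value):
    """Return the text of a .reg file that sets a single REG_DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("REG_DWORD values must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key_path}]\r\n"
        # DWORD data is written as eight hexadecimal digits.
        f'"{value_name}"=dword:{value:08x}\r\n'
    )


reg_text = reg_dword_file(
    r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem",
    "NtfsDisableLastAccessUpdate",
    1,
)
print(reg_text)
```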

Networking Subsystem Performance Settings for IIS

See Performance Tuning for Networking Subsystem earlier in this
guide.


Performance Tuning for File Servers

Selecting the Proper Hardware for Performance
You should select the proper hardware to satisfy the expected file
server load, considering average load, peak load, capacity, growth
plans, and response times. Hardware bottlenecks limit the
effectiveness of software tuning. Choosing and Tuning Server
Hardware earlier in this guide provides hardware recommendations.
The sections on networking and storage subsystems also apply to
file servers.

Server Message Block Model

This section provides information about the Server Message Block
(SMB) model for client-server communication, including the SMB 1.0,
SMB 2.0 and SMB 3.0 protocols.

SMB Model Overview

The SMB model consists of two entities: the client and the server.
On the client, applications perform system calls by requesting
operations on remote files. These requests are handled by the
redirector subsystem (Rdbss.sys) and the SMB miniredirector
(Mrxsmb.sys), which translate them into SMB protocol sessions and
requests over TCP/IP. In Windows 8, the SMB 3.0 protocol is
supported. The Mrxsmb10.sys driver handles legacy SMB traffic, and
the Mrxsmb20.sys driver handles SMB 2.0 and SMB 3.0 traffic.
On the server, SMB connections are accepted and SMB requests are
processed as local file system operations through the NT file system
(NTFS), Resilient File System (ReFS) or the Cluster Shared Volume file
system (CSVFS) and the local storage stack. The Srv.sys driver
handles legacy SMB traffic, and the Srv2.sys driver handles SMB 2.0
and SMB 3.0 traffic. The Srvnet.sys component implements the
interface between networking and the file server for both SMB
protocols. File system metadata and content can be cached in
memory through the system cache in the kernel (Ntoskrnl.exe) if the
file is not opened with the write-through flag set.
Figure 12 summarizes the layers that a user request on a client
computer must pass through to perform file operations over the
network on a remote SMB file server.

Figure 12. Layers on a remote SMB File Server
(File client: Application, RDBSS.SYS, MRXSMB.SYS, MRXSMB10.SYS or
MRXSMB20.SYS, Network Stack. SMB file server: Network Stack,
SRVNET.SYS, SRV.SYS or SRV2.SYS, System Cache, NTFS.SYS, Storage
Stack.)

SMB Configuration Considerations

Do not enable any services or features that your file server and file
clients do not require. These might include SMB signing, client-side
caching, file system minifilters, search service, scheduled tasks,
NTFS encryption, NTFS compression, IPSEC, firewall filters, Teredo,
SMB encryption, and antivirus features.
Ensure that the BIOS and operating system power management
modes are set as needed, which might include High Performance
mode. Ensure that the latest, most resilient, and fastest storage and
networking device drivers are installed.
Copying files is a common operation performed on a file server. The
Windows Server operating system has several built-in file copy
utilities that you can run in a command shell, including Xcopy and
Robocopy. When you use Xcopy, we recommend adding the /q and
/k options to your existing parameters, when applicable, to maximize
performance. The former option reduces CPU overhead by reducing
console output and the latter reduces network traffic. When using
Robocopy, the /mt option (in Windows Server 2012 and Windows
Server 2008 R2) can significantly improve speed on remote file
transfers by using multiple threads when copying multiple small files.
We also recommend the /log option to reduce console output by
redirecting to NUL device or to a file.
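
The copy recommendations above can be captured in a small Python helper that assembles the command lines. This is a sketch: the function names, the paths, and the thread count of 16 are illustrative assumptions, and the lists are meant to be passed to something like subprocess.run on a Windows host. The 1 to 128 range check reflects the documented bounds of the Robocopy /MT option.

```python
def robocopy_args(source, destination, threads=8, log_file="NUL"):
    """Build a Robocopy command that copies multithreaded and logs off-console."""
    if not 1 <= threads <= 128:
        raise ValueError("Robocopy /MT accepts 1 to 128 threads")
    return [
        "robocopy",
        source,
        destination,
        f"/MT:{threads}",    # multiple threads for many small files
        f"/LOG:{log_file}",  # send status output to a file or the NUL device
    ]


def xcopy_args(source, destination, extra=()):
    """Build an Xcopy command with /q and /k appended to existing options."""
    return ["xcopy", source, destination, *extra, "/q", "/k"]


# Hypothetical source folder and file share.
print(robocopy_args(r"D:\data", r"\\fileserver\share", threads=16))
print(xcopy_args(r"D:\data", r"\\fileserver\share", extra=["/s"]))
```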
Previous releases of the Windows Server operating system
sometimes benefitted from tools that limit the working-set size of the
Windows file cache. These tools are not necessary on most servers
running Windows Server 2008 R2 and Windows Server 2012. You should
reevaluate your use of such tools.

Tuning Parameters for SMB File Servers

The following registry tuning parameters can affect the performance
of SMB file servers:

NtfsDisable8dot3NameCreation
HKLM\System\CurrentControlSet\Control\FileSystem\(REG_DWORD)

The default in Windows Server 2012 is 2, and in previous releases
it is 0. This parameter determines whether NTFS generates a
short name in the 8dot3 (MS-DOS) naming convention for long file
names and for file names that contain characters from the
extended character set. If the value of this entry is 0, files can
have two names: the name that the user specifies and the short
name that NTFS generates. If the user-specified name follows the
8dot3 naming convention, NTFS does not generate a short name.
A value of 2 means that this parameter can be configured per
volume.
Note The system volume will have 8dot3 enabled, whereas it is
disabled by default in other volumes in Windows Server 2012.
Changing this value does not change the contents of a file, but it
avoids the short-name attribute creation for the file, which also
changes how NTFS displays and manages the file. For most SMB
file servers, the recommended setting is 1 (disabled). For
example, you would want to disable the setting if you have a
clustered file server.
In Windows Server 2012 and Windows Server 2008 R2, you can
disable 8dot3name creation on a per-volume basis without using
the global NtfsDisable8dot3NameCreation setting. You can do this
with the built-in fsutil tool. For example, to disable 8dot3 name
creation on the volume D, run fsutil 8dot3name set d: 1 from a
Command Prompt window. You can view Help text by using the
command fsutil 8dot3name. If you are disabling new 8dot3
name creation on a volume that has existing data, consider
stripping existing 8dot3 names from the volume. This can also be
done with the fsutil tool. For example, to strip existing 8dot3
names on volume D and log the changes made, run fsutil
8dot3name strip /l 8dot3_removal_log.log /s d:\. You can
view Help text by typing the command fsutil 8dot3name strip.

TreatHostAsStableStorage
HKLM\System\CurrentControlSet\Services\LanmanServer
\Parameters\(REG_DWORD)

The default is 0. This parameter disables processing write flush
commands from clients. If the value of this entry is 1, the server
performance and client latency for power-protected servers can
improve. Workloads that resemble the NetBench file server
benchmark benefit from this behavior.
Note If you have a clustered file server, it is possible that you
may experience data loss if the server fails with this setting
enabled. Therefore, evaluate it carefully prior to applying it.

AsynchronousCredits
HKLM\System\CurrentControlSet\Services\LanmanServer
\Parameters\(REG_DWORD)

The default is 512. This parameter limits the number of
concurrent asynchronous SMB commands that are allowed on a
single connection. Some cases (such as when there is a front-end
server with a back-end IIS server) require a large amount of
concurrency (for file change notification requests, in particular).
The value of this entry can be increased to support these cases.

Smb2CreditsMin and Smb2CreditsMax

HKLM\System\CurrentControlSet\Services\LanmanServer
\Parameters\(REG_DWORD)

The defaults are 512 and 8192, respectively. These parameters
allow the server to throttle client operation concurrency
dynamically within the specified boundaries. Some clients might
achieve increased throughput with higher concurrency limits, for
example, copying files over high-bandwidth, high-latency links.

AdditionalCriticalWorkerThreads
HKLM\System\CurrentControlSet\Control\Session Manager\Executive\
(REG_DWORD)

The default is 0, which means that no additional critical kernel
worker threads are added. This value affects the number of
threads that the file system cache uses for read-ahead and
write-behind requests. Raising this value can allow for more queued I/O
in the storage subsystem, and it can improve I/O performance,
particularly on systems with many logical processors and
powerful storage hardware.

MaximumTunnelEntries
HKLM\System\CurrentControlSet\Control\FileSystem\(REG_DWORD)

The default is 1024. Reduce this value to reduce the size of the
NTFS tunnel cache. This can significantly improve file deletion
performance for directories that contain a large number of files.
Note Some applications depend on NTFS tunnel caching.

MaxThreadsPerQueue
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\
(REG_DWORD)

The default is 20. Increasing this value raises the number of
threads that the file server can use to service concurrent
requests. When a large number of active connections need to be
serviced, and hardware resources (such as storage bandwidth)
are sufficient, increasing the value can improve server scalability,
performance, and response times.

RequireSecuritySignature
HKLM\system\CurrentControlSet\Services\LanmanServer\Parameters
\(REG_DWORD)

The default is 0. Changing this value to 1 prevents SMB
communication with computers where SMB signing is disabled. In
addition, a value of 1 causes SMB signing to be used for all SMB
communication. SMB signing can increase CPU cost and network
round trips. If SMB signing is not required, ensure that the registry
value is 0 on all clients and servers.

NtfsDisableLastAccessUpdate
HKLM\System\CurrentControlSet\Control\FileSystem\(REG_DWORD)

The default is 1. In versions of Windows earlier than Windows
Vista and Windows Server 2008, the default is 0. A value of 0 can
reduce performance because the system performs additional
storage I/O when files and directories are accessed to update
date and time information.

MaxMpxCt (SMB 1 clients only)

HKLM\System\CurrentControlSet\Services\LanmanServer
\Parameters\(REG_DWORD)

The default is 50. This parameter suggests a limit on the
maximum number of outstanding requests that an SMB 1 client
can send. Increasing the value can use more memory, but it can
improve performance for some client applications by enabling a
deeper request pipeline. Increasing the value in conjunction with
MaxCmds can also eliminate errors that are encountered due to
large numbers of outstanding long-term file requests, such as
FindFirstChangeNotification calls. This parameter does not affect
connections with SMB 2 clients.
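
When several of the LanmanServer parameters above need to be applied consistently across servers, it can help to script the reg add commands. The Python sketch below only builds the command strings. The helper name reg_add_dword is an assumption of this example, and the values 64 and 1024 are placeholders chosen for illustration, not recommendations; evaluate each setting on your own workload before applying it.

```python
LANMAN_PARAMETERS = (
    r"HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters"
)


def reg_add_dword(key, name, value):
    """Return a reg add command line that writes one REG_DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("REG_DWORD values must fit in 32 bits")
    # /f overwrites an existing value without prompting.
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f'


# Placeholder values for illustration only.
illustrative_settings = {
    "MaxThreadsPerQueue": 64,
    "AsynchronousCredits": 1024,
}

commands = [
    reg_add_dword(LANMAN_PARAMETERS, name, value)
    for name, value in illustrative_settings.items()
]
for command in commands:
    print(command)
```

Run the generated commands from an elevated Command Prompt window, and record the previous values first so that a change can be reverted.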
The following parameters are not required in Windows Server 2012:

NoAliasingOnFileSystem

HKLM\System\CurrentControlSet\Services\LanmanServer
\Parameters\(REG_DWORD)

PagedPoolSize

HKLM\System\CurrentControlSet\Control\SessionManager
\MemoryManagement\(REG_DWORD)

NumTcbTablePartitions

HKLM\system\CurrentControlSet\Services\Tcpip\Parameters\(REG_DWORD)

TcpAckFrequency

HKLM\system\CurrentControlSet\Services\Tcpip\Parameters\Interfaces

SMB Server Tuning Example

The following settings can optimize a computer for file server
performance in many cases. The settings are not optimal or
appropriate on all computers. You should evaluate the impact of
individual settings before applying them.
Parameter

Value

NtfsDisable8dot3NameCreation

Default
2

TreatHostAsStableStorage

AdditionalCriticalWorkerThreads

MaximumTunnelEntries

1024

MaxThreadsPerQueue

32768

RequireSecuritySignature
MaxMpxCt (only applicable to SMB 1 clients)

Services for NFS Model

The following sections provide information about the Microsoft
Services for Network File System (NFS) model for client-server
communication.

Services for NFS Model Overview

Microsoft Services for NFS provides a file-sharing solution for
enterprises that have a mixed Windows and UNIX environment. This
communication model consists of client computers and a server (see
Figure 13). Applications on the client request files that are located on
the server through the redirector (Rdbss.sys and NFS miniredirector
Nfsrdr.sys). The miniredirector uses the NFS protocol to send its
request through TCP/IP. The server receives multiple requests from
the clients through TCP/IP and routes the requests to the local file
system (Ntfs.sys), which accesses the storage stack.

Figure 13. Microsoft services for NFS model for client-server communication

Tuning Parameters for NFS File Servers

The following registry-tuning parameters can affect the performance
of NFS file servers:

OptimalReads

HKLM\System\CurrentControlSet\Services\NfsServer\Parameters\
(REG_DWORD)

Default is 0. Determines whether files are opened for
FILE_RANDOM_ACCESS or for FILE_SEQUENTIAL_ONLY, depending
on the workload I/O characteristics. Set this value to 1 to force
files to be opened for FILE_RANDOM_ACCESS.
FILE_RANDOM_ACCESS prevents the file system and cache
manager from prefetching.
For more information about File Access Services, see the File
Servers section under Resources later in this guide.

RdWrHandleLifeTime