NetWorker 19.12 Performance Optimization Planning Guide
Dell Inc.
January 2025
Rev. 01
Notes, cautions, and warnings
NOTE: A NOTE indicates important information that helps you make better use of your product.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid
the problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.
© 2000 - 2025 Dell Inc. or its subsidiaries. All rights reserved. Dell Technologies, Dell, and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners.
Contents

Tables

Preface

Chapter 1: Overview
    Introduction
    NetWorker data flow

Chapter 2: Size the NetWorker Environment
    Expectations
    System components
    Storage considerations
    Backup operation requirements
    Components of a NetWorker environment
    Recovery performance factors
    Parallel restore
    Connectivity and bottlenecks

Chapter 3: Tune Settings
    Optimize NetWorker parallelism
        Server parallelism
        Server's client parallelism
        Action parallelism
        Multiplexing
    File system density
    Disk optimization
    Device performance tuning methods
        Input/output transfer rate
        Built-in compression
        Drive streaming
        Device load balancing
        Fragmenting a disk drive
    Network devices
        Fibre Channel latency
        Data Domain
        CloudBoost
        AFTD device target and max sessions
        Number of virtual device drives versus physical device drives
    Network optimization
        Advanced configuration optimization
        Operating system TCP stack optimization
        Advanced tuning
        Network latency
        Ethernet duplexing
        Firewalls
        Jumbo frames
        Congestion notification
        TCP buffers
        Increase TCP backlog buffer size
        IRQ balancing and CPU affinity
        Interrupt moderation
        TCP chimney offloading
        Name resolution
    Operating system specific settings for SLES SP2
    Optimizing nsrmmdbd memory usage
    Limit memory usage on the host during clone operations
Tables

1. Revision history
2. Style conventions
3. Sizing information for a physical server
4. Sizing information for a virtual server
5. Reliability events
6. Bus specifications
7. Disk write latency results and recommendations
8. PSS support by NetWorker release
9. Required IOPS for NetWorker server operations
10. Disk drive IOPS values
11. Distribution of workflows and jobs
12. The effect of blocksize on an LTO-4 tape drive
13. Environment variables
14. Tolerable Range for Low Density file system
15. Tolerable Range for High Density file system
Preface
As part of an effort to improve product lines, periodic revisions of software and hardware are released. Therefore, all versions of
the software or hardware currently in use might not support some functions that are described in this document. The product
release notes provide the most up-to-date information about product features.
If a product does not function correctly or does not function as described in this document, contact a technical support
professional.
NOTE: This document was accurate at publication time. To ensure that you are using the latest version of this document,
go to the Dell Support site.
Purpose
This document describes how to size and optimize the NetWorker software.
Audience
This document is intended for the NetWorker software administrator.
Revision history
The following table presents the revision history of this document.
Related documentation
The NetWorker documentation set includes the following publications, available on the Support website:
● NetWorker E-LAB Navigator
Provides compatibility information, including specific software and hardware configurations that NetWorker supports. To
access E-LAB Navigator, go to elabnavigator.
● NetWorker Administration Guide
Describes how to configure and maintain the NetWorker software.
● NetWorker Cluster Integration Guide
Contains information that is related to configuring NetWorker software on cluster servers and clients.
● NetWorker Installation Guide
Provides information about how to install, uninstall, and update the NetWorker software for clients, storage nodes, and
servers on all supported operating systems.
● NetWorker Update Guide
Describes how to update the NetWorker software from a previously installed release.
● NetWorker Release Notes
Contains information about new features and changes, fixed problems, known limitations, environment, and system
requirements for the latest NetWorker software release.
● NetWorker Command Reference Guide
Provides reference information for NetWorker commands and options.
● NetWorker Snapshot Management Configuration Guide
Describes the ability to catalog and manage snapshot copies of production data that are created by using mirror technologies
on storage arrays.
● NetWorker Snapshot Management for NAS Devices Configuration Guide
Describes how to catalog and manage snapshot copies of production data that are created by using replication technologies
on NAS devices.
● NetWorker Security Configuration Guide
Provides an overview of security configuration settings available in NetWorker, secure deployment, and physical security
controls needed to ensure the secure operation of the product.
● NetWorker and VMware Integration Guide
Provides planning and configuration information about the use of VMware in a NetWorker environment.
● NetWorker Error Message Guide
Provides information about common NetWorker error messages.
● CloudBoost Integration Guide
Provides an overview of security configuration settings available in NetWorker and CloudBoost, secure deployment, and
physical security controls needed to ensure the secure operation of the product.
● NetWorker Management Console Online Help
Describes the day-to-day administration tasks that are performed in the NetWorker Management Console and the
NetWorker Administration window. To view the online help, click Help in the main menu.
● NetWorker User Online Help
Describes how to use the NetWorker User program, which is the Windows client interface, to connect to a NetWorker
server to back up, recover, archive, and retrieve files over a network.
Typographical conventions
The following type style conventions are used in this document:
Table 2. Style conventions (continued)
Formatting Description
Monospace Used for:
● System code
● System output, such as an error message or script
● Pathnames, file names, file name extensions, prompts, and syntax
● Commands and options
Monospace italic Used for variables.
Monospace bold Used for user input.
[] Square brackets enclose optional values.
| Vertical bar indicates alternate selections; it means or for the alternate selections.
{} Braces enclose content that the user must specify, such as x, y, or z.
... Ellipses indicate non-essential information that is omitted from the example.
You can use the following resources to find more information about this product, obtain support, and provide feedback.
Knowledgebase
The Knowledgebase contains applicable solutions that you can search for either by solution number (for example, KB000xxxxxx)
or by keyword.
To search the Knowledgebase:
1. Go to Dell Customer Support.
2. On the Support tab, click Knowledge Base.
3. In the search box, type either the solution number or keywords. Optionally, you can limit the search to specific products by
typing a product name in the search box, and then selecting the product from the list that appears.
Live chat
To participate in a live interactive chat with a support agent:
1. Go to Dell Customer Support.
2. On the Support tab, click Contact Support.
3. On the Contact Information page, click the relevant support, and then proceed.
Service requests
To obtain in-depth help from Licensing, submit a service request. To submit a service request:
1. Go to Dell Customer Support.
2. On the Support tab, click Service Requests.
NOTE: To create a service request, you must have a valid support agreement. For details about an account or obtaining a valid support agreement, contact a sales representative. To find the details of a service request, type the service request number in the Service Request Number field, and then click the right arrow.
Online communities
For peer contacts, conversations, and content on product support and solutions, go to the Dell Community Network.
Interactively engage with customers, partners, and certified professionals online.
Chapter 1: Overview
This chapter includes the following topics:
• Introduction
• NetWorker data flow
Introduction
The NetWorker software is a network storage management application that is optimized for the high-speed backup
and recovery operations of large amounts of complex data across entire datazones. This guide addresses non-disruptive
performance tuning options. Although some physical devices may not meet the expected performance, it is understood that when a physical component is replaced with a better performing device, another component becomes the bottleneck.
This guide addresses NetWorker performance tuning with minimal disruption to the existing environment. It focuses on fine-tuning feature functions to achieve better performance with the same set of hardware, and assists administrators to:
● Understand data transfer fundamentals
● Determine requirements
● Identify bottlenecks
● Optimize and tune NetWorker performance
NOTE: Any references to the Data Domain systems and the Data Domain devices in the document indicate PowerProtect
DD appliances.
NetWorker data flow
[Figure: NetWorker backup data flow, showing the initial handshake communication, the Client Direct data path, tracking information, and interprocess communication among the nsrexecd, nsrjobd, nsrindexd, nsrmmdbd, nsrmmgd, and nsrsnmd processes.]
[Figure: NetWorker recovery data flow, showing the initial handshake communication, the Client Direct data path, and interprocess communication among the recover, ansrd, nsrjobd, nsrindexd, nsrmmdbd, nsrmmgd, nsrlcpd, and nsrmmd processes.]
Chapter 2: Size the NetWorker Environment
This chapter describes how to best determine backup and system requirements. The first step is to understand the
environment. Performance issues are often attributed to hardware or environmental issues. An understanding of the entire
backup data flow is important to determine the optimal performance expected from the NetWorker software.
Topics:
• Expectations
• System components
• Storage considerations
• Backup operation requirements
• Components of a NetWorker environment
• Recovery performance factors
• Parallel restore
• Connectivity and bottlenecks
Expectations
You can determine the backup performance expectations and required backup configurations for your environment based on the
Recovery Time Objective (RTO) for each client.
System components
Every backup environment has a bottleneck. It may be a fast bottleneck, but the bottleneck will determine the maximum
throughput obtainable in the system. Backup and restore operations are only as fast as the slowest component in the backup
chain.
Performance issues are often attributed to hardware devices in the datazone. This guide assumes that hardware devices are
correctly installed and configured.
This section discusses how to determine requirements. For example:
● How much data must move?
● What is the backup window?
● How many drives are required?
● How many CPUs are required?
Devices on backup networks can be grouped into four component types. These are based on how and where devices are used.
In a typical backup network, the following four components are present:
● System
● Storage
● Network
● Target device
System
Several system configuration components impact performance:
● CPU
● Memory
● System bus (this determines the maximum available I/O bandwidth)
CPU requirements
Determine the optimal number of CPUs required by assuming that approximately 5 MHz of CPU power is needed to move 1 MB of data per second from a source device to a target device. For example, a NetWorker server or storage node backing up to a local tape drive at a rate of 100 MB per second requires 1 GHz of CPU power (a small calculation sketch follows the note below):
● 500 MHz is required to move data from the network to a NetWorker server or storage node.
● 500 MHz is required to move data from the NetWorker server or storage node to the backup target device.
NOTE: 1 GHz on one type of CPU does not directly compare to 1 GHz of CPU from a different vendor.
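As a quick check, the rule of thumb above can be expressed as a small calculation. The following sketch assumes the 5 MHz per MB/s guideline from this section, counted once for receiving the data and once for writing it to the target device:
# Rough CPU sizing: data is handled twice (network in, device out).
THROUGHPUT_MBPS=100                          # target backup rate in MB per second
REQUIRED_MHZ=$(( THROUGHPUT_MBPS * 5 * 2 ))  # 100 MB/s -> 1000 MHz (1 GHz)
echo "Approximate CPU required: ${REQUIRED_MHZ} MHz"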
Because the NMC UI uses more memory when processing messages from the RabbitMQ service, it is recommended that
you change the default Heap memory from 4 GB to 12 GB for small, medium, and large-scale environments.
Sizing information for NetWorker 19.7 and later on physical and virtual hosts
Table 3. Sizing information for a physical server

Small configuration (maximum 500 clients, up to 10,000 jobs per day):
● NetWorker server: minimum 4 CPUs, 16 GB memory
● NMC server (installed independent of the NetWorker server): minimum 4 CPUs, 8 GB memory
● NMC client (to launch NMC): minimum 4 CPUs, 4 GB memory, 2 GB Java heap size
● NWUI server (installed independent of the NetWorker server): minimum 4 CPUs, 8 GB memory
a. For example, if 8 vCPUs are configured on a virtual server, then the total clock speed of the 8 vCPUs in MHz must be
multiplied with 0.50 for 50% reservation.
b. For example, if 16 GB of RAM is configured on a virtual server, then a minimum of 8 GB must be reserved, which is 50% of
16 GB RAM.
NOTE:
● Calculate the reservation clock speed of your CPU by multiplying the total CPU clock speed of all CPUs in MHz by 0.50
if reservation is 50% or by 0.75 if reservation is 75%.
● For virtual machines running as a NetWorker server, ensure that you reserve memory and vCPUs.
● Ensure that the swap space is equal to or double the RAM size (a quick check sketch follows this note).
● In the case of cloning, if RPS is enabled (nsrrecopy spawns per process), the server requires additional memory of around 1.5 GB for each nsrrecopy process to run smoothly. For example, if five nsrrecopy processes are running on a local or remote storage node, then an additional 7.5 GB of memory is required for the clone to complete in a large-scale environment.
● Media database-related operations using mminfo queries with different switch options (-avot, -avVS, -aS, and so on) can consume a considerable amount of memory to process the mminfo query request. For example, on a scaled media database of around 5 million records, processing certain mminfo query requests requires additional memory of around 7 GB.
● For better performance and scalability of NMC, it is recommended that you have a separate NMC server and a separate
NMC UI client. Ensure that the NMC server and the NetWorker server are running inside the same subnet or vLAN to
avoid latency impact when communicating with the NetWorker server.
● It is recommended that you configure a maximum of 2000 jobs per workflow and a maximum of 100 workflows per policy. Exceeding these limits significantly increases the load on the NetWorker server and slows user interface (UI) response time. NetWorker can process 1024 jobs at a time; the remaining jobs are queued, even when multiple workflows are started concurrently. Do not exceed 6000 jobs in the queue at any point in time. To prevent overloading the server, stagger the workflow start times.
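A quick way to verify the swap recommendation above on a Linux host is shown in the following sketch. It assumes the free(1) utility is available and that rounding to whole gigabytes (-g) is acceptable:
# Warn if configured swap is smaller than installed RAM (target is 1x-2x RAM).
read -r ram swap < <(free -g | awk '/^Mem:/{r=$2} /^Swap:/{s=$2} END{print r, s}')
if [ "$swap" -lt "$ram" ]; then
    echo "Swap (${swap} GB) is less than RAM (${ram} GB); increase swap to 1x-2x RAM."
fi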
Event message: NetWorker Recommended Threshold Limit Exceeded: Concurrent Running Workflows. Recommendation is 100, Current value is 106.
Recommended action: Schedule workflows in a manner that does not involve running 100 workflows concurrently. The section Distribution of workflows and jobs provides more information.
Bus specifications
Bus specifications are based on bus type, MHz, and MB per second.
Bus specifications for specific buses are listed in the following table.
Storage considerations
There are components that impact the performance of storage configurations. They are as follows:
● Storage connectivity:
○ Local versus SAN attached versus NAS attached.
○ Use of storage snapshots.
The type of snapshot technology that is used determines the read performance.
● Some storage replication technologies add significant latency to write access which slows down storage access.
● Storage type:
○ Serial ATA (SATA) computer bus is a storage-interface for connecting host bus adapters to storage devices such as hard
disk drives and optical drives.
○ Fibre Channel (FC) is a gigabit-speed network technology that is primarily used for storage networking.
○ Flash is a non-volatile computer storage that is used for general storage and the transfer of data between computers and
other digital products.
● I/O transfer rate of storage is influenced by different RAID levels, where the best RAID level for the backup server is RAID1
or RAID5. Backup to disk should use RAID3.
● If the target system is scheduled to perform I/O intensive tasks at a specific time, schedule backups to run at a different
time.
● I/O data:
○ Raw data access offers the highest level of performance, but does not logically sort saved data for future access.
○ File systems with a large number of files have degraded performance due to additional processing required by the file
system.
● If data is compressed on the disk by the operating system or an application, the data is decompressed before a backup. The CPU requires time to re-compress the files, and disk speed is negatively impacted.
NOTE: Avoid using synchronous replication technologies or any other technology that adversely impacts latency.
● For file caching, aggressive file system caching can cause commit issues for:
○ The NetWorker server: All NetWorker databases can be impacted (nsr\res, nsr\index, nsr\mm).
○ The NetWorker storage node: When configured to use Advanced File Type Device (AFTD).
Be sure to disable delayed write operations, and use driver Flush and Write-Through commands instead.
● Disk latency considerations for the NetWorker server are higher than for typical server applications as NetWorker uses
committed I/O: Each write to the NetWorker internal database must be acknowledged and flushed before next write is tried.
This setting avoids any potential data loss in internal databases.
Where storage is replicated or mirrored for /nsr, consider the following:
○ Do not use software based replication as it adds an additional layer to I/O throughput and causes unexpected NetWorker
behavior.
○ With hardware based replication, the preferred method is asynchronous replication as it does not add latency on write
operations.
○ Do not use synchronous replication over long distance links, or links with non-guaranteed latency.
○ SANs limit local replication to 12 km and longer distances require special handling.
○ Do not use TCP networks for synchronous replication as they do not guarantee latency.
○ Consider the number of hops as each hardware component adds latency.
NOTE: Add the additional IOPS only if the bootstrap backup runs concurrently with the normal backup operations. If
the bootstrap backup is configured to run when the NetWorker server is idle, the IOPS requirements do not increase.
In NetWorker 9.0 and later, the bootstrap backup runs as part of the server protection policy. However, the IOPS requirement remains almost the same as mentioned in this section.
● IOPS requirements increase if the NetWorker software is configured to start many jobs simultaneously.
To accommodate load spikes, add one IOPS for each parallel session that is started.
It is recommended not to start more than 40 clients per group with the default client parallelism of four; the result is 160 IOPS during group startup (see the sketch after this list).
Starting many clients simultaneously can lead to I/O system starvation.
● Each volume request results in a short I/O burst of approximately 200 IOPS for a few seconds.
For environments running a small number of volumes the effect is minimal. However, for environments with frequent mount
requests, a significant load is added to the NetWorker server. In this case, add 100 IOPS for high activity (more than 50
mount requests per hour). To avoid the excessive load, use a smaller number of large volumes.
● NDMP backups add additional load due to index post-processing.
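The startup-load arithmetic above can be sketched as follows. The figures are the ones given in this section (one IOPS per parallel session at group start, plus approximately 100 IOPS for high mount activity) and are illustrative only:
# Estimated IOPS spike at group startup.
CLIENTS_PER_GROUP=40       # recommended maximum clients started per group
CLIENT_PARALLELISM=4       # default client parallelism
MOUNT_ACTIVITY_IOPS=100    # add for more than 50 mount requests per hour, else 0
STARTUP_IOPS=$(( CLIENTS_PER_GROUP * CLIENT_PARALLELISM + MOUNT_ACTIVITY_IOPS ))
echo "Plan for roughly ${STARTUP_IOPS} IOPS during group startup"   # 260 in this case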
Add the following TCP parameters when the NetWorker server runs with a heavy load (concurrent runs with a large number of
socket requests being made on the server application ports):
● On a Linux NetWorker server, add the following TCP parameters in the /etc/sysctl.conf file and run the sysctl
--system command:
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 15000 65535
net.core.somaxconn = 1024
● On a Linux NMC server, update the file-max value to 65536 to ensure Postgres database connectivity when the NetWorker server runs with heavy loads (a combined sketch follows these settings).
● On a Windows NetWorker server, add the following TCP parameters in the registry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\services\Tcpip\Parameters
Value Name: TcpTimedWaitDelay
Data type: REG_DWORD
Base: Decimal
Value: 30
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Name: MaxUserPort
Data Type: REG_DWORD
Base: Decimal
Value: 65535
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Name: TcpNumConnections
Data Type: REG_DWORD
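The Linux settings above can be applied in one pass with a sketch such as the following. It assumes root access on the host; the fs.file-max line is the NMC server value described above and is only needed on the NMC host:
# Append the recommended TCP tuning values and the NMC file-max setting,
# then reload all sysctl configuration files.
cat >> /etc/sysctl.conf << 'EOF'
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 15000 65535
net.core.somaxconn = 1024
fs.file-max = 65536
EOF
sysctl --system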
NOTE: Use the default startup script on the NetWorker storage nodes and clients. The open file descriptor parameter is
not required on storage nodes and clients.
NOTE: PSS backups currently ignore the policy workflow action's parallelism, previously known as the savegrp parallelism.
When you set the client parallelism to a value less than the number of save points, some save point backups run in PSS mode,
with only a single stream, and other save points run in the default mode (non-PSS). Therefore, for consistent use of PSS,
maintain the default setting or set the client parallelism to the number of save points. This step ensures streams for each save
point.
NOTE: The PSS incremental backup of a save point with zero to few files changed since its prior backup results in one or more empty media database save sets (actual size of 4 bytes). This is expected behavior.
When a PSS enabled UNIX Client resource's parallelism value is greater than the resource's number of save points, the
scheduled backup savegroup process divides the parallelism among the save points and starts PSS save processes for all the
save points at approximately the same time. However, this is done within the limits of the following:
● The NetWorker server
● Group parallelism controls
● Media device session availability
It is recommended that you set the Client resource PSS parallelism value to two times or more the number of save points.
The number of streams for each PSS save point is determined before the backup from its client parallelism value and it remains
fixed throughout the backup. It is a value 1–4 (maximum), where one indicates a single stream with a separate PSS process
that traverses the save point's file system to determine the files to back up. The separation of processes for streaming data
and traversing the file system can improve performance. Also, the number of save processes that run during a PSS save point backup is equal to the number of assigned save stream processes, plus two additional save processes for the director and file system traversal.
When the client parallelism is less than its number of save points, some save point backups are run in PSS mode, with only a
single stream. Other save points are run in the default mode (non-PSS). Therefore, for consistent use of PSS, set the client
parallelism to two times or more the number of save points. This ensures multiple streams for each save point.
It is recommended that large, fast file systems that should benefit from PSS be put in a new separate PSS-enabled Client
resource that is scheduled separately from the client's other save points. Separate scheduling is achieved by using two different
save groups with different runtimes, but the same savegroup can be used if you avoid client disk parallel read contention. Also,
use caution when enabling PSS on a single Client resource with the keyword All. All typically expands to include multiple small
operating system file systems that reside on the same installation disk. These file systems usually do not benefit from PSS but instead
might waste valuable PSS multi-streaming resources.
Based on the second example, the /sp1 save set record is referred to as the primary and its save set time is used in browsing
and time-based recover operations. It references the two related records (dependents) through the *mbs dependents
attribute. This attribute lists the portable long-format save set IDs of the dependents. Each dependent indirectly references
its primary through save set name and save time associations. Its primary is the save set record with the next highest save time
and save set name with no prefix. Also, each primary record has an *mbs anchor save set time attribute, which references its
dependent with the earliest save set time.
PSS improves on manually dividing save point /sp1, into multiple sub-directories, /sp1/subdirA, /sp1/subdirB... and
typing each subdirectory separately in the Client resource. PSS eliminates the need to do this and automatically performs better
load balancing optimization at the file-level, rather than at the directory level that is used in the manual approach. PSS creates
pseudo sub-directories corresponding to the media save set record names, for example, /sp1, <1>/sp1, and <2>/sp1.
Both time-based recovery and savegroup cloning automatically aggregate the multiple physical save sets of a save point PSS
backup. The multiple physical dependent save sets remain hidden. However, there is no automatic aggregation in save set based
recovery, scanner, nsrmm, or nsrclone -S manual command line usage. The -S option requires the PSS save set IDs of
both primary and dependents to be specified at the command line. However, the -S option should rarely be required with PSS.
When the following PSS client configuration settings are changed, the number of save streams can change for the next save
point incremental backup:
● The number of save points
● The parallelism value
Example 1
The following provides performance configuration alternatives for a PSS enabled client with the following backup requirements
and constraints:
● Two savepoints: /sp200GB and /sp2000GB
● Save streams able to back up at 100 GB/hr
● Client parallelism is set to four (No more than four concurrent streams to avoid disk IO contention)
Based on these requirements and constraints, the following are specific configuration alternatives, with the overall backup time in hours (a worked calculation follows the list):
● A non-PSS Client resource with both savepoints at one stream each: 20 hours
● A single PSS Client resource with both /sp200GB at two streams and /sp2000GB at two streams for the same save
group: 10 hours
● A non-PSS Client resource with /sp200GB at one stream and a PSS Client resource with /sp2000GB at three streams for
the same client host and same save group: 6.7 hours
● A PSS Client resource with /sp200GB at four streams and another PSS Client resource with /sp2000GB at four streams
for the same client but different sequentially scheduled save groups: 5.5 hours aggregate
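The times above can be checked with simple arithmetic. The sketch below assumes each stream sustains 100 GB/hr, that streams split a save point evenly, and that concurrent save points finish at the pace of the slower one:
RATE=100   # GB per hour per stream, from the constraints above
# Both save points in one PSS Client resource, two streams each (concurrent):
awk -v r="$RATE" 'BEGIN { t1 = 200/(2*r); t2 = 2000/(2*r); print (t1 > t2 ? t1 : t2), "hours" }'   # 10
# /sp200GB at one stream plus /sp2000GB at three streams (concurrent):
awk -v r="$RATE" 'BEGIN { t1 = 200/r; t2 = 2000/(3*r); print (t1 > t2 ? t1 : t2), "hours" }'       # ~6.7
# Four streams per save point, save groups scheduled one after the other:
awk -v r="$RATE" 'BEGIN { print 200/(4*r) + 2000/(4*r), "hours" }'                                 # 5.5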
Example 2
With client parallelism set to eight and three save points /sp1, /sp2, and /sp3 explicitly listed or expanded by the keyword
ALL for UNIX, the number of PSS streams for each savepoint backup is three, three, and two respectively. The number of
mminfo media database save set records is also three, three, and two respectively.
For a particular save point, /sp1, mminfo, and NMC save set query results shows three save set records each named /sp1,
<1>/sp1, and <2>/sp1. These related records have unique save times that are close to one another. The /sp1 record always
has the latest save time, that is, maximum save time, as it starts last. This makes time-based recovery aggregation for the entire
save point /sp1 work automatically.
Example 3
For a PSS Windows save point backup, the number of streams per save point is estimated in the following two scenarios:
● The client parallelism per save point, where client parallelism=5, and the number of save points=2, the number of PSS
streams is three for the first save point, and two streams for the second.
For the save set ALL, with two volumes and client parallelism=5, each volume (save point) gets two streams.
● Using client parallelism=4, every save point is given two save streams. Both DISASTER_RECOVERY:\ volumes, C:\, and
D:\ are given two streams also.
For the save set ALL, the DISASTER_RECOVERY:\ save set is considered to be a single save point. For this example, the
system has C:\, D:\, and E:\, where C:\, and D:\ are the critical volumes that make up the DISASTER_RECOVERY:\
save set.
The save operation controls how the save points are started, and the total number of streams never exceeds the client
parallelism value of 4.
A setting of this kind uses one stream per client save set entry by default, with the exception of two streams for each of /data1, /data2, and /data3, and eight streams for each of /data4 and /data5 (a hypothetical example follows this paragraph). Client-supported wildcard characters can be used. After setting the environment variable, restart the NetWorker services for the changes to take effect. Increasing the default maximum value can improve the performance for clients with very fast disks.
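The value itself is not reproduced above. Modeled on the PSS:streams_per_ss syntax of the Windows example below, a hypothetical value matching that description might look like the following; whether it is supplied through an environment variable, as mentioned above, or through the client's Save operations attribute should be confirmed in the NetWorker Administration Guide:
PSS:streams_per_ss=1,*, 2,/data1, /data2, /data3, 8, /data4, /data5
Here the wildcard entry carries the one-stream default for unlisted save sets, and client-supported wildcards (for example, /data[1-3]) could shorten the path lists.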
On Windows, launch NMC and the NetWorker Administration window, and then go to View > Diagnostic Mode >
Protection > Clients > Client Properties > Apps & Modules > Save operations and set the following:
PSS:streams_per_ss=2,C:\, D:\, 8, E:\, F:\HR
This Windows PSS client setting will continue to use the default four streams for each save point not explicitly listed here, but
two streams each for the C:\ and D:\ drives, and eight streams each for the E:\ drive and F:\HR folder.
NOTE: PSS backups currently ignore the policy workflow action's parallelism, previously known as the savegrp parallelism.
When you set the client parallelism to a value less than the number of save points, some save point backups run in PSS mode,
with only a single stream, and other save points will run in the default mode (non-PSS). Therefore, for consistent use of PSS,
maintain the default setting or set the client parallelism to the number of save points. This ensures multiple streams for each
save point.
NOTE: The PSS incremental backup of a save point with zero to few files changed since its prior backup will result in one or
more empty media database save sets (actual size of 4 bytes), which is to be expected.
PSS enabled, CP=6 with 3 client save points
In NetWorker releases previous to NetWorker 19.12, if you set CP=6 and have three client save points, PSS will start all save
points together at two streams each, and each save point will remain at two streams, with each stream actively backing up files
from the start.
NetWorker 19.12, however, would start save point one with four active backup streams, and simultaneously start save point
two with two active streams and two idle streams. If save point one finishes first, then save point two could end up with four
active streams, and save point three would then start with two active streams and two idle streams. Depending on the time it
takes the save point to complete, save point two could remain as it started and save point three may start similar to how save
point one started, with four active streams. An idle stream is one that has not yet started saving data and will only become
active when CP allows. The total number of active streams from all save sets at any one point in time will not exceed CP. It is
recommended that you specify a value of 4 or a multiple of four to avoid idle streams.
● To list only the primary save sets for all /sp1 full and incremental backups, type the following command:
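The command itself is not reproduced here. A query of this kind can be built with mminfo; the following sketch assumes standard mminfo query (-q) and report (-r) options and relies on the fact that dependent save sets are named <1>/sp1, <2>/sp1, and so on, so an exact name match returns only the primaries:
mminfo -q "name=/sp1" -r "ssid,savetime,level,name"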
(1) A small NetWorker server environment is considered to have fewer than 500 clients, or 256 concurrent backup sessions.
(2) A medium NetWorker server environment is considered to have more than 500, and up to 1000 clients or 512 concurrent
backup sessions.
(3) A large NetWorker server environment is considered to have more than 1000 clients, and up to 2000 clients or 1024
concurrent backup sessions.
IOPS considerations
The following are considerations and recommendations for IOPS values:
● The NetWorker software does not limit the number of clients per datazone, but a maximum of 1000 clients is recommended
due to the complexity of managing large datazones, and the increased hardware requirements on the NetWorker server.
NOTE: As the I/O load on the NetWorker server increases, so does the storage layer service time. If service times
exceed the required values there is a direct impact on NetWorker server performance and reliability. Information on the
requirements for maximum service times are available in NetWorker server and storage node disk write latency.
● The NetWorker server performs the data movement itself. If the backup device resides on the server rather than the
NetWorker storage node, the backup performance is directly impacted.
Examples 2 and 3 are based on the preceding requirements that are listed in Table 7.
NOTE: This example identifies that the difference in NetWorker configuration can result in up to a 250% additional load
on the NetWorker server. Also, the impact on sizing is such that well-optimized large environments perform better than
non-optimized medium environments.
NOTE: The expected results are approximately 20 minutes per each 10 million files.
File history processing creates a significant I/O load on the backup server, and increases IOPS requirements by 100-120 I/O
operations per second during processing. If minimum IOPS requirements are not met, file history processing can be significantly
slower.
Network
Several components impact network configuration performance:
● IP network:
A computer network made of devices that support the Internet Protocol to determine the source and destination of network
communication.
● Storage network:
Target device
Storage type and connectivity are the component types that impact performance in target device configurations. They are as follows:
● Storage type:
○ Raw disk versus Disk Appliance:
■ Raw disk: Hard disk access at a raw, binary level, beneath the file system level.
■ Disk Appliance: A system of servers, storage nodes, and software.
○ Physical tape versus virtual tape library (VTL):
■ VTL presents a storage component (usually hard disk storage) as tape libraries or tape drives for use as storage
medium with the NetWorker software.
■ Physical tape is a type of removable storage media, generally referred to as a volume or cartridge, that contains
magnetic tape as its medium.
● Connectivity:
○ Local, SAN-attached:
A computer network, separate from a LAN or WAN, designed to attach shared storage devices such as disk arrays and
tape libraries to servers.
○ IP-attached:
The storage device has its own unique IP address.
[Figure: Components of a NetWorker environment, including the Console server, NetWorker servers, NetWorker clients, storage nodes, devices, and datazones.]
Datazone
A datazone is a single NetWorker server and its client computers. Additional datazones can be added as backup requirements
increase.
NOTE: It is recommended to have no more than 1500 clients or 3000 client instances per NetWorker datazone. This
number reflects an average NetWorker server and is not a hard limit.
NOTE: The Java heap memory is a critical component for UI response. The default heap size on the NMC server is 2 GB. If the NMC server handles a large-scale NetWorker server, then it is recommended that you change the Java heap memory to between 6 GB and 12 GB.
NOTE: The recommended maximum number of save sets that can be recovered together using the NMC UI is 500. To perform a concurrent restore of a large number of save sets (100 or more save sets), do the following:
1. Install the NMC server on a separate machine.
2. Increase the NMC UI's Java heap memory size to a maximum of 16 GB.
3. Increase the value of the Xmx2048m attribute to a value which is either 6 GB or 12 GB, based on the NetWorker server memory availability (see the sketch after this note).
If the save sets are distributed across multiple volumes, then a delay in restore can be expected, which is proportional to the number of volumes involved.
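The heap size is carried by the JVM maximum-heap option referenced in step 3. The following is a hypothetical sketch only; the configuration file that holds this option depends on the NMC installation and platform, so NMC_CONFIG is a placeholder:
# Replace the 2 GB maximum-heap option with 12 GB, then restart the NMC services.
sed -i 's/-Xmx2048m/-Xmx12288m/' "$NMC_CONFIG"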
Evaluate the workload for any datazone with more than 100K jobs per day, and consider moving some of the jobs to other datazones.
If the NetWorker server and NMC are migrated from a prior NetWorker 9.x installation to a NetWorker 9.1.x installation, then
it is recommended to distribute the multiple workflows (previously configured as savegroups) among multiple policies with the above recommendations. For example, if a NetWorker 9.x datazone that has 800 savegroups is migrated to NetWorker 9.1.x, then all the savegroups are converted into workflows under a single backup policy. It is recommended to
distribute these workflows among multiple policies and schedule them with interleaved time intervals. Adhere to the above
recommendations when you are running multiple policies and workflows simultaneously.
NetWorker 18.1 and later has optimized Java heap memory within the UI so that large-scale environments (100K jobs per day) can be handled with 6 GB to 12 GB of Java heap memory.
Console database
Use formulas to estimate the size and space requirements for the Console database.
NOTE: Since the amount of required disk space is directly related to the amount of historical data that is stored, the
requirements can vary greatly, on average between 0.5 GB and several GB. Allow space for this when planning hardware
requirements.
Formulas for estimating the space required for the Console database
information
There are existing formulas used to estimate the space needed for different types of data and to estimate the total space
required.
NetWorker server
NetWorker servers provide services to back up and recover data for the NetWorker client computers in a datazone. The
NetWorker server can also act as a storage node and control multiple remote storage nodes.
Index and media management operations are some of the primary processes of the NetWorker server:
● The client file index tracks the files that belong to a save set. There is one client file index for each client.
● The media database tracks:
○ The volume name
○ The location of each save set fragment on the physical media (file number/file record)
○ The backup dates of the save sets on the volume
○ The file systems in each save set
● Unlike the client file indexes, there is only one media database per server.
● The client file indexes and media database can grow to become prohibitively large over time and will negatively impact
backup performance.
● The NetWorker server schedules and queues all backup operations, tracks real-time backup and restore related activities, and handles all NMC communication. This information is stored for a limited amount of time in the jobsdb, which, for real-time operations, has the most critical impact on backup server performance.
NOTE: The data stored in this database is not required for restore operations.
NOTE: The system load that results from storage node processing is significant in large environments. For enterprise environments, the backup server should back up only its internal databases (index and bootstrap).
NetWorker client
A NetWorker client computer is any computer whose data must be backed up. The NetWorker Console server, NetWorker
servers, and NetWorker storage nodes are also NetWorker clients.
NetWorker clients hold mission critical data and are resource intensive. Applications on NetWorker clients are the primary users
of CPU, network, and I/O resources. Only read operations performed on the client do not require additional processing.
Client speed is determined by all active instances of a specific client backup at a point in time.
NOTE: BBB and DPSS must be used when millions of files are used on the clients as save sets.
● Backup data must be transferred to target storage and processed on the backup server:
○ Client/storage node performance:
■ A local storage node: Uses shared memory and does not require additional overhead.
■ A remote storage node: Receive performance is limited by network components.
○ Client/backup server load:
Does not normally slow client backup performance unless the backup server is significantly undersized.
NetWorker databases
Several factors determine the size of NetWorker databases.
These factors are available in NetWorker database bottlenecks.
Virtual environments
NetWorker clients can be created for virtual machines for either traditional backup or VBA backup in the case of NetWorker
8.2.x or vProxy backups in the case of NetWorker 9.1.x or later.
Additionally, the NetWorker software can automatically discover virtual environments and changes to those environments on
either a scheduled or on-demand basis and provides a graphical view of those environments.
Parallel restore
Starting with NetWorker 19.2, the recover workflow for file system backups has been enhanced to perform restores in parallel.
The improved logic splits recover requests into multiple recover requests, resulting in more than one recover thread and better recover performance in comparison to earlier versions.
The following restore workflows are supported with Data Domain and AFTD devices:
● File level restore
● Save set restore
[Figure: Backup environment with a 1 Gbps network, showing the NetWorker client and storage node with separate control/metadata and data paths.]
As illustrated in the following figure, the network is upgraded from a 1 GigE network to a 10 GigE network, and the
bottleneck has moved to another device. The host is now unable to generate data fast enough to use the available network
bandwidth. System bottlenecks can be due to lack of CPU, memory, or other resources.
[Figure: Backup environment upgraded to a 10 GigE network, showing the NetWorker client and storage node backing up to Data Domain with separate control/metadata and data paths.]
[Figure: Backup environment with a GigE network, showing the NetWorker client and storage node backing up to Data Domain with separate control/metadata and data paths.]
As illustrated in the following figure, higher performance tape devices on a SAN remove them as the bottleneck. The
bottleneck device is now the storage devices. Although the local volumes are performing at optimal speeds, they are unable
to use the available system, network, and target device resources. To improve the storage performance, move the data
volumes to high performance external RAID arrays.
[Figure: Backup environment with a GigE network and a SAN, showing the NetWorker client and storage node backing up to Data Domain with a Client Direct data path.]
As illustrated in the following figure, the external RAID arrays have improved the system performance. The RAID arrays
perform nearly as well as the other components in the chain ensuring that performance expectations are met. There will
always be a bottleneck, however the impact of the bottleneck device is limited as all devices are performing at almost the
same level as the other devices in the chain.
[Figure: Backup environment with a GigE network, a SAN, a RAID array, and a tape device, showing the NetWorker server, client, and storage node backing up to Data Domain with a Client Direct data path.]
NOTE: The index database can be split over multiple locations, and the location is determined on a per client basis.
The following figure illustrates the overall performance degradation when the disk performance on which NetWorker media
database resides is a bottleneck. The chart on the right illustrates net data write throughput (save set + index + bootstrap)
and the chart on the left is save set write throughput.
Server parallelism
The server parallelism attribute controls how many save streams the server accepts simultaneously. The more save streams the
server can accept, the faster the devices and client disks run. Client disks can run at their performance limit or the limits of the
connections between them. The default server parallelism is 32; you can configure the parallelism up to 1024.
Server parallelism is not used to control the startup of backup jobs, but as a final limit of sessions accepted by a backup server.
The server parallelism value should be as high as possible while not overloading the backup server itself.
NOTE: If you schedule more than 50 concurrent clone workflows in a data zone, ensure that you configure the server
parallelism value to 1024 to avoid the starvation of streams reserved by clone operation.
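The server parallelism attribute can be inspected or raised with nsradmin. The following is a sketch; it assumes administrative rights on the NetWorker server, and nw_server.example.com is a placeholder host name:
nsradmin -s nw_server.example.com
nsradmin> . type: NSR
nsradmin> update parallelism: 1024
nsradmin> quit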
Action parallelism
Action parallelism defines the maximum number of simultaneous data streams that can occur on all clients in a group that is
associated with the workflow that contains action.
Data streams include backup data streams, savefs processes, and probe jobs. For a Backup action, the default parallelism value is
100. For all other action types, the default value is 0, or unlimited.
Multiplexing
The Target Sessions attribute sets the target number of simultaneous save streams that write to a device. This value is not a
limit, therefore a device might receive more sessions than the Target Sessions attribute specifies. The more sessions specified
for Target Sessions, the more save sets that can be multiplexed (or interleaved) onto the same volume.
AFTD device target and max sessions provides additional information on device Target Sessions.
Performance tests and evaluation can determine whether multiplexing is appropriate for the system. Follow these guidelines
when evaluating the use of multiplexing:
● Find the maximum rate of each device. Use the bigasm test described in The bigasm directive.
● Find the backup rate of each disk on the client. Use the uasm test described in The uasm directive.
If the sum of the backup rates from all disks in a backup is greater than the maximum rate of the device, do not increase server
parallelism. If more save groups are multiplexed in this case, backup performance will not improve, and recovery performance
might slow down.
Disk optimization
NetWorker uses an intelligent algorithm when reading files from a client to choose an optimal block size value, in the range of 64 KB to 8 MB, based on the current read performance of the client system.
This block size selection occurs during the actual data transfer, does not add any overhead to the backup process, and can significantly increase disk read performance.
NOTE: Read block size is not related to device block size used for backup, which remains unchanged.
This feature is transparent to the rest of the backup process and does not require any additional configuration.
You can override the dynamic block size by setting the NSR_READ_SIZE environment variable to a desired value in the
NetWorker client. For example, NSR_READ_SIZE=65536 forces the NetWorker software to use 64 KB block size during the
read process.
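For example, on a UNIX client the variable could be exported in the environment from which the NetWorker client processes are started. This is a sketch only; how environment variables reach the NetWorker client processes depends on the platform and startup method:
# Force a fixed 64 KB read block size instead of the dynamic selection.
export NSR_READ_SIZE=65536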
Built-in compression
Turn on device compression to increase effective throughput to the device.
Some devices have a built-in hardware compression feature. Depending on how compressible the backup data is, this can
improve effective data throughput, from a ratio of 1.5:1 to 3:1.
Drive streaming
To obtain peak performance from most devices, stream the drive at its maximum sustained throughput.
Without drive streaming, the drive must stop to wait for its buffer to refill or to reposition the media before it can resume
writing. This can cause a delay in the cycle time of a drive, depending on the device.
e. Click Defragment disk. If prompted for an administrator password or confirmation, type the password or provide
confirmation.
NOTE: The defragmentation might take from several minutes to a few hours to complete, depending on the size and
degree of fragmentation of the hard disk. You can still use the computer during the defragmentation process.
Network devices
When data is backed up from remote clients, the routers, network cables, and network interface cards can affect the backup and recovery operations.
This section lists the performance variables in network hardware, and suggests some basic tuning for networks. The following
items address specific network issues:
● Network I/O bandwidth:
The maximum data transfer rate across a network rarely approaches the specification of the manufacturer because of
network protocol overhead.
NOTE: The following statement concerning overall system sizing must be considered when addressing network
bandwidth.
Each attached tape drive (physical VTL or AFTD) uses available I/O bandwidth, and also consumes CPU as data still requires
processing.
● Network path:
Networking components such as routers, bridges, and hubs consume some overhead bandwidth, which degrades network
throughput performance.
● Network load:
○ Do not attach a large number of high-speed NICs directly to the NetWorker server, as each IP address uses significant amounts of CPU resources. For example, a mid-size system with four 1 GB NICs uses more than 50 percent of its
resources to process TCP data during a backup.
○ Other network traffic limits the bandwidth available to the NetWorker server and degrades backup performance. As the
network load reaches a saturation threshold, data packet collisions degrade performance even more.
○ The nsrmmdbd process performs CPU-intensive operations when thousands of save sets are processed in a single
operation. Therefore, cloning operations that involve huge numbers of save sets, and NetWorker maintenance activities, should run outside of
the primary backup window.
NOTE: High bandwidth does not directly increase performance if latency is the cause of slow data transfer.
Table 12. The effect of blocksize on an LTO-4 tape drive (continued)
Blocksize Local backup performance Remote backup performance
512 KB 173 MB/second 130 MB/second
1024 KB 173 MB/second 130 MB/second
The following figure illustrates that NetWorker backup throughput drops from 100 percent to 0 percent as the delay
increases from 0.001 ms to 2.0 ms.
Data Domain
Backup to Data Domain storage can be configured by using multiple technologies:
● NetWorker 8.1 and later supports DD Boost over Fibre Channel. This feature leverages the advantages of the Boost protocol
in a SAN infrastructure and provides the following benefits:
○ A DD Boost over Fibre Channel (DFC) backup with Client Direct is 20–25% faster when compared to a backup with DD VTL.
○ Subsequent full backups are three times faster than the first full backup.
○ Recovery over DFC is 2.5 times faster than recovery using DD VTL.
● Backup to VTL:
○ NetWorker devices are configured as tape devices and data transfer occurs over Fibre Channel.
○ Information on VTL optimization is available in Number of virtual device drives versus physical device drives.
● Backup to AFTD over CIFS or NFS:
○ Overall throughput depends on CIFS and NFS performance, which in turn depends on the network configuration.
Network optimization provides best practices for backup to AFTD over CIFS or NFS.
○ Inefficiencies in the underlying transport limit backup performance to 70-80% of the link speed.
● The Client Direct attribute to enable direct file access (DFA):
○ Client Direct to Data Domain (DD) using Boost provides much better performance than DFA-AFTD using CIFS/NFS.
○ Backup performance with client direct enabled (DFA-DD/DFA-AFTD) is 20–60% faster than traditional backup using
nsrmmd.
○ With an increasing number of streams to a single device, DFA handles the backup streams much better than nsrmmd.
● The minimum required memory for a NetWorker Data Domain Boost device with each device total streams set to 10 is
approximately 250 MB. Each OST stream for BOOST takes an additional 25 MB of memory.
● Compared to traditional (non-DFA) backups, backups that use DD Boost require 2-40% additional CPU, but for a much
shorter period. Overall, the CPU load of a backup that uses DD Boost is lower than that of a traditional backup.
● From NetWorker 19.9 onwards, you can configure up to 160 DD Boost devices per storage node.
CloudBoost
The CloudBoost device leverages the CloudBoost appliance and creates the NetWorker device on the cloud object store that is
hosted on a CloudBoost appliance.
The following are CloudBoost benefits:
● Enables sending NetWorker client backups to the cloud for long-term retention.
● Data can be sent directly to the Cloud from Linux x64 clients. For other client types, data is written to the cloud via a
CloudBoost Storage Node.
● Data can be restored directly from the Cloud for Linux x64 clients. For other client types, the restore is performed via a
CloudBoost Storage Node.
● NetWorker 18.1 allows Windows x64 clients to perform backup and recovery directly to and from the Cloud.
● The default target sessions for a CloudBoost device type are 10 for NetWorker 9.1, and 4 for NetWorker 9.0.1. The default
maximum sessions are 80 for NetWorker 9.1, and 60 for NetWorker 9.0.1. For better performance, it is recommended that
you keep the default values for Target and Maximum sessions.
● CloudBoost performs native deduplication. Similar to the Data Domain device, consecutive backups can be 2–3 times
faster, depending on the rate of change in the data.
The NetWorker with CloudBoost Integration Guide provides details on configuring the Cloud appliance and device.
● Max nsrmmd count is an advanced setting that can be used to increase data throughput by restricting the number of
backup processes that the storage node can simultaneously run. When the target or max sessions are changed, the max
nsrmmd count is automatically adjusted according to the formula max sessions divided by target sessions, plus four
(MS/TS + 4). The default values are 12 (FTD/AFTD) and 4 (DD Boost devices).
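For example (an illustrative calculation, not a value taken from this guide): with max sessions set to 60 and target sessions set to 10, max nsrmmd count would be adjusted to 60/10 + 4 = 10.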
NOTE: It is not recommended that you modify both the session attributes and max nsrmmd count simultaneously. If you must
modify all of these values, adjust the session attributes first, apply the changes, and then update max nsrmmd count.
Network optimization
Adjust the following components of the network to ensure optimal performance.
NOTE: It is required that all network components in the data path can handle jumbo frames. Do not enable jumbo
frames if this is not the case.
● TCP hardware offloading is beneficial when it works correctly, but it can cause CRC mismatches. Be sure to monitor for
errors if it is enabled. A sketch of how to inspect these settings on Linux follows this list.
● TCP windows scaling is beneficial if it is supported by all network equipment in the chain.
● TCP congestion notification can cause problems in heterogeneous environments. Only enable it in single operating system
environments.
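The following is a minimal, illustrative way to inspect these settings on a Linux host; ethtool and sysctl are assumed to be available, and eth0 is a placeholder interface name:
# Show the current hardware offload state for the interface.
ethtool -k eth0 | grep -E 'segmentation-offload|receive-offload|checksumming'
# A value of 1 means TCP window scaling is enabled.
sysctl net.ipv4.tcp_window_scaling
# Report the current explicit congestion notification (ECN) setting.
sysctl net.ipv4.tcp_ecn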
Advanced tuning
IRQ processing for high-speed NICs is expensive, but binding interrupts to specific CPU cores (interrupt affinity) can improve
performance. Specific recommendations depend on the CPU architecture.
Network latency
Increased network TCP latency has a negative impact on overall throughput, regardless of the amount of available link bandwidth.
Longer distances or more hops between network hosts can result in lower overall throughput.
Network latency has a high impact on the efficiency of bandwidth use.
For example, the following figures illustrate backup throughput on the same network link with varying latency.
For these examples, non-optimized TCP settings were used.
Ethernet duplexing
Network links that perform in half-duplex mode cause decreased NetWorker traffic flow performance.
For example, a 100 MB half-duplex link results in backup performance of less than 1 MB per second.
The default duplexing setting on most operating systems is automatic negotiation, as recommended by IEEE 802.3.
However, automatic negotiation requires that the following conditions are met:
● Proper cabling
● Compatible NIC adapter
● Compatible switch
If these conditions are not met, automatic negotiation can result in the link operating at half-duplex.
To avoid issues with automatic negotiation, force full-duplex settings on the NIC. The forced full-duplex setting must be applied
to both sides of the link; forcing full-duplex on only one side of the link results in failed automatic negotiation on the other side of
the link.
Firewalls
The additional layer that a hardware firewall adds to the I/O path increases network latency and reduces the overall bandwidth use.
Avoid using software firewalls on the backup server, because the server processes a large number of packets, which results in significant overhead.
Details on firewall configuration and impact are available in the NetWorker Administration Guide.
Jumbo frames
Use jumbo frames in environments capable of handling them. If the source and target computers, and all equipment in the data
path, can handle jumbo frames, increase the MTU to 9 KB. A verification sketch follows the platform examples below.
These examples are for Linux and Solaris operating systems:
● On Linux, type the following command to configure jumbo frames:
ifconfig eth0 mtu 9000 up
● On Solaris, to configure jumbo frames for an nxge device, first determine the instance number of the device by typing the
following command:
grep nxge /etc/path_to_inst
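As an optional, illustrative end-to-end check on Linux (the hostname is a placeholder), send a non-fragmentable payload sized for a 9000-byte MTU:
# 8972 bytes = 9000-byte MTU minus 20 bytes of IP header and 8 bytes of ICMP header; -M do forbids fragmentation.
ping -M do -s 8972 storage-node.example.com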
Congestion notification
Methods to disable congestion notification algorithms vary based on the operating system.
On Windows Server 2012 and 2012 R2:
● Disable optional congestion notification algorithms by typing the following command:
C:\> netsh interface tcp set global ecncapability=disabled
● Compound TCP is an advanced TCP algorithm that provides the best results on Windows via the TCP Global parameter
Add-On Congestion Control Provider. The value for this parameter is none if Compound TCP is disabled, or ctcp if
Compound TCP is enabled.
If both sides of the network conversation are not capable of the negotiation, you can disable the Add-On Congestion Control
Provider by typing the following command:
C:\> netsh interface tcp set global congestionprovider=none
NOTE: A reboot of the system is required if you enable Add-On Congestion Control Provider by typing the command
C:\> netsh int tcp set global congestionprovider=ctcp.
On Linux systems:
● Check for non-standard algorithms by typing the following command:
cat /proc/sys/net/ipv4/tcp_available_congestion_control
● To disable ECN, type the following command:
echo 0 >/proc/sys/net/ipv4/tcp_ecn
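If a non-standard congestion control algorithm is in use, the following sketch shows one way to switch back to a standard one; it assumes that the cubic module is available in the kernel, which is typical but not guaranteed:
# Set the congestion control algorithm used for new connections.
sysctl -w net.ipv4.tcp_congestion_control=cubic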
On Solaris systems:
● To disable TCP Fusion, if present, type the following command:
set ip:do_tcp_fusion = 0x0
TCP buffers
When the rate of inbound TCP packets is higher than the system can process, the operating system drops some of the packets.
This scenario can lead to an undetermined NetWorker state and unreliable backup operations. For NetWorker server or storage
node systems that are equipped with high-speed interfaces, it is critical to monitor the system TCP statistics for dropped TCP
packets, commonly done by using the netstat -s command. To avoid dropped TCP packets, increase the TCP buffer size.
Depending on the operating system, this parameter is referred to as buffer size, queue size, hash size, backlog, or connection
depth.
For high-speed network interfaces, increase the size of the TCP send and receive buffers.
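For illustration, the following is a quick way to look for such drops on a Linux host; counter names vary by kernel version, so treat the filter pattern as a rough sieve rather than a definitive check:
# Summarize TCP statistics and keep only the lines that suggest drops or receive-queue pressure.
netstat -s | grep -Ei 'prune|collapse|overflow|drop|retrans'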
NetWorker server
● Linux:
To modify the TCP buffer settings on Linux:
1. Add the following parameters to the /etc/sysctl.conf file:
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 8192 524288 16777216
net.ipv4.tcp_wmem = 8192 524288 16777216
net.ipv4.tcp_fin_timeout = 120
2. Type the following command:
/sbin/sysctl -p
3. Set the recommended RPC value:
sunrpc.tcp_slot_table_entries = 64
4. Enable dynamic TCP window scaling which requires compatible equipment in the data path:
sysctl -w net.ipv4.tcp_window_scaling=1
● Solaris:
To modify the TCP buffer settings on Solaris, type the following command:
tcp_max_buf 10485760
tcp_cwnd_max 10485760
tcp_recv_hiwat 65536
tcp_xmit_hiwat 65536
● Windows:
The default Windows buffer sizes are sufficient. To modify the TCP buffer settings on Windows:
○ Set the registry entry:
AdditionalCriticalWorkerThreads: DWORD=10
○ If the NIC driver can create multiple buffers or queues, enable this capability at the driver level. For example, Intel
10 GB NIC drivers have RSS Queues set to two by default, and the recommended value for best performance is 16.
○ Increase the recycle time of ports in TIME_WAIT as observed in netstat commands:
On Windows, set the following registry entries:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay
Data type: REG_DWORD
Range: 0x1E–0x12C (30–300 seconds)
Default value: 0xF0 (240 seconds = 4 minutes)
○ Increase the TIME_WAIT seconds:
net.ipv4.tcp_fin_timeout = 120
● Solaris:
To modify the TCP buffer settings on Solaris, type the following command:
tcp_max_buf 10485760
tcp_cwnd_max 10485760
tcp_recv_hiwat 65536
tcp_xmit_hiwat 65536
● AIX:
To modify the TCP buffer settings on AIX, modify the values for the parameters in /etc/rc.net if the current values are lower
than the recommended values. Consider the following:
○ The number of bytes a system can buffer in the kernel on the receiving sockets queue:
no -o tcp_recvspace=524288
○ The number of bytes an application can buffer in the kernel before the application is blocked on a send call:
no -o tcp_sendspace=524288
● Windows:
The default Windows buffer sizes are sufficient. To modify the TCP buffer settings on Windows:
○ Set the registry entry:
AdditionalCriticalWorkerThreads: DWORD=10
○ If the NIC driver can create multiple buffers or queues, enable this capability at the driver level. For example, Intel
10 GB NIC drivers have RSS Queues set to two by default, and the recommended value for best performance is 16.
○ Increase the recycle time of ports in TIME_WAIT as observed in netstat commands:
On Windows, set the following registry entries:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay
Data type: REG_DWORD
Range: 0x1E–0x12C (30–300 seconds)
Default value: 0xF0 (240 seconds = 4 minutes)
The default net.core.somaxconn value is 128. Raise the value substantially to support bursts of requests. For example, to
support a burst of 1024 requests, set net.core.somaxconn to 1024:
net.core.somaxconn = 1024
To change the value:
1. Open the /etc/sysctl.conf file for editing:
# vi /etc/sysctl.conf
2. Add the value:
variable=value
3. Save the changes and load the sysctl settings from the /etc/sysctl.conf file:
# sysctl -p
The changes take effect after you restart the NetWorker core services or reboot the system. To verify the updated value,
run the following command, where 7938 is the port on which nsrexecd runs: ss -ntlp -o sport = :7938
NOTE: On the Windows platform, the equivalent of the net.core.somaxconn value defaults to 0x7fffffff, which is already a
high value. Hence, it does not require an update.
SMP affinity works only for IO-APIC enabled device drivers. Check the IO-APIC capability of a device by using
cat /proc/interrupts, or by referencing the device documentation. A sketch of pinning an IRQ to specific cores on Linux follows the Solaris example below.
● Solaris:
Interrupt only one core per CPU. For example, for a system with 4 CPUs and four cores per CPU, use this command:
psradm -i 1-3 5-7 9-11 13-15
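Returning to the Linux case, the following is a minimal, illustrative sketch of setting SMP (IRQ) affinity; the interface name, IRQ number, and CPU mask are placeholders that must be taken from /proc/interrupts on the actual host:
# List the IRQ numbers that the NIC uses.
grep eth0 /proc/interrupts
# Pin IRQ 56 to CPU cores 0-3 (bitmask 0xf).
echo f > /proc/irq/56/smp_affinity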
Some NIC drivers artificially limit interrupt rates to reduce peak CPU use, which limits the maximum achievable throughput. If
a NIC driver is set for “Interrupt moderation,” disable it for optimal network throughput.
Interrupt moderation
On Windows, for a 10 GB network, it is recommended to disable interrupt moderation for the network adapter to improve
network performance.
C:\> netsh interface tcp set global dca=enabled
● Disable TCP offloading for older generation NIC cards that exhibit problems such as backup sessions that stop responding,
failures due to RPC errors, or connection reset (CRC) errors similar to the following:
Connection reset by peer
NOTE: TCP chimney offloading can cause CRC mismatches. Ensure that you consistently monitor for errors when you
enable TCP chimney offloading.
Name resolution
The NetWorker server relies heavily on the name resolution capabilities of the operating system.
To avoid performance issues, set up low-latency access to the DNS server by configuring either of the following:
● Local DNS cache
● Local non-authoritative DNS server with zone transfers from the main DNS server
Ensure that the server name and the hostnames that are assigned to each IP address on the system are defined in the hosts file
to avoid DNS lookups for local hostname checks.
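For illustration only, the hosts file entries (typically /etc/hosts on UNIX and Linux) might look like the following; the names and addresses are placeholders, not values from this guide:
192.168.10.5    nwserver.example.com       nwserver
192.168.10.6    storagenode1.example.com   storagenode1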
However, in a large NetWorker environment, you might need to temporarily retire or decommission a client. During the
retired phase, you might want the client to remain part of the infrastructure but be removed from active protection (scheduled or
manual backup).
A retired client can still have valid backup copies that you might want to restore or clone. In the decommissioned
phase, by contrast, you might not want to perform any further backup, restore, or clone operations with the client. To retire or
decommission a client, you remove the client from the DNS entries. This results in DNS lookup failures and therefore slower
NetWorker startup times.
To simplify this process, NetWorker 19.4 and later provides you with an option to set the state of the client using an attribute in
the RAP resource. Based on the client state, NetWorker makes an appropriate decision whether to perform DNS lookup or not.
From NetWorker 19.4 and later, DNS lookups for clients in the retired and decommissioned states are avoided. This reduces
NetWorker startup time by up to two to three times compared to previous releases when 40 percent of the clients in
the datazone are marked as retired or decommissioned.
As a performance best practice and to take advantage of the new feature, after upgrading to NetWorker 19.4 or later, it is
recommended that you mark the clients that will not be used for backup as retired. For more information about the
feature, see Decommission a Client resource in the NetWorker Administration Guide.
Optimizing nsrmmdbd memory usage
To ensure that nsrmmdbd memory usage does not surge when the /nsr/mm/mmvolrel/ss file grows large (for example, to
500 MB or 1 GB):
For Linux
You must create the nsrrc file if it does not already exist. Add the following lines to the /nsr/nsrrc file:
export MMDB_SQLITE_CONFIGURE_MEMORY=1
export MMDB_SQLITE_PAGECACHE_SIZE=65536
export MMDB_SQLITE_PAGE_COUNT=65536
export MMDB_SQLITE_HEAP_SIZE=1073741824
export MMDB_SQLITE_HEAP_MIN_ALLOC_SIZE=128
For Windows
Under Advanced System Settings, go to the Environment Variables setting, and create the same environment variables that are
listed above for Linux (without the export keyword), with the same values.
After the environment variables are added, restart the NetWorker server machine for the settings to take effect.
Chapter 4: Test Performance
This chapter describes how to test and understand bottlenecks by using available tools including NetWorker programs such as
bigasm and uasm.
Topics:
• Determine symptoms
• Monitor performance
• Determining bottlenecks by using a generic FTP test
• Testing setup performance using the dd test
• Test disk performance by using bigasm and uasm
• TCP window size and network latency considerations
• Clone performance
• Limit memory usage on the host during clone operations
Determine symptoms
There are many considerations for determining the reason for poor backup performance.
Ask the following questions to determine the cause of poor performance:
● Is the performance consistent for the entire duration of the backup?
● Do the backups perform better when started at a different time?
● Is it consistent across all save sets for the clients?
● Is it consistent across all clients with similar system configuration using a specific storage node?
● Is it consistent across all clients with similar system configuration in the same subnet?
● Is it consistent across all clients with similar system configuration and applications?
Observe how the client performs with different parameters. Inconsistent backup speed can indicate problems with software or
firmware.
For each NetWorker client, answer these questions:
● Is the performance consistent for the entire duration of the backup?
● Is there a change in performance if the backup is started at a different time?
● Is it consistent across all clients using a specific storage node?
● Is it consistent across all save sets for the client?
● Is it consistent across all clients in the same subnet?
● Is it consistent across all clients with similar operating systems, service packs, and applications?
● Does the backup performance improve during the save or does it decrease?
These and similar questions can help to identify the specific performance issues.
Monitor performance
You can monitor the I/O, disk, CPU, and network performance by using native performance monitoring tools.
The monitoring tools available for performance monitoring include the following:
● Windows: perfmon program
● UNIX: iostat, vmstat, or netstat commands
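For illustration, a typical UNIX or Linux sampling session during a backup window might look like the following; the 5-second interval is arbitrary:
# Per-device utilization and service times, sampled every 5 seconds.
iostat -x 5
# CPU, memory, and run-queue pressure, sampled every 5 seconds.
vmstat 5
# Per-interface packet and error counters; repeat periodically to spot growth.
netstat -i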
Unusual activity before, during, and after backups can indicate that devices are using excessive resources. By using these
tools to observe performance over a period of time, the resources that are consumed by each application, including NetWorker, are clearly
identified. If slow backups turn out to be caused by excessive network use by other applications, this can be corrected by
changing backup schedules.
High CPU use is often the result of waiting for external I/O, not insufficient CPU power. This is indicated by high CPU time in
system (kernel) space rather than in user space.
On Windows, if much time is spent on Deferred Procedure Calls, it often indicates a problem with device drivers.
The uasm directive tests disk read speeds and, by writing data to a null device, can identify disk-based bottlenecks.
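As a rough sketch of such a read test (the directory path is a placeholder; the exact options are described in The uasm directive), time how long uasm takes to read a directory tree while discarding the output:
# Read the directory tree with uasm and discard the stream to measure raw disk read speed.
time uasm -s /data/testdir > /dev/null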
TCP window size and network latency considerations
Increased network TCP latency has a negative impact on overall throughput despite the amount of available link bandwidth.
Longer distances or more hops between network hosts can result in lower overall throughput. Since the propagation delay of
the TCP packet depends on the distance between the two locations, increased bandwidth will not help if high latency exists
between the two sites.
Throughput also depends on the TCP window size and the amount of latency between links. A larger TCP window generally
results in better performance; however, on a high-latency link, increasing the TCP window can significantly lengthen the backup
window when packets are lost, because every unacknowledged packet must be kept in memory and retransmitted if it is lost.
Therefore, on high-latency links, it is recommended that you maintain the default TCP window size.
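To make the interaction concrete, here is an illustrative bandwidth-delay calculation (the figures are examples, not measurements from this guide): a 1 Gb/s link with 50 ms of round-trip latency can hold roughly 125 MB/s x 0.05 s = 6.25 MB of data in flight. A sender limited to a 64 KB window can therefore keep only about 1 percent of that link busy, while a window large enough to fill the link also means that up to 6.25 MB may have to be buffered and retransmitted after a loss.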
The network latency impact on NetWorker backup, clone, and recovery performance depends on the control path and data
path:
● Latency between clients and NetWorker server (control path)—The latency impact on the NetWorker control path
(metadata update) can vary based on the type of data you process during NetWorker backup and recovery operations.
For example, if NetWorker clients and the server are separated by a high latency link, and clients back up a high density file
system dataset, the large amount of metadata (file indexes) being sent over the wire impacts the index commit.
● Latency between client and target device (DD) (data path)—Latency between the NetWorker client and the target device
significantly impacts throughput. Any packet loss will further impact throughput. The high latency link in the data path
affects throughput irrespective of the type of data being processed.
The following section provides best practices and recommendations when using high latency networks such as WAN for
NetWorker application data and control paths for backup, clone, and recovery operations.
These examples show the results from using high density datasets (many files but with a low overall size) and large density
datasets (a small number of files but with a large overall size) during backup, clone, and restore workflows.
The data layout is as follows:
● High density file system (FS): 1 million files with approximately 4000 MB overall size
● Large density file system: <1000 files with approximately 390 GB overall size
NOTE: These tests were conducted using the WANem simulation tool by inducing latency and packet loss between the
NetWorker control and data path. Allow for a 10–20% error margin in the results due to the simulation technique.
Figure 14. Large Density FS Backup - WAN between NetWorker Clients and Server (High Packet Loss)
NOTE: The items that are marked in RED are reattempted backups with client retries >0 and <=5
Figure 15. Large Density FS Backup - WAN between NetWorker Clients and Server (Low Packet Loss)
Figure 16. High Density FS Backup Performance - WAN between NetWorker Clients and Server
WAN Latency impact in data path (NetWorker clients and target device
such as Data Domain separated by high latency) for backup
Figure 17. Large Density FS Backup Performance - WAN between NetWorker Clients and Data Domain
Figure 18. High Density FS Backup Performance - WAN between NetWorker Clients and Data Domain
WAN latency impact in data path (Latency between source and target
DDR) for cloning
Figure 19. Large Density FS Clone Performance - WAN between NetWorker Clients and Data Domain
Figure 20. High Density FS Clone Performance - WAN between NetWorker Clients and Data Domain
NOTE: Clone-controlled replication (CCR) performance completely depends on the Data Domain model, the existing load on
DDRs and the latency between two different Data Domain systems that are separated by WAN. The preceding results show
the WAN latency impact on a Large Density File System and High Density File System.
Observations and recommendations:
● If there is a high-latency link between the source and target DDR, there is a significant impact on clone throughput.
● Every 10 ms increase in latency reduces the clone throughput by 4-45 times.
● Packet loss in the WAN link further reduces the clone throughput by 4-300 times for a large density dataset and by
4-500 times for high density datasets.
● It is not recommended that you exceed 50 ms latency for large density datasets and 20 ms latency for high density datasets
when cloning.
WAN latency impact in data path (Latency between source and target
DDR) for recovery
Figure 21. Large Density FS Recovery Performance - WAN between NetWorker Clients and Data Domain
Figure 22. High Density FS Recovery Performance - WAN between NetWorker Clients and Data Domain
NOTE: For large density file system and high density file system dataset restore, time indicates the time that is taken to
perform 10 simultaneous restores.
Observations and recommendations:
● Latency impacts recovery performance similar to the backup and clone workflows.
● If a high-latency link exists between the NetWorker client and the DDR during recovery, performance slows down
drastically.
● Every 10 ms increase in latency reduces the recover throughput by 1-2 times for a high density dataset with multiple client
restores. For a large density dataset with multiple client restores, throughput decreases by 2-10 times as latency increases.
● Packet loss in the WAN link further reduces the restore throughput by 2-12 times.
● It is not recommended that you exceed 50 ms latency (with multiple restores) for a high density dataset and 20 ms latency (with
multiple restores) for a large density dataset during recovery.
Summary
Table 14. Tolerable Range for Low Density file system
WAN path                      Latency      Packet loss
Client - NetWorker server     0-100 ms     0-1%
Client - Data Domain (DFA)    0-50 ms      0-0.1%
NOTE: Higher latency and packet loss in the data path impact throughput significantly. You can still use a high-latency link
for the data path, but the NetWorker server might reattempt the failed backups due to packet loss. It is recommended that
you apply the preceding recommendations to avoid failures with high-latency WAN links.
Clone performance
For small save sets (KB-sized files), a Recover Pipe to Save (RPS) clone takes about 30 seconds longer than a non-RPS clone. When the
dataset size is more than 2 GB, an RPS clone performs better than a non-RPS clone.