
Friday, January 30, 2009

printQueue AD objects for 2003 Cluster

Print queue objects in AD provide a useful facility when users are trying to find printers, but with a 2003 MSCS clustered virtual print spooler, occasionally the information in AD does not reflect the current state of the printers. This post describes some problems I've come across with duplicate/incorrect information, and some ideas on how to combat the problem automatically.

Print Queue Objects in AD

Print queue objects in 2003 clustering are named with the virtual print server name, but they are children of a physical computer account. Which computer account the printers are children of is determined by the physical node that owned the cluster spooler resource when the printer was originally published in AD. As a virtual print server fails over between nodes, the printer objects in the directory are not re-published (unless, I assume, the object is not found in the directory).

It's intuitive that print queue objects would be republished on failover to the node that currently owns the spooler, but that could mean hundreds or thousands of printer objects being created/deleted with each failover, so it's practical not to. It appears the printer object is confirmed using the virtual print server name, and no change is made if the object is found - regardless of which physical node the print queue object is a child of.

In the scenario of a stand-alone print server, when a printer is deleted, the spoolsv service also removes the directory object. In a clustered virtual print server this also occurs; however, it appears that in a 2003 cluster the object is not automatically removed from the directory if the node that owns the spooler when the printer is deleted is different from the node the object was published under.

None of this really matters if everything is working perfectly, but in a 2003 MSCS I have seen the following situations:

  1. Print queues that no longer exist still being visible through a search in AD
  2. Duplicate print queue objects, published against each physical node in the cluster that has hosted the virtual print spooler.

The first was a bigger problem, and I believe the following scenario will result in stale print queue objects persisting:

  1. You have a two node cluster, CL01 and CL02. CL01 owns a virtual print spooler and other cluster groups, under which you create all the print queues.
  2. At a later time you decide that the load could be better split, and move the virtual print spooler to CL02
  3. You then clean up your print queues from the virtual server, also expecting that they will be automatically removed from AD.

In the scenario above, the print queue objects would not be removed from AD, as the physical node that owns the spooler (CL02) does not own the original print queue objects - they were created when CL01 owned the resources. In this state, the invalid print queue objects will not be purged. Note that this assumes you aren't using AD printer pruning - either because the spooler service is disabled on your DCs or because Group Policy is used to control pruning.

I'm unsure of the exact scenario that caused the duplicate print queue objects; presumably there was some problem finding the existing record, so at some point the object was created under the other node as well - resulting in duplicate results in a search (both of which would work, but it's untidy).

Some low maintenance ideas to correct this problem:

  1. Use AD printer pruning, which will ensure print queue objects in AD are managed. Note that this sounds like the obvious solution, but does have caveats and may not suit all environments.
  2. Periodically remove published records from all but the designated primary node, then toggle the published attribute on those printers that no longer have a record in AD, causing them to be republished against the primary node. This could easily be scripted and scheduled.
  3. Modify printer creation change control processes to ensure that new printers are only created and deleted when the preferred owner is hosting the virtual print server

In an ideal world, option three above followed by option one makes the most sense, but if you needed option two you could do something like this:

  1. dsrm CN=%virtual_server%-%QueueName%,CN=%physical_server%,DC=domainRoot
  2. cscript prncfg.vbs -s -b \\%virtual_server%\%QueueName% -published
  3. cscript prncfg.vbs -s -b \\%virtual_server%\%QueueName% +published
  4. dsquery * -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName distinguishedName | find /i "%QueueName%"

This removes the AD object against the 'incorrect' node, toggles the published flag (using prncfg from the Resource Kit Tools - see 'Network Printing Tools and Settings' reference below), and then queries AD to verify the printQueue object has been created.
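
If this toggle needs to run regularly, the removal step can be wrapped in a small batch file and scheduled. The sketch below is a minimal, untested example; the node DN, file name and filter are placeholders you would adjust for your environment:

    @echo off
    rem Sketch only - remove printQueue objects published under the non-primary node.
    rem %wrong_node_dn% is a placeholder for the physical computer account DN.
    set wrong_node_dn=CN=CL01,OU=Servers,DC=test,DC=local

    rem List the DNs of all printQueue children of that node, then delete each one
    dsquery * "%wrong_node_dn%" -limit 0 -filter "(objectClass=printQueue)" > stale_queues.txt
    for /f "delims=" %%i in (stale_queues.txt) do dsrm -noprompt %%i

    rem The queues still shared on the virtual server can then have the published
    rem flag toggled (prncfg.vbs as above) so they are re-created under the owning node.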

Printer Pruning in AD

Pruning of printer objects in Active Directory is controlled either by the server that deletes the printer from its local spooler, or Domain Controllers through periodic printer pruning. Printer pruning is a domain/site-wide activity which processes all printQueue objects.

In a clustered solution, I believe when a Domain Controller looks up the printQueue objects, it will connect to the virtual print spooler node to verify the printers still exist. So regardless of which physical node is publishing the printer, as long as the printer is contactable through the virtual server it shouldn't be pruned.

As long as the spooler service is enabled on at least one Domain Controller, it will prune printers (at the default of three checks, eight hours apart). There are risks in doing this, primarily that if the print server is down for longer than 24 hours (or if the DC can't contact the server), all printers will be pruned from the directory. This logs an Event 50 for each pruned printer in the system event log of the DC that pruned the object - at least it's easy to trace.
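
To keep an eye on what pruning has done, the relevant events can be queried remotely from each DC with the built-in eventquery.vbs; a quick sketch, with DC01 as a placeholder DC name:

  • cscript //nologo %windir%\system32\eventquery.vbs /s DC01 /l system /fi "id eq 50"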

Printer Commands

Query and compare the printers published from each physical node to determine duplicates (run the query once against each node, substituting that node's name and the output file):

  • dsquery * "CN=%physical_server%,DC=domainRoot" -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName driverName printCollate printColor printLanguage printSpooling driverVersion printStaplingSupported printMemory printRate printRateUnit printMediaReady printDuplexSupported > CL1.txt
  • dsquery * "CN=%physical_server%,DC=domainRoot" -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName driverName printCollate printColor printLanguage printSpooling driverVersion printStaplingSupported printMemory printRate printRateUnit printMediaReady printDuplexSupported > CL2.txt
  • for /f "skip=1" %i in (CL1.txt) do @find /i "%i" CL2.txt

The following two commands help identify mismatches in printers published in AD versus those shared through the virtual print server.

Count the number of printers published in AD:

  • find /i /c "%virtual_server%" CL?.txt

The number of printers shared against a node:

  • rmtshare \\%physical_server% | find /i "\\%virtual_server%" /c

Query printers published against a physical server:

  • dsquery * "CN=%physical_server%,DC=domainRoot" -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName driverName printCollate printColor printLanguage printSpooling driverVersion printStaplingSupported printMemory printRate printRateUnit printMediaReady printDuplexSupported

References:

Network Printing Tools and Settings
http://technet.microsoft.com/en-us/library/cc778201.aspx

Printer Pruner May Prune All the Print Queue Objects on Its Site
http://support.microsoft.com/kb/246906

Printer Pruner May Not Remove Printer Queue Objects from Active Directory
http://support.microsoft.com/kb/246174/

A server does not prune printers on a Microsoft Windows Server 2003-based server cluster
http://support.microsoft.com/kb/908128

Useful Windows Printer command-line operations:
http://waynes-world-it.blogspot.com/2008/09/useful-windows-printer-command-line.html

Wayne's World of IT (WWoIT), Copyright 2009 Wayne Martin.



Saturday, January 17, 2009

Virtual 2003 MSCS Cluster in ESX VI3

This post shares a method I've used to create test-lab instances of standard 2003 file and print Microsoft Cluster Services (MSCS) clusters in a VMware ESX VI3 virtual environment. The resultant solution is not supported and definitely not production-ready, but if you want a real multi-node MSCS cluster in an ESX lab environment, this process might be helpful with a minimum set of requirements.

With my usual theme of repeatable command-line execution, most of these operations can be completed via the command-line, either in the ESX service console or a command-prompt from the virtual MSCS nodes.

I followed bits and pieces of the VMware supported method - which is very specific and quite restrictive. Note that I’m a little dubious that this cluster would be particularly stable – the SCSI reservations MSCS uses to lock disks are in no way supported when using a shared VMDK through a shared SCSI adapter (I think RDM is the only supported method), but it does work and at least provided me with a test environment.

The shared nothing model of 2003 MSCS clustering dictates that only one node accesses the partition at any one time, but the disk still needs to be visible to both nodes. A limitation of this solution is that both MSCS nodes need to be hosted on one ESX server – a requirement you could satisfy with a DRS rule to keep the two nodes together. However, if DRS decided to migrate both VMs, the cluster would almost certainly break during the failover (and possibly after).

If you follow the steps below, you should end up with two virtual x64 2003 enterprise servers, both members of a single MSCS cluster. In the cluster there will be three shared disks (VMDKs), one for the quorum and one each for file and print - with a virtual server and relevant cluster resources. A test file share is created, along with drivers and a test printer. You'll need to modify the commands that reference the public adapter and IP addresses.

Steps involved:

  1. Create an area for storage of the shared disk on your datastore:
    1. mkdir /vmfs/volumes/%datastore%/cluster01
  2. Create a 5GB quorum disk:
    1. vmkfstools -d thick -a lsilogic -c 5G /vmfs/volumes/%datastore%/cluster01/MSCS-Quorum.vmdk
  3. Create a 5GB disk for shared data:
    1. vmkfstools -d thick -a lsilogic -c 5G /vmfs/volumes/%datastore%/cluster01/MSCS-disk01.vmdk
  4. Create two 2003 x64 enterprise virtual machines, either through cloning, deployment with templates or whatever your standard build process may be
  5. If cloning was used, run sysprep on both nodes to give a unique SID and join your lab domain
  6. Shutdown the first node and add the shared disk
    1. Add the quorum disk, mounted under scsi 1:0 (which adds a new SCSI adapter)
    2. Set the newly created SCSI Adapter to SCSI bus sharing virtual
    3. Add disk01, attached as scsi 1:1
  7. In the first VM, use Disk Administrator (or diskpart) to initialise the quorum and disk01 disks, partitioned as basic disks. Record the signature of each disk and the drive letter used (noting this is the drive letter while the disk is owned by the OS, not yet the cluster).
  8. Add a service account for the cluster service:
    1. dsadd user "CN=clustersvc,CN=Users,DC=test,DC=local" -pwdneverexpires yes -pwd password -disabled no -desc "MSCS VM cluster service account"
    2. Ensure the service account is an administrator of each virtual 2003 node
  9. Use Cluster Administrator to install the cluster on the first node, with your chosen cluster name, using the created quorum disk and service account
  10. Verify correct operation of the single-node cluster, and then add the second VM node to the cluster.
  11. Create a new port group to allow a second private adapter on each ESX server:
    1. esxcfg-vswitch -A MSCS-Private Private
    2. Add a second interface to each VM cluster node, allocated separate address space
    3. Verify connectivity (ping) and configuration following cluster best practices (no gateway, no DNS etc)
    4. Mark as a private heartbeat connection for the cluster, prioritised above the LAN connection.
  12. Create a virtual resource group with IP, network name and disk resources; the following commands create a group called v01 in the lab01 cluster. For these steps, you'll need the drive letter to use (M: below), the disk signature, the public network name, and the IP address and subnet mask of the virtual server being created:
    1. cluster /cluster:lab01 group "v01" /create
    2. cluster /cluster:lab01 res "v01 Disk01" /create /group:"v01" /type:"physical disk"
    3. cluster /cluster:lab01 res "v01 Disk01" /priv Drive="M:"
    4. cluster /cluster:lab01 res "v01 Disk01" /priv signature=0x%disksignature%
    5. cluster /cluster:lab01 res "v01 Disk01" /prop Description="M: disk01"
    6. cluster /cluster:lab01 res "v01 Disk01" /On
    7. cluster /cluster:lab01 res "v01 IP" /create /group:"v01" /type:"IP Address"
    8. cluster /cluster:lab01 res "v01 IP" /priv Network="%publicNetwork%"
    9. cluster /cluster:lab01 res "v01 IP" /priv Address=192.168.10.10
    10. cluster /cluster:lab01 res "v01 IP" /priv SubnetMask=255.255.255.0
    11. cluster /cluster:lab01 res "v01 IP" /priv EnableNetBIOS=1
    12. cluster /cluster:lab01 res "v01 IP" /priv OverrideAddressMatch=0
    13. cluster /cluster:lab01 res "v01 IP" /AddDep:"v01 Disk01"
    14. cluster /cluster:lab01 res "v01 IP" /On
    15. cluster /cluster:lab01 res "v01" /create /group:"v01" /type:"Network Name"
    16. cluster /cluster:lab01 res "v01" /priv RequireKerberos=1
    17. cluster /cluster:lab01 res "v01" /AddDep:"v01 IP"
    18. cluster /cluster:lab01 res "v01" /priv Name="v01"
    19. cluster /cluster:lab01 res "v01" /On
  13. Install ABEUIamd64.msi on each node if Access Based Enumeration is required
  14. To create a test directory, share and ABE resource on the new virtual server on the cluster (v01):
    1. md \\v01\m$\Dir01
    2. cluster /cluster:lab01 res "v01 Dir01 Share" /create /group:"v01" /type:"File Share"
    3. cluster /cluster:lab01 res "v01 Dir01 Share" /priv path="M:\Dir01"
    4. cluster /cluster:lab01 res "v01 Dir01 Share" /priv Sharename=Dir01
    5. cluster /cluster:lab01 res "v01 Dir01 Share" /priv Remark="Dir01 File Share"
    6. cluster /cluster:lab01 res "v01 Dir01 Share" /prop Description="Dir01 File Share"
    7. cluster /cluster:lab01 res "v01 Dir01 Share" /priv security=Everyone,grant,F:security
    8. cluster /cluster:lab01 res "v01 Dir01 Share" /AddDep:"v01"
    9. cluster /cluster:lab01 res "v01 Dir01 Share" /AddDep:"v01 Disk01"
    10. cluster /cluster:lab01 res "v01 Dir01 Share" /On
    11. cluster /cluster:lab01 res "v01 Dir01 ABE" /create /group:"v01" /type:"Generic Application"
    12. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv CommandLine="cmd.exe /k abecmd.exe /enable Dir01"
    13. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv CurrentDirectory="%SystemRoot%"
    14. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv InteractWithDesktop=0
    15. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv UseNetworkName=0
    16. cluster /cluster:lab01 res "v01 Dir01 ABE" /prop SeparateMonitor=1
    17. cluster /cluster:lab01 res "v01 Dir01 ABE" /prop Description="Access Based Enumeration for Dir01 File Share"
    18. cluster /cluster:lab01 res "v01 Dir01 ABE" /AddDep:"v01"
    19. cluster /cluster:lab01 res "v01 Dir01 ABE" /AddDep:"v01 Disk01"
    20. cluster /cluster:lab01 res "v01 Dir01 ABE" /AddDep:"v01 Dir01 Share"
    21. cluster /cluster:lab01 res "v01 Dir01 ABE" /On
  15. Additional shared cluster disks can be created as required, eg:
    1. vmkfstools -d thick -a lsilogic -c 5G /vmfs/volumes/%datastore%/cluster01/MSCS-disk02.vmdk
    2. Add the disks to one node, (scsi 1:2 in this example). Initialise and allocate in the cluster (as in step 7 above)
  16. To create a virtual print server (assuming you’ve mounted disk02 from step 15 for use in the cluster):
    1. cluster /cluster:lab01 group "v02" /create
    2. cluster /cluster:lab01 res "v02 Disk02" /create /group:"v02" /type:"physical disk"
    3. cluster /cluster:lab01 res "v02 Disk02" /priv Drive="P:"
    4. cluster /cluster:lab01 res "v02 Disk02" /priv signature=0x%disksignature%
    5. cluster /cluster:lab01 res "v02 Disk02" /prop Description="P: print01"
    6. cluster /cluster:lab01 res "v02 Disk02" /On
    7. cluster /cluster:lab01 res "v02 IP" /create /group:"v02" /type:"IP Address"
    8. cluster /cluster:lab01 res "v01 IP" /priv Network="%publicNetwork%"
    9. cluster /cluster:lab01 res "v01 IP" /priv Address=192.168.10.11
    10. cluster /cluster:lab01 res "v01 IP" /priv SubnetMask=255.255.255.0
    11. cluster /cluster:lab01 res "v02 IP" /priv EnableNetBIOS=1
    12. cluster /cluster:lab01 res "v02 IP" /priv OverrideAddressMatch=0
    13. cluster /cluster:lab01 res "v02 IP" /AddDep:"v02 Disk02"
    14. cluster /cluster:lab01 res "v02 IP" /On
    15. cluster /cluster:lab01 res "v02" /create /group:"v02" /type:"Network Name"
    16. cluster /cluster:lab01 res "v02" /priv RequireKerberos=1
    17. cluster /cluster:lab01 res "v02" /AddDep:"v02 IP"
    18. cluster /cluster:lab01 res "v02" /priv Name="v02"
    19. cluster /cluster:lab01 res "v02" /On
  17. Create v02 print spooler:
    1. cluster /cluster:lab01 res "v02 Spooler" /create /group:"v02" /type:"print spooler"
    2. cluster /cluster:lab01 res "v02 Spooler" /priv DefaultSpoolDirectory="P:\Spool"
    3. cluster /cluster:lab01 res "v02 Spooler" /prop Description="v02 Print Spooler"
    4. cluster /cluster:lab01 res "v02 Spooler" /AddDep:"v02 Disk02"
    5. cluster /cluster:lab01 res "v02 Spooler" /AddDep:"v02"
    6. cluster /cluster:lab01 res "v02 Spooler" /On
  18. On v02, add a standard Laserjet 4000 retail driver for x64 and x86, run from a cluster node:
    1. rundll32 printui.dll,PrintUIEntry /ia /c \\v02 /m "HP LaserJet 4000 Series PCL6" /h "x64" /v "Windows XP and Windows Server 2003"
    2. rundll32 printui.dll,PrintUIEntry /ia /c \\v02 /m "HP LaserJet 4000 Series PCL6" /h "x86" /v "Windows 2000, Windows XP and Windows Server 2003"
  19. Create a test printer on v02 called printer01 using the LJ 4000 driver, with a record in DNS, published in AD, set to duplex by default, with customised permissions using the standard winprint processor:
    1. dnscmd %DNSserver% /recordadd %zone% printer01 A 192.168.10.100
    2. cscript //nologo portmgr.vbs -a -c \\v02 -p printer01 -h 192.168.10.100 -t LPR -q printer01
    3. cscript //nologo prnmgr.vbs -a -c \\v02 -b printer01 -m "HP LaserJet 4000 Series PCL6" -r printer01
    4. cscript //nologo prncfg.vbs -s -b \\v02\printer01 -h printer01 -l "%Location%" +published
    5. setprinter.exe \\v02\printer01 8 "pDevMode=dmDuplex=2,dmCollate=1,dmFields=duplex collate"
    6. subinacl /printer \\v02\printer01 /grant=%domain%\%group%=F
    7. setprinter \\v02\printer01 2 pPrintProcessor="WinPrint"
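
Once the steps above are complete, it's worth a command-line sanity check before relying on the lab. The sketch below (using the lab01 cluster and the v01/v02 groups created above) confirms node, group and resource status, forces each group to move between nodes, and checks the virtual servers respond; the print$ check assumes the spooler has come online and shared its drivers:

    cluster /cluster:lab01 node
    cluster /cluster:lab01 group
    cluster /cluster:lab01 res
    cluster /cluster:lab01 group "v01" /move
    cluster /cluster:lab01 group "v02" /move
    net view \\v01
    dir \\v02\print$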

References

VMware Support method of running MSCS clusters:
http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_mscs.pdf

Implementing an MSCS 2003 server cluster Cluster
http://waynes-world-it.blogspot.com/2008/03/implementing-mscs-2003-server-cluster.html

subinacl 5.2.3790.1180:
http://www.microsoft.com/downloads/details.aspx?FamilyID=E8BA3E56-D8FE-4A91-93CF-ED6985E3927B

Windows Server 2003 Resource Kit Tools:
http://www.microsoft.com/downloads/details.aspx?FamilyID=9d467a69-57ff-4ae7-96ee-b18c4790cffd&DisplayLang=en


Wayne's World of IT (WWoIT), Copyright 2009 Wayne Martin.



Thursday, December 11, 2008

MSCS 2003 Cluster Virtual Server Components

This post provides my interpretation of a simple MSCS 2003 virtual server with a file share, including how the cluster interacts with the OS and network services to provide access to the share. This follows on from the last post on low-level detail of file access in an attempt to provide a clearer picture of these often taken-for-granted components.

Note that this is only my opinion, based on less-than-complete knowledge, and more than likely contains semantic errors if nothing else.

File & Print Cluster Native x64 (EM64T/AMD64)

  1. Cluster Service. Includes Checkpoint Manager, Database Manager, Event Log Replication Manager, Failover Manager, Global Update Manager, Log Manager, Membership Manager, Node Manager.
    1. Operating system interaction with the LanManServer service, which advertises shares.
    2. NetBIOS registration of the virtual server name through existing network services
    3. DNS registration of the virtual server name through existing network services
    4. Kerberos SPNs registered against an AD computer account through Active Directory
  2. Resource Monitor. Spawned as a child process of the cluster service; separate resource monitors can exist for resource DLLs
  3. ClusRes.dll Physical Disk <-> IsAlive/LooksAlive SCSI reservation and directory access. LooksAlive issues a SCSI reservation every 3 seconds through ClusDisk.sys against all managed disks. IsAlive performs a ‘dir’ equivalent
  4. ClusRes.dll Network Name <-> IsAlive/LooksAlive check on NetBT/DNS registration. LooksAlive relies on MSCS NIC failure detection. IsAlive queries local TCP/IP stack for virtual IP and the NetBT driver if NetBIOS is enabled
  5. ClusRes.dll IP Address <-> IsAlive/LooksAlive check on cluster NIC. LooksAlive queries NetBT driver and every 24 hours issues a dynamic DNS host record registration. If ‘DNS is required’ resource will fail if DNS registration fails. Same test for IsAlive
  6. ClusRes.dll Resource DLL File Shares <-> IsAlive/LooksAlive check on file share visibility. LooksAlive queries lanmanserver service for the share name. IsAlive does the same, and if unlimited users, the first file on the share is copied
  7. 32-bit Resource Monitor under WOW64, e.g. an Enterprise Vault cluster application. Third-party cluster resources, such as Enterprise Vault, which in this case notifies the local FSA placeholder service on each physical node of virtual server changes
  8. ABE enabled by generic cluster resource. Access based enumeration with a generic cluster application running abecmd.exe during virtual server/share creation. Uses the 32-bit cluster resource monitor with WOW64, setting the SHI1005_FLAGS_ACCESS_BASED_DIRECTORY_ENUM (0x0800) flag on the otherwise standard share.

Pretty picture view:
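
If you want to see or tune the polling behaviour behind the LooksAlive/IsAlive checks above, the intervals are exposed as common resource properties and can be read or set with cluster.exe. A hedged example (cluster and resource names are placeholders; values are in milliseconds):

    cluster /cluster:%cluster% res "%resource%" /prop
    cluster /cluster:%cluster% res "%resource%" /prop LooksAlivePollInterval=5000
    cluster /cluster:%cluster% res "%resource%" /prop IsAlivePollInterval=60000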



References

Server side processes for simple file access
http://waynes-world-it.blogspot.com/2008/11/server-side-process-for-simple-file.html

Access Based Enumeration
http://technet.microsoft.com/en-us/library/cc784710.aspx

SHARE_INFO_1005 Structure
http://msdn.microsoft.com/en-us/library/bb525404.aspx

Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Friday, November 7, 2008

Server-side process for simple file access

This post provides my interpretation of what happens at a low level when a user on a workstation tries to access a file on a server - in this case a Windows server 2003 x64 MSCS cluster. I was trying to demonstrate the complexity of what seems like such a simple action, and in particular trying to incorporate the cluster network/disk elements and highlighting the WOW64 side of things when you are running x64 with x86 third-party software (such as archiving, quotas etc).

Note that I'm reasonably confident this isn't entirely correct, because my understanding is lacking and time/information were short; even if it were accurate in content, the step-by-step / flowchart views aren't the best way to represent these multi-layered processes. But anyway, somebody else might also find it interesting (nerds!)


  1. Workstation redirector file access to a virtual node in the cluster. Includes DNS/NetBIOS calls to determine the cluster virtual server IP address
  2. LanManServer service. User-mode LAN Manager 2.x server service, providing file and print sharing, and with 2003 SP1, Access Based Enumeration
  3. NDIS layer. Network Driver Interface Specification; hardware interrupts pass frames to the NIC driver, which are then passed to the bound transport driver. Sends and receives raw packets, includes LLC
  4. TDI Layer. Single interface for upper-level clients to access transport providers, such as TCP/IP
  5. ClusNet.sys Driver. Cluster specific driver interpreting and routing intracluster traffic and determining communication failure
  6. Srv.sys Server Service. SMB file server driver, kernel-mode companion to the LanManServer service
  7. I/O Manager. I/O support routines, I/O Request Packets (IRPs) and Fast I/O interfaces
  8. FSD Manager. File System Driver Manager, loads FSDs and legacy file system filters, interacts with the file system cache manager.
  9. Filter Manager. A file system filter that provides a standard interface for minifilters, managing altitudes and connectivity to the file system stack. Interacts with the file system cache manager, and in the case of x86 filters, an instance of WOW64 controls the runspace for the filters (on an x64 platform)
  10. File System Cache Manager. A file system filter, working with kernel memory-mapped and paging routines to provide file level caching
  11. ntfs.sys File System FSD. Windows NT File System driver, creates a Control Device Object (CDO) and Volume Device Object (VDO)
  12. ClusDisk.sys upper-level storage filter driver. Cluster specific storage filter driver maintaining exclusive access to cluster disks using SCSI reserve commands
  13. Volume Snapshot Volsnap.sys. Manages software snapshots through a standard storage filter
  14. Volume Manager ftDisk.sys. Presents volumes and manages I/O for basic and dynamic disk configurations
  15. Partition Manager Partmgr.sys. Filter driver that sits on top of the disk driver, creates partition devices, notifies volume manager and exposes IOCTLs
  16. Class Driver disk.sys. Presents a standard disk interface to the storage stack, creating a Function Device Object (FDO)
  17. Storage Port Driver - Storport. Assists with PnP and power functionality, providing a Physical Device Object (PDO) for the device->bus connection
  18. Miniport Driver. Interface to the storage adapter’s hardware, combining with the storport driver to create the storage stack
  19. SMB response to the client. SMB response to the redirector on the workstation requesting the file
Pretty picture view:
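
For the Filter Manager layer (step 9), the minifilters actually loaded on a node, their altitudes and the volumes they are attached to can be listed with fltmc.exe, which is present once Filter Manager ships with 2003 SP1. A quick sketch, run on a physical node:

    fltmc filters
    fltmc instances
    fltmc volumes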



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Monday, September 8, 2008

Useful Windows MSCS Cluster command-line operations

The commands below are a subset of the complete command list found in Useful command-lines, and are command-line operations for Microsoft Windows MSCS server clusters. Most commands are based around the Microsoft cluster.exe utility, with some using WMI, defrag and diruse to provide information on cluster disk resources.

Each command-line can be copied and pasted at the command prompt; if you use a batch file you'll need to reference the FOR variables with a double percent (%%).
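
As a quick illustration of the double-percent rule, the same loop looks like this at a prompt and inside a batch file:

    rem At an interactive command prompt:
    for %i in (node1 node2) do @echo %i

    rem Inside a batch file - FOR variables need the percent doubled:
    for %%i in (node1 node2) do @echo %%i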


Find cluster disk size and free space in CSV format
wmic /node:"%server%","%server%","%server%","%server%" path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET SystemName,Name,Size,FreeSpace,VolumeName /format:csv

Find cluster disk size and free space in modified CSV format with thousand sep.
wmic /node:"%server%","%server%","%server%","%server%" path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET Name,Size,FreeSpace,VolumeName /format:csv2

Report the windows MSCS cluster virtual groups
cluster /cluster:%cluster% group /prop | find /i "description" | find /i /v "pbx" | find /i /v "cluster"

Report folders being archived from Enterprise Vault EV FSA
sqlcmd -S %sqlServer%\%instance% -o ArchivedFolders.txt -d %enterprisevaultdirectory% -W -s "," -Q "select FSVP.UncName, FSVP.VolumeName, FSFE.FolderPath, FSVP.UncName + '\' + FSVP.VolumeName + '\' + FSFE.FolderPath as 'Path' from dbo.FileServerFolderEntry FSFE inner join dbo.vw_FileServer_Volume_Policy FSVP on FSFE.VolumeEntryID = FSVP.VolumeEntryID"

Report folders from the one or more servers not being archived compared to FSA export
for %i in (\\%server%\share% \\%server%\share% ) do @for /f "tokens=1-4,*" %m in ('"dir %i\* /ad /tc | find "DIR" | find "-""') do @find /i "%q" ArchivedFolders.txt >nul & @If errorlevel 1 (echo %q,%i,%m %n %o) >> NotArchived.csv

Delete a cluster resource type
cluster restype "%resource_name%" /delete /type

Find cluster disk size and free space
echo clusnode1 > clusternodes.txt & echo clusnode2 >> clusternodes.txt & echo clusnode3 >> clusternodes.txt & echo clusnode4 >> clusternodes.txt & wmic /node:@clusternodes.txt path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET SystemName,Name,Size,FreeSpace,VolumeName

Show the MSCS cluster multicast address properties
cluster /cluster:%Cluster% network "%PublicNetwork%" /priv

Find the MSCS cluster resources
cluster /cluster:%Cluster% res /prop | find /i "sr"

Find the disks currently owned by each cluster node
for %i in (%server1% %server2%) do @wmic /node:"%i" path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET SystemName,Name | find /i "%server_prefix%"

In a 2003 cluster, find each disk volume and analyse file fragmentation
for /f "tokens=2,5,6,8" %i in ('"cluster /cluster:%cluster% resource /prop find /i "disk" find /i "description" find /i "%CommonTag%""') do echo \\%i\%k %j %l>> Defrag_%i_%j.txt && psexec \\%i defrag %k -a -v >> Defrag_%i_%j.txt

From cluster defrag analysis, print out details for each cluster volume
for /f "tokens=1,* delims=:" %i in ('"findstr /i /c:%server% /c:"Total files" /c:"Volume size" /c:"Used space" /c:"Percent free space" /c:"Total fragmented files" defrag*"') do @echo %j

Create a cluster file share:
cluster /cluster:%cluster% res "%share_res_name%" /create /group:"%group%" /type:"File Share"
cluster /cluster:%cluster% res "%share_res_name%" /priv path="%path%"
cluster /cluster:%cluster% res "%share_res_name%" /priv Sharename=%share_name%
cluster /cluster:%cluster% res "%share_res_name%" /priv Remark="File Share Remark"
cluster /cluster:%cluster% res "%share_res_name%" /prop Description="File Share Description"
cluster /cluster:%cluster% res "%share_res_name%" /priv security=Everyone,grant,F:security
cluster /cluster:%cluster% res "%share_res_name%" /AddDep:"%networkname_res%"
cluster /cluster:%cluster% res "%share_res_name%" /AddDep:"%disk_res%"
cluster /cluster:%cluster% res "%share_res_name%" /On

Create an ABE resource for the file share
cluster /cluster:%cluster% res "%shareabe_res_name%" /create /group:"%group%" /type:"Generic Application"
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv CommandLine="cmd.exe /k abecmd.exe /enable %share_name%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv CurrentDirectory="%SystemRoot%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv InteractWithDesktop=0
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv UseNetworkName=0
cluster /cluster:%cluster% res "%shareabe_res_name%" /prop SeparateMonitor=1
cluster /cluster:%cluster% res "%shareabe_res_name%" /prop Description="Access Based Enumeration for %share_name% File Share"
cluster /cluster:%cluster% res "%shareabe_res_name%" /AddDep:"%networkname_res%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /AddDep:"%disk_res%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /AddDep:"%share_res_name%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /On



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Sunday, June 22, 2008

FSRM and NTFS Quotas in 2003 R2

This post discusses several methods of using File Server Resource Manager (FSRM) auto-quotas with a single share for many home directories, and how you can bypass the limitation with FSRM quotas over SMB and return a reduced amount of disk space through the single share. The two methods discussed are reparse points, and combined FSRM and NTFS quotas.

There is an inherent problem with FSRM quotas in Windows Server 2003 R2 - when accessed remotely, a hard quota is used to report disk free space to the client only when a quota is set on the root of the disk or share. The share quota overrides the volume root quota if both have hard quotas set.

Unfortunately this is not practical in this scenario, as the free space from the quota root down is governed by the hard quota. For example, with a hard quota set on the root of a share containing user home directories, the total space would be limited by that quota, rather than each home directory being limited individually. No method could be found to prevent inheritance of a quota setting to sub-folders.

Note that this does not occur when accessing the quota locally on a machine; the problem exists because the SMB QUERY_FS_INFO call queries the free space at the root, not the free space at the folder (historically there was no difference). File screening has the capability to include a blocking exception entry deeper in the tree to override policies above, but quotas do not have the same interface through the GUI.

Several methods were tried (and failed) in the search for an easy workaround for this issue.

However, if this functionality is required, there are at least two methods to work around the problem – using reparse points or using a combination of NTFS quotas and FSRM quotas.

Reparse Points

Testing was conducted to see whether reparse points, junctions, mount points or symbolic links could be used to return a different amount of free space from the root of the volume compared to the quota applied to each home drive folder.

Using one directory junction, one share, one hard quota and one autoquota, it is possible to use FSRM R2 quotas to report the free disk space based on a hard quota at a root folder, while still providing different per-folder quotas.

For example, in the following scenario, it’s possible to report a reduced disk free space limit, using only FSRM quotas and a directory junction point on the same volume.

  1. Cluster share Root: f:\QuotaTest - \\server\QuotaTest
  2. User Home Root: \\server\f$\users
  3. User home drive: \\server\quotatest\junction\user1 (f:)
  4. FSRM Hard Quota on the share root: 10MB
  5. FSRM Hard or Soft autoquota on the home directory root: 20MB
  6. Junction Directory: f:\quotatest\junction
  7. Junction Target: f:\users
  8. Create the directory junction/reparse point: junction.exe f:\quotatest\junction f:\users

Tests completed under this scenario from a workstation:

  1. A directory of H: reports 10MB free space, based on the hard quota set at the root of the share
  2. Explorer view of H: reports 10MB free space, with the drive mapped through the junction (AD)
  3. Copy a 13MB file to H: succeeds, still 10MB reported free, FSRM warning triggered based on 50% usage (of the 20MB)
  4. Copy another 13MB file to H: fails, as the 20MB hard autoquota set on f:\users prevents the copy

Notes:

  1. Apparently Windows Vista clients using SMB 2.0 do not have this issue
  2. Windows 2000 and later support directory junctions – reparse points. When accessing a reparse point, the processing occurs on the server, unlike Vista/2008 which has a modified MUP and network redirector architecture, supporting client-side processing of file and directory symbolic links.
  3. This still has at least one major disadvantage in that free space will not change for users; they will always see the free space available at the root of the share - 10MB in the example above. However, if hard FSRM autoquotas were used without this method, the free space reported to users would be the total free space on the volume, regardless of the 10MB hard limit that they would be limited to. This is potentially confusing in both scenarios.
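
To confirm the junction exists and points where you expect, either the Sysinternals junction tool used above or the built-in fsutil will report it; a quick check against the example paths:

  • junction.exe f:\quotatest\junction
  • fsutil reparsepoint query f:\quotatest\junction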

Combined FSRM and NTFS quotas

Because they are completely different technologies, NTFS quotas and FSRM quotas don't seem to conflict with each other. Therefore one method of providing soft/hard FSRM quotas while also reducing the disk space seen by users is to use NTFS hard quotas as well.

There are several caveats with this approach:

  1. NTFS quotas are only relevant for user-owned data, where each user has data in one directory, ideal for home directories, but not suitable for shared data directories.
  2. The two quota systems would have to be separately maintained and kept aligned as configuration changes are made in either. While all users conform to the standard template this would not be challenging, but as individual quotas are changed it will become problematic (as always happens).

Overall this solution provides a more realistic disk-free result for each user, provided the FSRM hard quota matches the NTFS hard quota, and file ownership is correctly set.
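
As a sketch of lining the two systems up from the command line - using the built-in fsutil for the NTFS hard quota and the 2003 R2 FSRM dirquota tool for the auto-quota - the commands might look like the following. The volume, limits, account and template names are examples only, and the dirquota switches should be confirmed against dirquota /? on your R2 server:

    rem Enforce NTFS quotas on the volume, then a 10MB warning / 15MB hard limit for user1
    fsutil quota enforce f:
    fsutil quota modify f: 10485760 15728640 test\user1

    rem FSRM auto-quota applied to each folder under the home root, from a soft (reporting) template
    dirquota autoquota add /path:f:\users /sourcetemplate:"200 MB Limit Reports to User"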

The following testing was completed with FSRM and NTFS quotas working together in a 2003 MSCS cluster:

  1. Hard NTFS quota of 15MB
  2. Soft auto-quota of 20MB
  3. Writing a file as user1 to the H: drive automatically creates a quota entry in NTFS quotas
  4. Writing a second file that takes usage over 10MB (50%), the FSRM quota event/command takes place
  5. The user doing a directory of the filesystem reports only the NTFS hard quota disk free space.
  6. Trying to copy another file as user1 to the H: drive fails with not enough disk space according to the hard NTFS quota
  7. Moved the cluster group to verify this follows on a cluster
  8. After the group was moved to another server, the same tests were conducted; NTFS quotas still apply and the hard limits are returned to the client as total space.


Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Sunday, May 25, 2008

Automated Cluster File Security and Purging

If you have a cluster share that contains temporary data in separate top-level directories, this post may help you automate the security and purging of that shared data. This is useful for transient data such as drop directories for scanners and faxes, or scratch directories for general sharing.

To summarise, this will provide:

  1. A cluster-based scheduled task that runs each day, dependent on the network name and physical disk resource currently hosting the directory
  2. A batch file run by the scheduled task that secures each directory, and purges files older than 30 days, logging results to the physical node hosting the resource.

Creating the Scheduled Task

  1. Create the scheduled task cluster resource:
    cluster /cluster:%cluster% res "%resource_name%" /create /group:"%cluster_group%" /type:"Volume Shadow Copy Service Task"
    cluster /cluster:%cluster% res "%resource_name%" /priv ApplicationName="cmd.exe"
    cluster /cluster:%cluster% res "%resource_name%" /priv ApplicationParams="/c c:\admin\SecureAndPurge.bat"
    cluster /cluster:%cluster% res "%resource_name%" /priv CurrentDirectory=""
    cluster /cluster:%cluster% res "%resource_name%" /prop Description="%resource_name%"
    cluster /cluster:%cluster% res "%resource_name%" /AddDep:"%network_name_resource%"
    cluster /cluster:%cluster% res "%resource_name%" /AddDep:"%disk_resource%"
    cluster /cluster:%cluster% res "%resource_name%" /On
    cluster /cluster:%cluster% res "%resource_name%" /prop RestartAction=1
  2. Set the schedule for the cluster resource:
    • Use the Cluster Administrator GUI; this cannot currently be set with cluster.exe for the VSS scheduled task cluster resource
  3. Restart the resource to pickup the schedule change:
    cluster /cluster:%cluster% res "%resource_name%" /Off
    cluster /cluster:%cluster% res "%resource_name%" /On

Note that the cluster resource providing scheduled task capability is the ‘Volume Shadow Copy Service Task’ resource. This is a recommended solution from Microsoft for providing scheduled task capability on a cluster. See the ‘Cluster Resource’ document in the references below.

The LooksAlive and IsAlive functions for the VSSTask.dll simply check that the scheduled task is known to the local task scheduler. To further reduce the impact of resource failure, the resource should be marked as not affecting the cluster, preventing potential failover if this task were to fail more than three times (by default).

The scheduled task should run a simple batch file on the local disk of the cluster node. Keeping the batch file local further reduces the risk that problems with the batch file could cause the cluster group to fail. The theory is that if the batch file is on local disk, it can be modified/deleted before bringing the cluster resources online.

Creating the batch file

Create a batch file and set some environment variables for %directory%, %purgeDir%, %domain%, %logFile%, %AdminUtil%, %FileAge% to fit your environment, and then include at least the three commands below:

  • Set the security on each directory within the directory. Note that this assumes that for each directory, there is a matching same-named security group, prefixed with l (for local), eg lDirectory1.

    for /d %%i in (%Directory%\*) do cacls %%i /e /g %Domain%\l%%~ni:C >> %LogFile%
  • Move the files with robocopy that are older than %FileAge% days:

    %AdminUtil%\robocopy %Directory% "%PurgeDir%" *.* /minage:%FileAge% /v /fp /ts /mov /e /r:1 /w:1 /log+:%LogFile%
  • Delete the files that were moved:

    If Exist "%PurgeDir%" rd /s /q "%PurgeDir%"

Note that depending on the size of data, you might want to ensure that the purgedir is on the same volume as the source files, which won't use any disk space as the files are moved. If the purgedir was on a different drive you would temporarily need as much free space as the size of data being purged.
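
Pulling the pieces together, a minimal SecureAndPurge.bat might look like the sketch below - the variable values are placeholders to adjust for your environment:

    @echo off
    rem Sketch of c:\admin\SecureAndPurge.bat, built from the commands above
    set Directory=S:\Scratch
    set PurgeDir=S:\ScratchPurge
    set Domain=TEST
    set LogFile=c:\admin\SecureAndPurge.log
    set AdminUtil=c:\admin
    set FileAge=30

    rem Re-apply change access for the matching lDirectory group on each top-level directory
    for /d %%i in (%Directory%\*) do cacls %%i /e /g %Domain%\l%%~ni:C >> %LogFile%

    rem Move anything older than %FileAge% days aside, then remove the moved copies
    %AdminUtil%\robocopy %Directory% "%PurgeDir%" *.* /minage:%FileAge% /v /fp /ts /mov /e /r:1 /w:1 /log+:%LogFile%
    If Exist "%PurgeDir%" rd /s /q "%PurgeDir%"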

References:

Cluster resource
http://technet2.microsoft.com/windowsserver/en/library/f6b35982-b355-4b55-8d7f-33127ded5d371033.mspx?mfr=true

Volume Shadow Copy Service resource type
http://technet2.microsoft.com/windowsserver/en/library/bc7b7f3a-d477-42b8-8f2d-a99748e3db3b1033.mspx?mfr=true

Using Shadow Copies of Shared Folders in a server cluster
http://technet2.microsoft.com/windowsserver/en/library/66a9936d-2234-411f-87b4-9699d5401c8c1033.mspx?mfr=true

Scheduled task does not run after you push the task to another computer
http://support.microsoft.com/kb/317529

Scheduled Task for the Shadow Copies of Shared Folders Feature May Not Run on a Windows Server 2003 Cluster
http://support.microsoft.com/kb/828259

Behavior of the LooksAlive and IsAlive functions for the resources that are included in the Windows Server Clustering component of Windows Server 2003
http://support.microsoft.com/kb/914458

Generic Cluster-enabled Scheduled Tasks:
http://waynes-world-it.blogspot.com/2008/04/2003-cluster-enabled-scheduled-tasks.html



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Tuesday, May 20, 2008

Access Based Enumeration in 2003 and MSCS

This post provides an overview of Access Based Enumeration on Windows Server 2003, some limitations, advantages and information on controlling ABE in an Windows MSCS environment.

With a standard Windows file server, for users to map the share and view the directories they have access to, all users require the list directory right at the root of the share. The client would then see all directories, and receive access denied errors if they try to navigate to any sub-folder without NTFS access.

To provide access control similar to Netware, Microsoft have introduced Access Based Enumeration in Windows Server 2003 SP1. This provides a method of displaying only files and folders that a user has access to, rather than displaying all objects in the tree.

The best description I can give is that ABE will hide what you don't have access to, as opposed to ensuring you can see what you do have access to. For example, if you have .\A .\A\B and .\A\B\C, and you have access to C but you don't have access to B, C will be hidden through the explorer GUI.

While this allows for a seamless migration from Netware-based file servers, there are several potential limitations:

  • ABE has to be installed as a separate component to the Operating System, which must be documented and managed for implementation and recovery scenarios.
  • ABE is not cluster-aware, and as enabling ABE is a per-share operation, cluster failovers resulting in the dynamic creation of shares on a physical cluster node will not create ABE-enabled shares. A generic cluster application could be created to enable ABE on all shares as they are created on each cluster node. This is non-standard and not ideal.
  • Increased CPU usage on the file server and slower response times to the client, processing the file data to provide information to the client on which files and directories are visible. Depending on the algorithm used, the size and depth of data structures and file system security, this may be an issue.
  • There are known issues with DFS and Access Based Enumeration
  • There are known issues with the cache on multi-user workstations, which will provide a view of any files and directories that have been viewed by any user of a computer. Client cache characteristics such as retention and location are not known.

Advantages:

  • If looking at migrating from Netware to Microsoft file sharing, the migration will be seamless for users. The file/directory structure and security will be similar, along with end-user access.
  • Using ABE is a documented solution for managing the sharing and access control for clustered home folders, rather than using the share sub-directories feature of MSCS.

Notes:

  • There are known issues with backup applications that do not use the ‘back up files and directories’ right when backing up data through standard file shares.
  • It is unknown whether navigating with explorer to a top-level directory causes server-side processing of all files/folders in the share to determine the access path to all items in the tree, or whether the algorithm will process per-directory. This may be relevant in determining the test-cases to apply to assess performance. Based on testing, it is assumed that if access is denied at a top-level, sub-folders and files are not processed deeper in that branch.
  • Windows Server 2008 includes cluster-aware ABE, enabled by default on all shares. Going forward this minimises the risk that this is a non-standard solution.

Controlling ABE in a cluster environment

Access Based Enumeration is controlled through SMB share settings for each instance of the lanmanserver service – each physical node in the cluster. These settings are not cluster-aware, and will be lost during a cluster fail-over operation.

To ensure that ABE follows virtual cluster nodes during failovers, a generic cluster application can be created, running abecmd.exe to verify that ABE is enabled after failover. The cluster application is dependent on the file share resource, and one is run for each file share.

Resource Type - Generic Cluster Application
Resource Name - <share> ABE
Description - <share> ABE
Dependencies - <share>
Command - cmd.exe /k abecmd.exe /enable <share>
Current Directory - %SystemRoot%
Interact with Desktop - De-selected

Notes:

  1. This assumes abecmd.exe is available in the path of each physical cluster node (this is the case when you install the Microsoft package).
  2. The /k switch is required to ensure that the cmd.exe application remains open, after the abecmd.exe process terminates. This ensures that the cluster resource monitor does not detect the resource as failed. This also leaves a command shell running for each instance of ABE being enabled, certainly not ideal and potentially misleading.

Other solutions considered

Other solutions I've considered for controlling ABE in a Windows file and print cluster environment included:

  1. Creating a generic cluster application dependent on all shares within a particular group, using the 'abecmd.exe /all' method. This is potentially less maintenance, but does not offer granular control of particular shares
  2. Creating a generic script resource type, with a VBScript using the supported cluster entry points to enable ABE on shares when they are created. This requires VBScript knowledge to create and maintain the solution, as opposed to a standard Microsoft provided executable.
  3. Creating an external watcher that determines cluster failovers and share re-creation, running the appropriate abecmd.exe commands as required. This requires an external server process, either a compiled application or a script, watching the cluster - not intuitive, and adding a single point of failure
  4. Controlling Access Based Enumeration through Group Policy. Third-party software exists to control the enforcement of Access Based Enumeration on file servers. However, unless the scheduled GPO refresh period was very frequent, this would not be relevant in a failover cluster scenario.

CPU usage and end-user response times

The biggest concern is response times, as the Microsoft whitepaper on ABE indicates that with 150,000 files, the operation of ‘reading shared directories’ increases from 12 seconds to up to 58 seconds. However, there is no detail on the type of test performed or the hardware used.

On a 2003 cluster with several million files and more than a terabyte of data, no performance impacts have been noticed when accessing folder structures through shares with ABE enabled.

References:

Access Mask:
http://msdn2.microsoft.com/en-us/library/ms790780.aspx

Access Based Enumeration whitepaper:
http://www.microsoft.com/windowsserver2003/techinfo/overview/abe.mspx

Access Based Enumeration:
http://technet2.microsoft.com/WindowsServer/en/library/f04862a9-3e37-4f8c-ba87-917f4fb5b42c1033.mspx

Enabling ABE with DFS:
http://support.microsoft.com/kb/907458

Implementing home folders on a cluster server:
http://support.microsoft.com/kb/256926

Windows Server 2008 failover clustering:
http://technet2.microsoft.com/windowsserver2008/en/library/13c0a922-6097-4f34-ac64-18820094128b1033.mspx?mfr=true

Scripting Entry Points:
http://msdn2.microsoft.com/en-us/library/aa372846.aspx

Create a generic application resource type:
http://technet2.microsoft.com/windowsserver/en/library/ad0bd83d-6144-45b5-8dda-3e599d1edfdb1033.mspx

Generic script cluster resource:
http://www.microsoft.com/windowsserver2003/technologies/clustering/resources.mspx
http://msdn2.microsoft.com/en-us/library/aa369599.aspx
http://msdn2.microsoft.com/en-us/library/aa373089.aspx





Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Friday, May 16, 2008

Comparing MSCS/VMware/DFS File & Print

The following table shows information I was using to compare various Windows HA file and print solutions, including MSCS, VMware, VMware+MSCS, DFS, VMware+DFS and stand-alone servers. There are no recommendations, and most need to be adjusted or at least considered for your environment, but it might help crystalise your thoughts as it did mine.


| Comparison | Microsoft server Clustering | VMware HA | Microsoft Clustering on VMware HA | DFS | Stand-alone server(s) | VMware HA with DFS for file shares |
| --- | --- | --- | --- | --- | --- | --- |
| Highly Available | Y | Y | Y | N | N | Y |
| Satisfies SLAs | ? | ? | ? | ? | ? | ? |
| Maximum nodes | 8 | Limited by host hardware | 2 | N/A | N/A | Limited by host |
| Failover time | <2 minutes | VMotion or server startup time | <2 minutes | SPF | SPF | VMotion or server startup time |
| Single server Disaster Recovery – Software Failure | <2 minutes | Snapshot, server rebuild or manual recovery procedure | <2 minutes | SPF | SPF | Snapshot, server rebuild or manual recovery procedure |
| Single server Disaster Recovery – Hardware Failure | <2 minutes | < 30 seconds | < 30 seconds | SPF | SPF | < 30 seconds |
| Licensing | 2003 Enterprise per node | DataCentre + CALs (depending on VM design) | DataCentre + CALs (depending on VM design) | 2003 Standard | 2003 Standard | 2003 Standard |
| Hardware Failure – Data Communications | Single/teamed NIC for prod interface | NIC redundancy depending on virtual solution | NIC redundancy depending on virtual solution + cluster-specific requirements | Teamed NIC | Teamed NIC | NIC redundancy depending on virtual solution |
| Hardware Failure – HBA | Single HBA per node | HBA redundancy depending on virtual solution | HBA redundancy depending on virtual solution | Single HBA | Single HBA | HBA redundancy depending on virtual solution |
| OS Disk Configuration | Basic | Dynamic | Basic | Dynamic | Dynamic | Dynamic |
| Hardware utilisation | Physical servers | Virtual servers | Virtual servers | Physical servers | Physical servers | Virtual servers |
| Cost allocation | Cost model required | Cost model required | Cost model required | Per server/LUN | Per server/LUN | Cost model required |
| Scalability/Flexibility – adding new nodes/LUNs | Y | Y | Y | N | N | Y, DFS for file |
| Manageability | MSCS skills required | VMware skills required | Complex combination of both technologies | DFS skills required | Existing skills, but increased per server | DFS and VMware skills required |
| User access to shares via UNC | Single name | Multiple names | Single name | Single name | Multiple names | DFS namespace |
| Future proofing – migration to new hardware/OS | Moderately complicated migration | Relatively simple upgrade path, reattaching LUNs or adding another VM | Moderately complicated migration | Relatively simple | Relatively simple | Relatively simple |
| Hardware on Vendor HCL | ? | ? | ? | ? | ? | ? |
| Backup/restore | ? | Standard file backup | VCB or ? | Standard file backup | Standard file backup | VCB or ? |
| Printer administration | Simplified with Cluster 2003 | Duplicated effort on each print server | Simplified with Cluster 2003 | N/A | Duplicated effort on each print server | Duplicated effort on each print server |
| Service and Event Monitoring | Cluster monitoring required | Standard monitoring for virtual servers, host monitoring required | Cluster monitoring for virtual servers, VMware host monitoring required | Standard monitoring | Standard monitoring | Standard monitoring for virtual servers, host monitoring required |


1. Basic disks on a Microsoft server cluster can be extended if new space is visible on the LUN. The disks cannot be dynamic in MSCS.
See http://technet2.microsoft.com/windowsserver/en/library/cd4d0a84-6712-4fbc-b099-2e8fefeb694c1033.mspx?mfr=true


Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Wednesday, April 16, 2008

2003 Cluster-enabled scheduled tasks

Creating a cluster-aware scheduled task has several benefits and has historically been quite difficult. The volume shadow copy service task resource type in Windows Server 2003 clusters provides a mechanism to allow scheduled task capability as a cluster resource. Despite the name, this resource type seems to be a generic cluster resource that provides access to the standard task scheduler interface to schedule and run any command within a resource group.

This post provides information on creating the cluster resource using the cluster.exe command-line interface, some best practices (in my opinion) - preventing this resource from affecting the group, network and disk dependencies, using local scripts and some background in the LooksAlive/IsAlive functions provided by this resource type.

  1. Create the scheduled task cluster resource:

    cluster /cluster:%Cluster% res "TaskName" /create /group:"BNE-VFP03-CL4" /type:"Volume Shadow Copy Service Task"
    cluster /cluster:%Cluster% res "%TaskName%" /priv ApplicationName="cmd.exe"
    cluster /cluster:%Cluster% res "%TaskName%" /priv ApplicationParams="/c Command-Batch-Or-Script"
    cluster /cluster:%Cluster% res "%TaskName%" /priv CurrentDirectory=""
    cluster /cluster:%Cluster% res "%TaskName%" /prop Description="Task Description"
    cluster /cluster:%Cluster% res "%TaskName%" /AddDep:"%NetworkName%"
    cluster /cluster:%Cluster% res "%TaskName%" /AddDep:"%PhysicalDisk%"
    cluster /cluster:%Cluster% res "%TaskName%" /prop RestartAction=1
    cluster /cluster:%Cluster% res "%TaskName%" /On

  2. Set the schedule for the cluster resource:

    Use the Cluster Administrator GUI; the schedule cannot currently be set with cluster.exe in Windows Server 2003 clusters.

  3. Restart the resource to pick up the schedule change:

    cluster /cluster:%Cluster% res "%TaskName%" /Off
    cluster /cluster:%Cluster% res "%TaskName%" /On

Notes:

  1. The cluster resource providing scheduled task capability is the 'Volume Shadow Copy Service Task' resource. This is a recommended solution from Microsoft for providing scheduled task capability on a cluster. See the 'Volume Shadow Copy Service resource type' reference.
  2. The LooksAlive and IsAlive functions for VSSTask.dll simply check that the scheduled task is known to the local task scheduler. However, to further reduce the impact of a resource failure, the resource has been marked as not affecting the group (RestartAction=1 in step 1), preventing a potential failover if the task were to fail more than the default of three times.
  3. Creating the resource also creates a scheduled task using the %TaskName% you have chosen in (by default) the %systemroot%\Tasks folder. Any existing local computer task with the same name will be overwritten.
  4. When specifying a command to run, it is safer to run it from local disk. If the command lived on a clustered disk and something in it caused the resource group to fail, you would have to modify the task resource before you could bring the disk online again; keeping the command on local disk makes changes easy. The drawback of this approach is that you have to either manually copy, or have a process that copies, the command to each node of your cluster - a rough sketch follows these notes.
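
A throwaway sketch of the copy in note 4, plus a check that the task was registered on the owning node; the node names, paths and task name below are hypothetical:

    rem copy-to-nodes.cmd - hypothetical node names, paths and task name
    for %%N in (NODE1 NODE2) do (
        if not exist \\%%N\c$\scripts md \\%%N\c$\scripts
        copy /y c:\scripts\purge.cmd \\%%N\c$\scripts\
    )
    rem Confirm the scheduled task exists on the node currently owning the group
    schtasks /query /s NODE1 | findstr /i "Nightly"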

References:

Cluster resource
http://technet2.microsoft.com/windowsserver/en/library/f6b35982-b355-4b55-8d7f-33127ded5d371033.mspx?mfr=true

Volume Shadow Copy Service resource type
http://technet2.microsoft.com/windowsserver/en/library/bc7b7f3a-d477-42b8-8f2d-a99748e3db3b1033.mspx?mfr=true

With the Volume Shadow Copy Service Task resource type, you can create jobs in the Scheduled Task folder that must be run on the node that is currently hosting a particular resource group. In this way, you can define a scheduled task that can failover from one cluster node to another. However, in the Microsoft® Windows Server 2003 family of products, the Volume Shadow Copy Service Task resource type has limited capabilities for scheduling tasks and serves primarily to support Shadow Copies of Shared Folders in a server cluster. If you need to extend the capabilities of this resource type, consider using the Generic Script resource type.

Using Shadow Copies of Shared Folders in a server cluster
http://technet2.microsoft.com/windowsserver/en/library/66a9936d-2234-411f-87b4-9699d5401c8c1033.mspx?mfr=true

Scheduled task does not run after you push the task to another computer
http://support.microsoft.com/kb/317529

Scheduled Task for the Shadow Copies of Shared Folders Feature May Not Run on a Windows Server 2003 Cluster
http://support.microsoft.com/kb/828259

Behavior of the LooksAlive and IsAlive functions for the resources that are included in the Windows Server Clustering component of Windows Server 2003
http://support.microsoft.com/kb/914458



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Saturday, March 29, 2008

Implementing an MSCS 2003 server cluster

When implementing a Windows Server 2003 MSCS server cluster there are several common issues that can easily be avoided by extra planning and configuration. I've compiled a list of pre-configuration, installation and post-installation steps to reduce the risk of encountering issues when installing an MSCS cluster in a SAN environment.

This is mostly a summary of MS documents and general best practice, but I've not seen all of these in one place before so I thought I would post them.

Pre-installation steps for each server:

  • Unplug all HBAs from all cluster nodes.
  • Set the network adapter binding order to external and then internal.
  • Manually set speed, duplex and IP for all NICs; no gateway/DNS/WINS on the private network (see the sketch after this list)
  • Verify connectivity between each node on public and private adapters
  • Turn off any APM/ACPI power saving features relating to disk drives
  • Create the cluster service account in the domain
  • Ensure the cluster service account is an administrator of the physical cluster nodes. (especially if Kerberos authentication is enabled for virtual servers, but general best practice)
  • Ensure the Windows Firewall is disabled on both adapters
  • Ensure security auditing is enabled on each node
  • Verify correct drivers are installed on each node (HBA, NIC, chassis backplane etc.) and no device manager errors exist.
  • Shut down all nodes. Patch the HBAs on the first node, turn on the first node and check that the storage is visible. Repeat this step for each node, ensuring that only one node has visibility of the storage at any one time. Verify all nodes see the same target paths/disks in the same order.
  • Ensure backup agent is installed and functioning on all servers.
  • Ensure anti-virus agent is installed and functioning on all servers.
  • Configure permissions and role-based security on the servers as required
  • Install Access-Based Enumeration on each server (if required)
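
A minimal sketch of the private network items above, assuming a 2003 node where the private adapter has been renamed "Private" and 10.0.0.x is a hypothetical heartbeat subnet (speed and duplex still have to be forced in the adapter driver properties):

    rem Hypothetical example - run on node 1; node 2 would use 10.0.0.2
    netsh interface ip set address "Private" static 10.0.0.1 255.255.255.0
    rem No gateway, DNS or WINS are configured on the private network
    rem Verify connectivity to the other node's private address
    ping -n 4 10.0.0.2
    rem Ensure the Windows Firewall is off (2003 SP1 and later)
    netsh firewall set opmode disable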

Cluster first node installation:

  • Shut down all but the first node, so that only the first server has visibility of the storage
  • Re-verify that all the SAN disks are visible to the OS
  • Partition and format the disks using MBR before adding the first node to the cluster, and disable compression. Q: for the quorum disk is a de facto standard; start the other disks a few letters further into the alphabet, leaving letters free early in the alphabet for any removable devices/KVM virtual devices etc. that may be auto-mounted
  • Use Cluster Administrator to install the cluster; use the typical (full) installation when creating a new cluster (there should be no reason not to if the disk is presented identically to each server). Do not use ‘Manage Your Server’ to configure cluster nodes
  • This is where you'll need the cluster name. Use a naming convention that makes sense, linking the physical nodes in the cluster to the virtual cluster name(s)
  • Ensure that all disks managed by the cluster have associated disk resources before adding the second and subsequent nodes; this ensures disk locking
  • Verify the cluster is functioning - cluster service running, no event log errors, all resources online and functioning etc. (see the command-line sketch after this list)
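
As a rough check to complement the event logs, the basic cluster state can be listed from the command line; %Cluster% is a placeholder for your cluster name:

    rem List node, group and resource status for the new cluster
    cluster /cluster:%Cluster% node
    cluster /cluster:%Cluster% group
    cluster /cluster:%Cluster% res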

Second and subsequent nodes:

  • Plug in the HBAs on all other nodes, then turn on the second node
  • Re-verify that all the SAN disks are visible to the OS on the second node
  • Add second node to cluster using Cluster Administrator (the first node will lock the disk)
  • Verify the cluster is functioning (cluster service is running, no event errors, all resources available and functioning etc).
  • Add subsequent nodes using Cluster Administrator
  • Verify the cluster is functioning (cluster service is running, no event errors, all resources available and functioning etc).

Post-installation configuration:

  • Set the role of the private network to be only for internal communication (with mixed for failover according to the design)
  • Set the role of the public network to public network
  • Place the private network at the top of the priority list for internal node-to-node communication
  • Do not use the default cluster group for any resources
  • Do not use the quorum disk for anything else in the cluster
  • Do not install scripts used by generic script resources on cluster disk (easier to recover if they're on local disk)
  • Enable Kerberos authentication for network name resources (after taking the network name resources offline). Enabling Kerberos ensures a computer account is created for each virtual server and adds Service Principal Names for Kerberos lookup and authentication (see the sketch after this list)
  • For the first node, set the startup and recovery settings to start within 5 seconds. For the other nodes, set 30 seconds, to reduce the risk of quorum conflict/contention if all cluster nodes start at the same time.
  • Create and test all resources, resource groups and virtual servers, dependencies, failover/failback policies, including file shares/ABE and print spooler
  • Configure backups appropriate for all cluster nodes
  • Configure performance and service monitoring
  • Configure quotas and file screening using FSRM if required
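
A hedged sketch of some of the settings above using cluster.exe rather than the GUI; the network names "Private" and "Public" and the network name resource "VS-FILE01 Name" are hypothetical examples, and the syntax is from memory, so verify against 'cluster /?':

    rem Network roles: 1 = internal only (private), 2 = client access only (public), 3 = all communications (mixed)
    cluster /cluster:%Cluster% network "Private" /prop Role=1
    cluster /cluster:%Cluster% network "Public" /prop Role=2
    rem Put the private network first in the priority list for internal communication
    cluster /cluster:%Cluster% /setnetpri:"Private","Public"
    rem Enable Kerberos on a network name resource (offline first, then back online)
    cluster /cluster:%Cluster% res "VS-FILE01 Name" /off
    cluster /cluster:%Cluster% res "VS-FILE01 Name" /priv RequireKerberos=1
    cluster /cluster:%Cluster% res "VS-FILE01 Name" /on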

Other general thoughts:

  • Access Based Enumeration is useful in some file structures, but does not fully equate to the functionality provided in NetWare. The easiest way I can describe ABE is that it hides what you do not have access to, rather than ensuring you can see what you do have permissions for. For example, in the tree A\B\C, if you have permissions to A and C but not to B, you will not see C. This is because ABE has hidden what you don’t have access to (B), a by-product of which is that C won’t be visible in a default explorer navigation.
  • Having a single virtual print spooler still has a single point of failure – the spoolsv.exe process running on the host system. If that dies because of a configuration error, that error will most likely fail over to any other nodes that can host that resource group. Regardless of ensuring you don’t use kernel-mode (version 2), and only use user-mode (version 3) drivers, any number of issues can occur somewhere in the print process, whether it’s a third-party print processor causing issues, a non-standard port type, or just a poorly written unidrv support DLL. Everything is a lot more transparent with 2003 clustering – drivers, processors and ports all follow the virtual spooler, which most of the time is good, except when you have a problem.

Testing, reproduced from the standard Microsoft confclus.doc document (a command-line alternative for moving groups between nodes is sketched after the tests):

Test: Start Cluster Administrator, right-click a resource, and then click “Initiate Failure”. The resource should go into a failed state, and then it will be restarted and brought back into an online state on that node.
Expected Result: Resources should come back online on the same node

Test: Conduct the above “Initiate Failure” test three more times on that same resource. On the fourth failure, the resources should all failover to another node in the cluster.
Expected Result: Resources should failover to another node in the cluster

Test: Move all resources to one node. Start Computer Management, and then click Services under Services and Applications. Stop the Cluster service. Start Cluster Administrator on another node and verify that all resources failover and come online on another node correctly.
Expected Result: Resources should failover to another node in the cluster

Test: Move all resources to one node. On that node, click Start, and then click Shutdown. This will turn off that node. Start Cluster Administrator on another node, and then verify that all resources failover and come online on another node correctly.
Expected Result: Resources should failover to another node in the cluster

Test: Move all resources to one node, and then press the power button on the front of that server to turn it off. If you have an ACPI compliant server, the server will perform an “Emergency Shutdown” and turn off the server. Start Cluster Administrator on another node and verify that all resources failover and come online on another node correctly. For additional information about an Emergency Shutdown, see the following articles in the Microsoft Knowledge Base:
325343 HOW TO: Perform an Emergency Shutdown in Windows Server 2003
297150 Power Button on ACPI Computer May Force an Emergency Shutdown
Expected Result: Resources should failover to another node in the cluster
Warning: Performing the Emergency Shutdown test may cause data corruption and data loss. Do not conduct this test on a production server

Test: Move all resources to one node, and then pull the power cables from that server to simulate a hard failure. Start Cluster Administrator on another node, and then verify that all resources failover and come online on another node correctly
Expected Result: Resources should failover to another node in the cluster
Warning: Performing the hard failure test may cause data corruption and data loss. This is an extreme test. Make sure you have a backup of all critical data, and then conduct the test at your own risk. Do not conduct this test on a production server

Test: Move all resources to one node, and then remove the public network cable from that node. The IP Address resources should fail, and the groups will all failover to another node in the cluster. For additional information, see the following articles in the Microsoft Knowledge Base:
286342 Network Failure Detection and Recovery in Windows Server 2003 Clusters
Expected Result: Resources should failover to another node in the cluster

Test: Remove the network cable for the Private heartbeat network. The heartbeat traffic will failover to the public network, and no failover should occur. If failover occurs, please see the “Configuring the Private Network Adaptor” section earlier in this document
Expected Result: There should be no failures or resource failovers
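
For the tests that move all resources to one node, the move can also be driven from the command line instead of dragging groups in Cluster Administrator; the group and node names below are hypothetical examples:

    rem Move a group to a specific node, or omit the node to let the cluster choose the next preferred owner
    cluster /cluster:%Cluster% group "File Server Group" /moveto:NODE2
    cluster /cluster:%Cluster% group "File Server Group" /move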


References:


Guide to Creating and Configuring a Server Cluster Under Windows Server 2003

http://www.microsoft.com/downloads/details.aspx?familyid=96F76ED7-9634-4300-9159-89638F4B4EF7&displaylang=en

Best practices for installing and upgrading cluster nodes

http://technet2.microsoft.com/windowsserver/en/library/87f23f24-474b-4dea-bfb5-cfecb3dc5f1d1033.mspx?mfr=true

Best practices for configuring and operating server clusters

http://technet2.microsoft.com/windowsserver/en/library/2798643f-427a-4d26-b510-d7a4a4d3a95c1033.mspx?mfr=true

Before Installing Failover Clustering

http://msdn2.microsoft.com/en-us/library/ms189910.aspx

Cluster Configuration Best Practices for Windows Server 2003

http://www.microsoft.com/downloads/details.aspx?FamilyID=98BC4061-31A1-42FB-9730-4FAB59CF3BF5&displaylang=en

Server Cluster Best Practices

http://technet2.microsoft.com/windowsserver/en/library/8c91dba9-edfc-48b5-8d98-48d6536e0db81033.mspx?mfr=true

Cluster architecture

http://download.microsoft.com/download/0/a/4/0a4db63c-0488-46e3-8add-28a3c0648855/ServerClustersArchitecture.doc

Creating and Configuring a Highly Available Print Server

http://download.microsoft.com/download/2/a/9/2a9c5a6b-472a-40b0-942f-3ba50240ccd9/ConfiguringAHighlyAvailablePrintServer.doc

Disk quotas and clusters

http://technet2.microsoft.com/windowsserver/en/library/1ee8754e-48d6-4472-9b53-29e8d1de09f81033.mspx?mfr=true



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



