
Friday, January 30, 2009

printQueue AD objects for 2003 Cluster

Print queue objects in AD provide a useful facility when users are trying to find printers, but with a 2003 MSCS clustered virtual print spooler, occasionally the information in AD does not reflect the current state of the printers. This post describes some problems I've come across with duplicate/incorrect information, and some ideas on how to combat the problem automatically.

Print Queue Objects in AD

Print queue objects in 2003 clustering are named with the virtual print server name, but they are children of a physical computer account. Which computer account the printers are children of is determined by the physical node that owned the cluster spooler resource when the printer was originally published in AD. As a virtual print server fails over between nodes, the printer objects in the directory are not re-published (unless, I assume, the object is not found in the directory).

It's intuitive that print queue objects would be republished on failover to the node that currently owns the spooler, but that could mean hundreds or thousands of printer objects being created/deleted with each failover, so it's practical not to. It appears the printer object is confirmed using the virtual print server name, and no change is made if the object is found - regardless of which physical node the print queue object is a child of.

In the scenario of a stand-alone print server, when a printer is deleted, the spoolsv service also removes the directory object. In a clustered virtual print server this also occurs; however, it appears that in a 2003 cluster the object is not automatically removed from the directory if the node that owns the spooler when the printer is deleted is different from the node the object was published under.

None of this really matters if everything is working perfectly, but in a 2003 MSCS I have seen the following situations:

  1. Print queues that no longer exist still being visible through a search in AD
  2. Duplicate print queue objects, published against each physical node in the cluster that has hosted the virtual print spooler.

The first was a bigger problem, and I believe the following scenario will result in stale print queue objects persisting:

  1. You have a two node cluster, CL01 and CL02. CL01 owns a virtual print spooler and other cluster groups, under which you create all the print queues.
  2. At a later time you decide that the load could be better split, and move the virtual print spooler to CL02
  3. You then clean up your print queues from the virtual server, also expecting that they will be automatically removed from AD.

In the scenario above, the print queue objects would not be removed from AD, as the physical node that owns the spooler (CL02) does not own the original print queue objects - they were created when CL01 owned the resources. In this state, the invalid print queue objects will not be purged. Note that this assumes you aren't using AD printer pruning - either because the spooler service is disabled on your DCs or because Group Policy is used to control pruning.

I'm unsure of the exact scenario that caused the duplicate print queue objects; presumably there was some problem finding the existing record, so at some point the object was created under the other node as well - resulting in duplicate results in a search (both of which would work, but it's untidy).

Some low maintenance ideas to correct this problem:

  1. Use AD printer pruning, which will ensure print queue objects in AD are managed. Note that this sounds like the obvious solution, but does have caveats and may not suit all environments.
  2. Periodically remove published records from all but the designated primary node, then toggle the published attribute on those printers that no longer have a record in AD, causing them to be republished against the primary node. This could easily be scripted and scheduled.
  3. Modify printer creation change control processes to ensure that new printers are only created and deleted when the preferred owner is hosting the virtual print server

In an ideal world, option three above followed by option one makes the most sense, but if you needed option two you could do something like this:

  1. dsrm CN=%virtual_server%-%QueueName%,CN=%physical_server%,DC=domainRoot
  2. cscript prncfg.vbs -s -b \\%virtual_server%\%QueueName% -published
  3. cscript prncfg.vbs -s -b \\%virtual_server%\%QueueName% +published
  4. dsquery * -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName distinguishedName | find /i "%QueueName%"

This removes the AD object against the 'incorrect' node, toggles the published flag (using prncfg from the Resource Kit Tools - see 'Network Printing Tools and Settings' reference below), and then queries AD to verify the printQueue object has been created.
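
If this toggle needs to run regularly, the removal step can be wrapped in a small batch file and scheduled. The sketch below is a minimal, untested example; the node DN, file name and filter are placeholders you would adjust for your environment:

    @echo off
    rem Sketch only - remove printQueue objects published under the non-primary node.
    rem %wrong_node_dn% is a placeholder for the physical computer account DN.
    set wrong_node_dn=CN=CL01,OU=Servers,DC=test,DC=local

    rem List the DNs of all printQueue children of that node, then delete each one
    dsquery * "%wrong_node_dn%" -limit 0 -filter "(objectClass=printQueue)" > stale_queues.txt
    for /f "delims=" %%i in (stale_queues.txt) do dsrm -noprompt %%i

    rem The queues still shared on the virtual server can then have the published
    rem flag toggled (prncfg.vbs as above) so they are re-created under the owning node.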

Printer Pruning in AD

Pruning of printer objects in Active Directory is controlled either by the server that deletes the printer from its local spooler, or Domain Controllers through periodic printer pruning. Printer pruning is a domain/site-wide activity which processes all printQueue objects.

In a clustered solution, I believe when a Domain Controller looks up the printQueue objects, it will connect to the virtual print spooler node to verify the printers still exist. So regardless of which physical node is publishing the printer, as long as the printer is contactable through the virtual server it shouldn't be pruned.

As long as the spooler service is enabled on at least one Domain Controller, it will prune printers (at the default of three checks, eight hours apart). There are risks in doing this, primarily that if the print server is down for longer than 24 hours (or if the DC can't contact the server), all printers will be pruned from the directory. This logs an Event 50 for each pruned printer in the system event log of the DC that pruned the object - at least it's easy to trace.
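
To keep an eye on what pruning has done, the relevant events can be queried remotely from each DC with the built-in eventquery.vbs; a quick sketch, with DC01 as a placeholder DC name:

  • cscript //nologo %windir%\system32\eventquery.vbs /s DC01 /l system /fi "id eq 50"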

Printer Commands

Query and compare the printers published from each physical node to determine duplicates (run the query once against each node, substituting that node's name and the output file):

  • dsquery * "CN=%physical_server%,DC=domainRoot" -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName driverName printCollate printColor printLanguage printSpooling driverVersion printStaplingSupported printMemory printRate printRateUnit printMediaReady printDuplexSupported > CL1.txt
  • dsquery * "CN=%physical_server%,DC=domainRoot" -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName driverName printCollate printColor printLanguage printSpooling driverVersion printStaplingSupported printMemory printRate printRateUnit printMediaReady printDuplexSupported > CL2.txt
  • for /f "skip=1" %i in (CL1.txt) do @find /i "%i" CL2.txt

The following two commands help identify mismatches in printers published in AD versus those shared through the virtual print server.

Count the number of printers published in AD:

  • find /i /c "%virtual_server%" CL?.txt

The number of printers shared against a node:

  • rmtshare \\%physical_server% | find /i "\\%virtual_server%" /c

Query printers published against a physical server:

  • dsquery * "CN=%physical_server%,DC=domainRoot" -limit 0 -filter "(&(objectClass=printQueue)(objectCategory=printQueue))" -attr cn printerName driverName printCollate printColor printLanguage printSpooling driverVersion printStaplingSupported printMemory printRate printRateUnit printMediaReady printDuplexSupported

References:

Network Printing Tools and Settings
http://technet.microsoft.com/en-us/library/cc778201.aspx

Printer Pruner May Prune All the Print Queue Objects on Its Site
http://support.microsoft.com/kb/246906

Printer Pruner May Not Remove Printer Queue Objects from Active Directory
http://support.microsoft.com/kb/246174/

A server does not prune printers on a Microsoft Windows Server 2003-based server cluster
http://support.microsoft.com/kb/908128

Useful Windows Printer command-line operations:
http://waynes-world-it.blogspot.com/2008/09/useful-windows-printer-command-line.html

Wayne's World of IT (WWoIT), Copyright 2009 Wayne Martin.



Saturday, January 17, 2009

Virtual 2003 MSCS Cluster in ESX VI3

This post shares a method I've used to create test-lab instances of standard 2003 file and print Microsoft Cluster Services (MSCS) clusters in a VMware ESX VI3 virtual environment. The resultant solution is not supported and definitely not production-ready, but if you want a real multi-node MSCS cluster in an ESX lab environment, this process might be helpful with a minimum set of requirements.

With my usual theme of repeatable command-line execution, most of these operations can be completed via the command-line, either in the ESX service console or a command-prompt from the virtual MSCS nodes.

I followed bits and pieces of the VMware supported method - which is very specific and quite restrictive. Note that I’m a little dubious that this cluster would be particularly stable – the SCSI reservations MSCS uses to lock disks are in no way supported when using a shared VMDK through a shared SCSI adapter (I think RDM is the only supported method), but it does work and at least provided me with a test environment.

The shared nothing model of 2003 MSCS clustering dictates that only one node accesses the partition at any one time, but the disk still needs to be visible to both nodes. A limitation of this solution is that both MSCS nodes need to be hosted on one ESX server – a requirement you could satisfy with a DRS rule to keep the two nodes together. However, if DRS decided to migrate both VMs, the cluster would almost certainly break during the failover (and possibly after).

If you follow the steps below, you should end up with two virtual x64 2003 enterprise servers, both members of a single MSCS cluster. In the cluster there will be three shared disks (VMDKs), one for the quorum and one each for file and print - with a virtual server and relevant cluster resources. A test file share is created, along with drivers and a test printer. You'll need to modify the commands that reference the public adapter and IP addresses.

Steps involved:

  1. Create an area for storage of the shared disk on your datastore:
    1. mkdir /vmfs/volumes/%datastore%/cluster01
  2. Create a 5GB quorum disk:
    1. vmkfstools -d thick -a lsilogic -c 5G /vmfs/volumes/%datastore%/cluster01/MSCS-Quorum.vmdk
  3. Create a 5GB disk for shared data:
    1. vmkfstools -d thick -a lsilogic -c 5G /vmfs/volumes/%datastore%/cluster01/MSCS-disk01.vmdk
  4. Create two 2003 x64 enterprise virtual machines, either through cloning, deployment with templates or whatever your standard build process may be
  5. If cloning was used, run sysprep on both nodes to give a unique SID and join your lab domain
  6. Shutdown the first node and add the shared disk
    1. Add the quorum disk, mounted under scsi 1:0 (which adds a new SCSI adapter)
    2. Set the newly created SCSI Adapter to SCSI bus sharing virtual
    3. Add disk01, attached as scsi 1:1
  7. In the first VM, use Disk Administrator (or diskpart) to initialise the quorum and disk01 disks, partitioned as basic disks. Record the signature of each disk and the drive letter used (noting this is the drive letter while the disk is owned by the OS, not yet the cluster).
  8. Add a service account for the cluster service:
    1. dsadd user "CN=clustersvc,CN=Users,DC=test,DC=local" -pwdneverexpires yes -pwd password -disabled no -desc "MSCS VM cluster service account"
    2. Ensure the service account is an administrator of each virtual 2003 node
  9. Use Cluster Administrator to install the cluster on the first node, with your chosen cluster name, using the created quorum disk and service account
  10. Verify correct operation of the single-node cluster, and then add the second VM node to the cluster.
  11. Create a new port group to allow a second private adapter on each ESX server:
    1. esxcfg-vswitch -A MSCS-Private Private
    2. Add a second interface to each VM cluster node, allocated separate address space
    3. Verify connectivity (ping) and configuration following cluster best practices (no gateway, no DNS etc)
    4. Mark as a private heartbeat connection for the cluster, prioritised above the LAN connection.
  12. Create a virtual resource group with IP, network name and disk resources; the following commands create a group called v01 in the lab01 cluster. For these steps, you'll need the drive letter to use (M: below), the disk signature, the public network name, and the IP address and subnet mask of the virtual server being created:
    1. cluster /cluster:lab01 group "v01" /create
    2. cluster /cluster:lab01 res "v01 Disk01" /create /group:"v01" /type:"physical disk"
    3. cluster /cluster:lab01 res "v01 Disk01" /priv Drive="M:"
    4. cluster /cluster:lab01 res "v01 Disk01" /priv signature=0x%disksignature%
    5. cluster /cluster:lab01 res "v01 Disk01" /prop Description="M: disk01"
    6. cluster /cluster:lab01 res "v01 Disk01" /On
    7. cluster /cluster:lab01 res "v01 IP" /create /group:"v01" /type:"IP Address"
    8. cluster /cluster:lab01 res "v01 IP" /priv Network="%publicNetwork%"
    9. cluster /cluster:lab01 res "v01 IP" /priv Address=192.168.10.10
    10. cluster /cluster:lab01 res "v01 IP" /priv SubnetMask=255.255.255.0
    11. cluster /cluster:lab01 res "v01 IP" /priv EnableNetBIOS=1
    12. cluster /cluster:lab01 res "v01 IP" /priv OverrideAddressMatch=0
    13. cluster /cluster:lab01 res "v01 IP" /AddDep:"v01 Disk01"
    14. cluster /cluster:lab01 res "v01 IP" /On
    15. cluster /cluster:lab01 res "v01" /create /group:"v01" /type:"Network Name"
    16. cluster /cluster:lab01 res "v01" /priv RequireKerberos=1
    17. cluster /cluster:lab01 res "v01" /AddDep:"v01 IP"
    18. cluster /cluster:lab01 res "v01" /priv Name="v01"
    19. cluster /cluster:lab01 res "v01" /On
  13. Install ABEUIamd64.msi on each node if Access Based Enumeration is required
  14. To create a test directory, share and ABE resource on the new virtual server on the cluster (v01):
    1. md \\v01\m$\Dir01
    2. cluster /cluster:lab01 res "v01 Dir01 Share" /create /group:"v01" /type:"File Share"
    3. cluster /cluster:lab01 res "v01 Dir01 Share" /priv path="M:\Dir01"
    4. cluster /cluster:lab01 res "v01 Dir01 Share" /priv Sharename=Dir01
    5. cluster /cluster:lab01 res "v01 Dir01 Share" /priv Remark="Dir01 File Share"
    6. cluster /cluster:lab01 res "v01 Dir01 Share" /prop Description="Dir01 File Share"
    7. cluster /cluster:lab01 res "v01 Dir01 Share" /priv security=Everyone,grant,F:security
    8. cluster /cluster:lab01 res "v01 Dir01 Share" /AddDep:"v01"
    9. cluster /cluster:lab01 res "v01 Dir01 Share" /AddDep:"v01 Disk01"
    10. cluster /cluster:lab01 res "v01 Dir01 Share" /On
    11. cluster /cluster:lab01 res "v01 Dir01 ABE" /create /group:"v01" /type:"Generic Application"
    12. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv CommandLine="cmd.exe /k abecmd.exe /enable Dir01"
    13. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv CurrentDirectory="%SystemRoot%"
    14. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv InteractWithDesktop=0
    15. cluster /cluster:lab01 res "v01 Dir01 ABE" /priv UseNetworkName=0
    16. cluster /cluster:lab01 res "v01 Dir01 ABE" /prop SeparateMonitor=1
    17. cluster /cluster:lab01 res "v01 Dir01 ABE" /prop Description="Access Based Enumeration for Dir01 File Share"
    18. cluster /cluster:lab01 res "v01 Dir01 ABE" /AddDep:"v01"
    19. cluster /cluster:lab01 res "v01 Dir01 ABE" /AddDep:"v01 Disk01"
    20. cluster /cluster:lab01 res "v01 Dir01 ABE" /AddDep:"v01 Dir01 Share"
    21. cluster /cluster:lab01 res "v01 Dir01 ABE" /On
  15. Additional shared cluster disks can be created as required, eg:
    1. vmkfstools -d thick -a lsilogic -c 5G /vmfs/volumes/%datastore%/cluster01/MSCS-disk02.vmdk
    2. Add the disks to one node, (scsi 1:2 in this example). Initialise and allocate in the cluster (as in step 7 above)
  16. To create a virtual print server (assuming you’ve mounted disk02 from step 15 for use in the cluster):
    1. cluster /cluster:lab01 group "v02" /create
    2. cluster /cluster:lab01 res "v02 Disk02" /create /group:"v02" /type:"physical disk"
    3. cluster /cluster:lab01 res "v02 Disk02" /priv Drive="P:"
    4. cluster /cluster:lab01 res "v02 Disk02" /priv signature=0x%disksignature%
    5. cluster /cluster:lab01 res "v02 Disk02" /prop Description="P: print01"
    6. cluster /cluster:lab01 res "v02 Disk02" /On
    7. cluster /cluster:lab01 res "v02 IP" /create /group:"v02" /type:"IP Address"
    8. cluster /cluster:lab01 res "v01 IP" /priv Network="%publicNetwork%"
    9. cluster /cluster:lab01 res "v01 IP" /priv Address=192.168.10.11
    10. cluster /cluster:lab01 res "v01 IP" /priv SubnetMask=255.255.255.0
    11. cluster /cluster:lab01 res "v02 IP" /priv EnableNetBIOS=1
    12. cluster /cluster:lab01 res "v02 IP" /priv OverrideAddressMatch=0
    13. cluster /cluster:lab01 res "v02 IP" /AddDep:"v02 Disk02"
    14. cluster /cluster:lab01 res "v02 IP" /On
    15. cluster /cluster:lab01 res "v02" /create /group:"v02" /type:"Network Name"
    16. cluster /cluster:lab01 res "v02" /priv RequireKerberos=1
    17. cluster /cluster:lab01 res "v02" /AddDep:"v02 IP"
    18. cluster /cluster:lab01 res "v02" /priv Name="v02"
    19. cluster /cluster:lab01 res "v02" /On
  17. Create v02 print spooler:
    1. cluster /cluster:lab01 res "v02 Spooler" /create /group:"v02" /type:"print spooler"
    2. cluster /cluster:lab01 res "v02 Spooler" /priv DefaultSpoolDirectory="P:\Spool"
    3. cluster /cluster:lab01 res "v02 Spooler" /prop Description="v02 Print Spooler"
    4. cluster /cluster:lab01 res "v02 Spooler" /AddDep:"v02 Disk02"
    5. cluster /cluster:lab01 res "v02 Spooler" /AddDep:"v02"
    6. cluster /cluster:lab01 res "v02 Spooler" /On
  18. On v02, add a standard Laserjet 4000 retail driver for x64 and x86, run from a cluster node:
    1. rundll32 printui.dll,PrintUIEntry /ia /c \\v02 /m "HP LaserJet 4000 Series PCL6" /h "x64" /v "Windows XP and Windows Server 2003"
    2. rundll32 printui.dll,PrintUIEntry /ia /c \\v02 /m "HP LaserJet 4000 Series PCL6" /h "x86" /v "Windows 2000, Windows XP and Windows Server 2003"
  19. Create a test printer on v02 called printer01 using the LJ 4000 driver, with a record in DNS, published in AD, set to duplex by default, with customised permissions using the standard winprint processor:
    1. dnscmd %DNSserver% /recordadd %zone% printer01 A 192.168.10.100
    2. cscript //nologo portmgr.vbs -a -c \\v02 -p printer01 -h 192.168.10.100 -t LPR -q printer01
    3. cscript //nologo prnmgr.vbs -a -c \\v02 -b printer01 -m "HP LaserJet 4000 Series PCL6" -r printer01
    4. cscript //nologo prncfg.vbs -s -b \\v02\printer01 -h printer01 -l "%Location%" +published
    5. setprinter.exe \\v02\printer01 8 "pDevMode=dmDuplex=2,dmCollate=1,dmFields=duplex collate"
    6. subinacl /printer \\v02\printer01 /grant=%domain%\%group%=F
    7. setprinter \\v02\printer01 2 pPrintProcessor="WinPrint"
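
Once the steps above are complete, it's worth a command-line sanity check before relying on the lab. The sketch below (using the lab01 cluster and the v01/v02 groups created above) confirms node, group and resource status, forces each group to move between nodes, and checks the virtual servers respond; the print$ check assumes the spooler has come online and shared its drivers:

    cluster /cluster:lab01 node
    cluster /cluster:lab01 group
    cluster /cluster:lab01 res
    cluster /cluster:lab01 group "v01" /move
    cluster /cluster:lab01 group "v02" /move
    net view \\v01
    dir \\v02\print$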

References

VMware Support method of running MSCS clusters:
http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_mscs.pdf

Implementing an MSCS 2003 server cluster Cluster
http://waynes-world-it.blogspot.com/2008/03/implementing-mscs-2003-server-cluster.html

subinacl 5.2.3790.1180:
http://www.microsoft.com/downloads/details.aspx?FamilyID=E8BA3E56-D8FE-4A91-93CF-ED6985E3927B

Windows Server 2003 Resource Kit Tools:
http://www.microsoft.com/downloads/details.aspx?FamilyID=9d467a69-57ff-4ae7-96ee-b18c4790cffd&DisplayLang=en


Wayne's World of IT (WWoIT), Copyright 2009 Wayne Martin.



Thursday, December 11, 2008

MSCS 2003 Cluster Virtual Server Components

This post provides my interpretation of a simple MSCS 2003 virtual server with a file share, including how the cluster interacts with the OS and network services to provide access to the share. This follows on from the last post on low-level detail of file access in an attempt to provide a clearer picture of these often taken-for-granted components.

Note that this is only my opinion, based on less-than-complete knowledge, and more than likely contains semantic errors if nothing else.

File & Print Cluster Native x64 (EM64T/AMD64)

  1. Cluster Service. Includes Checkpoint Manager, Database Manager, Event Log Replication Manager, Failover Manager, Global Update Manager, Log Manager, Membership Manager, Node Manager.
    1. Operating system interaction with the LanManServer service, which advertises shares.
    2. NetBIOS registration of the virtual server name through existing network services
    3. DNS registration of the virtual server name through existing network services
    4. Kerberos SPNs registered against an AD computer account through Active Directory
  2. Resource Monitor. Spawned as a child process of the cluster service; separate resource monitors can exist for resource DLLs
  3. ClusRes.dll Physical Disk <-> IsAlive/LooksAlive SCSI reservation and directory access. LooksAlive issues a SCSI reservation every 3 seconds through ClusDisk.sys against all managed disks. IsAlive performs a ‘dir’ equivalent
  4. ClusRes.dll Network Name <-> IsAlive/LooksAlive check on NetBT/DNS registration. LooksAlive relies on MSCS NIC failure detection. IsAlive queries local TCP/IP stack for virtual IP and the NetBT driver if NetBIOS is enabled
  5. ClusRes.dll IP Address <-> IsAlive/LooksAlive check on cluster NIC. LooksAlive queries NetBT driver and every 24 hours issues a dynamic DNS host record registration. If ‘DNS is required’ resource will fail if DNS registration fails. Same test for IsAlive
  6. ClusRes.dll Resource DLL File Shares <-> IsAlive/LooksAlive check on file share visibility. LooksAlive queries lanmanserver service for the share name. IsAlive does the same, and if unlimited users, the first file on the share is copied
  7. 32-bit Resource Monitor under WOW64, e.g. an Enterprise Vault cluster application. Third-party cluster resources, such as Enterprise Vault, which in this case notifies the local FSA placeholder service on each physical node of virtual server changes
  8. ABE enabled by generic cluster resource. Access based enumeration with a generic cluster application running abecmd.exe during virtual server/share creation. Uses the 32-bit cluster resource monitor with WOW64, setting the SHI1005_FLAGS_ACCESS_BASED_DIRECTORY_ENUM (0x0800) flag on the otherwise standard share.

Pretty picture view:
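
If you want to see or tune the polling behaviour behind the LooksAlive/IsAlive checks above, the intervals are exposed as common resource properties and can be read or set with cluster.exe. A hedged example (cluster and resource names are placeholders; values are in milliseconds):

    cluster /cluster:%cluster% res "%resource%" /prop
    cluster /cluster:%cluster% res "%resource%" /prop LooksAlivePollInterval=5000
    cluster /cluster:%cluster% res "%resource%" /prop IsAlivePollInterval=60000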



References

Server side processes for simple file access
http://waynes-world-it.blogspot.com/2008/11/server-side-process-for-simple-file.html

Access Based Enumeration
http://technet.microsoft.com/en-us/library/cc784710.aspx

SHARE_INFO_1005 Structure
http://msdn.microsoft.com/en-us/library/bb525404.aspx

Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Friday, November 7, 2008

Server-side process for simple file access

This post provides my interpretation of what happens at a low level when a user on a workstation tries to access a file on a server - in this case a Windows server 2003 x64 MSCS cluster. I was trying to demonstrate the complexity of what seems like such a simple action, and in particular trying to incorporate the cluster network/disk elements and highlighting the WOW64 side of things when you are running x64 with x86 third-party software (such as archiving, quotas etc).

Note that I'm reasonably confident this isn't entirely correct, because my understanding is lacking and time/information were short; even if it were accurate in content, the step-by-step / flowchart views aren't the best way to represent these multi-layered processes. But anyway, somebody else might also find it interesting (nerds!)


  1. Workstation redirector file access to a virtual node in the cluster. Includes DNS/NetBIOS calls to determine the cluster virtual server IP address
  2. LanManServer service. User-mode LAN Manager 2.x server service, providing file and print sharing, and with 2003 SP1, Access Based Enumeration
  3. NDIS layer. Network Driver Interface Specification; hardware interrupts pass frames to the NIC driver, which are then passed to the bound transport driver. Sends and receives raw packets, includes LLC
  4. TDI Layer. Single interface for upper-level clients to access transport providers, such as TCP/IP
  5. ClusNet.sys Driver. Cluster specific driver interpreting and routing intracluster traffic and determining communication failure
  6. Srv.sys Server Service. SMB file server driver, kernel-mode companion to the LanManServer service
  7. I/O Manager. I/O support routines, I/O Request Packets (IRPs) and Fast I/O interfaces
  8. FSD Manager. File System Driver Manager, loads FSDs and legacy file system filters, interacts with the file system cache manager.
  9. Filter Manager. A file system filter that provides a standard interface for minifilters, managing altitudes and connectivity to the file system stack. Interacts with the file system cache manager, and in the case of x86 filters, an instance of WOW64 controls the runspace for the filters (on an x64 platform)
  10. File System Cache Manager. A file system filter, working with kernel memory-mapped and paging routines to provide file level caching
  11. ntfs.sys File System FSD. Windows NT File System driver, creates a Control Device Object (CDO) and Volume Device Object (VDO)
  12. ClusDisk.sys upper-level storage filter driver. Cluster specific storage filter driver maintaining exclusive access to cluster disks using SCSI reserve commands
  13. Volume Snapshot Volsnap.sys. Manages software snapshots through a standard storage filter
  14. Volume Manager ftDisk.sys. Presents volumes and manages I/O for basic and dynamic disk configurations
  15. Partition Manager Partmgr.sys. Filter driver that sits on top of the disk driver, creates partition devices, notifies volume manager and exposes IOCTLs
  16. Class Driver disk.sys. Presents a standard disk interface to the storage stack, creating a Function Device Object (FDO)
  17. Storage Port Driver - Storport. Assists with PnP and power functionality, providing a Physical Device Object (PDO) for the device->bus connection
  18. Miniport Driver. Interface to the storage adapter’s hardware, combining with the storport driver to create the storage stack
  19. SMB response to the client. SMB response to the redirector on the workstation requesting the file
Pretty picture view:
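
For the Filter Manager layer (step 9), the minifilters actually loaded on a node, their altitudes and the volumes they are attached to can be listed with fltmc.exe, which is present once Filter Manager ships with 2003 SP1. A quick sketch, run on a physical node:

    fltmc filters
    fltmc instances
    fltmc volumes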



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Monday, September 8, 2008

Useful Windows MSCS Cluster command-line operations

The commands below are a subset of the complete command list found in Useful command-lines, and are command-line operations for Microsoft Windows MSCS server clusters. Most commands are based around the Microsoft cluster.exe utility, with some using WMI, defrag and diruse to provide information on cluster disk resources.

Each command-line can be copied and pasted at the command prompt; if you use a batch file you'll need to reference the FOR variables with a double percent (%%).
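
As a quick illustration of the double-percent rule, the same loop looks like this at a prompt and inside a batch file:

    rem At an interactive command prompt:
    for %i in (node1 node2) do @echo %i

    rem Inside a batch file - FOR variables need the percent doubled:
    for %%i in (node1 node2) do @echo %%i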


Find cluster disk size and free space in CSV format
wmic /node:"%server%","%server%","%server%","%server%" path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET SystemName,Name,Size,FreeSpace,VolumeName /format:csv

Find cluster disk size and free space in modified CSV format with thousand sep.
wmic /node:"%server%","%server%","%server%","%server%" path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET Name,Size,FreeSpace,VolumeName /format:csv2

Report the windows MSCS cluster virtual groups
cluster /cluster:%cluster% group /prop | find /i "description" | find /i /v "pbx" | find /i /v "cluster"

Report folders being archived from Enterprise Vault EV FSA
sqlcmd -S %sqlServer%\%instance% -o ArchivedFolders.txt -d %enterprisevaultdirectory% -W -s "," -Q "select FSVP.UncName, FSVP.VolumeName, FSFE.FolderPath, FSVP.UncName + '\' + FSVP.VolumeName + '\' + FSFE.FolderPath as 'Path' from dbo.FileServerFolderEntry FSFE inner join dbo.vw_FileServer_Volume_Policy FSVP on FSFE.VolumeEntryID = FSVP.VolumeEntryID"

Report folders from the one or more servers not being archived compared to FSA export
for %i in (\\%server%\share% \\%server%\share% ) do @for /f "tokens=1-4,*" %m in ('"dir %i\* /ad /tc | find "DIR" | find "-""') do @find /i "%q" ArchivedFolders.txt >nul & @If errorlevel 1 (echo %q,%i,%m %n %o) >> NotArchived.csv

Delete a cluster resource type
cluster restype "%resource_name%" /delete /type

Find cluster disk size and free space
echo clusnode1 > clusternodes.txt & echo clusnode2 >> clusternodes.txt & echo clusnode3 >> clusternodes.txt & echo clusnode4 >> clusternodes.txt & wmic /node:@clusternodes.txt path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET SystemName,Name,Size,FreeSpace,VolumeName

Show the MSCS cluster multicast address properties
cluster /cluster:%Cluster% network "%PublicNetwork%" /priv

Find the MSCS cluster resources
cluster /cluster:%Cluster% res /prop | find /i "sr"

Find the disks currently owned by each cluster node
for %i in (%server1% %server2%) do @wmic /node:"%i" path Win32_LogicalDisk WHERE "FileSystem='NTFS' AND Name != 'C:' AND Name != 'D:'" GET SystemName,Name | find /i "%server_prefix%"

In a 2003 cluster, find each disk volume and analyse file fragmentation
for /f "tokens=2,5,6,8" %i in ('"cluster /cluster:%cluster% resource /prop find /i "disk" find /i "description" find /i "%CommonTag%""') do echo \\%i\%k %j %l>> Defrag_%i_%j.txt && psexec \\%i defrag %k -a -v >> Defrag_%i_%j.txt

From cluster defrag analysis, print out details for each cluster volume
for /f "tokens=1,* delims=:" %i in ('"findstr /i /c:%server% /c:"Total files" /c:"Volume size" /c:"Used space" /c:"Percent free space" /c:"Total fragmented files" defrag*"') do @echo %j

Create a cluster file share:
cluster /cluster:%cluster% res "%share_res_name%" /create /group:"%group%" /type:"File Share"
cluster /cluster:%cluster% res "%share_res_name%" /priv path="%path%"
cluster /cluster:%cluster% res "%share_res_name%" /priv Sharename=%share_name%
cluster /cluster:%cluster% res "%share_res_name%" /priv Remark="File Share Remark"
cluster /cluster:%cluster% res "%share_res_name%" /prop Description="File Share Description"
cluster /cluster:%cluster% res "%share_res_name%" /priv security=Everyone,grant,F:security
cluster /cluster:%cluster% res "%share_res_name%" /AddDep:"%networkname_res%"
cluster /cluster:%cluster% res "%share_res_name%" /AddDep:"%disk_res%"
cluster /cluster:%cluster% res "%share_res_name%" /On

Create an ABE resource for the file share
cluster /cluster:%cluster% res "%shareabe_res_name%" /create /group:"%group%" /type:"Generic Application"
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv CommandLine="cmd.exe /k abecmd.exe /enable %share_name%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv CurrentDirectory="%SystemRoot%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv InteractWithDesktop=0
cluster /cluster:%cluster% res "%shareabe_res_name%" /priv UseNetworkName=0
cluster /cluster:%cluster% res "%shareabe_res_name%" /prop SeparateMonitor=1
cluster /cluster:%cluster% res "%shareabe_res_name%" /prop Description="Access Based Enumeration for %share_name% File Share"
cluster /cluster:%cluster% res "%shareabe_res_name%" /AddDep:"%networkname_res%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /AddDep:"%disk_res%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /AddDep:"%share_res_name%"
cluster /cluster:%cluster% res "%shareabe_res_name%" /On



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Sunday, June 22, 2008

FSRM and NTFS Quotas in 2003 R2

This post discusses several methods of using File Server Resource Manager (FSRM) auto-quotas with a single share for many home directories, and how you can bypass the limitation with FSRM quotas over SMB and return a reduced amount of disk space through the single share. The two methods discussed are reparse points, and combined FSRM and NTFS quotas.

There is an inherent problem with FSRM quotas in Windows Server 2003 R2 - when accessed remotely, a hard quota is used to report disk free space to the client only when a quota is set on the root of the disk or share. The share quota overrides the volume root quota if both have hard quotas set.

Unfortunately this is not practical in this scenario, as the free space from the quota root down is governed by the hard quota. For example, with a hard quota set on the root of a share containing user home directories, the total space would be limited by that quota, rather than each home directory being limited individually. No method could be found to prevent inheritance of a quota setting to sub-folders.

Note that this does not occur when accessing the quota locally on a machine; the problem exists because the SMB QUERY_FS_INFO call queries the free space at the root, not the free space at the folder (historically there was no difference). File screening has the capability to include a blocking exception entry deeper in the tree to override policies above, but quotas do not have the same interface through the GUI.

Several methods were tried (and failed) in the search for an easy workaround for this issue.

However, if this functionality is required, there are at least two methods to work around the problem – using reparse points or using a combination of NTFS quotas and FSRM quotas.

Reparse Points

Testing was conducted to see whether reparse points, junctions, mount points or symbolic links could be used to return a different amount of free space from the root of the volume compared to the quota applied to each home drive folder.

Using one directory junction, one share, one hard quota and one autoquota, it is possible to use FSRM R2 quotas to report the free disk space based on a hard quota at a root folder, while still providing different per-folder quotas.

For example, in the following scenario, it’s possible to report a reduced disk free space limit, using only FSRM quotas and a directory junction point on the same volume.

  1. Cluster share Root: f:\QuotaTest - \\server\QuotaTest
  2. User Home Root: \\server\f$\users
  3. User home drive: \\server\quotatest\junction\user1 (f:)
  4. FSRM Hard Quota on the share root: 10MB
  5. FSRM Hard or Soft autoquota on the home directory root: 20MB
  6. Junction Directory: f:\quotatest\junction
  7. Junction Target: f:\users
  8. Create the directory junction/reparse point: junction.exe f:\quotatest\junction f:\users

Tests completed under this scenario from a workstation:

  1. A directory of H: reports 10MB free space, based on the hard quota set at the root of the share
  2. Explorer view of H: reports 10MB free space, with the drive mapped through the junction (AD)
  3. Copy a 13MB file to H: succeeds, still 10MB reported free, FSRM warning triggered based on 50% usage (of the 20MB)
  4. Copy another 13MB file to H: fails, as the 20MB hard autoquota set on f:\users prevents the copy

Notes:

  1. Apparently Windows Vista clients using SMB 2.0 do not have this issue
  2. Windows 2000 and later support directory junctions – reparse points. When accessing a reparse point, the processing occurs on the server, unlike Vista/2008 which has a modified MUP and network redirector architecture, supporting client-side processing of file and directory symbolic links.
  3. This still has at least one major disadvantage in that free space will not change for users; they will always see the free space available at the root of the share - 10MB in the example above. However, if hard FSRM autoquotas were used without this method, the free space reported to users would be the total free space on the volume, regardless of the 10MB hard limit that they would be limited to. This is potentially confusing in both scenarios.
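
To confirm the junction exists and points where you expect, either the Sysinternals junction tool used above or the built-in fsutil will report it; a quick check against the example paths:

  • junction.exe f:\quotatest\junction
  • fsutil reparsepoint query f:\quotatest\junction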

Combined FSRM and NTFS quotas

Because they are completely different technologies, NTFS quotas and FSRM quotas don't seem to conflict with each other. Therefore one method of providing soft/hard FSRM quotas while also reducing the disk space seen by users is to use NTFS hard quotas as well.

There are several caveats with this approach:

  1. NTFS quotas are only relevant for user-owned data, where each user has data in one directory, ideal for home directories, but not suitable for shared data directories.
  2. The two quota systems would have to be separately maintained and kept aligned as configuration changes are made in either. While all users conform to the standard template this would not be challenging, but as individual quotas are changed it will become problematic (as always happens).

Overall this solution provides a more realistic disk-free result for each user, provided the FSRM hard quota matches the NTFS hard quota, and file ownership is correctly set.
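
As a sketch of lining the two systems up from the command line - using the built-in fsutil for the NTFS hard quota and the 2003 R2 FSRM dirquota tool for the auto-quota - the commands might look like the following. The volume, limits, account and template names are examples only, and the dirquota switches should be confirmed against dirquota /? on your R2 server:

    rem Enforce NTFS quotas on the volume, then a 10MB warning / 15MB hard limit for user1
    fsutil quota enforce f:
    fsutil quota modify f: 10485760 15728640 test\user1

    rem FSRM auto-quota applied to each folder under the home root, from a soft (reporting) template
    dirquota autoquota add /path:f:\users /sourcetemplate:"200 MB Limit Reports to User"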

The following testing was completed with FSRM and NTFS quotas working together in a 2003 MSCS cluster:

  1. Hard NTFS quota of 15MB
  2. Soft auto-quota of 20MB
  3. Writing a file as user1 to the H: drive automatically creates a quota entry in NTFS quotas
  4. Writing a second file that takes usage over 10MB (50%), the FSRM quota event/command takes place
  5. The user doing a directory of the filesystem reports only the NTFS hard quota disk free space.
  6. Trying to copy another file as user1 to the H: drive fails with not enough disk space according to the hard NTFS quota
  7. Moved the cluster group to verify this follows on a cluster
  8. After the group was moved to another server, the same tests were conducted; NTFS quotas still apply and the hard limits are returned to the client as total space.


Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Sunday, May 25, 2008

Automated Cluster File Security and Purging

If you have a cluster share that contains temporary data in separate top-level directories, this post may help you automate the security and purging of that shared data. This is useful for transient data such as drop directories for scanners and faxes, or scratch directories for general sharing.

To summarise, this will provide:

  1. A cluster-based scheduled task that runs each day, dependent on the network name and physical disk resource currently hosting the directory
  2. A batch file run by the scheduled task that secures each directory, and purges files older than 30 days, logging results to the physical node hosting the resource.

Creating the Scheduled Task

  1. Create the scheduled task cluster resource:
    cluster /cluster:%cluster% res "%resource_name%" /create /group:"%cluster_group%" /type:"Volume Shadow Copy Service Task"
    cluster /cluster:%cluster% res "%resource_name%" /priv ApplicationName="cmd.exe"
    cluster /cluster:%cluster% res "%resource_name%" /priv ApplicationParams="/c c:\admin\SecureAndPurge.bat"
    cluster /cluster:%cluster% res "%resource_name%" /priv CurrentDirectory=""
    cluster /cluster:%cluster% res "%resource_name%" /prop Description="%resource_name%"
    cluster /cluster:%cluster% res "%resource_name%" /AddDep:"%network_name_resource%"
    cluster /cluster:%cluster% res "%resource_name%" /AddDep:"%disk_resource%"
    cluster /cluster:%cluster% res "%resource_name%" /On
    cluster /cluster:%cluster% res "%resource_name%" /prop RestartAction=1
  2. Set the schedule for the cluster resource:
    • Use the Cluster Administrator GUI; this cannot currently be set with cluster.exe for the VSS scheduled task cluster resource
  3. Restart the resource to pickup the schedule change:
    cluster /cluster:%cluster% res "%resource_name%" /Off
    cluster /cluster:%cluster% res "%resource_name%" /On

Note that the cluster resource providing scheduled task capability is the ‘Volume Shadow Copy Service Task’ resource. This is a recommended solution from Microsoft for providing scheduled task capability on a cluster. See the ‘Cluster Resource’ document in the references below.

The LooksAlive and IsAlive functions for the VSSTask.dll simply check that the scheduled task is known to the local task scheduler. To further reduce the impact of resource failure, the resource should be marked as not affecting the cluster, preventing potential failover if this task were to fail more than three times (by default).

The scheduled task should run a simple batch file on the local disk of the cluster node. Keeping the batch file local further reduces the risk that problems with the batch file could cause the cluster group to fail. The theory is that if the batch file is on local disk, it can be modified/deleted before bringing the cluster resources online.

Creating the batch file

Create a batch file and set some environment variables for %directory%, %purgeDir%, %domain%, %logFile%, %AdminUtil%, %FileAge% to fit your environment, and then include at least the three commands below:

  • Set the security on each directory within the directory. Note that this assumes that for each directory, there is a matching same-named security group, prefixed with l (for local), eg lDirectory1.

    for /d %%i in (%Directory%\*) do cacls %%i /e /g %Domain%\l%%~ni:C >> %LogFile%
  • Move the files with robocopy that are older than %FileAge% days:

    %AdminUtil%\robocopy %Directory% "%PurgeDir%" *.* /minage:%FileAge% /v /fp /ts /mov /e /r:1 /w:1 /log+:%LogFile%
  • Delete the files that were moved:

    If Exist "%PurgeDir%" rd /s /q "%PurgeDir%"

Note that depending on the size of data, you might want to ensure that the purgedir is on the same volume as the source files, which won't use any disk space as the files are moved. If the purgedir was on a different drive you would temporarily need as much free space as the size of data being purged.
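
Pulling the pieces together, a minimal SecureAndPurge.bat might look like the sketch below - the variable values are placeholders to adjust for your environment:

    @echo off
    rem Sketch of c:\admin\SecureAndPurge.bat, built from the commands above
    set Directory=S:\Scratch
    set PurgeDir=S:\ScratchPurge
    set Domain=TEST
    set LogFile=c:\admin\SecureAndPurge.log
    set AdminUtil=c:\admin
    set FileAge=30

    rem Re-apply change access for the matching lDirectory group on each top-level directory
    for /d %%i in (%Directory%\*) do cacls %%i /e /g %Domain%\l%%~ni:C >> %LogFile%

    rem Move anything older than %FileAge% days aside, then remove the moved copies
    %AdminUtil%\robocopy %Directory% "%PurgeDir%" *.* /minage:%FileAge% /v /fp /ts /mov /e /r:1 /w:1 /log+:%LogFile%
    If Exist "%PurgeDir%" rd /s /q "%PurgeDir%"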

References:

Cluster resource
http://technet2.microsoft.com/windowsserver/en/library/f6b35982-b355-4b55-8d7f-33127ded5d371033.mspx?mfr=true

Volume Shadow Copy Service resource type
http://technet2.microsoft.com/windowsserver/en/library/bc7b7f3a-d477-42b8-8f2d-a99748e3db3b1033.mspx?mfr=true

Using Shadow Copies of Shared Folders in a server cluster
http://technet2.microsoft.com/windowsserver/en/library/66a9936d-2234-411f-87b4-9699d5401c8c1033.mspx?mfr=true

Scheduled task does not run after you push the task to another computer
http://support.microsoft.com/kb/317529

Scheduled Task for the Shadow Copies of Shared Folders Feature May Not Run on a Windows Server 2003 Cluster
http://support.microsoft.com/kb/828259

Behavior of the LooksAlive and IsAlive functions for the resources that are included in the Windows Server Clustering component of Windows Server 2003
http://support.microsoft.com/kb/914458

Generic Cluster-enabled Scheduled Tasks:
http://waynes-world-it.blogspot.com/2008/04/2003-cluster-enabled-scheduled-tasks.html



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Tuesday, May 20, 2008

Access Based Enumeration in 2003 and MSCS

This post provides an overview of Access Based Enumeration on Windows Server 2003, some limitations, advantages and information on controlling ABE in an Windows MSCS environment.

With a standard Windows file server, for users to map the share and view the directories they have access to, all users require the list directory right at the root of the share. The client would then see all directories, and receive access denied errors if they try to navigate to any sub-folder without NTFS access.

To provide access control similar to Netware, Microsoft have introduced Access Based Enumeration in Windows Server 2003 SP1. This provides a method of displaying only files and folders that a user has access to, rather than displaying all objects in the tree.

The best description I can give is that ABE will hide what you don't have access to, as opposed to ensuring you can see what you do have access to. For example, if you have .\A .\A\B and .\A\B\C, and you have access to C but you don't have access to B, C will be hidden through the explorer GUI.

While this allows for a seamless migration from Netware-based file servers, there are several potential limitations:

  • ABE has to be installed as a separate component to the Operating System, which must be documented and managed for implementation and recovery scenarios.
  • ABE is not cluster-aware, and as enabling ABE is a per-share operation, cluster failovers resulting in the dynamic creation of shares on a physical cluster node will not create ABE-enabled shares. A generic cluster application could be created to enable ABE on all shares as they are created on each cluster node. This is non-standard and not ideal.
  • Increased CPU usage on the file server and slower response times to the client, processing the file data to provide information to the client on which files and directories are visible. Depending on the algorithm used, the size and depth of data structures and file system security, this may be an issue.
  • There are known issues with DFS and Access Based Enumeration
  • There are known issues with the cache on multi-user workstations, which will provide a view of any files and directories that have been viewed by any user of a computer. Client cache characteristics such as retention and location are not known.

Advantages:

  • If looking at migrating from Netware to Microsoft file sharing, the migration will be seamless for users. The file/directory structure and security will be similar, along with end-user access.
  • Using ABE is a documented solution for managing the sharing and access control for clustered home folders, rather than using the share sub-directories feature of MSCS.

Notes:

  • There are known issues with backup applications that do not use the ‘back up files and directories’ right when backing up data through standard file shares.
  • It is unknown whether navigating with explorer to a top-level directory causes server-side processing of all files/folders in the share to determine the access path to all items in the tree, or whether the algorithm will process per-directory. This may be relevant in determining the test-cases to apply to assess performance. Based on testing, it is assumed that if access is denied at a top-level, sub-folders and files are not processed deeper in that branch.
  • Windows Server 2008 includes cluster-aware ABE, enabled by default on all shares. Going forward this minimises the risk that this is a non-standard solution.

Controlling ABE in a cluster environment

Access Based Enumeration is controlled through SMB share settings for each instance of the lanmanserver service – each physical node in the cluster. These settings are not cluster-aware, and will be lost during a cluster fail-over operation.

To ensure that ABE follows virtual cluster nodes during failovers, a generic cluster application can be created, running abecmd.exe to verify that ABE is enabled after failover. The cluster application is dependent on the file share resource, and one is run for each file share.

Resource Type - Generic Cluster Application
Resource Name - <share> ABE
Description - <share> ABE
Dependencies - <share>
Command - cmd.exe /k abecmd.exe /enable <share>
Current Directory - %SystemRoot%
Interact with Desktop - De-selected

Notes:

  1. This assumes abecmd.exe is available in the path of each physical cluster node (this is the case when you install the Microsoft package).
  2. The /k switch is required to ensure that the cmd.exe application remains open, after the abecmd.exe process terminates. This ensures that the cluster resource monitor does not detect the resource as failed. This also leaves a command shell running for each instance of ABE being enabled, certainly not ideal and potentially misleading.

Other solutions considered

Other solutions I've considered for controlling ABE in a Windows file and print cluster environment included:

  1. Creating a generic cluster application dependent on all shares within a particular group, using the 'abecmd.exe /all' method. This is potentially less maintenance, but does not offer granular control of particular shares
  2. Creating a generic script resource type, with a VBScript using the supported cluster entry points to enable ABE on shares when they are created. This requires VBScript knowledge to create and maintain the solution, as opposed to a standard Microsoft provided executable.
  3. Creating an external watcher that determines cluster failovers and share re-creation, running the appropriate abecmd.exe commands as required. This requires an external server process, either a compiled application or a script, watching the cluster - not intuitive, and adding a single point of failure
  4. Controlling Access Based Enumeration through Group Policy. Third-party software exists to control the enforcement of Access Based Enumeration on file servers. However, unless the scheduled GPO refresh period was very frequent, this would not be relevant in a failover cluster scenario.

CPU usage and end-user response times

The biggest concern is response times, as the Microsoft whitepaper on ABE indicates that with 150,000 files, the operation of ‘reading shared directories’ increases from 12 seconds to up to 58 seconds. However, there is no detail on the type of test performed or the hardware used.

On a 2003 cluster with several million files and more than a terabyte of data, no performance impacts have been noticed when accessing folder structures through shares with ABE enabled.

References:

Access Mask:
http://msdn2.microsoft.com/en-us/library/ms790780.aspx

Access Based Enumeration whitepaper:
http://www.microsoft.com/windowsserver2003/techinfo/overview/abe.mspx

Access Based Enumeration:
http://technet2.microsoft.com/WindowsServer/en/library/f04862a9-3e37-4f8c-ba87-917f4fb5b42c1033.mspx

Enabling ABE with DFS:
http://support.microsoft.com/kb/907458

Implementing home folders on a cluster server:
http://support.microsoft.com/kb/256926

Windows Server 2008 failover clustering:
http://technet2.microsoft.com/windowsserver2008/en/library/13c0a922-6097-4f34-ac64-18820094128b1033.mspx?mfr=true

Scripting Entry Points:
http://msdn2.microsoft.com/en-us/library/aa372846.aspx

Create a generic application resource type:
http://technet2.microsoft.com/windowsserver/en/library/ad0bd83d-6144-45b5-8dda-3e599d1edfdb1033.mspx

Generic script cluster resource:
http://www.microsoft.com/windowsserver2003/technologies/clustering/resources.mspx
http://msdn2.microsoft.com/en-us/library/aa369599.aspx
http://msdn2.microsoft.com/en-us/library/aa373089.aspx





Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Friday, May 16, 2008

Comparing MSCS/VMware/DFS File & Print

The following table shows information I was using to compare various Windows HA file and print solutions, including MSCS, VMware, VMware+MSCS, DFS, VMware+DFS and stand-alone servers. There are no recommendations, and most need to be adjusted or at least considered for your environment, but it might help crystalise your thoughts as it did mine.


| Comparison | Microsoft server Clustering | VMware HA | Microsoft Clustering on VMware HA | DFS | Stand-alone server(s) | VMware HA with DFS for file shares |
| --- | --- | --- | --- | --- | --- | --- |
| Highly Available | Y | Y | Y | N | N | Y |
| Satisfies SLAs | ? | ? | ? | ? | ? | ? |
| Maximum nodes | 8 | Limited by host hardware | 2 | N/A | N/A | Limited by host |
| Failover time | <2 minutes | VMotion or server startup time | <2 minutes | SPF | SPF | VMotion or server startup time |
| Single server Disaster Recovery – Software Failure | <2 minutes | Snapshot, server rebuild or manual recovery procedure | <2 minutes | SPF | SPF | Snapshot, server rebuild or manual recovery procedure |
| Single server Disaster Recovery – Hardware Failure | <2 minutes | < 30 seconds | < 30 seconds | SPF | SPF | < 30 seconds |
| Licensing | 2003 Enterprise per node | DataCentre + CALs (depending on VM design) | DataCentre + CALs (depending on VM design) | 2003 Standard | 2003 Standard | 2003 Standard |
| Hardware Failure – Data Communications | Single/teamed NIC for prod interface | NIC redundancy depending on virtual solution | NIC redundancy depending on virtual solution + cluster-specific requirements | Teamed NIC | Teamed NIC | NIC redundancy depending on virtual solution |
| Hardware Failure – HBA | Single HBA per node | HBA redundancy depending on virtual solution | HBA redundancy depending on virtual solution | Single HBA | Single HBA | HBA redundancy depending on virtual solution |
| OS Disk Configuration | Basic | Dynamic | Basic | Dynamic | Dynamic | Dynamic |
| Hardware utilisation | Physical servers | Virtual servers | Virtual servers | Physical servers | Physical servers | Virtual servers |
| Cost allocation | Cost model required | Cost model required | Cost model required | Per server/LUN | Per server/LUN | Cost model required |
| Scalability/Flexibility – adding new nodes/LUNs | Y | Y | Y | N | N | Y, DFS for file |
| Manageability | MSCS skills required | VMware skills required | Complex combination of both technologies | DFS skills required | Existing skills, but increased per server | DFS and VMware skills required |
| User access to shares via UNC | Single name | Multiple names | Single name | Single name | Multiple names | DFS namespace |
| Future proofing – migration to new hardware/OS | Moderately complicated migration | Relatively simple upgrade path, reattaching LUNs or adding another VM | Moderately complicated migration | Relatively simple | Relatively simple | Relatively simple |
| Hardware on Vendor HCL | ? | ? | ? | ? | ? | ? |
| Backup/restore | ? | Standard file backup | VCB or ? | Standard file backup | Standard file backup | VCB or ? |
| Printer administration | Simplified with Cluster 2003 | Duplicated effort on each print server | Simplified with Cluster 2003 | N/A | Duplicated effort on each print server | Duplicated effort on each print server |
| Service and Event Monitoring | Cluster monitoring required | Standard monitoring for virtual servers, host monitoring required | Cluster monitoring for virtual servers, VMware host monitoring required | Standard monitoring | Standard monitoring | Standard monitoring for virtual servers, host monitoring required |


1. Basic disks on a Microsoft server cluster can be extended if new space is visible on the LUN. The disks cannot be dynamic in MSCS.
See http://technet2.microsoft.com/windowsserver/en/library/cd4d0a84-6712-4fbc-b099-2e8fefeb694c1033.mspx?mfr=true


Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Wednesday, April 16, 2008

2003 Cluster-enabled scheduled tasks

Creating a cluster-aware scheduled task has several benefits and has historically been quite difficult. The volume shadow copy service task resource type in Windows Server 2003 clusters provides a mechanism to allow scheduled task capability as a cluster resource. Despite the name, this resource type seems to be a generic cluster resource that provides access to the standard task scheduler interface to schedule and run any command within a resource group.

This post provides information on creating the cluster resource using the cluster.exe command-line interface, some best practices (in my opinion) - preventing this resource from affecting the group, network and disk dependencies, using local scripts and some background in the LooksAlive/IsAlive functions provided by this resource type.

  1. Create the scheduled task cluster resource:

    cluster /cluster:%Cluster% res "TaskName" /create /group:"BNE-VFP03-CL4" /type:"Volume Shadow Copy Service Task"
    cluster /cluster:%Cluster% res "%TaskName%" /priv ApplicationName="cmd.exe"
    cluster /cluster:%Cluster% res "%TaskName%" /priv ApplicationParams="/c Command-Batch-Or-Script"
    cluster /cluster:%Cluster% res "%TaskName%" /priv CurrentDirectory=""
    cluster /cluster:%Cluster% res "%TaskName%" /prop Description="Task Description"
    cluster /cluster:%Cluster% res "%TaskName%" /AddDep:"%NetworkName%"
    cluster /cluster:%Cluster% res "%TaskName%" /AddDep:"%PhysicalDisk%"
    cluster /cluster:%Cluster% res "%TaskName%" /prop RestartAction=1
    cluster /cluster:%Cluster% res "%TaskName%" /On

  2. Set the schedule for the cluster resource:

    Use the Cluster Administrator GUI; the schedule cannot currently be set with cluster.exe in Windows Server 2003 clusters.

  3. Restart the resource to pick up the schedule change:

    cluster /cluster:%Cluster% res "%TaskName%" /Off
    cluster /cluster:%Cluster% res "%TaskName%" /On

Notes:

  1. The cluster resource providing scheduled task capability is the 'Volume Shadow Copy Service Task' resource. This is a recommended solution from Microsoft for providing scheduled task capability on a cluster. See the 'Volume Shadow Copy Service resource type' reference.
  2. The LooksAlive and IsAlive functions for VSSTask.dll simply check that the scheduled task is known to the local task scheduler. However, to further reduce the impact of a resource failure, the resource has been marked as not affecting the group (RestartAction=1 in step 1), preventing a potential failover if the task were to fail more than the default of three times.
  3. Creating the resource also creates a scheduled task using the %TaskName% you have chosen in (by default) the %systemroot%\Tasks folder. Any existing local computer task with the same name will be overwritten.
  4. When specifying a command to run, it is safer to run it from local disk. If the command lived on a clustered disk and something in it caused the resource group to fail, you would have to modify the task resource before you could bring the disk online again; keeping the command on local disk makes changes easy. The drawback of this approach is that you have to either manually copy, or have a process that copies, the command to each node of your cluster - a rough sketch follows these notes.
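
A throwaway sketch of the copy in note 4, plus a check that the task was registered on the owning node; the node names, paths and task name below are hypothetical:

    rem copy-to-nodes.cmd - hypothetical node names, paths and task name
    for %%N in (NODE1 NODE2) do (
        if not exist \\%%N\c$\scripts md \\%%N\c$\scripts
        copy /y c:\scripts\purge.cmd \\%%N\c$\scripts\
    )
    rem Confirm the scheduled task exists on the node currently owning the group
    schtasks /query /s NODE1 | findstr /i "Nightly"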

References:

Cluster resource
http://technet2.microsoft.com/windowsserver/en/library/f6b35982-b355-4b55-8d7f-33127ded5d371033.mspx?mfr=true

Volume Shadow Copy Service resource type
http://technet2.microsoft.com/windowsserver/en/library/bc7b7f3a-d477-42b8-8f2d-a99748e3db3b1033.mspx?mfr=true

With the Volume Shadow Copy Service Task resource type, you can create jobs in the Scheduled Task folder that must be run on the node that is currently hosting a particular resource group. In this way, you can define a scheduled task that can failover from one cluster node to another. However, in the Microsoft® Windows Server 2003 family of products, the Volume Shadow Copy Service Task resource type has limited capabilities for scheduling tasks and serves primarily to support Shadow Copies of Shared Folders in a server cluster. If you need to extend the capabilities of this resource type, consider using the Generic Script resource type.

Using Shadow Copies of Shared Folders in a server cluster
http://technet2.microsoft.com/windowsserver/en/library/66a9936d-2234-411f-87b4-9699d5401c8c1033.mspx?mfr=true

Scheduled task does not run after you push the task to another computer
http://support.microsoft.com/kb/317529

Scheduled Task for the Shadow Copies of Shared Folders Feature May Not Run on a Windows Server 2003 Cluster
http://support.microsoft.com/kb/828259

Behavior of the LooksAlive and IsAlive functions for the resources that are included in the Windows Server Clustering component of Windows Server 2003
http://support.microsoft.com/kb/914458



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



Saturday, March 29, 2008

Implementing an MSCS 2003 server cluster

When implementing a Windows Server 2003 MSCS server cluster there are several common issues that can easily be avoided by extra planning and configuration. I've compiled a list of pre-configuration, installation and post-installation steps to reduce the risk of encountering issues when installing an MSCS cluster in a SAN environment.

This is mostly a summary of MS documents and general best practice, but I've not seen all of these in one place before so I thought I would post them.

Pre-installation steps for each server:

  • Unplug all HBAs from all cluster nodes.
  • Set the network adapter binding order to external and then internal.
  • Manually set speed, duplex and IP for all NICs; no gateway/DNS/WINS on the private network (see the sketch after this list)
  • Verify connectivity between each node on public and private adapters
  • Turn off any APM/ACPI power saving features relating to disk drives
  • Create the cluster service account in the domain
  • Ensure the cluster service account is an administrator of the physical cluster nodes. (especially if Kerberos authentication is enabled for virtual servers, but general best practice)
  • Ensure the Windows Firewall is disabled on both adapters
  • Ensure security auditing is enabled on each node
  • Verify correct drivers are installed on each node (HBA, NIC, chassis backplane etc.) and no device manager errors exist.
  • Shut down all nodes. Patch the HBAs on the first node, turn on the first node and check that the storage is visible. Repeat this step for each node, ensuring that only one node has visibility of the storage at any one time. Verify all nodes see the same target paths/disks in the same order.
  • Ensure backup agent is installed and functioning on all servers.
  • Ensure anti-virus agent is installed and functioning on all servers.
  • Configure permissions and role-based security on the servers as required
  • Install Access-Based Enumeration on each server (if required)
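
A minimal sketch of the private network items above, assuming a 2003 node where the private adapter has been renamed "Private" and 10.0.0.x is a hypothetical heartbeat subnet (speed and duplex still have to be forced in the adapter driver properties):

    rem Hypothetical example - run on node 1; node 2 would use 10.0.0.2
    netsh interface ip set address "Private" static 10.0.0.1 255.255.255.0
    rem No gateway, DNS or WINS are configured on the private network
    rem Verify connectivity to the other node's private address
    ping -n 4 10.0.0.2
    rem Ensure the Windows Firewall is off (2003 SP1 and later)
    netsh firewall set opmode disable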

Cluster first node installation:

  • Shut down all but the first node, so that only the first server has visibility of the storage
  • Re-verify that all the SAN disks are visible to the OS
  • Partition and format the disks using MBR before adding the first node to the cluster, and disable compression. Q: for the quorum disk is a de facto standard; start the other disks a few letters further into the alphabet, leaving letters free early in the alphabet for any removable devices/KVM virtual devices etc. that may be auto-mounted
  • Use Cluster Administrator to install the cluster; use the typical (full) installation when creating a new cluster (there should be no reason not to if the disk is presented identically to each server). Do not use ‘Manage Your Server’ to configure cluster nodes
  • This is where you'll need the cluster name. Use a naming convention that makes sense, linking the physical nodes in the cluster to the virtual cluster name(s)
  • Ensure that all disks managed by the cluster have associated disk resources before adding the second and subsequent nodes; this ensures disk locking
  • Verify the cluster is functioning - cluster service running, no event log errors, all resources online and functioning etc. (see the command-line sketch after this list)
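
As a rough check to complement the event logs, the basic cluster state can be listed from the command line; %Cluster% is a placeholder for your cluster name:

    rem List node, group and resource status for the new cluster
    cluster /cluster:%Cluster% node
    cluster /cluster:%Cluster% group
    cluster /cluster:%Cluster% res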

Second and subsequent nodes:

  • Plug in the HBAs on all other nodes, then turn on the second node
  • Re-verify that all the SAN disks are visible to the OS on the second node
  • Add second node to cluster using Cluster Administrator (the first node will lock the disk)
  • Verify the cluster is functioning (cluster service is running, no event errors, all resources available and functioning etc).
  • Add subsequent nodes using Cluster Administrator
  • Verify the cluster is functioning (cluster service is running, no event errors, all resources available and functioning etc).

Post-installation configuration:

  • Set the role of the private network to be only for internal communication (with mixed for failover according to the design)
  • Set the role of the public network to public network
  • Place the private network at the top of the priority list for internal node-to-node communication
  • Do not use the default cluster group for any resources
  • Do not use the quorum disk for anything else in the cluster
  • Do not install scripts used by generic script resources on cluster disk (easier to recover if they're on local disk)
  • Enable Kerberos authentication for network name resources (after taking the network name resources offline). Enabling Kerberos ensures a computer account is created for each virtual server and adds Service Principal Names for Kerberos lookup and authentication (see the sketch after this list)
  • For the first node, set the startup and recovery settings to start within 5 seconds. For the other nodes, set 30 seconds, to reduce the risk of quorum conflict/contention if all cluster nodes start at the same time.
  • Create and test all resources, resource groups and virtual servers, dependencies, failover/failback policies, including file shares/ABE and print spooler
  • Configure backups appropriate for all cluster nodes
  • Configure performance and service monitoring
  • Configure quotas and file screening using FSRM if required
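
A hedged sketch of some of the settings above using cluster.exe rather than the GUI; the network names "Private" and "Public" and the network name resource "VS-FILE01 Name" are hypothetical examples, and the syntax is from memory, so verify against 'cluster /?':

    rem Network roles: 1 = internal only (private), 2 = client access only (public), 3 = all communications (mixed)
    cluster /cluster:%Cluster% network "Private" /prop Role=1
    cluster /cluster:%Cluster% network "Public" /prop Role=2
    rem Put the private network first in the priority list for internal communication
    cluster /cluster:%Cluster% /setnetpri:"Private","Public"
    rem Enable Kerberos on a network name resource (offline first, then back online)
    cluster /cluster:%Cluster% res "VS-FILE01 Name" /off
    cluster /cluster:%Cluster% res "VS-FILE01 Name" /priv RequireKerberos=1
    cluster /cluster:%Cluster% res "VS-FILE01 Name" /on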

Other general thoughts:

  • Access Based Enumeration is useful in some file structures, but does not fully equate to the functionality provided in NetWare. The easiest way I can describe ABE is that it hides what you do not have access to, rather than ensuring you can see what you do have permissions for. For example, in the tree A\B\C, if you have permissions to A and C but not to B, you will not see C. This is because ABE has hidden what you don’t have access to (B), a by-product of which is that C won’t be visible in a default explorer navigation.
  • Having a single virtual print spooler still has a single point of failure – the spoolsv.exe process running on the host system. If that dies because of a configuration error, that error will most likely fail over to any other nodes that can host that resource group. Regardless of ensuring you don’t use kernel-mode (version 2), and only use user-mode (version 3) drivers, any number of issues can occur somewhere in the print process, whether it’s a third-party print processor causing issues, a non-standard port type, or just a poorly written unidrv support DLL. Everything is a lot more transparent with 2003 clustering – drivers, processors and ports all follow the virtual spooler, which most of the time is good, except when you have a problem.

Testing, reproduced from the standard Microsoft confclus.doc document (a command-line alternative for moving groups between nodes is sketched after the tests):

Test: Start Cluster Administrator, right-click a resource, and then click “Initiate Failure”. The resource should go into a failed state, and then it will be restarted and brought back into an online state on that node.
Expected Result: Resources should come back online on the same node

Test: Conduct the above “Initiate Failure” test three more times on that same resource. On the fourth failure, the resources should all failover to another node in the cluster.
Expected Result: Resources should failover to another node in the cluster

Test: Move all resources to one node. Start Computer Management, and then click Services under Services and Applications. Stop the Cluster service. Start Cluster Administrator on another node and verify that all resources failover and come online on another node correctly.
Expected Result: Resources should failover to another node in the cluster

Test: Move all resources to one node. On that node, click Start, and then click Shutdown. This will turn off that node. Start Cluster Administrator on another node, and then verify that all resources failover and come online on another node correctly.
Expected Result: Resources should failover to another node in the cluster

Test: Move all resources to one node, and then press the power button on the front of that server to turn it off. If you have an ACPI compliant server, the server will perform an “Emergency Shutdown” and turn off the server. Start Cluster Administrator on another node and verify that all resources failover and come online on another node correctly. For additional information about an Emergency Shutdown, see the following articles in the Microsoft Knowledge Base:
325343 HOW TO: Perform an Emergency Shutdown in Windows Server 2003
297150 Power Button on ACPI Computer May Force an Emergency Shutdown
Expected Result: Resources should failover to another node in the cluster
Warning: Performing the Emergency Shutdown test may cause data corruption and data loss. Do not conduct this test on a production server

Test: Move all resources to one node, and then pull the power cables from that server to simulate a hard failure. Start Cluster Administrator on another node, and then verify that all resources failover and come online on another node correctly
Expected Result: Resources should failover to another node in the cluster
Warning: Performing the hard failure test may cause data corruption and data loss. This is an extreme test. Make sure you have a backup of all critical data, and then conduct the test at your own risk. Do not conduct this test on a production server

Test: Move all resources to one node, and then remove the public network cable from that node. The IP Address resources should fail, and the groups will all failover to another node in the cluster. For additional information, see the following articles in the Microsoft Knowledge Base:
286342 Network Failure Detection and Recovery in Windows Server 2003 Clusters
Expected Result: Resources should failover to another node in the cluster

Test: Remove the network cable for the Private heartbeat network. The heartbeat traffic will failover to the public network, and no failover should occur. If failover occurs, please see the “Configuring the Private Network Adaptor” section earlier in this document
Expected Result: There should be no failures or resource failovers
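
For the tests that move all resources to one node, the move can also be driven from the command line instead of dragging groups in Cluster Administrator; the group and node names below are hypothetical examples:

    rem Move a group to a specific node, or omit the node to let the cluster choose the next preferred owner
    cluster /cluster:%Cluster% group "File Server Group" /moveto:NODE2
    cluster /cluster:%Cluster% group "File Server Group" /move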


References:


Guide to Creating and Configuring a Server Cluster Under Windows Server 2003

http://www.microsoft.com/downloads/details.aspx?familyid=96F76ED7-9634-4300-9159-89638F4B4EF7&displaylang=en

Best practices for installing and upgrading cluster nodes

http://technet2.microsoft.com/windowsserver/en/library/87f23f24-474b-4dea-bfb5-cfecb3dc5f1d1033.mspx?mfr=true

Best practices for configuring and operating server clusters

http://technet2.microsoft.com/windowsserver/en/library/2798643f-427a-4d26-b510-d7a4a4d3a95c1033.mspx?mfr=true

Before Installing Failover Clustering

http://msdn2.microsoft.com/en-us/library/ms189910.aspx

Cluster Configuration Best Practices for Windows Server 2003

http://www.microsoft.com/downloads/details.aspx?FamilyID=98BC4061-31A1-42FB-9730-4FAB59CF3BF5&displaylang=en

Server Cluster Best Practices

http://technet2.microsoft.com/windowsserver/en/library/8c91dba9-edfc-48b5-8d98-48d6536e0db81033.mspx?mfr=true

Cluster architecture

http://download.microsoft.com/download/0/a/4/0a4db63c-0488-46e3-8add-28a3c0648855/ServerClustersArchitecture.doc

Creating and Configuring a Highly Available Print Server

http://download.microsoft.com/download/2/a/9/2a9c5a6b-472a-40b0-942f-3ba50240ccd9/ConfiguringAHighlyAvailablePrintServer.doc

Disk quotas and clusters

http://technet2.microsoft.com/windowsserver/en/library/1ee8754e-48d6-4472-9b53-29e8d1de09f81033.mspx?mfr=true



Wayne's World of IT (WWoIT), Copyright 2008 Wayne Martin.



