IT Hero Blog: Upgrading and Expanding a Storage Spaces Direct Cluster

Bill Webster is an Assistant Director for Administrative Computing at Florida Polytechnic University. Located in Lakeland, Florida, in the heart of Florida’s High-Tech Corridor, Florida Poly is engineered from the ground up to push the boundaries of education in science, technology, engineering, and math (STEM).

Bill Webster
Assistant Director for Administrative Computing – Florida Polytechnic University

We were building out a new disaster recovery site and needed a resilient, cost-effective solution that could be easily expanded as our needs grew. Microsoft’s Storage Spaces Direct solution met all those needs. I had previous experience with DataON’s Cluster-in-a-Box solution (CiB) and was impressed with the level of support that they provided. So in June 2018, I deployed a DataON 2-node cluster in our DR data center, expecting this to get us up and running while also proving whether the technology could be an eventual replacement for our existing SAN-based infrastructure.

In the fall of 2018, Microsoft announced Windows Server 2019 at Microsoft Ignite. I was excited to get the new functionality and features coming with Server 2019. I started conversations with DataON about upgrading the DR cluster to Windows Server 2019 and expanding it to a 3-node cluster for improved resilience and capacity. Based on their feedback and Microsoft’s recommended best practice, I broke the process into two main phases:

  1. Upgrade the cluster from Windows Server 2016 to Windows Server 2019
  2. Expand the cluster with a third node and additional storage

Upgrading a Storage Spaces Direct Cluster to Windows Server 2019

There are multiple ways to upgrade your Storage Spaces Direct cluster to Windows Server 2019. Looking at our environment, I decided on the “in-place upgrade while VMs are running” option. I didn’t want to incur any downtime and was okay with the extra time for the mirror repair jobs.

In general, this process is similar to applying routine Windows Server updates to a cluster. The difference is that while a node is in maintenance mode, you run Windows Server 2019 setup and perform an in-place upgrade. All the system settings are retained, and the node comes back up as a member of the cluster. Once the first node is done, you drain roles and repeat on the second node. When all cluster nodes have been upgraded, you then update the cluster functional level.
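For a single node, the drain, upgrade, and resume cycle described above might look roughly like the sketch below. It is not a full runbook: Server01 is a placeholder node name, the "S2D*" pool name pattern is an assumption, and the storage maintenance mode steps follow Microsoft’s upgrade article rather than anything specific to our environment.

```powershell
# Sketch only: Server01 is a placeholder node name.
$node = "Server01"

# Pause the node and drain its roles to the remaining node.
Suspend-ClusterNode -Name $node -Drain -Wait

# Put the node's drives into storage maintenance mode before upgrading.
# Note: the fault domain FriendlyName may be the short name or the FQDN.
Get-StorageFaultDomain -Type StorageScaleUnit |
    Where-Object FriendlyName -eq $node |
    Enable-StorageMaintenanceMode

# ... run Windows Server 2019 setup on the node, then apply the latest updates ...

# Take the drives out of maintenance mode and bring the node back into service.
Get-StorageFaultDomain -Type StorageScaleUnit |
    Where-Object FriendlyName -eq $node |
    Disable-StorageMaintenanceMode
Resume-ClusterNode -Name $node

# Only after EVERY node has been upgraded:
Update-ClusterFunctionalLevel
Get-StoragePool -FriendlyName "S2D*" | Update-StoragePool  # pool name pattern is an assumption
```

Updating the functional level cannot be undone, so it belongs at the very end, once all nodes are confirmed healthy on Windows Server 2019.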

Microsoft has fully documented this process in the article “Upgrade a Storage Spaces Direct cluster to Windows Server 2019.”

Tips

  • Read through the entire documentation, pick the best option for your needs, and follow it closely.
  • Always apply updates: install the latest Windows Server 2016 updates before you begin, and the latest Windows Server 2019 updates before resuming roles on an upgraded node.
  • Make sure storage repair jobs are finished and all volumes are healthy before moving on to each next step.
  • Have a fallback plan in case something goes wrong.
  • Contact DataON for assistance with firmware/driver updates and any other questions.
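The health check mentioned in the tips can be done with the standard Storage module cmdlets. This is a minimal sketch of one way to check, to be run on a cluster node between steps:

```powershell
# List storage jobs such as mirror repairs; wait until none are still running.
Get-StorageJob

# Every volume (virtual disk) should report Healthy / OK before you continue.
Get-VirtualDisk |
    Select-Object FriendlyName, HealthStatus, OperationalStatus |
    Format-Table -AutoSize
```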

Expanding a Storage Spaces Direct Cluster

Expanding a Storage Spaces Direct cluster is also very well documented. I recommend reading through Microsoft’s documentation, “Adding servers or drives to Storage Spaces Direct.”

It’s important to reiterate that if you plan to both upgrade and expand a Windows Server 2016 cluster, the best practice is to first upgrade, and then expand the cluster. There are improvements in Windows Server 2019 which will make cluster expansion a smoother process.

The process is relatively simple. Once you’ve deployed the new server, joined the domain, and configured networking, you’ll need to use the Test-Cluster command to check for issues that would arise from adding it to the existing cluster. Remediate any issues that show up in the report and then use Add-ClusterNode to add the new server to the cluster.
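As a rough sketch, the validation and join steps could look like this. The server names are placeholders (Server03 being the new node), and the list of validation categories follows Microsoft’s expansion article:

```powershell
# Validate the existing nodes together with the new one (names are placeholders).
Test-Cluster -Node Server01, Server02, Server03 `
    -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"

# After resolving anything flagged in the validation report,
# run this on an existing cluster node to add the new server:
Add-ClusterNode -Name Server03
```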

One extra step when going from a 2-node to 3-node cluster is that you will likely want to migrate your volumes from a 2-way to 3-way mirror for improved resilience. I had left extra space on the 2-node cluster so that I could create a new volume as a 3-way mirror, across all 3 nodes. I then performed a live storage migration to move virtual machines from one of the existing 2-way mirror volumes to the new 3-way mirror volume. After repeating the process, I had migrated all of the 2-way mirrors to 3-way mirrors without any downtime.
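The volume creation and storage migration could be sketched as follows. The volume name, size, VM name, and destination path are all hypothetical, and the "S2D*" pool name pattern is an assumption. Check the actual mount point under C:\ClusterStorage before moving anything.

```powershell
# Create a new mirror volume. PhysicalDiskRedundancy 2 means the volume
# tolerates two failures, which on three nodes is a 3-way mirror.
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume3Way" `
    -FileSystem CSVFS_ReFS -Size 1TB `
    -ResiliencySettingName Mirror -PhysicalDiskRedundancy 2

# Live-migrate one VM's storage to the new volume while the VM keeps running.
# Run this on the node that currently owns the VM.
Move-VMStorage -VMName "VM01" `
    -DestinationStoragePath "C:\ClusterStorage\Volume3Way\VM01"
```

Once a 2-way mirror volume is empty, it can be removed to free capacity for the next 3-way mirror volume.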

Tips

  • Read the documentation from Microsoft.
  • Apply the latest Windows Server updates.
  • Use this command to check the status of storage migration jobs:

Get-WmiObject -Namespace root\virtualization\v2 -Class Msvm_MigrationJob | ft Name, JobStatus, PercentComplete, VirtualSystemName

Extras

These are unexpected things that I ran into. They could have been specific to our environment or issues that have since been fixed in Windows Server updates. Either way, it’s worth being aware of these.

 

Cluster Performance History

Performance history is new with Windows Server 2019 and can be turned on after upgrading by running Start-ClusterPerformanceHistory.
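A minimal sketch of enabling performance history and confirming that data is being collected, run on a cluster node after the functional level has been updated:

```powershell
# Enable performance history collection on the upgraded cluster.
Start-ClusterPerformanceHistory

# After a few minutes, query the cluster-wide series to confirm data is arriving.
Get-Cluster | Get-ClusterPerformanceHistory   # alias: Get-ClusterPerf
```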

 

Error in Server Manager

I received an error in Server Manager that stated ‘The xsi:type attribute (p1:MSCluster_Property_Node_PrivateProperties) does not identify an existing class.’

When I ran Get-ClusterNode | Get-ClusterParameter, nothing was returned for my upgraded nodes but I saw 2 parameters on my newly added node. I ran these commands to add the missing parameters (repeating for each node as needed):

REG ADD HKEY_LOCAL_MACHINE\Cluster\Nodes\1\Parameters /f /v "S2DCacheBehavior" /t REG_QWORD /d "9223372036854775808"
REG ADD HKEY_LOCAL_MACHINE\Cluster\Nodes\1\Parameters /f /v "S2DCacheDesiredState" /t REG_DWORD /d "2147483648"
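To work out which registry key belongs to which server, it may help to list each node with its numeric ID. This sketch assumes that the number in the registry path (the 1 in Nodes\1) corresponds to the node’s Id property, which is worth verifying in your own cluster before making changes:

```powershell
# Show each node's name next to its numeric Id and current state.
# Assumption: the Id matches the key name under HKLM\Cluster\Nodes.
Get-ClusterNode | Select-Object Name, Id, State
```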

Once complete, Get-ClusterNode | Get-ClusterParameter returned the following for my 3-node cluster:

PS C:\Windows\system32> get-clusternode | Get-ClusterParameter

Object   Name                 Value               Type
------   ----                 -----               ----
Server01 S2DCacheBehavior     9223372036854775808 UInt64
Server01 S2DCacheDesiredState 2147483648          UInt32
Server02 S2DCacheBehavior     9223372036854775808 UInt64
Server02 S2DCacheDesiredState 2147483648          UInt32
Server03 S2DCacheDesiredState 2147483648          UInt32
Server03 S2DCacheBehavior     9223372036854775808 UInt64