Wednesday, November 14, 2012

How to switch cluster nodes in Windows 2003 with SQL 2005 installed

These are not the most modern products anymore. Anyway, I ran into a problem where we were forced to replace the old servers with new ones.
If some of you out there need to do the same, I describe the steps here. Not in detail, but with some aspects to think about.

  • Add the new servers to the cluster, so in our case we now had a 4-node cluster. Be sure that the disks are also shared with the new servers.
  • Expand the SQL installation to the new servers. This is done from the first cluster node. In Add or Remove Programs you change the SQL installation. Choose the correct instance and add the new servers. One important thing when doing this is to NOT be logged on to the new servers with the same account that you install with. The remote execution will fail if you are. When the installation succeeds, SQL is installed on the new servers.
  • Install the same SQL service pack and patches as on the old servers. Run those from the active cluster node as well.
  • If you need any SQL client tools on the new nodes, you have to install them one by one on each server, and also the service pack and patches for them.

If everything went well, there are now some issues to take care of :-)

If you have used a certificate in SQL, be sure it is installed on the new servers. If not, the engine will not start. You receive an error like: "Unable to load user-specified certificate. The server will not accept a connection. You should verify that the certificate is correctly installed. See "Configuring Certificate for Use by SSL" in Books Online."
In our case there was an old SSL certificate configured that was not even in use anymore. I tried to remove the string in the registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Certificate but that was not as straightforward as I thought. When failing over the resource, it added back the old certificate value anyway. After some testing I realized that the change has to be made on the active node in the cluster before you move the resources. In our case the certificate was not used, so I just left the value blank.
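To see what a node currently has in that registry value before you touch it, something like the following Python sketch could be run locally on the node. It only reads the value. The function names are my own, and the instance ID MSSQL.1 is taken from the path above and may be different on your installation. The winreg module is part of the Python standard library on Windows only, so it is imported inside the function that needs it.

```python
# Registry location quoted in the post; "MSSQL.1" is the instance ID and
# may differ on other installations (assumption: adjust as needed).
KEY_TEMPLATE = (
    r"SOFTWARE\Microsoft\Microsoft SQL Server"
    r"\{instance_id}\MSSQLServer\SuperSocketNetLib"
)


def certificate_key_path(instance_id="MSSQL.1"):
    """Return the HKLM subkey that holds the Certificate value."""
    return KEY_TEMPLATE.format(instance_id=instance_id)


def read_certificate_value(instance_id="MSSQL.1"):
    """Read the Certificate value on the local node (Windows only)."""
    import winreg  # imported here so the module still loads on other systems

    path = certificate_key_path(instance_id)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "Certificate")
    return value


def describe(value):
    """Turn the raw registry value into a short human-readable note."""
    if not value:
        return "blank: no user-specified certificate is configured"
    return "certificate configured: " + value
```

On a node you would call describe(read_certificate_value()) and compare the result between the old and the new servers.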

I also had some issues with the transaction coordinator (MS DTC). It was impossible to start it on the new servers. In the Windows event log I got errors like "MS DTC has detected that a DC Promotion has happened since the last time the MS DTC service was started" and "MS DTC Transaction Manager start failed. LogInit returned error 0x8007000e." Our cluster nodes were also domain controllers, and the new ones had been added as domain controllers as well (I know it’s not a nice solution). I found a KB on this. After changing the registry permissions for the NETWORK SERVICE account, it worked well.

Service account can't be verified
If you have a trailing space in the name of the cluster group where SQL resides, you get an error saying that the account you typed for the SQL service is wrong or can't be verified.
"SQL Server could not validate the service accounts. Either the service accounts have not been provided for all of the services being installed, or the specified username or password is incorrect." See for more info.
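A trailing space is very hard to spot by eye, so a small check can help. This is just a sketch in Python. The function names and the group names in the example are made up, and you have to collect the real group names from your cluster yourself.

```python
def has_trailing_whitespace(name):
    """True if the name ends with a space, tab or other whitespace."""
    return name != name.rstrip()


def groups_with_trailing_whitespace(names):
    """Return only the group names that end with whitespace."""
    return [name for name in names if has_trailing_whitespace(name)]


# Hypothetical group names; the second one ends with a space.
example_names = ["Cluster Group", "SQL Group "]
for bad_name in groups_with_trailing_whitespace(example_names):
    # repr() puts quotes around the name so the trailing space is visible
    print("Trailing whitespace in group name:", repr(bad_name))
```

If a name shows up in the output, rename the group without the space before you run the SQL setup again.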

Hope this can help someone :-)