MS DTC failed on Production SQL Cluster – an experience from the field.



Recently I was called in for a severe production issue: MS DTC failed to come online after a failed SQL cluster failover attempt. By the time I logged in, the issue had already been seriously escalated, so I was under tremendous pressure to bring everything back online immediately.

I found many critical errors logged for the failed MS DTC resource, and a normal attempt to bring MS DTC online failed. As the quickest fix, I decided to delete the failed MS DTC resource from the cluster and recreate it. (Please see http://blog.consultdba.com/2010/04/sql-server-ms-dtc-installation-and.html#links for the steps to configure MS DTC on a cluster.) Then I made sure the MS DTC security settings in Windows Component Services were correct. Finally, even after the new MS DTC resource was online, we had to restart the cluster to resolve a couple of distributed transaction errors.

Though it was not an ideal approach, it was a quick and practical resolution, done within 5 minutes, and everyone was happy and back in business. Of course, the issue was later investigated in depth (which uncovered a serious issue in this cluster that needed attention), but the moral of this post is this: when there is a serious production issue and you know a solution that will fix it, fix it first and find the root cause later.
