Friday, March 8, 2013

High number of LATCH_EX waits

During an implementation of a new SQL environment for a customer I went into some problem with high nr of waits of LATCH_EX type when they run some heavy batches. I was not sure what this really was. I quite familiar with PAGELATCH and so on. Those are much more common, at least in server I have seen.
The customer asked me to investigate what the problem was. First I read the whitepaper “Diagnosing and Resolving Latch Contention on SQL Server” from Microsoft.
There were some ideas coming up after that. The server did have many CPU cores, 32 to be exact. The database had only one datafile.
First I tried to add more files to the database. I tried to use 4 but there was no big change. I went down from 32 ms in average to 27ms. After investigate sys.dm_os_latch_stats is was quite obvious what the problem was. ACCESS_METHODS_DATASET_PARENT was huge.



Regarding BOL it means Used to synchronize child dataset access to the parent dataset during parallel operations.

I did some try to set the Max Degree of Parallelism from default 0 to 4 or 8. This change had the desired effect. In the graf below we can see the effect very clearly on the black line. We see the batch starts, then a change to 8 for the Max Degree of Parallelism. Then I change it back, and then to 4 and then to 8.



No comments: