Friday, November 30, 2007

Undetected Deadlocks in SQL Server

Every once in awhile, things that are hard to explain happen when running resource-intensive queries against SQL Server. Yesterday (and by extension, this morning) I had one of those happenstances.

There's a phenomenon in the SQL Server engine known as an "Undetected Deadlock". They have been around for awhile, and there have been numerous patches and hotfixes related to the problem.

Basically, these undetected deadlocks can happen in many places, but the most common is related to parallelism. For example, a query gets parallelized, then CXPacket waits occur, and then somehow the SQLOS CPU Scheduler gets "out of whack" and one or more CPUs forget that they're participating in a parallel query and try to move on to other things. In rare cases, this leads to a deadlock because not all CPUs are in sync with one another. At this point, all activity on the server comes to an end because of a "Suspended" process. Checking the SQL Server message log (I will still call it an error log!) results in the following:

 

SQLServerError

The only way out of this mess is to kill the process that caused it in the first place.. If you look closely at this error log, you'll see that the problem actually started at 8pm and wasn't resolved until 6am when I killed the process.

No comments: