Thursday, April 5, 2007

CX Packet Waits and SQL Server Performance

There have been many times throughout my career when people have told me that an in-depth understanding of the internals and architecture of SQL Server just isn't necessary anymore, that the software has become so smart that there is no real need to do anything other than let the system choose the best way to handle things…



The Demise of Architectural Knowledge



Every time someone says that to me, I cite several examples of how an understanding of the guts of SQL Server has helped me in one situation or another. Recently, the topic came up again (I was asked to write an internals chapter in an upcoming book on SQL Server, and spent a lot of time on architecture) and somehow got steered into a discussion on NUMA architecture and SQL Server 2005. At the time I wrote the chapter, I lamented that I didn't have any direct, relevant real-world examples to use related to SQL Server 2005. (Most of our customers are still running SQL 2000) Sometimes you just shouldn't wish for things, because they have a way of coming true!



Customers to the Rescue



Well, I have been onsite this week with a very large customer of ours, doing some beta-testing of a new product that Configuresoft is launching soon, and that I have architected, designed and led the creation of. The customer was nice enough to provide us with a 16-core machine (8 dual core Xeon processors and 16GB of RAM!) as the data source, and a 4 way (4 Xeon processors) box as the platform for Analysis Services and SSIS. Part of the product that we are testing is an SSIS package that extracts data (24 million rows of source data in this particular case), does some lookup information, and loads it into a Data Warehouse. One of the queries is derived through an expression that looks like this:



" SELECT
cl.property_name AS
PropertyName
,cl.path_name
AS PathName
,cl.agent_change_type_id AS ChangeTypeID
,cl.datetime_collected_to AS DateCollected
,me.machine_id
AS MachineID
,cl.data_class_id AS
DataClassID
,cl.changed_by_user_login AS ChangedBy
,me.uptime
AS MachineUpTime
,COUNT(*) AS ChangeCount
,SUM ( af.Alert) AS AlertCount
FROM ecm_dat_master_change_history_log AS cl
INNER JOIN ecm_dat_machine_environment_flat AS me
ON cl.machine_id = me.machine_id
LEFT OUTER JOIN( SELECT 0 AS Alert
,ecm_dat_master_change_history_log.row_id
FROM ecm_dat_master_change_history_log
LEFT OUTER JOIN ecm_dat_alerts_found
ON ecm_dat_master_change_history_log.row_id = ecm_dat_alerts_found.row_id
WHERE ecm_dat_alerts_found.row_id IS NULL
UNION
SELECT CASE
WHEN row_id IS NULL THEN 0 ELSE 1 END AS Alert
,row_id
FROM ecm_dat_alerts_found ) af
ON af.row_id = cl.row_id
WHERE datetime_collected_to = '" + (DT_WSTR, 28) @[User::CurrentDateEnumerator] + "'
GROUP BY cl.property_name
,cl.path_name
,cl.agent_change_type_id
,cl.datetime_collected_to
,me.machine_id
,cl.data_class_id
,cl.changed_by_user_login
,me.uptime "

When executed, the query is heavily parallelized by the source system (the 16 core machine) and results in a high number of CXPacket waits. (See the picture below for an example)






In looking at system performance, it was pretty obvious that the CXPacket waits were causing performance degradation. This was further proven by examining the overall load performance on the destination system through task manager.

No comments: