Category Archives: Performance

Application timing out due to excessive blocking on tempdb PFS

Problem

Applications Connecting to SQL Server timing out

Symptoms

  1. Excessive blocking with the wait resource being tempdb PFS page
  2. High CPU utilization on the server

Troubleshooting

sys.dm_exec_requests reported blocking and the head blocker looked like this

session_id ecid wait_type resource
---------- ----- --------------- --------------------
58 4 PAGELATCH_UP 2:7:32352
58 1 CXPACKET
58 5 CXPACKET
58 6 CXPACKET
58 3 CXPACKET
58 2 CXPACKET
58 0 CXPACKET exchangeEvent id=Pipe7dd014b80

The wait resource was always reported at tempdb PFS page (2:7:(8088*x)).

The head blocker and the blocking session were executing a SELECT against a XML document.

As explained here parsing a XML document would create work table in tempdb. In this situation the application was parsing a lot of XML documents. Hence the contention on tempdb was justified.

  • tempdb had 5 data files and the data files had initial size of 100 MB (with autogrowth of 10%). As per the recommendations here, increased the tempdb files to 8 and increased the initial size to a higher value to reduce the auto growth.
  • Also enabled trace flag 1118. This forces uniform extent allocations instead of mixed page allocations.
  • Enabled trace flag 1117. This will auto grow all the files in the filegroup whenever one of the files tries to auto grow.
  • Since this instance was running on SQL Server 2012, applied the latest Service Pack + CU as recommended in http://support.microsoft.com/kb/2964518

The tempdb contention got reduced to some extent still the CPU utilization was very high(above 85%). Because of this the application was still timing out.

  • Changed the Power Plan to “High Performance” as explained in http://support.microsoft.com/kb/2207548. This didn’t help much.
  • As you notice in the sys.dm_exec_requests output, all threads in the session are waiting on CXPACKET wait type for one thread to complete its work. Hence set the appropriate value for max degree of parallelism (3 in this case) as explained here.

CPU utilization came down from 85% + to 20-30%

Status

Mission accomplished!

Find the SQL Server Instance name using Task Manager

Yesterday one of the servers which hosts 5 instances of SQL Server 2005, the CPU constantly hitting 100% utilization. The Application team using the database on one of the 5 instances started complaining of poor performance. The DBA connected to that instance to find out the session(s) causing the CPU to spike. But none of the processes on that instance were utilizing more CPU time. The Task Manager looked like this.

The PID column in Task Manager is not displayed by default. It needs to be selected via Task Manager –> View –> Select Columns menu item.

In my case Process ID 2652 was using more CPU. How to find out to which instance does this Process belong to? I find the following two options to track down the Instance Name.

Using SQL Server Configuration Manager:

On clicking on the SQL Server Services in SQL Server Configuration Manager, the details of SQL Server and related services along with the Process ID of each service is listed. The SQL Server Instance name is listed as SQL Server (InstanceName).

Using the SQL Server Error Log

In SQL Server 2000, SQL Server Configuration Manager is not available. The information regarding the Process ID can be obtained through the SQL Server Error Log.

Once the SQL Server Instance was located yesterday, the high CPU utilization was fixed relatively quickly. One of the scheduled job for UPDATE STATISTICS had run outside the Maintenance Window. Stopping this job brought down the CPU utilization drastically.

UMS Context appears to be non-yielding on Scheduler

Yesterday one of our critical production instances froze for a few seconds. This runs on SQL Server 2000 (SP4+). Even a little downtime would cause lot of turbulence on this instance. The error log did not have any glaring errors except for this one.

DESCRIPTION:	Error: 17883, Severity: 1, State: 0
Process 282:0 (1bb8) UMS Context 0x0A37C010
appears to be non-yielding on Scheduler 6

Around the same time frame all applications including the SQL Server Agent timed out while trying to connect to the instance. The services did not restart and the instance was operational in a few seconds.

Some information about User Mode Scheduler (UMS) and SQL Server Operating System Scheduler (SOS) in SQL Server 2005 and above.

  • UMS Schedulers are the logical schedulers used by SQL Server against which each worker thread of a SPID are mapped and all the I/O related operations are processed through these schedulers.
  • The number of schedulers are usually equivalent to the number of CPUs reported by the Operating System. This will not be true if the affinity mask value is changed from the default in SQL Server 2000.
  • An SPID is generated only after the scheduler receives a request. After the SPID has finished the work it releases the ownership of the scheduler and the worker threads associated with that SPID are treated as “yielded”.
  • If the owner worker thread of the shceduler does not respond (yield) in 60 seconds, 17883 error (UMS cotext…non-yielding…) is rasied. Only the first occurance of this error will also generate the stack dump.

This article on TechNet has very useful information on 17883 and related errors.

Now coming back to the issue on our server, I had quickly I checked the following.

  • CPU and Memory utilization on the instance were normal
  • There were no blocking sessions found
  • The number of active sessions was ~200 which was less than the average for this instance
  • There were no Backup or Maintenance jobs running at that time
  • No disk related errors in the System Event Log
  • Network team also confirmed that there was no network glitch

Still the schedulers were treated as non-yielding. The output of DBCC SQLPERF(umsstats) also confirmed that the scheduler had exhausted all the worker thread allocation.

Statistic      		Value
------------------------
Scheduler ID   			6.0
num users    			15.0
num runnable 			0.0
num workers  			15.0
idle workers 			15.0
work queued  			0.0
cntxt switches        1.3390778E+7
cntxt switches(idle)  2.4731256E+7

Since all the symptoms mentioned in this article were ruled out, this article seemed to be more relevant to the issue that we were facing. However, there were no ADD DATAFILE or CREATE DATABASE commands were running on the instance. The only probable issue could have been the data files growing automatically.

In order to rule out this possibility, quickly scanned through the Auto Growth settings on all the databases on the instance. One of the database which was 25 GB in size had only one data file and the auto growth for data file was 100 MB. This database had a growth of ~500 MB per day in the last one month. This meant that this data file would grow 5 times a day and since this instance is SQL Server 2000 the database would be locked during the auto growth. This probably could have been one of the contributed factors for the error in the log.

Now the auto growth setting for this database has been changed to 1 GB which means that the data file would auto grow once in two days. With this change, all the possible contributing factors for 17883 error have been ruled out from Database perspective. After this I have not seen that error on the instance yet. Hopefully this will turn out to be a permanent fix!

Database 150% Full | sp_spaceused reports negative unallocated space

A while ago, the monitoring tool reported that one of the databases was 150% full ! Logged on to the server and issued sp_spaceused command against that database. The output was as below.

As highlighted, the unallocated space was showing negative values. This was because the space usage information was not updated. After running DBCC UPDATEUSAGE against that database, the unallocated space value updated correctly. Alternatively the following command can also be executed to fix this issue.

sp_spaceused @updateusage = 'TRUE'

The below command can be used to update the usage of all the databases in one go.

EXEC SP_MSFOREACHDB @COMMAND1 ='use [?] print db_name() DBCC UPDATEUSAGE(0)'

Using Process Explorer to resolve SQL Server CPU Spikes

As a DBA, one usually comes across a situation when the CPU usage on a server spikes because of SQL Server. The knee jerk reaction would be to check sysprocesses or sys.dm_exec_requests to check which process is consuming more CPU time. The value of CPU column in sysprocesses is cumulative in nature, hence may not give the right picture everytime.

The Process Explorer utility from Microsoft also comes in handy in such situations. Here is how it can be done.

Open Process Explorer and navigate to sqlservr.exe

Right click on sqlservr.exe and select Properties. Click on Threads tab and sort by CPU.

Make a note of the TID which is consuming more CPU. In this case it is 11744. Open SSMS and get the SPID. The query can be something like this.

select SPID from sysprocesses where kpid=11744

The details of the SPID can be obtained by issuing

DBCC INPUTBUFFER(SPID)

Based on the output of the Inputbuffer, a call to KILL the SPID can be taken.