Tag Archives: performance

Application timing out due to excessive blocking on tempdb PFS

Problem

Applications connecting to SQL Server were timing out

Symptoms

  1. Excessive blocking with the wait resource being tempdb PFS page
  2. High CPU utilization on the server

Troubleshooting

sys.dm_exec_requests reported blocking, and the head blocker looked like this:

session_id  ecid  wait_type      resource
----------  ----  -------------  ------------------------------
58          4     PAGELATCH_UP   2:7:32352
58          1     CXPACKET
58          5     CXPACKET
58          6     CXPACKET
58          3     CXPACKET
58          2     CXPACKET
58          0     CXPACKET       exchangeEvent id=Pipe7dd014b80

The wait resource was always a tempdb PFS page (2:7:(8088*x)).

Both the head blocker and the blocked sessions were executing a SELECT against an XML document.

As explained here, parsing an XML document creates a work table in tempdb. In this situation the application was parsing a lot of XML documents, so the contention on tempdb was expected.
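A sketch of the kind of query that surfaces this contention, assuming the usual "dbid:fileid:pageid" format of resource_description (PFS pages repeat every 8088 pages, so a page id that is a multiple of 8088 is a PFS page):

```sql
-- Tasks latch-waiting on pages in tempdb (database id 2).
-- A resource_description such as 2:7:32352 with pageid % 8088 = 0
-- indicates PFS page contention.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description
FROM   sys.dm_os_waiting_tasks AS wt
WHERE  wt.wait_type LIKE 'PAGELATCH%'
       AND wt.resource_description LIKE '2:%';
```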

  • tempdb had 5 data files, each with an initial size of 100 MB (and 10% autogrowth). As per the recommendations here, increased the number of tempdb data files to 8 and raised the initial size to a higher value to reduce autogrowth events.
  • Also enabled trace flag 1118, which forces uniform extent allocations instead of mixed-page allocations.
  • Enabled trace flag 1117, which auto-grows all the files in a filegroup whenever one of them auto-grows.
  • Since this instance was running on SQL Server 2012, applied the latest Service Pack + CU as recommended in http://support.microsoft.com/kb/2964518
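The reconfiguration above can be sketched as follows; the logical file names, path, and sizes are illustrative, not the ones from this incident:

```sql
-- Resize the existing primary tempdb data file and fix a sensible growth step.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 1024MB, FILEGROWTH = 512MB);

-- Add files until the total reaches the recommended count (8 here).
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev6, FILENAME = 'T:\tempdb6.ndf',
          SIZE = 1024MB, FILEGROWTH = 512MB);

-- Trace flags: 1118 forces uniform extent allocations, 1117 grows all
-- files in a filegroup together. DBCC TRACEON applies only until the
-- next restart; add -T1118 -T1117 as startup parameters to persist them.
DBCC TRACEON(1118, -1);
DBCC TRACEON(1117, -1);
```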

The tempdb contention was reduced to some extent, but the CPU utilization was still very high (above 85%). Because of this, the application was still timing out.

  • Changed the Power Plan to “High Performance” as explained in http://support.microsoft.com/kb/2207548. This didn’t help much.
  • As you can see in the sys.dm_exec_requests output, all threads in the session were waiting on the CXPACKET wait type for one thread to complete its work. Hence set an appropriate value for max degree of parallelism (3 in this case) as explained here.
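The MAXDOP change is a standard sp_configure call:

```sql
-- Set max degree of parallelism to 3 (the value chosen in this case).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 3;
RECONFIGURE;
```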

CPU utilization came down from 85%+ to 20–30%.

Status

Mission accomplished!

Find the SQL Server Instance name using Task Manager

Yesterday, on one of the servers hosting 5 instances of SQL Server 2005, the CPU was constantly hitting 100% utilization. The application team using the database on one of the 5 instances started complaining of poor performance. The DBA connected to that instance to find the session(s) causing the CPU spike, but none of the processes on that instance was consuming much CPU time. The Task Manager looked like this.

The PID column in Task Manager is not displayed by default. It needs to be selected via Task Manager –> View –> Select Columns menu item.

In my case Process ID 2652 was using the most CPU. How do you find out which instance this process belongs to? I found the following two options to track down the instance name.

Using SQL Server Configuration Manager:

On clicking SQL Server Services in SQL Server Configuration Manager, the details of SQL Server and related services, along with the Process ID of each service, are listed. The SQL Server instance name is listed as SQL Server (InstanceName).

Using the SQL Server Error Log

In SQL Server 2000, SQL Server Configuration Manager is not available; the Process ID can instead be obtained from the SQL Server Error Log.
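The server logs its process ID at startup, so a sketch like this can locate it (on SQL Server 2005 and later, xp_readerrorlog accepts undocumented search-string parameters; on SQL Server 2000, read the whole log and scan for the line manually):

```sql
-- Read the current error log (0) of the SQL Server log type (1),
-- filtering for the startup line "Server process ID is <pid>".
EXEC master..xp_readerrorlog 0, 1, N'Server process ID';
```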

Once the SQL Server instance was located yesterday, the high CPU utilization was fixed relatively quickly. One of the scheduled jobs for UPDATE STATISTICS had run outside the maintenance window; stopping this job brought the CPU utilization down drastically.

sp_sqlagent_get_perf_counters consuming more CPU

While troubleshooting high-CPU conditions on a SQL Server instance, many of us would have stumbled upon a session executing the sp_sqlagent_get_perf_counters stored procedure. As the name suggests, it runs approximately every 20 seconds to collect the values of the performance counters defined in SQL Server. Based on the values of those counters, the defined alerts are triggered.

But at times, even though no alerts are defined on the instance, this particular stored procedure hogs CPU. Why? Because when SQL Server is installed, demo alerts are created by default.

These alerts demonstrate how SQL Server Alerts work and are enabled by default.

Since SQL Server polls the performance counters at regular intervals, these may cause high-CPU conditions. What is the fix? Just delete the demo alerts. Why delete them when there is an option to disable them? As per this feedback on Connect, even when the alerts are disabled, SQL Server continues to poll them!
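Listing and deleting the alerts is a couple of msdb procedure calls; the alert name below is illustrative, check your instance for the actual demo alert names:

```sql
-- List all alerts defined on the instance.
EXEC msdb.dbo.sp_help_alert;

-- Delete an unwanted alert by name (name shown is a hypothetical example).
EXEC msdb.dbo.sp_delete_alert @name = N'Demo: Sev. 25 Errors';
```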

The next time you see this stored procedure consuming CPU, check whether there are any unwanted alerts configured on the instance and delete them.

Windows does not recognize the Paging File

For some time now we had been receiving alerts for high CPU usage on one of the servers running a SQL Server 2005 instance. The CPU utilization was continuously spiking, and sqlservr.exe was the process using the most CPU. Even though the number of users connecting to this instance is relatively high, no particular query or database was contributing to it; the SQL Server instance as a whole was driving the CPU. Here is a snapshot of the Task Manager.

Not a pretty sight for a DBA: CPU at 99% and memory usage at 90%. A closer look at the memory value told me that something was wrong with the memory settings. Here is how I arrived at that conclusion.

The Physical Memory (K) section in Task Manager gives information about the Physical Memory (RAM) installed on the server.

The Total is 46067232 KB (43.9 GB).

The Commit Charge (K) section looked like this.

The Total value in this section is the total memory currently in use (physical memory + virtual memory). In this case it was 42413400 KB = 40.4 GB. The Limit value is the total memory available to the server (physical memory + virtual memory). It had a value of 47467864 KB = 45.3 GB.

This was the key value that made me suspect something was wrong with the virtual memory setting. I remembered that some time back the DBAs had informed the system administrators that the virtual memory (paging file) on this server was not correctly set. Usually the paging file is sized at 1.5 times the physical memory installed on the server. Based on that request, the system administrators had added a new page file on a dedicated drive.

To verify that I navigated to Computer Properties –> Advanced –> Performance –> Settings to check the Page File settings.

At the top of the above screenshot we can see that two page files have been configured on the system: one on the C drive (2 GB) and the other on the F drive (58 GB). The catch is at the bottom of the screen. The Currently allocated value shows only 2 GB, which means Windows does not detect the 58 GB page file on the F drive. Gotcha! This is why Task Manager shows a Commit Charge Limit of only 45.3 GB instead of the configured 104 GB.

Inadequate virtual memory can also be a major contributor to poor server performance, and on this server it definitely seemed so. The team had a tough time making the system administrators understand that the server was not recognizing the configured virtual memory. Yesterday night they finally understood what we were trying to emphasize (after a lot of e-mails and phone calls), and they will be reconfiguring the page file on the F drive.

I am curious to see the performance of the server after this issue is fixed.


UMS Context appears to be non-yielding on Scheduler

Yesterday one of our critical production instances froze for a few seconds. It runs on SQL Server 2000 (SP4+), and even a little downtime causes a lot of turbulence on this instance. The error log did not have any glaring errors except for this one.

DESCRIPTION:	Error: 17883, Severity: 1, State: 0
Process 282:0 (1bb8) UMS Context 0x0A37C010
appears to be non-yielding on Scheduler 6

Around the same time frame, all applications, including SQL Server Agent, timed out while trying to connect to the instance. The services did not restart, and the instance was operational again within a few seconds.

Some information about the User Mode Scheduler (UMS), which became the SQL Server Operating System (SOS) scheduler in SQL Server 2005 and above:

  • UMS schedulers are the logical schedulers used by SQL Server; each worker thread of a SPID is mapped to a scheduler, and all I/O-related operations are processed through these schedulers.
  • The number of schedulers is usually equal to the number of CPUs reported by the operating system. This will not be true if the affinity mask value is changed from the default in SQL Server 2000.
  • A SPID is assigned to a scheduler only after the scheduler receives a request. After the SPID has finished its work, it releases ownership of the scheduler, and the worker threads associated with that SPID are treated as “yielded”.
  • If the worker thread that owns the scheduler does not respond (yield) within 60 seconds, error 17883 (UMS context…non-yielding…) is raised. Only the first occurrence of this error also generates a stack dump.

This article on TechNet has very useful information on 17883 and related errors.

Now, coming back to the issue on our server, I quickly checked the following.

  • CPU and Memory utilization on the instance were normal
  • There were no blocking sessions found
  • The number of active sessions was ~200 which was less than the average for this instance
  • There were no Backup or Maintenance jobs running at that time
  • No disk related errors in the System Event Log
  • Network team also confirmed that there was no network glitch

Still the schedulers were reported as non-yielding. The output of DBCC SQLPERF(umsstats) also confirmed that the scheduler had exhausted its worker thread allocation.

Statistic             Value
--------------------  ------------
Scheduler ID          6.0
num users             15.0
num runnable          0.0
num workers           15.0
idle workers          15.0
work queued           0.0
cntxt switches        1.3390778E+7
cntxt switches(idle)  2.4731256E+7

Since all the symptoms mentioned in this article were ruled out, this other article seemed more relevant to the issue we were facing. However, no ADD DATAFILE or CREATE DATABASE commands were running on the instance. The only probable cause left was data files growing automatically.

In order to rule out this possibility, I quickly scanned the auto-growth settings of all the databases on the instance. One of the databases, 25 GB in size, had only one data file, with auto-growth set to 100 MB. This database had grown by ~500 MB per day over the last month, meaning the data file would auto-grow about 5 times a day, and since this instance is SQL Server 2000, the database would be locked during each auto-growth. This could well have been one of the contributing factors for the error in the log.

The auto-growth setting for this database has now been changed to 1 GB, which means the data file should auto-grow only about once every two days. With this change, all the possible contributing factors for the 17883 error have been ruled out from the database perspective. I have not seen the error on the instance since. Hopefully this will turn out to be a permanent fix!
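The review and the fix can be sketched in T-SQL; the database and logical file names in the ALTER statement are illustrative:

```sql
-- Review data file sizes and growth settings for every database
-- (sysaltfiles is the SQL Server 2000-era catalog; size and growth are
-- in 8 KB pages, and status bit 0x100000 means growth is a percentage).
SELECT DB_NAME(dbid) AS database_name,
       name          AS logical_name,
       size / 128    AS size_mb,
       CASE WHEN status & 0x100000 <> 0
            THEN CAST(growth AS varchar(10)) + ' %'
            ELSE CAST(growth / 128 AS varchar(10)) + ' MB'
       END           AS growth_setting
FROM   master.dbo.sysaltfiles;

-- The change applied here: grow in 1 GB steps instead of 100 MB.
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_Data, FILEGROWTH = 1024MB);
```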

Where did the SQL Server Profiler Template disappear?

Yesterday evening the DBA team wanted to run a trace against a production server, as requested by the application team. Since this server is in a strictly controlled environment, a Change Control had to be prepared. The implementer of the Change Control would not be the one documenting it, so the documentation had to be clear and meet the requester’s expectations. One of the DBAs started off with the documentation for the Change Control. Halfway through, the process seemed too cumbersome. Why? He had to write “Click Event Selection, Click TSQL, Select Application Name”, etc.

Since the implementer of the Change Control would be a different person, it was decided that the implementer would execute the trace based on a custom template. The new template was created as below.

Opened SQL Server Profiler, chose Templates –> New Template.

Since the server that we had to run the trace against was a SQL Server 2005 instance, the Server Type was selected as SQL Server 2005 in the Trace Template properties.

Then selected the requisite events

After all these selections, clicked Save to save the trace template. The first step was to open a new trace to see whether the template was listed.

Cool, the new template was listed. However, to confirm, the DBA asked his buddy to log on to the server and check whether the new template was listed for him as well. No, he could not see it. The DBA who did the documentation checked the screen, rubbing his eyes (it was already late in the evening), and he could still see it.

So where did the trace template disappear to for the other user? It became clear that the trace template had been saved locally, under the user profile of the one who created it. By default, SQL Server Profiler saves trace templates under the C:\Documents and Settings\Administrator\Application Data\Microsoft\SQL Profiler\[VersionNumber]\Templates\Microsoft SQL Server\[Version# SelectedWhileCreatingTemplate] directory. In this case it was saved in

Now that the location of the trace template had been found, he could simply have moved it to a common location so that anyone could double-click it and import it. But to have the template available to all users without a manual import, a copy of the trace template needs to be placed in the location below.

[SQL Server installation directory]\[VersionNumber]\Tools\Profiler\Templates\Microsoft SQL Server\[VersionNumber]\

In our case, this directory was

Despite the last-minute rush, the documentation was prepared and the Change Control was ready to be implemented later in the day.

Process ID owns resources that are blocking processes on Scheduler

A few days back, one of the regular readers of my blog, $sammy (not sure how he wants to be called; I will call him Dollar Sammy ;-)), saw continuous error messages on one of his SQL Server 2000 instances which read

Process ID 61:924 owns resources that are blocking processes on Scheduler 2

This message is one of the features newly introduced in SQL Server 2000 SP3, and it indicates a problem with the scheduler. This article on the MS site explains it better.

To find the root cause, I checked the following.

  • Whether SPID 61 was blocking any other transactions. No, it was not.
  • There were no parallel queries being executed at the time of this error message.

I then went on to check the locks held by this process. It was holding a whopping 312 locks on different objects in that database! The pattern I observed was that 25 worker threads were waiting for one worker thread to release its lock, and there were 12 such groups on different objects: (25 + 1) × 12 = 312. The locks were acquired as soon as this query was fired.
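The lock check on SQL Server 2000 can be done with sp_lock, or by querying syslockinfo directly, roughly as follows:

```sql
-- Locks held by SPID 61.
EXEC sp_lock 61;

-- Count of locks held by SPID 61, grouped by object.
SELECT rsc_objid AS object_id,
       COUNT(*)  AS lock_count
FROM   master.dbo.syslockinfo
WHERE  req_spid = 61
GROUP  BY rsc_objid;
```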

Obviously these worker threads had put pressure on the logical schedulers. Once the query finished executing successfully, the error message ceased to appear in the error log. I am yet to look into the query in question; for sure it requires some tuning.

$sammy, I hope this answers your question!