Today morning a differential backup job failed on one of the instances running SQL Server 2005. The error message with which it failed is as below.
Msg 845, Level 17, State 1, Server [ServerName], Line 2 Time-out occurred while waiting for buffer latch type 4 for page (9:8677305), database ID 7. Msg 3013, Level 16, State 1, Server [ServerName], Line 2 BACKUP DATABASE is terminating abnormally.
The first thing I did was to check which processes are currently active on that database.
SELECT * FROM sys.dm_exec_requests WHERE database_id = 7
The query gave this output.
Look at what session id 109 is doing. DBCC SHRINKFILE! There was a request from the Application team sometime back, that we keep shrinking the primary file group until there is no free space left. We had informed them of the performance issue that this process will have on the instance. But they wanted to reclaim the space on that filegroup, since they had moved the bigger tables to a new filegroup. After all, every MB of SAN storage is $. We had setup a job which runs DBCC SHRINKFILE on each of the data files in the Primary File group.
How is SHRINKFILE affecting the backup? As per this KB article when the instance is under heavy stress load, users will experience Error 844 and Error 845. It also mentions that Frequent database AutoGrow or AutoShrink operations can also be one of the causes. In our case AutoGrow property was properly set. But the “shrink job” was causing more damage even though the AutoShrink option was not enabled. In order to check the number of locks held on the database, I executed the below query.
select request_mode, count(*) as Total from sys.dm_tran_locks where resource_database_id= 7 group by request_mode order by count(*) desc
The output was
The number of X (exclusive) locks held on the database was 107. Since we were sure of the reason for the backup failure, we went ahead and stopped the “Shrink Job”. I executed the same query after stopping the job and the number of X locks dropped to just 1!
As expected the differential backup job completed without any issues after stopping the shrink job. This is one of the many negative impacts of shrinking a database.
Since this database is shrinking at a rate of 20 GB/day, my team will have to deal with these issues at least for the next one year!