Tag Archives: data file

I/O error 21(The device is not ready.) on an ONLINE database

I am not yet ready!

Sunday late night, I got a call stating that one of the warehouse databases was throwing this error while the Application job was running.

"Microsoft OLE DB Provider for SQL Server:
I/O error 21(The device is not ready.) detected during read at offset 0x0000a7a239c000 in file 'DataFileName'.
- HY000 - HRESULT=0x80004005 “

Since the error message mentions that the drive on which one of the data file resided was not ready, it was obvious that the underlying storage was the problem. But the System Event Log did not have any error messages related to storage. The database was online and accessible. But when the DBA tried accessing the Properties of this database, the same error was thrown. But queries like sp_helpfile and select * from sysobjects were executing successfully. When any query tried accessing the objects located on that particular data file, they would fail returning the drive error.

The reason for this error was not something that happened at that time. Another email chain from the Windows Administrators sent a few hours ago said it all. They had found that the drive letter associated with the Mount Point hosting this data file was missing and they had *mapped* it somehow. Not sure how they had done the drive mapping. Most likely the drive hosting this data file got disconnected while the database was online.

This issue was not that serious to make the database go into suspect mode but came into picture only when someone tried accessing the objects on that data file. Since the alerts on this server were not getting delivered due to an issue with SQLMail about which I had posted sometime back.

How to fix this issue? The easiest option is to take the database offline and bring it online. Since I had bad experiences with storage on these data warehouse servers, I was worried if some other mount points hosting these databases had similar issues. Hence it was decided to restart the SQL Server services. The database came online and the *device not ready* error message disappeared. I was happy that my worst fears did not come true and started my Monday on a good note.

File cannot be shrunk as it is either being shrunk by another process or is empty

On Monday on one of the Production Instances we had a High Severity Incident. The problem was that a database which never exceeded 40 GB in size had started growing at a rate of ~10 GB per hour! This database was sort of an audit database which logged events from the application. It was logging events abnormally eating up all space on the database. The database ran out of space and the DBA team added one more data file on another drive to make some room for the database.

After sometime it turned out that the Application Team had changed the logging level in the Application which increased the inserts in the only table in that database. They decided to change the logging to the original setting. This stopped the database from growing abnormally. The table size was only 60 MB now but the database had grown to 300 GB+ in size. At the end of the incident it was decided to reclaim the unused space in the database and the DBA was asked to Shrink the data files.

Shrink me if you can

Since the Incident was resolved, it was a relatively easy task for the DBA to shrink the data files. We started by shrining the data file which was added recently. Since there is hardly any data in the database, the second data file which was ~200GB in size got shrunk in no time. But while trying to shrink the first data file, SQL Server reported the following error message.

File ID 1 of database ID 7 cannot be shrunk as it is either being shrunk by another process or is empty.
Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.

Going by the Error Message details the DBA checked if someone else was shrinking th data file. It was not the case. As per this KB article, if one tries to shrink the database when the database backup is underway, it would throw the same error message. But no database backups were running at that time. Moreover the issue addressed by this hotfix was applicable to SQL Server 2005 SP2. The instance in question was running SQL Server 2005 Sp3+. Even the truncateonly parameter in the SHRINKFILE command resulted in the same error.

DBCC SHRINKFILE ('logicalname' , 0, TRUNCATEONLY)
GO

At this time I had two options in mind.

  1. To create a table in the database and insert sample data into it. Then try shrinking.
  2. Increase the file size of the data file in question by 1 MB and try shrinking.

Since the database was *owned* by the Application Team and it was one of the most critical systems for the customer, I did not want to go by the first option. The second option seemed to be more feasible. Hence I increased the data file size by 1 MB.

ALTER DATABASE [dbname] MODIFY FILE ( NAME = N'datafile', SIZE = 141313MB )
GO

After modifying the data file size, I issued the query to shrink the data file to 15 GB. Surprise! Within no time, the data file got shrunk to 15 GB!

PS: Data file should never be shrunk unless it is a one time activity to reclaim space as in my case. Shrinking data file causes indexes to be fragmented and will undo all the good work done by Rebuilding the Indexes. This article by Paul Randal (b | t) explains it with an example.

CHECKDB running every minute? Check Database Properties

Sometime back I came across a question in the forums where the user was getting this message in the SQL Server Error Log every minute.

CHECKDB for database 'DBName' finished without errors on [date and time].
This is an informational message only; no user action is required.

Starting up database 'DBName'

He had not scheduled CHECKDB to run every minute and wanted to know what does this message mean? Quick glance at the informational message clearly indicates that SQL Server is not reporting the results of DBCC CHECKDB. This message is reported in the Error Log whenever a database starts up. This is a new feature in SQL Server 2005 onwards about which I had posted here. The third line in the above message confirms that database is indeed starting up every minute.

Why does the database starts up every minute?

This is because the AutoClose property for that database is set to True.

With this property set to True, when the last user connection disconnects, the database is closed. When a user connects back to the database, the database again starts up and the informational message is logged in the SQL Server Error Log. When a database starts up resources are assigned to it and when it is closed the resources are released. The AutoClose option is useful on a database which is not frequently used like in a database running on SQL Server Express Edition. But if this property is set to True on a busy OLTP database it will have negative impact on the performance of the instance.

Even I had come across few databases in my client’s environment where the AutoClose property was set to True. Since these databases were small in size and did not have much importance, there was no impact. This property can be turned off using Database Properties dialogue in SSMS or using the following query.

ALTER DATABASE [DBName] SET AUTO_CLOSE OFF

How many times did my database auto grow?

Today as I entered the Office, I came know that we were receiving alerts stating that the transaction log files of tempdb on one of the instances was almost full. This instance was running SQL Server 2000. I quickly looked at the transaction log file of tempdb and shocked to see that the initial size was set at 250 MB and auto growth of 100 MB! Also there were a lot of 833 errors in the Event Log. Everytime templog would take a lot of time to grow because of this and as a result the transaction which needed more tempdb transaction log space would get delayed/terminated. Even though I was able to diagonize the issue, I was not able to get the number of times the templog file tried to “Auto Grow”. I wish this instance was running SQL Server 2005 or above.

What changed in SQL Server 2005 in this regard? The Default Trace. Since SQL Server 2005 and above have the ability to run the Default Trace. This trace captures almost all the events that a DBA would need to torubleshoot.

Check if the Default Trace is running:

select name, value_in_use
from sys.configurations
where name='default trace enabled'

If value_in_use = 1 then the Default Trace is enabled.

Enable Default Trace if not running:

sp_configure 'default trace enabled', 1
go

reconfigure with override
go

Once the default trace starts capturing the data, executing the below query will give the details of how many times did the database auto grow.

DECLARE @trcfilename VARCHAR(1000);
SELECT @trcfilename = path FROM sys.traces WHERE is_default = 1
SELECT StartTime,
DB_NAME(databaseid)as DatabaseName,
Filename,
SUM ((IntegerData*8)/1024) AS [Growth in MB],
(Duration/1000)as [Duration in seconds]
FROM ::fn_trace_gettable(@trcfilename, default)
WHERE (EventClass = 92 OR EventClass = 93)
GROUP BY StartTime,Databaseid, Filename, IntegerData, Duration
order by StartTime

This output can be further analyzed to check if the initial size and auto growth settings of a database has been properly set.

Find free space in a file group

Today morning the monitoring tool threw an error that one of the databases was 90% full. Space was needed to be added to that database. Since it had 54 data files spread across 10 file groups, the need was to find out which file group was running out of space. As a production support DBA this is one of the most common tasks. I use the below script to find out the space allocation on a database grouped by the file groups. The script makes use of the undocumented DBCC showfilestats command to get the space details.

declare @dbname varchar(256)
IF (OBJECT_ID('tempdb..#space') IS NOT NULL)
            drop table #space

IF (OBJECT_ID('tempdb..#filestats') IS NOT NULL)
            drop table #filestats

IF (OBJECT_ID('tempdb..#filegroup') IS NOT NULL)
            drop table #filegroup

create table #filestats
(fileid int,
filegroup int,
totalextents int,
usedextents int,
name varchar(255),
filename varchar(1000))

create table #filegroup
(groupid int,
groupname varchar(256))

	insert into #filestats
	exec ('DBCC showfilestats with no_infomsgs')

	insert into #filegroup
	select  groupid, groupname
    from sysfilegroups

select g.groupname,
	sum(TotalExtents)*64.0/1024 as TotalSpaceMB,
	sum((TotalExtents - UsedExtents) * 64.0 / 1024.0) AvailSpaceMB
from #filestats f
join #filegroup g on f.filegroup = g.groupid
group by g.groupname

drop table #filestats
drop table #filegroup
set nocount off

The output of this script looks like this

Get Table name from data file offset address

I am having really tough time on two of the production servers in our environment. The storage related issues on these servers are causing problems for the databases. There have been many changes to the environment from storage perspective like firmware upgrade, storage migration to name a few. But the issues seems to be going from bad to worse.

Last week, on the instance which hosts the biggest data warehouse database of 10 TB in size we started getting messages related to disk subsystem. The worst part was that the storage demon had caught the 10 TB database itself! The error message read

Error: 823, Severity: 24, State: 2
I/O error (bad page ID) detected during read
at offset 0x00005a0b020000 in file 'M:Data************_Data.NDF'

This error message would appear in the SQL Server Error Log at least 10 times a day. No corresponding error message in the System Event Log from the Disk Subsystems. As usual the storage team informed that “all looks good from Storage perspective”.  Even a single occurrence of this message is enough send shivers down the spines of a DBA. Recurring messages meant that the DBA had to act quick and fast.

The first that a DBA a would do is to run DBCC CHECKDB against the database. As mentioned earlier, this database is 10 TB in size and running CHECKDB against this database would take at least 48 hours (from the recent experience on this database). Since the customer does not want his “load” jobs to be held for such a long duration, the other option was to run DBCC CHECKFILEGROUP. The file group on which this datafile was residing constituted 90% of the total database size! Hard luck.

The last option was to get to the table which was affected by this issue. But the error message did not have the Page ID, so that we could run the life saving DBCC PAGE command to get the table name. All it had was the hexadecimal offset id 0x00005a0b020000 for that file. Is there an option to get the Page ID from the offset?

The official documentation gets the due respect only at the time of crisis. In this case this KB article on 823 error was more than useful. With the help of this article, I was able to get to the affected Page ID. Here is how.

Launched Calc.exe and divied the hexadecimal value 0x00005a0b020000 by 8192

Now that I got a value of B1E749 all I need to do was to convert it to Decimal by clicking on the Dec radio button. The decimal equivalent of this was 11659081.

Now the Page ID is ready to be fired using the below query. The database ID is 8 and the File ID of the file in question is 37.

DBCC TRACEON (3604)
go
DBCC PAGE (8,37,11659081,3)
go

This query gave the m_objid, i.e. the ID of the object which resides on this page.

The object id is also ready now and the table name is just a F5 away!

SELECT OBJECT_NAME(1711345161)

Now the task is simpler and quicker. A Change Control to run DBCC CHECKTABLE against this table is awaiting customer approval now.

The hole has already been dug up in this database, the output of CHECKTABLE will confirm how deep it is. Keeping my fingers crossed!

Property Owner is not available for Database

Couple of days back, on one of the databases encountered a strange error. When clicked on Properties of the database in SSMS, it threw an error which read

****************

Property Owner is not available for Database ‘[DatabaseName]’. This property may not exist for this object, or may not be retrievable due to insufficient access rights.  (Microsoft.SqlServer.Smo)

****************

sp_helpdb executed successfully. However for that particular database, the owner field was NULL.

Now the root cause has been found. Since the database owner was NULL, SSMS was unable to display the properties of the database. We have already pushed a Change Control which will execute sp_changedbowner ‘sa’ against that database. This is the permanent fix for this issue.

How did the Database Owner name change to NULL? Here is an example.

Let us make a Windows Login the database owner for ownertest database.

CREATE LOGIN [ADIGA\testdbownerissue] FROM WINDOWS WITH DEFAULT_DATABASE=[ownertest]
go

use ownertest
go
sp_changedbowner 'ADIGA\testdbownerissue'
go

Now let us delete that login from the Operating System /Active Directory (not at the instance). Now the output of sp_helpdb ownertest looks like this

This concludes that the Database Owner value changed to NULL because the Windows Account was removed from the Active Directory, without changing the DBOwner for the database for which this account was holding the ownership. This usually happens when a database owner leaves the organization and the database owner is not changed accordingly.