Tag Archives: transaction log

Error: 9001, Severity: 21, State: 1 | The log for database ‘tempdb’ is not available

Today one of the production servers running SQL Server 2000 was rebooted during maintenance window. All the databases came online after reboot. But few minutes later, the monitoring tool started sending the below alert.

spid51    Error: 9001, Severity: 21, State: 1
spid51    The log for database 'tempdb' is not available

The DBA’s time has come! First headed to the drive on which tempdb transaction log was located to see if it had disappeared. No, that disk was online. Then something was wrong at the instance. To check the databases opened up Query Analyzer and fired sp_helpdb command. Same error. However, tempdb database getting listed under the databases. So issued use tempdb command and it was successful. Then issued sp_helpfile and it listed the data file and the transaction log file as well.

Now everything seems normal still tempdb reports that the log file is missing. Any query which makes tempdb use the transaction log would fail with the same error. It was very clear that SQL Server had issues with the transaction log because of the disk. The entries in the System Event Log at that time confirmed that indeed the storage had some serious issues. Here are some of those.

{Delayed Write Failed} Windows was unable to save all the data for the file . The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere

The driver detected a controller error on \Device\Harddisk0.

The device, \Device\Scsi\vmscsi1, is not ready for access yet.

Since it was maintenance hours, no users were impacted. Since tempdb was having issues, DBA’s task was simplified. As tempdb gets recreated after every services restart, all we had to do was to restart the SQL Services. This time the storage seems have calmed down and the instance as well as the tempdb came online. Most importantly, all the databases continued to be online !

The storage team is already looking into this issue and hopefully they will fix it before this catches on to some other database.

How I conquered a corrupt database !

Today morning when I was rushing at my best to office, my cell phone started ringing. In a hurry I take the call, surprise ! A friend of mine who was too busy to call me up for the last two years was on the line. “I am doing good” was the one word answer for my question and the next sentence was “Pradeep, need your help“. I guessed it right, it was related to SQL Server.

One of the databases has gone suspect when the server rebooted unexpectedly ” was the next sentence. I checked if it was really urgent and came to know that it can wait. I a hurry to catch the cab, I asked him to send me the data and log files along with the error log and dropped off the call.

An hour ago when I came back from Office, started checking the database.  The error log of that SQL Server 2005 instance read

Error: 9003, Severity: 20, State: 1.
The log scan number (227:4088:3) passed to log scan in database ‘router’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.
Error: 3414, Severity: 21, State: 1.
An error occurred during recovery, preventing the database ‘router’ (database ID 12) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

Wow! The router database has data corruption. Since I had the liberty to play around with the database, started my “Operation Corruption”. Here is what I did.

First tried attaching the .mdf and .ldf files (even after knowing that it will be a futile exercise). Got this error.

The log scan number (227:4088:3) passed to log scan in database ‘router’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.
Could not open new database ‘router’. CREATE DATABASE is aborted. (Microsoft SQL Server, Error: 9003)

The next step was to fool SQL Server. Created a new database with the same name ‘router’. Then took the database offline and  using Windows Explorer overwrote the data files with the corrupt .mdf and .ldf files. Then tried bringing the database online using SSMS. As expected got this error.

Error: 9003, Severity: 20, State: 1.
The log scan number (227:4088:3) passed to log scan in database ‘router’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication.

Then put the database in Emergency mode using this command.

ALTER DATABASE router SET EMERGENCY

The next step was to run DBCC CHECKDB against that database.

DBCC CHECKDB (router) WITH ALL_ERRORMSGS

Holding my breath waited for the query to complete. The output was

Msg 8977, Level 16, State 1, Line 1
Table error: Object ID 1899153811, index ID 1, partition ID 72057594056343552, alloc unit ID 72057594060341248 (type In-row data). Parent node for page (1:351) was not encountered.
There are 171 rows in 3 pages for object “DCL.dataexp_soho_filelog”.
CHECKDB found 0 allocation errors and 1 consistency errors in table ‘DCL.dataexp_soho_filelog’ (object ID 1899153811)

The index id 1 in the table “DCL.dataexp_soho_filelog” had data inconsistency.  Fortunately the table has only 1712 rows. Now I had two options. Since the corruption had occurred in the index, I could manually drop and recreate the index. The other option was to run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS option. Out of the two, I chose the latter and let the expert handle the situation. For CHECKDB to run with repair_allow_data_loss option the database has to be in Single User mode. So did it and ran CHECKDB with allow data loss.

ALTER DATABASE [router] SET  SINGLE_USER WITH ROLLBACK IMMEDIATE
go
USE MASTER
go
DBCC CHECKDB (router, REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS
go

While I was hoping for the best, SQL Server gave me the output.

Msg 8945, Level 16, State 1, Line 1
Table error: Object ID 1899153811, index ID 1 will be rebuilt.
The error has been repaired.
Msg 8977, Level 16, State 1, Line 1
Table error: Object ID 1899153811, index ID 1, partition ID 72057594056343552, alloc unit ID 72057594060341248 (type In-row data). Parent node for page (1:351) was not encountered.
The error has been repaired.
There are 1712 rows in 3 pages for object “DCL.dataexp_soho_filelog”.
CHECKDB found 0 allocation errors and 1 consistency errors in table ‘DCL.dataexp_soho_filelog’ (object ID 1899153811)

Cool, the error has been repaired ! What was done? As evident in the output index id 1 was rebuilt for the table “DCL.dataexp_soho_filelog”. However to confirm that there is no more data corruption I ran DBCC CHECKDB again and got this result.

CHECKDB found 0 allocation errors and 0 consistency errors in database ‘router’.

Now the database is reported to be clean, all I had to do was to bring it ONLINE using the below commands.

ALTER DATABASE [router] SET  MULTI_USER WITH ROLLBACK IMMEDIATE
go

ALTER DATABASE router SET ONLINE
go

I have already sent across the “clean” data and log files to my friend. For sure I will get a “Pradeep, you saved my b***” call tomorrow 😉