How I conquered a corrupt database !

Today morning when I was rushing at my best to office, my cell phone started ringing. In a hurry I take the call, surprise ! A friend of mine who was too busy to call me up for the last two years was on the line. “I am doing good” was the one word answer for my question and the next sentence was “Pradeep, need your help“. I guessed it right, it was related to SQL Server.

One of the databases has gone suspect when the server rebooted unexpectedly ” was the next sentence. I checked if it was really urgent and came to know that it can wait. I a hurry to catch the cab, I asked him to send me the data and log files along with the error log and dropped off the call.

An hour ago when I came back from Office, started checking the database.  The error log of that SQL Server 2005 instance read

Error: 9003, Severity: 20, State: 1.
The log scan number (227:4088:3) passed to log scan in database ‘router’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.
Error: 3414, Severity: 21, State: 1.
An error occurred during recovery, preventing the database ‘router’ (database ID 12) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

Wow! The router database has data corruption. Since I had the liberty to play around with the database, started my “Operation Corruption”. Here is what I did.

First tried attaching the .mdf and .ldf files (even after knowing that it will be a futile exercise). Got this error.

The log scan number (227:4088:3) passed to log scan in database ‘router’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.
Could not open new database ‘router’. CREATE DATABASE is aborted. (Microsoft SQL Server, Error: 9003)

The next step was to fool SQL Server. Created a new database with the same name ‘router’. Then took the database offline and  using Windows Explorer overwrote the data files with the corrupt .mdf and .ldf files. Then tried bringing the database online using SSMS. As expected got this error.

Error: 9003, Severity: 20, State: 1.
The log scan number (227:4088:3) passed to log scan in database ‘router’ is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication.

Then put the database in Emergency mode using this command.

ALTER DATABASE router SET EMERGENCY

The next step was to run DBCC CHECKDB against that database.

DBCC CHECKDB (router) WITH ALL_ERRORMSGS

Holding my breath waited for the query to complete. The output was

Msg 8977, Level 16, State 1, Line 1
Table error: Object ID 1899153811, index ID 1, partition ID 72057594056343552, alloc unit ID 72057594060341248 (type In-row data). Parent node for page (1:351) was not encountered.
There are 171 rows in 3 pages for object “DCL.dataexp_soho_filelog”.
CHECKDB found 0 allocation errors and 1 consistency errors in table ‘DCL.dataexp_soho_filelog’ (object ID 1899153811)

The index id 1 in the table “DCL.dataexp_soho_filelog” had data inconsistency.  Fortunately the table has only 1712 rows. Now I had two options. Since the corruption had occurred in the index, I could manually drop and recreate the index. The other option was to run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS option. Out of the two, I chose the latter and let the expert handle the situation. For CHECKDB to run with repair_allow_data_loss option the database has to be in Single User mode. So did it and ran CHECKDB with allow data loss.

ALTER DATABASE [router] SET  SINGLE_USER WITH ROLLBACK IMMEDIATE
go
USE MASTER
go
DBCC CHECKDB (router, REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS
go

While I was hoping for the best, SQL Server gave me the output.

Msg 8945, Level 16, State 1, Line 1
Table error: Object ID 1899153811, index ID 1 will be rebuilt.
The error has been repaired.
Msg 8977, Level 16, State 1, Line 1
Table error: Object ID 1899153811, index ID 1, partition ID 72057594056343552, alloc unit ID 72057594060341248 (type In-row data). Parent node for page (1:351) was not encountered.
The error has been repaired.
There are 1712 rows in 3 pages for object “DCL.dataexp_soho_filelog”.
CHECKDB found 0 allocation errors and 1 consistency errors in table ‘DCL.dataexp_soho_filelog’ (object ID 1899153811)

Cool, the error has been repaired ! What was done? As evident in the output index id 1 was rebuilt for the table “DCL.dataexp_soho_filelog”. However to confirm that there is no more data corruption I ran DBCC CHECKDB again and got this result.

CHECKDB found 0 allocation errors and 0 consistency errors in database ‘router’.

Now the database is reported to be clean, all I had to do was to bring it ONLINE using the below commands.

ALTER DATABASE [router] SET  MULTI_USER WITH ROLLBACK IMMEDIATE
go

ALTER DATABASE router SET ONLINE
go

I have already sent across the “clean” data and log files to my friend. For sure I will get a “Pradeep, you saved my b***” call tomorrow ;-)

11 thoughts on “How I conquered a corrupt database !

  1. manjot

    Hey Pradeep,
    what if someone prefers to manually drop and recreate the index rather than DBCC CHECKDB (, REPAIR_ALLOW_DATA_LOSS) ?
    What would you do in that case?

    1. PradeepAdiga Post author

      Hi Manjot,
      Like I mentioned in the article “Since the corruption had occurred in the index, I could manually drop and recreate the index”. I could have recreated the index. Since I had the luxury of trying out different things, chose to run Repair_Allow_Data_Loss. In this case, CHECKDB did the same thing, drop and re-created the Index ID 1.

      Thanks,
      Pradeep

  2. karthik

    Hi Praddep,

    Excellent Blog .Keep up the good work.

    Isn’t restoring from backups (if u have one ) the first step to do when you have a suspect database.

    Do not get me wrong but probably you should include that step as the first step .

    1. PradeepAdiga Post author

      Hi Karthik,
      Thank you. You are right, restoring it from a good backup is the best and safest method to recover a suspect database. I missed to inform in my post, that he had a good backup which was taken a week ago !

  3. Guru Charan

    Something is not right.. it was giving error messages related to Log file and how come DBCC CHECKDB resolve the corruption.
    Questions: What was the recovery model of the database? Is it Simple or Full?
    AFAIK, a database can not ignore log file and rebuild a new one unless it is simple.

    Strange!! good work though.

    1. PradeepAdiga Post author

      The error reads “This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf).” It does not necessarily mean that the log file is corrupt. Instead it means that the entry in the ldf does not match that of the data file. The database recovery model was FULL. So what I attempted was remove the log file so there is no question of mismatch between the data and log file. Then corrected the corruption which was on data file.

  4. optradba

    So what happened to the lost data as a result of the option REPAIR_ALLOW_DATA_LOSS that you took? How large was the amount of data lost?

    1. PradeepAdiga Post author

      Since the index was found to be corrupt, rebuilding the index fixed the issue. As mentioned in this post, the number of rows before and after DBCC remained the same.

Comments are closed.