DBCC CHECKDB in tough and easy situations.

Every production database should be subject to an integrity check with DBCC CHECKDB.  Every week at least.  It’s not just a best practice; the failure to do so should be considered dereliction of DBA duty.  The longer you go with an integrity error in your database, the higher the likelihood of a catastrophic, call Paul Randal for help, situation.

DBCC CHECKDB in a HA tight-disk situation

However, what happens when you have a situation where you have a high-uptime VLDB with little to no disk to spare?  You might have the feeling that you are in a catch-22; If you use TABLOCK to avoid the snapshots, you need downtime, but if you don’t use TABLOCK to do the DBCC, you have no disk (on the drive with the file where your object’s pages are stored) and your DBCC crashes with error 17053, root cause OS Error 112.

There is a way around this:  Take a full backup of the database, and restore it to another server, where the database may be accessed without blocking other applications.  Once restore is complete, run DBCC CHECKDB with TABLOCK.  BACKUP DATABASE copies database pages with data into the backup, and RESTORE DATABASE copies them back.  Running DBCC CHECKDB will give you a reliable picture of the integrity of the production database.  Moreover, this strategy gives you a reliable test of your backup strategy.

DBCC CHECKDB as a Scheduled Job

What if you can run DBCC CHECKDB without TABLOCK and not have problems, or if you can get the database into single-user mode for a clean run with TABLOCK?  Then you need to make a scheduled job out of it, for each and every database on your server.  SQL Server Maintenance plans will do this for you.

You should schedule DBCC CHECKDB weekly.  I recommend that you schedule this job to notify DBA whether or not the job succeeds. It is good to put the DBCC output in your face every single week, to give you assurance that it has been completed.

What if DBCC CHECKDB finds a problem?

If you have a consistency error, then, regardless of any other issues, you get with management and you come down as soon as possible! Get that database into single-user mode and start to fix the issue. The exact things you should do to fix the errors depend on the errors, and those repair attempts are best left to a more detailed post.

The important thing to remember is that time is of the essence.  Every second you are up with an integrity error is another second where the opportunity exists for that error to become serious and perhaps irreparable.

Once the error(s) are corrected and a DBCC CHECKDB with TABLOCK passes in single-user mode, then take a full backup at once.  Assume that all previous backups are corrupt.

Your managers may be grumpy that a production system had to come down for hours to repair a data inconsistency, but that is better than failure to report a potential problem that escalates to disaster.  If your managers refuse to bring the system down after you warn them of the error and its potential risks, then the responsibility lies with them and not you.

Whatever you do, running and verifying regular integrity checks on all production databases should be considered a core responsibility of your job.

Leave a Reply

Your email address will not be published. Required fields are marked *