SQL Server 数据页损坏修复

当我们发现数据库数据页损坏了,或者执行 DBCC CHECKDB 发现有损坏的数据页时,大部分人都执行如下操作进行修复。

ALTER DATABASE db_name SET EMERGENCY;
DBCC CHECKDB (‘db_name’);
ALTER DATABASE db_name SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
–{ REPAIR_ALLOW_DATA_LOSS | REPAIR_FAST | REPAIR_REBUILD }
DBCC CHECKDB (‘db_name’, REPAIR_ALLOW_DATA_LOSS);
ALTER DATABASE db_name SET MULTI_USER;
不过,上面的修复可能不成功,也可能使数据丢失。因为数据页的损坏也分多钟情况,如日志损坏、索引损坏、数据损坏、系统对象损坏等。现在我们考虑的是用户库数据损坏的情况修复。

为了能进行数据修复,数据库须使用完整模式,先进行一次完整备份。

ALTER DATABASE [TestDBSubA] SET RECOVERY FULL WITH NO_WAIT
BACKUP DATABASE [TestDBSubA] TO DISK = N’E:\DatabaseFile\Backup\TestDBSubA.bak

我们先任意找一个数据页(如页ID=179)进行写入破坏。更改的偏移量为100(96页头+前4个数据字符),替换了10个字符

–找一个数据页
DBCC TRACEON(3604,-1)
DBCC IND(TestDBSubA,Test,-1)
DBCC PAGE(‘TestDBSubA’, 1, 179,3)

–破坏该数据页
DBCC WRITEPAGE(‘TestDBSubA’, 1, 179, 100, 10, 0x65656565656565656565)

–检查DB
DBCC CHECKDB(‘TestDBSubA’)
DBCC results for ‘TestDBSubA’.

…………(此处省略)

DBCC results for ‘sys.syssoftobjrefs’.

There are 4 rows in 1 pages for object “sys.syssoftobjrefs”.

Msg 8933, Level 16, State 1, Line 1

Table error: Object ID 1019150676, index ID 1, partition ID 72057594051100672, alloc unit ID 72057594059948032 (type In-row data).

The low key value on page (1:179) (level 0) is not >= the key value in the parent (1:431) slot 6.

…………(此处省略)

CHECKDB found 0 allocation errors and 1 consistency errors in database ‘TestDBSubA’.

repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (TestDBSubA).

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

此时我们再看看该数据页存储的信息。

DBCC PAGE(‘TestDBSubA’, 1, 179,3)
图片

页面内容有10个字符被替换了,十六进制为 65,转换十进制为 101即为ASCII值,对应的字符为字母 e。这10个字符改动的,为该表该行字段 GUID 的部分值。也就是说,该行的字段GUID数据丢失了!

现在使用修改后的值操作该行数据,更改时发生错误。

–查看该行记录,正常
SELECT [GUID],[SID],[NAME],[VALUE]
FROM [TestDBSubA].[dbo].[Test]
WHERE GUID=’65656565-6565-6565-6565-005056c00008’

–第一列[SID]被writepage更改了,所以此时更改[SID]将报错!
UPDATE T SET [SID]=SUSER_SID()
FROM [TestDBSubA].[dbo].[Test] T
WHERE GUID=’65656565-6565-6565-6565-005056c00008’
Msg 8646, Level 21, State 1, Line 1

Unable to find index entry in index ID 1, of table 1019150676, in database ‘TestDBSubA’.

The indicated index is corrupt or there is a problem with the current update plan.

Run DBCC CHECKDB or DBCC CHECKTABLE. If the problem persists, contact product support.

Msg 0, Level 20, State 0, Line 0

当前命令发生了严重错误。应放弃任何可能产生的结果。

当发生问题时,我们没能及时发现和修复,其他表或数据又有新的操作,我们模拟如下。

DELETE TOP(10) FROM [TestDBSubA].[dbo].[Test]
WHERE GUID<>’65656565-6565-6565-6565-005056c00008’
GO
UPDATE T SET VALUE=100 FROM [TestDBSubA].[dbo].[Test] T
GO

那么该如何修复呢?

修复之前,我们要习惯进行一次日志备份。

BACKUP LOG [TestDBSubA] TO DISK = N’E:\DatabaseFile\Backup\TestDBSubA_LOG.trn’

刚开始之前,我们有进行过一次完整备份。那时的完整备份的数据还没有损坏,所有我们可以用最近的完整备份进行某个数据页的修复。

RESTORE DATABASE [TestDBSubA]
PAGE = ‘1:179’
FROM DISK = ‘E:\DatabaseFile\Backup\TestDBSubA.bak’
WITH NORECOVERY
Processed 1 pages for database ‘TestDBSubA’, file ‘TestPub’ on file 1.

RESTORE DATABASE … FILE= successfully processed 1 pages in 0.072 seconds (0.108 MB/sec).

接下来,我们需要还原刚刚备份的事务日志,将数据还原到最近状态。

RESTORE LOG [TestDBSubA]
FROM DISK = ‘E:\DatabaseFile\Backup\TestDBSubA_LOG.trn’
WITH RECOVERY;
Processed 0 pages for database ‘TestDBSubA’, file ‘TestPub’ on file 1.

The roll forward start point is now at log sequence number (LSN) 597000000036800001.

Additional roll forward past LSN 597000000038800001 is required to complete the restore sequence.

RESTORE LOG successfully processed 0 pages in 0.035 seconds (0.000 MB/sec).

此时再进行对该表操作,发现2个语句都报错了!是不是有些慌?

UPDATE T SET [SID]=SUSER_SID()
FROM [TestDBSubA].[dbo].[Test] T
WHERE GUID=’65656565-6565-6565-6565-005056c00008’
GO
SELECT [GUID],[SID],[NAME],[VALUE]
FROM [TestDBSubA].[dbo].[Test]
GO
Msg 829, Level 21, State 1, Line 1

Database ID 7, Page (1:179) is marked RestorePending, which may indicate disk corruption.

To recover from this state, perform a restore.

检查数据库,仍然报错。

DBCC CHECKDB(‘TestDBSubA’)
Msg 8939, Level 16, State 98, Line 1

Table error: Object ID 1019150676, index ID 1, partition ID 72057594051100672, alloc unit ID 72057594059948032 (type In-row data), page (1:179).

Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 133129 and -6.

Msg 8928, Level 16, State 1, Line 1

Object ID 1019150676, index ID 1, partition ID 72057594051100672, alloc unit ID 72057594059948032 (type In-row data): Page (1:179) could not be processed.

See other errors for details.

Msg 8978, Level 16, State 1, Line 1

Table error: Object ID 1019150676, index ID 1, partition ID 72057594051100672, alloc unit ID 72057594059948032 (type In-row data).

Page (1:419) is missing a reference from previous page (1:179). Possible chain linkage problem.

Msg 8976, Level 16, State 1, Line 1

Table error: Object ID 1019150676, index ID 1, partition ID 72057594051100672, alloc unit ID 72057594059948032 (type In-row data).

Page (1:179) was not seen in the scan although its parent (1:431) and previous (1:420) refer to it. Check any previous errors.

怎么解决呢?很简单,再进行一次事务日志的备份和还原即可!

USE MASTER
GO
BACKUP LOG [TestDBSubA]
TO DISK = N’E:\DatabaseFile\Backup\TestDBSub_LOG.trn’
WITH INIT,FORMAT
GO
RESTORE LOG [TestDBSubA]
FROM DISK = ‘E:\DatabaseFile\Backup\TestDBSub_LOG.trn’
WITH RECOVERY;
GO

本作品采用《CC 协议》,转载必须注明作者和本文链接
讨论数量: 0
(= ̄ω ̄=)··· 暂无内容!

讨论应以学习和精进为目的。请勿发布不友善或者负能量的内容,与人为善,比聪明更重要!