一次意外停电导致的MySQL无法启动修复案例

背景

公司机房(杂物间)意外停电,没有UPS,停电就立马断电。来电之后服务器自动启动,但是访问项目的时候,提示无法连接数据库“SQLSTATE[HY000] [2002] Connection refused”,登入数据库所在的服务器,原来服务器没跑起来,看一下日志(部分)

2022-06-29T05:44:54.949983Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.29-0ubuntu0.20.04.3) starting as process 1137

2022-06-29T05:44:54.969806Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.

2022-06-29T05:44:55.896713Z 0 [ERROR] [MY-011906] [InnoDB] Database page corruption on disk or a failed file read of page [page id: space=4621, page number=1515]. You may have to recover from a backup.
 len 16384; hex XXXXXXXX一大串十六进制XXXXXXXXX
 InnoDB: End of page dump
InnoDB: Page may be an index page where index id is 5481

2022-06-29T05:48:51.191701Z 1 [ERROR] [MY-011937] [InnoDB] [FATAL] Apparent corruption of an index page [page id: space=4621, page number=1917] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.

2022-06-29T05:48:51.191727Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: buf0dblwr.cc:1031:ib::fatal triggered thread 140691670173440
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
05:48:51 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x55dbfee28630
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff55505fce0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x55dbf9d1ead1]
/usr/sbin/mysqld(print_fatal_signal(int)+0x2fb) [0x55dbf8bbf7db]
/usr/sbin/mysqld(my_server_abort()+0x76) [0x55dbf8bbf926]
/usr/sbin/mysqld(my_abort()+0xe) [0x55dbf9d18ace]
/usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x349) [0x55dbf9f947b9]
/usr/sbin/mysqld(ib::fatal::~fatal()+0xd5) [0x55dbf9f96c75]
/usr/sbin/mysqld(Double_write::croak(buf_block_t const*)+0x25e) [0x55dbfa00a51e]
/usr/sbin/mysqld(Double_write::prepare(buf_page_t const*, void**, unsigned int*)+0x15d) [0x55dbfa00a75d]
/usr/sbin/mysqld(Double_write::enqueue(buf_flush_t, buf_page_t*, file::Block const*)+0x97) [0x55dbfa011fc7]
/usr/sbin/mysqld(dblwr::write(buf_flush_t, buf_page_t*, bool)+0x1fb) [0x55dbfa01280b]
/usr/sbin/mysqld(buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool)+0x277) [0x55dbfa01ca67]
/usr/sbin/mysqld(+0x2591657) [0x55dbfa01f657]
/usr/sbin/mysqld(buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, unsigned long*)+0x653) [0x55dbfa020583]
/usr/sbin/mysqld(buf_flush_lists(unsigned long, unsigned long, unsigned long*)+0xc7) [0x55dbfa021907]
/usr/sbin/mysqld(buf_flush_sync_all_buf_pools()+0x5a) [0x55dbfa0219ca]
/usr/sbin/mysqld(srv_start(bool)+0x2b77) [0x55dbf9f34f27]
/usr/sbin/mysqld(+0x22ce94e) [0x55dbf9d5c94e]
/usr/sbin/mysqld(dd::bootstrap::DDSE_dict_init(THD*, dict_init_mode_t, unsigned int)+0x9e) [0x55dbf9aa24ae]
/usr/sbin/mysqld(dd::upgrade_57::do_pre_checks_and_initialize_dd(THD*)+0x1a9) [0x55dbf9cf1449]
/usr/sbin/mysqld(+0x11f6d86) [0x55dbf8c84d86]
/usr/sbin/mysqld(+0x27e1bcd) [0x55dbfa26fbcd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7ff5643e0609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7ff563630133]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

2022-06-29T05:48:51.971301Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.29-0ubuntu0.20.04.3) starting as process 1384
循环错误
...

分析(仅供参考,如有错误欢迎指正)

提炼一下上面日志的精髓:

  • [ERROR] [MY-011906] [InnoDB] Database page corruption on disk or a failed file read of page [page id: space=4621, page number=1515]. You may have to recover from a backup.
  • [ERROR] [MY-011937] [InnoDB] [FATAL] Apparent corruption of an index page [page id: space=4621, page number=1917] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.

MySQL有个redo Log,应该是数据库断电的时候,有个事物没做完就断电了,正常情况下再次启动数据库的时候,会进行事务恢复。
但是,好巧不巧有个表的page坏了,重做事务的时候做不了,好像是重做事务的时候提示表有错误(这里没有截图)导致事务无法重做。
然后可以加innodb_force_recovery的参数来强制启动存储引擎,但是加了也没用1-5都还是“We intentionally crash the server to prevent corrupt data from ending up in data files.”,数据库跑不起来。

尝试解决

分析一下表,看到有个表确实有问题
一次意外停电导致的MySQL无法启动修复案例

InnoDB: The B-tree of index PRIMARY is corrupted.
InnoDB: Index ‘transaction_id_idx’ contains 35824 entries, should be 18446744073709551615
参考了一下百度与官方文档,大概就是没救了,只能通过备份恢复。

解决

使用innodb_force_recovery参数来启动数据库,支持0-6(默认0),我从1设置到5数据库都是跑起来立马就崩,设置成6 ( SRV_FORCE_NO_LOG_REDO),才能把数据库跑起来。但是这个状态值运行起来的数据库是只读的,只能读取/备份数据,并不能执行其他操作(mysqldump备份的时候遇到错误的表可能中断,建议其他方式)。

确认数据库的数据都导出来了之后,开始尝试删掉这张表。但是除了只读模式能跑起来,其他(1-5)都跑不起来,怎么删掉?
经过多次尝试,发现数据库进程虽然表面上跑不起来,但是启动的时候,有那么零点几秒是跑起来了的,这个时候是能执行命令的。所以我的方案是开两个ssh登入服务器。
一个ssh负责不停地启动数据库

sudo services start mysql

或者

sudo systemctl start mysql

另外一个负责登入数据库,选择数据库,删除数据表这三个关键环节。

sudo mysql
use operation_data
drop ebay_finances

因为mysql跑起来可能就零点几秒,可能得重试几次才能操作成功,多试几次就OK了
删掉这个表之后,innodb_force_recovery 记得设置成0或者注释/删掉这一行。然后重启数据库就恢复了正常运行

经验总结

上云

参考文档:

MySQL官方文档对innodb_force_recovery的相关说明 dev.mysql.com/doc/refman/8.0/en/fo...

本作品采用《CC 协议》,转载必须注明作者和本文链接
Buffett-Cai
讨论数量: 3

你看这种方法行不行:

首先编辑一个文件 init.sql

use operation_data;
drop ebay_finances;

然后使用如下命令启动:

mysqld --init-file=/path/init.sql --datadir=/path/
1年前 评论
Buffett-Cai (楼主) 1年前
mowangjuanzi (作者) 1年前

讨论应以学习和精进为目的。请勿发布不友善或者负能量的内容,与人为善,比聪明更重要!