事故原因
由于意外断电导致数据库从库崩溃,无法启动
处理方式
重建数据库从库
具体步骤
备份还原
完整备份主库并还原到从库
- 在主库上查看 master 状态 - 1 
 2
 3
 4
 5
 6
 7
 8- MySQL [(none)]> show master status \G 
 *************************** 1. row ***************************
 File: mysql-bin.000761
 Position: 805513039
 Binlog_Do_DB:
 Binlog_Ignore_DB:
 Executed_Gtid_Set:
 1 row in set (0.00 sec)- 记住当前状态的 binlog 文件以及 Position 值; 
- 完全备份数据库 - 1 - mysqldump -uroot -p --routines --single_transaction --source-data=2 --all-databases > all_1210.sql - 注意, MySQL 8.0.26 版本以后 mysqldump 参数 - --master-data已弃用,建议使用- --source-data,否则生成的备份文件会提示无法使用,需要删除文件第一行内容,如下:- 1 - WARNING: --master-data is deprecated and will be removed in a future version. Use --source-data instead. 
- 恢复备份到从库 - 1 - mysql -uroot -p < all_1210.sql 
重建主从
- 在从库执行以下命令加入主从 - 1 
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11- -- 停止 slave 服务 
 stop slave;
 -- 重置 slave 状态
 reset slave;
 -- 配置主从
 change master to master_host='10.1.40.13',master_user='repl',master_port=3306,master_password='********',master_log_file='mysql-bin.00000761',master_log_pos=805513039;
 -- 启动 slave
 start slave
- 查看 slave 状态 - 1 
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62- show slave status \G 
 *************************** 1. row ***************************
 Slave_IO_State: Waiting for source to send event
 Master_Host: 10.1.40.13
 Master_User: repl
 Master_Port: 3306
 Connect_Retry: 60
 Master_Log_File: mysql-bin.000761
 Read_Master_Log_Pos: 767635683
 Relay_Log_File: base-noroot1-relay-bin.000003
 Relay_Log_Pos: 11834768
 Relay_Master_Log_File: mysql-bin.000761
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes
 Replicate_Do_DB:
 Replicate_Ignore_DB:
 Replicate_Do_Table:
 Replicate_Ignore_Table:
 Replicate_Wild_Do_Table:
 Replicate_Wild_Ignore_Table: %.%xxl_job_log
 Last_Errno: 0
 Last_Error:
 Skip_Counter: 0
 Exec_Master_Log_Pos: 11834553
 Relay_Log_Space: 767639149
 Until_Condition: None
 Until_Log_File:
 Until_Log_Pos: 0
 Master_SSL_Allowed: No
 Master_SSL_CA_File:
 Master_SSL_CA_Path:
 Master_SSL_Cert:
 Master_SSL_Cipher:
 Master_SSL_Key:
 Seconds_Behind_Master: 57739
 Master_SSL_Verify_Server_Cert: No
 Last_IO_Errno: 0
 Last_IO_Error:
 Last_SQL_Errno: 0
 Last_SQL_Error:
 Replicate_Ignore_Server_Ids:
 Master_Server_Id: 1
 Master_UUID: c01abe46-b588-11ec-9bde-005056960420
 Master_Info_File: mysql.slave_master_info
 SQL_Delay: 0
 SQL_Remaining_Delay: NULL
 Slave_SQL_Running_State: Waiting for dependent transaction to commit
 Master_Retry_Count: 86400
 Master_Bind:
 Last_IO_Error_Timestamp:
 Last_SQL_Error_Timestamp:
 Master_SSL_Crl:
 Master_SSL_Crlpath:
 Retrieved_Gtid_Set:
 Executed_Gtid_Set:
 Auto_Position: 0
 Replicate_Rewrite_DB:
 Channel_Name:
 Master_TLS_Version:
 Master_public_key_path:
 Get_master_public_key: 0
 Network_Namespace:
异常处理
根据 Last_Error 中提示的 master log 和 end_log_pos 的位置查找这条从库上缺失的数据
| 1 | Last_Errno: 1062 | 
如上信息所示:得到
master log文件为mysql-bin.000761,end_log_pos为1501634
- 1062错误的主要原因是主键冲突,出现这种情况就是从库出现插入操作,主库又插入相同的数据,iothread 没问题,sqlthread 出错
 处理此种错误一般有两种思路:- 直接跳过错误执行语句
- 找到错误执行语句,修复主库2数据
 - 第一种解决方案会有造成主从不一致的隐患(delete语句可以跳过),第二种是从根本上解决问题比较推荐 - 1 
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20- mysqlbinlog --no-defaults -v --base64-output=decode-rows /data/mysql/mysql-bin.000761 --stop-position=1501634 |tail -20 
 ####### 以上命令会输出如下信息:#######
 #231210 21:15:15 server id 1 end_log_pos 1501534 CRC32 0x7d06e91f Table_map: `pet_xxl_job_dev`.`xxl_job_registry` mapped to number 130
 # at 1501534
 #231210 21:15:15 server id 1 end_log_pos 1501634 CRC32 0x36ba3dc5 Write_rows: table id 130 flags: STMT_END_F
 ### INSERT INTO `pet_xxl_job_dev`.`xxl_job_registry`
 ### SET
 ### @1=3217
 ### @2='EXECUTOR'
 ### @3='pet-search-dev'
 ### @4='http://192.168.244.161:8080/'
 ### @5='2023-12-10 21:10:30'
 ROLLBACK /* added by mysqlbinlog */ /*!*/;
 SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
 DELIMITER ;
 # End of log file
 /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;- 手动转变为 delete 语句,然后在从库执行语句, - 1 - delete from `pet_xxl_job_dev`.`xxl_job_registry` where id = '3217'; 
- 1032错误的主要原因是主库更新或者是删除的记录在从库上不存在引起的。
 处理此种错误一般有两种思路:- 直接跳过错误执行语句
- 找到错误执行语句,修复从库数据
 - 第一种解决方案会有造成主从不一致的隐患(delete语句可以跳过),第二种是从根本上解决问题比较推荐 - 1 
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38- mysqlbinlog --no-defaults -v --base64-output=decode-rows --stop-position=6919 /data/mysql/mysql-bin.000761 |tail -40 
 ####### 以上命令会输出如下信息:#######
 #231210 21:13:25 server id 1 end_log_pos 6693 CRC32 0x4e891fb2 Table_map: `pet_easyretry_sit`.`server_node` mapped to number 91
 # at 6693
 #231210 21:13:25 server id 1 end_log_pos 6919 CRC32 0x6901bbcf Update_rows: table id 91 flags: STMT_END_F
 ### UPDATE `pet_easyretry_sit`.`server_node`
 ### WHERE
 ### @1=758102
 ### @2='erp_jushuitan_biz_sit'
 ### @3='1733780260060938240'
 ### @4='192.168.127.254'
 ### @5='/'
 ### @6=6026
 ### @7='2023-12-10 21:08:07'
 ### @8=1
 ### @9=NULL
 ### @10='2023-12-10 20:44:36'
 ### @11='2023-12-10 21:09:57'
 ### SET
 ### @1=758102
 ### @2='erp_jushuitan_biz_sit'
 ### @3='1733780260060938240'
 ### @4='192.168.127.254'
 ### @5='/'
 ### @6=6026
 ### @7='2023-12-10 21:09:07'
 ### @8=1
 ### @9=NULL
 ### @10='2023-12-10 20:44:36'
 ### @11='2023-12-10 21:13:25'
 ROLLBACK /* added by mysqlbinlog */ /*!*/;
 SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
 DELIMITER ;
 # End of log file
 /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;- 手动将 update 语句转换为 insert 语句,然后在从库执行语句 - 1 - insert into `pet_easyretry_sit`.`server_node` values(758102,'erp_jushuitan_biz_sit','1733780260060938240','192.168.127.254','/',6026,'2023-12-10 21:08:07',1,NULL,'2023-12-10 20:44:36','2023-12-10 21:09:57'); 
 
		