After the server crashed and was rebooted, the Redis AOF file was found to be corrupted.
359013:C 16 Nov 2024 10:17:16.317 * DB saved on disk
359013:C 16 Nov 2024 10:17:16.317 * RDB: 22 MB of memory used by copy-on-write
1693499:M 16 Nov 2024 10:17:16.331 * Background saving terminated with success
107016:C 16 Nov 2024 12:07:34.038 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
107016:C 16 Nov 2024 12:07:34.039 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=107016, just started
107016:C 16 Nov 2024 12:07:34.039 # Configuration loaded
107017:M 16 Nov 2024 12:07:34.068 * Node configuration loaded, I'm f7488d141611784206d1221e713fc4aa86964844
_._
_.-`__ ''-._
_.-` . _. ''-._ Redis 5.0.3 (00000000/0) 64 bit
.-` .-. \/ _.,_ ''-._
( ' , .- | , ) Running in cluster mode
|-._-...- __...-.-._|' _.-'| Port: 7000
| -._ ._ / _.-' | PID: 107017
-._ -._ -./ _.-' _.-'
|-._-._ -.__.-' _.-'_.-'|
| -._-._ _.-'_.-' | http://redis.io
-._ -._-.__.-'_.-' _.-'
|-._-._ -.__.-' _.-'_.-'|
| -._-._ _.-'_.-' |
-._ -._-.__.-'_.-' _.-'
-._ -.__.-' _.-'
-._ _.-'
-.__.-'
107017:M 16 Nov 2024 12:07:34.071 # Server initialized
107017:M 16 Nov 2024 12:07:34.071 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
107017:M 16 Nov 2024 12:07:34.072 * Reading RDB preamble from AOF file...
107017:M 16 Nov 2024 12:07:34.247 * Reading the remaining AOF tail...
107017:M 16 Nov 2024 12:07:34.301 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
107132:C 16 Nov 2024 12:07:42.882 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
107132:C 16 Nov 2024 12:07:42.882 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=107132, just started
107132:C 16 Nov 2024 12:07:42.882 # Configuration loaded
107133:M 16 Nov 2024 12:07:42.884 * Node configuration loaded, I'm 1b0bc4a3b7ca6b985b9496df6ab0f2e1d8306d38
_._
_.-`__ ''-._
_.-` . _. ''-._ Redis 5.0.3 (00000000/0) 64 bit
.-` .-. \/ _.,_ ''-._
( ' , .- | , ) Running in cluster mode
|-._-...- __...-.-._|' _.-'| Port: 7001
| -._ ._ / _.-' | PID: 107133
-._ -._ -./ _.-' _.-'
|-._-._ -.__.-' _.-'_.-'|
| -._-._ _.-'_.-' | http://redis.io
-._ -._-.__.-'_.-' _.-'
|-._-._ -.__.-' _.-'_.-'|
| -._-._ _.-'_.-' |
-._ -._-.__.-'_.-' _.-'
-._ -.__.-' _.-'
-._ _.-'
-.__.-'
107133:M 16 Nov 2024 12:07:42.884 # Server initialized
107133:M 16 Nov 2024 12:07:42.885 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
107133:M 16 Nov 2024 12:07:42.885 * Reading RDB preamble from AOF file...
107133:M 16 Nov 2024 12:07:43.016 * Reading the remaining AOF tail...
107133:M 16 Nov 2024 12:07:43.054 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
117154:C 16 Nov 2024 12:19:58.406 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
117154:C 16 Nov 2024 12:19:58.406 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=117154, just started
117154:C 16 Nov 2024 12:19:58.406 # Configuration loaded
117155:M 16 Nov 2024 12:19:58.407 * Node configuration loaded, I'm f7488d141611784206d1221e713fc4aa86964844
_._
_.-`__ ''-._
_.-` . _. ''-._ Redis 5.0.3 (00000000/0) 64 bit
.-` .-. \/ _.,_ ''-._
( ' , .- | , ) Running in cluster mode
|-._-...- __...-.-._|' _.-'| Port: 7000
| -._ ._ / _.-' | PID: 117155
-._ -._ -./ _.-' _.-'
|-._-._ -.__.-' _.-'_.-'|
| -._-._ _.-'_.-' | http://redis.io
-._ -._-.__.-'_.-' _.-'
|-._-._ -.__.-' _.-'_.-'|
| -._-._ _.-'_.-' |
-._ -._-.__.-'_.-' _.-'
-._ -.__.-' _.-'
-._ _.-'
-.__.-'
117155:M 16 Nov 2024 12:19:58.407 # Server initialized
117155:M 16 Nov 2024 12:19:58.407 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
117155:M 16 Nov 2024 12:19:58.407 * Reading RDB preamble from AOF file...
117155:M 16 Nov 2024 12:19:58.540 * Reading the remaining AOF tail...
117155:M 16 Nov 2024 12:19:58.577 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
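According to the log, there are two issues:
- A corrupted AOF file prevents Redis from starting. On both nodes (ports 7000 and 7001) the RDB preamble loads successfully, and the failure occurs while reading the AOF tail that follows it.
- Transparent Huge Pages (THP) is enabled, which triggers a performance warning but does not by itself make Redis unavailable.
The detailed repair steps follow.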
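Step 1: Repair the corrupted AOF file
1.1 Back up the current AOF file
Always back up the AOF file before modifying it, so that a failed repair cannot cause further data loss:
cp /path/to/appendonly.aof /path/to/appendonly.aof.bak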
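The backup step can be made a little safer by timestamping the copy and checking it byte for byte. The following is a minimal sketch, assuming a POSIX shell with cp and cmp available; the function name backup_aof and the timestamp suffix are illustrative choices, not part of Redis.

```shell
# backup_aof FILE: copy FILE to FILE.<timestamp>.bak and verify the copy.
# Prints the backup path on success; returns non-zero on any failure.
backup_aof() (
    src=$1
    [ -f "$src" ] || { echo "no such file: $src" >&2; exit 1; }
    dst="$src.$(date +%Y%m%d%H%M%S).bak"
    # -p preserves mode and timestamps of the original file.
    cp -p "$src" "$dst" || exit 1
    # Byte-for-byte comparison; cmp -s is silent and only sets the exit status.
    cmp -s "$src" "$dst" || { echo "backup differs from source" >&2; exit 1; }
    echo "$dst"
)
```

Called as backup_aof /path/to/appendonly.aof, it prints the name of the copy, which can be kept until the repaired instance has been verified.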
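1.2 Repair the AOF file with redis-check-aof
The log explicitly reports that the AOF file is damaged and recommends the redis-check-aof tool. Run it with --fix:
redis-check-aof --fix /path/to/appendonly.aof
The tool truncates the file at the first invalid command and asks for confirmation before doing so, so any writes recorded after that point are lost.
- After the repair completes, restart Redis:
redis-server /path/to/redis.conf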
1.3 Verify the repair
- Connect to the Redis instance and check that it responds:
redis-cli -c -p <port> ping
If the reply is PONG, Redis has started normally.
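Right after a restart the instance may still be loading the AOF, in which case PING answers with a LOADING error instead of PONG. A small polling loop avoids misreading that as a failed repair. This is a sketch only: wait_for_pong and the REDIS_CLI override variable are hypothetical names introduced here for illustration.

```shell
# wait_for_pong PORT [TRIES]: poll the instance until PING answers PONG.
# REDIS_CLI is a hypothetical override; it defaults to redis-cli on PATH.
wait_for_pong() (
    port=$1
    tries=${2:-10}
    cli=${REDIS_CLI:-redis-cli}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Any reply other than PONG (LOADING error, connection refused)
        # counts as "not ready yet".
        reply=$("$cli" -p "$port" ping 2>/dev/null)
        [ "$reply" = "PONG" ] && exit 0
        i=$((i + 1))
        sleep 1
    done
    exit 1
)
```

For example, wait_for_pong 7000 30 waits up to roughly 30 seconds for the node on port 7000 and returns non-zero if it never answers.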
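1.4 If the repair fails
If redis-check-aof cannot repair the AOF file, consider the following options:
- Set the damaged AOF file aside and recover from the RDB snapshot:
mv /path/to/appendonly.aof /path/to/appendonly.aof.damaged
redis-server /path/to/redis.conf --appendonly no
redis-cli -p <port> config set appendonly yes
While appendonly is enabled, Redis 5.0 loads only the AOF at startup and ignores dump.rdb, so simply removing the AOF would start the node with an empty dataset. Starting once with appendonly disabled makes Redis load the RDB file; enabling AOF again at runtime then generates a new AOF file from the data in memory.
- Resynchronize from a healthy node:
If the node is a replica, attach it to its master again so that it resynchronizes its data:
redis-cli -c -p <replica port> cluster replicate <master node ID>
- Reshard in cluster mode:
If the node is a master and cannot be repaired, migrate its slots to other nodes to keep the cluster healthy:
redis-cli --cluster reshard <healthy node IP>:<port>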
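Which of the last two options applies depends on whether the damaged node is a master or a replica, and that can be read from the cluster nodes output of any healthy node, where the third field holds the comma-separated flags. A minimal sketch follows; node_role is a hypothetical helper name, and the node ID would be the one printed in the "Node configuration loaded" log line.

```shell
# node_role NODE_ID: read `cluster nodes` output on stdin and print
# "master" or "slave" for the given node ID (prints nothing if not found).
node_role() {
    awk -v id="$1" '$1 == id {
        # Field 3 is the flag list, e.g. "myself,master" or "slave,fail".
        n = split($3, flags, ",")
        for (i = 1; i <= n; i++)
            if (flags[i] == "master" || flags[i] == "slave") print flags[i]
    }'
}
```

Typical use would be: redis-cli -p <healthy node port> cluster nodes | node_role <node ID>.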
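Step 2: Disable Transparent Huge Pages (THP)
As the startup warning notes, THP causes latency and memory usage issues with Redis, so it should be disabled.
2.1 Disable temporarily
Run the following commands as root. They take effect immediately and system-wide, but the setting is lost on reboot:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
Redis must be restarted after THP is disabled.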
2.2 Disable permanently
Edit /etc/rc.local and add the following lines so that THP is disabled at every boot:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
Make sure /etc/rc.local is executable:
chmod +x /etc/rc.local
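If this step is scripted rather than done by hand, it helps to make it idempotent so that repeated runs do not pile up duplicate lines in /etc/rc.local. The sketch below assumes grep with the -x and -F options; ensure_line is a hypothetical helper name.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line
# is already present (the file is created if it does not exist).
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

It would be called once per line, for example ensure_line /etc/rc.local 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'. Some distributions ship an rc.local that ends with exit 0; lines appended after it never run, so in that case they have to be placed above it by hand.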
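2.3 Verify that THP is disabled
Run the following commands to confirm:
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
The active value is the one shown in brackets. For enabled, the output should be:
always madvise [never]
The defrag file may list additional options depending on the kernel version, but never should likewise be the bracketed value.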
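For a scripted check, the bracketed value can be extracted instead of being compared by eye. A minimal sketch, assuming sed is available; thp_active is a hypothetical helper name.

```shell
# thp_active FILE: print the active THP setting, i.e. the word that the
# kernel shows in [brackets] in FILE.
thp_active() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}
```

A check such as [ "$(thp_active /sys/kernel/mm/transparent_hugepage/enabled)" = never ] then succeeds only when THP is really off, regardless of how many other options the file lists.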
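Step 3: Monitor and tune the cluster
3.1 Check the cluster state
On another healthy node, check the current cluster state and the link status of every node:
redis-cli -c -p <healthy node port> cluster nodes
redis-cli -c -p <healthy node port> cluster info
- Make sure cluster_state is ok.
- Confirm that no node is flagged fail or fail?.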
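Both checks can be automated by filtering the two command outputs. The following sketch assumes the usual text formats (cluster info as key:value lines terminated by CRLF, cluster nodes with the flags in the third field); cluster_state_ok and failing_nodes are hypothetical helper names.

```shell
# cluster_state_ok: read `cluster info` output on stdin; succeed only if it
# contains the line cluster_state:ok. Carriage returns are stripped first.
cluster_state_ok() {
    tr -d '\r' | grep -qx 'cluster_state:ok'
}

# failing_nodes: read `cluster nodes` output on stdin and print the ID and
# address of every node whose flags include fail or fail?.
failing_nodes() {
    awk '{
        n = split($3, flags, ",")
        for (i = 1; i <= n; i++)
            if (flags[i] == "fail" || flags[i] == "fail?") { print $1, $2; break }
    }'
}
```

For example, redis-cli -p <healthy node port> cluster info | cluster_state_ok can gate a monitoring alert, and an empty result from failing_nodes means no node is currently marked as failing.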
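3.2 Configure automatic AOF rewriting
Enable automatic AOF rewriting in the configuration file. Rewriting does not prevent corruption caused by a crash, but it keeps the file compact, which shortens loading and repair:
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
With these values (the Redis defaults), a rewrite starts once the file is larger than 64 MB and has grown by at least 100% since the last rewrite.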
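How the two settings interact is easiest to see as arithmetic: the growth percentage is measured against the AOF size after the last rewrite, and the minimum size acts as a floor. The sketch below mirrors that rule as I understand it and is not Redis source code; should_rewrite is a hypothetical name and all sizes are in bytes.

```shell
# should_rewrite CURRENT BASE PERCENT MIN_SIZE
# Succeeds when the file is larger than MIN_SIZE and has grown by at least
# PERCENT percent relative to BASE (its size after the last rewrite).
should_rewrite() (
    cur=$1 base=$2 pct=$3 min=$4
    [ "$pct" -gt 0 ] || exit 1      # a percentage of 0 disables auto rewrite
    [ "$cur" -gt "$min" ] || exit 1 # too small to be worth rewriting
    [ "$base" -gt 0 ] || base=1     # no previous rewrite recorded
    growth=$(( cur * 100 / base - 100 ))
    [ "$growth" -ge "$pct" ]
)
```

With the values above (100 and 64mb, i.e. 67108864 bytes), a file that was 70 MB after the last rewrite triggers a new one at 140 MB, while a 60 MB file never does, however fast it has grown.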
3.3 Back up regularly
Back up the RDB and AOF files on a regular schedule so that data can be restored quickly when a problem occurs.
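A scheduled backup can be as simple as copying both persistence files into a dated directory. This sketch assumes the default file names dump.rdb and appendonly.aof (adjust them if dbfilename or appendfilename is set differently); snapshot_redis_files and the directory layout are hypothetical.

```shell
# snapshot_redis_files DATA_DIR BACKUP_ROOT: copy the persistence files into
# BACKUP_ROOT/<timestamp>/ and print that directory.
snapshot_redis_files() (
    data=$1
    dest="$2/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || exit 1
    for f in dump.rdb appendonly.aof; do
        # Skip a file that does not exist (e.g. RDB snapshots disabled).
        [ -f "$data/$f" ] && { cp -p "$data/$f" "$dest/" || exit 1; }
    done
    echo "$dest"
)
```

Run from cron, for example as snapshot_redis_files /path/to/data /path/to/backups, it leaves one directory per run; how long those directories are retained is a separate policy decision.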
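Step 4: Summary
After completing the steps above, the Redis nodes should start and run normally. If the problem persists, provide the following information for further analysis:
- The log output after the repair.
- The current cluster nodes and cluster info output.
- Any other error messages.
These steps restore the Redis nodes to a healthy state and reduce the risk of the problem recurring.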