Table of Contents
- 0. Background
- 1. Write the docker-compose.yml file
- 2. Sentinel configuration file sentinel.conf
- 3. Start the containers
- 4. Simulate a failover
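0. Background
Redis can be deployed in several modes, including Standalone, Cluster, and Sentinel. This post shows a simple way to set up Sentinel mode with Docker Compose: a Redis environment with one master, two replicas (slaves), and two sentinels.
1. Write the docker-compose.yml file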
version: '3.8'
services:
  redis-master:
    container_name: redis-master
    image: redis:6.0
    networks:
      redis-yy:
        ipv4_address: 172.28.5.2
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes
  redis-slave-1:
    container_name: redis-slave-1
    image: redis:6.0
    networks:
      redis-yy:
        ipv4_address: 172.28.5.3
    ports:
      - "6380:6379"
    command: redis-server --slaveof redis-master 6379 --appendonly yes
    depends_on:
      - redis-master
  redis-slave-2:
    container_name: redis-slave-2
    image: redis:6.0
    networks:
      redis-yy:
        ipv4_address: 172.28.5.4
    ports:
      - "6381:6379"
    command: redis-server --slaveof redis-master 6379 --appendonly yes
    depends_on:
      - redis-master
  redis-sentinel-1:
    container_name: redis-sentinel-1
    image: redis:6.0
    networks:
      redis-yy:
        ipv4_address: 172.28.5.5
    ports:
      - "26379:26379"
    command: redis-sentinel /etc/redis/sentinel.conf
    volumes:
      - $PWD/sentinel.conf:/etc/redis/sentinel.conf
    depends_on:
      - redis-master
      - redis-slave-1
      - redis-slave-2
  redis-sentinel-2:
    container_name: redis-sentinel-2
    image: redis:6.0
    networks:
      redis-yy:
        ipv4_address: 172.28.5.6
    ports:
      - "26380:26379"
    command: redis-sentinel /etc/redis/sentinel.conf
    volumes:
      - $PWD/sentinel.conf:/etc/redis/sentinel.conf
    depends_on:
      - redis-master
      - redis-slave-1
      - redis-slave-2
networks:
  redis-yy:
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/16
          ip_range: 172.28.5.0/24
          gateway: 172.28.5.254
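A brief walkthrough of the file above:
- Each service simply names its container and pins the version of the redis image.
- command is the instruction executed when the container starts:
  - The master node just starts with AOF persistence enabled (--appendonly yes).
  - Each replica (slave) node must state which node is its master (--slaveof redis-master 6379).
  - Each sentinel node is started with a configuration file; the actual settings live in that file.
- No passwords are set on any of the nodes here.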
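Because every service is given a fixed ipv4_address, it can help to sanity-check the addresses against the network definition before starting anything. The following is a minimal sketch using only Python's standard ipaddress module; the helper name check_addresses is hypothetical, and the values are copied by hand from the compose file rather than read from it.

```python
import ipaddress

# Values copied by hand from the docker-compose.yml in this post.
SUBNET = ipaddress.ip_network("172.28.0.0/16")
IP_RANGE = ipaddress.ip_network("172.28.5.0/24")
GATEWAY = ipaddress.ip_address("172.28.5.254")

STATIC_IPS = {
    "redis-master": "172.28.5.2",
    "redis-slave-1": "172.28.5.3",
    "redis-slave-2": "172.28.5.4",
    "redis-sentinel-1": "172.28.5.5",
    "redis-sentinel-2": "172.28.5.6",
}


def check_addresses(static_ips, subnet, ip_range, gateway):
    """Return a list of problems found; an empty list means the plan is consistent."""
    problems = []
    if not ip_range.subnet_of(subnet):
        problems.append(f"ip_range {ip_range} is not inside subnet {subnet}")
    seen = {}
    for name, text in static_ips.items():
        addr = ipaddress.ip_address(text)
        if addr not in subnet:
            problems.append(f"{name}: {addr} is outside {subnet}")
        if addr == gateway:
            problems.append(f"{name}: {addr} collides with the gateway")
        if addr in seen:
            problems.append(f"{name}: {addr} is already used by {seen[addr]}")
        seen[addr] = name
    return problems


problems = check_addresses(STATIC_IPS, SUBNET, IP_RANGE, GATEWAY)
print(problems if problems else "static addresses look consistent")
```

This only checks the arithmetic of the address plan; it does not talk to Docker and cannot detect a subnet that is already in use on the host.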
2. Sentinel configuration file sentinel.conf
sentinel monitor redis-yy 172.28.5.2 6379 2
sentinel down-after-milliseconds redis-yy 30000
sentinel parallel-syncs redis-yy 1
sentinel failover-timeout redis-yy 180000
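sentinel monitor redis-yy 172.28.5.2 6379 2
This directive tells the sentinel to monitor a Redis master named redis-yy at IP 172.28.5.2, port 6379. The trailing 2 is the quorum, the condition for moving from subjectively down to objectively down: once two sentinels consider the current master down, it is treated as objectively down.
sentinel down-after-milliseconds redis-yy 30000
The master must be unreachable for more than 30 s (30000 ms) before a sentinel considers it subjectively down.
sentinel failover-timeout redis-yy 180000
The failover (election) timeout, here 180 s (180000 ms).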
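To make the role of the quorum concrete, here is a small sketch that parses the four directives from sentinel.conf and evaluates the two thresholds involved. The function names are hypothetical, and the failover rule is written from my understanding of Sentinel's behaviour (the elected leader needs votes from a majority of all known sentinels and at least the quorum), so treat it as an illustration rather than a description of the implementation.

```python
CONF = """\
sentinel monitor redis-yy 172.28.5.2 6379 2
sentinel down-after-milliseconds redis-yy 30000
sentinel parallel-syncs redis-yy 1
sentinel failover-timeout redis-yy 180000
"""


def parse_sentinel_conf(text):
    """Parse 'sentinel <directive> <master-name> <args...>' lines into a dict."""
    masters = {}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 4 or parts[0] != "sentinel":
            continue
        directive, name, args = parts[1], parts[2], parts[3:]
        entry = masters.setdefault(name, {})
        if directive == "monitor":
            entry["host"] = args[0]
            entry["port"] = int(args[1])
            entry["quorum"] = int(args[2])
        else:
            entry[directive] = int(args[0])
    return masters


def is_objectively_down(sdown_reports, quorum):
    """ODOWN is reached once at least `quorum` sentinels report the master as down."""
    return sdown_reports >= quorum


def can_start_failover(votes, total_sentinels, quorum):
    """Assumed rule: the leader needs a majority of sentinels and at least the quorum."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


cfg = parse_sentinel_conf(CONF)["redis-yy"]
print(cfg)
print(is_objectively_down(2, cfg["quorum"]))
print(can_start_failover(votes=1, total_sentinels=2, quorum=cfg["quorum"]))
```

With only two sentinels and a quorum of 2, both sentinels have to be up and agree before the master is marked objectively down, so losing one sentinel blocks failover in this setup.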
3. Start the containers
Start the Docker containers with the following command:
docker-compose -f docker-compose.yml up -d
Running 5/5
✔ Container redis-master Started 1.3s
✔ Container redis-slave-2 Started 1.5s
✔ Container redis-slave-1 Started 1.4s
✔ Container redis-sentinel-1 Started 1.6s
✔ Container redis-sentinel-2 Started
Output like the above means the containers have started successfully.
You can connect to the master node and inspect it:
➜ ~ docker exec -it 6473c293bda4 bash
root@6473c293bda4:/data# redis-cli
127.0.0.1:6379> info
# Server
redis_version:6.0.20
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:7cb942700da8c107
redis_mode:standalone
os:Linux 5.15.49-linuxkit-pr x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:12.2.0
process_id:1
run_id:0f2dc11b814f4b3d39edc15554a3b8e7fe182d30
tcp_port:6379
uptime_in_seconds:6775
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:2576091
executable:/data/redis-server
config_file:
io_threads_active:0
# Clients
connected_clients:5
client_recent_max_input_buffer:8
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:2041584
used_memory_human:1.95M
used_memory_rss:9568256
used_memory_rss_human:9.12M
used_memory_peak:2143616
used_memory_peak_human:2.04M
used_memory_peak_perc:95.24%
used_memory_overhead:1995264
used_memory_startup:803160
used_memory_dataset:46320
used_memory_dataset_perc:3.74%
allocator_allocated:2101520
allocator_active:2428928
allocator_resident:5189632
total_system_memory:8345645056
total_system_memory_human:7.77G
used_memory_lua:36864
used_memory_lua_human:36.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.16
allocator_frag_bytes:327408
allocator_rss_ratio:2.14
allocator_rss_bytes:2760704
rss_overhead_ratio:1.84
rss_overhead_bytes:4378624
mem_fragmentation_ratio:4.79
mem_fragmentation_bytes:7569184
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:41024
mem_clients_normal:102496
mem_aof_buffer:8
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1713845348
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:339968
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:93
aof_base_size:93
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
# Stats
total_connections_received:13
total_commands_processed:17403
instantaneous_ops_per_sec:3
total_net_input_bytes:841071
total_net_output_bytes:4686757
instantaneous_input_kbps:0.11
instantaneous_output_kbps:0.01
rejected_connections:0
sync_full:2
sync_partial_ok:6
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:58
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:349
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_reads_processed:17089
total_writes_processed:23546
io_threaded_reads_processed:0
io_threaded_writes_processed:0
# Replication
role:master
connected_slaves:2
slave0:ip=172.28.5.4,port=6379,state=online,offset=480313,lag=0
slave1:ip=172.28.5.3,port=6379,state=online,offset=480313,lag=0
master_replid:c092ef83b572c548e189eeae4c2cb6da8d64e616
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:480313
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:480313
# CPU
used_cpu_sys:7.974892
used_cpu_user:3.743266
used_cpu_sys_children:0.006901
used_cpu_user_children:0.002612
# Modules
# Cluster
cluster_enabled:0
# Keyspace
127.0.0.1:6379>
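The output above shows that the master node is running in standalone mode (redis_mode:standalone), which is expected: in a Sentinel deployment the data nodes are ordinary standalone servers. The Replication section shows that this node is a master and lists the two replicas connected to it, including each replica's IP, port, state, offset, and lag. The same section can be queried on its own: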
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.28.5.4,port=6379,state=online,offset=540257,lag=0
slave1:ip=172.28.5.3,port=6379,state=online,offset=540257,lag=0
master_replid:c092ef83b572c548e189eeae4c2cb6da8d64e616
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:540257
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:540257
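The INFO output is plain "key:value" text, so it is easy to check from a script. The sketch below parses a trimmed copy of the replication section shown above and extracts the replica entries; parse_info and list_replicas are hypothetical helper names, and the text is pasted in rather than fetched from a live server.

```python
INFO_REPLICATION = """\
# Replication
role:master
connected_slaves:2
slave0:ip=172.28.5.4,port=6379,state=online,offset=540257,lag=0
slave1:ip=172.28.5.3,port=6379,state=online,offset=540257,lag=0
master_repl_offset:540257
"""


def parse_info(text):
    """Turn INFO output into a dict, skipping blank lines and '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def list_replicas(info):
    """Return one dict per slaveN entry (ip, port, state, offset, lag)."""
    replicas = []
    for key, value in info.items():
        # Match slave0, slave1, ... but not keys such as slave_expires_tracked_keys.
        if key.startswith("slave") and key[5:].isdigit():
            replicas.append(dict(item.split("=", 1) for item in value.split(",")))
    return replicas


info = parse_info(INFO_REPLICATION)
print(info["role"], info["connected_slaves"])
for replica in list_replicas(info):
    print(replica["ip"], replica["state"], replica["offset"])
```

A replica whose offset equals master_repl_offset and whose state is online has caught up with the master at the moment the command ran.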
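4. Simulate a failover
If the master node goes down at this point, the sentinels first determine that it is objectively down and then elect a new master from among the replica nodes.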
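How the new master is chosen can be sketched roughly as follows. This is a simplified illustration based on my reading of the Sentinel documentation, which describes ordering candidates by replica priority (lower first, 0 meaning never promote), then by replication offset (higher first), then by run ID. It leaves out the step that discards replicas disconnected from the master for too long, and the Replica class, pick_replica, and the run IDs below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    addr: str
    priority: int     # replica-priority; 0 means "never promote this node"
    repl_offset: int  # how much of the master's stream the replica has processed
    run_id: str       # hypothetical values below, used only as a tie-breaker


def pick_replica(replicas):
    """Pick a promotion candidate using the simplified ordering described above."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Lower priority wins, then higher offset, then lexicographically smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("172.28.5.3:6379", priority=100, repl_offset=540257, run_id="aaaa"),
    Replica("172.28.5.4:6379", priority=100, repl_offset=540257, run_id="bbbb"),
]
print(pick_replica(replicas).addr)
```

With equal priorities and offsets the choice falls to the run ID, which is why either replica can end up promoted in a small test setup like this one.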
➜ ~ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0e6b49cc97f2 redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 6379/tcp, 0.0.0.0:26380->26379/tcp redis-sentinel-2
0c3f8f8f4dbd redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 6379/tcp, 0.0.0.0:26379->26379/tcp redis-sentinel-1
951e5d81fde6 redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 0.0.0.0:6381->6379/tcp redis-slave-2
e3be7632c5f9 redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 0.0.0.0:6380->6379/tcp redis-slave-1
6473c293bda4 redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp redis-master
➜ ~ docker stop 6473c293bda4
6473c293bda4
➜ ~ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0e6b49cc97f2 redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 6379/tcp, 0.0.0.0:26380->26379/tcp redis-sentinel-2
0c3f8f8f4dbd redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 6379/tcp, 0.0.0.0:26379->26379/tcp redis-sentinel-1
951e5d81fde6 redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 0.0.0.0:6381->6379/tcp redis-slave-2
e3be7632c5f9 redis:6.0 "docker-entrypoint.s…" 2 hours ago Up 2 hours 0.0.0.0:6380->6379/tcp redis-slave-1
The master node is now down. Looking at the sentinel log at this point shows:
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.416 # +odown master redis-yy 172.28.5.2 6379 #quorum 2/2
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.416 # +new-epoch 1
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.416 # +try-failover master redis-yy 172.28.5.2 6379
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.421 # Could not rename tmp config file (Device or resource busy)
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.421 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.421 # +vote-for-leader ac58eb282da2eedff5843b98043caf42576dc273 1
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.427 # 7e1f67f1ed0cbc2763d8fa9b66913d3972d5c7c6 voted for ac58eb282da2eedff5843b98043caf42576dc273 1
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.493 # +elected-leader master redis-yy 172.28.5.2 6379
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.493 # +failover-state-select-slave master redis-yy 172.28.5.2 6379
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.584 # +selected-slave slave 172.28.5.3:6379 172.28.5.3 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.584 * +failover-state-send-slaveof-noone slave 172.28.5.3:6379 172.28.5.3 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.667 * +failover-state-wait-promotion slave 172.28.5.3:6379 172.28.5.3 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.951 # Could not rename tmp config file (Device or resource busy)
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.951 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.951 # +promoted-slave slave 172.28.5.3:6379 172.28.5.3 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.951 # +failover-state-reconf-slaves master redis-yy 172.28.5.2 6379
2024-04-23 15:40:26 1:X 23 Apr 2024 07:40:26.012 * +slave-reconf-sent slave 172.28.5.4:6379 172.28.5.4 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:26 1:X 23 Apr 2024 07:40:26.514 # -odown master redis-yy 172.28.5.2 6379
2024-04-23 15:40:26 1:X 23 Apr 2024 07:40:26.970 * +slave-reconf-inprog slave 172.28.5.4:6379 172.28.5.4 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:26 1:X 23 Apr 2024 07:40:26.970 * +slave-reconf-done slave 172.28.5.4:6379 172.28.5.4 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:27 1:X 23 Apr 2024 07:40:27.022 # +failover-end master redis-yy 172.28.5.2 6379
2024-04-23 15:40:27 1:X 23 Apr 2024 07:40:27.022 # +switch-master redis-yy 172.28.5.2 6379 172.28.5.3 6379
2024-04-23 15:40:27 1:X 23 Apr 2024 07:40:27.022 * +slave slave 172.28.5.4:6379 172.28.5.4 6379 @ redis-yy 172.28.5.3 6379
2024-04-23 15:40:27 1:X 23 Apr 2024 07:40:27.022 * +slave slave 172.28.5.2:6379 172.28.5.2 6379 @ redis-yy 172.28.5.3 6379
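The sentinel log above can also be scanned programmatically, for example to find out which node was promoted. The sketch below uses a regular expression fitted to the log lines shown in this post (a "#" or "*" marker followed by an event name starting with "+" or "-"); the helper names are hypothetical, and other Redis versions may format their logs differently.

```python
import re

LOG = """\
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.416 # +odown master redis-yy 172.28.5.2 6379 #quorum 2/2
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.421 # Could not rename tmp config file (Device or resource busy)
2024-04-23 15:40:25 1:X 23 Apr 2024 07:40:25.584 # +selected-slave slave 172.28.5.3:6379 172.28.5.3 6379 @ redis-yy 172.28.5.2 6379
2024-04-23 15:40:27 1:X 23 Apr 2024 07:40:27.022 # +failover-end master redis-yy 172.28.5.2 6379
2024-04-23 15:40:27 1:X 23 Apr 2024 07:40:27.022 # +switch-master redis-yy 172.28.5.2 6379 172.28.5.3 6379
"""

# Marker ('#' or '*'), whitespace, then an event name such as +odown or -odown.
EVENT_RE = re.compile(r"[#*]\s+([+-][a-z-]+)\s*(.*)$")


def parse_events(log_text):
    """Return (event_name, argument_list) pairs; non-event lines are skipped."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((match.group(1), match.group(2).split()))
    return events


def find_switch_master(events):
    """Return the old and new master address from the first +switch-master event."""
    for name, args in events:
        if name == "+switch-master" and len(args) == 5:
            master, old_ip, old_port, new_ip, new_port = args
            return {
                "master": master,
                "old": (old_ip, int(old_port)),
                "new": (new_ip, int(new_port)),
            }
    return None


events = parse_events(LOG)
print([name for name, _ in events])
print(find_switch_master(events))
```

The +switch-master event carries the master name followed by the old and new address, which is the piece of information a client or a monitoring script usually cares about.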
The log shows that the master has been switched to the node 172.28.5.3. Log in to that node to confirm:
# redis-cli
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.28.5.4,port=6379,state=online,offset=90354,lag=0
master_replid:3f4312a3678a4e771d6a9a2668728b7202198c05
master_replid2:459d6b396219637670fa0b38dfe7a4ecf4227f89
master_repl_offset:90354
second_repl_offset:24036
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:90354
127.0.0.1:6379>
This confirms that the node's role is now master, and that it currently has only one replica.
Now restart the failed node 172.28.5.2. It rejoins the group and becomes a new replica:
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.28.5.4,port=6379,state=online,offset=110283,lag=0
slave1:ip=172.28.5.2,port=6379,state=online,offset=110283,lag=0
master_replid:3f4312a3678a4e771d6a9a2668728b7202198c05
master_replid2:459d6b396219637670fa0b38dfe7a4ecf4227f89
master_repl_offset:110283
second_repl_offset:24036
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:110283
127.0.0.1:6379>