1. set: set a flag
# ceph osd set <flag_name>
# ceph osd set noout
# ceph osd set nodown
# ceph osd set norecover
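To confirm which flags are currently in effect, they can be read back from the OSD map; for example:
# ceph osd dump | grep flags
# ceph -s | grep flags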
2. unset: clear a flag
# ceph osd unset <flag_name>
# ceph osd unset noout
# ceph osd unset nodown
# ceph osd unset norecover
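A typical use of this set/unset pair is a short maintenance window: set noout so that stopped OSDs are not marked out (which would trigger data migration), do the work, then clear the flag. A minimal sketch, assuming a systemd deployment and OSD id 1 (both assumed for illustration):
# ceph osd set noout
# systemctl stop ceph-osd@1
... perform the maintenance ...
# systemctl start ceph-osd@1
# ceph osd unset noout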
3. Flags
noout: With this flag set, the Ceph cluster will not mark any OSD as out (outside the cluster), regardless of its actual state. All OSDs are kept in the Ceph cluster.
nodown: With this flag set, the Ceph cluster will not mark any OSD as down (daemon stopped), regardless of its actual state. Every OSD in the cluster stays UP (daemon running) rather than going DOWN.
noup: With this flag set, the Ceph cluster will not mark any OSD as up; an OSD daemon started while the flag is in place remains DOWN in the OSD map until the flag is cleared.
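The other flags used above follow the same pattern: norecover suspends recovery I/O, and nodeep-scrub (which shows up in the cluster status in section 4 below) suspends deep scrubbing cluster-wide until it is cleared, e.g.:
# ceph osd set nodeep-scrub
# ceph osd unset nodeep-scrub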
4. "Too many repaired reads on 1 OSDs" 告警处理
# ceph -s
  cluster:
    id:     dfcdf8de-f388-4c84-adc2-ee721da8df84
    health: HEALTH_WARN
            nodeep-scrub flag(s) set
            Too many repaired reads on 1 OSDs
            3 pgs not deep-scrubbed in time

  services:
    mon: 1 daemons, quorum server (age 9w)
    mgr: server(active, since 9w)
    osd: 4 osds: 3 up (since 9w), 3 in (since 4M)
         flags nodeep-scrub

  data:
    pools:   3 pools, 320 pgs
    objects: 155.80k objects, 605 GiB
    usage:   1.2 TiB used, 3.4 TiB / 4.6 TiB avail
    pgs:     320 active+clean

  io:
    client: 0 B/s rd, 19 KiB/s wr, 1 op/s rd, 1 op/s wr
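The status also shows that one of the four OSDs is down and out (3 up, 3 in); ceph osd tree prints the up/down and in/out state of every OSD and identifies the affected daemon:
# ceph osd tree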
# ceph health detail
HEALTH_WARN nodeep-scrub flag(s) set; Too many repaired reads on 1 OSDs; 3 pgs not deep-scrubbed in time
OSDMAP_FLAGS nodeep-scrub flag(s) set
OSD_TOO_MANY_REPAIRS Too many repaired reads on 1 OSDs
    osd.3 had 13 reads repaired
PG_NOT_DEEP_SCRUBBED 3 pgs not deep-scrubbed in time
    pg 4.48 not deep-scrubbed since 2025-02-05 07:05:25.334392
    pg 5.44 not deep-scrubbed since 2025-02-05 07:33:33.616573
    pg 6.29 not deep-scrubbed since 2025-02-05 01:01:12.492269
# ceph tell osd.3 clear_shards_repaired
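clear_shards_repaired resets the repaired-reads counter on osd.3, which clears OSD_TOO_MANY_REPAIRS (the check fires once the count exceeds mon_osd_warn_num_repaired, default 10; osd.3 is at 13). Repaired reads usually mean the backing disk returned bad data that had to be reconstructed from replicas, so the counter is a symptom rather than the root cause. A possible follow-up, assuming osd.3's disk is /dev/sdX (a placeholder, to be replaced with the real device):
# ceph health detail
# smartctl -a /dev/sdX
# ceph osd unset nodeep-scrub
# ceph pg deep-scrub 4.48
Re-running ceph health detail confirms the repair warning is gone; smartctl checks the disk for media errors; clearing nodeep-scrub and deep-scrubbing the listed pgs (4.48, 5.44, 6.29) addresses the remaining two warnings.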
[to be continued]