三黍 operations commands
-
ceph tell osd.* injectargs '--osd_recovery_max_active 10'
ceph tell osd.* injectargs '--osd_max_backfills 4'
ceph tell osd.* injectargs '--osd_recovery_op_priority 15'
How do I check the default values?
-
Show the live (running) parameter values on osd.0
ceph config show osd.0 osd_recovery_max_active
ceph config show osd.0 osd_max_backfills
ceph config show osd.0 osd_recovery_op_priority
-
ceph config set osd osd_max_backfills 4
-
high-priority.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high
value: 10000
globalDefault: false
description: "High-priority tasks for core workloads"
preemptionPolicy: PreemptLowerPriority

With `name` left unindented (at the top level rather than under `metadata`), kubectl rejects the file:
error: error validating "priority.yaml": error validating data: ValidationError(PriorityClass): unknown field "name" in io.k8s.api.scheduling.v1.PriorityClass; if you choose to ignore these errors, turn validation off with --validate=false
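The `unknown field "name"` error above is what kubectl reports when `name` is parsed as a top-level key instead of a child of `metadata`. A minimal offline check could look like the sketch below. It assumes a single-document manifest indented with spaces, the function name `has_metadata_name` is hypothetical, and it is a line-based heuristic, not a YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an indented `name:` key appears
# inside the top-level `metadata:` block of the given manifest file.
has_metadata_name() {
  awk '
    /^metadata:/           { in_meta = 1; next }  # entering the metadata block
    in_meta && /^[^ #]/    { in_meta = 0 }        # a new top-level key ends the block
    in_meta && /^ +name:/  { found = 1 }          # indented name: belongs to metadata
    END { exit !found }
  ' "$1"
}

# Usage (assumes the file from the notes is in the current directory):
#   has_metadata_name high-priority.yaml && kubectl apply -f high-priority.yaml
```

A check like this only catches the indentation mistake; kubectl's own validation remains the authority on the schema.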
-
-
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active
ceph config get osd osd_recovery_op_priority
-
Defaults:
bash-4.4$ ceph config get osd osd_max_backfills
1
bash-4.4$
bash-4.4$ ceph config get osd osd_recovery_max_active
0
bash-4.4$ ceph config get osd osd_recovery_op_priority
3
-
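Raise the number of concurrent backfill and recovery operations each OSD may run. The defaults queried above are 1 for osd_max_backfills and 0 for osd_recovery_max_active; 0 means the device-specific osd_recovery_max_active_hdd or osd_recovery_max_active_ssd value is used.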
ceph config set osd osd_max_backfills 16
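ceph config set osd osd_recovery_max_active 16
Raise the priority of recovery operations. A higher value means higher priority relative to client I/O (osd_client_op_priority), and the default queried above is already 3, so the next command only restates the default.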
ceph config set osd osd_recovery_op_priority 3
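Values written with `ceph config set` are stored in the monitors' config database, so they stay in effect after backfill has finished. The sketch below drops the three overrides again so the defaults apply. The `CEPH` variable and the function name are hypothetical, and by default it is a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: CEPH defaults to "echo ceph" (dry run, prints only).
# Export CEPH=ceph to actually run the commands.
CEPH="${CEPH:-echo ceph}"

revert_recovery_tuning() {
  local opt
  for opt in osd_max_backfills osd_recovery_max_active osd_recovery_op_priority; do
    # `ceph config rm <who> <option>` removes the stored override for all OSDs
    $CEPH config rm osd "$opt"
  done
}

# Usage: review the printed commands first, then run for real:
#   revert_recovery_tuning
#   CEPH=ceph revert_recovery_tuning
```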
-
2026-05-18 01:13:11.562102 I | clusterdisruption-controller: all "host" failure domains: [node1 node2 node3 node5 node6 node7 node8]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+clean Count:1972} {StateName:active+remapped+backfilling Count:104} {StateName:active+clean+scrubbing+deep Count:21}]"
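The `pg health` field in this operator log line lists PG states with their counts. The rough sketch below pulls those pairs out of a saved log line and sums the PGs whose state does not start with `active+clean`. The function names are hypothetical, and the pattern assumes the `{StateName:... Count:...}` layout shown above.

```shell
#!/usr/bin/env bash
# Print one "state count" pair per line from operator log text on stdin.
pg_state_counts() {
  grep -o 'StateName:[^ ]* Count:[0-9]*' \
    | sed -e 's/StateName://' -e 's/ Count:/ /'
}

# Sum the PGs whose state does not begin with "active+clean"
# (scrubbing variants of active+clean are treated as clean here).
pgs_not_clean() {
  pg_state_counts \
    | awk 'index($1, "active+clean") != 1 { n += $2 } END { print n + 0 }'
}

# Usage (operator.log is a placeholder file name):
#   grep 'pg health' operator.log | tail -n 1 | pgs_not_clean
```

For the line above this reports the 104 PGs in `active+remapped+backfilling`.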
-
csi-cephfsplugin-2hwcz csi-cephfsplugin-2zjmb csi-cephfsplugin-djtn9 csi-cephfsplugin-fpr72 csi-cephfsplugin-kltj4 csi-cephfsplugin-lz5gv csi-cephfsplugin-provisioner-7769f7b7fb-pk44w csi-cephfsplugin-provisioner-7769f7b7fb-zg99t csi-cephfsplugin-ptq9l csi-rbdplugin-8cb99 csi-rbdplugin-bl9vv csi-rbdplugin-dlxlg csi-rbdplugin-fc6r7 csi-rbdplugin-h4jl4 csi-rbdplugin-provisioner-6585465959-k9hr6 csi-rbdplugin-provisioner-6585465959-nwzxf csi-rbdplugin-v2sb8 csi-rbdplugin-w9skr
-
kubectl -n rook-ceph get cephcluster -o yaml | grep -A 5 -B 2 network
-
kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o jsonpath='{.items[0].spec.hostNetwork}'
kubectl -n rook-ceph get pod -o wide -l app=rook-ceph-mon
kubectl -n rook-ceph logs -l app=rook-ceph-operator --tail=200 | grep -Ei "network|error|failed"
kubectl -n rook-ceph get events --sort-by='.metadata.creationTimestamp' | grep -i network
-
kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.spec.network}'
kubectl -n rook-ceph describe cephcluster rook-ceph | grep -A 10 -i "Events:"
kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.conditions}'
kubectl -n rook-ceph edit cephcluster rook-ceph
-
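network:
  provider: host
  # Note: in the Rook docs, selectors take NetworkAttachmentDefinition names for the multus provider;
  # for CIDR ranges with provider: host, check whether your Rook version supports network.addressRanges instead.
  selectors:
    public: "192.168.x.0/24" # replace with the actual IP range of the physical internal network of node1-node8
    cluster: "192.168.x.0/24" # single NIC: use the same range; dual NIC: use the dedicated heartbeat (cluster) range
-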
kubectl -n rook-ceph edit deployment rook-ceph-mon-bu
-
kubectl -n rook-ceph get deployment rook-ceph-mon-bu -o yaml | grep -E "hostNetwork|dnsPolicy"
-
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
-
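To check every mon pod instead of only `.items[0]` or a single deployment, the sketch below may help. The filter reads tab-separated `name<TAB>hostNetwork` lines and prints the pods whose value is not `true`. The function name is hypothetical, and the kubectl line in the usage comment has not been run against this cluster.

```shell
#!/usr/bin/env bash
# Read "podName<TAB>hostNetwork" lines on stdin and print the pods that are
# not on the host network (an unset hostNetwork field arrives as empty).
mons_without_hostnetwork() {
  awk -F '\t' '$2 != "true" { print $1 }'
}

# Usage:
#   kubectl -n rook-ceph get pod -l app=rook-ceph-mon \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.hostNetwork}{"\n"}{end}' \
#     | mons_without_hostnetwork
```

Empty output means every listed mon pod reports `hostNetwork: true`.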
kubectl -n rook-ceph get pod | grep osd-prepare | grep node1
kubectl -n rook-ceph get job | grep osd-prepare | grep node1
kubectl -n rook-ceph delete job -l app=rook-ceph-osd-prepare
watch "kubectl -n rook-ceph get pod -o wide | grep osd-prepare"
-
kubectl -n rook-ceph delete job rook-ceph-osd-prepare-node1
-
2026-05-21 13:28:38.632651 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-host-node1" with maxUnavailable=0 for "host" failure domain "node1"
2026-05-21 13:28:38.635906 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-host-node2" with maxUnavailable=0 for "host" failure domain "node2"
2026-05-21 13:28:38.638890 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-host-node5" with maxUnavailable=0 for "host" failure domain "node5"
2026-05-21 13:28:38.641451 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-host-node6" with maxUnavailable=0 for "host" failure domain "node6"
2026-05-21 13:28:38.644184 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-host-node7" with maxUnavailable=0 for "host" failure domain "node7"
2026-05-21 13:28:38.646663 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-host-node8" with maxUnavailable=0 for "host" failure domain "node8"
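To compare these deletions with the failure domains named in the earlier `all "host" failure domains` log line, the small sketch below extracts the domain from each PDB-deletion message. The function name is hypothetical, and the pattern assumes the message format shown above.

```shell
#!/usr/bin/env bash
# Print the failure domain named in each "deleting temporary blocking pdb"
# log line read from stdin, one per line.
pdb_deleted_domains() {
  sed -n 's/.*deleting temporary blocking pdb.*failure domain "\([^"]*\)".*/\1/p'
}

# Usage:
#   kubectl -n rook-ceph logs -l app=rook-ceph-operator --tail=200 | pdb_deleted_domains | sort -u
```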