2025-03-29 14:18:49.220928 I | ceph-cluster-controller: cluster "rook-ceph": version "19.2.1-0 squid" detected for image "192.168.1.100:8000/public/ceph:v19.2.1"
2025-03-29 14:18:49.240952 E | ceph-spec: failed to update cluster condition to {Type:Progressing Status:True Reason:ClusterProgressing Message:Configuring the Ceph cluster LastHeartbeatTime:2025-03-29 14:18:49.235454089 +0000 UTC m=+312.957685966 LastTransitionTime:2025-03-29 14:18:49.235454018 +0000 UTC m=+312.957685899}. failed to update object "rook-ceph/rook-ceph" status: Operation cannot be fulfilled on cephclusters.ceph.rook.io "rook-ceph": the object has been modified; please apply your changes to the latest version and try again
2025-03-29 14:18:49.249175 I | ceph-cluster-controller: created placeholder configmap for ceph overrides "rook-config-override"

2025-03-29 10:14:00.654114 I | op-mgr: deployment for mgr rook-ceph-mgr-a already exists. updating if needed
2025-03-29 10:14:00.660132 I | op-k8sutil: deployment "rook-ceph-mgr-a" did not change, nothing to update
2025-03-29 10:14:00.660179 I | cephclient: getting or creating ceph auth key "mgr.b"
2025-03-29 10:14:00.918206 I | op-mgr: deployment for mgr rook-ceph-mgr-b already exists. updating if needed
2025-03-29 10:14:00.922071 I | op-k8sutil: deployment "rook-ceph-mgr-b" did not change, nothing to update
2025-03-29 10:14:02.012907 E | ceph-cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed to create cluster: failed to start ceph mgr: failed to enable mgr services: failed to enable service monitor: service monitor could not be enabled: failed to retrieve servicemonitor. servicemonitors.monitoring.coreos.com "rook-ceph-mgr" is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot get resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "rook-ceph"
2025-03-29 10:14:02.052359 E | ceph-spec: failed to update cluster condition to {Type:Ready Status:True Reason:ClusterCreated Message:Cluster created successfully LastHeartbeatTime:2025-03-29 10:14:02.048230153 +0000 UTC m=+9812.323876586 LastTransitionTime:2025-03-29 08:40:15 +0000 UTC}. failed to update object "rook-ceph/rook-ceph" status: Operation cannot be fulfilled on cephclusters.ceph.rook.io "rook-ceph": the object has been modified; please apply your changes to the latest version and try again
2025-03-29 10:25:15.419892 E | ceph-nodedaemon-controller: node reconcile failed: failed to enable service monitor: service monitor could not be enabled: failed to retrieve servicemonitor. servicemonitors.monitoring.coreos.com "rook-ceph-exporter" is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot get resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "rook-ceph"
2025-03-29 10:25:16.017269 E | ceph-nodedaemon-controller: node reconcile failed: failed to enable service monitor: service monitor could not be enabled: failed to retrieve servicemonitor. servicemonitors.monitoring.coreos.com "rook-ceph-exporter" is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot get resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "rook-ceph"
2025-03-29 10:25:18.622406 E | ceph-nodedaemon-controller: node reconcile failed: failed to enable service monitor: service monitor could not be enabled: failed to retrieve servicemonitor. servicemonitors.monitoring.coreos.com "rook-ceph-exporter" is forbidden: User "system:serviceaccount:rook-ceph:rook-ceph-system" cannot get resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "rook-ceph"

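In addition, the user may need to check the Rook configuration, in particular whether the spec.monitoring section of the CephCluster CRD enables ServiceMonitor and whether the related Prometheus monitoring configuration is correct. Even if the configuration is correct, insufficient permissions will still cause the problem. Also consider...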
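The permission side of this can be checked directly against the API server. The sketch below is only an illustration, not part of the original answer: `parse_forbidden` and `check_forbidden` are hypothetical helper names, the pattern assumes the `is forbidden: User ... cannot ...` wording seen in the operator log, and it assumes a non-empty API group such as `monitoring.coreos.com`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Rook): turn a "forbidden" log line into
# an RBAC check against the API server.

# Reads log lines on stdin and prints "<user> <verb> <resource.group> <namespace>"
# for every line that matches the Kubernetes "is forbidden" wording.
# Assumption: the API group is non-empty (e.g. monitoring.coreos.com).
parse_forbidden() {
  sed -n 's/.*User "\([^"]*\)" cannot \([a-z]*\) resource "\([^"]*\)" in API group "\([^"]*\)" in the namespace "\([^"]*\)".*/\1 \2 \3.\4 \5/p'
}

# Reads the four fields printed by parse_forbidden and asks the API server
# whether that user may perform the verb; kubectl prints "yes" or "no".
check_forbidden() {
  local user verb resource ns
  while read -r user verb resource ns; do
    kubectl auth can-i "$verb" "$resource" --as="$user" -n "$ns"
  done
}
```

Assuming the operator runs as the usual `rook-ceph-operator` deployment, something like `kubectl -n rook-ceph logs deploy/rook-ceph-operator | parse_forbidden | sort -u | check_forbidden` would show whether each denied permission is still missing. A `no` answer points at RBAC rather than at the `spec.monitoring` settings.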

2025-03-30 12:29:47.604551 W | cephclient: skipping adding mon "a" to config file, detected out of quorum
2025-03-30 12:29:47.605972 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2025-03-30 12:29:47.606069 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2025-03-30 12:29:47.609578 W | op-mon: mon "a" not found in quorum, waiting for timeout (599 seconds left) before failover
2025-03-30 12:30:47.662795 W | op-mon: failed to check mon health. failed to get mon quorum status: mon quorum status failed: exit status 1
2025-03-30 12:31:18.589135 E | ceph-cluster-controller: failed to get ceph status. failed to get status. . timed out: exit status 1

Why does installing Ceph with Rook time out every time? I am on cloud hosts, 4 cores and 8 GB of RAM each, in a three-node cluster.

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph quorum_status

2. Check the Monitor logs:

kubectl -n rook-ceph logs <mon-pod-name> -c mon

3. Check the OSD deployment status:

...
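Before opening individual mon logs, it can help to see which mons the operator itself reports as out of quorum. The following is a minimal sketch under stated assumptions: `mons_out_of_quorum` is a hypothetical helper name, and the pattern relies on the `op-mon: mon "<id>" not found in quorum` wording that appears in the operator log quoted in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the mon IDs that the Rook operator log reports
# as "not found in quorum". Reads operator log lines on stdin.
mons_out_of_quorum() {
  sed -n 's/.*op-mon: mon "\([^"]*\)" not found in quorum.*/\1/p' | sort -u
}
```

Assuming the operator runs as the usual `rook-ceph-operator` deployment, `kubectl -n rook-ceph logs deploy/rook-ceph-operator | mons_out_of_quorum` prints one mon ID per line, which narrows down which mon pod's log to read in step 2.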

2025-03-30 02:44:40.877806 E | ceph-spec: failed to update cluster condition to {Type:Ready Status:True Reason:ClusterCreated Message:Cluster created successfully LastHeartbeatTime:2025-03-30 02:44:40.873836668 +0000 UTC m=+107.935380607 LastTransitionTime:2025-03-30 02:44:40.873836622 +0000 UTC m=+107.935380571}. failed to update object "rook-ceph/rook-ceph" status: Operation cannot be fulfilled on cephclusters.ceph.rook.io "rook-ceph": the object has been modified; please apply your changes to the latest version and try again

OK, the user has run into an error message from a Rook Ceph cluster deployment: "2025-03-30 02:44:40.877806 E | ceph-spec: failed to update cluster condition to... Operation cannot be fulfilled on cephclusters.ceph.rook.io ...

debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn e: 'ceph-mgr'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 0: 'ceph-mgr'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 1: '--fsid=eca52225-61f8-471f-a7ac-86c22e557a82'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 2: '--keyring=/etc/ceph/keyring-store/keyring'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 3: '--default-log-to-stderr=true'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 4: '--default-err-to-stderr=true'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 5: '--default-mon-cluster-log-to-stderr=true'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 6: '--default-log-stderr-prefix=debug '
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 7: '--default-log-to-file=false'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 8: '--default-mon-cluster-log-to-file=false'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 9: '--mon-host=[v2:10.96.165.248:3300,v1:10.96.165.248:6789],[v2:10.96.239.160:3300,v1:10.96.239.160:6789],[v2:10.96.131.95:3300,v1:10.96.131.95:6789]'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 10: '--mon-initial-members=c,a,b'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 11: '--id=b'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 12: '--setuser=ceph'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 13: '--setgroup=ceph'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 14: '--client-mount-uid=0'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 15: '--client-mount-gid=0'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 16: '--foreground'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn 17: '--public-addr=172.16.85.241'
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn respawning with exe /usr/bin/ceph-mgr
debug 2025-03-29T13:11:09.128+0000 7f9ac5d04640 1 mgr respawn exe_path /proc/self/exe
ignoring --setuser ceph since I am not root
ignoring --setgroup ceph since I am not root

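gdb /usr/bin/ceph-mgr /tmp/core.ceph-mgr.12345

---

### **6. Related considerations**

The high ceph-mon CPU usage you mentioned earlier may be related to the current mgr problem:

- **Degraded cluster health** causes the monitoring components to retry frequently
- It is also worth checking: ...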

[root@master01 rbd]# kubectl describe cephblockpool replicapool -n rook-ceph
Name:         replicapool
Namespace:    rook-ceph
Labels:       <none>
Annotations:  <none>
API Version:  ceph.rook.io/v1
Kind:         CephBlockPool
Metadata:
  Creation Timestamp:  2025-03-27T15:27:47Z
  Generation:          1
  Resource Version:    23968
  UID:                 ce6b7aeb-f8c8-4f2c-9616-ef9340b8576e
Spec:
  Failure Domain:  osd
  Replicated:
    Require Safe Replica Size:  false
    Size:                       2
Events:  <none>
[root@master01 rbd]# kubectl get cephblockpool -n rook-ceph
NAME          PHASE   TYPE   FAILUREDOMAIN   AGE
replicapool                                  2m12s

kubectl logs -n rook-ceph -l app=rook-ceph-operator --tail=200 | grep -i "block pool"

Typical error log pattern:

failed to create pool: [ERR] ... # specific error description

---

### 5. Single-node tuning parameters...

[root@dai23 examples]# kubectl logs -n rook-ceph rook-ceph-osd-prepare-dai25-9pxp8
2025-03-12 01:15:33.260168 W | inventory: uuid not found for device /dev/sdb. output=Creating new GPT entries in memory.
Disk /dev/sdb: 83886080 sectors, 40.0 GiB
Model: VMware Virtual S
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 4754F882-F0A5-401C-8D25-C23ADA95B6C5
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 83886046
Partitions will be aligned on 2048-sector boundaries
Total free space is 83886013 sectors (40.0 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
2025-03-12 01:15:33.260287 I | cephosd: setting device type "hdd" for device "/dev/sdb"
2025-03-12 01:15:33.260314 I | cephosd: setting device class "hdd" for device "/dev/sdb"
2025-03-12 01:15:33.260384 I | cephosd: 1 ceph-volume raw osd devices configured on this node
2025-03-12 01:15:33.260430 I | cephosd: devices = [{ID:0 Cluster:ceph UUID:6a9f8512-fdce-4583-ac9e-e23f01e44fd1 DevicePartUUID: DeviceClass:hdd BlockPath:/dev/sdb MetadataPath: WalPath: SkipLVRelease:true Location:root=default host=dai25 LVBackedPV:false CVMode:raw Store:bluestore TopologyAffinity: Encrypted:false ExportService:false NodeName: PVCName: DeviceType:hdd}]
[root@dai23 examples]# kubectl get CephCluster -n rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH        EXTERNAL   FSID
rook-ceph   /var/lib/rook     2          63m   Ready   Cluster created successfully   HEALTH_WARN              77b65090-25cf-4ff6-a840-c0419041c0e4

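- **Problem 3: the CephCluster status is Error but PVs are still usable**
  - **Explanation**: some OSDs may be unhealthy while the cluster as a whole still maintains minimum availability.
  - **Recovery**: restart the Ceph Operator to attempt a redeployment: kubectl rollout restart deploy rook-...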

[root@k8s-master01 examples]# cat dashboard-external-http.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-http
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph # namespace:cluster
spec:
  ports:
    - name: dashboard
      port: 7000
      protocol: TCP
      targetPort: 7000
  selector:
    app: rook-ceph-mgr
    mgr_role: active
    rook_cluster: rook-ceph # namespace:cluster
  sessionAffinity: None
  type: NodePort

Looking at the files in the Rook install directory makes it clear that your understanding is wrong.

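The key is the selector section, which contains app: rook-ceph-mgr, mgr_role: active, and rook_cluster: rook-ceph. This shows that Rook does use the mgr_role: active label here to select the active MGR Pod. Compared with my earlier answer, this...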

2025-05-06T08:19:13.985+0000 7f0123733640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2025-05-06T08:19:13.985+0000 7f0123733640 -1 AuthRegistry(0x7f0123732000) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx

ceph-mon -i <node-name> --inject-monmap /tmp/monmap # inject into the broken node

2. **Redeploy the mon node**

ceph-deploy mon destroy <node-name> # clean up the old node
ceph-deploy mon create <node-name> # recreate it
...

2025-03-29 14:24:23.248871 W | op-mon: failed to check mon health. failed to get mon quorum status: mon quorum status failed: exit status 1

2025-03-29 14:24:23.248871 W | op-mon: failed to check mon health. failed to get mon quorum status: mon quorum status failed: exit status 1

First, I need to understand the structure and content of this log. The timestamp is in 2025, but...

debug 2025-03-30T03:55:55.692+0000 7f8d05834640 0 log_channel(cluster) log [DBG] : fsmap
debug 2025-03-30T03:55:55.692+0000 7f8d05834640 0 log_channel(cluster) log [DBG] : osdmap e13: 3 total, 3 up, 3 in
debug 2025-03-30T03:55:55.693+0000 7f8d05834640 0 log_channel(cluster) log [DBG] : mgrmap e26: b(active, since 63m), standbys: a
debug 2025-03-30T03:55:55.693+0000 7f8d05834640 0 log_channel(cluster) log [INF] : Health check cleared: MON_DOWN (was: 1/3 mons down, quorum b,c)
debug 2025-03-30T03:55:55.693+0000 7f8d05834640 0 log_channel(cluster) log [INF] : Cluster is now healthy
debug 2025-03-30T03:55:55.709+0000 7f8d08039640 0 mon.a@0(leader) e3 handle_command mon_command({"prefix": "osd pool create", "format": "json", "pool": ".mgr", "pg_num": 1, "pg_num_min": 1, "pg_num_max": 32, "yes_i_really_mean_it": true} v 0)
debug 2025-03-30T03:55:55.709+0000 7f8d08039640 0 log_channel(audit) log [INF] : from='mgr.14324 ' entity='mgr.b' cmd=[{"prefix": "osd pool create", "format": "json", "pool": ".mgr", "pg_num": 1, "pg_num_min": 1, "pg_num_max": 32, "yes_i_really_mean_it": true}]: dispatch

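The user's earlier question was about Rook/Ceph failing to discover disks. They have probably now run into new log output while debugging and need to understand what these logs mean in order to determine the cluster's state or locate the problem. First, I need to parse the key information in each log line. The first line is the fsmap debug...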

Last metadata expiration check: 0:00:14 ago on Sat 29 Mar 2025 08:37:13 PM CST.
Error:
 Problem: conflicting requests
  - nothing provides libcrypto.so.10()(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides libtcmalloc.so.4()(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides libcrypto.so.10(libcrypto.so.10)(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides libleveldb.so.1()(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides liboath.so.0()(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides /usr/bin/python3.6 needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides libcrypto.so.10(OPENSSL_1.0.1_EC)(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides liboath.so.0(LIBOATH_1.10.0)(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides liboath.so.0(LIBOATH_1.2.0)(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides libform.so.5()(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides libncurses.so.5()(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
  - nothing provides libtinfo.so.5()(64bit) needed by ceph-common-2:15.2.17-0.el7.x86_64 from Ceph
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

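OK, the user has now hit dependency errors while installing ceph-common and needs them resolved. Based on the earlier conversation, the user already added the Ceph repository as instructed and tried the install, but ran into several missing dependencies. Now I need to analyze the error messages, find the cause, and provide a solution. ...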

[root@k8s-master01 examples]# kubectl -n rook-ceph describe pod/rook-ceph-mon-a-5ffc55d9fc-xpthg |grep -A 5 "limit"
      POD_MEMORY_LIMIT:               node allocatable (limits.memory)
      POD_MEMORY_REQUEST:             0 (requests.memory)
      POD_CPU_LIMIT:                  node allocatable (limits.cpu)
      POD_CPU_REQUEST:                0 (requests.cpu)
      CEPH_USE_RANDOM_NONCE:          true
      ROOK_MSGR2:                     msgr2_false_encryption_false_compression_false
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
--
      POD_MEMORY_LIMIT:               node allocatable (limits.memory)
      POD_MEMORY_REQUEST:             0 (requests.memory)
      POD_CPU_LIMIT:                  node allocatable (limits.cpu)
      POD_CPU_REQUEST:                0 (requests.cpu)
      CEPH_USE_RANDOM_NONCE:          true
      ROOK_MSGR2:                     msgr2_false_encryption_false_compression_false
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false

kubectl -n rook-ceph describe pod/rook-ceph-mon-a-5ffc55d9fc-xpthg | grep -A 5 "Resources"

**Example of expected output**:

Resources:
  Limits:
    cpu: 2
    memory: 4Gi
  Requests:
    cpu: 1
    memory: 1Gi

If...

2025-03-13 09:48:52.695 7f8dd00a7700 0 bluestore(/var/lib/ceph/osd/ceph-307) log_latency_fn slow operation observed for _do_read, latency = 6.29419s, num_ios = 61440
2025-03-13 09:48:52.696 7f8dd00a7700 0 bluestore(/var/lib/ceph/osd/ceph-307) log_latency slow operation observed for read, latency = 6.29511s

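The path is /var/lib/ceph/osd/ceph-307, so this is the log of OSD 307. The key messages are "slow operation observed for _do_read" and "slow operation observed for read", each with a latency of about 6.29 seconds. This is clearly a warning that read operations are too slow. Next, I need to recall Ceph ...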
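The same reading of the log line (OSD ID from the path, operation name, latency) can be applied to a whole log to see which OSDs are affected. This is a minimal sketch, assuming the BlueStore message format shown in the two lines above; `slow_ops` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize BlueStore "slow operation observed" lines.
# Reads OSD log lines on stdin and prints "osd.<id> <operation> <latency-seconds>".
slow_ops() {
  sed -n 's|.*bluestore(/var/lib/ceph/osd/ceph-\([0-9]*\)).*slow operation observed for \([^,]*\), latency = \([0-9.]*\)s.*|osd.\1 \2 \3|p'
}
```

Piping an OSD log through `slow_ops | sort | uniq -c` would show whether the slow reads are confined to one OSD (which suggests a single disk) or spread across many.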

NAME                                              ENDPOINTS                                                  AGE
endpoints/nginx                                   172.16.58.210:80                                           138m
endpoints/rook-ceph-exporter                      172.16.32.143:9926,172.16.58.222:9926,172.16.85.223:9926   173m
endpoints/rook-ceph-mgr                           <none>                                                     173m
endpoints/rook-ceph-mgr-dashboard                 <none>                                                     173m
endpoints/rook-ceph-mgr-dashboard-external-http   <none>                                                     13m
endpoints/rook-ceph-mgr-dashboard-np              <none>                                                     165m
endpoints/rook-ceph-mon-a                         <none>                                                     176m
endpoints/rook-ceph-mon-b                         172.16.58.201:6789,172.16.58.201:3300                      175m
endpoints/rook-ceph-mon-c                         172.16.85.200:6789,172.16.85.200:3300                      173m
endpoints/rook-ceph-mon-d                         172.16.32.150:6789,172.16.32.150:3300                      22m

Why have my mgr endpoints lost their pod IPs?

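In a Ceph cluster (especially one deployed through Rook), an MGR Endpoint (rook-ceph-mgr) showing <none> means that Kubernetes could not associate any Pod IP with the Service. Possible causes and troubleshooting steps follow:

---

### 1. **Direct cause**

When the ENDPOINTS column is...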
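A quick way to narrow this down is to list every Service with no endpoints and then check whether any Pod actually carries the labels its selector asks for. The sketch below is illustrative: `empty_endpoints` and `active_mgr_pods` are hypothetical helper names, and the label set is taken from the `selector` in the `dashboard-external-http.yaml` quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "<none>" endpoints case.

# Reads `kubectl get endpoints` output (with header) on stdin and prints the
# names whose ENDPOINTS column is "<none>".
empty_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}

# Lists the Pods that the mgr Services select. An empty result means no Pod
# currently carries all three labels, so the Service has nothing to point at.
active_mgr_pods() {
  kubectl -n rook-ceph get pods \
    -l app=rook-ceph-mgr,mgr_role=active,rook_cluster=rook-ceph -o wide
}
```

For example, `kubectl -n rook-ceph get endpoints | empty_endpoints` followed by `active_mgr_pods` shows whether the gap is on the Pod side (no running or Ready Pod with the labels) rather than in the Service definition.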

404 - Not Found
Could not reach Prometheus's API on http://ceph-node1:9095/api/v1
7/20/25 10:49:50 PM

[root@ceph-node1 ~]# curl http://localhost:9095/api/v1/
404 page not found
[root@ceph-node1 ~]# curl http://localhost:9095/api/v1/
404 page not found
[root@ceph-node1 ~]# curl -v http://localhost:9095/api/v1/labels
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9095 (#0)
> GET /api/v1/labels HTTP/1.1
> Host: localhost:9095
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Sun, 20 Jul 2025 14:43:55 GMT
< Content-Length: 824
<
* Connection #0 to host localhost left intact
{"status":"success","data":["__name__","address","alertname","alertstate","bios_date","bios_vendor","bios_version","board_name","board_vendor","board_version","branch","broadcast","cause","ceph_daemon","ceph_version","chassis_asset_tag","chassis_vendor","chassis_version","clocksource","cluster_addr","code","collector","compression_mode","cpu","description","device","device_class","device_ids","devices","domainname","duplex","fstype","goversion","hostname","id","id_like","instance","ip","job","machine","major","method","minor","mode","mountpoint","name","nodename","objectstore","oid","operstate","pool_id","poolid","power_supply","pretty_name","product_name","product_version","public_addr","quantile","queue","rank","release","revision","severity","sysname","system_vendor","time_zone","type","version","version_id"]}
[root@ceph-node1 ~]# curl -v http://localhost:9095/api/v1/query?query=up
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9095 (#0)
> GET /api/v1/query?query=up HTTP/1.1
> Host: localhost:9095
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Sun, 20 Jul 2025 14:44:21 GMT
< Content-Length: 656
<
* Connection #0 to host localhost left intact
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","instance":"192.168.1.120:9283","job":"ceph"},"value":[1753022661.748,"1"]},{"metric":{"__name__":"up","instance":"192.168.1.121:9283","job":"ceph"},"value":[1753022661.748,"1"]},{"metric":{"__name__":"up","instance":"192.168.1.122:9283","job":"ceph"},"value":[1753022661.748,"1"]},{"metric":{"__name__":"up","instance":"ceph-node1","job":"node"},"value":[1753022661.748,"1"]},{"metric":{"__name__":"up","instance":"ceph-node2","job":"node"},"value":[1753022661.748,"1"]},{"metric":{"__name__":"up","instance":"ceph-node3","job":"node"},"value":[1753022661.748,"1"]}]}}
[root@ceph-node1 ~]#
[root@ceph-node1 ~]#
[root@ceph-node1 ~]# curl http://ceph-node1:9095/api/v1/labels
{"status":"success","data":["__name__","address","alertname","alertstate","bios_date","bios_vendor","bios_version","board_name","board_vendor","board_version","branch","broadcast","cause","ceph_daemon","ceph_version","chassis_asset_tag","chassis_vendor","chassis_version","clocksource","cluster_addr","code","collector","compression_mode","cpu","description","device","device_class","device_ids","devices","domainname","duplex","fstype","goversion","hostname","id","id_like","instance","ip","job","machine","major","method","minor","mode","mountpoint","name","nodename","objectstore","oid","operstate","pool_id","poolid","power_supply","pretty_name","product_name","product_version","public_addr","quantile","queue","rank","release","revision","severity","sysname","system_vendor","time_zone","type","version","version_id"]}
[root@ceph-node1 ~]# curl http://ceph-node1:9095/api/v1/query?query=up
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","instance":"192.168.1.120:9283","job":"ceph"},"value":[1753022801.488,"1"]},{"metric":{"__name__":"up","instance":"192.168.1.121:9283","job":"ceph"},"value":[1753022801.488,"1"]},{"metric":{"__name__":"up","instance":"192.168.1.122:9283","job":"ceph"},"value":[1753022801.488,"1"]},{"metric":{"__name__":"up","instance":"ceph-node1","job":"node"},"value":[1753022801.488,"1"]},{"metric":{"__name__":"up","instance":"ceph-node2","job":"node"},"value":[1753022801.488,"1"]},{"metric":{"__name__":"up","instance":"ceph-node3","job":"node"},"value":[1753022801.488,"1"]}]}}
[root@ceph-node1 ~]#
[root@ceph-node1 ~]# podman logs 4b86647c61ac
ts=2025-07-20T12:20:19.199Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1753003219021 maxt=1753005600000 ulid=01K0KWE92SF5BJ1R7R93QA4B3A duration=38.265327ms
ts=2025-07-20T12:20:19.204Z caller=head.go:827 level=info component=tsdb msg="Head GC completed" duration=4.361588ms
ts=2025-07-20T13:00:02.770Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1753005601290 maxt=1753012800000 ulid=01K0KYQ0QQ6NSPJ01V9NC4H2G6 duration=90.476581ms
ts=2025-07-20T13:00:02.774Z caller=head.go:827 level=info component=tsdb msg="Head GC completed" duration=2.258296ms
[root@ceph-node1 ~]# podman exec -it 4b86647c61ac sh
/prometheus $ ss -tulnp | grep 9095
sh: ss: not found
/prometheus $ cat /etc/prometheus/prometheus.yml
# This file is generated by cephadm.
global:
  scrape_interval: 10s
  evaluation_interval: 10s
rule_files:
  - /etc/prometheus/alerting/*
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets: ['ceph-node1:9093']
scrape_configs:
  - job_name: 'ceph'
    honor_labels: true
    static_configs:
    - targets:
      - '192.168.1.120:9283'
      - '192.168.1.121:9283'
      - '192.168.1.122:9283'
  - job_name: 'node'
    static_configs:
    - targets: ['ceph-node1:9100']
      labels:
        instance: 'ceph-node1'
    - targets: ['192.168.1.121:9100']
      labels:
        instance: 'ceph-node2'
    - targets: ['192.168.1.122:9100']
      labels:
        instance: 'ceph-node3'

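We hit a 404 when accessing the Prometheus API, but testing with curl showed: ...ts=2025-07-20T13:00:02.770Z msg="write block" ... duration=90.476581ms, which indicates that Prometheus data storage is working normally; the problem is only that the wrong API path was used.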
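The distinction is that the bare `/api/v1/` prefix is not itself an endpoint, while concrete paths under it such as `/api/v1/query` and `/api/v1/labels` return data, as the curl output in the question shows. A minimal sketch of a health probe that uses a real endpoint follows; `prom_query_url` and `prom_up_instances` are hypothetical helper names, and the JSON is scraped with `grep`/`sed` only to avoid assuming that `jq` is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for probing a Prometheus server through a concrete
# endpoint instead of the bare /api/v1 prefix.

# Builds an instant-query URL from a base URL and a simple expression.
# The expression is not URL-encoded here; for anything beyond a bare metric
# name, prefer `curl -G --data-urlencode "query=..." "$base/api/v1/query"`.
prom_query_url() {
  printf '%s/api/v1/query?query=%s\n' "${1%/}" "$2"
}

# Reads an /api/v1/query JSON response on stdin and prints one "instance"
# label value per line.
prom_up_instances() {
  grep -o '"instance":"[^"]*"' | sed 's/^"instance":"//; s/"$//'
}
```

With the host and port from this thread, `curl -s "$(prom_query_url http://ceph-node1:9095 up)" | prom_up_instances` would list the scrape targets that Prometheus currently knows about.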

[root@k8s-node01 ~]# ps aux|grep 112631
167 112631 98.2 0.2 264696 22460 ? R 19:33 6:02 ceph-mon --fsid=90a4d57c-012f-4f5b-986e-8299ba8abdc3 --keyring=/etc/ceph/keyring-store/keyring --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --mon-host=[v2:10.96.42.228:3300,v1:10.96.42.228:6789],[v2:10.96.30.179:3300,v1:10.96.30.179:6789],[v2:10.96.55.215:3300,v1:10.96.55.215:6789] --mon-initial-members=c,a,b --id=b --setuser=ceph --setgroup=ceph --foreground --public-addr=10.96.55.215 --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db --public-bind-addr=172.16.85.253
root 116872 0.0 0.0 6408 2304 pts/0 S+ 19:39 0:00 grep --color=auto 112631

What is this process?

kubectl logs -n rook-ceph rook-ceph-mon-b-xxx -c mon | grep -i error

#### **Step 3: resource monitoring**

# Check the overall load on the node
top -p 112631 # watch this process's CPU and memory usage live
iotop -o # check disk I/O
...

kubectl create -f crds.yaml -f common.yaml fails with: error: resource mapping not found for name: "00-rook-privileged" namespace: "" from "common.yaml": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"

  name: rook-ceph-cluster-mgmt
rules:
- apiGroups: [""]
  resources: ["secrets", "pods", "configmaps", "services"]
  verbs: [ "get", "list", "watch", "create", "update", "delete" ]
# ...keep the other non-PSP...
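PodSecurityPolicy was removed in Kubernetes 1.25, so on newer clusters the `policy/v1beta1` kind named in the error is no longer served and any manifest that still contains it fails in this way. Before editing `common.yaml`, it may be worth confirming that this is the situation. The following is a sketch only; `serves_psp` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl api-resources` output on stdin and
# succeeds (exit 0) only if the cluster still serves the PodSecurityPolicy
# kind, which is the last column of each row.
serves_psp() {
  awk '$NF == "PodSecurityPolicy" { found = 1 } END { exit !found }'
}
```

For example, `kubectl api-resources --api-group=policy --no-headers | serves_psp || echo "PSP not served; use a manifest without PodSecurityPolicy objects"` distinguishes a cluster that has dropped PSP from one where the error has another cause.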
