Bug #64997 (Open)
There is always an osd process that takes up high cpu
Updated by Radoslaw Zarzynski 2 months ago
- Status changed from New to Need More Info
Note from bugscrub: need a summary here.
Updated by cao yong 9 days ago
- File 11111.png added
- File clipboard-202405241516-pfz60.png added
- File clipboard-202405241517-qtxkf.png added
- File clipboard-202405241520-s5a5n.png added
- File clipboard-202405241521-1yzwa.png added
- File clipboard-202405241521-dxpc1.png added
Radoslaw Zarzynski wrote in #note-1:
Note from bugscrub: need a summary here.
Bug Report
For about 4 months now, there has always been one OSD process taking up high CPU at certain moments.
I found that the busy process is an admin_socket process belonging to the OSD pod (pid 2809092).
[root@sc-node-ceph-4 ~]# pstree -apscl 3360113
systemd,1 --switched-root --system --deserialize 31
└─containerd-shim,1229280 -namespace k8s.io -id 712f8293ab1ecd3d5cc0e576efad6ed9f4e943ccabf14834fd98241e3363a988 -address /run/containerd/containerd.sock
└─ceph-osd,2809092 --foreground --id 29 --fsid aa0e7bed-d3e3-49a7-a471-8e354bfe61f6 --setuser ceph --setgroup ceph --crush-location=root=default host=sc-node-ceph-4 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false
└─admin_socket,3360113 --foreground --id 29 --fsid aa0e7bed-d3e3-49a7-a471-8e354bfe61f6 --setuser ceph --setgroup ceph --crush-location=root=default host=sc-node-ceph-4 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false
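Since admin_socket appears under ceph-osd in the pstree output, it is presumably a thread of the OSD process rather than a separate binary. A minimal sketch of how one might confirm that, reusing the PIDs from above (3360113 = suspect, 2809092 = OSD):
[root@sc-node-ceph-4 ~]# cat /proc/3360113/comm                   # thread name; expected to print "admin_socket"
[root@sc-node-ceph-4 ~]# ls -d /proc/2809092/task/3360113         # exists only if 3360113 is a thread of the OSD
[root@sc-node-ceph-4 ~]# top -H -b -n 1 -p 2809092 | head -n 20   # per-thread CPU usage inside the OSD process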
After the OSD pod above is restarted, another OSD pod takes its place and occupies high CPU at some later moment; this happens over and over.
I also found that the admin_socket process belonging to the same OSD pod (pid 2809092) keeps restarting, i.e. its PID changes between observations.
[root@sc-node-ceph-4 ~]# pstree -apscl 3364029
systemd,1 --switched-root --system --deserialize 31
└─containerd-shim,1229280 -namespace k8s.io -id 712f8293ab1ecd3d5cc0e576efad6ed9f4e943ccabf14834fd98241e3363a988 -address /run/containerd/containerd.sock
└─ceph-osd,2809092 --foreground --id 29 --fsid aa0e7bed-d3e3-49a7-a471-8e354bfe61f6 --setuser ceph --setgroup ceph --crush-location=root=default host=sc-node-ceph-4 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false
└─admin_socket,3364029 --foreground --id 29 --fsid aa0e7bed-d3e3-49a7-a471-8e354bfe61f6 --setuser ceph --setgroup ceph --crush-location=root=default host=sc-node-ceph-4 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true --default-log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false
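To quantify how often the admin_socket thread is recreated, one could sample the OSD's threads over time. A sketch, assuming the sysstat package is available for pidstat:
[root@sc-node-ceph-4 ~]# pidstat -t -p 2809092 5                  # per-thread CPU every 5 s; watch the admin_socket TID
[root@sc-node-ceph-4 ~]# while true; do date; grep -l admin_socket /proc/2809092/task/*/comm; sleep 60; done   # log the admin_socket TID once a minute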
The status of the Ceph cluster is HEALTH_OK.
✘ ⚡ root@sc-master-1 ~/rooks kubectl -n rook-ceph exec -it rook-ceph-tools-7d4b5bb689-k5tvp -- /bin/bash
bash-4.4$ ceph -s
  cluster:
    id:     aa0e7bed-d3e3-49a7-a471-8e354bfe61f6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,e,j (age 7h)
    mgr: b(active, since 6w), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 36 osds: 36 up (since 7h), 36 in (since 3M)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 649 pgs
    objects: 37.40M objects, 16 TiB
    usage:   47 TiB used, 84 TiB / 131 TiB avail
    pgs:     647 active+clean
             2   active+clean+scrubbing+deep

  io:
    client: 7.9 KiB/s rd, 1.9 MiB/s wr, 3 op/s rd, 137 op/s wr
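Even with HEALTH_OK, the busy daemon can be interrogated directly. A sketch using standard Ceph commands from the toolbox pod (osd.29 is the daemon shown in the pstree output; which counters are worth watching is a guess):
bash-4.4$ ceph tell osd.29 perf dump | head -n 40     # runtime perf counters of the busy OSD
bash-4.4$ ceph tell osd.29 dump_historic_ops          # recent slow/expensive ops, if any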
Deviation from expected behavior:
There is always one OSD process that takes up high CPU and memory, and the affected OSD changes after each restart.
Expected behavior:
This strange admin_socket process should not occur, and it should not cause the OSD pod to keep restarting.
Environment:
OS (e.g. from /etc/os-release):
⚡ root@sc-master-1 ~/rooks/rook-1.12.10/deploy/examples cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.2 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.2 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.2"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.2"
Kernel (e.g. uname -a):
⚡ root@sc-master-1 ~/rooks/rook-1.12.10/deploy/examples uname -a
Linux sc-master-1 6.6.2-1.el9.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov 20 12:18:26 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
Rook version (use rook version inside of a Rook Pod): v1.12.10
Storage backend version (e.g. for ceph do ceph -v):
bash-4.4$ ceph -v
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Kubernetes version (use kubectl version):
⚡ root@sc-master-1 ~/rooks/rook-1.12.10/deploy/examples kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.14", GitCommit:"a5967a3c4d0f33469b7e7798c9ee548f71455222", GitTreeState:"clean", BuildDate:"2023-09-13T09:12:09Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.14", GitCommit:"a5967a3c4d0f33469b7e7798c9ee548f71455222", GitTreeState:"clean", BuildDate:"2023-09-13T09:04:55Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}