Bug #56239
crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample)
% Done:
0%
Source:
Telemetry
Tags:
backport_processed
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
51858
Crash signature (v1):
372a820cbfc5af971785d9b6af2a345a1670c04429583dc564e357c04a53cf64
8f6cf6368e0ca8ac93beab8b45a0d5013805b9ef39286850ba17798f822e180c
d0ea52fbf30312347be61ce51cd1f6c5483dfaba1767a0eb62791d1f194f3381
ef43174c3be0e2b9ccb951f18b2301de313327d53325698fe20fbd29db555a38
2364791fa429f484e2ac788d520a6c4752a9e95983682b39f621373401ca0734
Crash signature (v2):
Description
Sanitized backtrace:
    File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample)
    File "mgr/devicehealth/module.py", in _get_device_metrics: with self._db_lock, self.db:
    File "mgr/mgr_module.py", in db: raise MgrDBNotReady();
Crash dump sample:
{
    "archived": "2022-06-19 10:32:56.076950",
    "backtrace": [
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 764, in get_recent_device_metrics\n return self._get_device_metrics(devid, min_sample=min_sample)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 553, in _get_device_metrics\n with self._db_lock, self.db:",
        " File \"/usr/share/ceph/mgr/mgr_module.py\", line 1203, in db\n raise MgrDBNotReady();",
        "<redacted>"
    ],
    "ceph_version": "17.2.0",
    "crash_id": "2022-06-18T19:09:19.112675Z_db7d5934-7e5a-4ee8-908e-4ee606f9dd1c",
    "entity_name": "mgr.8db3d30b2fe0f2dc446f5bc8b03f08b697cf9f58",
    "mgr_module": "devicehealth",
    "mgr_module_caller": "ActivePyModule::dispatch_remote get_recent_device_metrics",
    "mgr_python_exception": "MgrDBNotReady",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "bb14694bacd8d2b1a934cf4a3f4a27f50f27e160354c2f796b64991db731505e",
    "timestamp": "2022-06-18T19:09:19.112675Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-39-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022"
}
Updated by Telemetry Bot almost 2 years ago
- Crash signature (v1) updated (diff)
- Affected Versions v17.2.1, v17.2.2 added
Updated by Telemetry Bot about 1 year ago
- Crash signature (v1) updated (diff)
- Affected Versions v17.2.3, v17.2.4, v17.2.5, v17.2.6 added
Updated by Laura Flores about 1 year ago
- Crash signature (v1) updated (diff)
Happened in the gibba cluster:
[lflores@gibba001 ~]$ sudo ceph -s
  cluster:
    id:     5363501e-fdf2-11ed-bac8-3cecef3d8fb8
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled
            1 mgr modules have recently crashed

  services:
    mon: 5 daemons, quorum gibba001,gibba002,gibba005,gibba003,gibba004 (age 38h)
    mgr: gibba006.afdywy(active, since 38h), standbys: gibba008.nemumh
    osd: 62 osds: 62 up (since 38h), 62 in (since 38h); 18 remapped pgs
    rgw: 6 daemons active (6 hosts, 1 zones)

  data:
    pools:   6 pools, 257 pgs
    objects: 83.37M objects, 318 GiB
    usage:   1.1 TiB used, 9.4 TiB / 11 TiB avail
    pgs:     20809893/250109739 objects misplaced (8.320%)
             239 active+clean
             18 active+remapped+backfilling

  io:
    client:   63 KiB/s rd, 0 B/s wr, 63 op/s rd, 42 op/s wr
    recovery: 1.0 MiB/s, 266 objects/s

  progress:
    Global Recovery Event (0s)
      [............................]
[lflores@gibba001 ~]$ sudo ceph health detail
HEALTH_WARN 1 pool(s) do not have an application enabled; 1 mgr modules have recently crashed
[WRN] POOL_APP_NOT_ENABLED: 1 pool(s) do not have an application enabled
    application not enabled on pool 'foo'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
[WRN] RECENT_MGR_MODULE_CRASH: 1 mgr modules have recently crashed
    mgr module devicehealth crashed in daemon mgr.gibba001.nkuepu on host gibba001 at 2023-05-29T07:32:20.873598Z
[lflores@gibba001 ~]$ sudo ceph crash info 2023-05-29T07:32:20.873598Z_0465ae2d-0220-4d9b-9ef8-debf2e6a5d70
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 764, in get_recent_device_metrics\n return self._get_device_metrics(devid, min_sample=min_sample)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 553, in _get_device_metrics\n with self._db_lock, self.db:",
        " File \"/usr/share/ceph/mgr/mgr_module.py\", line 1233, in db\n raise MgrDBNotReady();",
        "mgr_module.MgrDBNotReady"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-05-29T07:32:20.873598Z_0465ae2d-0220-4d9b-9ef8-debf2e6a5d70",
    "entity_name": "mgr.gibba001.nkuepu",
    "mgr_module": "devicehealth",
    "mgr_module_caller": "ActivePyModule::dispatch_remote get_recent_device_metrics",
    "mgr_python_exception": "MgrDBNotReady",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "fbbc6a4724a20738af8118fb5d84831008735002870daa3a76853a0dcaaa3f92",
    "timestamp": "2023-05-29T07:32:20.873598Z",
    "utsname_hostname": "gibba001",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-301.1.el8.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Apr 13 16:24:22 UTC 2021"
}
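The two crash reports in this issue carry different `stack_sig` values, and the issue lists five v1 crash signatures, so grouping crashes by signature alone can split this bug across several buckets. A hypothetical triage helper, `looks_like_bug_56239`, can instead match on fields that `ceph crash info` prints: the module, the Python exception, and the frame name. The field names come from the reports above; the matching rule itself is an assumption, not something Ceph provides.

```python
import json


def looks_like_bug_56239(crash: dict) -> bool:
    # Hypothetical triage rule built from the fields shown in the crash
    # reports above; it ignores stack_sig and line numbers on purpose.
    frames = " ".join(crash.get("backtrace", []))
    return (
        crash.get("mgr_module") == "devicehealth"
        and crash.get("mgr_python_exception") == "MgrDBNotReady"
        and "get_recent_device_metrics" in frames
    )


# Trimmed copy of the `ceph crash info` output above.
sample = json.loads(r"""
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 764, in get_recent_device_metrics\n return self._get_device_metrics(devid, min_sample=min_sample)",
        "mgr_module.MgrDBNotReady"
    ],
    "ceph_version": "17.2.6",
    "mgr_module": "devicehealth",
    "mgr_python_exception": "MgrDBNotReady"
}
""")
print(looks_like_bug_56239(sample))  # True
```

In practice the input would be the JSON printed by `ceph crash info <crash_id>` for each entry listed by `ceph crash ls`.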
From the mgr log:
2023-05-29T07:32:20.746+0000 7fe13d427700 0 [telemetry INFO root] Compiling and sending report to https://telemetry.ceph.com/report
2023-05-29T07:32:20.764+0000 7fe13d427700 0 [telemetry INFO root] Sending ceph report to: https://telemetry.ceph.com/report
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev c158f0be-5ee5-43ec-9dc4-5754658550ba does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev b16c5b1b-f70c-4902-a80a-58955b08c131 does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev d8460a9b-583b-4f9d-849c-3ed28768bbff does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev fbae7d8f-22a6-4ca3-8304-18a178d62c55 does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev 91c8d6fc-a976-4651-84f9-72dbc59c52b5 does not exist
2023-05-29T07:32:20.797+0000 7fe15c602700 0 [progress WARNING root] complete: ev 12f6ceb0-d855-4345-95cf-616f4429160b does not exist
2023-05-29T07:32:20.797+0000 7fe15c602700 0 [progress WARNING root] complete: ev b9af52da-d16d-4106-89b2-eb2220aff415 does not exist
2023-05-29T07:32:20.797+0000 7fe15c602700 0 [progress WARNING root] complete: ev 40bdf7b1-80d7-4fd3-beb6-069b394d7f31 does not exist
2023-05-29T07:32:20.821+0000 7fe1843a6700 0 [prometheus INFO cherrypy.error] [29/May/2023:07:32:20] ENGINE Serving on http://:::9283
2023-05-29T07:32:20.821+0000 7fe1843a6700 0 [prometheus INFO cherrypy.error] [29/May/2023:07:32:20] ENGINE Bus STARTED
2023-05-29T07:32:20.821+0000 7fe1843a6700 0 [prometheus INFO root] Engine started.
2023-05-29T07:32:20.871+0000 7fe13d427700 0 [telemetry INFO root] Sent report to https://telemetry.ceph.com/report
2023-05-29T07:32:20.872+0000 7fe13d427700 -1 Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 764, in get_recent_device_metrics
    return self._get_device_metrics(devid, min_sample=min_sample)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 553, in _get_device_metrics
    with self._db_lock, self.db:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1233, in db
    raise MgrDBNotReady();
mgr_module.MgrDBNotReady
2023-05-29T07:32:20.872+0000 7fe13d427700 0 [telemetry ERROR root] Unable to get recent metrics from device with id "TOSHIBA_MG04ACA1_Y9I3K2IYF6XF": Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 764, in get_recent_device_metrics
    return self._get_device_metrics(devid, min_sample=min_sample)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 553, in _get_device_metrics
    with self._db_lock, self.db:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1233, in db
    raise MgrDBNotReady();
mgr_module.MgrDBNotReady
2023-05-29T07:32:20.872+0000 7fe13d427700 0 [telemetry ERROR root] Unable to send device report: Device channel is on, but the generated report was empty.
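Judging by the log, telemetry already tolerates the failure: it logs `Unable to get recent metrics from device with id ...` and carries on, ending with an empty device report. The crash was nevertheless recorded (the crash report names `ActivePyModule::dispatch_remote get_recent_device_metrics` as the caller), so handling the error on the calling side does not by itself prevent the `RECENT_MGR_MODULE_CRASH` warning. The sketch below models that per-device collection loop; `gather_device_report`, `fetch`, and `RemoteError` are hypothetical names, and the exception type a real cross-module call raises is an assumption.

```python
import logging
from typing import Callable, Dict, Iterable

log = logging.getLogger("telemetry-sketch")


class RemoteError(Exception):
    """Hypothetical exception for a failed call into another mgr module."""


def gather_device_report(
    devids: Iterable[str],
    fetch: Callable[[str], dict],
) -> Dict[str, dict]:
    # `fetch` stands in for the remote call to devicehealth's
    # get_recent_device_metrics. A failure is logged and that device is
    # skipped, so one bad device does not abort the whole report.
    report: Dict[str, dict] = {}
    for devid in devids:
        try:
            metrics = fetch(devid)
        except RemoteError as e:
            log.error('Unable to get recent metrics from device with id "%s": %s', devid, e)
            continue
        if metrics:
            report[devid] = metrics
    return report


def not_ready(devid: str) -> dict:
    # Simulates devicehealth failing with MgrDBNotReady, as in the log above.
    raise RemoteError("Remote method threw exception: mgr_module.MgrDBNotReady")


report = gather_device_report(["TOSHIBA_MG04ACA1_Y9I3K2IYF6XF"], not_ready)
if not report:
    log.error("Unable to send device report: the generated report was empty.")
print(report)  # {}
```

With every device failing, the report ends up empty, which matches the final telemetry line in the log.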
Updated by Laura Flores about 1 year ago
Could this be an sqlite issue rather than a problem with the devicehealth module?
src/pybind/mgr/mgr_module.py
1223     @property
1224     def db(self) -> sqlite3.Connection:
1225         assert self._db_lock.locked()
1226         if self._db is not None:
1227             return self._db
1228         db_allowed = self.get_ceph_option("mgr_pool")
1229         if not db_allowed:
1230             raise MgrDBNotReady();
1231         self._db = self.open_db()
1232         if self._db is None:
1233             raise MgrDBNotReady();
1234         return self._db
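The excerpt has two distinct `raise MgrDBNotReady()` sites: line 1230, taken when `mgr_pool` is disabled, and line 1233, taken when `open_db()` returns `None`. Assuming the excerpt matches the 17.2.6 source the gibba cluster ran, its traceback (line 1233) points at the second site, i.e. `open_db()` returned `None`. One possible mitigation, sketched below, is for the callee to treat a not-yet-ready database as "no data" instead of letting the exception escape to the remote caller. `DeviceHealthSketch` and its constructor arguments are hypothetical stand-ins; only the body of `db` follows the excerpt, and this is not a reproduction of the change merged in PR 51858.

```python
import sqlite3
import threading
from typing import Any, Dict, Optional


class MgrDBNotReady(Exception):
    """Stand-in for mgr_module.MgrDBNotReady (not the real class)."""


class DeviceHealthSketch:
    """Hypothetical stand-in for the devicehealth module.

    Only the body of `db` follows the mgr_module.py excerpt above; the
    constructor arguments replace get_ceph_option("mgr_pool") and open_db().
    """

    def __init__(self, mgr_pool: bool, opened_db: Optional[sqlite3.Connection]) -> None:
        self._db_lock = threading.Lock()
        self._db: Optional[sqlite3.Connection] = None
        self._mgr_pool = mgr_pool
        self._opened_db = opened_db

    def open_db(self) -> Optional[sqlite3.Connection]:
        # Hypothetical: returns whatever was configured, possibly None.
        return self._opened_db

    @property
    def db(self) -> sqlite3.Connection:
        assert self._db_lock.locked()
        if self._db is not None:
            return self._db
        if not self._mgr_pool:
            raise MgrDBNotReady()  # first site (line 1230 in the excerpt)
        self._db = self.open_db()
        if self._db is None:
            raise MgrDBNotReady()  # second site (line 1233 in the excerpt)
        return self._db

    def get_recent_device_metrics(self, devid: str) -> Dict[str, Any]:
        # Hypothetical guard: answer "no data yet" instead of letting
        # MgrDBNotReady escape to the remote caller.
        try:
            with self._db_lock, self.db as conn:
                rows = conn.execute("SELECT 1").fetchall()
                return {"devid": devid, "rows": len(rows)}
        except MgrDBNotReady:
            return {}


not_ready = DeviceHealthSketch(mgr_pool=True, opened_db=None)
print(not_ready.get_recent_device_metrics("dev0"))  # {}

ready = DeviceHealthSketch(mgr_pool=True, opened_db=sqlite3.connect(":memory:"))
print(ready.get_recent_device_metrics("dev0"))  # {'devid': 'dev0', 'rows': 1}
```

Returning an empty result keeps the remote call from raising, but it also hides the not-ready condition from callers; whether that trade-off is acceptable is a design decision for the module.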
Updated by Yaarit Hatuka about 1 year ago
- Project changed from mgr to cephsqlite
- Category deleted (devicehealth module)
Looks like a sqlite issue; Patrick, can you please take a look?
Updated by Patrick Donnelly about 1 year ago
- Status changed from New to Fix Under Review
- Assignee set to Patrick Donnelly
- Target version set to v19.0.0
- Backport set to reef,quincy,pacific
- Pull request ID set to 51858
Updated by Patrick Donnelly 11 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 11 months ago
- Copied to Backport #61834: quincy: crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61835: pacific: crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61836: reef: crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample) added
Updated by Patrick Donnelly 7 months ago
- Status changed from Pending Backport to Resolved