Fix #65579
open
mds: use _exit for QA killpoints rather than SIGABRT
0%
Description
Using signals to abruptly kill the MDS has a few issues:
- teuthology logs are polluted with stacktraces
- coredumps are generated and need to be cleaned up. These cores are not useful.
- signal handlers are invoked and allow some threads of the MDS to continue executing
- signal handlers may use malloc and lock up the MDS
Instead, use the _exit syscall to stop the MDS immediately with an abnormal exit code*. This will immediately stop all threads atomically in the kernel, not generate a coredump, and quietly log to the teuthology.log.
Beyond changing the syscall, the qa suite will need to be cleaned up to stop looking for cores and may need adjustments to look for genuine exits instead of abnormal termination via signals.
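To illustrate what a supervising process would observe in each case, here is a minimal sketch, not Ceph or teuthology code. The `ChildResult` struct and `run_child` helper are hypothetical names introduced for this example. It forks a child that dies either through `_exit()` or through `abort()` and reports what `waitpid()` returns to the parent.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical summary of how a child process ended (illustration only).
struct ChildResult {
  bool exited;    // true if the child ended via exit()/_exit()
  bool signaled;  // true if the child was killed by a signal
  int code;       // exit status, or signal number when signaled
};

// Fork a child that terminates either with _exit(exit_code) or with abort(),
// then report what the parent sees through waitpid().
ChildResult run_child(bool use_exit, int exit_code) {
  pid_t pid = fork();
  if (pid < 0) {
    return {false, false, -1};  // fork failed
  }
  if (pid == 0) {
    if (use_exit) {
      _exit(exit_code);  // immediate exit: no atexit handlers, no signal handlers
    }
    abort();             // raises SIGABRT; may produce a coredump
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    return {false, false, -1};
  }
  if (WIFEXITED(status)) {
    return {true, false, WEXITSTATUS(status)};
  }
  if (WIFSIGNALED(status)) {
    return {false, true, WTERMSIG(status)};
  }
  return {false, false, -1};
}
```

With `run_child(true, 120)` the parent sees a normal exit with status 120, while `run_child(false, 0)` is reported as termination by `SIGABRT`. This is the distinction the qa suite would have to key on.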
* I'm using exit code 120 in the ceph-mgr in https://github.com/ceph/ceph/pull/56997
Updated by Venky Shankar 16 days ago · Edited
@Patrick Donnelly Are you talking about TestShutdownKillpoints() in test_failover? If yes, you are suggesting changing, e.g.:
src/mds/MDCache.cc: ceph_assert(kill_shutdown_at != KILL_SHUTDOWN_AT::SHUTDOWN_START);
with a conditional check and _exit()?
Updated by Patrick Donnelly 16 days ago
Venky Shankar wrote in #note-1:
@Patrick Donnelly Are you talking about TestShutdownKillpoints() in test_failover? If yes, you are suggesting changing, e.g.:
src/mds/MDCache.cc: ceph_assert(kill_shutdown_at != KILL_SHUTDOWN_AT::SHUTDOWN_START);
with a conditional check and _exit()?
That's right.
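A rough sketch of the agreed change, under stated assumptions: the enum below is a hypothetical stand-in with only the one value named in this thread, `killpoint_hit` and `maybe_kill` are invented helper names, and exit code 120 is borrowed from the ceph-mgr change mentioned in the description rather than confirmed for the MDS.

```cpp
#include <unistd.h>

// Hypothetical stand-in for the MDS killpoint enum; only SHUTDOWN_START
// is taken from the thread, NONE is an assumption for "no killpoint set".
enum class KILL_SHUTDOWN_AT { NONE, SHUTDOWN_START };

// Assumption: reuse the exit code mentioned for ceph-mgr.
constexpr int KILLPOINT_EXIT_CODE = 120;

// True when a killpoint is configured and matches the current location.
inline bool killpoint_hit(KILL_SHUTDOWN_AT configured, KILL_SHUTDOWN_AT here) {
  return configured != KILL_SHUTDOWN_AT::NONE && configured == here;
}

// Before: ceph_assert(kill_shutdown_at != KILL_SHUTDOWN_AT::SHUTDOWN_START);
// After:  a conditional check followed by _exit(), which terminates the
//         process without raising a signal.
inline void maybe_kill(KILL_SHUTDOWN_AT configured, KILL_SHUTDOWN_AT here) {
  if (killpoint_hit(configured, here)) {
    _exit(KILLPOINT_EXIT_CODE);
  }
}
```

A call site would then read something like `maybe_kill(kill_shutdown_at, KILL_SHUTDOWN_AT::SHUTDOWN_START);` in place of the assertion.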
Updated by Venky Shankar 15 days ago
Patrick Donnelly wrote in #note-2:
Venky Shankar wrote in #note-1:
@Patrick Donnelly Are you talking about TestShutdownKillpoints() in test_failover? If yes, you are suggesting changing, e.g.:
src/mds/MDCache.cc: ceph_assert(kill_shutdown_at != KILL_SHUTDOWN_AT::SHUTDOWN_START);
with a conditional check and _exit()?
That's right.
Fair enough.
Updated by Venky Shankar 15 days ago
- Assignee set to Neeraj Pratap Singh
Neeraj, please take this one.