Fix #65579

open

mds: use _exit for QA killpoints rather than SIGABRT

Added by Patrick Donnelly 29 days ago. Updated 15 days ago.

Status:
New
Priority:
High
Category:
Code Hygiene
Target version:
% Done:
0%

Source:
Development
Tags:
Backport:
squid,reef
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, qa-suite
Labels (FS):
qa, task(easy)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Using signals to abruptly kill the MDS has a few issues:

- teuthology logs are polluted with stacktraces
- coredumps are generated and need to be cleaned up. These cores are not useful.
- signal handlers are invoked and allow some threads of the MDS to continue executing
- signal handlers may use malloc and lock up the mds

Instead, use the _exit syscall to stop the MDS immediately with an abnormal exit code*. This stops all threads atomically in the kernel, does not generate a coredump, and leaves only a quiet entry in teuthology.log.

Beyond changing the syscall, the qa suite will need to be cleaned up to stop looking for cores, and may need adjustments to look for genuine exits instead of abnormal termination via signals.

* I'm using exit code 120 in the ceph-mgr in https://github.com/ceph/ceph/pull/56997
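
To illustrate the difference the description relies on, here is a minimal POSIX sketch, not Ceph code, of how a parent process sees a child that ends via _exit() versus one that ends via abort(). The names run_child, ChildEnd, ChildResult and QA_KILLPOINT_EXIT_CODE are hypothetical. The value 120 is borrowed from the ceph-mgr pull request mentioned above and is only an assumption for the MDS.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Assumption: reuse the exit code the ticket mentions for ceph-mgr.
constexpr int QA_KILLPOINT_EXIT_CODE = 120;

enum class ChildEnd { ExitedWithCode, KilledBySignal, Other };

struct ChildResult {
  ChildEnd how;
  int value;  // exit code, or signal number when killed by a signal
};

// Fork a child that terminates either via _exit() or via abort(),
// then report how the parent observed the termination.
ChildResult run_child(bool use_exit) {
  pid_t pid = fork();
  if (pid < 0) {
    return {ChildEnd::Other, -1};
  }
  if (pid == 0) {
    if (use_exit) {
      // Terminates the whole process at once; no signal is raised and no
      // atexit handlers run.
      _exit(QA_KILLPOINT_EXIT_CODE);
    }
    // Best effort: ask for no core file from this demo child.
    struct rlimit no_core = {0, 0};
    setrlimit(RLIMIT_CORE, &no_core);
    std::abort();  // raises SIGABRT
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return {ChildEnd::Other, -1};
  }
  if (WIFEXITED(status)) {
    return {ChildEnd::ExitedWithCode, WEXITSTATUS(status)};
  }
  if (WIFSIGNALED(status)) {
    return {ChildEnd::KilledBySignal, WTERMSIG(status)};
  }
  return {ChildEnd::Other, -1};
}
```

With this sketch, run_child(true) reports a normal exit carrying code 120, while run_child(false) reports death by SIGABRT. That is the distinction the qa suite would have to key on once it stops looking for signal-based termination.
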
Actions #1

Updated by Venky Shankar 16 days ago · Edited

@Patrick Donnelly Are you talking about TestShutdownKillpoints() in test_failover? If yes, you are suggesting changing, e.g.:

src/mds/MDCache.cc: ceph_assert(kill_shutdown_at != KILL_SHUTDOWN_AT::SHUTDOWN_START);

with a conditional check and _exit()?
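
A rough sketch of what such a conditional check could look like follows. It is a standalone illustration rather than the actual MDCache code. The enum below is a hypothetical stand-in (only SHUTDOWN_START appears in this ticket), and killpoint_hit, maybe_kill_at and the exit code 120 are assumptions.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical stand-in for the MDS kill-point enum.
enum class KILL_SHUTDOWN_AT { NONE, SHUTDOWN_START };

// Assumption: same exit code the ticket mentions for ceph-mgr.
constexpr int QA_KILLPOINT_EXIT_CODE = 120;

// True when the configured kill point matches the point being executed.
inline bool killpoint_hit(KILL_SHUTDOWN_AT configured, KILL_SHUTDOWN_AT here) {
  return configured == here;
}

// Replaces the assert-based kill: instead of aborting through a failed
// assertion (SIGABRT, stack trace, core), leave the process immediately.
inline void maybe_kill_at(KILL_SHUTDOWN_AT configured, KILL_SHUTDOWN_AT here) {
  if (killpoint_hit(configured, here)) {
    _exit(QA_KILLPOINT_EXIT_CODE);
  }
}
```

In this form the call site would read maybe_kill_at(kill_shutdown_at, KILL_SHUTDOWN_AT::SHUTDOWN_START) where the ceph_assert is today. When no kill point is configured, the helper simply returns.
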

Actions #2

Updated by Patrick Donnelly 16 days ago

Venky Shankar wrote in #note-1:

@Patrick Donnelly Are you talking about TestShutdownKillpoints() in test_failover? If yes, you are suggesting changing, e.g.:

src/mds/MDCache.cc: ceph_assert(kill_shutdown_at != KILL_SHUTDOWN_AT::SHUTDOWN_START);

with a conditional check and _exit()?

That's right.

Actions #3

Updated by Patrick Donnelly 16 days ago

  • Description updated (diff)
Actions #4

Updated by Venky Shankar 15 days ago

Patrick Donnelly wrote in #note-2:

Venky Shankar wrote in #note-1:

@Patrick Donnelly Are you talking about TestShutdownKillpoints() in test_failover? If yes, you are suggesting changing, e.g.:

src/mds/MDCache.cc: ceph_assert(kill_shutdown_at != KILL_SHUTDOWN_AT::SHUTDOWN_START);

with a conditional check and _exit()?

That's right.

Fair enough.

Actions #5

Updated by Venky Shankar 15 days ago

  • Assignee set to Neeraj Pratap Singh

Neeraj, please take this one.
