Bug #65647


Evicted kernel client may get stuck after reconnect

Added by Mykola Golub 24 days ago. Updated 2 days ago.

Status: Fix Under Review
Priority: Normal
Assignee: Xiubo Li
Category: Correctness/Safety
Target version: v20.0.0
% Done: 0%
Backport: reef,squid
Regression: No
Severity: 3 - minor

Description

Our customer was observing sporadic "client isn't responding to mclientcaps(revoke)" issues, so they configured auto eviction without blocklisting:

mds               advanced  mds_cap_revoke_eviction_timeout        600.000000                                                                                            
mds               advanced  mds_session_blocklist_on_evict         false                                                                                                 
mds               advanced  mds_session_blocklist_on_timeout       false                                                                                                 

It works most of the time: after eviction the client opens a new connection and the k8s pods can successfully bind to the volume. But sometimes the client gets stuck: the pods fail to bind to the volume until the host is rebooted, although on the mds side the session is seen in the "open" state and looks just idle. If we manually evict such a session (`ceph tell mds.0 session kill id`) it helps and the pods can bind to the volume again.

From the mds log it looks like in these sporadic cases we always observe a "denied reconnect attempt" message, like below:

Apr 08 14:35:03 ceph04 ceph-mds[48012]: mds.0.server evict_cap_revoke_non_responders: evicting cap revoke non-responder client id 19707773
Apr 08 14:35:03 ceph04 ceph-mds[48012]: mds.0.63169 Evicting client session 19707773 (v1:192.168.100.45:0/4254479370)
Apr 08 14:35:03 ceph04 ceph-mds[48012]: log_channel(cluster) log [INF] : Evicting client session 19707773 (v1:192.168.100.45:0/4254479370)
Apr 08 14:35:10 ceph04 ceph-mds[48012]: mds.0.server no longer in reconnect state, ignoring reconnect, sending close
Apr 08 14:35:10 ceph04 ceph-mds[48012]: log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:active) from client.19707773 v1:192.168.100.45:0/4254479370 after 1.68543e+07 (allowed interval 45)

Note, these are just a few lines from the logs, for illustration. The full log is attached.

So, normally, when a client successfully reconnects we may see only "no longer in reconnect state, ignoring reconnect, sending close" messages, which according to the code are printed when the client sends a "reconnect" message while the session is in the close state.

But the "denied reconnect attempt" message seems to be possible only when the session is in the open state. My interpretation (although it may be incorrect) is the following. Normally (when it works) the sequence looks like this:

1) on eviction, the mds closes the session
2) the client notices this and sends "reconnect" message
3) the mds sees the session in close state and "ignores reconnect, sending close".
4) the client sends "open" and the session becomes "open" again.

And in the problematic case it looks to me like after (4) the mds receives the "reconnect" message from (2) again (a resent dup?) and this makes it get stuck.
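
To make the hypothesized sequence concrete, here is a minimal, purely illustrative C++ sketch. It is not the MDS code (the real logic lives in Server::handle_client_reconnect); the state names and branch conditions are simplified assumptions that only mirror the two log messages quoted above.

    // Toy model of the hypothesized sequence (not Ceph code): replay steps
    // (1)-(4) from above and then deliver a duplicated "reconnect" message.
    #include <iostream>

    enum class SessionState { Open, Closed };

    // Simplified stand-in for the MDS handling a client "reconnect" while the
    // MDS itself is already up:active (i.e. past its own reconnect phase).
    void handle_reconnect(SessionState session) {
        if (session == SessionState::Closed) {
            // Step (3): the session is closed, so the reconnect is ignored.
            std::cout << "no longer in reconnect state, ignoring reconnect, sending close\n";
        } else {
            // Hypothesized problem: a stale duplicate of (2) arrives after the
            // session was reopened in step (4).
            std::cout << "denied reconnect attempt (mds is up:active)\n";
        }
    }

    int main() {
        SessionState session = SessionState::Closed;  // (1) eviction closes the session
        handle_reconnect(session);                     // (2)+(3) first reconnect is ignored
        session = SessionState::Open;                  // (4) client reopens the session

        handle_reconnect(session);                     // duplicate of (2) arriving late
        return 0;
    }

If the MDS really did process such a late duplicate, the second call above would correspond to the "denied reconnect attempt (mds is up:active)" line that we only see in the problematic cases.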

The client kernel version:

"kernel_version": "5.15.0-92-generic",

The mds version:

"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"

Files

Actions #1

Updated by Xiubo Li 23 days ago

I think you have enabled recover_session in the kclient?

In general, this option should not be used by most clients that have stable connections to the MDS's, but in environments where connectivity may be spotty, and availability is more important than correctness, admins may want to consider enabling it.

For more detail about this option, please see https://ceph.io/en/news/blog/2020/automatic-cephfs-recovery-after-blacklisting/.

Actions #2

Updated by Xiubo Li 23 days ago

Xiubo Li wrote in #note-1:

I think you have enabled recover_session in the kclient?

[...]

For more detail about this option, please see https://ceph.io/en/news/blog/2020/automatic-cephfs-recovery-after-blacklisting/.

Pardon, please just ignore this; the reconnect was from the reset logic.

Actions #3

Updated by Xiubo Li 23 days ago

Mykola Golub wrote:

Our customer was observing sporadic "client isn't responding to mclientcaps(revoke)" issues, so they configured auto eviction without blocklisting:
[...]
It works most of the time: after eviction the client opens a new connection and the k8s pods can successfully bind to the volume. But sometimes the client gets stuck: the pods fail to bind to the volume until the host is rebooted, although on the mds side the session is seen in the "open" state and looks just idle. If we manually evict such a session (`ceph tell mds.0 session kill id`) it helps and the pods can bind to the volume again.

From the mds log it looks like in these sporadic cases we always observe a "denied reconnect attempt" message, like below:
[...]
Note, these are just a few lines from the logs, for illustration. The full log is attached.

So, normally, when a client successfully reconnects we may see only "no longer in reconnect state, ignoring reconnect, sending close" messages, which according to the code are printed when the client sends a "reconnect" message while the session is in the close state.

But the "denied reconnect attempt" message seems to be possible only when the session is in the open state. My interpretation (although it may be incorrect) is the following. Normally (when it works) the sequence looks like this:

1) on eviction, the mds closes the session
2) the client notices this and sends "reconnect" message
3) the mds sees the session in close state and "ignores reconnect, sending close".
4) the client sends "open" and the session becomes "open" again.

I didn't see any new session-open logs just after the reconnect.

When the MDS resets the socket connection the kclient will detect it and then try to send a reconnect, while the MDS will just deny it since it's not in the RECONNECT state any more.

All the logs look normal to me.

And in the problematic case it looks to me like after (4) the mds receives the "reconnect" message from (2) again (a resent dup?) and this makes it get stuck.

From the logs I didn't see this.

And do you have the kclient side logs? What are the errors there?

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

debug mds = 20
debug ms = 1

The client kernel version:

"kernel_version": "5.15.0-92-generic",

The mds version:

"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)"

Actions #4

Updated by Mykola Golub 23 days ago

Xiubo Li wrote in #note-3:

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

Hi Xiubo, thank you for the reply. It is our customer's production cluster and the issue is very sporadic, I would say once per month or so, so having the debug log enabled for all that time while waiting for the issue to reoccur would be problematic. It is also problematic to get access to the client hosts to collect debug info there (unless you can provide exact instructions on how to get the debug information we need; I don't have much experience with cephfs kernel client troubleshooting). Anyway, if I get any new data I will update the ticket.

The log I attached is not very verbose, especially for the ms subsystem, so my assumptions about the "open session" state are based on looking at the code [1]: both the "no longer in reconnect state, ignoring reconnect, sending close" and "denied reconnect attempt from" messages are printed by Server::handle_client_reconnect, i.e. they should appear when handling a "reconnect" message, and the first message is printed when the session is not in the open state on the mds side, while the second is printed when the session is in the open state.

[1] https://github.com/ceph/ceph/blob/c61472c83ce76869b0fe076c19ed2f4bcc3fb4af/src/mds/Server.cc#L1471

Actions #5

Updated by Venky Shankar 18 days ago

Mykola Golub wrote in #note-4:

Xiubo Li wrote in #note-3:

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

Hi Xiubo, thank you for the reply. It is our customer's production cluster and the issue is very sporadic, I would say once per month or so, so having the debug log enabled for all that time while waiting for the issue to reoccur would be problematic. It is also problematic to get access to the client hosts to collect debug info there (unless you can provide exact instructions on how to get the debug information we need; I don't have much experience with cephfs kernel client troubleshooting). Anyway, if I get any new data I will update the ticket.

The log I attached is not very verbose, especially for the ms subsystem, so my assumptions about the "open session" state are based on looking at the code [1]: both the "no longer in reconnect state, ignoring reconnect, sending close" and "denied reconnect attempt from" messages are printed by Server::handle_client_reconnect, i.e. they should appear when handling a "reconnect" message, and the first message is printed when the session is not in the open state on the mds side, while the second is printed when the session is in the open state.

[1] https://github.com/ceph/ceph/blob/c61472c83ce76869b0fe076c19ed2f4bcc3fb4af/src/mds/Server.cc#L1471

I don't get it. The denied reconnect attempt cluster log is right after the no longer in reconnect state ... log message.

    if (reconnect_all_deny) {
      dout(1) << "mds_deny_all_reconnect was set to speed up reboot phase, ignoring reconnect, sending close" << dendl;
    } else {
      dout(1) << "no longer in reconnect state, ignoring reconnect, sending close" << dendl;
    }
    mds->clog->info() << "denied reconnect attempt (mds is " 
       << ceph_mds_state_name(mds->get_state())
       << ") from " << m->get_source_inst()
       << " after " << delay << " (allowed interval " << g_conf()->mds_reconnect_timeout << ")";
    deny = true;

... and further down below in the function, there is:

    if (!error_str.empty()) {
      deny = true;
      dout(1) << " " << error_str << ", ignoring reconnect, sending close" << dendl;
      mds->clog->info() << "denied reconnect attempt from " 
                        << m->get_source_inst() << " (" << error_str << ")";
    }

The tracker description mentions denied reconnect attempt (mds is ... for the problematic case, but that should always show up in cluster log. I'm missing something.... :/

Actions #6

Updated by Xiubo Li 18 days ago

Venky Shankar wrote in #note-5:

Mykola Golub wrote in #note-4:

Xiubo Li wrote in #note-3:

Is it possible to enable the mds debug logs? Let's see whether there are other logs about this.

Hi Xiubo, thank you for the reply. It is our customer's production cluster and the issue is very sporadic, I would say once per month or so, so having the debug log enabled for all that time while waiting for the issue to reoccur would be problematic. It is also problematic to get access to the client hosts to collect debug info there (unless you can provide exact instructions on how to get the debug information we need; I don't have much experience with cephfs kernel client troubleshooting). Anyway, if I get any new data I will update the ticket.

The log I attached is not very verbose, especially for the ms subsystem, so my assumptions about the "open session" state are based on looking at the code [1]: both the "no longer in reconnect state, ignoring reconnect, sending close" and "denied reconnect attempt from" messages are printed by Server::handle_client_reconnect, i.e. they should appear when handling a "reconnect" message, and the first message is printed when the session is not in the open state on the mds side, while the second is printed when the session is in the open state.

[1] https://github.com/ceph/ceph/blob/c61472c83ce76869b0fe076c19ed2f4bcc3fb4af/src/mds/Server.cc#L1471

I don't get it. The denied reconnect attempt cluster log is right after the no longer in reconnect state ... log message.

[...]

... and further down below in the function, there is:

[...]

The tracker description mentions denied reconnect attempt (mds is ... for the problematic case, but that should always show up in cluster log. I'm missing something.... :/

All these logs look normal, and that's because in this case it would trigger the socket reset on the client side and then the client tried to reconnect to the sessions, but the MDS denied it since it was no longer in the reconnect state.

I didn't see any attempt to open the session before or after this.

Actions #7

Updated by Venky Shankar 18 days ago

  • Category set to Correctness/Safety
  • Assignee set to Xiubo Li
  • Target version set to v20.0.0
  • Backport set to reef,squid
Actions #8

Updated by Venky Shankar 18 days ago

  • Status changed from New to Triaged
Actions #9

Updated by Mykola Golub 17 days ago

Xiubo Li wrote in #note-6:

Venky Shankar wrote in #note-5:

The tracker description mentions denied reconnect attempt (mds is ... for the problematic case, but that should always show up in cluster log. I'm missing something.... :/

All these logs look normal, and that's because in this case it would trigger the socket reset on the client side and then the client tried to reconnect to the sessions, but the MDS denied it since it was no longer in the reconnect state.

I didn't see any attempt to open the session before or after this.

Hi Xiubo and Venky,

In the attached logs you may find many cases where clients were auto-evicted. In most of the cases the clients successfully reconnected after this. In some cases they didn't (the client host needed a reboot, or one had to tell the mds `session kill` for the problematic session to allow the pods to successfully bind to the problematic volume). We were not able to check all the cases so cannot be 100% sure, but it seemed that for the problematic cases we always observed "denied reconnect attempt (mds is up:active) from client.*" messages. In the other ("normal") cases we observed only "no longer in reconnect state, ignoring reconnect, sending close" messages.

If you think this behavior is expected, how do you explain the difference? Why do we observe "denied reconnect attempt" messages in some cases but not in others, and why do we have trouble with the client in the first case?

Note, normally (e.g. if I manually evict a client) I don't see the "denied reconnect attempt (mds is up:active) from client.*" message; I failed to reproduce it, and it looks to me like a race.

Actions #10

Updated by Venky Shankar 15 days ago

Mykola, thank you for the update. It's very likely these config settings have exposed a bug in the MDS. I did a quick search in our qa suite and there isn't any test where mds_session_blocklist_on_evict is disabled. I should also point you to [0], which was specifically introduced to recover a blocklisted session; the respective tests exist in our qa suite.

I should mention that this is to be used in environments where availability matters more than correctness (which comes close to the config settings mentioned in the tracker description).

[0]: https://ceph.io/en/news/blog/2020/automatic-cephfs-recovery-after-blacklisting/

Actions #11

Updated by Xiubo Li 12 days ago

Mykola Golub wrote in #note-9:

Xiubo Li wrote in #note-6:

Venky Shankar wrote in #note-5:

The tracker description mentions denied reconnect attempt (mds is ... for the problematic case, but that should always show up in cluster log. I'm missing something.... :/

All these logs look normal, and that's because in this case it would trigger the socket reset on the client side and then the client tried to reconnect to the sessions, but the MDS denied it since it was no longer in the reconnect state.

I didn't see any attempt to open the session before or after this.

Hi Xiubo and Venky,

In the attached logs you may find many cases where clients were auto-evicted. In most of the cases the clients successfully reconnected after this. In some cases they didn't (the client host needed a reboot, or one had to tell the mds `session kill` for the problematic session to allow the pods to successfully bind to the problematic volume). We were not able to check all the cases so cannot be 100% sure, but it seemed that for the problematic cases we always observed "denied reconnect attempt (mds is up:active) from client.*" messages. In the other ("normal") cases we observed only "no longer in reconnect state, ignoring reconnect, sending close" messages.

If you think this behavior is expected, how do you explain the difference? Why do we observe "denied reconnect attempt" messages in some cases but not in others, and why do we have trouble with the client in the first case?

From your logs there were several failure cases besides being denied:

Jan 16 11:52:39 ceph04 ceph-mds[48012]: mds.0.server  ignoring msg from not-open sessionclient_reconnect(9 caps 3 realms ) v3

Please note that after a client is evicted the reconnect won't work anyway, and it is useless to send a reconnect request if it wants the client to continue to work.

Instead it should do a remount or recover.

So please tell me whether you have enabled the recover_session mount option, as Venky mentioned in [0] in comment #10? If not, could you tell me how your setup reopens the sessions after being evicted, by doing a remount?

Thanks

Actions #12

Updated by Mykola Golub 6 days ago

Xiubo Li wrote in #note-11:

From your logs there were several failure cases besides being denied:

[...]

Please note that after a client is evicted the reconnect won't work anyway, and it is useless to send a reconnect request if it wants the client to continue to work.

Instead it should do a remount or recover.

So please tell me whether you have enabled the recover_session mount option, as Venky mentioned in [0] in comment #10? If not, could you tell me how your setup reopens the sessions after being evicted, by doing a remount?

The cephfs subvolumes are used by k8s pods and we don't have access to the client side, so we will need to ask the customer to recheck, but my current assumption is that they don't use the recover_session=clean mount option. Also note, as I mentioned earlier, we have "mds_session_blocklist_on_evict=false" on the server side to prevent client blocklisting, so I don't think this option matters here. When testing this in a local setup we saw that a remount was not needed after evict with these settings. But also note that in the customer case, when this sporadic issue occurred, a remount did not work either: i.e. the failed pods were restarted but they failed to bind the volume.

Actions #13

Updated by Xiubo Li 5 days ago · Edited

Mykola Golub wrote in #note-12:

Xiubo Li wrote in #note-11:

From your logs there were several failure cases besides being denied:

[...]

Please note that after a client is evicted the reconnect won't work anyway, and it is useless to send a reconnect request if it wants the client to continue to work.

Instead it should do a remount or recover.

So please tell me whether you have enabled the recover_session mount option, as Venky mentioned in [0] in comment #10? If not, could you tell me how your setup reopens the sessions after being evicted, by doing a remount?

The cephfs subvolumes are used by k8s pods and we don't have access to the client side, so we will need to ask the customer to recheck, but my current assumption is that they don't use the recover_session=clean mount option. Also note, as I mentioned earlier, we have "mds_session_blocklist_on_evict=false" on the server side to prevent client blocklisting, so I don't think this option matters here. When testing this in a local setup we saw that a remount was not needed after evict with these settings. But also note that in the customer case, when this sporadic issue occurred, a remount did not work either: i.e. the failed pods were restarted but they failed to bind the volume.

From the kclient side there are only 3 cases where it will trigger reopening the killed sessions by itself:

1. when the kclient detects the socket connection is closed or reset, it will try to send a reconnect request to the MDS, but the reconnect request will be dropped directly if the MDS is not in the RECONNECT state.

2. when recover_session is correctly set, the kclient will try to reopen the session; in this case the MDS mostly won't reject it.

3. remount it

So firstly, when the MDS kills a session it will trigger case 1) immediately, and this will certainly fail; then the second case is tried every 5 seconds. So I just suspect your customer is using the second case.

Actions #14

Updated by Xiubo Li 5 days ago

Xiubo Li wrote in #note-13:

Mykola Golub wrote in #note-12:

Xiubo Li wrote in #note-11:

From your logs there were several failure cases besides being denied:

[...]

Please note that after a client is evicted the reconnect won't work anyway, and it is useless to send a reconnect request if it wants the client to continue to work.

Instead it should do a remount or recover.

So please tell me whether you have enabled the recover_session mount option, as Venky mentioned in [0] in comment #10? If not, could you tell me how your setup reopens the sessions after being evicted, by doing a remount?

The cephfs subvolumes are used by k8s pods and we don't have access to the client side, so we will need to ask the customer to recheck, but my current assumption is that they don't use the recover_session=clean mount option. Also note, as I mentioned earlier, we have "mds_session_blocklist_on_evict=false" on the server side to prevent client blocklisting, so I don't think this option matters here. When testing this in a local setup we saw that a remount was not needed after evict with these settings. But also note that in the customer case, when this sporadic issue occurred, a remount did not work either: i.e. the failed pods were restarted but they failed to bind the volume.

From the kclient side there are only 3 cases where it will trigger reopening the killed sessions by itself:

1. when the kclient detects the socket connection is closed or reset, it will try to send a reconnect request to the MDS, but the reconnect request will be dropped directly if the MDS is not in the RECONNECT state.

2. when recover_session is correctly set, the kclient will try to reopen the session; in this case the MDS mostly won't reject it.

3. remount it

So firstly, when the MDS kills a session it will trigger case 1) immediately, and this will certainly fail; then the second case is tried every 5 seconds. So I just suspect your customer is using the second case.

Yeah, there is another case:

Without recover_session enabled, the MDS will just close the session; then on the client side the client will just remove the session entry from the session list, and when the client tries to issue new requests it will try to open the session again.

But when mds_session_blocklist_on_evict is enabled, the client's session opening request will be rejected by sending an MClientSession::SESSION_BLOCKLISTED flag back to the kclient. And then the kclient will stop the following requests.

Actions #15

Updated by Xiubo Li 5 days ago

  • Status changed from Triaged to In Progress
Actions #16

Updated by Xiubo Li 4 days ago

Added a test case for this https://github.com/ceph/ceph/pull/57458.

Actions #17

Updated by Xiubo Li 4 days ago

  • Status changed from In Progress to Fix Under Review
Actions #18

Updated by Xiubo Li 2 days ago

2794 Mar 05 12:40:21 ceph04 ceph-mds[48012]: mds.0.server  ignoring msg from not-open sessionclient_reconnect(9 caps 3 realms ) v3

This is because when the mds tries to kill a session it will abort the connection directly without sending a session close request to the client. Then on the client side the session state will be kept as open, and the client will receive the connection peer-reset notification. The kclient will then send a reconnect request to the MDS to try to reconnect, and at the same time it will try to resend the in-flight client requests.

Meanwhile on the MDS side the session has been closed and removed, so when it receives any client requests and the reconnect request it cannot find the corresponding session and will drop them directly. At the same time it will send a session close request to the kclient; then in the kclient the corresponding session will be marked as closed.

Then when the kclient tries to send any new client requests it will try to open the session first.

This is why it can recover by itself.
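
As a purely illustrative sketch (assumed names and simplified states, not actual kclient or MDS code), the self-recovery flow described above can be modelled like this:

    // Toy model of the self-recovery flow (illustrative only). The client still
    // believes its session is open while the MDS has already killed and removed
    // it; a dropped message answered with "close" flips the client-side session
    // to closed, and the next new request reopens it.
    #include <iostream>
    #include <string>

    enum class ClientSessionState { Open, Closed };

    // MDS side after "session kill": no session exists for this client, so any
    // reconnect or resent request is dropped and a session close is sent back.
    std::string mds_handle(bool mds_has_session, const std::string& msg) {
        return mds_has_session ? "reply:" + msg : "close";
    }

    int main() {
        ClientSessionState session = ClientSessionState::Open;  // client's view
        bool mds_has_session = false;                           // session was killed

        // The client detects the peer reset and resends the reconnect plus the
        // in-flight requests; all of them are answered with "close".
        for (const std::string& msg : {"reconnect", "inflight-request"}) {
            if (mds_handle(mds_has_session, msg) == "close")
                session = ClientSessionState::Closed;            // mark session closed
        }

        // The next new request sees a closed session and opens a new one first.
        if (session == ClientSessionState::Closed) {
            std::cout << "opening a new session, then resending the request\n";
            mds_has_session = true;
            session = ClientSessionState::Open;
        }
        return 0;
    }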

But I still haven't figured out in which case the kclient will send a reconnect request after this:

2790 Mar 05 12:40:21 ceph04 ceph-mds[48012]: mds.0.server no longer in reconnect state, ignoring reconnect, sending close

This really could cause an issue IMO. I just suspect that the connection reset is triggered more than once in some cases.
