Bug #65716


mds: dir merge can't progress due to fragment nested pins, blocking the quiesce_path and causing a quiesce timeout

Added by Patrick Donnelly 18 days ago. Updated 2 days ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, quiesce
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.bal operator() merging [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=2+5 state=1611399169|complete|auxsubtree f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00]                                                                                               
2024-04-30T06:01:37.391+0000 7f6043cf5640 20 mds.1.bal operator() : sib=[dir 0x10000000003.101* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=557 v=555 cv=161/161 dir_auth=1 ap=1+1 state=1611399169|complete|auxsubtree f(v1 5=0+5)->f(v1 5=0+5) n(v5 rc2024-04-30T06:01:36.142295+0000 b289393048 2310=2295+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15)->n(v5 rc2024-04-30T06:01:36.142295+0000 b291145581 2320=2305+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15) hs=5+0,ss=0+0 dirty=5 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844ff80], is_auth=1, should_merge=1                                                                      
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.bal operator()   all sibs under 101* [0x56044844ff80] should merge
2024-04-30T06:01:37.391+0000 7f6043cf5640  7 mds.1.cache merge_dir to 10* on [inode 0x10000000003 [...2,head] /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ rep@0.1 fragtree_t(*^3) v282 DISTEPHEMERALPIN f(v1 m2024-04-30T06:00:43.833789+0000 40=0+40) n(v5 rc2024-04-30T06:01:32.635001+0000 b7974804365 27028=26903+125)/n(v0 rc2024-04-30T06:00:14.152213+0000 1=0+1) (isnap sync r=39) (inest mix w=8 dirty) (ipolicy sync r=34) caps={14636=pAsLsXs/-@19,17608=pAsLsXs/-@9,22353=pAsLsXsFs/-@14,22377=pAsLsXs/-@22} | dirtyscattered=1 request=0 lock=3 importing=0 dirfrag=6 caps=1 dirwaiter=0 waiter=0 discoverbase=0 distepin 0x560448325180]                                                                                                            
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache  we are merging by 1 bits
2024-04-30T06:01:37.391+0000 7f6043cf5640  7 mds.1.cache request_start_internal request(mds.1:1964 nref=2) op 5376
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) auth_pin by 0x56044844fb00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=3+5 state=1611399169|complete|auxsubtree f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00] count now 3                                              
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) auth_pin by 0x56044844fb00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=4+5 state=1611403265|complete|auxsubtree|fragmenting f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00] count now 4                                  
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) freeze_dir + wait [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=4+5 state=1611403281|complete|auxsubtree|freezingdir|fragmenting f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00]                                              
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.101*) auth_pin by 0x56044844ff80 on [dir 0x10000000003.101* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=557 v=555 cv=161/161 dir_auth=1 ap=2+1 state=1611399169|complete|auxsubtree f(v1 5=0+5)->f(v1 5=0+5) n(v5 rc2024-04-30T06:01:36.142295+0000 b289393048 2310=2295+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15)->n(v5 rc2024-04-30T06:01:36.142295+0000 b291145581 2320=2305+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15) hs=5+0,ss=0+0 dirty=5 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844ff80] count now 2                                              
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.101*) auth_pin by 0x56044844ff80 on [dir 0x10000000003.101* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=557 v=555 cv=161/161 dir_auth=1 ap=3+1 state=1611403265|complete|auxsubtree|fragmenting f(v1 5=0+5)->f(v1 5=0+5) n(v5 rc2024-04-30T06:01:36.142295+0000 b289393048 2310=2295+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15)->n(v5 rc2024-04-30T06:01:36.142295+0000 b291145581 2320=2305+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15) hs=5+0,ss=0+0 dirty=5 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844ff80] count now 3                                  
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.101*) freeze_dir + wait [dir 0x10000000003.101* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=557 v=555 cv=161/161 dir_auth=1 ap=3+1 state=1611403281|complete|auxsubtree|freezingdir|fragmenting f(v1 5=0+5)->f(v1 5=0+5) n(v5 rc2024-04-30T06:01:36.142295+0000 b289393048 2310=2295+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15)->n(v5 rc2024-04-30T06:01:36.142295+0000 b291145581 2320=2305+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15) hs=5+0,ss=0+0 dirty=5 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844ff80]                                              
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache fragment_mark_and_complete [0x56044844fb00,0x56044844ff80] on [inode 0x10000000003 [...2,head] /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ rep@0.1 fragtree_t(*^3) v282 DISTEPHEMERALPIN f(v1 m2024-04-30T06:00:43.833789+0000 40=0+40) n(v5 rc2024-04-30T06:01:32.635001+0000 b7974804365 27028=26903+125)/n(v0 rc2024-04-30T06:00:14.152213+0000 1=0+1) (isnap sync r=39) (inest mix w=8 dirty) (ipolicy sync r=34) caps={14636=pAsLsXs/-@19,17608=pAsLsXs/-@9,22353=pAsLsXsFs/-@14,22377=pAsLsXs/-@22} | dirtyscattered=1 request=0 lock=3 importing=0 dirfrag=6 caps=1 dirwaiter=0 waiter=0 discoverbase=0 distepin 0x560448325180]                                                                  
2024-04-30T06:01:37.391+0000 7f6043cf5640 15 mds.1.cache  marking [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=4+5 state=1611403281|complete|auxsubtree|freezingdir|fragmenting f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00]                                                                               
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) auth_unpin by 0x56044844fb00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=3+5 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00] count now 3       
2024-04-30T06:01:37.391+0000 7f6043cf5640 15 mds.1.cache  marking [dir 0x10000000003.101* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=557 v=555 cv=161/161 dir_auth=1 ap=3+1 state=1611403281|complete|auxsubtree|freezingdir|fragmenting f(v1 5=0+5)->f(v1 5=0+5) n(v5 rc2024-04-30T06:01:36.142295+0000 b289393048 2310=2295+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15)->n(v5 rc2024-04-30T06:01:36.142295+0000 b291145581 2320=2305+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15) hs=5+0,ss=0+0 dirty=5 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844ff80]                                                                               
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.101*) auth_unpin by 0x56044844ff80 on [dir 0x10000000003.101* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=557 v=555 cv=161/161 dir_auth=1 ap=2+1 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 5=0+5)->f(v1 5=0+5) n(v5 rc2024-04-30T06:01:36.142295+0000 b289393048 2310=2295+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15)->n(v5 rc2024-04-30T06:01:36.142295+0000 b291145581 2320=2305+15)/n(v5 rc2024-04-30T06:01:31.201938+0000 b276857196 2139=2124+15) hs=5+0,ss=0+0 dirty=5 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844ff80] count now 2

..

2024-04-30T06:01:55.355+0000 7f60474fc640  7 mds.1.cache request_start_internal request(mds.1:6535 nref=2) op 5384
2024-04-30T06:01:55.355+0000 7f60474fc640  5 mds.1.cache dispatch_quiesce_path: dispatching
2024-04-30T06:01:55.355+0000 7f60474fc640  7 mds.1.cache traverse: opening base ino 0x1 snap head
2024-04-30T06:01:55.355+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 0 'volumes' snapid head
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x1) lookup (volumes, 'head')
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x1)   hit -> (volumes,head)
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:6535 nref=5)
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=18),f=0x1)]
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=18) [dentry #0x1/volumes [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000000 state=0 | lock=1 inodepin=1 0x5604482e2500]
2024-04-30T06:01:55.355+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=18) on [dentry #0x1/volumes [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000000 state=0 | lock=1 inodepin=1 0x5604482e2500]
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=19) [dentry #0x1/volumes [2,head] rep@0.1 (dn sync r=19) (dversion lock) v=0 ino=0x10000000000 state=0 | lock=1 inodepin=1 0x5604482e2500]
2024-04-30T06:01:55.355+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 1 '_nogroup' snapid head
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000000) lookup (_nogroup, 'head')
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000000)   hit -> (_nogroup,head)
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:6535 nref=5)
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=18),f=0x1)]
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=18) [dentry #0x1/volumes/_nogroup [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000001 state=0 | lock=1 inodepin=1 0x5604482e2780]
2024-04-30T06:01:55.355+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=18) on [dentry #0x1/volumes/_nogroup [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000001 state=0 | lock=1 inodepin=1 0x5604482e2780]
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=19) [dentry #0x1/volumes/_nogroup [2,head] rep@0.1 (dn sync r=19) (dversion lock) v=0 ino=0x10000000001 state=0 | lock=1 inodepin=1 0x5604482e2780]
2024-04-30T06:01:55.355+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 2 'subvol_quiesce' snapid head
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000001) lookup (subvol_quiesce, 'head')
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000001)   hit -> (subvol_quiesce,head)
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:6535 nref=5)
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=18),f=0x1)]
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=18) [dentry #0x1/volumes/_nogroup/subvol_quiesce [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000002 state=0 | lock=1 inodepin=1 0x5604482e2a00]
2024-04-30T06:01:55.355+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=18) on [dentry #0x1/volumes/_nogroup/subvol_quiesce [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000002 state=0 | lock=1 inodepin=1 0x5604482e2a00]
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=19) [dentry #0x1/volumes/_nogroup/subvol_quiesce [2,head] rep@0.1 (dn sync r=19) (dversion lock) v=0 ino=0x10000000002 state=0 | lock=1 inodepin=1 0x5604482e2a00]
2024-04-30T06:01:55.355+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 3 '50c3fa66-d912-4010-923a-8660f36cff84' snapid head
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000002) lookup (50c3fa66-d912-4010-923a-8660f36cff84, 'head')
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000002)   hit -> (50c3fa66-d912-4010-923a-8660f36cff84,head)
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:6535 nref=5)
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=18),f=0x1)]
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=18) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:55.355+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=18) on [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=19) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=19) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:55.355+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 4 'tmp.TdX3PTfa5M' snapid head
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000003.100*) lookup (tmp.TdX3PTfa5M, 'head')
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000003.100*)   hit -> (tmp.TdX3PTfa5M,head)
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:6535 nref=5)
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync),f=0x1)]
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker  must authpin [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]
2024-04-30T06:01:55.359+0000 7f60474fc640  7 mds.1.locker rdlock_finish on (dn sync r=18) on [dentry #0x1/volumes [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000000 state=0 | lock=1 inodepin=1 0x5604482e2500]
2024-04-30T06:01:55.359+0000 7f60474fc640  7 mds.1.locker rdlock_finish on (dn sync r=18) on [dentry #0x1/volumes/_nogroup [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000001 state=0 | lock=1 inodepin=1 0x5604482e2780]
2024-04-30T06:01:55.359+0000 7f60474fc640  7 mds.1.locker rdlock_finish on (dn sync r=18) on [dentry #0x1/volumes/_nogroup/subvol_quiesce [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000002 state=0 | lock=1 inodepin=1 0x5604482e2a00]
2024-04-30T06:01:55.359+0000 7f60474fc640  7 mds.1.locker rdlock_finish on (dn sync r=18) on [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:55.359+0000 7f60474fc640 10 mds.1.locker  can't auth_pin (freezing?), waiting to authpin [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]
2024-04-30T06:01:55.359+0000 7f60474fc640 10 mds.1.cache traverse: failed to rdlock (dn sync) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]

From: /teuthology/pdonnell-2024-04-30_05:04:19-fs-wip-pdonnell-testing-20240429.210911-debug-distro-default-smithi/7680431/remote/smithi066/log/ceph-mds.c.log.1.gz


Related issues 1 (1 open, 0 closed)

Related to CephFS - Cleanup #66074: test_quiesce.py: Improve the quiesce vs fragmentation test (New)

Actions #1

Updated by Leonid Usov 17 days ago

@Patrick Donnelly, so is this a deadlock between the operations, or just an unfortunate timing of the quiesce which would eventually succeed, but only after the root unfroze?

Actions #2

Updated by Patrick Donnelly 17 days ago

Leonid Usov wrote in #note-1:

@Patrick Donnelly, so is this a deadlock between the operations, or just an unfortunate timing of the quiesce which would eventually succeed, but only after the root unfroze?

Unfortunately, it's still a form of deadlock, just a far less likely one.

Actions #3

Updated by Leonid Usov 17 days ago

  • Component(FS) quiesce added
  • Component(FS) deleted (MDS)
Actions #4

Updated by Leonid Usov 17 days ago

  • Component(FS) MDS added
Actions #5

Updated by Leonid Usov 16 days ago

The problem here doesn't seem to be quiesce-related.

We can see that the path traversal didn't attempt to authpin the fragmenting dir:

2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000002)   hit -> (50c3fa66-d912-4010-923a-8660f36cff84,head)
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:6535 nref=5)
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=18),f=0x1)]
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:55.355+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=18) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:55.355+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=18) on [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=18) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=19) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=19) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]

However, it did take the rdlock on it. Then it moved on to the inode in question, which happened to be a child of this dir, and there it did attempt to auth_pin:

2024-04-30T06:01:55.355+0000 7f60474fc640 10 mds.1.locker  must authpin [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]

2024-04-30T06:01:55.359+0000 7f60474fc640 10 mds.1.locker  can't auth_pin (freezing?), waiting to authpin [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]

The difference is that we don't attempt to authpin replicated directories which can be rdlocked immediately:

    } else if (p.is_rdlock()) {
      dout(20) << " must rdlock " << *lock << " " << *object << dendl;
      if (object->is_auth()) {
        mustpin.insert(object);
      } else if (!object->is_auth() &&
                 !lock->can_rdlock(client)) {      // we might have to request an rdlock
        dout(15) << " will also auth_pin " << *object
                 << " in case we need to request a rdlock" << dendl;
        mustpin.insert(object);
      }
    } else {

I suspect that keeping the rdlock on the fragmenting dir while waiting for the unfreeze prevents the fragmenting from progressing.
Note that we never got to quiescing just yet, so this deadlock could happen with any other request that tries to traverse into the fragmenting dir.

I just realized that I need to look at rank 0, the owner of the fragmenting dir. Our rdlock should have been replicated there and is probably holding up the process from that side.

Actions #6

Updated by Leonid Usov 16 days ago

  • Subject changed from mds: quiesce_path blocks on acquiring auth_pins for dentries to root inode to be quiesced to mds: dir merge can't progress due to fragment nested pins, blocking the quiesce_path and causing a quiesce timeout

I made some progress.

The directory represented by the inode 0x10000000003 is owned by rank 0; we are the replica (rank 1).
We are currently merging two fragments, .100* and .101*, into one, .10*. Both fragments are under our authority.

The process of fragmentation is hanging on WAIT_FROZEN for both fragments. I am looking at the fragment .100* because it was matched by the path traversal. And that fragment has an outstanding auth pin by 0x56044844fb00:

2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) auth_pin by 0x56044844fb00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=3+5 state=1611399169|complete|auxsubtree f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00] count now 3
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) auth_pin by 0x56044844fb00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=4+5 state=1611403265|complete|auxsubtree|fragmenting f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00] count now 4
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) auth_unpin by 0x56044844fb00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=3+5 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00] count now 3

One of the pins is unbalanced. The first pin is in fragment_freeze_dirs

  for (const auto& dir : dirs) {
    dir->auth_pin(dir);  // until we mark and complete them

and the second pin is in dir->freeze_dir()

bool CDir::freeze_dir()
{
  ceph_assert(!is_frozen());
  ceph_assert(!is_freezing());

  auth_pin(this);

Then, we unpin one of the pins in fragment_mark_and_complete

      dir->state_set(CDir::STATE_DNPINNEDFRAG);
      dir->auth_unpin(dir);

Looking later in the logs, we have plenty of opportunities to finish the freezing, but we don't:

2024-04-30T06:01:42.627+0000 7f60454f8640 10 mds.1.cache.dir(0x10000000003.100*) auth_pin by 0x56044c4f2900 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=763 cv=161/161 dir_auth=1 ap=2+4 state=1611452433|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag|assimrstat f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] count now 2
2024-04-30T06:01:43.095+0000 7f603eceb640 10 mds.1.cache.dir(0x10000000003.100*) auth_unpin by 0x56044c4f2900 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=764 cv=161/161 dir_auth=1 ap=1+4 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] count now 1
2024-04-30T06:01:52.099+0000 7f60454f8640 10 mds.1.cache.dir(0x10000000003.100*) auth_pin by 0x56044ec5c300 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=764 v=764 cv=161/161 dir_auth=1 ap=2+6 state=1611452433|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag|assimrstat f(v1 7=0+7)->f(v1 7=0+7) n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)->n(v7 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] count now 2
2024-04-30T06:01:52.391+0000 7f603eceb640 10 mds.1.cache.dir(0x10000000003.100*) auth_unpin by 0x56044ec5c300 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=765 cv=161/161 dir_auth=1 ap=1+7 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v7 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] count now 1
2024-04-30T06:02:01.703+0000 7f60454f8640 10 mds.1.cache.dir(0x10000000003.100*) auth_pin by 0x56044fd51b00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=765 v=765 cv=161/161 dir_auth=1 ap=2+9 state=1611452433|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag|assimrstat f(v1 7=0+7)->f(v1 7=0+7) n(v7 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)->n(v8 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] count now 2
2024-04-30T06:02:01.771+0000 7f603eceb640 10 mds.1.cache.dir(0x10000000003.100*) auth_unpin by 0x56044fd51b00 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=766 cv=161/161 dir_auth=1 ap=1+9 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v8 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] count now 1

It must be due to the dir_auth_pins check in maybe_finish_freeze:

void CDir::maybe_finish_freeze()
{
  if (dir_auth_pins != 0)
    return;
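
As an illustration, here is a standalone toy model of that gate (my own sketch, not the Ceph code; ToyDir and its methods are made up for this example). It models only the nested-pin check: the freeze can complete only once every auth pin nested under the frag has been dropped.

#include <cassert>
#include <iostream>

// Toy stand-in for a dirfrag; it models only the two counters relevant here.
struct ToyDir {
  int auth_pins = 0;         // pins on the frag itself
  int nested_auth_pins = 0;  // pins held by dentries/inodes inside the frag
  bool freezing = false, frozen = false;

  void freeze_dir() {        // like the quoted freeze_dir(): take a pin, mark freezing
    assert(!freezing && !frozen);
    ++auth_pins;
    freezing = true;
    maybe_finish_freeze();
  }
  void maybe_finish_freeze() {
    if (!freezing) return;
    if (nested_auth_pins != 0) return;  // the check quoted above keeps us waiting
    if (auth_pins != 1) return;         // only freeze_dir()'s own pin may remain
    freezing = false;
    frozen = true;                      // WAIT_FROZEN waiters would fire here
  }
  void nested_auth_pin()   { ++nested_auth_pins; }
  void nested_auth_unpin() { --nested_auth_pins; maybe_finish_freeze(); }
};

int main() {
  ToyDir d;
  d.nested_auth_pin();       // a request pins a dentry inside the frag
  d.freeze_dir();            // the merge starts: freezing, but it cannot finish
  d.nested_auth_pin();       // traffic keeps arriving on the frag's dentries
  d.nested_auth_unpin();
  std::cout << "freezing=" << d.freezing << " frozen=" << d.frozen << "\n";  // 1 0
  d.nested_auth_unpin();     // only when the nested count finally reaches 0 ...
  std::cout << "freezing=" << d.freezing << " frozen=" << d.frozen << "\n";  // 0 1
}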

And the nested auth pins keep coming:

2024-04-30T06:01:42.419+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=763 cv=161/161 dir_auth=1 ap=1+4 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492ae100 count now 1/4
2024-04-30T06:01:44.343+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=764 cv=161/161 dir_auth=1 ap=1+5 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492ad600 count now 1/5
2024-04-30T06:01:51.651+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=764 cv=161/161 dir_auth=1 ap=1+6 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492af180 count now 1/6
2024-04-30T06:01:52.311+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=765 v=764 cv=161/161 dir_auth=1 ap=2+7 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7)->f(v1 7=0+7) n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)->n(v7 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492af700 count now 2/7
2024-04-30T06:01:54.495+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=765 cv=161/161 dir_auth=1 ap=1+8 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v7 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492ae680 count now 1/8

This fragment's cdentries are heavily used before and during the fragmenting, and we just can't get to 0 nested pins on the dir.

2024-04-30T06:01:37.331+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=759 cv=161/161 dir_auth=1 ap=0+4 state=1611399169|complete|auxsubtree f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=0 0x56044844fb00] by 0x5604492ae100 count now 0/4
2024-04-30T06:01:37.351+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=761 v=759 cv=161/161 dir_auth=1 ap=1+5 state=1611399169|complete|auxsubtree f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b424802455 3437=3416+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00] by 0x5604492ae680 count now 1/5
2024-04-30T06:01:37.435+0000 7f603eceb640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins -1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=761 cv=161/161 dir_auth=1 ap=3+4 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b424802455 3437=3416+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x56044a152300 count now 3/4
2024-04-30T06:01:37.447+0000 7f603eceb640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins -1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=763 cv=161/161 dir_auth=1 ap=1+3 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x56044c923700 count now 1/3
2024-04-30T06:01:42.419+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=763 cv=161/161 dir_auth=1 ap=1+4 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492ae100 count now 1/4
2024-04-30T06:01:44.343+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=764 cv=161/161 dir_auth=1 ap=1+5 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492ad600 count now 1/5
2024-04-30T06:01:51.651+0000 7f60454f8640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=764 cv=161/161 dir_auth=1 ap=1+6 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v6 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00] by 0x5604492af180 count now 1/6

At this point, the quiesce request is just a victim of this never-ending fragmenting. There's no evidence yet of any kind of deadlock caused by the path traversal; I hope the rdlock on the parent dir won't be a problem.

Actions #7

Updated by Leonid Usov 16 days ago

@Patrick Donnelly I'm not ready to investigate why the freezing takes so long. Maybe it's one of those cases you mentioned to me. I don't see any mechanism in place that would actively revoke pins from clients, so this freezing might take arbitrarily long - right?

The only action item I can see here is to allow the quiesce_path to progress by forcing the authpin for a freezing parent, as we do in quiesce_inode.
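
For illustration, a minimal sketch of what such a forced auth pin could look like (a hypothetical ToyObject, not the actual MDCache/CDir interface): a request carrying a bypass-freezing hint is allowed to pin a freezing object, but never a frozen one.

#include <iostream>

// Hypothetical model of an auth-pinnable MDS object; not real Ceph code.
struct ToyObject {
  bool freezing = false, frozen = false;
  int auth_pins = 0;

  bool can_auth_pin(bool bypass_freezing) const {
    if (frozen) return false;                        // never pin a frozen object
    if (freezing && !bypass_freezing) return false;  // normal requests must wait
    return true;                                     // quiesce-style ops may proceed
  }
  bool try_auth_pin(bool bypass_freezing) {
    if (!can_auth_pin(bypass_freezing)) return false;
    ++auth_pins;
    return true;
  }
};

int main() {
  ToyObject dir;
  dir.freezing = true;                           // a merge is freezing the dir
  std::cout << dir.try_auth_pin(false) << "\n";  // 0: ordinary traversal blocks here
  std::cout << dir.try_auth_pin(true) << "\n";   // 1: a forced pin could let quiesce_path progress
}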

Actions #8

Updated by Patrick Donnelly 16 days ago

2024-04-30T06:01:36.107+0000 7f60474fc640  7 mds.1.cache request_start_internal request(mds.1:1457 nref=2) op 5384
2024-04-30T06:01:36.107+0000 7f60474fc640  5 mds.1.cache dispatch_quiesce_path: dispatching
2024-04-30T06:01:36.107+0000 7f60474fc640  7 mds.1.cache traverse: opening base ino 0x1 snap head
2024-04-30T06:01:36.107+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 0 'volumes' snapid head
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x1) lookup (volumes, 'head')
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x1)   hit -> (volumes,head)
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:1457 nref=5)
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=4),f=0x1)]
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=4) [dentry #0x1/volumes [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000000 state=0 | lock=1 inodepin=1 0x5604482e2500]
2024-04-30T06:01:36.107+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=4) on [dentry #0x1/volumes [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000000 state=0 | lock=1 inodepin=1 0x5604482e2500]
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=5) [dentry #0x1/volumes [2,head] rep@0.1 (dn sync r=5) (dversion lock) v=0 ino=0x10000000000 state=0 | lock=1 inodepin=1 0x5604482e2500]
2024-04-30T06:01:36.107+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 1 '_nogroup' snapid head
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000000) lookup (_nogroup, 'head')
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000000)   hit -> (_nogroup,head)
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:1457 nref=5)
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=4),f=0x1)]
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=4) [dentry #0x1/volumes/_nogroup [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000001 state=0 | lock=1 inodepin=1 0x5604482e2780]
2024-04-30T06:01:36.107+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=4) on [dentry #0x1/volumes/_nogroup [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000001 state=0 | lock=1 inodepin=1 0x5604482e2780]
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=5) [dentry #0x1/volumes/_nogroup [2,head] rep@0.1 (dn sync r=5) (dversion lock) v=0 ino=0x10000000001 state=0 | lock=1 inodepin=1 0x5604482e2780]
2024-04-30T06:01:36.107+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 2 'subvol_quiesce' snapid head
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000001) lookup (subvol_quiesce, 'head')
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000001)   hit -> (subvol_quiesce,head)
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:1457 nref=5)
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=4),f=0x1)]
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=4) [dentry #0x1/volumes/_nogroup/subvol_quiesce [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000002 state=0 | lock=1 inodepin=1 0x5604482e2a00]
2024-04-30T06:01:36.107+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=4) on [dentry #0x1/volumes/_nogroup/subvol_quiesce [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000002 state=0 | lock=1 inodepin=1 0x5604482e2a00]
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=5) [dentry #0x1/volumes/_nogroup/subvol_quiesce [2,head] rep@0.1 (dn sync r=5) (dversion lock) v=0 ino=0x10000000002 state=0 | lock=1 inodepin=1 0x5604482e2a00]
2024-04-30T06:01:36.107+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 3 '50c3fa66-d912-4010-923a-8660f36cff84' snapid head
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000002) lookup (50c3fa66-d912-4010-923a-8660f36cff84, 'head')
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000002)   hit -> (50c3fa66-d912-4010-923a-8660f36cff84,head)
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:1457 nref=5)
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync r=4),f=0x1)]
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync r=4) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:36.107+0000 7f60474fc640  7 mds.1.locker rdlock_start  on (dn sync r=4) on [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=4) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker  got rdlock on (dn sync r=5) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84 [2,head] rep@0.1 (dn sync r=5) (dversion lock) v=0 ino=0x10000000003 state=0 | lock=1 inodepin=1 0x5604482e2c80]
2024-04-30T06:01:36.107+0000 7f60474fc640 12 mds.1.cache traverse: path seg depth 4 'tmp.6xQ4pmgVyY' snapid head
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000003.100*) lookup (tmp.6xQ4pmgVyY, 'head')
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.cache.dir(0x10000000003.100*)   hit -> (tmp.6xQ4pmgVyY,head)
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker acquire_locks request(mds.1:1457 nref=5)
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  lov = [LockOp(l=(dn sync),f=0x1)]
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  auth_pin_nonblocking=0
2024-04-30T06:01:36.107+0000 7f60474fc640 20 mds.1.locker  must rdlock (dn sync) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.6xQ4pmgVyY [2,head] auth{0=1} (dversion lock) v=744 ino=0x10000000bd0 state=1610612736 | inodepin=1 replicated=1 dirty=1 0x5604492bc500]
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker  must authpin [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.6xQ4pmgVyY [2,head] auth{0=1} (dversion lock) v=744 ino=0x10000000bd0 state=1610612736 | inodepin=1 replicated=1 dirty=1 0x5604492bc500]
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.locker  auth_pinning [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.6xQ4pmgVyY [2,head] auth{0=1} (dversion lock) v=744 ino=0x10000000bd0 state=1610612736 | inodepin=1 replicated=1 dirty=1 0x5604492bc500]
2024-04-30T06:01:36.107+0000 7f60474fc640 10 mds.1.cache.den(0x10000000003.100* tmp.6xQ4pmgVyY) auth_pin by 0x56044d282a00 on [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.6xQ4pmgVyY [2,head] auth{0=1} (dversion lock) v=744 ap=1 ino=0x10000000bd0 state=1610612736 | inodepin=1 replicated=1 dirty=1 authpin=1 0x5604492bc500] now 1
2024-04-30T06:01:36.107+0000 7f60474fc640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=749 cv=161/161 dir_auth=1 ap=0+2 state=1611399169|complete|auxsubtree f(v1 7=0+7) n(v5 rc2024-04-30T06:01:35.173616+0000 b418863303 3412=3391+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=0 0x56044844fb00] by 0x56044d282a00 count now 0/2
...
2024-04-30T06:01:37.391+0000 7f6043cf5640 10 mds.1.cache.dir(0x10000000003.100*) freeze_dir + wait [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} pv=763 v=759 cv=161/161 dir_auth=1 ap=4+5 state=1611403281|complete|auxsubtree|freezingdir|fragmenting f(v1 7=0+7)->f(v1 7=0+7) n(v5 rc2024-04-30T06:01:36.137200+0000 b422998290 3426=3405+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21)->n(v5 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 0x56044844fb00]
...
2024-04-30T06:01:55.359+0000 7f60474fc640 10 mds.1.locker  can't auth_pin (freezing?), waiting to authpin [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]
2024-04-30T06:01:55.359+0000 7f60474fc640 10 mds.1.cache traverse: failed to rdlock (dn sync) [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]

mds.1:1457 adjusts nested authpins:

2024-04-30T06:01:36.107+0000 7f60474fc640 15 mds.1.cache.dir(0x10000000003.100*) adjust_nested_auth_pins 1 on [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=749 cv=161/161 dir_auth=1 ap=0+2 state=1611399169|complete|auxsubtree f(v1 7=0+7) n(v5 rc2024-04-30T06:01:35.173616+0000 b418863303 3412=3391+21)/n(v5 rc2024-04-30T06:01:31.513540+0000 b398123668 3208=3187+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=0 0x56044844fb00] by 0x56044d282a00 count now 0/2

dirfrag is then marked freezing by merge op, then new quiesce_path mds.1:6535 fails to authpin:

2024-04-30T06:01:55.359+0000 7f60474fc640 10 mds.1.locker  can't auth_pin (freezing?), waiting to authpin [dentry #0x1/volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/tmp.TdX3PTfa5M [2,head] auth{0=1} (dversion lock) v=756 ino=0x100000013a5 state=1610612738 | fragmenting=1 inodepin=1 replicated=1 dirty=1 0x5604492a9b80]
Actions #9

Updated by Patrick Donnelly 16 days ago

See also:

2024-04-30T06:01:55.355+0000 7f60454f8640 20 mds.1.journal EMetaBlob::add_dir_context(0x56044844fb00) reached unambig auth subtree, don't need 0x5604492a9900 at [dir 0x10000000003.100* /volumes/_nogroup/subvol_quiesce/50c3fa66-d912-4010-923a-8660f36cff84/ [2,head] auth{0=1} v=765 cv=161/161 dir_auth=1 ap=1+9 state=1611419665|complete|auxsubtree|freezingdir|fragmenting|dnpinnedfrag f(v1 7=0+7) n(v7 rc2024-04-30T06:01:36.137200+0000 b426890356 3448=3427+21) hs=7+0,ss=0+0 dirty=7 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 waiter=1 authpin=1 0x56044844fb00]

just before that second quiesce_path op in my previous comment. It shows the dirfrag is still freezing (not frozen!).

Actions #10

Updated by Leonid Usov 16 days ago

  • Assignee changed from Patrick Donnelly to Leonid Usov
  • Pull request ID set to 57250
Actions #11

Updated by Leonid Usov 6 days ago

There has been some development here. With the bypass freezing fix in place, we are still getting the timeouts. See https://qa-proxy.ceph.com/teuthology//leonidus-2024-05-12_10:14:03-fs-wip-lusov-quiesce-lock-auth-distro-default-smithi/7703256/ for an example.

The script has detected following requests that haven't completed:

'internal op quiesce_inode:mds.0:57429 fp=#0x100000068cd continuous want_bypass_freezing'
'internal op fragmentdir:mds.0:57002'
'client_request(client.4612:78238 unlink #0x10000003973/MDSTableServer.cc 2024-05-12T10:59:38.134353+0000 caller_uid=1000, caller_gid=1316{6,36,1000,1316,})'
'internal op fragmentdir:mds.0:57003'
'internal op fragmentdir:mds.0:57004'
'internal op quiesce_path:mds.0:57005 fp=#0x1 continuous want_bypass_freezing'
'internal op quiesce_inode:mds.0:57101 fp=#0x10000003973 continuous want_bypass_freezing'
'internal op quiesce_inode:mds.0:57006 fp=#0x1 continuous want_bypass_freezing'
'internal op quiesce_inode:mds.0:57007 fp=#0x10000000000 continuous want_bypass_freezing'
'internal op quiesce_inode:mds.0:57008 fp=#0x100000017b0 continuous want_bypass_freezing'
'internal op quiesce_inode:mds.0:57009 fp=#0x100000017b1 continuous want_bypass_freezing'
'internal op quiesce_inode:mds.0:57011 fp=#0x10000002ff6 continuous want_bypass_freezing'

We have here a split request that's freezing our Dir

2024-05-12T10:59:36.983+0000 7fb36d482640 10 mds.0.cache.dir(0x10000003973.001*) freeze_dir + wait [dir 0x10000003973.001* /client.0/tmp/ceph/src/mds/ [2,head] auth v=1556 cv=1007/1007 ap=2+10 state=1610878993|complete|freezingdir|fragmenting f(v0 m2024-05-12T10:59:31.982480+0000 5=5+0) n(v26 rc2024-05-12T10:59:31.984479+0000 b125104 5=5+0) hs=5+4,ss=99+0 dirty=104 | child=1 frozen=0 dirty=1 authpin=1 0x55cf1a4d1b00]

and we're quiescing this inode

2024-05-12T10:59:42.858+0000 7fb36ec85640 20 mds.0.cache dispatch_quiesce_inode request(mds.0:57429 nref=2 cnt wbf) quiescing [inode 0x100000068cd [215,head] /client.0/tmp/ceph/src/mds/MDSTableServer.cc auth v1501 DIRTYPARENT s=4906 n(v0 rc2024-05-12T10:59:08.255970+0000 b4906 1=1+0) caps={4612=pAsLsXsFsc/-@2},l=4612(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 dirty=1 authpin=0 0x55cf1ca76580]
..
2024-05-12T10:59:42.858+0000 7fb36ec85640 20 mds.0.cache.dir(0x10000003973.001*) allowing authpin with freezing

and then our quiesce is blocked on trying to xlock the filelock

2024-05-12T10:59:42.858+0000 7fb36ec85640  7 mds.0.locker xlock_start on (ifile sync) on [inode 0x100000068cd [215,head] /client.0/tmp/ceph/src/mds/MDSTableServer.cc auth v1501 ap=1 DIRTYPARENT s=4906 n(v0 rc2024-05-12T10:59:08.255970+0000 b4906 1=1+0) (iquiesce lock x=1 by request(mds.0:57429 nref=5 cnt wbf)) caps={4612=pAsLsXsFsc/-@2},l=4612(-1) | ptrwaiter=0 request=0 lock=2 caps=1 dirtyparent=1 dirty=1 authpin=1 0x55cf1ca76580]

for which it has to revoke caps from the client 4612

2024-05-12T10:59:42.858+0000 7fb36ec85640 20 mds.0.locker  client.4612 pending pAsLsXsFsc allowed pAsLsXsFc wanted -
2024-05-12T10:59:42.858+0000 7fb36ec85640  7 mds.0.locker    sending MClientCaps to client.4612 seq 3 new pending pAsLsXsFc was pAsLsXsFsc

but the client can't revoke caps! If we go a few seconds back, we'll see

2024-05-12T10:59:38.134+0000 7fb36ec85640 10 mds.0.locker process_request_cap_release client.4612 pFc on [inode 0x100000068cd [215,head] /client.0/tmp/ceph/src/mds/MDSTableServer.cc auth v1501 DIRTYPARENT s=4906 n(v0 rc2024-05-12T10:59:08.255970+0000 b4906 1=1+0) caps={4612=pAsLsXsFsc/-@2},l=4612(-1) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 dirty=1 authpin=0 0x55cf1ca76580]
2024-05-12T10:59:38.134+0000 7fb36ec85640  7 mds.0.locker  frozen, deferring

which is coming from

bool Locker::should_defer_client_cap_frozen(CInode *in)
{
  if (in->is_frozen())
    return true;

  /*
   * This policy needs to be AT LEAST as permissive as allowing a client
   * request to go forward, or else a client request can release something,
   * the release gets deferred, but the request gets processed and deadlocks
   * because when the caps can't get revoked.
   *
   * No auth_pin implies that there is no unstable lock and @in is not auth
   * pinnned by client request. If parent dirfrag is auth pinned by a lock
   * cache, later request from lock cache owner may forcibly auth pin the @in.
   */
  if (in->is_freezing() && in->get_num_auth_pins() == 0) {
    CDir* dir = in->get_parent_dir();
    if (!dir || !dir->is_auth_pinned_by_lock_cache())
      return true;
  }
  return false;
}

Here we see that client cap messages are deferred for a `freezing` parent, while our quiesce expects to be able to revoke caps on a freezing inode.
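
To spell out the resulting cycle (an illustration only; the strings below just paraphrase the log excerpts above, this is not Ceph code):

#include <iostream>
#include <map>
#include <string>

int main() {
  // who is waiting -> what it is waiting on, per the logs in this comment
  std::map<std::string, std::string> waits_for = {
    {"quiesce_inode xlock(ifile)", "caps revoked from client.4612"},
    {"caps revoked from client.4612", "dirfrag to finish freezing (cap release deferred)"},
    {"dirfrag to finish freezing (cap release deferred)",
     "quiesce_inode xlock(ifile)"},  // quiesce holds an auth pin nested under the frag
  };
  std::string node = "quiesce_inode xlock(ifile)";
  for (int i = 0; i < 3; ++i) {
    std::cout << node << "\n   waits on: ";
    node = waits_for.at(node);
    std::cout << node << "\n";
  }
  // we end up back where we started: a deadlock
}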

Unfortunately, it means that we need to change our strategy wrt `bypass_freezing`. We need to break through the fragmenting rather than ignore it, because otherwise, we'd have to change how cap messaging is allowed during freezing, which might be a bit too much.

Actions #12

Updated by Patrick Donnelly 5 days ago

Leonid Usov wrote in #note-11:

There has been some development here. With the bypass freezing fix in place, we are still getting the timeouts. See https://qa-proxy.ceph.com/teuthology//leonidus-2024-05-12_10:14:03-fs-wip-lusov-quiesce-lock-auth-distro-default-smithi/7703256/ for an example.

The script has detected the following requests that haven't completed:

[...]

We have here a split request that's freezing our Dir

[...]

and we're quiescing this inode

[...]

and then our quiesce is blocked on trying to xlock the filelock

[...]

for which it has to revoke caps from the client 4612

[...]

but the client can't revoke caps! If we go a few seconds back, we'll see

[...]

which is coming from

[...]

Here we see that client cap messages are deferred for a `freezing` parent, while our quiesce expects to be able to revoke caps on a freezing inode.

Unfortunately, it means that we need to change our strategy wrt `bypass_freezing`. We need to break through the fragmenting rather than ignore it, because otherwise, we'd have to change how cap messaging is allowed during freezing, which might be a bit too much.

I'm not sure I understand why we wouldn't process cap releases on a freezing directory. Were you able to do some code history review to find a good rationale?

Actions #13

Updated by Leonid Usov 2 days ago

  • Pull request ID changed from 57250 to 57332
Actions #14

Updated by Leonid Usov 2 days ago

I'm not sure I understand why we wouldn't process cap releases on a freezing directory. Were you able to do some code history review to find a good rationale?

I haven't dug into it. I agree that this deserves a review. However, I wasn't comfortable changing the behavior to fix quiesce deadlocks. This could have unexpected side effects due to changed timings of otherwise stabilized flows. I prefer a solution where quiesce doesn't interfere with the freezing process

Actions #15

Updated by Leonid Usov 1 day ago

  • Related to Cleanup #66074: test_quiesce.py: Improve the quiesce vs fragmentation test added