Watchers preventing images from being deleted

OpenStack colleagues might report problems purging images:

[root@cci-cinder-u01 ~]# rbd -c /etc/ceph/ceph.conf --id volumes --pool volumes trash ls
2ccb86bd4fca85 volume-3983f035-a47f-46e8-868c-04d2345c3786
5afa5e5a07b8bc volume-02d959fe-a693-4acb-95e2-ca04b965389b
8df764f0d51e64 volume-eb48e00f-ea31-4d28-91a1-4f8319724da7
99e74530298e95 volume-18fbb3e6-fb37-4547-8d27-dcbc5056c2b2
ebcc84aa45a3da volume-821b9755-dd42-4bf5-a410-384339a2d9f0

[root@cci-cinder-u01 ~]# rbd -c /etc/ceph/ceph.conf --id volumes --pool volumes trash purge
2021-02-17 15:42:46.911 7f674affd700 -1 librbd::image::PreRemoveRequest: 0x7f6744001880 check_image_watchers: image has watchers - not removing
Removing images: 0% complete...failed.

Find out who the watchers are, using the image ID from the left-hand column of the trash listing:

[15:52][root@p05517715y58557 (production:ceph/beesly/mon*2:peon) ~]# rados listwatchers -p volumes rbd_header.2ccb86bd4fca85
watcher=188.184.103.106:0/964233084 client.634461458 cookie=140076936413376
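
To check all trashed images in one go, a small loop over the trash listing can be used (a sketch, run from a node with admin credentials as in the example above; the first column of the trash listing is the image ID):

# rbd --pool volumes trash ls | awk '{print $1}' | \
    while read id; do echo "== ${id}"; rados -p volumes listwatchers "rbd_header.${id}"; done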

Get in touch with the owner of the machine; the easiest way to fix stuck watchers is to reboot it.
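
The watcher address is the client that still has the image open: typically the hypervisor running the VM the volume is attached to, or the VM itself if it maps the RBD image directly. When it is a VM, its OpenStack server can be located from the IP with admin credentials (a sketch; --ip takes a regex):

# openstack server list --all-projects --ip '188.184.103.106'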

Further information about the volume (this might require taking it out of the trash first, see the sketch below) can be found with

[18:31][root@p05517715y58557 (production:ceph/beesly/mon*2:peon) ~]# rbd info volumes/volume-00067659-3d1e-4e22-a5d7-212aba108500
rbd image 'volume-00067659-3d1e-4e22-a5d7-212aba108500':
    size 500 GiB in 128000 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: e8df4c4fe1aa8f
    block_name_prefix: rbd_data.e8df4c4fe1aa8f
    format: 2
    features: layering, striping, exclusive-lock, object-map
    op_features:
    flags:
    stripe unit: 4 MiB
    stripe count: 1

and with (no restore from the trash required)

[18:32][root@p05517715y58557 (production:ceph/beesly/mon*2:peon) ~]# rados stat -p volumes rbd_header.e8df4c4fe1aa8f
volumes/rbd_header.e8df4c4fe1aa8f mtime 2020-11-23 10:25:56.000000, size 0
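
If the image is still in the trash, rbd info will not find it by name. A sketch of temporarily restoring it and moving it back to the trash afterwards, using the first ID/name pair from the trash listing at the top of this section:

# rbd -c /etc/ceph/ceph.conf --id volumes --pool volumes trash restore 2ccb86bd4fca85
# rbd -c /etc/ceph/ceph.conf --id volumes --pool volumes info volume-3983f035-a47f-46e8-868c-04d2345c3786
# rbd -c /etc/ceph/ceph.conf --id volumes --pool volumes trash mv volume-3983f035-a47f-46e8-868c-04d2345c3786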

Unpurgeable RBD image in trash

We have seen a case of an image in Beesly's trash that cannot be purged:

# rbd --pool volumes trash ls
5afa5e5a07b8bc volume-02d959fe-a693-4acb-95e2-ca04b965389b

# rbd --pool volumes trash purge
Removing images: 0% complete...failed.
2021-03-10 13:58:42.849 7f78b3fc9c80 -1 librbd::api::Trash: remove:
error: image is pending restoration.

When trying to remove the image manually, rbd complains about watchers, but there are actually none:

# rbd --pool volumes trash remove 5afa5e5a07b8bc
rbd: error: image still has watchers2021-03-10 14:00:21.262 7f93ee8f8c80
-1 librbd::api::Trash: remove: error: image is pending restoration.
This means the image is still open or the client using it crashed. Try
again after closing/unmapping it or waiting 30s for the crashed client
to timeout.
Removing image:
0% complete...failed.

# rados listwatchers -p volumes rbd_header.5afa5e5a07b8bc
#

This has been reported upstream. Check:

  • ceph-users with subject "Unpurgeable rbd image from trash"
  • ceph-tracker https://tracker.ceph.com/issues/49716

The original answer was:

$ rados -p volumes getomapval rbd_trash id_5afa5e5a07b8bc key_file
$ hexedit key_file   ## CHANGE LAST BYTE FROM '01' to '00'
$ rados -p volumes setomapval rbd_trash id_5afa5e5a07b8bc --input-file key_file
$ rbd trash rm --pool volumes 5afa5e5a07b8bc

To unstick the image and make it purgeable:

  1. Get the value for its ID in rbd_trash
# rbd -p volumes trash ls
5afa5e5a07b8bc volume-02d959fe-a693-4acb-95e2-ca04b965389b
[09:42][root@p05517715d82373 (qa:ceph/beesly/mon*2:peon) ~]# rados -p volumes getomapval rbd_trash id_5afa5e5a07b8bc key_file
Writing to key_file
  2. Make a safety copy of the original key_file
# cp -vpr key_file key_file_master
  3. Edit the key_file with a hex editor and change the last byte from '01' to '00'
# hexedit key_file
  4. Make sure the edited file contains only that change
# xxd key_file > file
# xxd key_file_master > file_master
# diff file file_master
5c5
< 0000040: 2a60 09c5 d416 00                        *`.....
---
> 0000040: 2a60 09c5 d416 01                        *`.....
  5. Set the edited file to be the new value
# rados -p volumes setomapval rbd_trash id_5afa5e5a07b8bc < key_file
  6. Get it back and check that the last byte is now '00'
# rados -p volumes getomapval rbd_trash id_5afa5e5a07b8bc
value (71 bytes) :
00000000  02 01 41 00 00 00 00 2b  00 00 00 76 6f 6c 75 6d  |..A....+...volum|
00000010  65 2d 30 32 64 39 35 39  66 65 2d 61 36 39 33 2d  |e-02d959fe-a693-|
00000020  34 61 63 62 2d 39 35 65  32 2d 63 61 30 34 62 39  |4acb-95e2-ca04b9|
00000030  36 35 33 38 39 62 12 05  2a 60 09 c5 d4 16 12 05  |65389b..*`......|
00000040  2a 60 09 c5 d4 16 00                              |*`.....|
00000047
  7. Now you can finally purge the image
# rbd -p volumes trash purge
Removing images: 100% complete...done.
# rbd -p volumes trash ls
#
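
If anything went wrong with the hex edit, the original trash entry can be put back from the safety copy made earlier (same syntax as the setomapval step above):

# rados -p volumes setomapval rbd_trash id_5afa5e5a07b8bc < key_file_master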

Undeletable image due to linked snapshots

We had a ticket (RQF2003413) from a user who was unable to delete a volume because of linked snapshots.

Dump the RBD info available on Ceph using the volume ID (see openstack_info) of the undeletable volume:

[root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) ~]# rbd --pool cinder-critical info --image volume-d182a910-b40a-4dc0-89b7-890d6fa01efd 
rbd image 'volume-d182a910-b40a-4dc0-89b7-890d6fa01efd':
    size 29 TiB in 7680000 objects
    order 22 (4 MiB objects)
    snapshot_count: 1
    id: 457afdd323be829
    block_name_prefix: rbd_data.457afdd323be829
    format: 2
    features: layering
    op_features:
    flags:
    access_timestamp: Fri Mar 25 12:19:12 2022

The snapshot_count reports 1, which indicates one snapshot exists for the volume.

Now, list the snapshots for the undeletable volume:

[root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) ~]# rbd --pool cinder-critical snap ls --image volume-d182a910-b40a-4dc0-89b7-890d6fa01efd 
SNAPID  NAME                                           SIZE    PROTECTED  TIMESTAMP
    37  snapshot-798d06dc-6af4-420d-89ce-1258104e1e0f  29 TiB  yes

In turn, volumes can be created from snapshots. To check whether any exist, list the children of the snapshot:

[root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) ~]# rbd --pool cinder-critical children --image volume-d182a910-b40a-4dc0-89b7-890d6fa01efd --snap snapshot-798d06dc-6af4-420d-89ce-1258104e1e0f
cinder-critical/volume-b9d0035f-857c-46b6-b614-4480c462d306

This child is a brand-new volume that still keeps a reference to the snapshot it originates from:

[root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) ~]# rbd --pool cinder-critical info --image volume-b9d0035f-857c-46b6-b614-4480c462d306
rbd image 'volume-b9d0035f-857c-46b6-b614-4480c462d306':
    size 29 TiB in 7680000 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 7f8067e3510b0d
    block_name_prefix: rbd_data.7f8067e3510b0d
    format: 2
    features: layering, striping, exclusive-lock, object-map
    op_features:
    flags:
    access_timestamp: Fri Mar 25 12:20:51 2022
    modify_timestamp: Fri Mar 25 12:36:48 2022
    parent: cinder-critical/volume-d182a910-b40a-4dc0-89b7-890d6fa01efd@snapshot-798d06dc-6af4-420d-89ce-1258104e1e0f
    overlap: 29 TiB
    stripe unit: 4 MiB
    stripe count: 1

The parent field shows that the volume was created from a snapshot. The snapshot cannot be deleted because the volume-from-snapshot is a copy-on-write clone of it (see overlap: 29 TiB), implemented via RBD layering.

OpenStack can flatten volumes-from-snapshots if they need to be made independent of the parent. Alternatively, to delete the parent volume, both the volume-from-snapshot and the snapshot must be deleted first.
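
For reference, a sketch of the corresponding RBD-level commands (in an OpenStack deployment these should normally be driven through Cinder so that its database stays consistent; flattening copies up to the full overlap, 29 TiB here, so it can take a long time):

## detach the child volume from its parent by copying the overlapping data
# rbd --pool cinder-critical flatten --image volume-b9d0035f-857c-46b6-b614-4480c462d306
## once the snapshot has no children left, it can be unprotected and removed
# rbd --pool cinder-critical snap unprotect --image volume-d182a910-b40a-4dc0-89b7-890d6fa01efd --snap snapshot-798d06dc-6af4-420d-89ce-1258104e1e0f
# rbd --pool cinder-critical snap rm --image volume-d182a910-b40a-4dc0-89b7-890d6fa01efd --snap snapshot-798d06dc-6af4-420d-89ce-1258104e1e0f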

Improve me !