ZFS gives incorrect error messages, claims total data loss and urges the user to reformat disks when there is actually no data problem. #17272
Replies: 5 comments 1 reply
-
Can you include |
-
zpool status on the original system was completely normal; it only indicated that the most recent scrub found and corrected no issues. zpool import on the target system:
-
My next attempt was to check whether things go differently when I use manually created device links. So, on the target system I entered: # ln -s /dev/disk/by-id/ata-HGST_HUS726T4TALE6L4_V6H9TVXR-part1 /zpooldevs/bak_4T_wwn_5000cca097d28c32
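For anyone repeating this with several disks, the link setup can be scripted. The following is only a minimal sketch in Python: `make_link_dir` is a hypothetical helper name, the link directory and targets are whatever you pass in, and the returned `zpool import -d` command is only built, never executed.

```python
import os
import tempfile


def make_link_dir(link_dir, links):
    """Create link_dir and one symlink per (name -> target) entry.

    Returns the argv for a later `zpool import -d <link_dir>` call.
    The command is only assembled here; nothing is imported.
    """
    os.makedirs(link_dir, exist_ok=True)
    for name, target in links.items():
        link_path = os.path.join(link_dir, name)
        if os.path.islink(link_path):
            # Replace a stale link left over from an earlier attempt.
            os.remove(link_path)
        os.symlink(target, link_path)
    return ["zpool", "import", "-d", link_dir]


if __name__ == "__main__":
    # Demo in a throwaway directory; the target below is a placeholder,
    # not a real device on this machine.
    with tempfile.TemporaryDirectory() as tmp:
        argv = make_link_dir(
            os.path.join(tmp, "zpooldevs"),
            {"transport_disk": "/dev/disk/by-id/EXAMPLE-part1"},
        )
        print(" ".join(argv))
```

Keeping all links in one dedicated directory means `zpool import -d` only scans the devices you chose, which makes it easier to tell which path the import actually picked up.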
-
Finally, this thread showed a "solution". Thanks @amotin. Still, I do not find it intuitive that running wipefs is apparently a prerequisite for being able to import... Maybe the error message links mentioned in the opening post could be updated accordingly, instead of effectively playing bad April Fools' Day jokes on users by withholding that unintuitive fix and falsely telling them their data is gone forever?
-
Nooooo! Unfortunately, this "solution" does not always work. After loading the transport drive with another dataset and putting it into the target system again, zpool import -d now tells me that I can import the pool using its name or its pool GUID... But it imports the pool when using the pool GUID!!
-
System A: Debian 12.9, zfs-2.1.11-1+deb12u1 zfs-kmod-2.1.11-1+deb12u1
System B: Ubuntu 24.04.02 Server LTS zfs-2.2.2-0ubuntu9.2 zfs-kmod-2.2.2-0ubuntu9.1
I am trying to physically transport some data from one computer to another.
On system A, the data was packed onto a transport pool on the 4TB drive via a lengthy local zfs send/receive, and the pool was exported afterwards, before the drive was moved to system B.
I expected to simply import the pool on system B and access the data.
But when I put that drive into system B, I got an error message along the lines of "Oops! Your precious data now is irretrievably lost forever! Your only remaining option is to format the drive!".
When I researched this, I found that it is a longstanding issue whose PR was apparently closed without any resolution:
So when I correct the symbolic links so that they actually match what zdb -l displays in the "path:" section, I get another "joke" message. This time it is this one: The device listed as FAULTED with ‘corrupted data’ cannot be opened due to a corrupt label. ZFS will be unable to use the pool, and all data within the pool is irrevocably lost.
This message, again, IMHO cannot be correct.
After all, putting the drive back into system A, importing it again, inspecting and scrubbing it did not reveal anything unusual!
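The check of a link against the "path:" entries of zdb -l can also be done mechanically. This is a rough sketch, not an official tool: the helper names `label_paths` and `path_in_label` are made up for illustration, and it assumes the label dump contains lines of the form `path: '...'`, which should be verified against the output of your own zdb version.

```python
import os
import re

# Matches "path: '...'" but not "phys_path: '...'", because only
# whitespace may precede the key on the line.
PATH_RE = re.compile(r"^\s*path:\s*'(.*)'\s*$")


def label_paths(zdb_output):
    """Return the distinct path: values found in `zdb -l` text, in order."""
    seen = []
    for line in zdb_output.splitlines():
        match = PATH_RE.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def path_in_label(candidate, zdb_output):
    """True if candidate equals one of the label paths.

    Paths are compared as normalized strings; symlinks are deliberately
    not resolved, since the label stores the path as it was given.
    """
    wanted = os.path.normpath(candidate)
    return any(os.path.normpath(p) == wanted for p in label_paths(zdb_output))
```

The text to feed in could be captured with something like `subprocess.run(["zdb", "-l", device], capture_output=True, text=True).stdout`, where `device` is the partition or link you want to inspect. A mismatch only tells you that the stored path differs from the one you are using now; it says nothing about the state of the data.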
I have never observed such behaviour on FreeBSD.
Totally misleading and incorrect error messages!
Even urging the user to actually irrevocably delete their data!
Right now I am running the zfs send/receive again, this time on a pool created using -d and with a manually created link as its path instead of the links in /dev/disk, just to find out whether this behaviour could be caused by the gazillion active features or by some issue with /dev/... paths. But, again, on FreeBSD I haven't had such an issue in years. IMHO such behavior should not happen on Linux either.
Any idea why ZFS on Linux gives such shocking, grotesquely wrong messages that suggest total data loss and even tell the user to format the drives, which would cause actual data loss, when in fact there are no data errors? I just don't get it...
Again, the only thing that was different on system B was the path position.
But, isn't it supposed to do so? At least when the disks are imported on a different system?
Refusing to import the pool at its new hardware path and pretending total data loss just because of this minor (if any) label inconsistency is IMHO not OK behavior... What do you think? What did I miss or get wrong?