Recovering files from a failing USB drive
A friend of mine came to me with a failing USB disk. One of its folders showed up as a pile of garbage in the Windows file explorer, and she asked me to retrieve whatever data was still possible. So there was one folder that had gone bad.
Why this complicates things
Under the VFAT file systems a directory is essentially a file itself, allocated in the File Allocation Table. When that directory goes bad, the way to recover is to search through the raw data for recognizable file headers (signatures) and go from there. This is similar to what the file command does: it recognizes a lot of signatures and can even read some metadata, but it relies on the file system to know where a file begins and, more importantly, where it ends. Excavation software like photorec is needed to also find a 'sane ending' for a file, and it is pretty good at it. (You may want to sponsor the project once it has saved some of your files; I did so years ago.)
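To make that signature idea concrete, here is a minimal Python sketch (not photorec's actual algorithm) that walks over a disk image in 512-byte steps and reports offsets where a few well-known magic bytes appear; the signature list and step size are simplifications for illustration.

# signature_scan.py - a minimal sketch of header (magic byte) scanning over a disk image.
# This is not how photorec actually works; it only illustrates the idea of signature carving.
import sys

# A few well-known file signatures found at the start of a file.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip (also docx/xlsx, which are zip containers)",
}

SECTOR = 512  # FAT allocates space in whole sectors, so headers tend to sit on sector boundaries

def scan(image_path):
    with open(image_path, "rb") as img:
        offset = 0
        while True:
            sector = img.read(SECTOR)
            if not sector:
                break
            for magic, kind in SIGNATURES.items():
                if sector.startswith(magic):
                    print(f"{kind} header at byte offset {offset}")
            offset += SECTOR

if __name__ == "__main__":
    scan(sys.argv[1])  # e.g. python signature_scan.py usb.img

Finding where such a file ends again is the hard part, and that is exactly what the real tools are good at.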
Ploughing through a potentially bad disk directly isn't a good idea, because every extra read can damage it further, so first we need an image. Use (g)ddrescue for that: ddrescue is the command, while gddrescue is the package name, which is good to know. ddrescue serves the same purpose as dd, but with smarter rescue behaviour: it keeps track of the bad sectors on your storage, and after a first "normal" light pass that copies as much data as possible "before it fully breaks", it allows deeper or better retries to get some data out of the remaining bad areas. It stores all this bookkeeping in a map file, so in the case of a disconnect you can simply continue instead of starting over. The map file is also used to work hard only on the bad sectors, which is really nice when the physical layer is the actual problem. In this case it wasn't, so the excavation was quite simple: ddrescue copied the entire disk into a usb.img file while maintaining a usb.map file. A companion tool, ddrescueview, can be installed to view the map file graphically; it is useful for watching progress, but also for spotting the bad areas. Here everything was green, and we duplicated the entire USB disk to an NVMe disk in about an hour. Working from the image on fast storage speeds things up very drastically!
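The map file is plain text, so if you don't feel like installing ddrescueview you can also summarize it yourself. A small sketch, assuming the standard mapfile layout from the ddrescue manual ('#' comment lines, one current-status line, then "pos size status" data lines in hexadecimal):

# map_summary.py - summarize a ddrescue map file without ddrescueview.
# Assumes the standard mapfile layout from the ddrescue manual:
# '#' comment lines, one current-status line, then "pos size status" data lines.
import sys
from collections import defaultdict

STATUS_NAMES = {
    "+": "rescued",
    "-": "bad sectors",
    "*": "non-trimmed",
    "/": "non-scraped",
    "?": "non-tried",
}

def summarize(map_path):
    totals = defaultdict(int)
    seen_status_line = False
    with open(map_path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if not seen_status_line:  # the first non-comment line is the current position/status
                seen_status_line = True
                continue
            pos, size, status = line.split()[:3]
            totals[status] += int(size, 16)  # positions and sizes are hexadecimal byte counts
    for status, nbytes in sorted(totals.items()):
        print(f"{STATUS_NAMES.get(status, status):>12}: {nbytes / 1024**2:,.1f} MiB")

if __name__ == "__main__":
    summarize(sys.argv[1])  # e.g. python map_summary.py usb.map

In a happy case like this one, everything ends up in the "rescued" bucket.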
Install gddrescue and photorec (the latter ships in the testdisk package); the packaged versions are older than the latest compiled sources, but they will work fine in most cases.
First we have to find out the device name of the USB disk.
sudo lsblk
: ...
Once we know where the disk resides as a device, we can copy its contents to another device (namely my home machine in this case) using ddrescue:
sudo ddrescue -n /dev/sda usb.img usb.map
: ...
Having this (potentially large) usb.img file, you can analyse its partitions using testdisk, or maybe even alter them. It has some fdisk-like capabilities, but with far fewer safeguards (and it is therefore capable of handling situations that shouldn't normally arise).
Since our partition table wasn't the issue, testdisk didn't yield many results beyond okays and fines. Trying to read the corrupted directory produced testdisk's version of an I/O or partition error. So no luck there.
Putting on our adventurous excavation hat
That's when it's up to photorec. As the name implies, it was originally created to recover photos from crashed memory cards, but it has since evolved into quite a generic file-recovery tool. It searches through your entire disk (or usb.img) for file starts and endings. I haven't looked into the code to see what kind of magical rites or advanced math they use to discover the starts, endings and other blocks; but I know they do, and they do it marvelously!
Photorec is an interactive console application, like testdisk, and you can easily navigate through the dump:
sudo photorec usb.img
: ...
I had to select the entire disk for the best results, and it recovered a massive amount of files. But since only one folder was corrupted, that also means photorec finds the files from all the healthy directories and spews them out into a recovery folder as well. That can be useful, but here it was mostly in the way: I didn't need an extra copy of everything else on the disk, only the files from the corrupted folder. And since this kind of recovery cannot reliably restore names, you get completely garbled, useless filenames. So I had gptme generate a file-hash comparison tool that compares files by content hash and finds identical files. It was able to rename recovered files to the location they had on disk (using a slugified version, so folder and file names end up underscore-separated, but at least usable by my friend).

That saved a few files, but it was still nasty: without file dates or file names, 1000+ files are a lot to handle. Since there weren't many different extensions, I asked gptme to write another Python script that uses a uv virtual environment, installs the dependencies needed to read the various formats, extracts metadata or contents, and names each file accordingly. And so it did. There was a slight issue with .xlsx files: it tried to rename them all to "Blad 1", the default Dutch Excel sheet name, and on Linux that would leave you with a single file that keeps getting overwritten. So there was a bit more magic to it: never overwrite, but check first whether the target name already exists. For .docx files it uses the title, or falls back to the first sentence available; the same goes for .pdf files. For .png, .jpg and other graphic formats this isn't such an issue; sometimes the original filename is available in the EXIF data, but not always, so I ignored those, since an image viewer is usually enough there.
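The hash-compare tool boils down to a few lines. Below is a sketch of the idea rather than the exact gptme output; the recovered/ and /mnt/usb paths are placeholders for the photorec output folder and the still-readable files.

# hash_compare.py - sketch of the content-hash comparison idea (not the exact gptme script).
# It hashes the photorec output and the still-readable files, then renames recovered duplicates
# to a slugified version of their original path. Both directory paths are placeholders.
import hashlib
import re
from pathlib import Path

RECOVERED = Path("recovered")   # photorec output (the recup_dir.* folders)
ORIGINALS = Path("/mnt/usb")    # the still-readable files, e.g. the mounted usb.img

def sha256(path):
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def slugify(original, root):
    # "Vakantie 2019/foto 1.jpg" -> "Vakantie_2019_foto_1.jpg"
    rel = original.relative_to(root)
    stem = re.sub(r"[^A-Za-z0-9.]+", "_", str(rel.with_suffix("")))
    return stem.strip("_") + original.suffix

# index the readable originals by content hash
by_hash = {}
for f in ORIGINALS.rglob("*"):
    if f.is_file():
        by_hash[sha256(f)] = f

# give recovered files that match a known original their (slugified) name back
for f in RECOVERED.rglob("*"):
    if f.is_file():
        original = by_hash.get(sha256(f))
        if original:
            f.rename(f.with_name(slugify(original, ORIGINALS)))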
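The second script's essential trick, naming a file after its own metadata without ever overwriting an existing one, is also easy to sketch. This assumes python-docx is available (e.g. installed into the uv virtual environment); the .xlsx and .pdf readers would plug into extract_title the same way.

# rename_by_metadata.py - sketch of the "name from metadata, never overwrite" part.
# Assumes python-docx is installed (e.g. in the uv virtual environment); .xlsx and .pdf
# readers would plug into extract_title() in the same way.
import re
from pathlib import Path
from docx import Document  # package name: python-docx

def extract_title(path):
    doc = Document(str(path))
    if doc.core_properties.title:       # use the document title if the author set one
        return doc.core_properties.title
    for para in doc.paragraphs:         # otherwise fall back to the first non-empty line
        if para.text.strip():
            return para.text.strip()[:60]
    return path.stem                    # last resort: keep the garbled recovered name

def unique_path(directory, name, suffix):
    # never overwrite: "Blad 1.xlsx", "Blad 1 (2).xlsx", "Blad 1 (3).xlsx", ...
    name = re.sub(r'[\\/:*?"<>|]+', "_", name).strip() or "untitled"
    candidate = directory / f"{name}{suffix}"
    counter = 2
    while candidate.exists():
        candidate = directory / f"{name} ({counter}){suffix}"
        counter += 1
    return candidate

for f in Path("recovered").rglob("*.docx"):
    f.rename(unique_path(f.parent, extract_title(f), f.suffix))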
All of this was vibecoded with gptme in a few minutes. I zipped the folder with zip -er ../recovered.zip ./* (encrypt and recursively add all files in the current directory) and uploaded it to our private file-sharing facility.