Recovering a failing SSD drive which drops off the bus on read error

By dose | August 21, 2016
Under: Uncategorized

This week I had to take a look at a machine with Windows 10 that randomly locked up during working with a certain application. The computer didn’t freeze, but the foreground application didn’t react. You could click around the desktop, but there also was no more reaction to user input by the operation system so that the onl choice was to press reset.

The system was equipped with a 120GB SSD drive as boot drive and after presseing RESET, the drive wasn’t detected anymore, so the BIOS showed “No boot device found”.
When powercycling the machine, the system was back to normal.

This rose suspicion that the SSD drive may be faulty, which got confirmed by doing a full drive scan with HDTune. It locked up during run and showed a bad sector of the drive. So unfortunately, it seems that the drive locked up when accessing a bad block on the media and never recovered from this fault and didn’t react to ATA Reset-commands.
So the drive had to be replaced, but as it was the system drive and a reinstall of the whole operating system would have been a tedious task (due to the lack of some applications’ installers, etc., blabla..), I tried recovering the disk to a new (bigger) SSD drive.

First try was using some Windows imaging applications like the free Macrium Reflect, but of course they failed due to the fact that the disk locked up during cloning when hitting a bad block.
So ddrescue to the rescue! This application also naturally aborts when the drive locks up, but it has a big advantage: It has the option to write a human-readable logfile to resume operations! So starting the first round was easy, just image until bad block (using -n, as scraping doesn’t work anyway if touching a bad block locks up the device. So scraping has to be done manually here):

ddrescue -n /dev/sdb sdb.dd sdb.log

Now first part of disk was written to image and on bad block, drive locked up, leaving behind a logfile that said: Good until block X, then everything bad. Now shutdown and restart machine to get SSD back working.
As ddrescue logfile format is human readable, it’s also easy to edit. The manual fortunately tells us about the logfile format:

The first non-comment line is the status line. It contains a non-negative integer and a status character. The integer is the position being tried in the input file. (The beginning of the block being tried in a forward pass or the end of the block in a backward pass).

The status character is one of these:

‘?’     copying non-tried blocks

Every line in the list of data blocks describes a block of data. It contains 2 non-negative integers and a status character. The first integer is the starting position of the block in the input file, the second integer is the size (in bytes) of the block. The status character is one of these:

‘?’     non-tried block
‘*’     failed block non-trimmed
‘/’     failed block non-scraped
‘-‘     failed block bad-sector(s)
‘+’     finished block

On second run, we don’t have to do much but copy the logfile to a new file (i.e. sdb2.log) and delete all the lines in the block list that don’t contain a “+” and setting status to “?” and then run in reverse direction with this new file to image from the other end of the disk until we hit a block again that is faulty:

ddrescue -R -n /dev/sdb sdb.dd sdb2.log

Now the device falls off again and we have the sdb2.log which now contains good blocks from the beginning, then a list of failed blocks and good blocks until the end. Now there can be a huge gap in between and we have to rescue data from good blocks in there too. So shutdown, powercycle, reboot again, copy sdb2.log to sdb3.log, then modify the lines that contain ‘/’ and ‘-‘ and are beyond the first faulty block to ‘?’, modify the status header from ‘+’ to ‘?’ again and try again in forward direction like in the first sample.. If it locks up, retry the step again (after power off and on) but leaving the faulty block you tried with ‘-‘ or ‘/’ and instead just mark the next blocks until you hit one that doesn’t lockup the machine. This way you can manually do the scraping in order to finally rescue as much data as possible from the drive to the image file.

Afterwards it can be written to the new drive (dd if=sdb.dd of=/dev/sdc bs=4096), as the newer drive is bigger, partitions can be moved and resized with gparted, but I had to ensure to use a very recent version of ntfs-3g for ntfsresize to work with new Windows 10 NTFS format, with older versions this step fails in gparted and I had to do it manually with recent version. Finally, the old drive was rescued and system was properly moved to the new drive.

One comment | Add One

Comments

  1. Me - 11/6/2017 at 23:36

    Thank you.

Trackbacks

Leave a Comment

Name:

E-Mail :

Subscribe :
Website :

Comments :