Surviving a Linux Filesystem Failure

The term filesystem failure refers to corrupted filesystem data structures (objects such as inodes, directories, the superblock, and so on). It can be caused by any one of the following:

* Mistakes by the Linux/UNIX sysadmin
* Buggy device drivers or utilities (especially third-party utilities)
* Power outage (rare on production systems) due to UPS failure
* Kernel bugs (this is why you don’t run the latest kernel on a production Linux/UNIX system; most of the time you should use a stable kernel release)

A filesystem failure can have the following effects:

  • The filesystem refuses to mount
  • The entire system hangs
  • Even if the mount operation succeeds, users may notice strange behavior once the filesystem is mounted, such as system reboots, gibberish characters in directory listings, etc.

So how do you survive a filesystem failure? Most of the time fsck (a front end to filesystem-specific checkers such as e2fsck) can fix the problem. Start by running e2fsck, which checks a Linux ext2/ext3 file system. Assuming the /home filesystem on the /dev/sda3 partition for demonstration purposes, first unmount /dev/sda3, then type the following command:
# e2fsck -f /dev/sda3
Where,

  • -f : Force checking even if the file system seems clean.
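
Since e2fsck should only be run on an unmounted filesystem, it can help to check the mount table first. The following is a minimal sketch, not a definitive implementation: is_mounted is a hypothetical helper name, and it assumes the device appears in /proc/mounts under the same path you pass in (a device mounted via a different name, such as a device-mapper path, would not be detected).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given device is listed in the
# mount table. The second argument is optional and defaults to /proc/mounts;
# it exists only so the function can be tried against a sample file.
is_mounted() {
    dev=$1
    mounts_file=${2:-/proc/mounts}
    # The first field of each line in /proc/mounts is the mounted device.
    awk -v dev="$dev" '$1 == dev { found = 1 } END { exit !found }' "$mounts_file"
}

# Usage sketch (run as root, device name taken from the example above):
# if is_mounted /dev/sda3; then
#     umount /dev/sda3 || exit 1
# fi
# e2fsck -f /dev/sda3
```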

Please note that if the superblock is not found, e2fsck will terminate with a fatal error. However, Linux maintains multiple redundant copies of the superblock in every file system, so you can use the -b {alternative-superblock} option to work around this problem. The location of the backup superblock depends on the filesystem’s blocksize:

  • For filesystems with 1k blocksizes, a backup superblock can be found at block 8193
  • For filesystems with 2k blocksizes, at block 16384
  • For 4k blocksizes, at block 32768.
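
The three numbers above follow from the default layout: the first backup superblock sits at the start of block group 1, and a group normally holds 8 × blocksize blocks (one bitmap block's worth of bits). As a rough sketch of that arithmetic, assuming default mke2fs settings (first_backup_superblock is a hypothetical helper name, and a filesystem created with a custom blocks-per-group value will not match):

```shell
#!/bin/sh
# Hypothetical helper: print the first backup superblock location for a
# block size given in bytes. Assumes the default of 8 * block_size blocks
# per group; filesystems created with other settings will differ.
first_backup_superblock() {
    block_size=$1
    blocks_per_group=$((block_size * 8))
    # With 1k blocks the first data block is block 1 (block 0 is reserved
    # for the boot area); with larger blocks it is block 0.
    if [ "$block_size" -eq 1024 ]; then
        first_data_block=1
    else
        first_data_block=0
    fi
    echo $((first_data_block + blocks_per_group))
}

# Usage sketch:
# first_backup_superblock 4096
```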

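Tip: you can also use either of the following commands to determine the alternative superblock locations. With -n, mke2fs only displays what it would do and does not create a filesystem; the locations it reports are only correct if it is run with the same parameters the filesystem was originally created with: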
# mke2fs -n /dev/sda3
OR
# dumpe2fs /dev/sda3|grep -i superblock
To repair the file system using an alternative superblock, use the command as follows:
# e2fsck -f -b 8193 /dev/sda3
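
If the first backup superblock does not work, you may want to try the others one by one. The sketch below is only an illustration under stated assumptions: list_backup_superblocks and try_backup_superblocks are hypothetical helper names, the parser expects the "Superblock backups stored on blocks:" list in the format printed by mke2fs -n, and the probe uses e2fsck -n so that nothing is written to the device while testing.

```shell
#!/bin/sh
# Hypothetical helper: read `mke2fs -n` style output on stdin and print
# one backup superblock number per line.
list_backup_superblocks() {
    awk '
        found && NF == 0 { exit }
        found {
            gsub(/,/, " ")
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+$/) print $i
        }
        /Superblock backups stored on blocks:/ { found = 1 }
    '
}

# Hypothetical helper: probe each backup superblock read-only and report
# the first one that e2fsck can open. Exit codes of 8 and above indicate
# an operational or usage error, for example a bad magic number.
try_backup_superblocks() {
    dev=$1
    for sb in $(mke2fs -n "$dev" | list_backup_superblocks); do
        echo "Trying backup superblock $sb"
        rc=0
        # -n opens the filesystem read-only and answers "no" to all questions.
        e2fsck -n -f -b "$sb" "$dev" || rc=$?
        if [ "$rc" -lt 8 ]; then
            echo "Superblock $sb is readable; repair with: e2fsck -f -b $sb $dev"
            return 0
        fi
    done
    return 1
}

# Usage sketch (run as root on an unmounted device):
# try_backup_superblocks /dev/sda3
```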

However, it is highly recommended that you make a backup before you run the fsck command on the system. Use the dd command to create a backup image of the partition (provided that you have spare space under /disk2):
# dd if=/dev/sda3 of=/disk2/backup-sda3.img
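
It is worth confirming that the image really matches the source before letting fsck modify anything. A minimal sketch, assuming the source is unmounted so it does not change during the copy (backup_image is a hypothetical helper name, and the bs=1M value is only a throughput choice):

```shell
#!/bin/sh
# Hypothetical helper: copy a device (or file) to an image file with dd,
# then verify the copy byte for byte with cmp.
backup_image() {
    src=$1
    dst=$2
    # A larger block size usually speeds up the copy; the data is identical.
    dd if="$src" of="$dst" bs=1M || return 1
    # cmp exits non-zero at the first differing byte or if the sizes differ.
    cmp "$src" "$dst"
}

# Usage sketch (run as root; paths are placeholders):
# backup_image /dev/sdXN /disk2/backup.img && echo "backup verified"
```

To restore, the same dd command can be run with if= and of= swapped, which overwrites the partition with the saved image.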

If you are using Sun Solaris UNIX, see howto: Restoring a Bad Superblock.

Please note that things get complicated if the hard disk participates in a software RAID array. Take a look at Software-RAID HOWTO – Error Recovery. This article is part of the Understanding UNIX/Linux file system series (this is part III). Continue reading the rest of the series:

  • Part I – Understanding Linux superblock
  • Part II – Understanding Linux superblock
  • Part III – An example of Surviving a Linux Filesystem Failure
  • Part IV – Understanding filesystem Inodes
  • Part V – Understanding filesystem directories
  • Part VI – Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII – Why isn’t it possible to create hard links across file system boundaries?

Hi! 🤠
I'm Vivek Gite, and I write about Linux, macOS, Unix, IT, programming, infosec, and open source. Subscribe to my RSS feed or email newsletter for updates.

33 comments
  • Dash Feb 16, 2013 @ 3:21

    No matter what I do, I get following msg.

    root@tdsrv002 [~]# e2fsck -f -b 32768 -y /dev/xvdj
    e2fsck 1.41.12 (17-May-2010)
    e2fsck: Bad magic number in super-block while trying to open /dev/xvdj

    The superblock could not be read or does not describe a correct ext2
    filesystem. If the device is valid and it really contains an ext2
    filesystem (and not swap or ufs or something else), then the superblock
    is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

    I tried all block locations which I found here.

    root@tdsrv002 [~]# mke2fs -n /dev/xvdj
    mke2fs 1.41.12 (17-May-2010)
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    Stride=0 blocks, Stripe width=0 blocks
    1966080 inodes, 7864320 blocks
    393216 blocks (5.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=4294967296
    240 block groups
    32768 blocks per group, 32768 fragments per group
    8192 inodes per group
    Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000

    Thanks for any hint to fix this issue.

    • Balachandar Jan 8, 2015 @ 14:31

      Hi,

      Did you find a way to fix this issue? I’m facing the same problem. Please help me recover the data on my HDD.

      Thanks,
      Balachandar

  • sreejith Aug 18, 2013 @ 6:26

    helpful tutorial.
    thanks.

  • Catatan Belajar Sep 24, 2013 @ 7:12

    Ubuntu forum brought me here.
    Thank you, very helpful.

  • Spuffler Oct 6, 2013 @ 19:34

    So far, sounds useful. My issue is not the root FS, so any live CD isn’t needed. My issue at the moment is the USB hosted external HDD. By the time I see a reply, I will probably have made a decision that finalizes the matter, so this post is mostly about educating the ‘future reader’.

    While manipulating files in KDE’s Dolphin (KDE 4.7.4), I suddenly lost access to a directory on that external HDD. The directory holds valuable files, naturally, and this IS my backup device. FWIW, I’ve seen KDE having many issues in my mode of operations, no matter whether on a backup device or on a day-to-day flash device. In this instance, I’m pretty sure I botched it myself.

    Dolphin file manager allows a bash prompt child window, and I tend to use this from time to time – I often need to change permissions, owners, etc. of files that I have collected over the years. An example is where I’m mixing files from several years of storage, so I see problems that I can only guess are related to the user group for my normal username being group ID 101 for Mandrake files, and 1000 for current Kubuntu systems. All I can tell is that everything ‘looks’ correct, from what Dolphin displays, and yet files don’t open, or pop up an error message when moved within Dolphin. I then look at permissions within Dolphin, and clearly see a problem that I can fix. In the latest error, I needed to issue chmod 755 for a directory of files. I’ve tried using the Dolphin GUI to do this in the past, and seen it refuse to act despite displaying no errors, so today, as learned from the prior experiences, I used the bash prompt at the child window, and issued the chmod command. The prompt returned almost immediately, but I neglected to sync the changes. I then closed that Dolphin window, and I think that caused the chmod to get zombied. In any event, the files and the directory they are in cannot be accessed.

    I have run fsck -y -b xxxx /dev/sdc1 for EVERY superblock I found, no joy, despite it displaying many thousands of correction messages:
    Free inodes count wrong for group #371 (8192, counted=8180).
    Fix? yes
    Free inodes count wrong for group #372 (8192, counted=8190).
    Fix? yes
    Free inodes count wrong for group #373 (8192, counted=8186).
    Fix? yes
    Free inodes count wrong for group #374 (8192, counted=8190).
    Fix? yes
    Free inodes count wrong for group #375 (8192, counted=8186).
    Fix? yes
    Free inodes count wrong for group #376 (8192, counted=8150).
    Fix? yes

    After these positive sounding messages, I mounted the device, still could not get into the directory, unmounted the device, and repeated the fsck with a new superblock but cannot even ls -la the directory from a Konsole.

    Pretty sure I killed it when I didn’t look (in Dolphin) for the chmod command results on that directory. But, any suggestions on what could be done when fsck achieves no improvement?

  • Spuffler Oct 6, 2013 @ 19:36

    And by try a new superblock, I mean I tried ALL of the superblocks.

  • Link Feb 4, 2014 @ 10:17

    Is there any way that we can predict a file system corruption?
    And/or is there any tool that we can use?

    Open for any suggestions.
    Thank you so much!

  • Yoga Apr 23, 2015 @ 8:29

    Thank you so much. I had the same problem with file system corruption, and I tried the command line # sudo e2fsck -f -b 8193 -y /dev/sda6 –> WORKING …
    Thank you so much, this is very, very helpful …
    Finally I can get my data …

  • slese Sep 12, 2015 @ 6:50

    Worked great. Thx for the info

  • davi Dec 5, 2016 @ 17:28

    Thank you so much, saved my life.

  • Ridbowt Jan 10, 2017 @ 2:27

    It works, but when I reboot – again the same error comes… Ubuntu Mate 16.10 x64.

  • Dante Nov 22, 2017 @ 15:11

    Thank you!

  • Alfredo Nov 24, 2017 @ 13:03

    What if the drive is “swap”?
