Each month, we will introduce a disaster scenario and accept
submissions of recovery procedures for it. For examples of good
disasters, see my O'Reilly article here.
If you're new to UML, you'll probably want to read the following pages
before starting to solve this month's problem:
This month's disaster involves a filesystem that mysteriously won't
boot. Your job is to figure out what's wrong and
fix it. This one is fairly easy, so if you find it to be trivial,
don't complain.
Anyway, download the filesystem, uncompress it, make it boot, and tell
us below how you did it.
Submit your solution
If you have a solution to this month's problem and you want it to be
immortalized on this very site, submit it here.
I will pick one or more winning solutions based on criteria such as:
originality - all else being equal, I like non-obvious solutions
subtlety - if applicable, small fixes are better than big ones
brevity - short and sweet is better than long and involved
parsimony - the fewer external resources you need, the better
Propose a disaster
If you have a scenario which you think would make a good Disaster of
the Month, please submit it here. If you have a good solution,
include it as well. Disasters which have actually happened in real
life are especially good, but anything which can happen on a physical
box is fine.
Each month, I will look over the submissions and choose an interesting
one to feature as that month's disaster of the month.
Last Month's Disaster
The December, 2001 disaster
involved zeroing the root filesystem superblock and attempting to
recover it. (Note: I am lame, but not so lame that I don't realize
that December, 2001 is not the month preceding May, 2002. However, I
am lame enough to have run out of disaster ideas after writing the
O'Reilly article and starting SDOTM. So, this page just sat
here sadly until Roger Binns had pity on it and sent me some more
disaster ideas, one of which is the May contest.)
Last Month's Solutions
A number of proposed solutions involved reinstalling or restoring from
backups. These were rejected on the basis of being overly
heavy-handed. Most of the rest of the valid answers involved
UML# e2fsck -b alternate superblock number
I tossed out the ones that suggested using 8193 as the alternate
superblock. That didn't work for me, and it turns out to be dependent
on the filesystem block size.
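As a rough sketch of why 8193 only works sometimes (this is an illustration, not any submitter's actual answer): on an ext2 filesystem made with default mke2fs settings, the first backup superblock sits at the start of block group 1. The function name first_backup_superblock is made up for this example, and the arithmetic assumes the default of 8 * block size blocks per group; a filesystem created with a non-default group size will put its backups elsewhere.

```shell
# first_backup_superblock BLOCK_SIZE
# Hypothetical helper: print the block number of the first backup
# superblock, assuming default ext2 geometry (8 * block size blocks
# per group, one block bitmap block per group).
first_backup_superblock() {
    bs=$1
    blocks_per_group=$((bs * 8))
    # With 1K blocks, block 0 is reserved for the boot block, so the
    # first data block (and every group) is shifted up by one.
    if [ "$bs" -eq 1024 ]; then
        first_data_block=1
    else
        first_data_block=0
    fi
    echo $((blocks_per_group + first_data_block))
}

for bs in 1024 2048 4096; do
    echo "$bs: $(first_backup_superblock "$bs")"
done
# prints:
# 1024: 8193
# 2048: 16384
# 4096: 32768
```

So 8193 is only right for a 1K block size. On a filesystem with 4K blocks, the equivalent would be something like e2fsck -b 32768 /dev/ubd/0.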
So, I kept the answers that provided a means of determining the
superblock location either from the filesystem itself or from the
block size. The earliest such answer came from nicholasperez (at) (a
VERY impolite domain name):
Similar answers came from mgalgoci (at) parcelfarce (dot) linux (dot)
theplanet (dot) co (dot) uk, MonMotha, Phil, skepticman, bluebird (at)
dartmouth (dot) edu, and tjw.
In addition, I decided to name some honorable mentions:
Solutions which involved fixing the filesystem on the host lost points
because you can't do that with a lost superblock on a physical system,
but dan_a (at) gmx (dot) net submitted a solution which pulled the
filesystem apart and reassembled it with the fixed superblock.
Petru Paler broke the rules slightly and submitted an InstaFix (tm)
which assumes that you realize immediately what you did, and that dd
is present in the page cache:
This lost points for hardcoding 8193, but gained some back for
possibly fixing the problem immediately.
willmore suggested
UML# mke2fs -S /dev/ubd/0
followed by an fsck. This will cause mke2fs to rewrite only the
superblock and group descriptors, leaving the inode table alone. The
e2fsck will leave a bunch of files in lost+found, which you will have
to poke through to identify. Any directories shouldn't be hard, since
you can identify them through their contents, but the normal files
could be a pain.
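Poking through lost+found can be made a little less tedious with a small helper. The following is only a sketch: the function name triage_lost_found is made up, and it assumes the filesystem is mounted so that lost+found is reachable as an ordinary directory. e2fsck names recovered entries after their inode numbers (#1234 and so on), so the contents are the only clue to what they used to be.

```shell
# triage_lost_found DIR
# Hypothetical helper: print one line per entry in DIR. Directories are
# shown with a sample of their contents, regular files with their size.
triage_lost_found() {
    dir=$1
    for entry in "$dir"/*; do
        # If DIR is empty the glob stays unexpanded; skip it.
        [ -e "$entry" ] || continue
        name=${entry##*/}
        if [ -d "$entry" ]; then
            # The first few names inside a directory usually give it away.
            sample=$(ls "$entry" | head -n 5 | tr '\n' ' ')
            echo "dir  $name: $sample"
        elif [ -f "$entry" ]; then
            size=$(wc -c < "$entry" | tr -d ' ')
            echo "file $name: $size bytes"
        fi
    done
}
```

Running triage_lost_found /lost+found inside the UML would list each recovered entry. A directory showing passwd, fstab and inittab is clearly /etc; the regular files still have to be inspected one at a time.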