The Sysadmin Disaster of the Month

Each month, we will introduce a disaster scenario and accept submissions of procedures for recovering from it. For examples of good disasters, see my O'Reilly article.

If you're new to UML, you'll probably want to read the following pages before starting to solve this month's problem:

The May 2002 Disaster

This month's disaster involves a filesystem that mysteriously won't boot. Your job is to figure out what's wrong and fix it. This one is fairly easy, so if you find it to be trivial, don't complain.
Anyway, download the filesystem, uncompress it, make it boot, and tell us below how you did it.

Submit your solution

If you have a solution to this month's problem and you want it to be immortalized on this very site, submit it here.
I will pick one or more winning solutions based on criteria such as:

originality - all else being equal, I like non-obvious solutions

subtlety - if applicable, small fixes are better than big ones

brevity - short and sweet is better than long and involved

parsimony - the fewer external resources you need, the better


Propose a disaster

If you have a scenario which you think would make a good Disaster of the Month, please submit it here. If you have a good solution, include it as well. Disasters which have actually happened in real life are especially good, but anything which can happen on a physical box is fine.
Each month, I will look over the submissions and choose an interesting one to feature as that month's Disaster of the Month.


Last Month's Disaster

The December 2001 disaster involved zeroing the root filesystem superblock and recovering from it. (Note: I am lame, but not so lame that I don't realize that December 2001 is not the month preceding May 2002. However, I am lame enough to have run out of disaster ideas after writing the O'Reilly article and starting SDOTM. So this page just sat here sadly until Roger Binns took pity on it and sent me some more disaster ideas, one of which is the May contest.)

Last Month's Solutions

A number of proposed solutions involved reinstalling or restoring from backups. These were rejected as overly heavy-handed. Most of the remaining valid answers involved
UML# e2fsck -b <alternate superblock number> <device>
I tossed out the ones that suggested using 8193 as the alternate superblock. That didn't work for me, and it turns out that the location of the backup superblocks depends on the filesystem block size.
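A minimal sketch of that block-size dependence, assuming a plain ext2 filesystem made with the default mke2fs layout (8 × block size blocks per group, no `-g` override). With 1 KiB blocks, group 0 starts at block 1, so group 1 (which holds the first backup superblock) starts at 1 + 8192 = 8193; with larger blocks, group 0 starts at block 0. The results match the values the e2fsck man page gives for `-b`. The function name `first_backup_sb` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the block number of the first backup superblock
# for an ext2 filesystem with the given block size in bytes, assuming the
# default mke2fs layout of 8 * blocksize blocks per group.
first_backup_sb() {
    blocksize=$1
    blocks_per_group=$((8 * blocksize))
    # With 1 KiB blocks, the first 1024 bytes (block 0) are reserved and
    # group 0 starts at block 1; with larger blocks, group 0 starts at block 0.
    if [ "$blocksize" -eq 1024 ]; then
        first_data_block=1
    else
        first_data_block=0
    fi
    # Block group 1 begins with a backup copy of the superblock.
    echo $((first_data_block + blocks_per_group))
}

for bs in 1024 2048 4096; do
    echo "$bs-byte blocks: first backup superblock at block $(first_backup_sb "$bs")"
done
```

So 8193 is only right for 1 KiB blocks; a filesystem with 4 KiB blocks keeps its first backup at block 32768.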
So, I kept the answers that provided a means of determining the superblock location either from the filesystem itself or from the block size. The earliest such answer came from nicholasperez (at) (a VERY impolite domain name):
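
# mke2fs -n only prints what it would do; it does not write to the device.
# Pick the first backup superblock number out of the last lines of its output.
backup_sblock=`mke2fs -n /dev/ubd/1 | tail -2 | grep , | awk -F, '{print $1}'`
# Check the filesystem using that backup instead of the zeroed primary.
fsck.ext2 -b $backup_sblock /dev/ubd/1
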
Similar answers came from mgalgoci (at) parcelfarce (dot) linux (dot) theplanet (dot) co (dot) uk, MonMotha, Phil, skepticman, bluebird (at) dartmouth (dot) edu, and tjw.
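The winning one-liner depends on the exact shape of the last two lines of `mke2fs -n` output. A slightly more defensive sketch, assuming only that the output contains the line `Superblock backups stored on blocks:` followed on the next line by a comma-separated list, takes the first number after that header instead. Per the mke2fs man page, `-n` only reports what mke2fs would do, so the printed locations are accurate only if you pass the same options (notably the `-b` block size) that the filesystem was originally created with. The function name `first_backup_from_mke2fs` is hypothetical; it reads mke2fs output on stdin so it can be tried without a device.

```shell
#!/bin/sh
# Hypothetical helper: read `mke2fs -n` output on stdin and print the first
# backup superblock listed after "Superblock backups stored on blocks:".
first_backup_from_mke2fs() {
    awk '
        /Superblock backups stored on blocks:/ { found = 1; next }
        found {
            # Strip leading whitespace, then keep everything before the first comma.
            sub(/^[ \t]+/, "")
            split($0, fields, ",")
            print fields[1]
            exit
        }
    '
}

# Example use inside the UML (mke2fs -n writes nothing to the device):
#   sb=$(mke2fs -n /dev/ubd/1 | first_backup_from_mke2fs)
#   fsck.ext2 -b "$sb" /dev/ubd/1
```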
In addition, I decided to name some honorable mentions:

Solutions which involved fixing the filesystem on the host lost points because you can't do that with a lost superblock on a physical system, but dan_a (at) gmx (dot) net submitted a solution which pulled the filesystem apart and reassembled it with the fixed superblock.

Petru Paler broke the rules slightly and submitted an InstaFix (tm) which assumes that you realize immediately what you did, and that dd is present in the page cache:
UML# dd if=/dev/ubd/0 of=/dev/ubd/0 seek=1 bs=1024 count=1 skip=8193
This lost points for hardcoding 8193, but gained some back for possibly fixing the problem immediately.
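Petru's command works because the primary superblock always starts at byte 1024 (hence `seek=1` with `bs=1024`) and, with 1 KiB blocks, the first backup is block 8193. A minimal sketch that drops the hardcoded 8193, assuming you already know the block size and the number of a backup superblock block: it converts the backup's block number into 1 KiB units for `skip`. It only prints the command rather than running it, and it adds the standard dd option `conv=notrunc`, which matters only if the target is a regular image file rather than a block device. The function name `superblock_dd_cmd` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (but do not run) a dd command that copies the
# backup superblock at filesystem block BACKUP_BLOCK over the primary one.
# Usage: superblock_dd_cmd DEVICE BLOCKSIZE BACKUP_BLOCK
superblock_dd_cmd() {
    dev=$1
    blocksize=$2
    backup_block=$3
    # The backup starts at byte backup_block * blocksize; with bs=1024,
    # dd counts skip= in 1 KiB units.
    skip=$((backup_block * blocksize / 1024))
    # The primary superblock is at byte 1024 (seek=1) and is 1024 bytes long (count=1).
    echo "dd if=$dev of=$dev bs=1024 count=1 seek=1 skip=$skip conv=notrunc"
}

superblock_dd_cmd /dev/ubd/0 1024 8193    # same skip as Petru's command
superblock_dd_cmd /dev/ubd/0 4096 32768   # 4 KiB blocks: skip=131072
```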

willmore suggested
UML# mke2fs -S /dev/ubd/0
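followed by an fsck. This makes mke2fs rewrite only the superblock and block group descriptors, without touching the inode table or the block and inode bitmaps, so it must be given exactly the same format options (block size and so on) that the filesystem was created with. The e2fsck will then leave a bunch of files in lost+found, which you will have to poke through to identify. Directories shouldn't be hard, since you can identify them by their contents, but regular files could be a pain.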
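Poking through lost+found goes faster with a quick inventory. A minimal sketch, assuming the repaired filesystem is mounted somewhere (the `/mnt` path in the example is hypothetical) and that e2fsck has left its usual `#<inode>` entries there: it lists a few names inside each recovered directory, which is usually enough to recognize it, and prints the size and first line of each regular file as a starting clue (binary files will print noise). The function name `triage_lost_found` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize the contents of a lost+found directory.
# Usage: triage_lost_found /path/to/lost+found
triage_lost_found() {
    for entry in "$1"/*; do
        # Skip the unexpanded glob if the directory is empty.
        [ -e "$entry" ] || continue
        if [ -d "$entry" ]; then
            # Directories are usually recognizable from the names they contain.
            printf 'DIR  %s: %s\n' "${entry##*/}" "$(ls "$entry" | head -n 5 | tr '\n' ' ')"
        elif [ -f "$entry" ]; then
            # For regular files, the size and first line are a starting clue.
            size=$(wc -c < "$entry" | tr -d ' ')
            printf 'FILE %s: %s bytes, first line: %s\n' \
                "${entry##*/}" "$size" "$(head -n 1 "$entry")"
        fi
    done
}

# Example, after mounting the repaired filesystem on a hypothetical /mnt:
#   triage_lost_found /mnt/lost+found
```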
