Data Recovery¶
Finding your Data¶
The first step in data recovery is finding the data that you want to recover. You need to decide whether you are going to restore from backup media, or from existing staging data that has not yet been purged. The only difference is, if you purge staging data less frequently than once per week, you might have some data available in the staging directories which would not be found on your backup media, depending on how you rotate your media. (And of course, if your system is trashed or stolen, you probably will not have access to your old staging data in any case.)
Regardless of the data source you choose, you will find the data organized in the same way. The remainder of these examples will work off an example backup disc, but the contents of the staging directory will look pretty much like the contents of the disc, with data organized first by date and then by backup peer name.
This is the root directory of my example disc:
root:/mnt/cdrw# ls -l
total 4
drwxr-x--- 3 backup backup 4096 Sep 01 06:30 2005/
In this root directory is one subdirectory for each year represented in the backup. In this example, the backup represents data entirely from the year 2005. If your configured backup week happens to span a year boundary, there would be two subdirectories here (for example, one for 2005 and one for 2006).
Within each year directory is one subdirectory for each month represented in the backup.
root:/mnt/cdrw/2005# ls -l
total 2
dr-xr-xr-x 6 root root 2048 Sep 11 05:30 09/
In this example, the backup represents data entirely from the month of September, 2005. If your configured backup week happens to span a month boundary, there would be two subdirectories here (for example, one for August 2005 and one for September 2005).
Within each month directory is one subdirectory for each day represented in the backup.
root:/mnt/cdrw/2005/09# ls -l
total 8
dr-xr-xr-x 5 root root 2048 Sep 7 05:30 07/
dr-xr-xr-x 5 root root 2048 Sep 8 05:30 08/
dr-xr-xr-x 5 root root 2048 Sep 9 05:30 09/
dr-xr-xr-x 5 root root 2048 Sep 11 05:30 11/
Depending on how far into the backup week your media was written, you might find as few as one daily directory here, or as many as seven.
Within each daily directory is a stage indicator (indicating when the directory was staged) and one directory for each peer configured in the backup:
root:/mnt/cdrw/2005/09/07# ls -l
total 10
dr-xr-xr-x 2 root root 2048 Sep 7 02:31 host1/
-r--r--r-- 1 root root 0 Sep 7 03:27 cback.stage
dr-xr-xr-x 2 root root 4096 Sep 7 02:30 host2/
dr-xr-xr-x 2 root root 4096 Sep 7 03:23 host3/
In this case, you can see that my backup includes three machines, and that the backup data was staged on September 7, 2005 at 03:27.
Within the directory for a given host are all of the files collected on that host. This might just include tarfiles from a normal Cedar Backup collect run, and might also include files “collected” from Cedar Backup extensions or by other third-party processes on your system.
root:/mnt/cdrw/2005/09/07/host1# ls -l
total 157976
-r--r--r-- 1 root root 11206159 Sep 7 02:30 boot.tar.bz2
-r--r--r-- 1 root root 0 Sep 7 02:30 cback.collect
-r--r--r-- 1 root root 3199 Sep 7 02:30 dpkg-selections.txt.bz2
-r--r--r-- 1 root root 908325 Sep 7 02:30 etc.tar.bz2
-r--r--r-- 1 root root 389 Sep 7 02:30 fdisk-l.txt.bz2
-r--r--r-- 1 root root 1003100 Sep 7 02:30 ls-laR.txt.bz2
-r--r--r-- 1 root root 19800 Sep 7 02:30 mysqldump.txt.bz2
-r--r--r-- 1 root root 4133372 Sep 7 02:30 opt-local.tar.bz2
-r--r--r-- 1 root root 44794124 Sep 8 23:34 opt-public.tar.bz2
-r--r--r-- 1 root root 30028057 Sep 7 02:30 root.tar.bz2
-r--r--r-- 1 root root 4747070 Sep 7 02:30 svndump-0:782-opt-svn-repo1.txt.bz2
-r--r--r-- 1 root root 603863 Sep 7 02:30 svndump-0:136-opt-svn-repo2.txt.bz2
-r--r--r-- 1 root root 113484 Sep 7 02:30 var-lib-jspwiki.tar.bz2
-r--r--r-- 1 root root 19556660 Sep 7 02:30 var-log.tar.bz2
-r--r--r-- 1 root root 14753855 Sep 7 02:30 var-mail.tar.bz2
As you can see, I back up a variety of different things on host1. I run the normal collect action, as well as the sysinfo, mysql and subversion extensions. The resulting backup files are named in a way that makes it easy to determine what they represent.
Files of the form *.tar.bz2 represent directories backed up by the collect action. The first part of the name (before “.tar.bz2”) represents the path to the directory. For example, boot.tar.bz2 contains data from /boot, and var-lib-jspwiki.tar.bz2 contains data from /var/lib/jspwiki.
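The naming convention can be reversed mechanically. Here is a hypothetical helper (not something Cedar Backup provides) that derives the original path from a collect tarfile name; note the mapping is ambiguous if a real directory name itself contains a dash.

```shell
# Hypothetical helper: recover the original path from a collect tarfile
# name by stripping the extension and turning dashes back into slashes.
# Caveat: ambiguous if a real directory name contains a dash.
name="var-lib-jspwiki.tar.bz2"
path="/$(printf '%s' "${name%.tar.*}" | tr '-' '/')"
echo "$path"   # /var/lib/jspwiki
```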
The fdisk-l.txt.bz2, ls-laR.txt.bz2 and dpkg-selections.txt.bz2 files are produced by the sysinfo extension.
The mysqldump.txt.bz2 file is produced by the mysql extension. It represents a system-wide database dump, because I use the “all” flag in configuration. If I were to configure Cedar Backup to dump individual databases, then the filename would contain the database name (something like mysqldump-bugs.txt.bz2).
Finally, the files of the form svndump-*.txt.bz2 are produced by the subversion extension. There is one dump file for each configured repository, and the dump file name represents the name of the repository and the revisions in that dump. So, the file svndump-0:782-opt-svn-repo1.txt.bz2 represents revisions 0-782 of the repository at /opt/svn/repo1. You can tell that this file contains a full backup of the repository to this point, because the starting revision is zero. Later incremental backups would have a non-zero starting revision, i.e. perhaps 783-785, followed by 786-800, etc.
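As a sketch (not a Cedar Backup feature), the revision range can be pulled out of a dump file name with sed, which is handy when scripting over many repositories:

```shell
# Hypothetical sketch: extract the revision range from a svndump file name.
name="svndump-0:782-opt-svn-repo1.txt.bz2"
range=$(printf '%s\n' "$name" | sed 's/^svndump-\([0-9]*:[0-9]*\)-.*/\1/')
echo "$range"   # 0:782
```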
Recovering Filesystem Data¶
Filesystem data is gathered by the standard Cedar Backup collect action.
This data is placed into files of the form *.tar. The first part of the name (before “.tar”) represents the path to the directory. For example, boot.tar would contain data from /boot, and var-lib-jspwiki.tar would contain data from /var/lib/jspwiki. (As a special case, data from the root directory would be placed in -.tar.) Remember that your tarfile might have a bzip2 (.bz2) or gzip (.gz) extension, depending on what compression you specified in configuration.
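Since the extension varies with configuration, a small helper function (an assumption of mine, not something Cedar Backup ships) can choose the right decompression command for any of the three cases:

```shell
# Sketch: choose the decompression command from the file extension so the
# same restore pipeline works regardless of configured compression.
pick_cat() {
  case "$1" in
    *.bz2) echo bzcat ;;
    *.gz)  echo zcat ;;
    *)     echo cat ;;
  esac
}
pick_cat boot.tar.bz2   # bzcat
pick_cat boot.tar.gz    # zcat
pick_cat boot.tar       # cat
```

With this, a pipeline like `$(pick_cat "$file") "$file" | tar xvf -` works whatever the compression.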
If you are using full backups every day, the latest backup data is always within the latest daily directory stored on your backup media or within your staging directory. If you have some or all of your directories configured to do incremental backups, then the first day of the week holds the full backups and the other days represent incremental differences relative to that first day of the week.
If you are restoring a home directory or some other non-system directory as part of a full restore, it is probably fine to extract the backup directly into the filesystem.
If you are restoring a system directory like /etc as part of a full restore, extracting directly into the filesystem is likely to break things, especially if you re-installed a newer version of your operating system than the one you originally backed up. It’s better to extract directories like this to a temporary location and pick out only the files you find you need.
When doing a partial restore, I suggest always extracting to a temporary location. Doing it this way gives you more control over what you restore, and helps you avoid compounding your original problem with another one (like overwriting the wrong file, oops).
Full Restore¶
To do a full system restore, find the newest applicable full backup and extract it. If you have some incremental backups, extract them into the same place as the full backup, one by one starting from oldest to newest. (This way, if a file changed every day you will always get the latest one.)
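The oldest-to-newest ordering matters because each later extraction overwrites earlier copies of a changed file. This self-contained sketch (file and directory names are made up for the demonstration) shows the effect:

```shell
# Demonstrate full-then-incremental extraction order: the incremental's
# newer copy of a changed file overwrites the full backup's older copy.
work=$(mktemp -d) && cd "$work"
mkdir src
echo "monday"  > src/config && tar cf full.tar src
echo "tuesday" > src/config && tar cf incr.tar src
mkdir restore && cd restore
tar xf ../full.tar   # full backup first
tar xf ../incr.tar   # then incrementals, oldest to newest
cat src/config       # tuesday
```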
All of the backed-up files are stored in the tar file in a relative fashion, so you can extract from the tar file either directly into the filesystem, or into a temporary location.
For example, to restore boot.tar.bz2 directly into /boot, execute tar from your root directory (/):
root:/# bzcat boot.tar.bz2 | tar xvf -
Of course, use zcat or just cat, depending on what kind of compression is in use.
If you want to extract boot.tar.bz2 into a temporary location like /tmp/boot instead, just change directories first. In this case, you’d execute the tar command from within /tmp instead of /.
root:/tmp# bzcat boot.tar.bz2 | tar xvf -
Again, use zcat or just cat as appropriate.
For more information, you might want to check out the manpage or GNU info documentation for the tar command.
Partial Restore¶
Most users will need to do a partial restore much more frequently than a full restore. Perhaps you accidentally removed your home directory, or forgot to check in some version of a file before deleting it. Or, perhaps the person who packaged Apache for your system blew away your web server configuration on upgrade (it happens). The solution to these and other kinds of problems is a partial restore (assuming you’ve backed up the proper things).
The procedure is similar to a full restore. The specific steps depend on how much information you have about the file you are looking for. Where with a full restore, you can confidently extract the full backup followed by each of the incremental backups, this might not be what you want when doing a partial restore. You may need to take more care in finding the right version of a file — since the same file, if changed frequently, would appear in more than one backup.
Start by finding the backup media that contains the file you are looking for. If you rotate your backup media, and your last known “contact” with the file was a while ago, you may need to look on older media to find it. This may take some effort if you are not sure when the change you are trying to correct took place.
Once you have decided to look at a particular piece of backup media, find the correct peer (host), and look for the file in the full backup:
root:/tmp# bzcat boot.tar.bz2 | tar tvf - path/to/file
Of course, use zcat or just cat, depending on what kind of compression is in use.
The tvf tells tar to search for the file in question and just list the results rather than extracting the file. Note that the filename is relative (with no starting /). Alternately, you can omit the path/to/file and search through the output using more or less.
If you haven’t found what you are looking for, work your way through the incremental files for the directory in question. One of them may also have the file if it changed during the course of the backup. Or, move to older or newer media and see if you can find the file there.
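Working through several days’ archives can be scripted. This self-contained sketch (the archive and member names are invented for the demo) checks which compressed archives contain a given file:

```shell
# Check which of several .tar.bz2 archives contain a particular member.
work=$(mktemp -d) && cd "$work"
mkdir etc && echo "root:x:0:0" > etc/passwd
tar cjf day07.tar.bz2 etc
tar cjf day08.tar.bz2 etc
found=""
for archive in day*.tar.bz2; do          # glob sorts oldest to newest
  if bzcat "$archive" | tar tf - etc/passwd >/dev/null 2>&1; then
    found="$found $archive"
  fi
done
echo "found in:$found"
```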
Once you have found your file, extract it using xvf:
root:/tmp# bzcat boot.tar.bz2 | tar xvf - path/to/file
Again, use zcat or just cat as appropriate.
Inspect the file and make sure it’s what you’re looking for. Again, you may need to move to older or newer media to find the exact version of your file.
For more information, you might want to check out the manpage or GNU info documentation for the tar command.
Recovering MySQL Data¶
MySQL data is gathered by the Cedar Backup mysql extension. This extension always creates a full backup each time it runs. This wastes some space, but makes it easy to restore database data. The following procedure describes how to restore your MySQL database from the backup.
I am not a MySQL expert. I am providing this information for reference. I have tested these procedures on my own MySQL installation; however, I only have a single database for use by Bugzilla, and I may have misunderstood something with regard to restoring individual databases as a user other than root. If you have any doubts, test the procedure below before relying on it!
MySQL experts and/or knowledgeable Cedar Backup users: feel free to write me and correct any part of this procedure.
First, find the backup you are interested in. If you have specified “all databases” in configuration, you will have a single backup file, called mysqldump.txt. If you have specified individual databases in configuration, then you will have files with names like mysqldump-database.txt instead. In either case, your file might have a .gz or .bz2 extension depending on what kind of compression you specified in configuration.
If you are restoring an “all databases” backup, make sure that you have correctly created the root user and know its password. Then, execute:
daystrom:/# bzcat mysqldump.txt.bz2 | mysql -p -u root
Of course, use zcat or just cat, depending on what kind of compression is in use.

Because the database backup includes CREATE DATABASE SQL statements, this command should take care of creating all of the databases within the backup, as well as populating them.
If you are restoring a backup for a specific database, you have two choices. If you have a root login, you can use the same command as above:
daystrom:/# bzcat mysqldump-database.txt.bz2 | mysql -p -u root
Otherwise, you can create the database and its login first (or have someone create it) and then use a database-specific login to execute the restore:
daystrom:/# bzcat mysqldump-database.txt.bz2 | mysql -p -u user database
Again, use zcat or just cat as appropriate.

For more information on using MySQL, see the documentation on the MySQL web site, http://mysql.org/, or the manpages for the mysql and mysqldump commands.
Recovering Subversion Data¶
Subversion data is gathered by the Cedar Backup subversion extension. Cedar Backup will create either full or incremental backups, but the procedure for restoring is the same for both. Subversion backups are always taken on a per-repository basis. If you need to restore more than one repository, follow the procedures below for each repository you are interested in.
First, find the backup or backups you are interested in. Typically, you will need the full backup from the first day of the week and each incremental backup from the other days of the week.
The subversion extension creates files of the form svndump-*.txt. These files might have a .gz or .bz2 extension depending on what kind of compression you specified in configuration. There is one dump file for each configured repository, and the dump file name represents the name of the repository and the revisions in that dump. So, the file svndump-0:782-opt-svn-repo1.txt.bz2 represents revisions 0-782 of the repository at /opt/svn/repo1. You can tell that this file contains a full backup of the repository to this point, because the starting revision is zero. Later incremental backups would have a non-zero starting revision, i.e. perhaps 783-785, followed by 786-800, etc.
Next, if you still have the old Subversion repository around, you might want to just move it off (rename the top-level directory) before executing the restore. Or, you can restore into a temporary directory and rename it later to its real name once you’ve checked it out. That is what my example below will show.
Next, you need to create a new Subversion repository to hold the restored data. This example shows an FSFS repository, but that is an arbitrary choice. You can restore from an FSFS backup into a FSFS repository or a BDB repository. The Subversion dump format is “backend-agnostic”.
root:/tmp# svnadmin create --fs-type=fsfs testrepo
Next, load the full backup into the repository:
root:/tmp# bzcat svndump-0:782-opt-svn-repo1.txt.bz2 | svnadmin load testrepo
Of course, use zcat or just cat, depending on what kind of compression is in use.
Follow that with loads for each of the incremental backups:
root:/tmp# bzcat svndump-783:785-opt-svn-repo1.txt.bz2 | svnadmin load testrepo
root:/tmp# bzcat svndump-786:800-opt-svn-repo1.txt.bz2 | svnadmin load testrepo
Again, use zcat or just cat as appropriate.
When this is done, your repository will be restored to the point of the last commit indicated in the svndump file (in this case, to revision 800).
Note: don’t be surprised if, when you test this, the restored directory doesn’t have exactly the same contents as the original directory. I can’t explain why this happens, but if you execute svnadmin dump on both old and new repositories, the results are identical. This means that the repositories do contain the same content.
For more information on using Subversion, see the book Version Control with Subversion (http://svnbook.red-bean.com/) or the Subversion FAQ (http://subversion.tigris.org/faq.html).
Recovering Mailbox Data¶
Mailbox data is gathered by the Cedar Backup mbox extension. Cedar Backup will create either full or incremental backups, but both kinds of backups are treated identically when restoring.
Individual mbox files and mbox directories are treated a little differently, since individual files are just compressed, but directories are collected into a tar archive.
First, find the backup or backups you are interested in. Typically, you will need the full backup from the first day of the week and each incremental backup from the other days of the week.
The mbox extension creates files of the form mbox-*. Backup files for individual mbox files might have a .gz or .bz2 extension depending on what kind of compression you specified in configuration. Backup files for mbox directories will have a .tar, .tar.gz or .tar.bz2 extension, again depending on what kind of compression you specified in configuration.
There is one backup file for each configured mbox file or directory. The backup file name represents the name of the file or directory and the date it was backed up. So, the file mbox-20060624-home-user-mail-greylist represents the backup for /home/user/mail/greylist run on 24 Jun 2006. Likewise, mbox-20060624-home-user-mail.tar represents the backup for the /home/user/mail directory run on that same date.
Once you have found the files you are looking for, the restoration procedure is fairly simple. First, concatenate all of the backup files together. Then, use grepmail to eliminate duplicate messages (if any).
Here is an example for a single backed-up file:
root:/tmp# rm restore.mbox # make sure it's not left over
root:/tmp# cat mbox-20060624-home-user-mail-greylist >> restore.mbox
root:/tmp# cat mbox-20060625-home-user-mail-greylist >> restore.mbox
root:/tmp# cat mbox-20060626-home-user-mail-greylist >> restore.mbox
root:/tmp# grepmail -a -u restore.mbox > nodups.mbox
At this point, nodups.mbox contains all of the backed-up messages from /home/user/mail/greylist.

Of course, if your backups are compressed, you’ll have to use zcat or bzcat rather than just cat.
If you are backing up mbox directories rather than individual files, see the filesystem instructions for notes on how to extract the individual files from inside tar archives. Extract the files you are interested in, and then concatenate them together as shown above for the individual case.
Recovering Data split by the Split Extension¶
The Split extension takes large files and splits them up into smaller files. Typically, it would be used in conjunction with the cback3-span command.
The split up files are not difficult to work with. Simply find all of the files — which could be split between multiple discs — and concatenate them together.
root:/tmp# rm usr-src-software.tar.gz # make sure it's not there
root:/tmp# cat usr-src-software.tar.gz_00001 >> usr-src-software.tar.gz
root:/tmp# cat usr-src-software.tar.gz_00002 >> usr-src-software.tar.gz
root:/tmp# cat usr-src-software.tar.gz_00003 >> usr-src-software.tar.gz
Then, use the resulting file like usual.
Remember, you need to have all of the files that the original large file was split into before this will work. If you are missing a file, the result of the concatenation step will be either a corrupt file or a truncated file (depending on which chunks you did not include).
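Because the chunk suffixes are zero-padded, a shell glob expands them in the right order, which avoids listing each chunk by hand. This self-contained sketch (with made-up file names and chunk sizes) round-trips a file and verifies the result byte-for-byte:

```shell
# Split a file, reassemble it with a sorted glob, and verify the result.
work=$(mktemp -d) && cd "$work"
head -c 100000 /dev/urandom > original.bin
split -b 40000 -d original.bin original.bin_   # makes _00, _01, _02
cat original.bin_* > rebuilt.bin               # glob expands in sorted order
cmp original.bin rebuilt.bin && echo "match"   # match
```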