Cedar Backup was designed — and is still primarily focused — around weekly backups to a single CD or DVD. Most users who back up more data than fits on a single disc seem to stop their backup process at the stage step, using Cedar Backup as an easy way to collect data.
However, some users have expressed a need to write these large kinds of backups to disc — if not every day, then at least occassionally. The cback-span tool was written to meet those needs. If you have staged more data than fits on a single CD or DVD, you can use cback-span to split that data between multiple discs.
cback-span is not a general-purpose disc-splitting tool. It is a specialized program that requires Cedar Backup configuration to run. All it can do is read Cedar Backup configuration, find any staging directories that have not yet been written to disc, and split the files in those directories between discs.
cback-span accepts many of the same command-line options as cback, but must be run interactively. It cannot be run from cron. This is intentional. It is intended to be a useful tool, not a new part of the backup process (that is the purpose of an extension).
In order to use cback-span, you must configure your backup such that the largest individual backup file can fit on a single disc. The command will not split a single file onto more than one disc. All it can do is split large directories onto multiple discs. Files in those directories will be arbitrarily split up so that space is utilized most efficiently.
The cback-span command has the following syntax:
Usage: cback-span [switches] Cedar Backup 'span' tool. This Cedar Backup utility spans staged data between multiple discs. It is a utility, not an extension, and requires user interaction. The following switches are accepted, mostly to set up underlying Cedar Backup functionality: -h, --help Display this usage/help listing -V, --version Display version information -b, --verbose Print verbose output as well as logging to disk -c, --config Path to config file (default: /etc/cback.conf) -l, --logfile Path to logfile (default: /var/log/cback.log) -o, --owner Logfile ownership, user:group (default: root:adm) -m, --mode Octal logfile permissions mode (default: 640) -O, --output Record some sub-command (i.e. cdrecord) output to the log -d, --debug Write debugging information to the log (implies --output) -s, --stack Dump a Python stack trace instead of swallowing exceptions
-h
, --help
Display usage/help listing.
-V
, --version
Display version information.
-b
, --verbose
Print verbose output to the screen as well writing to the logfile. When this option is enabled, most information that would normally be written to the logfile will also be written to the screen.
-c
, --config
Specify the path to an alternate configuration file.
The default configuration file is /etc/cback.conf
.
-l
, --logfile
Specify the path to an alternate logfile. The default
logfile file is /var/log/cback.log
.
-o
, --owner
Specify the ownership of the logfile, in the form
user:group
. The default ownership is
root:adm
, to match the Debian standard
for most logfiles. This value will only be used when
creating a new logfile. If the logfile already exists when
the cback command is executed, it will
retain its existing ownership and mode. Only user and group
names may be used, not numeric uid and gid values.
-m
, --mode
Specify the permissions for the logfile, using the
numeric mode as in chmod(1). The default mode is
0640
(-rw-r-----
).
This value will only be used when creating a new logfile.
If the logfile already exists when the
cback command is executed, it will retain
its existing ownership and mode.
-O
, --output
Record some sub-command output to the logfile. When this option is enabled, all output from system commands will be logged. This might be useful for debugging or just for reference. Cedar Backup uses system commands mostly for dealing with the CD/DVD recorder and its media.
-d
, --debug
Write debugging information to the logfile. This option
produces a high volume of output, and would generally only
be needed when debugging a problem. This option implies
the --output
option, as well.
-s
, --stack
Dump a Python stack trace instead of swallowing exceptions. This forces Cedar Backup to dump the entire Python stack trace associated with an error, rather than just propagating last message it received back up to the user interface. Under some circumstances, this is useful information to include along with a bug report.
As discussed above, the cback-span is an interactive command. It cannot be run from cron.
You can typically use the default answer for most questions. The only two questions that you may not want the default answer for are the fit algorithm and the cushion percentage.
The cushion percentage is used by cback-span to determine what capacity to shoot for when splitting up your staging directories. A 650 MB disc does not fit fully 650 MB of data. It's usually more like 627 MB of data. The cushion percentage tells cback-span how much overhead to reserve for the filesystem. The default of 4% is usually OK, but if you have problems you may need to increase it slightly.
The fit algorithm tells cback-span how it should determine which items should be placed on each disc. If you don't like the result from one algorithm, you can reject that solution and choose a different algorithm.
The four available fit algorithms are:
The worst-fit algorithm.
The worst-fit algorithm proceeds through a sorted list of items (sorted from smallest to largest) until running out of items or meeting capacity exactly. If capacity is exceeded, the item that caused capacity to be exceeded is thrown away and the next one is tried. The algorithm effectively includes the maximum number of items possible in its search for optimal capacity utilization. It tends to be somewhat slower than either the best-fit or alternate-fit algorithm, probably because on average it has to look at more items before completing.
The best-fit algorithm.
The best-fit algorithm proceeds through a sorted list of items (sorted from largest to smallest) until running out of items or meeting capacity exactly. If capacity is exceeded, the item that caused capacity to be exceeded is thrown away and the next one is tried. The algorithm effectively includes the minimum number of items possible in its search for optimal capacity utilization. For large lists of mixed-size items, it's not unusual to see the algorithm achieve 100% capacity utilization by including fewer than 1% of the items. Probably because it often has to look at fewer of the items before completing, it tends to be a little faster than the worst-fit or alternate-fit algorithms.
The first-fit algorithm.
The first-fit algorithm proceeds through an unsorted list of items until running out of items or meeting capacity exactly. If capacity is exceeded, the item that caused capacity to be exceeded is thrown away and the next one is tried. This algorithm generally performs more poorly than the other algorithms both in terms of capacity utilization and item utilization, but can be as much as an order of magnitude faster on large lists of items because it doesn't require any sorting.
A hybrid algorithm that I call alternate-fit.
This algorithm tries to balance small and large items to achieve better end-of-disk performance. Instead of just working one direction through a list, it alternately works from the start and end of a sorted list (sorted from smallest to largest), throwing away any item which causes capacity to be exceeded. The algorithm tends to be slower than the best-fit and first-fit algorithms, and slightly faster than the worst-fit algorithm, probably because of the number of items it considers on average before completing. It often achieves slightly better capacity utilization than the worst-fit algorithm, while including slightly fewer items.
Below is a log showing a sample cback-span run.
================================================ Cedar Backup 'span' tool ================================================ This the Cedar Backup span tool. It is used to split up staging data when that staging data does not fit onto a single disc. This utility operates using Cedar Backup configuration. Configuration specifies which staging directory to look at and which writer device and media type to use. Continue? [Y/n]: === Cedar Backup store configuration looks like this: Source Directory...: /tmp/staging Media Type.........: cdrw-74 Device Type........: cdwriter Device Path........: /dev/cdrom Device SCSI ID.....: None Drive Speed........: None Check Data Flag....: True No Eject Flag......: False Is this OK? [Y/n]: === Please wait, indexing the source directory (this may take a while)... === The following daily staging directories have not yet been written to disc: /tmp/staging/2007/02/07 /tmp/staging/2007/02/08 /tmp/staging/2007/02/09 /tmp/staging/2007/02/10 /tmp/staging/2007/02/11 /tmp/staging/2007/02/12 /tmp/staging/2007/02/13 /tmp/staging/2007/02/14 The total size of the data in these directories is 1.00 GB. Continue? [Y/n]: === Based on configuration, the capacity of your media is 650.00 MB. Since estimates are not perfect and there is some uncertainly in media capacity calculations, it is good to have a "cushion", a percentage of capacity to set aside. The cushion reduces the capacity of your media, so a 1.5% cushion leaves 98.5% remaining. What cushion percentage? [4.00]: === The real capacity, taking into account the 4.00% cushion, is 627.25 MB. It will take at least 2 disc(s) to store your 1.00 GB of data. Continue? [Y/n]: === Which algorithm do you want to use to span your data across multiple discs? The following algorithms are available: first....: The "first-fit" algorithm best.....: The "best-fit" algorithm worst....: The "worst-fit" algorithm alternate: The "alternate-fit" algorithm If you don't like the results you will have a chance to try a different one later. Which algorithm? [worst]: === Please wait, generating file lists (this may take a while)... === Using the "worst-fit" algorithm, Cedar Backup can split your data into 2 discs. Disc 1: 246 files, 615.97 MB, 98.20% utilization Disc 2: 8 files, 412.96 MB, 65.84% utilization Accept this solution? [Y/n]: n === Which algorithm do you want to use to span your data across multiple discs? The following algorithms are available: first....: The "first-fit" algorithm best.....: The "best-fit" algorithm worst....: The "worst-fit" algorithm alternate: The "alternate-fit" algorithm If you don't like the results you will have a chance to try a different one later. Which algorithm? [worst]: alternate === Please wait, generating file lists (this may take a while)... === Using the "alternate-fit" algorithm, Cedar Backup can split your data into 2 discs. Disc 1: 73 files, 627.25 MB, 100.00% utilization Disc 2: 181 files, 401.68 MB, 64.04% utilization Accept this solution? [Y/n]: y === Please place the first disc in your backup device. Press return when ready. === Initializing image... Writing image to disc...