8. Safe Updating - Using Shadow Registers

8.1. Description

The Zebra server supports updating of the index structures. That is, you can add, modify, or remove records from databases managed by Zebra without rebuilding the entire index. Since this process involves modifying structured files with various references between blocks of data in the files, the update process is inherently sensitive to system crashes, or to process interruptions: Anything but a successfully completed update process will leave the register files in an unknown state, and you will essentially have no recourse but to re-index everything, or to restore the register files from a backup medium. Further, while the update process is active, users cannot be allowed to access the system, as the contents of the register files may change unpredictably.

You can solve these problems by enabling the shadow register system in Zebra. During the updating procedure, zebraidx will temporarily write changes to the involved files in a set of "shadow files", without modifying the files that are accessed by the active server processes. If the update procedure is interrupted by a system crash or a signal, you simply repeat the procedure - the register files have not been changed or damaged, and the partially written shadow files are automatically deleted before the new updating procedure commences.

At the end of the updating procedure (or in a separate operation, if you so desire), the system enters a "commit mode". First, any active server processes are forced to access those blocks that have been changed from the shadow files rather than from the main register files; the unmodified blocks are still accessed at their normal location (the shadow files are not a complete copy of the register files - they only contain those parts that have actually been modified). If the commit process is interrupted at any point during the commit process, the server processes will continue to access the shadow files until you can repeat the commit procedure and complete the writing of data to the main register files. You can perform multiple update operations to the registers before you commit the changes to the system files, or you can execute the commit operation at the end of each update operation. When the commit phase has completed successfully, any running server processes are instructed to switch their operations to the new, operational register, and the temporary shadow files are deleted.

8.2. How to Use Shadow Register Files

The first step is to allocate space on your system for the shadow files. You do this by adding a shadow entry to the zebra.cfg file. The syntax of the shadow entry is exactly the same as for the register entry (see Section 7, “Register Location”). The location of the shadow area should be different from the location of the main register area (if you have specified one - remember that if you provide no register setting, the default register area is the working directory of the server and indexing processes).

The following excerpt from a zebra.cfg file shows one example of a setup that configures both the main register location and the shadow file area. Note that two directories or partitions have been set aside for the shadow file area. You can specify any number of directories for each of the file areas, but remember that there should be no overlaps between the directories used for the main registers and the shadow files, respectively.

      register: /d1:500M
      shadow: /scratch1:100M /scratch2:200M

When shadow files are enabled, an extra command is available at the zebraidx command line. In order to make changes to the system take effect for the users, you'll have to submit a "commit" command after a (sequence of) update operation(s).

      $ zebraidx update /d1/records
      $ zebraidx commit

Or you can execute multiple updates before committing the changes:

      $ zebraidx -g books update /d1/records  /d2/more-records
      $ zebraidx -g fun update /d3/fun-records
      $ zebraidx commit

If one of the update operations above had been interrupted, the commit operation on the last line would fail: zebraidx will not let you commit changes that would destroy the running register. You'll have to rerun all of the update operations since your last commit operation, before you can commit the new changes.

Similarly, if the commit operation fails, zebraidx will not let you start a new update operation before you have successfully repeated the commit operation. The server processes will keep accessing the shadow files rather than the (possibly damaged) blocks of the main register files until the commit operation has successfully completed.

You should be aware that update operations may take slightly longer when the shadow register system is enabled, since more file access operations are involved. Further, while the disk space required for the shadow register data is modest for a small update operation, you may prefer to disable the system if you are adding a very large number of records to an already very large database (we use the terms large and modest very loosely here, since every application will have a different perception of size). To update the system without the use of the the shadow files, simply run zebraidx with the -n option (note that you do not have to execute the commit command of zebraidx when you temporarily disable the use of the shadow registers in this fashion. Note also that, just as when the shadow registers are not enabled, server processes will be barred from accessing the main register while the update procedure takes place.