2. The Zebra Configuration File

The Zebra configuration file, read by zebraidx and zebrasrv, defaults to zebra.cfg unless another file is specified with the -c option.

You can edit the configuration file with a normal text editor. Parameter names and values are separated by colons in the file. Lines starting with a hash sign (#) are treated as comments.

If you manage different sets of records that share common characteristics, you can organize the configuration settings for each type into "groups". To address a given group when running zebraidx, specify the group name with the -g option. In this case, zebraidx uses the settings that have the group name as their prefix. If no -g option is specified, the settings without a prefix are used.

In the configuration file, the group name is placed before the option name itself, separated by a dot (.). For instance, to set the record type for group public to grs.sgml (the SGML-like format for structured records) you would write:

     public.recordType: grs.sgml

To set the default value of the record type to text, write:

     recordType: text

The available configuration settings are summarized below. They will be explained further in the following sections.

group.recordType[.name]: type

Specifies how records with the file extension name should be handled by the indexer. This option may also be given on the command line (-t). If you do not specify a name, the setting applies to all files. In general, the record type specifier consists of the following elements, separated by dots: fundamental-type, file-read-type and arguments. Currently, two fundamental types exist: text and grs.
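
As a minimal sketch of the forms described above, the following lines set a default record type, a record type for one file extension, and a record type for one group. The extension sgml and the group name public are chosen here only for illustration:

     # default for all files
     recordType: text
     # files with the (illustrative) extension sgml
     recordType.sgml: grs.sgml
     # all files in the (illustrative) group public
     public.recordType: grs.sgml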

group.recordId: record-id-spec

Specifies how the records are to be identified when updated. See Section 3, “Locating Records”.

group.database: database

Specifies the Z39.50 database name.

group.storeKeys: boolean

Specifies whether key information should be saved for a given group of records. If you plan to update or delete records of this type later, this should be specified as 1; otherwise it should be 0 (default), to save register space. See Section 5, “Indexing with File Record IDs”.

group.storeData: boolean

Specifies whether the records should be stored internally in the Zebra system files. If you want to maintain the raw records yourself, this option should be false (0). If you want Zebra to take care of the records for you, it should be true (1).
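
The group settings above could be combined as in the following sketch. The group name public and the database name books are assumptions made for this example. Zebra would use these lines when zebraidx is run with -g public:

     # settings for the (illustrative) group public
     public.database: books
     public.recordType: grs.sgml
     # keep keys so that records can be updated or deleted later
     public.storeKeys: 1
     # let Zebra store the records internally
     public.storeData: 1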

register: register-location

Specifies the location of the various register files that Zebra uses to represent your databases. See Section 7, “Register Location”.

shadow: register-location

Enables the safe update facility of Zebra, and tells the system where to place the required, temporary files. See Section 8, “Safe Updating - Using Shadow Registers”.

lockDir: directory

Directory in which various lock files are stored.

keyTmpDir: directory

Directory in which temporary files used during zebraidx's update phase are stored.

setTmpDir: directory

Specifies the directory that the server uses for temporary result sets. If not specified /tmp will be used.
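
A sketch of the three directory settings follows. All paths are hypothetical and should be replaced by directories that exist on your system and are writable by Zebra:

     # hypothetical locations, not defaults
     lockDir: /var/lib/zebra/lock
     keyTmpDir: /var/tmp/zebra/keys
     setTmpDir: /var/tmp/zebra/sets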

profilePath: path

Specifies a path of profile specification files. The path is composed of one or more directories separated by colons, similar to PATH on UNIX systems.

modulePath: path

Specifies a path of record filter modules. The path is composed of one or more directories separated by colons, similar to PATH on UNIX systems. The 'make install' procedure typically puts modules in /usr/local/lib/idzebra-2.0/modules.
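
For illustration, the two path settings might look as follows. The profilePath directories are assumptions for this example. The modulePath value is the typical 'make install' location mentioned above:

     # two directories, searched like PATH (both hypothetical)
     profilePath: /usr/local/share/idzebra-2.0/tab:/home/zebra/tab
     modulePath: /usr/local/lib/idzebra-2.0/modules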

index: filename

Defines the name of the file that holds field structure definitions. If omitted, the file default.idx is read. Refer to Section 1, “The default.idx file” for more information.

sortmax: integer

Specifies the maximum number of records that will be sorted in a result set. If the result set contains more than integer records, records after the limit will not be sorted. If omitted, the default value is 1,000.

staticrank: integer

Specifies whether static ranking is enabled (1) or disabled (0). If omitted, it is disabled, corresponding to a value of 0. Refer to Section 9.2, “Static Ranking”.

estimatehits: integer

Controls whether Zebra should calculate approximate hit counts, and from which hit count onwards. A value of 0 disables approximate hit counts. For a positive value, approximate hit counts are enabled when the hit count is known to be larger than integer.

Approximate hit counts can also be triggered by a particular attribute in a query. Refer to Section 3.2.5, “Global Approximative Limit Attribute (type 12)”.
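
The following sketch shows the three search-related settings together. The numeric values are arbitrary examples, not recommendations:

     # sort at most the first 5000 records of a result set
     sortmax: 5000
     # static ranking disabled (same as omitting the setting)
     staticrank: 0
     # use approximate hit counts once more than 10000 hits are known
     estimatehits: 10000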

attset: filename

Specifies the filename(s) of attribute set files for use in searching. In many configurations bib1.att is used, but that is not required. If Classic Explain attributes are to be used for searching, explain.att must be given. The path to att-files in general can be given using the profilePath setting. See also Section 3.4, “The Attribute Set (.att) Files”.

memMax: size

Specifies the size of internal memory to use for the zebraidx program. The amount is given in megabytes; the default is 8 (8 MB). The more memory, the faster large updates happen, up to about half the free memory available on the computer.

tempfiles: Yes/Auto/No

Tells Zebra whether it should use temporary files when indexing. The default is Auto, in which case Zebra uses temporary files only if it would need more than memMax megabytes of memory. This should be good for most uses.
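
As an example of how memMax and tempfiles interact, the sketch below raises the memory limit and keeps the default temporary file behaviour. The value 64 is an arbitrary illustration:

     # allow zebraidx to use up to 64 MB of internal memory
     memMax: 64
     # fall back to temporary files only if more than memMax is needed
     tempfiles: Auto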

root: dir

Specifies a directory base for Zebra. All relative paths given (in profilePath, register, shadow) are based on this directory. This setting is useful if your Zebra server is running in a different directory from where zebra.cfg is located.
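
A minimal sketch, assuming a hypothetical base directory /var/lib/zebra that contains a tab subdirectory: with root set, the relative profilePath below would be resolved as /var/lib/zebra/tab.

     # hypothetical base directory
     root: /var/lib/zebra
     # relative path, resolved against root
     profilePath: tab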

passwd: file

Specifies a file that describes the user accounts for Zebra. The format is similar to that of Apache's htpasswd files and UNIX passwd files. Non-empty lines not beginning with # are considered account lines. There is one account per line. A line consists of fields separated by a single colon character. The first field is the username, the second is the password.
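
A password file in this format might look like the following sketch. The usernames and passwords are made up for this example:

     # one account per line: username:password
     alice:secret1
     bob:secret2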

passwd.c: file

Specifies a file that describes the user accounts for Zebra. The file format is similar to that used by the passwd directive, except that the passwords are encrypted. Use Apache's htpasswd or a similar tool for maintenance.

perm.user: permstring

Specifies the permissions (privileges) of a user who is allowed to access Zebra via the passwd system. There are currently two kinds of permissions: read (r) and write (w). By default, users not listed in a permission directive are given the read privilege. To specify permissions for a user with no username, or for Z39.50 anonymous-style access, use anonymous. The permstring consists of a sequence of characters. Include the character w for write/update access, r for read access, and a to allow anonymous access through this account.
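
The permission strings described above could be used as in this sketch. The account name admin is an assumption for the example:

     # anonymous users may only read
     perm.anonymous: r
     # the (illustrative) account admin may read and update
     perm.admin: rw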

dbaccess: accessfile

Names a file that lists database subscriptions for individual users. The access file should consist of lines of the form username: dbnames, where dbnames is a list of database names separated by '+'. No whitespace is allowed in the database list.
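
An access file following this form might look like the sketch below. The usernames and database names are hypothetical:

     alice: books+journals
     bob: books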

encoding: charsetname

Tells Zebra to interpret the terms in Z39.50 queries as having been encoded using the specified character encoding. The default is ISO-8859-1; one useful alternative is UTF-8.

storeKeys: value

Specifies whether Zebra keeps a copy of indexed keys. Use a value of 1 to enable; 0 to disable. If the storeKeys setting is omitted, it is enabled. Enabled storeKeys are required for updating and deleting records. Disable storeKeys only if you want to save space and plan to index the data just once.

storeData: value

Specifies whether Zebra keeps a copy of indexed records. Use a value of 1 to enable; 0 to disable. If the storeData setting is omitted, it is enabled. A storeData setting of 0 (disabled) makes Zebra fetch records from their original location in the file system using filename, file offset and file length. For the DOM and ALVIS filters, the storeData setting is ignored.