Chapter 7. DOM XML Record Model and Filter Module

Table of Contents

1. DOM Record Filter Architecture
2. DOM XML filter pipeline configuration
2.1. Input pipeline
2.2. Extract pipeline
2.3. Store pipeline
2.4. Retrieve pipeline
2.5. Canonical Indexing Format
2.5.1. Processing-instruction governed indexing format
2.5.2. Magic element governed indexing format
2.5.3. Semantics of the indexing formats
3. DOM Record Model Configuration
3.1. DOM Indexing Configuration
3.2. DOM Indexing MARCXML
3.3. DOM Indexing Wizardry
3.4. Debugging DOM Filter Configurations

The record model described in this chapter applies to the fundamental, structured XML record type DOM, introduced in Section 2.5.1, “DOM XML Record Model and Filter Module”. The DOM XML record model is experimental, and its inner workings might change in future releases of the Zebra Information Server.

1. DOM Record Filter Architecture

The DOM XML filter uses a standard DOM XML structure as its internal data model, and can therefore parse, index, and display any XML document type. It is well suited to standardized XML-based formats such as Dublin Core, MODS, METS, MARCXML, OAI-PMH, and RSS, and works equally well with any other, non-standard XML format.

A parser for binary MARC records based on the ISO2709 library standard is provided; it transforms these records into the internal MARCXML DOM representation. Parsers for other binary document formats are planned.
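To make this conversion concrete, here is a minimal Python sketch of what turning one ISO 2709 record into MARCXML involves. It is not Zebra's own parser. It uses only the standard library and assumes MARC 21 conventions (a 24-byte leader, 12-byte directory entries, two indicators, one-character subfield codes) and UTF-8 field data. The names `iso2709_to_marcxml` and `_q` are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def _q(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def iso2709_to_marcxml(raw: bytes, encoding: str = "utf-8") -> ET.Element:
    """Convert one ISO 2709 (MARC 21) record to a MARCXML <record> element.

    Assumes MARC 21 layout: 2 indicators, 1-byte subfield codes and
    12-byte directory entries (entry map "4500").
    """
    leader = raw[:24].decode("ascii")
    base_address = int(leader[12:17])  # offset of the first data field
    # Directory: entries from byte 24 up to the terminator at base_address - 1.
    directory = raw[24:base_address - 1]

    record = ET.Element(_q("record"))
    ET.SubElement(record, _q("leader")).text = leader

    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        # The field length includes the trailing field terminator; drop it.
        data = raw[begin:begin + length].rstrip(FIELD_TERMINATOR)

        if tag.startswith("00"):
            # Control fields (00X) hold plain data: no indicators or subfields.
            control = ET.SubElement(record, _q("controlfield"), {"tag": tag})
            control.text = data.decode(encoding)
            continue

        datafield = ET.SubElement(
            record, _q("datafield"),
            {"tag": tag, "ind1": chr(data[0]), "ind2": chr(data[1])},
        )
        # After the two indicators, each subfield is: delimiter, code, value.
        for chunk in data[2:].split(SUBFIELD_DELIMITER):
            if chunk:
                sub = ET.SubElement(datafield, _q("subfield"), {"code": chr(chunk[0])})
                sub.text = chunk[1:].decode(encoding)
    return record
```

The result is an ordinary `ElementTree` element in the MARCXML namespace, which is the kind of DOM the downstream pipelines can then operate on.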

The DOM filter architecture consists of four different pipelines, each being a chain of arbitrarily many successive XSLT transformations of the internal DOM XML representations of documents.
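The idea of a pipeline as a chain of DOM-to-DOM transformations can be sketched in a few lines of Python. Because the Python standard library has no XSLT processor, plain functions over `xml.etree.ElementTree` trees stand in for the stylesheets here; `run_pipeline`, `rename_root`, and `keep_titles` are hypothetical names, not part of Zebra.

```python
import copy
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, Sequence

# One pipeline step: takes a DOM tree and returns a new one (XSLT stand-in).
Transform = Callable[[ET.Element], ET.Element]


def run_pipeline(doc: ET.Element, steps: Sequence[Transform]) -> ET.Element:
    """Feed the output of each step into the next, in order."""
    return reduce(lambda tree, step: step(tree), steps, doc)


def rename_root(tree: ET.Element) -> ET.Element:
    """Hypothetical step: copy the document and rename its root to <record>."""
    out = copy.deepcopy(tree)
    out.tag = "record"
    return out


def keep_titles(tree: ET.Element) -> ET.Element:
    """Hypothetical step: keep only the <title> elements under the root."""
    out = ET.Element(tree.tag)
    for title in tree.iter("title"):
        out.append(copy.deepcopy(title))
    return out


if __name__ == "__main__":
    doc = ET.fromstring("<book><title>Zebra</title><author>A. Author</author></book>")
    result = run_pipeline(doc, [rename_root, keep_titles])
    print(ET.tostring(result, encoding="unicode"))
    # prints <record><title>Zebra</title></record>
```

Since every step returns a fresh tree, a pipeline of any length (including an empty one, which returns the input unchanged) is just a left fold over its steps.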

Figure 7.1. DOM XML filter architecture


Table 7.1. DOM XML filter pipelines overview

Name | When | Description | Input | Output
input | first | input parsing and initial transformations to common XML format | Input raw XML record buffers, XML streams and binary MARC buffers | Common XML DOM
extract | second | indexing term extraction transformations | Common XML DOM | Indexing XML DOM
store | second | transformations before internal document storage | Common XML DOM | Storage XML DOM
retrieve | third | multiple document retrieve transformations from storage to different output formats are possible | Storage XML DOM | Output XML syntax in requested formats
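The data flow summarized in Table 7.1 can be modelled as follows, again as a hedged sketch in which Python functions stand in for XSLT stylesheets. The class, its fields, and the example steps `index_titles` and `brief_view` are hypothetical. The point is the ordering: the input pipeline runs first, extract and store both start from the same common DOM, and retrieve runs last on the stored DOM, with one pipeline per output format.

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, List, Tuple

Transform = Callable[[ET.Element], ET.Element]


def chain(doc: ET.Element, steps: List[Transform]) -> ET.Element:
    """Run the steps of one pipeline in order."""
    return reduce(lambda tree, step: step(tree), steps, doc)


@dataclass
class DomFilterModel:
    """Hypothetical model of the four pipelines and the order they run in."""
    input_pipeline: List[Transform] = field(default_factory=list)
    extract_pipeline: List[Transform] = field(default_factory=list)
    store_pipeline: List[Transform] = field(default_factory=list)
    retrieve_pipelines: Dict[str, List[Transform]] = field(default_factory=dict)

    def ingest(self, raw: bytes) -> Tuple[ET.Element, ET.Element]:
        """First input, then extract and store; returns (indexing DOM, storage DOM)."""
        common = chain(ET.fromstring(raw), self.input_pipeline)
        # Both second-stage pipelines start from the same common DOM, so each
        # gets its own copy in case a step modifies the tree in place.
        indexing = chain(copy.deepcopy(common), self.extract_pipeline)
        storage = chain(copy.deepcopy(common), self.store_pipeline)
        return indexing, storage

    def retrieve(self, stored: ET.Element, fmt: str) -> str:
        """Third stage: pick one retrieve pipeline by output format name."""
        result = chain(stored, self.retrieve_pipelines[fmt])
        return ET.tostring(result, encoding="unicode")


# Hypothetical example steps standing in for XSLT stylesheets.
def index_titles(tree: ET.Element) -> ET.Element:
    """Extract step: emit one <term> per <title>."""
    out = ET.Element("index")
    for title in tree.iter("title"):
        ET.SubElement(out, "term").text = title.text
    return out


def brief_view(tree: ET.Element) -> ET.Element:
    """Retrieve step: a short output format holding only the titles."""
    out = ET.Element("brief")
    for title in tree.iter("title"):
        out.append(copy.deepcopy(title))
    return out
```

With, for example, `extract_pipeline=[index_titles]` and `retrieve_pipelines={"brief": [brief_view], "full": []}`, ingesting `<book><title>Zebra</title><year>2024</year></book>` yields `<index><term>Zebra</term></index>` as the indexing DOM, while the stored DOM can be retrieved either as a brief view or unchanged. Keeping the retrieve pipelines in a dictionary mirrors the table's note that several output formats are possible.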

The DOM XML filter pipelines use XSLT (and, if supported on your platform, EXSLT). This brings full XPath support to the indexing, storage, and display rules not only of XML documents but also of binary MARC records.