OpenSP provides a generic API in addition to its native API. The generic interface is much simpler than the native interface. It is generic in the sense that it could be easily implemented using parsers other than OpenSP. It provides all ESIS information as well as some other information about the instance that is commonly needed by applications. However, it doesn't provide access to all information available from OpenSP; in particular, it doesn't provide information about the DTD. It is also slightly less efficient than the native interface.
The interface uses two related abstract classes. An SGMLApplication is an object that can handle a number of different kinds of event which correspond to information in an SGML document. An EventGenerator is an object that can generate a sequence of events of the kinds handled by an SGMLApplication. The ParserEventGeneratorKit class makes an EventGenerator that generates events using OpenSP.
SGMLApplication has a number of local types that are used in several contexts:
Char
unsigned short if SP_MULTI_BYTE is defined and unsigned char otherwise.
CharString
A string of Char.
It has the following members:
const Char *ptr
Points to the Chars of the string.
size_t len
The number of Chars in the string.
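Because a CharString carries a pointer and a length, a handler that wants an ordinary string has to copy the Chars out itself. The following is a minimal sketch of that copy for the non-multibyte case. The Char and CharString definitions here are hypothetical stand-ins that mirror the members listed above so that the snippet compiles without the OpenSP headers, and toStdString is an illustrative helper name, not part of the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins mirroring the documented members. Real code would
// use SGMLApplication::Char and SGMLApplication::CharString instead.
typedef unsigned char Char;  // the definition used when SP_MULTI_BYTE is not defined

struct CharString {
  const Char *ptr;  // the Chars of the string
  std::size_t len;  // the number of Chars in the string
};

// Copy the Chars into a std::string owned by the caller. Only len is used to
// find the end of the data; no terminator is assumed.
std::string toStdString(const CharString &s) {
  assert(s.ptr != 0 || s.len == 0);
  std::string out;
  out.reserve(s.len);
  for (std::size_t i = 0; i < s.len; i++)
    out += static_cast<char>(s.ptr[i]);
  return out;
}
```

With the multibyte definition of Char this narrowing cast would lose information, so the sketch applies only to the unsigned char case.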
Location
Constructed from an OpenEntityPtr and a Position. The CharStrings in it will remain valid as long as the OpenEntity that is pointed to by the OpenEntityPtr that was used to construct it remains valid.
It has the following members:
unsigned long lineNumber
(unsigned long)-1 if invalid.
unsigned long columnNumber
(unsigned long)-1 if invalid.
unsigned long byteOffset
(unsigned long)-1 if invalid.
unsigned long entityOffset
(unsigned long)-1 if invalid.
CharString entityName
CharString filename
const void *other
When a location is in an internal entity, the location of the reference to the entity will be used instead.
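The numeric members of a Location use (unsigned long)-1 as an "invalid" marker, so code that prints a location should test each member before using it. The sketch below shows one way to do that. The Location struct here is a hypothetical stand-in with only three of the members (and a std::string in place of the CharString filename), and describe is an illustrative helper name. Treating an empty filename as unknown is an assumption of this sketch, not something the interface specifies.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a subset of SGMLApplication::Location.
struct Location {
  unsigned long lineNumber;
  unsigned long columnNumber;
  std::string filename;  // stand-in for the CharString filename member
};

// The documented marker for an invalid numeric member.
const unsigned long invalidValue = static_cast<unsigned long>(-1);

// Format "file:line:column", leaving out any numeric part that is invalid.
std::string describe(const Location &loc) {
  std::ostringstream os;
  // Assumption: an empty filename means the filename is not known.
  if (loc.filename.empty())
    os << "<unknown>";
  else
    os << loc.filename;
  if (loc.lineNumber != invalidValue) {
    os << ':' << loc.lineNumber;
    if (loc.columnNumber != invalidValue)
      os << ':' << loc.columnNumber;
  }
  return os.str();
}
```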
OpenEntity
The only use for an OpenEntity is, in conjunction with a Position, to create a Location. An OpenEntity is accessed using an OpenEntityPtr.
OpenEntityPtr
A pointer to an OpenEntity.
Position
The meaning of a Position is completely determined by the OpenEntity object with which it is associated. The only use for a Position is, in conjunction with an OpenEntity, to create a Location.
ExternalId
bool haveSystemId
CharString systemId
Valid only if haveSystemId is true.
bool havePublicId
CharString publicId
Valid only if havePublicId is true.
bool haveGeneratedSystemId
CharString generatedSystemId
Valid only if haveGeneratedSystemId is true.
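Each identifier in an ExternalId is paired with a flag that says whether it is valid, so a consumer should check the flag before reading the string. The following sketch formats an ExternalId while honouring those flags. The ExternalId struct is a hypothetical stand-in that uses std::string in place of CharString, and describeExternalId and its output format are illustrative choices only.

```cpp
#include <string>

// Hypothetical stand-in mirroring the documented members of
// SGMLApplication::ExternalId, with std::string in place of CharString.
struct ExternalId {
  bool haveSystemId;
  std::string systemId;
  bool havePublicId;
  std::string publicId;
  bool haveGeneratedSystemId;
  std::string generatedSystemId;
};

// Append name="value" to out, separated from earlier fields by a space.
static void appendField(std::string &out, const char *name,
                        const std::string &value) {
  if (!out.empty())
    out += ' ';
  out += name;
  out += "=\"";
  out += value;
  out += '"';
}

// Describe only the identifiers whose flag marks them as valid.
std::string describeExternalId(const ExternalId &id) {
  std::string out;
  if (id.havePublicId)
    appendField(out, "public", id.publicId);
  if (id.haveSystemId)
    appendField(out, "system", id.systemId);
  if (id.haveGeneratedSystemId)
    appendField(out, "generated", id.generatedSystemId);
  return out;
}
```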
Notation
CharString name
ExternalId externalId
Entity
CharString name
Entity::DataType dataType
Entity::DataType is a local enum with the following possible values:
Entity::sgml
Entity::cdata
Entity::sdata
Entity::ndata
Entity::subdoc
Entity::pi
Entity::DeclType declType
Entity::DeclType is a local enum with the following possible values:
Entity::general
Entity::parameter
Entity::doctype
Entity::linktype
bool isInternal
CharString text
Valid only if isInternal is true.
ExternalId externalId
Valid only if isInternal is false.
const Attribute *attributes
Valid only if isInternal is false.
size_t nAttributes
Valid only if isInternal is false.
Notation notation
Valid only if isInternal is false.
Attribute
CharString name
Attribute::Type type
Attribute::Type is a local type with the following possible values:
Attribute::invalid
Attribute::implied
Attribute::cdata
Attribute::tokenized
Attribute::Defaulted defaulted
Attribute::Defaulted is a local enum with the following possible values:
Attribute::specified
Attribute::definition
Attribute::current
size_t nCdataChunks
The number of Attribute::CdataChunks comprising the value of the attribute. Valid only if type is cdata.
const Attribute::CdataChunk *cdataChunks
The Attribute::CdataChunks comprising the value of this attribute. Valid only if type is cdata.
Attribute::CdataChunk is a local struct with the following members:
bool isSdata
CharString data
CharString entityName
Valid only if isSdata is true. This is non-ESIS information.
CharString tokens
Valid only if type is Attribute::tokenized.
bool isId
size_t nEntities
const Entity *entities
Notation notation
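Which members of an Attribute hold its value depends on type: a cdata attribute is a sequence of chunks, while a tokenized attribute has a single tokens string. The sketch below shows one way to flatten either form into a single string. The structs are hypothetical stand-ins (std::string and std::vector replace CharString and the pointer/count pairs), and attributeValue is an illustrative helper name. Treating implied and invalid attributes as having no value is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for SGMLApplication::Attribute and its CdataChunk.
struct CdataChunk {
  bool isSdata;
  std::string data;
  std::string entityName;  // meaningful only if isSdata is true
};

struct Attribute {
  enum Type { invalid, implied, cdata, tokenized };
  std::string name;
  Type type;
  std::vector<CdataChunk> cdataChunks;  // stands in for nCdataChunks + cdataChunks
  std::string tokens;                   // meaningful only if type is tokenized
};

// Store the attribute's value in 'value' and return true, or return false if
// the attribute has no value to report (assumed for implied and invalid).
bool attributeValue(const Attribute &a, std::string &value) {
  switch (a.type) {
  case Attribute::cdata:
    value.clear();
    for (std::size_t i = 0; i < a.cdataChunks.size(); i++)
      value += a.cdataChunks[i].data;  // sdata chunks are appended as-is
    return true;
  case Attribute::tokenized:
    value = a.tokens;
    return true;
  default:
    return false;
  }
}
```

An application that needs to treat sdata chunks specially would branch on isSdata inside the loop instead of appending data unconditionally.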
For each event xyzEvent handled by SGMLApplication, there is a virtual function of SGMLApplication named xyz to handle the event, and a local struct of SGMLApplication named XyzEvent.
Pointers within an event xyzEvent are valid only during the call to xyz. None of the structs in events have copy constructors or assignment operators defined. It is up to the event handling function to make a copy of any data that it needs to preserve after the function returns.
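Since pointers in an event are valid only during the handler call, a handler that wants to keep a name or data string has to make a deep copy before returning. The following sketch illustrates this for element names. CharString and StartElementEvent are hypothetical stand-ins reduced to the members used here (with a narrow char pointer for simplicity), and ElementNameCollector is an illustrative class; a real handler would derive from SGMLApplication and override startElement.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins, reduced to what this sketch needs.
struct CharString {
  const char *ptr;
  std::size_t len;
};

struct StartElementEvent {
  CharString gi;  // the element's generic identifier
};

// Remembers the name of every element it has seen.
class ElementNameCollector {
public:
  void startElement(const StartElementEvent &event) {
    // Deep-copy the characters now: event.gi.ptr must not be stored, because
    // the memory it points to is only guaranteed valid during this call.
    names_.push_back(std::string(event.gi.ptr, event.gi.len));
  }
  const std::vector<std::string> &names() const { return names_; }

private:
  std::vector<std::string> names_;
};
```

Storing the CharString itself would compile but would leave a dangling pointer once the handler returns.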
Except as otherwise stated, the information in events is ESIS information. All position information is non-ESIS information.
There are the following types of event:
AppinfoEvent
Position pos
bool none
CharString string
Valid only if none is false.
PiEvent
Position pos
CharString data
CharString entityName
StartElementEvent
Position pos
CharString gi
Element::ContentType contentType
Element::ContentType is an enum with the following possible values:
Element::empty
Element::cdata
Element::rcdata
Element::mixed
Element::element
bool included
size_t nAttributes
const Attribute *attributes
EndElementEvent
Position pos
CharString gi
DataEvent
DataEvent
s.
The event has the following members:
Position pos
CharString data
SdataEvent
Position pos
CharString text
CharString entityName
ExternalDataEntityRefEvent
Position pos
Entity entity
SubdocEntityRefEvent
Position pos
Entity entity
StartDtdEvent
Position pos
CharString name
bool haveExternalId
ExternalId externalId
EndDtdEvent
Position pos
CharString name
EndPrologEvent
Position pos
GeneralEntityEvent
Generated only if the ParserEventGeneratorKit::outputGeneralEntities option is enabled. This is non-ESIS information. The event has the following members:
Entity entity
No event will be generated for the declaration of the #default entity; instead an event will be generated when an entity reference uses the #default entity, if that is the first time that an entity with that name is used. This means that GeneralEntityEvent can occur after the end of the prolog.
CommentDeclEvent
Generated only if the ParserEventGeneratorKit::outputCommentDecls option is enabled. This is non-ESIS information. The event has the following members:
Position pos
size_t nComments
const CharString *comments
const CharString *seps
MarkedSectionStartEvent
Generated only if the ParserEventGeneratorKit::outputMarkedSections option is enabled. This is non-ESIS information. The event has the following members:
Position pos
MarkedSectionStartEvent::Status status
MarkedSectionStartEvent::Status is a local enum with the following possible values:
MarkedSectionStartEvent::include
MarkedSectionStartEvent::rcdata
MarkedSectionStartEvent::cdata
MarkedSectionStartEvent::ignore
size_t nParams
const MarkedSectionStartEvent::Param *params
Param is a local struct with the following members:
MarkedSectionStartEvent::Param::Type type
MarkedSectionStartEvent::Param::Type is a local enum with the following possible values:
MarkedSectionStartEvent::Param::temp
MarkedSectionStartEvent::Param::include
MarkedSectionStartEvent::Param::rcdata
MarkedSectionStartEvent::Param::cdata
MarkedSectionStartEvent::Param::ignore
MarkedSectionStartEvent::Param::entityRef
CharString entityName
Valid only if type is MarkedSectionStartEvent::Param::entityRef.
MarkedSectionEndEvent
Generated only if the ParserEventGeneratorKit::outputMarkedSections option is enabled. This is non-ESIS information. The event has the following members:
Position pos
MarkedSectionEndEvent::Status status
MarkedSectionEndEvent::Status is a local enum with the following possible values:
MarkedSectionEndEvent::include
MarkedSectionEndEvent::rcdata
MarkedSectionEndEvent::cdata
MarkedSectionEndEvent::ignore
IgnoredCharsEvent
Generated only if the ParserEventGeneratorKit::outputMarkedSections option is enabled. This is non-ESIS information. The event has the following members:
Position pos
CharString data
ErrorEvent
Position pos
ErrorEvent::Type type
ErrorEvent::Type is a local enum with the following possible values:
ErrorEvent::quantity
ErrorEvent::idref
ErrorEvent::capacity
ErrorEvent::otherError
ErrorEvent::warning
ErrorEvent::info
CharString message
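An error handler will often want to group the ErrorEvent::Type values into a few severity levels before reporting the message. The sketch below shows one possible grouping. The ErrorEvent struct is a hypothetical stand-in (std::string replaces CharString and pos is omitted), severity is an illustrative helper name, and classifying quantity, idref, capacity and otherError together as "error" is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical stand-in for SGMLApplication::ErrorEvent.
struct ErrorEvent {
  enum Type { info, warning, quantity, idref, capacity, otherError };
  Type type;
  std::string message;
};

// Map an error type to a short severity label for reporting.
const char *severity(ErrorEvent::Type type) {
  switch (type) {
  case ErrorEvent::info:
    return "info";
  case ErrorEvent::warning:
    return "warning";
  default:
    // Assumption: every remaining type is reported as a plain error.
    return "error";
  }
}

// Build a one-line report such as "warning: <message>".
std::string formatError(const ErrorEvent &event) {
  return std::string(severity(event.type)) + ": " + event.message;
}
```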
SGMLApplication also has a virtual function
void openEntityChange(const OpenEntityPtr &);
which is similar to an event. An application that wishes to make use of position information must maintain a variable of type OpenEntityPtr representing the current open entity, and must provide an implementation of the openEntityChange function that updates this variable. It can then use the value of this variable in conjunction with a Position to obtain a Location; this can be relatively slow. Unlike events, an OpenEntityPtr has copy constructors and assignment operators defined.
The EventGenerator interface provides the following functions:
unsigned run(SGMLApplication &app)
Generates the sequence of events, calling the corresponding member of app for each event. Returns the number of errors. This must not be called more than once for any EventGenerator object.
EventGenerator *makeSubdocEventGenerator(const SGMLApplication::Char *s, size_t n)
Makes a new EventGenerator for a subdocument of the current document. s and n together specify the system identifier of the subdocument entity. These should usually be obtained from the generatedSystemId member of the externalId member of the Entity object for the subdocument entity. This function can only be called after run has been called; the call to run need not have returned, but the SGMLApplication must have been passed events from the prolog or instance (i.e. the SGML declaration must have been parsed).
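void inhibitMessages(bool b)
If b is true, disables error and warning messages; otherwise enables them. This function may be called at any time, including while run() is executing.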
void halt()
Halts the generation of events by run(). This can be called at any point during the execution of run(). It is safe to call this function from a different thread from that which called run().
The ParserEventGeneratorKit class is used to create an EventGenerator that generates events using OpenSP. It provides the following members:
EventGenerator *makeEventGenerator(int nFiles, char *const *files)
Returns a new EventGenerator that will generate events for the SGML document whose document entity is contained in the files. The returned EventGenerator should be deleted when it is no longer needed. makeEventGenerator may be called more than once.
void setOption(ParserEventGeneratorKit::Option opt)
Sets the option opt. Call this before makeEventGenerator() is called.
ParserEventGeneratorKit::Option is a local enum with the following possible values:
ParserEventGeneratorKit::showOpenEntities
Corresponds to the -e option of nsgmls.
ParserEventGeneratorKit::showOpenElements
Corresponds to the -g option of nsgmls.
ParserEventGeneratorKit::outputCommentDecls
Causes CommentDeclEvents to be generated.
ParserEventGeneratorKit::outputMarkedSections
Causes MarkedSectionStartEvents, MarkedSectionEndEvents and IgnoredCharsEvents to be generated.
ParserEventGeneratorKit::outputGeneralEntities
Causes GeneralEntityEvents to be generated.
ParserEventGeneratorKit::showErrorNumbers
Corresponds to the -n option of nsgmls.
void setOption(ParserEventGeneratorKit::OptionWithArg opt, const char *arg)
Sets the option opt with argument arg. Call this before makeEventGenerator() is called.
ParserEventGeneratorKit::OptionWithArg is a local enum with the following possible values:
ParserEventGeneratorKit::addCatalog
Corresponds to the -m option of nsgmls.
ParserEventGeneratorKit::includeParam
Corresponds to the -i option of nsgmls.
ParserEventGeneratorKit::enableWarning
Corresponds to the -w option of nsgmls.
ParserEventGeneratorKit::addSearchDir
Corresponds to the -D option of nsgmls.
Creating an application with this interface involves the following steps:
Derive a class from SGMLApplication, called, say, MyApplication.
For each FooEvent that the application needs information from, define a member of MyApplication void MyApplication::foo(const FooEvent &).
Declare an instance of ParserEventGeneratorKit.
Set any options for the ParserEventGeneratorKit using ParserEventGeneratorKit::setOption.
Create an EventGenerator using ParserEventGeneratorKit::makeEventGenerator.
Create an instance of MyApplication (usually on the stack).
Call EventGenerator::run passing it a reference to the instance of MyApplication.
Delete the EventGenerator.
The application must include the ParserEventGeneratorKit.h file (which in turn includes EventGenerator.h and SGMLApplication.h), which is in the generic directory. If your compiler does not support the standard C++ bool type, you must ensure that bool is defined as a typedef for int before including this. One way to do this is to include config.h and then Boolean.h from the lib subdirectory of the OpenSP distribution.
On Unix, the application must be linked with the lib/libsp.a library.
Here's a simple example of an application that uses this interface. The application prints an outline of the element structure of a document, using indentation to represent nesting.
    // The next two lines are only to ensure bool gets defined appropriately.
    #include "config.h"
    #include "Boolean.h"

    #include "ParserEventGeneratorKit.h"
    #include <iostream>

    using std::cout;
    using std::ostream;

    // Print a CharString one Char at a time (non-multibyte Char only).
    ostream &operator<<(ostream &os, SGMLApplication::CharString s)
    {
      for (size_t i = 0; i < s.len; i++)
        os << char(s.ptr[i]);
      return os;
    }

    class OutlineApplication : public SGMLApplication {
    public:
      OutlineApplication() : depth_(0) { }
      // Print the element name indented by its nesting depth.
      void startElement(const StartElementEvent &event) {
        for (unsigned i = 0; i < depth_; i++)
          cout << " ";
        cout << event.gi << '\n';
        depth_++;
      }
      void endElement(const EndElementEvent &) { depth_--; }
    private:
      unsigned depth_;
    };

    int main(int argc, char **argv)
    {
      ParserEventGeneratorKit parserKit;
      // Use all the arguments after argv[0] as filenames.
      EventGenerator *egp = parserKit.makeEventGenerator(argc - 1, argv + 1);
      OutlineApplication app;
      unsigned nErrors = egp->run(app);
      delete egp;
      return nErrors > 0;
    }
This example will only work for the non-multibyte version of OpenSP; for the multibyte version you will need to use the standard C++ library's facilities for wide character output, or roll your own equivalents (like the OutputCharStream used by OpenSP applications).
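For the multibyte case, one "roll your own" approach is to encode each Char as UTF-8 before writing it to a narrow stream. The following is a minimal sketch of such an encoder, not the mechanism OpenSP itself uses. It assumes that Char is the 16-bit unsigned short described earlier, that each Char holds one Unicode code point (which depends on the character set in use), and that surrogate pairs do not occur. toUtf8 is an illustrative helper name.

```cpp
#include <cstddef>
#include <string>

// Encode a run of 16-bit Chars as UTF-8.
// Assumption: every Char is a single Unicode code point in the range
// 0x0000-0xFFFF; surrogate pairs are not combined.
std::string toUtf8(const unsigned short *ptr, std::size_t len) {
  std::string out;
  for (std::size_t i = 0; i < len; i++) {
    unsigned c = ptr[i];
    if (c < 0x80) {
      // One byte: 0xxxxxxx
      out += static_cast<char>(c);
    } else if (c < 0x800) {
      // Two bytes: 110xxxxx 10xxxxxx
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
      // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
      out += static_cast<char>(0xE0 | (c >> 12));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}
```

In the outline example, the operator<< for CharString could then write toUtf8(s.ptr, s.len) instead of casting each Char to char.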
There's a bigger example in the osgmlnorm directory in the OpenSP distribution. This uses the SGMLApplication interface, but it doesn't use the ParserEventGeneratorKit interface.