20030408 (v 0.949)
James Anderson
The CL-XML processor can be invoked so as to communicate individual parse events to an invoking application. This behaviour can replace, or supplement, the processor's default behaviour as a model-based XML processor. An application can implement handlers for these events in order to control both the products of the parser and its resource usage. The mechanism complements the one provided to specialize the implementations of the document model nodes which the processor generates.
This document describes the mechanism for processing parse events and illustrates
its use in an event-based RDF parser.
The prevalent interface for event-based XML processing is the Simple API for XML, SAX. SAX serves variously as an autonomous event-based XML parser for Java applications, as the standard event driver for numerous Java-based parsers (Xerces, JAXP), and as the preprocessor for numerous other XML tools (SAXON, XDK).
The core of the current generation of SAX parsers is the org.xml.sax.ContentHandler
interface, which specifies the parsing events reported to the application. While
this interface does, in keeping with its name, provide a concise report of the document
content, it corresponds too coarsely to the CL-XML parser's operation to serve as
the primary event-based interface. For this reason, two event-based interfaces are
provided. The first, lower-level interface permits an application to specify a surrogate,
a construction context, to handle the event stream which the parser generates directly
in the course of phrase reduction. This gives the application detailed access to
all lexical entities and to the process by which the parser constructs the document
model. A higher-level, SAX-equivalent stream is generated by a special form of construction
context which uses the detailed events to build and emit SAX-equivalent events.
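To make the reduction order concrete, here is a minimal sketch. It assumes nothing about CL-XML's internals: the generic functions stag-constructor, chardata-constructor and element-constructor, the tracing-context class and the reduce-element driver are hypothetical stand-ins. What it illustrates is that a bottom-up reducer reports a start tag before the element's content, but can report the element itself only once the whole element phrase has been reduced.

```lisp
;;; Hypothetical sketch: a toy bottom-up reducer reports construction events to
;;; a context object.  None of these names belong to CL-XML.
(defpackage #:reduction-order-sketch (:use #:cl))
(in-package #:reduction-order-sketch)

(defgeneric stag-constructor (context name)
  (:documentation "Called once a start tag has been reduced."))
(defgeneric chardata-constructor (context data)
  (:documentation "Called once a run of character data has been reduced."))
(defgeneric element-constructor (context stag content)
  (:documentation "Called once the whole element phrase has been reduced."))

;;; a surrogate context that records each event instead of building a model
(defclass tracing-context ()
  ((events :initform '() :accessor context-events)))

(defmethod stag-constructor ((c tracing-context) name)
  (push (list :stag name) (context-events c))
  name)                        ; the result is passed on as the element's STAG term

(defmethod chardata-constructor ((c tracing-context) data)
  (push (list :chardata data) (context-events c))
  data)

(defmethod element-constructor ((c tracing-context) stag content)
  (push (list :element stag (length content)) (context-events c))
  (cons stag content))

(defun reduce-element (context form)
  "FORM is (name . children); each child is a string or another such form."
  (let* ((stag (stag-constructor context (first form)))
         (content (mapcar (lambda (child)
                            (if (stringp child)
                                (chardata-constructor context child)
                                (reduce-element context child)))
                          (rest form))))
    (element-constructor context stag content)))

(defun trace-reductions (form)
  "Return the construction events for FORM in the order they were reported."
  (let ((context (make-instance 'tracing-context)))
    (reduce-element context form)
    (reverse (context-events context))))
```

Calling (trace-reductions '("a" "x" ("b"))) reports the start tag of a, its character data, the complete b element, and only then the a element.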
The lower-level construction event interface comprises the subset of the parser's
reduction functions which are defined with a context parameter in addition to the
parameters for the properties of the respective term to be reduced. The
interface to these functions is governed by the naming and constitution of those
terms in the BNF which denote model nodes and properties. The actual terms are specified
by the implementation of the bnfp:atn-constructor-specializer method
specialized for the bnfp:atn-edge class, which is engaged during parser
generation. Note that these functions have case-sensitive names, which is why they are written between vertical bars.
In addition to these functions, several auxiliary functions are specialized in order
to provide finer-grained control over the reduction/construction process. These are
distinguished by names which begin with construct- (see "xml;code:xparser:xml-constructors.lisp"
for the respective documentation).
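The case-sensitive names matter because the standard Lisp reader upcases unescaped symbol names, whereas |...| preserves case exactly. A small sketch, using a hypothetical package and a locally defined generic function rather than CL-XML's own, and assuming the default readtable case:

```lisp
;;; Sketch: the reader upcases unescaped symbol names, while |...| keeps case.
(defpackage #:case-sensitive-names-sketch (:use #:cl))
(in-package #:case-sensitive-names-sketch)

;; a local generic function with a mixed-case name, written like CL-XML's constructors
(defgeneric |STag-Constructor| (context name))

(defmethod |STag-Constructor| (context name)
  (declare (ignore context))
  (list :stag name))

(defun escaped-name () (symbol-name '|STag-Constructor|))   ; case preserved
(defun unescaped-name () (symbol-name 'STag-Constructor))   ; read as STAG-CONSTRUCTOR
```

The unescaped symbol STAG-CONSTRUCTOR is a different symbol, so it does not name the generic function defined above.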
XMLP:|AttCharData-Constructor| context att-value name    [Generic Function]
XMLP:|Attribute-Constructor| context att-value name    [Generic Function]
XMLP:|CharData-Constructor| context character-data    [Generic Function]
XMLP:|CDataCharData-Constructor| character-data    [Generic Function]
XMLP:|Comment-Constructor| character-data    [Generic Function]
XMLP:construct-attribute-name context name    [Generic Function]
XMLP:construct-attribute-plist context    [Generic Function]
XMLP:construct-construction-context context component    [Generic Function]
XMLP:construct-content-sequence context    [Generic Function]
XMLP:construct-elem-property-node context    [Generic Function]
XMLP:construct-element-name context    [Generic Function]
XMLP:construct-element-node context name    [Generic Function]
XMLP:construct-ns-node context attribute-value name    [Generic Function]
XMLP:construct-string-attr-node context    [Generic Function]
XMLP:|Content-Constructor| context    [Generic Function]
XMLP:|ContentSequence-Constructor| context    [Generic Function]
XMLP:|Document-Constructor| context    [Generic Function]
XMLP:|Element-Constructor| context    [Generic Function]
XMLP:|ExtParsedEnt-Constructor| context    [Generic Function]
XMLP:|Pi-Constructor| context literal target    [Generic Function]
XMLP:|PiCharData-Constructor| context character-data    [Generic Function]
XMLP:|STag-Constructor| context    [Generic Function]
An application may avail itself of this interface by specifying a context instance
as the value of the keyword argument :construction-context to the XMLP:document-parser
function. Where no value is specified, the document instance serves as the context initially
and is supplanted by each element instance for the extent of that element. The
parser itself incorporates methods for these functions, specialized on those document
and element classes, which generate the document model.
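The default arrangement, in which the document is the first construction context and each element becomes the context for its own content, can be sketched with a toy walker. The node classes, construct-element, construct-text and toy-parse are hypothetical; only the keyword name :construction-context is borrowed from the XMLP:document-parser description, and toy-parse is not that function.

```lisp
;;; Hypothetical sketch: the document is the initial construction context and
;;; each element becomes the context for its own content.  Not CL-XML code.
(defpackage #:default-context-sketch (:use #:cl))
(in-package #:default-context-sketch)

(defclass node ()
  ((name :initarg :name :reader node-name)
   (children :initform '() :accessor node-children)))
(defclass document-node (node) () (:default-initargs :name "#document"))
(defclass element-node (node) ())

(defgeneric construct-element (context name)
  (:documentation "Create an element named NAME as the last child of CONTEXT."))
(defgeneric construct-text (context text)
  (:documentation "Append TEXT to CONTEXT's children."))

(defmethod construct-element ((context node) name)
  (let ((element (make-instance 'element-node :name name)))
    (setf (node-children context)
          (append (node-children context) (list element)))
    element))

(defmethod construct-text ((context node) text)
  (setf (node-children context)
        (append (node-children context) (list text)))
  text)

(defun toy-parse (form &key (construction-context (make-instance 'document-node)))
  "FORM is (name . children) with string leaves.  Returns CONSTRUCTION-CONTEXT."
  (labels ((walk (context subform)
             ;; the new element supplants CONTEXT for the extent of its content
             (let ((element (construct-element context (first subform))))
               (dolist (child (rest subform))
                 (if (stringp child)
                     (construct-text element child)
                     (walk element child))))))
    (walk construction-context form)
    construction-context))
```

Passing a different object as :construction-context, with its own methods for the constructors, is what redirects the events away from model building.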
Use of the construction context interface is demonstrated in the two classes XMLP:null-construction-context
and NOX:sax-construction-context.
Should an XMLP:null-construction-context
instance be specified as the construction context, the parser produces a NULL
result.
XMLP:null-construction-context    [Class]
To effect this, the context specializes the construction functions with methods which return a null result.
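A null context lets the parse run to completion while retaining nothing. The sketch below models that with hypothetical names (null-context, counting-null-context, toy-parse); the counting subclass exists only to show that every construct is still visited even though each constructor returns NIL.

```lisp
;;; Hypothetical sketch of a null construction context: the toy parse still
;;; visits every construct, but each constructor returns NIL.
(defpackage #:null-context-sketch (:use #:cl))
(in-package #:null-context-sketch)

(defgeneric construct-element (context name))
(defgeneric construct-text (context text))

(defclass null-context () ())

(defmethod construct-element ((context null-context) name)
  (declare (ignore name))
  nil)

(defmethod construct-text ((context null-context) text)
  (declare (ignore text))
  nil)

;;; a subclass that only counts events, to show the parse still runs
(defclass counting-null-context (null-context)
  ((events :initform 0 :accessor context-events)))

(defmethod construct-element :after ((context counting-null-context) name)
  (declare (ignore name))
  (incf (context-events context)))

(defmethod construct-text :after ((context counting-null-context) text)
  (declare (ignore text))
  (incf (context-events context)))

(defun toy-parse (form context)
  "Report FORM, (name . children) with string leaves, to CONTEXT and return
whatever the context constructs for the outermost element."
  (let ((result (construct-element context (first form))))
    (dolist (child (rest form))
      (if (stringp child)
          (construct-text context child)
          (toy-parse child context)))
    result))
```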
The NOX:sax-construction-context
class implements methods for the low-level construction functions which direct a SAX-1-equivalent
parse event stream to the consumer bound to the context instance's consumer property.
The class is the basis of the bridge class used to parse RDF documents by driving
the WILBUR RDF parser through its
NOX:sax-consumer interface. Note that, in comparison with an orthodox
SAX interface, this is a hybrid event/tree interface, in that various atomic properties
and events are accumulated and passed to the event consumer as instances. See below
for an example which uses it to parse RDF.
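NOX:sax-construction-context    [Class]
To effect this, the context specializes construction functions such as XMLP:|CharData-Constructor|, XMLP:|STag-Constructor|, XMLP:|Element-Constructor| and XMLP:construct-ns-node; excerpts of these methods appear below.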
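The following sketch shows, under stated assumptions, how such a bridge can turn construction events into SAX-like calls. Every name here (bridge-context, open-tag, recording-consumer, notify, reduce-element and the local generic functions) is hypothetical; start-element, end-element and char-content only mirror the argument order (consumer, datum, mode) visible in the NOX excerpt below. As in that excerpt, the start-tag method returns its tag so that it is handed back to the element constructor, which emits the end-element event and returns NIL.

```lisp
;;; Hypothetical sketch of a construction context that bridges reduction
;;; events to a SAX-like consumer.  No name here is taken from CL-XML or NOX.
(defpackage #:sax-bridge-sketch (:use #:cl))
(in-package #:sax-bridge-sketch)

;;; consumer side: argument order (consumer datum mode)
(defgeneric start-element (consumer tag mode))
(defgeneric end-element (consumer tag mode))
(defgeneric char-content (consumer data mode))

(defclass open-tag ()
  ((name :initarg :name :reader tag-name)))

(defclass recording-consumer ()
  ((events :initform '() :accessor consumer-events)
   (mode :initarg :mode :initform nil :reader consumer-mode)))

(defmethod start-element ((c recording-consumer) tag mode)
  (declare (ignore mode))
  (push (list :start (tag-name tag)) (consumer-events c)))
(defmethod end-element ((c recording-consumer) tag mode)
  (declare (ignore mode))
  (push (list :end (tag-name tag)) (consumer-events c)))
(defmethod char-content ((c recording-consumer) data mode)
  (declare (ignore mode))
  (push (list :chars data) (consumer-events c)))

;;; producer side: the bridge context receives the low-level events
(defclass bridge-context ()
  ((consumer :initarg :consumer :reader context-consumer)))

(defun notify (context handler datum)
  "Call HANDLER on the context's consumer, DATUM and the consumer's mode."
  (let ((consumer (context-consumer context)))
    (funcall handler consumer datum (consumer-mode consumer))))

(defgeneric stag-constructor (context name))
(defgeneric chardata-constructor (context data))
(defgeneric element-constructor (context stag content))

(defmethod stag-constructor ((context bridge-context) name)
  (let ((tag (make-instance 'open-tag :name name)))
    (notify context #'start-element tag)
    tag))                          ; becomes the STAG argument of element-constructor

(defmethod chardata-constructor ((context bridge-context) data)
  (notify context #'char-content data)
  nil)

(defmethod element-constructor ((context bridge-context) stag content)
  (declare (ignore content))
  (notify context #'end-element stag)
  nil)                             ; no model node is produced

(defun reduce-element (context form)
  "Toy driver: FORM is (name . children), children being strings or forms."
  (let* ((stag (stag-constructor context (first form)))
         (content (mapcar (lambda (child)
                            (if (stringp child)
                                (chardata-constructor context child)
                                (reduce-element context child)))
                          (rest form))))
    (element-constructor context stag content)))

(defun bridge-events (form)
  "Return the consumer's events for FORM and the parse result."
  (let* ((consumer (make-instance 'recording-consumer))
         (context (make-instance 'bridge-context :consumer consumer))
         (result (reduce-element context form)))
    (values (reverse (consumer-events consumer)) result)))
```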
For convenience, the NOX:sax-consumer interface is summarized below.
Note that this constitutes a SAX-1 equivalent interface with additional support for
namespaces.
NOX:char-content    [Generic Function]
NOX:end-document    [Generic Function]
NOX:end-element    [Generic Function]
NOX:proc-instruction    [Generic Function]
NOX:start-document    [Generic Function]
NOX:start-element    [Generic Function]
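Because the consumer sees a stream of events rather than a tree, it can compute results incrementally and retain very little. The sketch below defines local stand-ins for the six generic functions in a hypothetical package; the (consumer datum mode) argument order for start-element, end-element and char-content follows the NOX excerpt below, while the argument lists of start-document, end-document and proc-instruction are assumptions. The depth-consumer class is likewise hypothetical.

```lisp
;;; Hypothetical sketch of a consumer that keeps only a few counters.
(defpackage #:consumer-sketch (:use #:cl))
(in-package #:consumer-sketch)

;; local stand-ins; the argument lists of the document and PI functions are assumed
(defgeneric start-document (consumer mode))
(defgeneric end-document (consumer mode))
(defgeneric start-element (consumer tag mode))
(defgeneric end-element (consumer tag mode))
(defgeneric char-content (consumer data mode))
(defgeneric proc-instruction (consumer instruction mode))

(defclass depth-consumer ()
  ((depth :initform 0 :accessor depth)
   (max-depth :initform 0 :accessor max-depth)
   (text-length :initform 0 :accessor text-length)
   (pi-count :initform 0 :accessor pi-count)))

(defmethod start-document ((c depth-consumer) mode)
  (declare (ignore mode))
  (setf (depth c) 0 (max-depth c) 0 (text-length c) 0 (pi-count c) 0))

(defmethod end-document ((c depth-consumer) mode)
  (declare (ignore mode))
  ;; summarize instead of returning a document model
  (list :max-depth (max-depth c) :text-length (text-length c) :pis (pi-count c)))

(defmethod start-element ((c depth-consumer) tag mode)
  (declare (ignore tag mode))
  (setf (max-depth c) (max (max-depth c) (incf (depth c)))))

(defmethod end-element ((c depth-consumer) tag mode)
  (declare (ignore tag mode))
  (decf (depth c)))

(defmethod char-content ((c depth-consumer) data mode)
  (declare (ignore mode))
  (incf (text-length c) (length data)))

(defmethod proc-instruction ((c depth-consumer) instruction mode)
  (declare (ignore instruction mode))
  (incf (pi-count c)))
```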
A stream-based RDF parser demonstrates one way to use this event interface. The
implementation specializes the WILBUR:rdf-parser class to use a NOX:sax-construction-context,
introduced above, as a SAX-equivalent driver to generate its parse events. The source
needed to specialize the RDF parser's event producer is minimal.
xml:demos;rdf;rdf-inline-parser.lisp

;;; -*- package: WILBUR; Syntax: Common-lisp; Base: 10 -*-

(in-package "WILBUR")

;; an xmlp-based rdf parser drives the wilbur event-based counterpart based on
;; inline parser construction operations

(defClass rdf-xmlp-parser (rdf-parser)
  ()
  (:default-initargs
    :producer (make-instance 'nox::sax-construction-context
                :consumer (make-instance 'rdf-syntax-normalizer))))

;;
;;
;;
;;
;; the top-level parse function

(defGeneric parse-db-from-xmlp-stream (source &rest options)
  (:documentation "generate an rdf database from an input source.
 uses an rdf parser specialized to translate xmlp parse events into the required
 SAX-equivalent events. includes a somewhat redundant method to map a string source
 to an URI as the rdf parsing interface required that before the xmlp parser itself
 is called. ")
  (:method ((source t) &rest options)
    (apply #'parse-db-from-stream source (xqdm:uri source)
           :parser-class 'rdf-xmlp-parser options))
  (:method ((source string) &rest options)
    (cond ((char= (char source 0) #\<)
           (apply #'parse-db-from-xmlp-stream
                  (make-instance 'vector-input-stream :vector source)
                  options))
          (t
           (apply #'parse-db-from-xmlp-stream (xutils:make-uri source) options)))))

:EOF
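The string method of parse-db-from-xmlp-stream above decides between inline XML and a location by testing whether the first character is #\<. A standalone sketch of that decision follows; classify-source and open-source are hypothetical, and a standard string input stream stands in for vector-input-stream. Unlike the original method, the sketch guards against an empty string, for which (char source 0) would be out of bounds; like the original, it treats a string with leading whitespace before the < as a location.

```lisp
;;; Hypothetical sketch of the inline-XML versus location decision.
(defpackage #:source-dispatch-sketch (:use #:cl))
(in-package #:source-dispatch-sketch)

(defun classify-source (source)
  "Return :INLINE-XML for a string whose first character is #\\<,
:LOCATION for any other string, and :OTHER for non-strings."
  (cond ((not (stringp source)) :other)
        ;; the PLUSP guard avoids indexing into an empty string
        ((and (plusp (length source)) (char= (char source 0) #\<)) :inline-xml)
        (t :location)))

(defun open-source (source)
  "Turn inline XML into a character stream; return any other SOURCE unchanged."
  (if (eq (classify-source source) :inline-xml)
      (make-string-input-stream source)
      source))
```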
The implementation of the NOX:sax-construction-context class supplants
numerous construction operators by specializing them so that its methods run in place
of the parser's internal methods. The excerpts below demonstrate how it supplants,
respectively, operators which the XML parser would use to generate nodes in a document
model (XMLP:|CharData-Constructor|, XMLP:|Element-Constructor|,
and XMLP:|STag-Constructor|) and one used to manipulate properties in
the parsing context (XMLP:construct-ns-node).
xml:demos;sax;sax-construction-context.lisp (excerpted)

;;; -*- package: NOX; Syntax: Common-lisp; Base: 10 -*-

(in-package "NOX")

;;; ...

;;; the
(defMethod xmlp:|CharData-Constructor| ((context sax-construction-context) (data string))
  (setf data (collapse-whitespace data))
  (when (plusp (length data))
    (char-content (sax-producer-consumer context) data
                  (sax-consumer-mode (sax-producer-consumer context)))
    nil))

;;; ...

;;; instead of binding the namespace prefix, as the parser's default method would,
;;; the specialization simply returns the properties. the parser eventually furnishes
;;; them together with attribute properties to the call to xmlp:|STag-Constructor|

(defMethod xmlp:construct-ns-node ((context sax-construction-context) attribute-value name
                                   &optional (colon-position (position #\: name))
                                   &aux ns-name namespace)
  (setf ns-name (xqdm:value-string attribute-value))
  (unless (stringp ns-name)
    (xqdm:xml-error "namespace name syntax error: ~s: ~s." name attribute-value))
  (when (and colon-position (zerop (length ns-name)))
    (xqdm:xml-error xqdm:|NSC: No Null Namespace Bindings| :name name))
  (setf namespace (xqdm:find-namespace ns-name :if-does-not-exist :create))
  (xmlp:call-with-name-properties
   #'(lambda (&key local-part &allow-other-keys)
       (cons local-part namespace))
   name
   :colon-position colon-position
   :namespace xqdm:*xmlns-namespace*))

;;; ...

;;; the distinction between the specialized method for xmlp:|Element-Constructor|
;;; and that for xmlp:|STag-Constructor| demonstrates how these constructors
;;; interact with the xml parser's internal state. where the element constructor
;;; passes the event through the consumer's end-element method and produces no result,
;;; the xmlp:|STag-Constructor| specialization not only generates a start-element
;;; event, it also returns the resulting event instance, which the xml parser then
;;; collects among the terms in the Element phrase and supplies to the call to
;;; xmlp:|Element-Constructor|

(defMethod xmlp:|Element-Constructor| ((context sax-construction-context) (content* t) etag stag)
  (when etag
    (let ((close-tag (make-instance 'close-tag)))
      (setf (tag-counterpart stag) close-tag
            (tag-counterpart close-tag) stag)))
  (end-element (sax-producer-consumer context) stag
               (sax-consumer-mode (sax-producer-consumer context)))
  nil)

(defMethod xmlp:|STag-Constructor| ((context sax-construction-context) attr-plist+ns-cons* name)
  (let ((tag (make-instance 'open-tag))
        (namespaces nil)
        (attributes nil))
    (xmlp:call-with-name-properties
     #'(lambda (&key namestring local-part namespace &allow-other-keys)
         (flet ((tag-attribute (&key name att-value)
                  (xmlp:call-with-name-properties
                   #'(lambda (&key local-part namespace &allow-other-keys)
                       (cons (concatenate 'string (xqdm:namespace-name namespace) local-part)
                             (xqdm:value-string att-value)))
                   name))
                (tag-namespace (name value)
                  (cons (if (string-equal name xqdm:*xmlns-prefix-namestring*) nil name)
                        value)))
           (setf (token-string tag)
                 (if (eq namespace xqdm:*null-namespace*)
                     local-part
                     (concatenate 'string (xqdm:namespace-name namespace) local-part))
                 (tag-original-name tag) namestring)
           (mapcar #'(lambda (attr-plist+ns-cons)
                       (cond ((consp (rest attr-plist+ns-cons))
                              ;; an attribute
                              (push (apply #'tag-attribute attr-plist+ns-cons) attributes))
                             (t
                              (push (tag-namespace (first attr-plist+ns-cons)
                                                   (rest attr-plist+ns-cons))
                                    namespaces))))
                   attr-plist+ns-cons*)
           (setf (tag-attributes tag) attributes
                 (tag-namespaces tag) namespaces)))
     name)
    (start-element (sax-producer-consumer context) tag
                   (sax-consumer-mode (sax-producer-consumer context)))
    tag))

;;; ...
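The STag-Constructor method receives attributes and namespace declarations in one list, attr-plist+ns-cons*, and tells them apart by testing whether the REST of each entry is a cons. A minimal, self-contained sketch of that discrimination follows. The plist keys :name and :att-value come from the excerpt's tag-attribute; split-stag-terms, expanded-name and the assumed value "xmlns" for the prefix constant are hypothetical, and namespace names are plain strings here rather than namespace objects. This sketch restores document order with NREVERSE, whereas the excerpt leaves its pushed lists in reverse order.

```lisp
;;; Hypothetical sketch: split attribute plists from (prefix . namespace) conses.
(defpackage #:stag-terms-sketch (:use #:cl))
(in-package #:stag-terms-sketch)

(defparameter *xmlns-prefix* "xmlns"
  "Assumed stand-in for the excerpt's xqdm:*xmlns-prefix-namestring*.")

(defun split-stag-terms (terms)
  "Return two values: attribute (name value) lists and namespace conses.
The default-namespace binding (prefix xmlns) is recorded with a NIL prefix."
  (let ((attributes '()) (namespaces '()))
    (dolist (term terms)
      (if (consp (rest term))
          ;; (:name n :att-value v): the REST of a plist is itself a cons
          (push (list (getf term :name) (getf term :att-value)) attributes)
          ;; (prefix . namespace-name): the REST is an atom
          (push (cons (if (string-equal (first term) *xmlns-prefix*) nil (first term))
                      (rest term))
                namespaces)))
    (values (nreverse attributes) (nreverse namespaces))))

(defun expanded-name (namespace-name local-part)
  "Join a namespace name and a local part by plain concatenation."
  (concatenate 'string namespace-name local-part))
```

The concatenation yields, for example, http://www.w3.org/1999/02/22-rdf-syntax-ns#about for rdf:about, which is the URI form RDF uses for qualified names.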
As a side note, it is also possible to drive the event-based interface to the
RDF parser by generating a parse event stream while traversing a document model.
This practice is demonstrated by the WILBUR:rdf-dom-parser
implementation, which is analogous to that for NOX:sax-construction-context.
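A producer driven by traversing an existing model, in the manner described for WILBUR:rdf-dom-parser, can be sketched without either library. Here a tree is a list (name . children) with string leaves, and drive-events, collect-events and the event keywords are hypothetical; a real producer would call the consumer's generic functions rather than a closure.

```lisp
;;; Hypothetical sketch: emit SAX-like events by walking an in-memory tree.
(defpackage #:tree-driver-sketch (:use #:cl))
(in-package #:tree-driver-sketch)

(defun drive-events (tree emit)
  "Walk TREE, (name . children) with string leaves, calling EMIT with each
event in document order: (:start name), (:chars text), (:end name)."
  (labels ((walk (node)
             (cond ((stringp node) (funcall emit :chars node))
                   (t (funcall emit :start (first node))
                      (mapc #'walk (rest node))
                      (funcall emit :end (first node))))))
    (funcall emit :start-document nil)
    (walk tree)
    (funcall emit :end-document nil)))

(defun collect-events (tree)
  "Return the events DRIVE-EVENTS produces for TREE, in order."
  (let ((events '()))
    (drive-events tree (lambda (kind datum) (push (list kind datum) events)))
    (nreverse events)))
```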