Common Lisp support for the
'Extensible Markup Language'
(CL-XML)

20030602 (v 0.949)
james anderson,



[xml-support level] [program structure]
[usage] [examples] [further work]


CL-XML is a collection of Common LISP modules for data stream parsing and serialization according to the "Extensible Markup Language" and ancillary standards. The modules perform parsing and serialization between XML, XML Query, and XML Path expressions and DOM-compatible CLOS instances. The XML processor includes a conformant, validating, namespace-aware, model-based parser. It supports, in particular, namespace-aware DTD-based validation. The XPATH module comprises LISP bindings for the XML Path library, an S-expression-based namespace-aware path model, and a macro-based path model compiler which implements an XPATH-algebra. The XQUERY module comprises LISP bindings for the XML Query library, an S-expression-based query model which incorporates the XPATH facilities, and a macro-based query compiler. The base CLOS model class library implements the XML Query Data Model and presents an Infoset-compatible programming interface.

This document describes the implemented parsing/processing mechanism for CLOS-based applications, and explains how to use the processor. The processor is intended for use both as a stand-alone XML interface and as an extension to the CL-HTTP server. The runtime environment is examined during compilation to determine whether CL-HTTP support is already present. If so, the existing facilities are used and server extensions are generated to support XML. If CL-HTTP is not present, these extensions are not generated and only file streams and primitive HTTP streams are supported.

A cursory introduction to XML is available here. Source archives are available for the MCL, Lispworks, Allegro, CMUCL, OpenMCL, and Scieneer implementations of Common Lisp. A separate document provides the download paths and details on the implementation status.



Implementation Level

The respective releases have been tested with

XML

The XML module implements a conformant, namespace-aware, validating XML processor which instantiates an Info-Model compatible document model. It also supports event-based parsing according to both a grammar-based and a SAX-equivalent event interface.

The processor always incorporates external references. A referenced document definition is instantiated and incorporated in the document instance as an internal document type definition model. The definition is used to effect instance defaulting and typing and to perform in-line document validation. The parser can be invoked with validation enabled or disabled.
It can be invoked so as to produce a data instance, a parse tree, or a parse event stream. Among these variations it is also possible to parse without generating any result. By default it parses the production designated as Document in the standard and generates a document node, but it can also be invoked so as to parse other non-terminal forms, subject to the constraints implicit in the context-sensitivity of XML lexical analysis. (See xml-tokenizer.)

Namespace-aware qualified name resolution is effected as an integral aspect of parsing. Name resolution within the document element accords with XML-1.0+Names. Name resolution within a DTD applies analogous rules to element and attribute declarations. As a consequence, namespace-aware DTD-based validation is supported.

Conformance

The processor passes 1749 of the 1812 tests in the OASIS conformance suite when the base implementation supports sixteen-bit characters. The test protocol is present in the release in the file "xml:tests;test-oasis-xxxxxxxx.txt". This protocol file notes the discrepancies, which fall into three categories:

Numerous test documents include invalid URI literals and/or system identifiers which were apparently not intended to be interpreted in a platform independent manner. In such cases, when configured for CL-HTTP, warnings will appear.

The validation engine is insensitive to nondeterministic models.

The validation engine is insensitive to lexical encoding, in particular to entities, to processing instructions, and to comments. This means that it is insensitive to distinctions such as those suggested in xml-V10-2e-errata#E15. Should this matter to users, a mode could be added to enforce this behaviour. At this point it seems senseless to distinguish validity based on properties which the models do not express.

Validation

Where the parser is invoked with validation enabled, each element is examined at the conclusion of its content. It is also possible to effect static validation for elements at will. Where validation is performed, content models are compiled as they are referenced. The models can be read from a DTD or constructed programmatically.

The validation mechanism is namespace-aware. This is a direct consequence of the parser's ability to interpret and propagate namespace bindings within the DTD.

Serialization

Methods are available for namespace-aware serialization. They take three forms.

Names

As of 0.912 the parser is capable of representing names either as symbols or as CLOS instances. The environment features xml-symbols and nameset-tokenizers in the file xml:base;parameters.lisp determine the name implementation.

As of 0.918, the instance/symbol implementations have been tested in MCL, LispWorks and Allegro.

XMLPath

The XMLPath module implements access to document models based on XML Path expressions. It includes an implementation for the XML Path library, an interpreter for paths formulated as S-expressions, and a parser to translate string-encoded expressions into the equivalent S-expression form.

Conformance

The path parser manages all of the examples in the XML Path recommendation. I'm, unfortunately, at a loss for a conformance suite. I'm waiting for a public version of the OASIS XSLT conformance suite.

XMLQuery

The XMLQuery module implements access to document models based on XML Query expressions. These incorporate XML Path expressions to address document elements and extend them with construction operations. The module includes an implementation for the XML Query library, an interpreter for queries formulated as S-expressions, and a parser to translate string-encoded expressions into the equivalent S-expression form.

Conformance

The query parser manages all of the examples in the query use cases. In some cases, the parse is ambiguous. The code generator is at an early stage.

A serializer is included. It is restricted to data model instances and follows the concrete syntax for the query algebra, not the query language.

XMLQueryDataModel

The instance model represents an Infoset compatible document model. The root class for the abstract model is ABSTRACT-NODE, of which DOC-NODE, ELEM-NODE, ATTR-NODE, NS-NODE, PI-NODE, and COMMENT-NODE constitute the principal concrete specializations. The root of a result document instance is a DOC-NODE, which binds a single ELEM-NODE and a (COMMENT-NODE + ELEMENT-NODE + PI-NODE) set. Within the element node, content is represented with a sequence of strings and/or ELEM-NODE instances. Element attributes appear as a sequence of ATTR-NODE instances and namespaces appear as a sequence of NS-NODE instances. Where a document type definition is present, each instance binds its respective definition instance to its DEF slot.

A DOC-NODE collects definitions for DEF-GENERAL-ENTITY, DEF-PARAMETER-ENTITY, DEF-NOTATION, and DEF-TYPE. The bindings can be effected upon instantiation or incrementally, as is the case when parsing. A slot is available to cache attribute declarations, but it is entirely informational. The effective declarations are those in the respective DEF-TYPE instance. The XML parser collects them there when processing a DTD.

A DOC-NODE also collects ID attribute instances, as they occur, in a hash table which maps ID values to the respective ELEM-NODE instance.
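The structure described above can be pictured with a much reduced, self-contained sketch. All SKETCH-* names are hypothetical and are not the classes in the release; the sketch also fills the ID table with a walk after construction, whereas the parser is described as registering IDs as they occur.

```lisp
;;; Hypothetical sketch: the SKETCH-* names are illustrative, not the CL-XML classes.
(defclass sketch-node () ())

(defclass sketch-elem-node (sketch-node)
  ((name       :initarg :name       :reader sketch-name)
   (attributes :initarg :attributes :initform '() :reader sketch-attributes)
   ;; content: a sequence of strings and/or element nodes
   (children   :initarg :children   :initform '() :reader sketch-children)))

(defclass sketch-attr-node (sketch-node)
  ((name  :initarg :name  :reader sketch-name)
   (value :initarg :value :reader sketch-value)
   ;; true when the attribute is declared to be of type ID
   (id-p  :initarg :id-p  :initform nil :reader sketch-id-p)))

(defclass sketch-doc-node (sketch-node)
  ((root :initarg :root :initform nil :accessor sketch-root)
   ;; ID values are strings, so the table compares keys with EQUAL.
   (ids  :initform (make-hash-table :test #'equal) :reader sketch-ids)))

(defun sketch-register-ids (doc elem)
  "Record ELEM, and recursively its element children, under each ID attribute value."
  (dolist (attr (sketch-attributes elem))
    (when (sketch-id-p attr)
      (setf (gethash (sketch-value attr) (sketch-ids doc)) elem)))
  (dolist (child (sketch-children elem))
    (when (typep child 'sketch-elem-node)
      (sketch-register-ids doc child))))

(defun sketch-find-by-id (doc id)
  "Return the element registered under ID, or NIL."
  (values (gethash id (sketch-ids doc))))

(defparameter *sketch-doc*
  (let* ((item (make-instance 'sketch-elem-node
                 :name "item"
                 :attributes (list (make-instance 'sketch-attr-node
                                     :name "id" :value "a1" :id-p t))
                 :children (list "asdf")))
         (root (make-instance 'sketch-elem-node
                 :name "test" :children (list item)))
         (doc  (make-instance 'sketch-doc-node :root root)))
    (sketch-register-ids doc root)
    doc))
```

With this in place, (sketch-find-by-id *sketch-doc* "a1") returns the "item" element, which is the kind of lookup the ID table in a DOC-NODE is meant to support.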



Installation

The source archives should unpack to a single directory with the files "sysdcl.lisp" and "define-system.lisp" and the directory "code" at the top level. Installation is demonstrated by the example files xml*clhttp*instanceNames. If the xml modules are to be integrated into a larger system, the system definition is compatible with the cl-http conventions.

xml:tests;test.lisp
;;; -*- Mode: lisp; Syntax: ansi-common-lisp; Base: 10; Package: cl-user; -*-

(in-package "CL-USER")

;;; simplest of tests, to load the parser and parse a document

;; the system definition utility is loaded only where CL-HTTP has not already supplied it
#-CL-HTTP
(load "entwicklung@bataille:source:lisp:xml:define-system.lisp")

;; minimum system; adjust the pathname accordingly
(register-system-definition :xparser
                            "entwicklung@bataille:source:lisp:xml:sysdcl.lisp")
;(execute-system-operations :xparser '(:load))
(execute-system-operations :xparser '(:compile :load))

;; extended to include xml paths
(register-system-definition :xpath "entwicklung@bataille:source:lisp:xml:sysdcl.lisp")
(execute-system-operations :xpath '(:compile :load))

;; including xml query
(register-system-definition :xquery "entwicklung@bataille:source:lisp:xml:sysdcl.lisp")
(execute-system-operations :xquery '(:compile :load))

;; parse a document from a string
(xmlp:document-parser "<test attr='1234'>asdf</test>")

;; parse a document from a logical pathname
(xmlp:document-parser #4p"xml:tests;xml;channel.xml")

:EOF

The implementation is factored into distinct modules, one for each "standard" aspect, each in its own directory. The system definition specifies the dependency among these packages. One need specify only one module and the others are loaded implicitly.


Generated source and binary files are placed in distinct directories. The release includes empty directories to account for LISPs which don't generate directories on demand.

In addition there are several general source files in the base directory

XQDM includes numerous atypical files

XML, XPath, and XQuery manifest a common structure:

XML, in addition, includes

Note that there is no static code for any "parser" in the release itself. The "atn-lib" directory contains the code for the parsers as generated from the respective BNF descriptions. These files are generated as a side-effect of operations on the respective *-parser file through bnfp:compile-atn-system. When compiling, the effect is to generate, compile, and load. When loading from source, the effect is to generate and load source. When loading binaries, an existing binary is loaded. Lacking that, an existing generated source file is loaded. Tracing is enabled via arguments to the compilation function.



Usage

XML

The primary interface function is XMLP:DOCUMENT-PARSER. Depending on the keywords provided to the stream-specialized method, the parser can be invoked to generate a document model, events, or a combination of both.

XMLP:DOCUMENT-PARSER source &key

[Generic Function]

It accepts as its primary argument a source designation. The various optional argument forms are transformed into binary streams and parsed as XML-encoded documents.

XMLP:DOCUMENT-PARSER
(source FILE-URL) &rest args

[Primary Method]

Delegates to the PATHNAME method on the respective pathname.

XMLP:DOCUMENT-PARSER
(source HTTP-URL) &rest args

[Primary Method]

Attempts to generate an HTTP input stream and delegate to the STREAM method.

XMLP:DOCUMENT-PARSER
(source PATHNAME) &rest args

[Primary Method]

Delegates to the STREAM method on the respective binary stream.

XMLP:DOCUMENT-PARSER
(source STREAM) &key trace (reduce t)
document construction-context
(start-name '|Document|)

[Primary Method]

Decodes the provided stream. Parses, by default, the Document production to produce a DOC-NODE. Other productions are possible subject to context constraints.
A non-NULL trace value causes the parser to emit a progress log.
A reduce value NIL causes the processor to suppress instantiation.
A reduce value CONS causes the processor to cons a parse tree, rather than instantiating.
The keyword arguments construction-context and document are provided to support specialization of DOM instances and/or to specialize event handlers.

XMLP:DOCUMENT-PARSER
(source STRING) &rest args

[Primary Method]

Delegates to the STREAM method on the respective character code vector.

XMLP:DOCUMENT-PARSER
(source VECTOR) &rest args

[Primary Method]

Delegates to the STREAM method on the vector input stream.
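
A minimal usage sketch for the keywords described above follows. It assumes that CL-XML has already been loaded as shown under Installation. SKETCH-PARSE is a hypothetical helper, not part of the release; it looks the parser up at run time only so that the forms can be read in an image where the XMLP package is absent.

```lisp
;;; Assumes CL-XML is loaded. SKETCH-PARSE is a hypothetical wrapper which
;;; passes its arguments through to XMLP:DOCUMENT-PARSER when that is present.
(defun sketch-parse (source &rest args)
  (let* ((package (find-package "XMLP"))
         (parser  (and package (find-symbol "DOCUMENT-PARSER" package))))
    (if (and parser (fboundp parser))
        (apply parser source args)
        ;; the library is not loaded: report that rather than signal an error
        (values nil :cl-xml-not-loaded))))

;; Default: parse the Document production and instantiate a DOC-NODE.
(sketch-parse "<test attr='1234'>asdf</test>")

;; Suppress instantiation, that is, parse without generating a result.
(sketch-parse "<test attr='1234'>asdf</test>" :reduce nil)

;; Emit a progress log while parsing.
(sketch-parse "<test attr='1234'>asdf</test>" :trace t)
```

With the library loaded, the same calls can be written directly against XMLP:DOCUMENT-PARSER, as in the installation test file.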
XMLP:PARSE-EXTERNAL-SUBSET-TOPLEVEL
source &rest args &key bind-definitions intern-names

[Function]

Parses an external DTD subset. Where intern-names is non-null (the default), qualified names are resolved to universal names. Where bind-definitions is NIL (the default), the function returns an ext-subset-node, of which the children property binds the parsed forms in order of appearance. Where intern-names and bind-definitions are both non-null, a doc-node context is returned which binds the definitions present in the external subset.

XMLP:*DOCUMENT*

[Variable]

Binds the in-progress document during the parse process.

XMLP:*NAMESPACES*

[Variable]

Binds an association list of the form (prefix-symbol . package) for use in resolving qualified name prefixes. The default value is:
((|xmlns|:|xmlns| . #<Package "xmlns">) (|xmlns|:|xml| . #<Package "xml">))
When the default set is overridden, these bindings must be maintained if those prefixes appear in the respective document.
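
The requirement to maintain the default bindings can be met by extending the list rather than replacing it. The following self-contained sketch shows the idea with a stand-in variable; the SKETCH-* names and packages are hypothetical, and keyword symbols stand in for the prefix symbols which the library interns in its own package.

```lisp
;;; Hypothetical sketch: SKETCH-* names stand in for XMLP:*NAMESPACES* and its packages.
(defun sketch-package (name)
  "Return the package NAME, creating an empty one if necessary."
  (or (find-package name) (make-package name :use '())))

(defparameter *sketch-default-namespaces*
  (list (cons :|xmlns| (sketch-package "sketch-xmlns"))
        (cons :|xml|   (sketch-package "sketch-xml"))))

(defvar *sketch-namespaces* *sketch-default-namespaces*)

(defun sketch-prefix-package (prefix)
  "Look up the package bound to PREFIX, or NIL if the prefix is unbound."
  (cdr (assoc prefix *sketch-namespaces*)))

(defun sketch-call-with-prefix (prefix package-name thunk)
  "Call THUNK with PREFIX bound in addition to the existing bindings."
  (let ((*sketch-namespaces*
          ;; ACONS adds to the front of the list, so the defaults remain visible
          (acons prefix (sketch-package package-name) *sketch-namespaces*)))
    (funcall thunk)))
```

Because the new binding is consed onto the existing list within a dynamic rebinding, the xmlns and xml entries remain available for the duration of the call and the global value is unchanged afterwards.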

XPATH

The primary interface function is XP:XPATH-PARSER.


XP:XPATH-PARSER source &key

[Generic Function]

It accepts as its primary argument a source designation. The various optional argument forms are tokenized and parsed as XML Path expressions.

XP:XPATH-PARSER
(source STRING) &key trace (reduce t)
(start-name '|Expr|)

[Primary Method]

Tokenizes the string and parses it, by default, as an XML Path Expr to produce a LAMBDA expression of one argument comprising the equivalent XPA:PATH S-expression form. Other terms may be specified. The result expression may be saved and loaded independent of the parsing environment. It must be compiled prior to application, in an environment which binds any prefixes present in the path expression.

The result of application to a document component is a generator function, of which repeated calls generate the successive members of the addressed node set.
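
The compile-then-apply protocol and the generator result can be pictured with a self-contained sketch. All SKETCH-* names are hypothetical, the lambda expression merely stands in for parser output, and the convention that an exhausted generator returns NIL is an assumption which the library may not share.

```lisp
;;; Hypothetical sketch: SKETCH-* names are illustrative, not part of CL-XML.
;;; Assumption: an exhausted generator returns NIL; the library may differ.
(defun sketch-make-generator (nodes)
  "Return a function of no arguments which yields the members of NODES in turn."
  (let ((remaining nodes))
    (lambda ()
      (pop remaining))))

(defun sketch-collect (generator)
  "Call GENERATOR until it returns NIL and collect the results in order."
  (loop for node = (funcall generator)
        while node
        collect node))

;; A stand-in for the LAMBDA expression of one argument which the path parser
;; produces; here it addresses the children of a node represented as a list.
(defparameter *sketch-path-expression*
  '(lambda (node) (sketch-make-generator (rest node))))

;; The expression is data until it is compiled.
(defparameter *sketch-path-function*
  (compile nil *sketch-path-expression*))

;; Applying the compiled function yields a generator over the addressed nodes.
(defparameter *sketch-result*
  (sketch-collect (funcall *sketch-path-function* '(root a b c))))
```

Keeping the parser result as a lambda expression, rather than a function object, is what allows it to be saved and loaded independent of the parsing environment, as noted above.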

The LISP binding is implemented as a collection of self-evaluating forms for path, step, and test expressions, together with a library of primitives.

XQUERY

The primary interface function is XQ:QUERY-PARSER.


XQ:QUERY-PARSER source &key

[Generic Function]

It accepts as its primary argument a source designation. The various optional argument forms are tokenized and parsed as XML Query expressions.

XQ:QUERY-PARSER
(source STRING) &key trace (reduce t)
(start-name '|Query|)

[Primary Method]

Tokenizes the string and parses it, by default, as an XML Query expression to produce a LAMBDA expression of no arguments comprising the equivalent XPA:PATH S-expression form. Other terms may be specified. The result expression may be saved and loaded independent of the parsing environment. It must be compiled prior to application, in an environment which binds any prefixes present in the query expression.

The function produces the query result when invoked.

The LISP-binding is implemented as a collection of forms and utility functions together with a library of primitives. Where no XQuery specific function is defined, the functions from the XPath library are available.



Examples

The directory "Tests" includes several test files.



Further Work


References



© setf.de 2003