20030602 (v 0.949)
james anderson,
CL-XML is a collection of Common LISP modules for data stream parsing and serialization according to the "Extensible Markup Language" and ancillary standards. The modules perform parsing and serialization between XML, XML Query, and XML Path expressions and DOM-compatible CLOS instances. The XML processor includes a conformant, validating, namespace-aware model-based parser. It supports, in particular, namespace-aware DTD-based validation. The XPATH module comprises LISP bindings for the XML Path library, an S-expression-based namespace-aware path model, and a macro-based path model compiler which implements an XPATH-algebra. The XQUERY module comprises LISP bindings for the XML Query library, an S-expression-based query model which incorporates the XPATH facilities, and a macro-based query compiler. The base CLOS model class library implements the XML Query Data Model and presents an Infoset compatible programming interface.
This document describes the implemented parsing/processing mechanism for CLOS-based applications, and explains how to use the processor. The processor is intended for use both as a stand-alone XML interface and as an extension to the CL-HTTP server. The runtime environment is examined during compilation to determine whether CL-HTTP support is already present. If so, then the existing facilities are used and server extensions are generated to support XML. If CL-HTTP is not present, then these extensions are not generated and only file streams and primitive http streams are supported.
A cursory introduction to XML is available here. Source
archives are available for the MCL, Lispworks, Allegro, CMUCL, OpenMCL, and Scieneer
implementations of Common Lisp. A separate document provides
the download paths and details on the implementation status.
The respective releases have been tested with
The XML module implements a conformant, namespace-aware, validating XML processor which instantiates an Info-Model compatible document model. It also supports event-based parsing according to both a grammar-based and a SAX-equivalent event interface.
The processor always incorporates external references. A referenced document definition
is instantiated and incorporated in the document instance as an internal document
type definition model. The definition is used to effect instance defaulting and typing
and to perform in-line document validation. The parser can be invoked with validation
enabled or disabled.
It can be invoked so as to produce a data instance, a parse tree, or a
parse event stream. Among these variations it is also possible to parse without generating
any result. By default it parses the production designated as Document
in the standard and generates a document node, but it can also be invoked so as to
parse other non-terminal forms, subject to the constraints implicit in the
context-sensitivity of XML lexical analysis (see xml-tokenizer).
Namespace-aware qualified name resolution is effected as an integral aspect of parsing. Name resolution within the document element accords with XML-1.0+Names. Name resolution within a DTD applies analogous rules to element and attribute declarations. As a consequence, namespace-aware DTD-based validation is supported.
The processor passes 1749 of the 1812 tests in the OASIS conformance suite when the base implementation supports sixteen-bit characters. The test protocol is present in the release in the file "xml:tests;test-oasis-xxxxxxxx.txt". This protocol file notes the discrepancies, which fall into three categories:
Numerous test documents include invalid URI literals and/or system identifiers which were apparently not intended to be interpreted in a platform independent manner. In such cases, when configured for CL-HTTP, warnings will appear.
The validation engine is insensitive to nondeterministic models.
The validation engine is insensitive to lexical encoding - in particular to entities, to processing instructions and to comments. This means that it is insensitive to distinctions such as those suggested in xml-V10-2e-errata#E15. Should this matter to users, a mode could be added to enforce this behaviour. At this point it seems senseless to distinguish validity based on properties which the models do not express.
Where the parser is invoked with validation enabled, the respective elements are examined at the conclusion of the respective content. It is also possible to effect static validation for elements at will. Where validation is performed, content models are compiled as referenced. The models can be read from a DTD or constructed programmatically.
The validation mechanism is namespace-aware. This is a direct consequence of the parser's ability to interpret and propagate namespace bindings within the DTD.
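To make the idea of checking element content against a programmatically constructed content model concrete, here is a minimal, self-contained sketch. It is not the CL-XML validation engine: the model syntax (`:seq`, `:or`, `:?`, `:*`) and the names `match-model` and `valid-content-p` are hypothetical and chosen for illustration only. It uses simple backtracking, so, like the engine described above, it does not care whether a model is deterministic.

```lisp
;;; Hypothetical sketch: match a list of child element names against a
;;; content model written as an S-expression. Not the CL-XML implementation.

(defun match-model (model names k)
  "Match MODEL against a prefix of NAMES; call continuation K on the rest."
  (cond ((symbolp model)
         ;; a plain symbol stands for one element with that name
         (and names
              (eq (first names) model)
              (funcall k (rest names))))
        (t
         (ecase (first model)
           (:seq                        ; (:seq m1 m2 ...) all in order
            (if (null (rest model))
                (funcall k names)
                (match-model (second model) names
                             (lambda (rest-names)
                               (match-model (cons :seq (cddr model))
                                            rest-names k)))))
           (:or                         ; (:or m1 m2 ...) any alternative
            (some (lambda (m) (match-model m names k)) (rest model)))
           (:?                          ; (:? m) zero or one occurrence
            (or (match-model (second model) names k)
                (funcall k names)))
           (:*                          ; (:* m) zero or more occurrences
            (or (match-model (second model) names
                             (lambda (rest-names)
                               ;; require progress so that the loop ends
                               (and (not (eq rest-names names))
                                    (match-model model rest-names k))))
                (funcall k names)))))))

(defun valid-content-p (model names)
  "True when NAMES, as a whole, satisfies MODEL."
  (match-model model names #'null))

;; usage: a title, any number of paragraphs, and an optional note
(valid-content-p '(:seq title (:* para) (:? note))
                 '(title para para note))
```

A real engine would work on element instances and universal names rather than bare symbols; the symbols here only keep the sketch short.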
Methods are available for namespace aware serialization. They take three forms.
As of 0.912 the parser is capable of representing names either as symbols or as
CLOS instances. The environment features xml-symbols and nameset-tokenizers
in the file xml:base;parameters.lisp determine the name implementation.
As of 0.918, the instance/symbol implementations have been tested in MCL, LispWorks and Allegro.
The XMLPath module implements access to document models based on XML Path expressions. It includes an implementation for the XML Path library, an interpreter for paths formulated as S-expressions, and a parser to translate string-encoded expressions into the equivalent S-expression form.
The path parser manages all of the examples in the XML Path recommendation. I'm, unfortunately, at a loss for a conformance suite. I'm waiting for a public version of the OASIS XSLT conformance suite.
The XMLQuery module implements access to document models based on XML Query expressions. These incorporate XML Path expressions to address document elements and extend them with construction operations. The module includes an implementation for the XML Query library, an interpreter for queries formulated as S-expressions, and a parser to translate string-encoded expressions into the equivalent S-expression form.
The query parser manages all of the examples in the query use cases. In some cases, the parse is ambiguous. The code generator is at an early stage.
A serializer is included. It is restricted to data model instances and follows the concrete syntax for the query algebra, not the query language.
The instance model represents an Infoset compatible document model. The root class for the abstract model is ABSTRACT-NODE, of which DOC-NODE, ELEM-NODE, ATTR-NODE, NS-NODE, PI-NODE, and COMMENT-NODE constitute the principal concrete specializations. The root of a result document instance is a DOC-NODE, which binds a single ELEM-NODE and a (COMMENT-NODE + ELEMENT-NODE + PI-NODE) set. Within the element node, content is represented with a sequence of strings and/or ELEM-NODE instances. Element attributes appear as a sequence of ATTR-NODE instances and namespaces appear as a sequence of NS-NODE instances. Where a document type definition is present, each instance binds its respective definition instance to its DEF slot.
A DOC-NODE collects definitions for DEF-GENERAL-ENTITY, DEF-PARAMETER-ENTITY, DEF-NOTATION, and DEF-TYPE. The bindings can be effected upon instantiation or incrementally - as is the case when parsing. A slot is available to cache attribute declarations, but it is entirely informational. The effective declarations are those in the respective DEF-TYPE instance. The XML parser collects them there when processing a DTD.
A DOC-NODE also collects ID attribute instances as they occur in a hashtable which maps ID values to the respective ELEM-NODE instance.
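The shape of such a model, and of the ID table in particular, can be sketched with a few CLOS classes. Everything below is a hypothetical stand-in: the class names (`sketch-doc`, `sketch-elem`, `sketch-attr`), the slot and accessor names, and the rule that an attribute called "id" is an ID attribute are assumptions for illustration. In the model described above, ID-ness follows from the DTD declarations, and the actual classes are DOC-NODE, ELEM-NODE and ATTR-NODE.

```lisp
;;; Hypothetical sketch of a document model with an ID table.
;;; These are NOT the CL-XML class definitions.

(defclass sketch-node () ())            ; plays the role of ABSTRACT-NODE

(defclass sketch-attr (sketch-node)
  ((name  :initarg :name  :reader node-name)
   (value :initarg :value :reader node-value)))

(defclass sketch-elem (sketch-node)
  ((name       :initarg :name :reader node-name)
   (attributes :initarg :attributes :initform '() :reader node-attributes)
   ;; content: a sequence of strings and/or element instances
   (children   :initarg :children :initform '() :reader node-children)))

(defclass sketch-doc (sketch-node)
  ((root :initarg :root :reader doc-root)
   ;; maps ID values (strings) to the element which carries them
   (ids  :initform (make-hash-table :test #'equal) :reader doc-ids)))

(defun register-ids (doc elem)
  "Record ELEM and its descendants in DOC's ID table.
Assumes, for this sketch only, that attributes named \"id\" are IDs."
  (dolist (attr (node-attributes elem))
    (when (string= (node-name attr) "id")
      (setf (gethash (node-value attr) (doc-ids doc)) elem)))
  (dolist (child (node-children elem))
    (when (typep child 'sketch-elem)
      (register-ids doc child))))

;; usage: build <test><para id='p1'>asdf</para></test> and look up "p1"
(defparameter *sketch-doc*
  (let* ((para (make-instance 'sketch-elem
                              :name "para"
                              :attributes (list (make-instance 'sketch-attr
                                                               :name "id"
                                                               :value "p1"))
                              :children (list "asdf")))
         (root (make-instance 'sketch-elem :name "test"
                                           :children (list para)))
         (doc (make-instance 'sketch-doc :root root)))
    (register-ids doc root)
    doc))

(node-name (gethash "p1" (doc-ids *sketch-doc*)))
```

The sketch fills the table in a separate pass to stay short; the text above describes the parser collecting ID attribute instances as they occur.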
The source archives should unpack to a single directory with the files "sysdcl.lisp" and "define-system.lisp" and the directory "code" at the top level. Installation is demonstrated by the example files xml*clhttp*instanceNames. If the xml modules are to be integrated into a larger system, the system definition is compatible with the cl-http conventions.
xml:tests;test.lisp

;;; -*- Mode: lisp; Syntax: ansi-common-lisp; Base: 10; Package: cl-user; -*-

(in-package "CL-USER")

;;; simplest of tests, to load the parser and parse a document

#-CL-HTTP
(load "entwicklung@bataille:source:lisp:xml:define-system.lisp")

;; minimum system; adjust the pathname accordingly
(register-system-definition :xparser "entwicklung@bataille:source:lisp:xml:sysdcl.lisp")
;(execute-system-operations :xparser '(:load))
(execute-system-operations :xparser '(:compile :load))

;; extended to include xml paths
(register-system-definition :xpath "entwicklung@bataille:source:lisp:xml:sysdcl.lisp")
(execute-system-operations :xpath '(:compile :load))

;; including xml query
(register-system-definition :xquery "entwicklung@bataille:source:lisp:xml:sysdcl.lisp")
(execute-system-operations :xquery '(:compile :load))

(xmlp:document-parser "<test attr='1234'>asdf</test>")
(xmlp:document-parser #4p"xml:tests;xml;channel.xml")

:EOF
The implementation is factored into distinct modules, one for each "standard" aspect, each in its own directory. The system definition specifies the dependency among these packages. One need specify only one module and the others are loaded implicitly.
Generated source and binary files are placed in distinct directories. The release
includes empty directories to account for LISPs which don't generate directories
on demand.
code:atn-lib contains the generated source files for the respective
parsers.
bin:allfasl contains compiled files for Allegro
bin:lwfasl contains compiled files for LispWorks
bin:m68kfasl contains compiled files for MCL on 68k platforms
bin:ppcfasl contains compiled files for MCL on PPC platforms
In addition there are several general source files in the base directory
cllib utilities adopted from the cllib package
defsystem the system definition file. One should be able to load
and/or compile it directly. It should load the BNF compiler itself. When compiled,
it loads the respective files incrementally.
package package definitions
parameters configuration parameters, primarily to govern conformance
utils miscellaneous utility functions
XQDM includes numerous atypical files
xqdm-character-classes comprises the character predicates as specified
by the XML standard.
xqdm-classes comprises the respective class definitions and accessors.
xqdm-conditions comprises conditions
xqdm-qnames contains the mechanism for processing a document type
definition to resolve qualified names to universal names based on inferred namespace
declarations.
xqdm-validation contains the validation engine.
XML, XPath, and XQuery manifest a common structure:
*-classes comprises the respective class definitions and accessors.
*-constructors is present in the parser modules and comprises the
functions which reduce non-terminal productions to instances. There is one function
present for each production. They are respectively functions of the production terms
taken in alphabetical order. In exceptional cases, the parser is generated so as
to suppress insignificant terms - such as 'S' in XML. Terms which appear once
appear as reduced. Terms of higher ordinality are collected into a list.
*-grammar.bnf is present in the parser modules. It is the BNF description
of the respective encoding. For the most part it is a literal copy of the standard.
In some cases terms have been reordered in order to eliminate ambiguity.
*-operators implements the respective interpreter or, in the case
of the document model, the respective class behaviour. For the path and query modules,
the interpreter consists primarily of macro functions which compile the expressions.
*-library is present in interpretative modules. It comprises primitive
operations in the respective language.
*-parameters contains global variable definitions.
*-parser is present in the parsing modules. It comprises token class
predicates, language-specific extensions to the bnf-parser, control functions to
generate the lisp source for the respective parser from the bnf, and wrapper functions
for the generated parser.
*-printer is present in the parsing modules. It comprises methods
to serialize document instances according to the respective encoding.
*-tokenizer is present in the parsing modules. It implements the
respective lexical processing functions.
XML, in addition, includes
Note that there is no static code for any "parser" in the release itself.
The "atn-lib" directory contains the code for the parsers as generated
from the respective BNF descriptions. These files are generated as a side-effect
of operations on the respective *-parser file through bnfp:compile-atn-system.
When compiling, the effect is to generate, compile, and load. When loading from source,
the effect is to generate and load source. When loading binaries, an existing binary
is loaded. Lacking that, an existing generated source file is loaded. Tracing is enabled
via arguments to the compilation function.
The primary interface function is XMLP:DOCUMENT-PARSER. Depending on the keywords
provided to the stream-specialized method, the parser can be invoked to generate
a document model, events, or a combination of both.

XMLP:DOCUMENT-PARSER source &key
[Generic Function]
It accepts as its primary argument a source designation. The various optional argument forms are transformed into binary streams and parsed as XML-encoded documents.

XMLP:DOCUMENT-PARSER
[Primary Method]
Delegates to the PATHNAME method on the respective pathname.

XMLP:DOCUMENT-PARSER
[Primary Method]
Attempts to generate an HTTP input stream and delegate to the STREAM
method.

XMLP:DOCUMENT-PARSER
[Generic Function]
Delegates to the STREAM method on the respective binary stream.

XMLP:DOCUMENT-PARSER
[Generic Function]
Decodes the provided stream. Parses, by default, the Document production
to produce a DOC-NODE. Other productions are possible subject to context
constraints. A non-NULL trace value causes the parser to emit a progress
log. A reduce value NIL causes the processor to suppress
instantiation. A reduce value CONS causes the processor to cons
a parse tree, rather than instantiating. The keyword arguments construction-context and document
are provided to support specialization of DOM instances and/or to specialize event
handlers.

XMLP:DOCUMENT-PARSER
[Primary Method]
Delegates to the STREAM method on the respective character code vector.

XMLP:DOCUMENT-PARSER
[Primary Method]
Delegates to the STREAM method on the vector input stream.
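The delegation pattern in these entries, in which each source type is converted and handed on until the stream-specialized method does the work, can be sketched in portable Common Lisp. This is an illustrative stand-in, not the XMLP:DOCUMENT-PARSER source: the function name `sketch-parser` is hypothetical, it works on character streams rather than the binary streams mentioned above, and instead of parsing it simply returns the text it read.

```lisp
;;; Hypothetical sketch of source-type dispatch by delegation.
;;; The "parse result" here is just the text that was read.

(defgeneric sketch-parser (source &key trace)
  (:documentation "Read SOURCE by delegating to the STREAM method."))

(defmethod sketch-parser ((source stream) &key trace)
  ;; the one method which does the actual work
  (let ((text (with-output-to-string (out)
                (loop for char = (read-char source nil nil)
                      while char
                      do (write-char char out)))))
    (when trace
      ;; a non-null trace value emits a progress note
      (format *trace-output* "~&read ~d characters~%" (length text)))
    text))

(defmethod sketch-parser ((source string) &key trace)
  ;; a string is treated as document content, not as a file name
  (with-input-from-string (stream source)
    (sketch-parser stream :trace trace)))

(defmethod sketch-parser ((source pathname) &key trace)
  ;; a pathname is opened and the resulting stream is delegated
  (with-open-file (stream source :direction :input)
    (sketch-parser stream :trace trace)))

;; usage: the string method wraps the text in a stream and delegates
(sketch-parser "<test attr='1234'>asdf</test>")
```

Keeping the decoding logic in a single stream method means that support for a new kind of source only needs a short method which produces a stream.")
                 "<test attr='1234'>asdf</test>"))
(assert (string= (sketch-parser "") ""))
(assert (string= (with-input-from-string (s "<a/>") (sketch-parser s)) "<a/>"))
</test>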
XMLP:PARSE-EXTERNAL-SUBSET-TOPLEVEL
[Function]
Parses an external DTD subset. Where intern-names is non-null (by default) qualified
names are resolved to universal names. Where bind-definitions
is NIL (by default), the function returns ext-subset-node,
of which the children property binds the parsed forms in order of appearance. Where
intern-names and bind-definitions are both
non-null, a doc-node context is returned which binds the definitions
present in the external subset.

XMLP:*DOCUMENT*
[Variable]
Binds the in-progress document during the parse process.

XMLP:*NAMESPACES*
[Variable]
Binds an association list of the form (prefix-symbol . package) for use resolving
qualified name prefixes. The default value is ((|xmlns|:|xmlns| . #<Package "xmlns">) (|xmlns|:|xml| . #<Package "xml">)). When the default set is overridden, these bindings must be maintained if those prefixes appear in the respective document.
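How an association list of prefix bindings can drive qualified name resolution is sketched below. It is a simplified illustration, not the CL-XML resolver: the function names `ensure-namespace-package` and `resolve-qualified-name` are hypothetical, prefixes are plain strings rather than prefix symbols, and the package names are made up for the example. The point it shows is that extending the bindings by consing onto the front keeps the default entries in place.

```lisp
;;; Hypothetical sketch: resolve "prefix:local" names through an
;;; association list of (prefix . package) bindings.

(defun ensure-namespace-package (name)
  "Return the package named NAME, creating an empty one if needed."
  (or (find-package name)
      (make-package name :use '())))

(defun resolve-qualified-name (qname bindings)
  "Intern the local part of QNAME in the package bound to its prefix.
An unprefixed name is looked up under the empty prefix \"\"."
  (let* ((colon (position #\: qname))
         (prefix (if colon (subseq qname 0 colon) ""))
         (local (if colon (subseq qname (1+ colon)) qname))
         (binding (assoc prefix bindings :test #'string=)))
    (unless binding
      (error "No namespace binding for prefix ~s in ~s." prefix qname))
    (values (intern local (cdr binding)))))

;; usage: extend a default set without dropping its entries
(defparameter *sketch-default-bindings*
  (list (cons "xml" (ensure-namespace-package "sketch-xml"))))

(defparameter *sketch-bindings*
  (acons "html"
         (ensure-namespace-package "http://www.w3.org/1999/xhtml")
         *sketch-default-bindings*))

(resolve-qualified-name "html:body" *sketch-bindings*)
(resolve-qualified-name "xml:lang" *sketch-bindings*)
```

Because the local part is interned in the package bound to the prefix, two documents which use different prefixes for the same package resolve to the same symbol.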
The primary interface function is XP:XPATH-PARSER.

XP:XPATH-PARSER source &key
[Generic Function]
It accepts as its primary argument a source designation. The various optional argument forms are tokenized and parsed as XML Path expressions.

XP:XPATH-PARSER
[Primary Method]
Tokenizes the string and parses it, by default, as an XML Path Expr
to produce a LAMBDA expression of one argument comprising the equivalent
XPA:PATH S-expression form. Other terms may be specified. The result
expression may be saved and loaded independent of the parsing environment. It must
be compiled prior to application, in an environment which binds any prefixes present
in the path expression. The result of application to a document component is a generator function, repeated calls to which generate the successive members of the addressed node set.
The LISP binding is implemented as a collection of self-evaluating forms for path, step, and test expressions, together with a library of primitives.
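The workflow described for the path parser (obtain a LAMBDA expression, compile it, apply it to a document component, then call the resulting generator repeatedly) can be illustrated with a toy model. Nothing here is the XP or XPA API: `path-lambda` and `child-elements` are hypothetical names, the "document" is a nested list of the form (name child...), and only child steps are supported.

```lisp
;;; Hypothetical sketch: a path as a LAMBDA expression which, once
;;; compiled and applied to a node, returns a generator closure.

(defun child-elements (node name)
  "Children of NODE (a list (name child...)) which are elements called NAME."
  (loop for child in (rest node)
        when (and (consp child) (eq (first child) name))
          collect child))

(defun path-lambda (steps)
  "Return a LAMBDA expression of one argument for the child steps STEPS."
  `(lambda (node)
     (let ((nodes (list node)))
       ;; apply each step to the current node set in turn
       (dolist (name ',steps)
         (setf nodes (loop for n in nodes
                           append (child-elements n name))))
       ;; the generator: each call yields the next node, then NIL
       (lambda () (pop nodes)))))

;; usage
(defparameter *sketch-tree*
  '(doc (chapter (title "One"))
        (chapter (title "Two"))
        (appendix (title "A"))))

(let* ((expression (path-lambda '(chapter title))) ; plain data, could be saved
       (path (compile nil expression))             ; compile before applying
       (next (funcall path *sketch-tree*)))        ; apply to get a generator
  ;; returns the two chapter titles, then NIL once the set is exhausted
  (list (funcall next) (funcall next) (funcall next)))
```

Since the expression is ordinary list data until it is compiled, it can be printed to a file and read back later, which mirrors the remark above that the result may be saved and loaded independent of the parsing environment.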
The primary interface function is XQ:QUERY-PARSER.

XQ:QUERY-PARSER source &key
[Generic Function]
It accepts as its primary argument a source designation. The various optional argument forms are tokenized and parsed as XML Query expressions.

XQ:QUERY-PARSER
[Primary Method]
Tokenizes the string and parses it, by default, as an XML Query expression
to produce a LAMBDA expression of no arguments comprising the equivalent
XPA:PATH S-expression form. Other terms may be specified. The result
expression may be saved and loaded independent of the parsing environment. It must
be compiled prior to application, in an environment which binds any prefixes present
in the path expression. The function produces the query result when invoked.
The LISP-binding is implemented as a collection of forms and utility functions together with a library of primitives. Where no XQuery specific function is defined, the functions from the XPath library are available.
The directory "Tests" includes several test files.
Support for XML Schemas is planned. The data and type model would need to be extended. At present it suffices for those aspects of XML Schema required by XML Query, but additions would be necessary.
Complete support for schema data types. Port ATN-regex code for use with text patterns.
Would be nice.
One will have to complete definitions in the following files (look for ALLEGRO / LispWorks / MCL conditionals)
Of the files in the xml:tests directory, one might look first at test-xml-document.lisp, followed by test-xml-oasis.lisp
There is a degree of interdependence in the modules.