(in-package :web-user)

CL-XML: How To: Parsing Documents

20031024
james anderson




(defVar *dm* nil)

the primary interface function for parsing is parse-document. it is a generic function with a method which accepts a stream and parses its content to produce a document data instance. in addition, it comprises methods which accept universal resource identifiers or, more conveniently, string designators for URIs, and operate on the indicated resource. a document can therefore be parsed from any of several sources.
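the dispatch on source type described above can be sketched with standard CLOS alone. this is a minimal illustration only: source-kind is a hypothetical function which is not part of CL-XML, and it merely names the kind of source rather than parsing anything.

```lisp
;; hypothetical illustration only: SOURCE-KIND is not a CL-XML function.
;; it shows how a single generic function selects a method by the
;; class of its argument, as parse-document is described to do.
(defgeneric source-kind (source)
  (:documentation "return a keyword which names the kind of document source."))

;; an open stream, for example one from with-open-file
(defmethod source-kind ((source stream)) :stream)

;; a pathname which identifies a file
(defmethod source-kind ((source pathname)) :pathname)

;; a string, which may designate a uri or hold a literal document
(defmethod source-kind ((source string)) :string)
```

each call selects the method whose specializer matches the argument, so (source-kind #p"howto.xml") yields :pathname while (source-kind "http://127.0.0.1/howto.xml") yields :string.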

parse a document from an http server by passing an http uri designator. nb: the illustrated address, 127.0.0.1, is the tcp "loopback" address, which means that the howto file must be present in the local server's root in order for the retrieval to succeed.

(setq *dm* (parse-document "http://127.0.0.1/howto.xml"))

== #<DOC-NODE <no uri> #x186EC1E>

(describe *dm*)

== #<DOC-NODE <no uri> #x186EC1E>
Class: #<STANDARD-CLASS DOC-NODE>
Wrapper: #<CCL::CLASS-WRAPPER DOC-NODE #xD53A26>
Instance slots
PARENT: NIL
CHILDREN: (#<ELEM-NODE ||::|inventory| 1 #x18710D6>)
ENTITY-INFO: #<ENTITY-INFORMATION-NODE ||::|| #x1874666>
ROOT: #<ELEM-NODE ||::|inventory| 1 #x18710D6>
STANDALONE: T
VERSION: NIL
NOTATIONS: #<HASH-TABLE :TEST EQL size 0/60 #x186EF8E>
XML-QUERY-DATA-MODEL::IDS: #<HASH-TABLE :TEST EQL size 0/60 #x186F266>
GENERAL-ENTITIES: #<HASH-TABLE :TEST EQL size 5/60 #x186EBC6>
PARAMETER-ENTITIES: #<HASH-TABLE :TEST EQL size 0/60 #x186F816>
TYPES: #<HASH-TABLE :TEST EQL size 0/60 #x186FAEE>
ATTRIBUTES: NIL
XML-QUERY-DATA-MODEL::VALIDATE: NIL
NAMESPACES: NIL

parse a document which is located in the file system by passing a file url designator.

(setq *dm* (parse-document "file://xml/documentation/howto/howto.xml"))

parse a document which is contained in a string by passing the string. note that the initial character must be a #\< in order that the string be recognized as a literal document. note also that, in this case, care must be taken to escape quotation-mark delimiters or to use apostrophes rather than quotation marks as delimiters.

(setq *dm* (parse-document "<inventory title='OmniCorp Store #45x10^3'> <section name='health'> <item upc='123456789' stock='12'> <name>Invisibility Cream</name> <price>14.50</price> <description>Makes you invisible</description> </item> <item upc='445322344' stock='18'> <name>Levitation Salve</name> <price>23.99</price> <description>Levitate yourself for up to 3 hours per application</description> </item> </section> <section name='food'> <item upc='485672034' stock='653'> <name>Blork and Freen Instameal</name> <price>4.95</price> <description>A tasty meal in a tablet; just add water</description> </item> <item upc='132957764' stock='44'> <name>Grob winglets</name> <price>3.56</price> <description>Tender winglets of Grob. Just add water</description> </item> </section> </inventory>"))
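the two delimiter styles for a literal document string can be compared with standard common lisp alone. this is a sketch under stated assumptions: the variable names and the helper literal-document-p are hypothetical and are not part of CL-XML. the helper only mirrors the rule stated above, that a literal document must begin with #\<.

```lisp
;; standard common lisp only; no CL-XML functions are called here.

;; apostrophes as attribute delimiters need no escaping in a lisp string
(defparameter *with-apostrophes* "<item upc='123456789'/>")

;; quotation marks as attribute delimiters must each be escaped
(defparameter *with-escapes* "<item upc=\"123456789\"/>")

;; hypothetical helper, not part of CL-XML: true when the string starts
;; with #\<, which is the stated condition for a literal document
(defun literal-document-p (string)
  (and (plusp (length string))
       (char= (char string 0) #\<)))
```

both strings hold the same markup apart from the delimiter character, since the reader drops the escaping backslashes. a string such as "http://127.0.0.1/howto.xml" fails the test and would instead be treated as a uri designator.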

as an alternative to the file url, a pathname designator may be used to identify a file source.

(parse-document #4P"xml:documentation;howto;howto.xml")

or parse the document from an open stream.

(with-open-file (stream #4P"xml:documentation;howto;howto.xml") (parse-document stream))

in addition to the common forms, the parser accepts alternatives such as byte vectors and url instances. the method specializers list the accepted source types.

(mapcar #'class-name (mapcar #'first (mapcar #'method-specializers (generic-function-methods #'document-parser))))

== (STREAM PATHNAME DATA-URL HTTP-URL FILE-URL VECTOR STRING)

namespaces are handled transparently, in that names are automatically interned into packages or namesets. by default, the null namespace is modeled with a concrete nameset with the name "". in the event that it is desired to intern those names in some other nameset, the variable *null-namespace* should be bound to the desired package/nameset while the document is parsed.

(let ((*null-namespace* *package*)) (name (root (parse-document "<element/>"))))

== WEB-USER::|element|
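the effect of binding a special variable around the parse can be sketched with standard common lisp alone. this is an illustration only: *default-nameset* and intern-name are hypothetical stand-ins, not CL-XML definitions, and the keyword package is assumed here purely as a placeholder default.

```lisp
;; hypothetical stand-ins, not CL-XML definitions.
;; the default target for names; the keyword package is an assumption
;; made for this sketch only.
(defvar *default-nameset* (find-package :keyword))

;; intern a name into whichever package is currently bound
(defun intern-name (local-part)
  (intern local-part *default-nameset*))

;; within the dynamic extent of the let, names go to the rebound package
(defun intern-name-in (package local-part)
  (let ((*default-nameset* package))
    (intern-name local-part)))
```

since the binding is dynamic, every function called inside the let sees the rebound value, and the default is restored once the let exits. this is why the binding must surround the call which performs the parse.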

:eof