One approach to XML processing is to represent an XML document as a tree. Navigation over the tree is performed with XPath, which is the basis of the XML transformation and query languages XSLT and XQuery. XSLT is a template language, and XQuery can be regarded as a greatly extended XPath.
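As a small illustration of navigating an XML tree with path expressions, the following sketch uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The sample document and its element names (`library`, `shelf`, `book`) are invented for this example and are not part of the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<library>
  <shelf topic="xml">
    <book year="2001"><title>XPath Basics</title></book>
    <book year="2004"><title>XQuery in Practice</title></book>
  </shelf>
  <shelf topic="lisp">
    <book year="1990"><title>Lisp Primer</title></book>
  </shelf>
</library>
"""

root = ET.fromstring(DOC)

# ".//book" selects every <book> descendant of the root, at any depth.
all_titles = [book.findtext("title") for book in root.findall(".//book")]

# A predicate narrows a step: only shelves whose "topic" attribute is "xml".
xml_titles = [t.text for t in root.findall("./shelf[@topic='xml']/book/title")]

print(all_titles)
print(xml_titles)
```

Each path is a sequence of steps applied to the current set of nodes, which is the navigation model that XSLT and XQuery build upon.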
XML can be regarded as an external representation of in-memory tree-like structures, and the XML-related standards as methods of processing such data. Some types of applications can benefit from this approach. Examples of such applications are
Complex hierarchical queries are very important in such applications. Examples of possible queries are:
A straightforward implementation of such applications is often hard work, because the programmer concentrates on low-level details instead of the real task. Instead of low-level coding, we suggest using embedded XQuery. We expect this approach to simplify application development.
The main problem in implementing an XQuery system is the large amount of work involved. It is not reasonable to write an XQuery implementation from scratch each time. Instead, it is reasonable to create a portable implementation that can be ported with little effort.
Our approach is to develop a virtual machine for tree processing and an XQuery system on top of this virtual machine. If an application needs embedded XQuery support, it is enough to implement only the virtual machine.
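To make the idea of a virtual machine for tree processing more concrete, here is a minimal sketch in Python. It is not the project's virtual machine: the `Node` type, the opcodes `CHILD`, `DESC` and `TEXT`, and the `run` function are all hypothetical names chosen for this illustration. The point is only that a host application would need to provide this small core, while a query system compiles its queries down to such instructions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node as the host application might expose it (hypothetical)."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def descendants(node):
    """Yield all descendants of a node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def run(program, root):
    """Execute a list of (opcode, argument) pairs against a tree."""
    context = [root]  # the current sequence of nodes
    for op, arg in program:
        if op == "CHILD":    # children with the given name
            context = [c for n in context for c in n.children if c.name == arg]
        elif op == "DESC":   # descendants with the given name
            context = [d for n in context for d in descendants(n) if d.name == arg]
        elif op == "TEXT":   # replace each node by its text
            context = [n.text for n in context]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return context


tree = Node("doc", children=[
    Node("sec", children=[Node("para", "one"), Node("para", "two")]),
    Node("para", "three"),
])

# Roughly what a path such as //para/text() could compile to.
all_paras = run([("DESC", "para"), ("TEXT", None)], tree)
# Roughly what sec/para/text() could compile to.
sec_paras = run([("CHILD", "sec"), ("CHILD", "para"), ("TEXT", None)], tree)
```

In this sketch only `Node`, `descendants` and `run` depend on the host application; the programs themselves are plain data and could be produced by a portable compiler.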
To simplify development, we write and debug the code in Common Lisp [2]. The development is based on the CL-XML system. James Anderson, the author of CL-XML, has already done a lot of work, so CL-XML can process the examples from the XQuery specification.
CL-XML translates XQuery into an internal representation in the form of an abstract syntax tree (AST). The AST can be used for high-level optimization of queries.
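One simple example of a high-level optimization on an AST is constant folding. The sketch below is written in Python and uses nested tuples as a hypothetical AST format. It does not reproduce the representation used by CL-XML, and the node kinds `"+"`, `"*"` and `"path"` are assumptions made for the example.

```python
import operator

# Arithmetic operators that can be evaluated at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(ast):
    """Replace operator nodes whose operands are all numbers by their value."""
    if not isinstance(ast, tuple):
        return ast  # a leaf: number or name
    head, *args = ast
    args = [fold(a) for a in args]  # fold the subtrees first
    if head in OPS and len(args) == 2 and all(
        isinstance(a, (int, float)) for a in args
    ):
        return OPS[head](*args)
    return (head, *args)


# Hypothetical AST for a query like: price + 2 * 3
query = ("+", ("path", "price"), ("*", 2, 3))
optimized = fold(query)
print(optimized)
```

The constant subexpression is evaluated once, before the query runs, while the path step that depends on the document is left untouched.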
Most language constructs of Common Lisp are implemented as macros. If one takes CL-XML and expands all macros, one gets a program that consists of primitives only. The set of primitives produced this way is a draft version of the virtual machine.
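The idea of expanding macros until only primitives remain can be sketched as follows. The example is written in Python rather than Lisp, represents programs as nested lists, and handles only two derived forms, `and` and `when`. The choice of forms and the helper names `expand` and `heads` are assumptions for this illustration, not the actual macro set of Common Lisp or CL-XML.

```python
def expand(form):
    """Rewrite the derived forms 'and' and 'when' into the primitive 'if'."""
    if not isinstance(form, list) or not form:
        return form  # atoms and empty lists are left as they are
    head, *rest = form
    if head == "and":
        if not rest:
            return True
        if len(rest) == 1:
            return expand(rest[0])
        return ["if", expand(rest[0]), expand(["and", *rest[1:]]), False]
    if head == "when":
        test, *body = rest
        return ["if", expand(test), expand(["begin", *body]), None]
    return [expand(item) for item in form]


def heads(form):
    """Collect the operators that occur in head position of a form."""
    if not isinstance(form, list) or not form:
        return set()
    result = {form[0]} if isinstance(form[0], str) else set()
    for item in form[1:]:
        result |= heads(item)
    return result


program = ["when", ["and", "a", "b"], ["print", "x"]]
expanded = expand(program)
remaining = heads(expanded)
```

After expansion, `remaining` no longer contains `and` or `when`. The operators that are left form the list of what a target machine would have to provide directly.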
Implementing a minimal dialect of Lisp (or Scheme) is a fairly simple task, because the problem is well known and well studied.
Of special interest is the translation of our virtual machine into LLVM, an intermediate representation and a set of tools for program optimization. Thanks to LLVM and high-level optimization, automatically generated code for XQuery evaluation may be more efficient than hand-crafted code.
The project is currently at the idea stage. So far, only a work plan, the first experiments and the selection of tools have been completed. Intermediate results will be uploaded to the home page of the Protva XQuery project. Results will be distributed under the terms of the LGPL.
The conference talk will demonstrate a draft of the virtual machine and the first results.
The first planned milestone [3] of the project is adding XQuery support to the XSLT processor xsltproc.
[1] A thesis for the First Conference of Free Software Developers on Protva River, Obninsk, Russia, 2004. The original Russian version is available at "http://xmlhack.ru/protva/xquery-thesis.pdf".
[2] 15 August 2004: We now work with Scheme and SXPath/SXQuery instead of Common Lisp and CL-XML. The ideas described in the text remain the same.
[3] 15 August 2004: This is now the second milestone. The first one is XQuery support for the Python AST.