WWW9 Devday: XML Track

Organizer: Jon Bosak, Sun Microsystems

Session 1 (0900-1030): The XML Tool Set

The apache.org XML project -- James Davidson, Sun Microsystems

A report on the apache.org XML project.

Libxml, an OpenSource general purpose XML library -- Daniel Veillard, W3C

A report on the ongoing effort described at http://xmlsoft.org/.

XML Conformance Testing -- Mary Brady, NIST

Emerging web applications need flexible methods for defining, exchanging, and displaying domain-specific data. The Extensible Markup Language (XML) provides a standards-based approach to defining and exchanging data. Companion standards, such as the Document Object Model (DOM) and the Extensible Stylesheet Language (XSL), provide methods for manipulating and displaying this data. Virtually all application domains are looking to use XML and its related technologies to define, manipulate, exchange, and display structured information. In addition, XML processors and support for DOM and XSL are beginning to appear in beta versions of popular web browsers and application development software. The widespread availability of these technologies has made them obvious choices as building blocks for electronic commerce. As such, determining whether a product faithfully implements the W3C Recommendations will be essential to creating robust, interoperable solutions.

The National Institute of Standards and Technology (NIST) and the Organization for the Advancement of Structured Information Standards (OASIS) are addressing XML-related interoperability issues by jointly developing test suites that are intended to complement the W3C Recommendations. These tests can be used to determine whether a product faithfully implements the appropriate W3C Recommendations. This work is accomplished under the auspices of OASIS Technical Subcommittees, with contributions from OASIS member companies.

The first of these test suites, the XML test suite, contains over 1000 test files and an associated test report. Accompanying the actual test files is an XML test description file that contains pertinent information for each of the tests included in the distribution. An XSL stylesheet is used to generate an XML Conformance Test Report, which includes background information on conformance testing for XML and information that is useful in determining that each test is well-grounded in the recommendation. This test suite has been used on a variety of XML processors, and results have been published by Dave Brownell on XML.COM.

In much the same fashion, work has also progressed on a DOM test suite. Currently, there exist over 1000 tests that can be used to test the fundamental, extended, and HTML ECMAScript and Java bindings. Work has just begun on a third area, XSLT.

In this session, we will give an overview of conformance testing and each of the available test suites. In addition, we will discuss particular problems that we have uncovered as a result of developing the tests, and give an indication of how various implementations fare against the available test suites.
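As an illustration of the idea of a test description file driving a conformance run, here is a minimal sketch in Python. It is not the NIST/OASIS suite: the description format (`tests`, `test`, `expect`, `file`), the test ids, and the in-memory stand-ins for test files are all assumptions made for this example, and the "processor under test" is simply Python's standard `xml.etree.ElementTree` parser. It checks well-formedness only.

```python
import xml.etree.ElementTree as ET

# Hypothetical test description file. The element and attribute names are
# invented for this sketch and are not those of the NIST/OASIS suite.
DESCRIPTION = """
<tests>
  <test id="wf-001" expect="well-formed" file="nested.xml">Nested elements.</test>
  <test id="not-wf-001" expect="not-well-formed" file="mismatch.xml">Mismatched end tag.</test>
  <test id="not-wf-002" expect="not-well-formed" file="two-roots.xml">Two root elements.</test>
</tests>
"""

# Stand-ins for the test files, kept in memory so the sketch is self-contained.
FILES = {
    "nested.xml": "<doc><a><b/></a></doc>",
    "mismatch.xml": "<doc><a></doc>",
    "two-roots.xml": "<doc/><doc/>",
}


def is_well_formed(text):
    """Report whether the processor under test accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def run_suite(description, files):
    """Return {test id: "pass" or "fail"} for every described test."""
    report = {}
    for test in ET.fromstring(description).iter("test"):
        expected = test.get("expect") == "well-formed"
        actual = is_well_formed(files[test.get("file")])
        report[test.get("id")] = "pass" if actual == expected else "fail"
    return report


REPORT = run_suite(DESCRIPTION, FILES)
```

A test passes when the processor's verdict matches the expectation recorded in the description, so a processor that wrongly accepts a malformed document fails that test just as one that rejects a good document does.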

Session 2 (1100-1230): Accessing XML

Accessing Repositories Using XPath and the DOM (presentation)
-- Jeroen van Rotterdam, Connection Factory

Currently the DOM interface for storing and accessing XML-based data is limited to the scope of individual documents. XML-OO repositories provide an environment for XML storage and retrieval beyond document boundaries, which is a necessary step towards our objective: managing large XML databases.

W3C provides the DOM Level 2 interface for operations within documents, whereas cross-document (repository-level) operations are provided as a set of natural extensions to DOM.

A full XPath implementation is currently used as an additional means for data retrieval from XML documents. Unlike other implementations, which perform their query evaluation in-memory, it imposes no limits on the size of queried documents. This proved to be a challenging characteristic where speed, memory management, and locking are concerned.

As with DOM, XPath is document-based. Current research involves the extension of XPath to enable specification of cross-document queries within document repositories.
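To make the notion of a cross-document query concrete, the following sketch applies one path expression to every document in a small repository. It is only an illustration under stated assumptions: the repository is a hypothetical in-memory dictionary, the document names and contents are invented, and the query uses the limited XPath subset of Python's `xml.etree.ElementTree`. Unlike the implementation described above, it parses each document fully into memory.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: document name -> XML text.
REPOSITORY = {
    "orders-1.xml": "<orders><order id='1'><item>pen</item></order></orders>",
    "orders-2.xml": (
        "<orders>"
        "<order id='2'><item>ink</item></order>"
        "<order id='3'><item>pen</item></order>"
        "</orders>"
    ),
}


def query_repository(repository, path):
    """Evaluate one path expression against every document.

    Returns a list of (document name, matching element) pairs, so each hit
    remembers which document it came from.
    """
    hits = []
    for name in sorted(repository):
        root = ET.fromstring(repository[name])
        for element in root.findall(path):
            hits.append((name, element))
    return hits


# All orders, in any document, that contain an <item> with the text "pen".
HITS = query_repository(REPOSITORY, ".//order[item='pen']")
IDS = [(name, element.get("id")) for name, element in HITS]
```

Here `IDS` lists the matching order ids together with the document each one was found in.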

XML.org: a progress report on the OASIS registry & repository (presentation)
-- Una Kearns, XML Architect, Documentum

A key requirement for the successful use of XML on the Internet, including implementation of any XML business solution, is ensuring that the schema driving the application is openly and consistently available to all users of that schema, whether human or machine. The talk will provide a report on the important work at the OASIS XML Portal, www.xml.org, to develop and implement a Registry and Repository for the registration, storage and supply of XML schemas and vocabularies. The implementation is the first to be built on the open specifications published in November 1999 by the OASIS Regrep Technical Committee.

X2X - A Revolution in XML Linking Technology -- Graham Moore, STEP

Abstract: X2X is an exciting new technology that facilitates the creation and maintenance of out-of-line links. Constructed using the fundamental abstract concepts of linking, yet supporting the XLink syntax, X2X has the power to help build sophisticated information products.

There is a demand from information providers and information users to have more appropriate access to information. In the same way that content is styled for individual profiles, there is a demand to present different links to different individuals. Out-of-line linking provides a mechanism to do this and much more.

The X2X technology has been designed with a number of goals and requirements in mind:
  • To provide link management in the same way that there exists Content Management. This enables the role of knowledge engineers and removes the responsibility for creating links from the content authors.
  • To support the 'Resolution of Resources': the merging of out-of-line link information with a resource as though the links had been authored in-line.
  • To be a 'glue technology'. X2X can integrate with any repository, any link creation tool and any web server through its careful design and extensible architecture. It is intended to be a neutral and integrating technology. This is in stark contrast to many of the existing closed-world content management systems.
  • To make X2X available to everyone and, in particular, to ensure that the link information persists in a common form without requiring costly investment by users. To this end X2X is developed in Java and uses JDBC with a supporting relational database, such as SQL Server or Oracle 8i. It has been used on NT, Linux and UNIX. X2X also uses the Java HotSpot VM for accelerated performance.

The presentation of this technology will demonstrate the power and usefulness of out-of-line linking. It will discuss the architecture and concepts that underlie linking, and will also demonstrate the use of X2X in a complete authoring, integration and delivery scenario.
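The 'Resolution of Resources' goal listed above can be sketched as follows. This is not X2X code: the link store is a hypothetical in-memory list rather than a relational database, the resource, element ids, and example.org targets are invented, and matching elements by an `id` attribute is an assumption. Only the XLink namespace URI and the `type` and `href` attribute names come from XLink itself.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# A resource authored without any links (invented for this sketch).
RESOURCE = "<doc><term id='t1'>XSLT</term><term id='t2'>DOM</term></doc>"

# Hypothetical out-of-line link store: (element id in the resource, target).
LINKBASE = [
    ("t1", "http://example.org/glossary#xslt"),
    ("t2", "http://example.org/glossary#dom"),
]


def resolve(resource_text, linkbase):
    """Merge out-of-line links into a copy of the resource.

    The stored resource text is left untouched; the returned tree looks as
    though the links had been authored in-line as XLink simple links.
    """
    root = ET.fromstring(resource_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for element_id, target in linkbase:
        element = by_id.get(element_id)
        if element is not None:  # ignore links whose anchor is missing
            element.set("{%s}type" % XLINK_NS, "simple")
            element.set("{%s}href" % XLINK_NS, target)
    return root


RESOLVED = resolve(RESOURCE, LINKBASE)
```

Because the links live outside the resource, a different link store could be resolved against the same text to present different links to different readers.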

Session 3 (1400-1530): Acting on XML

Can a fully standards-based application exist? -- Daniel Rivers-Moore, RivCom

This presentation argues that, while the XML family of standards provides 80% of what is needed to build a completely standards-based application, there are 2 missing pieces to the puzzle - a binding from the editable objects in a user interface to the place in the XML data structure to which they correspond, and an XML-based language to specify the application’s behaviour. If these two pieces were added to the XML family, completely standards-based applications could be built. To demonstrate this thesis, a prototype application will be shown that uses XPath statements and a minimum of "glue" to fill the first gap and an XML "action language" developed as part of the European XML/EDI Pilot Project to fill the second. The application is fully configurable for language (English and Finnish in the example shown) and device (browser and WML-aware mobile phone). The application is built from a series of XSLT transformations between data structures, and its data components are fully reusable. In addition to HTML and WML, it is based on just 3 DTDs, two of which are domain-independent. The domain-specific DTD is the one used for data transfer, and is itself derived from an industry standard.
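The first missing piece, a binding from editable user-interface objects to places in the XML data, can be sketched with path expressions. This is an illustration only, not the prototype described above: the field names, the order document, and the binding table are invented, and the paths use the XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Invented data document standing in for the application's XML data.
ORDER = ET.fromstring(
    "<order>"
    "<buyer><name>Aino</name></buyer>"
    "<delivery><city>Turku</city></delivery>"
    "</order>"
)

# Hypothetical binding table: UI field name -> path into the data.
BINDINGS = {
    "buyer_name": "./buyer/name",
    "city": "./delivery/city",
}


def read_form(root, bindings):
    """Populate form fields from the places they are bound to."""
    return {field: root.find(path).text for field, path in bindings.items()}


def write_form(root, bindings, values):
    """Write edited field values back to their bound places."""
    for field, value in values.items():
        root.find(bindings[field]).text = value


FORM = read_form(ORDER, BINDINGS)
write_form(ORDER, BINDINGS, {"city": "Helsinki"})
```

The binding table is the only "glue": the form code never needs to know the shape of the document, and the document never mentions the form.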

Worldwide Botanical Knowledge Base -- Jean Marc Vanel

To make botanical data available on the Internet:
descriptions of the species, including pictures and geographical distribution.
Currently these basic scientific data, which moreover concern a threatened heritage of humanity, are in:
- publications on paper, little diffused,
- disconnected databases, generally without biological information and pictures,
- regional floras, on paper, subject to copyright restrictions.
Thousands of species are disappearing forever, sometimes even before they are described. It would be irresponsible (and absurd) not to make an inventory of the biological heritage of our Earth. How can we preserve what is poorly known? Knowledge which is too difficult to access is unusable.
How is it possible that such a desirable and feasible project hasn't been done?
- funds go rather to biotechnologies than to descriptive biology,
- taxonomists are not aware of the huge possibilities of computers and networks, and lose a lot of energy in solving the small problem of synonyms and accepted names,
- software engineers don't have the biological knowledge.
How to start
Hope that lots of software engineers will join the project
Funding will follow once there are enough results to show
This is a great project for humanity
This is also a great, far-reaching, and enjoyable software project.
Botany, Zoology, Ecology and the Semantic Web
- express relations between plants and animals in a formal and flexible way
- plants are just a beginning: insects alone have 1 000 000 species
- all kinds of properties, and properties about properties
- XML is not just for cataloguing consumers' preferences and locating the cheapest merchandise
A free software / free information project
- nobody can own nature
New kind of software needed
multi-everything browser will mainly be an empty shell
- able to call the appropriate processors whenever it sees certain XML namespaces and/or Processing Instructions.
- multi-domain documents.
- manage drag'n drop and clipboard with an XML data model.
- editor with the same multi-domain capabilities.
- manage the display space between XML processors (tiling, resize, ...).
- manage the mapping between raw XML and displayed XML transformed by XSLT
- Generic display skills are also desirable:
- collapsible tree/graph views for the document tree, the inheritance graph, the ID/IDREF graph
- extended search/query
- using a standard dictionary (e.g. WordNet) and some AI techniques will make it possible to process well-formed XML with a natural vocabulary
- general and modular tool for manipulating data, of the 3 main kinds:
- document-oriented (HTML & word processor)
- structure-oriented (database type)
- knowledge-oriented (semantic network, AI, RDF, etc)
- The next killer app ... A role for Mozilla? Or Gnome/KDE? Or will the next Microsoft wave submerge everything?
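The "empty shell" idea in the list above, a browser that hands each part of a multi-domain document to the appropriate processor according to its XML namespace, can be sketched like this. The processors are hypothetical placeholders that only return a label, and the sample document is invented. The SVG and MathML namespace URIs are the real ones. Dispatch on processing instructions is not shown.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Invented multi-domain document mixing two namespaces and plain markup.
DOCUMENT = (
    "<page>"
    "<svg xmlns='%s'><circle r='1'/></svg>"
    "<math xmlns='%s'><mi>x</mi></math>"
    "<note>plain text</note>"
    "</page>" % (SVG_NS, MATHML_NS)
)


def namespace_of(element):
    """Return the namespace URI of an element, or None if it has none."""
    tag = element.tag
    return tag[1:].split("}")[0] if tag.startswith("{") else None


def local_name(element):
    """Return the tag name without its namespace prefix."""
    return element.tag.split("}")[-1]


# Hypothetical processor registry: namespace URI -> handler.
PROCESSORS = {
    SVG_NS: lambda el: "graphics processor handles <%s>" % local_name(el),
    MATHML_NS: lambda el: "math processor handles <%s>" % local_name(el),
}


def dispatch(root, processors):
    """Walk the tree and hand each recognised subtree to its processor."""
    results = []

    def visit(element):
        handler = processors.get(namespace_of(element))
        if handler is not None:
            results.append(handler(element))  # the processor owns this subtree
            return
        for child in element:
            visit(child)

    visit(root)
    return results


RESULTS = dispatch(ET.fromstring(DOCUMENT), PROCESSORS)
```

The shell itself knows nothing about graphics or mathematics; supporting a new domain means registering one more namespace in the table.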
Imagine you're in nature, with a portable computer running the botanical database, with a camera, a GPS, and a wireless Internet connection.

Suddenly you meet a remarkable plant; you show it to the computer, which asks you two questions about the number of carpels and the shape of the hairs (answering requires cutting the ovary and using a lens). The computer tells you that this is a new location of Strasburgeria robusta, which was thought to exist only in New Caledonia. It proposes that you send e-mails to the specialists of the Strasburgeriaceae and of the region, and that you collect a herbarium specimen. Meanwhile this discovery, complete with images and geographical coordinates, is sent to the global database, and the updated distribution map appears on the screen.

The data exists
- more than 90% of the 250 000 species are described in floras on paper
- herbarium images
2D and 3D images:
- vectorized images (SVG) from bitmap images
- 3D images (X3D, etc) generated from several pictures through stereoscopic software
- very compact representations adapted to growing beings having recursive structure: L-Systems, etc
- Artificial Vision techniques
- distributed knowledge on the Web
- AI techniques and exchange XML formats/protocols for AI
- authoring side
- new ways of working for taxonomic scientists: new species created by Web transaction, etc
More on our site: http://wwbota.free.fr/

Inside an XSLT Processor (presentation)
-- Michael Kay, ICL

XSLT is here, there's a handful of implementations, some more complete than others, and it's starting to be used by a growing band of enthusiasts, not to mention beginners who all ask the same perfectly sensible questions. This is a talk about the XSLT language from the viewpoint of an implementor. It talks about what an XSLT processor looks like from the inside and about which features of the language cause difficulty, especially performance difficulty. It will talk about some of the optimization techniques that need to be developed to make XSLT processors run at reasonable speed, and will speculate on where significant progress is possible and where it isn't. It will offer suggestions for changes or enhancements to the language that will improve both its usability and its performance by an order of magnitude.

Michael Kay works as a systems architect for ICL, the IT services company, and is best known in the XML world as the developer of the open source product Saxon, the first tool to achieve 100% conformance with the XSLT and XPath 1.0 specifications*. His background is in database technology.

Session 4 (1600-1730): XML Publishing

XML in Opera

The Opera browser has established itself as the "third browser" on the Web due to its speed, size and support for standards. Opera is full-featured, yet fits on a single floppy. The presentation will describe how XML was added to Opera and why the size of the executable was reduced in the process. By adding a simple XML parser (expat) and combining it with Opera's CSS2 formatting engine, generic XML documents are fetched from the Web and rendered at a very high speed. Issues that will be discussed include Opera's support for progressive rendering, style sheet linking, hyperlinks and replaced elements (e.g. images). A version of Opera with XML will be available at the time of the conference.

RenderX XSL FO Formatter -- David Tolpin and Nikolai Grigoriev, RenderX

RenderX has developed a formatter for converting XSL Formatting Objects to PDF or PostScript. Its main features are:
  1. Support for most functionality included in XSL FO, plus some extensions.
  2. Modular engine structure:
    - a separate parser/inheritance resolver module;
    - pluggable output generation modules for different formats.
All this is up and running (see results/demos at www.RenderX.com), and will be presented by the key developers.

Cross-Media Publishing Using XML and XSLT -- Michael Wechner, Wyona

Cross-media publishing is very expensive, but XML, XLink, XPointer and XSLT are greatly simplifying it. Real-world examples from the newspaper industry will be demonstrated. The exchange of news using markup languages such as NewsML, NITF and XMLNews, and the generation of various output formats such as HTML, WML and PDF, will be discussed. The case of scientific publishing using MathML, HTML and LaTeX will be described and examples shown.

Updated: November 28, 2000