Program
BenchmarX'09 is scheduled for April 20, 2009. It will consist of three parts: an invited talk given by Stephane Bressan on
Current Approaches to XML Benchmarking (see below) and two sessions consisting of accepted papers (see the list of abstracts of Accepted Papers). The program schedule of
the DASFAA conference can be found here.
| Start | End | Content |
| 8:30 | 9:00 | Welcome tea/coffee |
| 9:00 | 10:30 | Invited Talk: Stephane Bressan: Current Approaches to XML Benchmarking |
| 10:30 | 11:00 | Break |
| 11:00 | 12:30 | Session 1 |
| 12:30 | 13:30 | Lunch |
| 13:30 | 15:00 | Session 2 |
| 15:00 | 15:10 | End of BenchmarX'09 |
Stephane Bressan
Stephane Bressan is an Associate
Professor in the Computer Science department of the School of Computing (SoC) at the National
University of Singapore (NUS). He joined the National University of Singapore in 1998. He has also been an
adjunct Associate Professor at Malaysia University of Science and Technology (MUST) since 2004.
He obtained his PhD in Computer Science from the University of Lille, France, in 1992. Stephane was a research scientist at the European Computer-industry Research Centre (ECRC), Munich, Germany, and at the Sloan School of Management of the Massachusetts Institute of Technology (MIT), Cambridge, USA. Stephane's research is concerned with the management and integration of multi-modal and multimedia information from distributed, heterogeneous, and autonomous sources. He is the author or co-author of more than 100 papers. He is a co-author of the XOO7 benchmark for XML data management systems. Stephane is a member of the XML working group of the Singapore Information Technology Standards Committee (ITSC) and an advisory member of the executive committee of the XML user group of Singapore (XMLone).
Current Approaches to XML Benchmarking
The performance evaluation of XML-based systems, tools and techniques
can either use benchmarks that consist of a predefined data set and
workload or it can use a data set with an ad hoc workload. In both cases
the data set can be real or synthetic. XML data generators such as
Toxgene and Alphawork can generate XML documents whose characteristics,
such as depth, breadth and various distributions, are controlled. It is
also expected that benchmarks provide a data generator with a fair amount
of control over the size and shape of the data, if the data is
synthesized, or offer a suite of data subsets of varying size and shape,
if the data is real. Application-level evaluation emphasizes the
representativeness of the data set and workload in terms of typical
applications, while micro-level evaluation focuses on elementary and
individual technical features.
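As an illustration of what controlling depth and breadth means in a synthetic data generator, here is a minimal Python sketch. It is not the interface of ToXgene or of any other tool named above. The function names `generate_tree` and `max_depth`, the element names, and the uniform breadth distribution are all assumptions made for the example.

```python
import random
import xml.etree.ElementTree as ET


def generate_tree(depth, max_breadth, seed=0):
    """Build a synthetic XML tree (hypothetical helper, for illustration only).

    Every element above the last level gets between 1 and `max_breadth`
    children, drawn uniformly, so the tree always has exactly `depth`
    levels below the root.
    """
    rng = random.Random(seed)  # seeded so the same parameters give the same data
    root = ET.Element("root")

    def grow(parent, level):
        if level > depth:
            return
        for i in range(rng.randint(1, max_breadth)):
            child = ET.SubElement(parent, f"level{level}")
            child.set("id", str(i))
            grow(child, level + 1)

    grow(root, 1)
    return root


def max_depth(elem):
    """Return the number of nested element levels, counting `elem` as 1."""
    return 1 + max((max_depth(child) for child in elem), default=0)


if __name__ == "__main__":
    tree = generate_tree(depth=3, max_breadth=4, seed=1)
    print("elements:", sum(1 for _ in tree.iter()))
    print("depth:", max_depth(tree))
```

Seeding the generator makes the data set reproducible, which matters when the same synthetic data has to be regenerated for repeated benchmark runs.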
The dual view of XML, data view and document view, is reflected in its
benchmarks. There exist several well-established benchmarks for XML data
management systems that can be used for the evaluation of the
performance of query processing. The main application-level benchmarks
in this category are XOO7, XMach-1, XMark, and XBench, while the Michigan
Benchmark is a micro-benchmark. For the evaluation of XML information
retrieval, the prevalent benchmark is the series of INEX corpora and
topics. However, in practice, whether for the evaluation of XML data
management techniques or for the evaluation of XML-retrieval techniques,
researchers seem to favor real or synthetic data sets with ad hoc
workloads when needed. The University of Washington repository gathers
links to a variety of XML data sets. Notably, most of these data sets
are small. The largest is 603MB. Popular data sets like Mondial or the
Baseball Boxscore XML are much smaller. The Database and Logic
Programming Bibliography XML data set, also used by many scientists, is
around 500MB. All of these data sets are generally relatively structured
and quite shallow, and thus do not necessarily convey the expected challenges
associated with the semi-structured nature of XML.
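Whether a given data set is shallow and regular can be checked with a few simple measurements. The Python sketch below is one possible way to do this, under stated assumptions: the function name `shape_stats` and the choice of metrics (element count, maximum depth, number of distinct root-to-element tag paths) are illustrative and not taken from any of the benchmarks above. The sample document is invented.

```python
import xml.etree.ElementTree as ET


def shape_stats(xml_text):
    """Return (element count, maximum depth, number of distinct tag paths).

    Hypothetical helper: a low maximum depth and few distinct paths
    relative to the element count suggest shallow, regular data.
    """
    root = ET.fromstring(xml_text)
    elements = 0
    deepest = 0
    paths = set()

    # Iterative traversal with an explicit stack of (element, depth, path).
    stack = [(root, 1, "/" + root.tag)]
    while stack:
        elem, depth, path = stack.pop()
        elements += 1
        deepest = max(deepest, depth)
        paths.add(path)
        for child in elem:
            stack.append((child, depth + 1, path + "/" + child.tag))

    return elements, deepest, len(paths)


# Invented bibliography-style sample, for illustration only.
sample = (
    "<dblp>"
    "<article><author>A</author><title>T</title></article>"
    "<article><author>B</author><title>U</title></article>"
    "</dblp>"
)

print(shape_stats(sample))  # (7, 3, 4)
```

This version parses the whole document into memory, so it only suits small files. For data sets of several hundred megabytes, an incremental parser such as `xml.etree.ElementTree.iterparse` would be the more practical choice.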
If the application-level data sets and workloads are not satisfactory,
it may well be the case that XML as a language used to structure and
manage content has not yet matured. We must ask ourselves what there
really is to benchmark. As of today, XML data are most
commonly produced by office suites and software development kits. Office
suites supporting Office Open XML and the OpenDocument Format are or
will soon become the principal producers of XML. Yet in these
environments XML is principally used to represent formatting
instructions. Similarly, the widespread adoption of Web service
standards in software development frameworks and kits (in the .NET
framework, for instance) also contributes to the creation of large
amounts of XML data. Here again, XML is primarily used to represent
formats (e.g. SOAP messages).
Although both XML-based document standards and Web service standards
have intrinsic provision for XML content and have been designed to
enable the management of content in XML, few users yet have the tools,
the desire and the culture to manage their data in XML. Consequently, at
least for now, it seems that these huge amounts of XML data created in
the background of authoring and programming activities need neither be
queried nor searched but rather only need to be processed by the office
suites and compilers. The emphasis is still on format rather than
content structuring and management. Of course, proponents
of XML as a format for content hope that the XML-ization of formats will
facilitate the XML-ization of the content.
With XML-based protocols and formats, XML as a "standards' standard" (as
there are compiler compilers) has been most successful at the lower
layers of information management. The efforts for content organization
and management, on the other hand, do not seem to have been as pervasive
and prolific (in terms of the amount of XML data produced and used). For
instance, the volume of data in the much-talked-about business XML
standards (RosettaNet or Universal Business Language, for instance) is
still difficult to measure and may not be or become significant.
In this presentation we critically review the existing approaches to
benchmarking of XML-based systems and applications. We try to analyze
the trends in the usage of XML in order to determine the needs and
requirements for the successful design, development and adoption of
benchmarks.

