Major Relational Database Vendors and How They Support XML
January 16, 2004
I wrote the following article for EContent magazine late in 2002, and I am re-examining its conclusions now that I am taking a fresh look at RDBMS engines from Microsoft, Oracle, and elsewhere. Do the major RDBMS vendors do enough to support XML, or is there still a case for dedicated XML repository technology?
Now that XML has moved beyond being the latest cool thing, and is in fact being widely adopted and deployed, some practical questions are being asked about it. But these questions are only starting to be answered. Perhaps the biggest question about XML is, "Now that I've got it, where am I supposed to keep it?" Some of the big database players think they've got the answer. Organizations are replete with storage technologies: relational databases, file servers, and document management systems, to name a few. And, perhaps to no one's surprise, XML data is found in all of these places and more. There are also newer, specialized technologies specifically designed for native XML storage.
Yet, for most organizations, relational databases are the dominant mechanism for storing and managing data. Moreover, there is great concentration in the relational database market (with technology from Oracle, IBM, and Microsoft dominating). Given this concentration of technology and vendors, it's worth looking at what these vendors plan to do about XML. Specifically, it's worth looking at each vendor's flagship database products: Oracle's 9i Database, IBM's DB2, and Microsoft SQL Server.
It's clear why these key players are taking XML seriously: The market for XML storage is a big one. According to the analyst firm ZapThink, the market for XML storage will grow from $75 million in 2000 to over $4.1 billion in 2005. And while the relational database vendors currently account for only 15% of the XML storage market, that percentage will grow to 65% by 2005. That leaves plenty of money for the specialized XML vendors to make, but it also means that the relational databases will be storing plenty of XML for years to come.
XML Versus Relational Data
The distinctions between XML and relational data are by now widely discussed and, for the most part, well-understood. But regardless of what a salesperson may be telling you this week, the differences are fundamental. Relational data is all about tidy rows and columns of well-understood, previously defined chunks of information--like names, addresses, prices, and product codes. People have come to use the word "structured" to refer to relational data, and the term makes sense.
XML data can also be somewhat structured. A set of names and addresses can be represented, perhaps equally well, as both relational data and XML data. But XML has two fundamental differences: 1) XML can embed hierarchies of parent-child relationships in ways that relational data cannot; and 2) XML doesn't care a lick how long or complex a given "field" or "record" is, while relational data is all about how long and complex the fields and records are.
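To make the first difference concrete, here is a minimal Python sketch. The assembly document, its element names, and the flatten helper are all invented for illustration. It walks a small nested XML document and flattens it into rows. The hierarchy survives only because each row carries an explicit parent id, which is bookkeeping a flat table needs and XML nesting provides implicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an assembly containing nested parts.
DOC = """
<assembly name="bike">
  <part name="wheel">
    <part name="rim"/>
    <part name="spoke"/>
  </part>
  <part name="frame"/>
</assembly>
"""

def flatten(root):
    """Return (row_id, parent_id, tag, name) tuples in document order."""
    rows = []

    def visit(node, parent_id):
        row_id = len(rows) + 1
        rows.append((row_id, parent_id, node.tag, node.get("name")))
        for child in node:
            visit(child, row_id)

    visit(root, None)
    return rows

rows = flatten(ET.fromstring(DOC))
for row in rows:
    print(row)
```

Each additional level of nesting in the XML costs nothing in the markup, while in the flattened form it has to be reassembled by following parent ids.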
Take the extreme (but not all that unusual) case of a lengthy technical document coded in XML. The entire "record" or XML document can be megabytes in length. It can consist of many parent and child nodes. Thus, an XML document is unlikely to fit neatly into columns and rows. As a result, XML data can be an odd fit in a relational database. So, Oracle, Microsoft, and IBM have been working hard to extend their products to better ingest, store, manage, and manipulate XML data.
To begin with, all of the major vendors have improved on an already available method of storing large chunks of data as a means of better supporting XML. The so-called BLOB (Binary Large Object) space in a relational database can be used to store large XML documents, and the vendors have refined these to differentiate BLOBs from CLOBs (Character Large Objects). Using BLOBs or CLOBs, whole XML documents can be securely moved in and out of a database, and secondary tools, such as an XML parser, can then be used to manipulate the XML as it is moved in and out of the BLOB.
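The large-object approach can be sketched in a few lines of Python. This is an illustration only: SQLite's TEXT column stands in for a CLOB, and the table, column, and document are hypothetical rather than taken from any of the three vendors' products. The point is that the database treats the document as one opaque value, and a separate parser does the work of looking inside it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; SQLite's TEXT column stands in for a CLOB.
MANUAL = (
    "<manual><title>Pump Maintenance</title>"
    "<section>Check seals monthly.</section></manual>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)", (1, MANUAL))

# The database hands the document back as a single string...
(body,) = conn.execute(
    "SELECT body FROM documents WHERE id = ?", (1,)
).fetchone()

# ...and only a parser outside the database can see its structure.
title = ET.fromstring(body).findtext("title")
print(title)
conn.close()
```

The trade-off is visible here: storage and retrieval are simple, but a query such as "find every manual whose title mentions pumps" cannot use the document's structure without parsing each stored value.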
For Robert Shimp, vice president for Oracle 9i Database marketing, the emergence of XML is part of the broader problem enterprises face as the growth and importance of "unstructured data" begins to rival the growth and importance of structured data. "Organizations are looking for a unified view of their data, both structured and unstructured," said Shimp. Moreover, according to Shimp, organizations suffer from a proliferation of too many data sources, many of which are too loosely managed. And this loose proliferation of assets is not good for companies, as it makes it difficult for them to efficiently manage and act on their intellectual capital. "It would be analogous to the CFO of a company handing out $100 of the company's money for each employee to manage," noted Shimp, "with no controls on how each employee would do it."
Solving the "Single Source" Publishing Problem
For organizations that have significant amounts of content, managing XML data becomes even more important. Publishers and others with large content stores are looking to solve the "single source" publishing problem, where they increasingly rely on both XML and structured data to be rendered into HTML, WAP, and other formats--often on-the-fly. Already, such automation could involve tying together many repositories, where a rendered HTML page could be derived from both structured and unstructured sources. In a manufacturing application, this could be a parts catalog where price and inventory data comes from a relational database and the product descriptions come from a document management system. In a magazine publishing application, this could be where the article content originates in a content management system while a related directory listing comes from a relational database.
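A rough Python sketch of the parts-catalog case follows. Everything in it is assumed for illustration: the parts table, the DESCRIPTIONS store standing in for a document management system, and the render helper. It pulls price and inventory from a relational table, pulls the description from an XML fragment, and renders both into one HTML snippet on the fly.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Structured side: price and stock in a relational table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts (sku TEXT PRIMARY KEY, price REAL, in_stock INTEGER)"
)
conn.execute("INSERT INTO parts VALUES ('A1', 4.5, 120)")

# Unstructured side: descriptions as XML, standing in for a document store.
DESCRIPTIONS = {
    "A1": "<description><name>Hex bolt</name>"
          "<para>Zinc-plated, M8 &amp; M10.</para></description>"
}

def render(sku):
    """Combine one relational row and one XML description into HTML."""
    price, in_stock = conn.execute(
        "SELECT price, in_stock FROM parts WHERE sku = ?", (sku,)
    ).fetchone()
    desc = ET.fromstring(DESCRIPTIONS[sku])
    name = escape(desc.findtext("name"))
    para = escape(desc.findtext("para"))
    return (
        f'<div class="part"><h2>{name}</h2>'
        f"<p>{para}</p>"
        f"<p>${price:.2f}, {in_stock} in stock</p></div>"
    )

html_page = render("A1")
print(html_page)
```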
Creating such a unified view of both unstructured and structured data is precisely where the major vendors see their offerings headed. Oracle, for instance, talks about "unifying...business data...and XML content," and IBM talks about "combining XML...and the power of data integration." And all of them, Microsoft included, are embracing the broader notions of Web services, where content and data are integrated over the Internet, using loosely-coupled components and XML as the all-purpose glue.
Besides the single-source publishing problem, other factors are driving the need for XML storage. ZapThink's research points to the growth in Web services, the increased use of XML for messaging, and the need for improved searching and querying of the XML. Taken together, these drivers suggest a growing need for storage technologies that provide more sophisticated management of XML data.
Looking Under the Hood
Database vendors are working to make their products function better with XML. While the products and approaches differ--and the big three all have both new commercial offerings and significant R&D under wraps--the approaches have some things in common.
To varying extents, they all rely on technologies for mapping the XML data to the relational fields, and back again. For example, IBM is developing "Extenders" for its DB2 database that will allow developers to map XML data to DB2 tables, and back again, and Microsoft has a programming facility called SQLXML for mapping and querying between, as the name suggests, SQL and XML. Oracle would argue--and industry analysts would tend to agree--that its mapping technologies are more deeply embedded in the product, especially with Oracle 9i, Release 2, which is now generally available.
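The general idea of mapping in both directions can be shown with a small Python sketch. It does not use or imitate the DB2 Extenders, SQLXML, or Oracle's facilities. The customer table, the input document, and the one-element-per-column convention are all assumptions made for the example, and SQLite stands in for the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input; child element names double as column names.
XML_IN = """
<customers>
  <customer id="1"><name>Ada</name><city>London</city></customer>
  <customer id="2"><name>Grace</name><city>Arlington</city></customer>
</customers>
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# XML -> relational: one row per <customer>, one column per child element.
for c in ET.fromstring(XML_IN).findall("customer"):
    conn.execute(
        "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
        (int(c.get("id")), c.findtext("name"), c.findtext("city")),
    )

# Relational -> XML: rebuild the document from a query result.
root = ET.Element("customers")
query = "SELECT id, name, city FROM customer ORDER BY id"
for cid, name, city in conn.execute(query):
    c = ET.SubElement(root, "customer", id=str(cid))
    ET.SubElement(c, "name").text = name
    ET.SubElement(c, "city").text = city

xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
conn.close()
```

A mapping this simple works only because the sample data is flat and regular. Documents with deep nesting or mixed content are where hand-written mappings become laborious.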
The major vendors all fully support the more stable XML standards, such as the core XML syntax, though they vary in their support of emerging standards. So the different products can parse the XML and validate it, at least against a document type definition (DTD), and in some cases against an XML schema. The products can also use XPath to traverse the hierarchical structure of the XML, but many have stopped short of supporting newer, still-developing query standards such as XQuery. Ron Schmelzer, senior analyst at ZapThink, follows XML data storage closely, and sees such standard support as critical to differentiating the various product offerings. Whereas relational database systems use SQL for querying, Schmelzer points out that SQL simply doesn't work "as well with the hierarchical nature of XML documents." As a result, said Schmelzer, "a number of initiatives exist to deal with XML-centric data query, insertion, and update operations."
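For readers who have not seen path-based traversal, here is a brief Python sketch using the standard library's ElementTree, which supports a limited subset of XPath. The manual document is made up for the example, and nothing here reflects how any vendor's database evaluates XPath internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two levels of hierarchy.
DOC = """
<manual>
  <chapter num="1">
    <title>Setup</title>
    <section><title>Unpacking</title></section>
  </chapter>
  <chapter num="2">
    <title>Operation</title>
    <section><title>Starting</title></section>
    <section><title>Stopping</title></section>
  </chapter>
</manual>
"""

root = ET.fromstring(DOC)

# Titles of every section, at any depth in the tree.
section_titles = [t.text for t in root.findall(".//section/title")]

# Titles of sections inside the chapter whose num attribute is "2".
chapter2 = [
    t.text for t in root.findall("./chapter[@num='2']/section/title")
]

# A parser rejects text that is not well-formed XML.
try:
    ET.fromstring("<manual><chapter></manual>")
    well_formed = True
except ET.ParseError:
    well_formed = False

print(section_titles, chapter2, well_formed)
```

The path expressions follow the document's parent-child structure directly, which is the kind of navigation Schmelzer's remark suggests SQL handles less comfortably.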
And all of the vendors support emerging programming languages and application programming interfaces (APIs). The big three support Java for database access and connectivity, and emerging APIs for processing XML, such as the Document Object Model (DOM) and the Simple API for XML (SAX). The emphasis, correctly, seems to be on giving software developers a ready toolkit for accessing, manipulating, retrieving, and updating XML data, and quickly transforming it to other forms--HTML, relational, other forms of XML, and so on.
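The two APIs take opposite approaches, which a short Python sketch can show using the standard library's own DOM and SAX implementations. The order document and the SkuCollector class are invented for the example. DOM builds the whole document in memory as a tree to navigate, while SAX reports events as the parser reads and keeps nothing unless the handler saves it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document used for both approaches.
DOC = "<order><item sku='A1'>Bolt</item><item sku='B2'>Nut</item></order>"

# DOM: load the entire document into a tree, then navigate it.
dom = minidom.parseString(DOC)
dom_skus = [
    item.getAttribute("sku") for item in dom.getElementsByTagName("item")
]

# SAX: react to parse events as they stream past; no tree is built.
class SkuCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.skus = []

    def startElement(self, name, attrs):
        if name == "item":
            self.skus.append(attrs.getValue("sku"))

handler = SkuCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_skus = handler.skus

print(dom_skus, sax_skus)
```

For the megabyte-sized documents described earlier, the difference matters: the event-driven style avoids holding the full tree in memory, at the cost of more manual state tracking in the handler.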
Integration as a Crutch?
This last point--integration--is a focus for several of the vendors, notably IBM and Microsoft, both of which are heavily invested in marketing software development tools and methodologies. IBM also has a huge professional services business, a large chunk of which is dedicated to database and XML integration. The rollout of the Web has meant an explosion of database integration and access, and the continued growth of XML will only accelerate this trend.
Oracle's Shimp, among others, would caution that application integration is only part of the problem, that there is underlying and more fundamental data analysis and modeling that needs to be done. In a situation where the data stores have multiplied (often for reasons of expediency), Shimp reasons that simply integrating the various databases may be a "crutch" to avoid the harder work that is being left undone. Indeed, the increased mix of data types--relational, nonrelational; structured, unstructured; and XML especially--has brought a new challenge to organizations. This challenge is to truly analyze all the data and develop a more unified and comprehensive data model. XML isn't so much a new problem as a new and complex dimension of an existing problem.
The Data Model is the Key
Yet while all organizations would be wise to invest in this kind of comprehensive data modeling, the organization that has a lot of XML data indeed has some unique problems on its hands, and perhaps extra motivation to take a step back and analyze things. By its very nature, XML data is going to be different, and is going to require some different integration and handling. If you have various business databases, and a large store of XML, you likely are going to require at least separate instances of a relational database. For example, you could have all of your business or transactional data in one database, tuned to maximize the performance of that data. Your XML data could then reside in a second relational database that supports XML, such as the products from Oracle, IBM, and Microsoft.
These companies would likely argue that a single-vendor solution is preferable, and that, of course, their solution would be best. However, the reality is that you likely have many data sources already, from different vendors, and will likely live with some of these for some time to come. So while a comprehensive data model and more monolithic solution may be in your future, you will likely still have to knit some things together to create a comprehensive solution, at least for now.
Posted by Bill Trippe at January 16, 2004 9:31 PM








