Storing your XML

September 3, 2003

Many organizations are now working with XML data in one or more applications. As the use of XML grows, an important question arises: where should XML data be stored?

I originally wrote about this in February of last year for Transform magazine. It’s interesting that the primary argument still holds up more than a year and a half later; some of the vendors have changed of course.

If you take even a cursory glance at the XML storage market, you will see many vendors vying for your attention and your dollars. These include major database companies like Oracle and Microsoft, and a long list of companies with specialized XML repositories. The various products reflect completely different approaches to data storage and management, and understanding them in detail will require your technical staff to dig into some subtle and complex technical questions.

In fact, XML has reopened some fundamental questions of data storage that some people, certain vendors especially, had felt were already answered. For many years, object-oriented database vendors argued that their systems were superior for storing document data, but they never gained much market share or mindshare against the giant relational database vendors like Oracle. Customers seemed to accept the argument that relational databases would do the job well enough, and IT organizations wanted to shorten, not lengthen, the list of technologies in use and under maintenance.

Have things changed enough that you should now consider a specialized database for XML storage? The answer, of course, is that it depends. It depends mainly on three things: how much persistent XML data you have, whether you need high-performance, real-time access to it, and what kinds of querying you think you might want to do with the data. Let's take a quick look at each of these.

Persistent XML. Applications use XML in one of two ways--as a source format, or as something that is created and used for exchanging data between two data sources, usually temporarily. But you may well have source XML data that needs updating and maintenance. The more XML data you have, and the more user interaction and updating it requires, the more you may need a specialized tool for storing it.

Real-time access. If the data is stored in a format other than XML, and you need instantaneous access to it in XML format, then access could be slowed because the application has to run a transformation on every request. A database that is optimized for XML storage skips that step and should provide better performance, so if performance is key, you should consider an XML repository.
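To make the transformation step concrete, here is a minimal sketch in Python of what an application has to do when the data lives in relational tables but is needed as XML. The table, column, and element names are invented for illustration, and the helper rows_to_xml is hypothetical, not part of any product mentioned here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag):
    """Run a SQL query and wrap each result row in XML elements.

    Hypothetical helper: every column becomes a child element named
    after the column, which is one simple mapping among many.
    """
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # NULL columns become empty elements in this sketch.
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# An in-memory database stands in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (title) VALUES (?)", ("Storing your XML",))

xml_text = rows_to_xml(
    conn, "SELECT id, title FROM articles ORDER BY id", "articles", "article"
)
print(xml_text)
```

This work is repeated on every request, and a real mapping (nested elements, attributes, mixed content) would be considerably more involved than this flat one.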

Querying. Relational databases support SQL for querying relational data, of course, but XML data cannot be queried with SQL. SQL is designed for the row-and-column orientation of relational data, and both end users and developers are comfortable with its approach. XML data, with its mix of elements, attributes, and textual data, requires a different approach: it is queried with tools based on XPath. While relational database vendors such as Oracle have basic XPath support, you may need a specialized repository if you have extensive and complex requirements for XPath-based querying.
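For readers who have not seen XPath-style querying, here is a small sketch using the limited XPath subset in Python's standard xml.etree.ElementTree module. The sample document and its element and attribute names are made up for illustration; a dedicated XML repository would typically offer a much fuller query language than this subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and values are invented.
CATALOG = """
<catalog>
  <article id="a1" status="published">
    <title>Storing your XML</title>
    <author>Bill Trippe</author>
  </article>
  <article id="a2" status="draft">
    <title>Querying with XPath</title>
    <author>Example Author</author>
  </article>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Path plus attribute predicate: titles of published articles only.
published_titles = [
    title.text
    for title in root.findall("./article[@status='published']/title")
]

# Predicate on a child element's text: ids of one author's articles.
ids_by_author = [
    article.get("id")
    for article in root.findall("./article[author='Example Author']")
]

print(published_titles)
print(ids_by_author)
```

Note how the queries follow the document's hierarchy and mix attributes with element text, which is the part that does not map neatly onto rows and columns.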

If you do have one or more of these requirements, then it makes a lot of sense for you to look at specialized tools for storing your XML. And while there are many players in the space, there are two clear market leaders, Software AG with its Tamino XML Server (http://www.softwareag.com/tamino/), and Ixiasoft Corporation (http://www.ixiasoft.com/) with its TEXTML Server (XIS). Both companies bring a lot to the table. Software AG is a long-established database company, having introduced the Adabas product some 30 years ago, and it has made an aggressive and very strong entry into the market with Tamino. Ixiasoft is a newer vendor with a sole focus on this technology.

This is not to say you would be saying goodbye to Oracle or Microsoft anytime soon. For one thing, you will still have many uses for relational data and the tools that work best with it. But as your use of XML increases, your need for specialized tools will also likely increase. This will be especially true if you have large volumes of XML data, real-time processing needs, or complex queries to run. Those are the same questions you ask of your database today, and will ask of your XML database tomorrow.

UPDATE (04/09/06): This was written a few years ago, but the general ideas still apply. I would say that, since I originally wrote this, MS SQL Server has come on as more of a presence in storing XML, and Mark Logic's XML content server has carved out an impressive chunk of this market. And as Dave Kellogg, the CEO of Mark Logic, noted recently, the question of XML storage is still a very live one for IBM and Oracle.

Posted by Bill Trippe at September 3, 2003 5:19 PM
