January 26, 2004

EForms Discussion at the Gilbane Conference

As part of the Gilbane Conference on Content Management, I will be moderating a session on EForms entitled, "Electronic Forms and Content Management." (In fact, it's going to be two sessions back to back.) I'm excited about the event because I think we have exactly the right people involved in the session. The three speakers are:

We are still thinking through details of the event, but the following is the proposed outline of the session that I sent to the three speakers.

The extra time is going to allow us to provide a much more instructive session. I have been thinking of the following format:

The problem application will be something we all agree on. I was thinking of applications such as e-commerce, Web site registration and membership, or perhaps an on-line questionnaire. I am open on this, though, as long as it is a fairly complex, common, and meaningful application that will allow each of you to highlight your core approach to e-Forms development and what you think are some advantages of your technology. For this purpose, I would ask Micah to take off his Cardiff hat and wear his XForms hat instead, but we should at least let him mention Cardiff.

Let me know your thoughts as well. What would you like to hear from the speakers? What sorts of demonstrations or explanations would be most useful to you?

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 8:18 PM

January 23, 2004

Correct Feature List for XML Editors?

I have been looking in detail at the commercial XML editing tools on the market. One of the things I would like to include in the CMSWatch report is a feature matrix, showing how the various tools compare in key features. The following is a first cut at the feature list.

XML Editor Features

OS

Windows
Mac
Linux
Solaris

VALIDATION

XML Validation
SGML Validation
DTD Support
W3C Schema Support
RELAX NG Schema Support
Namespace Support
XInclude
XML Catalog
Interactive Validation
Batch Validation
Edit/manage Entities
Edit/manage CDATA
Edit/manage Attributes
Tag/Attribute help and support in context
Support special characters and character entities

TABLES and MATH

WYSIWYG CALS Tables Editing
WYSIWYG HTML Tables Editing
WYSIWYG MathML Editing

EDITING INTERFACE

WYSIWYG Editing
Source View Editing
Grid View editing
Tree view editing
Pretty printing of XML
Syntax coloring
Multiple Editing Windows

EDITORIAL FEATURES

Spell checking
Grammar checking
Collaboration features
Versioning
Document Compare
Document Merge
Search and Replace
Search and Replace with Regular Expression
Search and Replace with XML Context
Multiple-level Undo/Redo

PLUG-INS and SUPPORTING APPLICATIONS

XSLT Engine
XSL-FO Engine
Raster Image Editing
Raster Image Display
Vector Image Display
Vector Image Editing

CUSTOMIZATION FEATURES

Menus
Macros
Tool Bars
Keyboard Shortcuts
Forms/Interface Designer

BROWSER SUPPORT

ActiveX
Java
JavaScript

LOCALIZATION

Menus
Documentation
Dictionary/Spellchecking

Posted by Bill Trippe at 7:45 PM | Comments (3)

January 16, 2004

Major Relational Database Vendors and How They Support XML

I wrote the following article for EContent magazine late in 2002, and I am re-examining its conclusions now that I am taking a fresh look at RDBMS engines from Microsoft, Oracle, and elsewhere. Do the major RDBMS vendors do enough to support XML, or is there still a case for dedicated XML repository technology?

Now that XML has moved beyond being the latest cool thing, and is in fact being widely adopted and deployed, some practical questions are being asked about it. But these questions are only starting to be answered. Perhaps the biggest question about XML is, "Now that I've got it, where am I supposed to keep it?" Some of the big database players think they've got the answer. Organizations are replete with storage technologies: relational databases, file servers, and document management systems, to name a few. And, perhaps to no one's surprise, XML data is found in all of these places and more. There are also newer, specialized technologies specifically designed for native XML storage.

Yet, for most organizations, relational databases are the dominant mechanism for storing and managing data. Moreover, there is great concentration in the relational database market (with technology from Oracle, IBM, and Microsoft dominating). Given this concentration of technology and vendors, it's worth looking at what these vendors plan to do about XML. Specifically, it's worth looking at each vendor's flagship database products: Oracle's 9i Database, IBM's DB2, and Microsoft SQL Server.

It's clear why these key players are taking XML seriously: The market for XML storage is a big one. According to the analyst firm ZapThink, the market for XML storage will grow from $75 million in 2000 to over $4.1 billion in 2005. And while the relational database vendors currently consume only 15% of the XML storage market, that percentage will grow to 65% by 2005. That leaves plenty of money for the specialized XML vendors to make, but it also means that the relational databases will be storing plenty of XML for years to come.

XML Versus Relational Data
The distinctions between XML and relational data are by now widely discussed and, for the most part, well-understood. But regardless of what a salesperson may be telling you this week, the differences are fundamental. Relational data is all about tidy rows and columns of well-understood, previously defined chunks of information--like names, addresses, prices, and product codes. People have come to use the word "structured" to refer to relational data, and the term makes sense.

XML data can also be somewhat structured. A set of names and addresses can be represented, perhaps equally well, as both relational data and XML data. But XML has two fundamental differences: 1) XML can embed hierarchies of parent-child relationships in ways that relational data cannot; and 2) XML doesn't care a lick how long or complex a given "field" or "record" is, while relational data is all about how long and complex the fields and records are.

Take the extreme (but not all that unusual) case of a lengthy technical document coded in XML. The entire "record" or XML document can be megabytes in length. It can consist of many parent and child nodes. Thus, an XML document is not likely going to fit neatly into columns and rows. As a result, XML data can be an odd fit in a relational database. So, Oracle, Microsoft, and IBM have been working hard to extend their products to better ingest, store, manage, and manipulate XML data.

To begin with, all of the major vendors have improved on an already available method of storing large chunks of data as a means of better supporting XML. The so-called BLOB (Binary Large Object) space in a relational database can be used to store large XML documents, and the vendors have refined these to differentiate BLOBs from CLOBs (Character Large Objects). Using BLOBs or CLOBs, whole XML documents can be securely moved in and out of a database, and secondary tools, such as an XML parser, can then be used to manipulate the XML as it is moved in and out of the BLOB.
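The whole-document approach can be sketched in a few lines. This is only an illustration, not any vendor's actual mechanism: it assumes SQLite as a stand-in database, a TEXT column playing the role of a CLOB, and a made-up `documents` table and sample manual.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for illustration.
MANUAL_XML = (
    "<manual id='m1'>"
    "<title>Pump Maintenance</title>"
    "<chapter><title>Safety</title><para>Disconnect power first.</para></chapter>"
    "</manual>"
)


def store_document(conn, doc_id, xml_text):
    """Store the whole XML document in a single character column (CLOB-style)."""
    conn.execute(
        "INSERT INTO documents (doc_id, body) VALUES (?, ?)", (doc_id, xml_text)
    )


def load_document(conn, doc_id):
    """Fetch the stored text and parse it outside the database."""
    row = conn.execute(
        "SELECT body FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return ET.fromstring(row[0])


# SQLite's TEXT type stands in for a CLOB column here (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, body TEXT)")
store_document(conn, "m1", MANUAL_XML)

root = load_document(conn, "m1")
chapter_titles = [chapter.findtext("title") for chapter in root.findall("chapter")]
print(chapter_titles)
```

Note what the database does and does not know in this arrangement: it can hand the document back intact, but it cannot answer a question about chapter titles on its own. All of that work falls to the parser on the way out.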

For Robert Shimp, vice president for Oracle 9i Database marketing, the emergence of XML is part of the broader problem enterprises face as the growth and importance of "unstructured data" begins to rival the growth and importance of structured data. "Organizations are looking for a unified view of their data, both structured and unstructured," said Shimp. Moreover, according to Shimp, organizations suffer from a proliferation of too many data sources, many of which are too loosely managed. And this loose proliferation of assets is not good for companies, as it makes it difficult for them to efficiently manage and act on their intellectual capital. "It would be analogous to the CFO of a company handing out $100 of the company's money for each employee to manage," noted Shimp, "with no controls on how each employee would do it."

Solving the "Single Source" Publishing Problem
For organizations that have significant amounts of content, managing XML data becomes even more important. Publishers and others with large content stores are looking to solve the "single source" publishing problem, where they increasingly rely on both XML and structured data to be rendered into HTML, WAP, and other formats--often on-the-fly. Already, such automation could involve tying together many repositories, where a rendered HTML page could be derived from both structured and unstructured sources. In a manufacturing application, this could be a parts catalog where price and inventory data comes from a relational database and the product descriptions come from a document management system. In a magazine publishing application, this could be where the article content originates in a content management system while a related directory listing comes from a relational database.
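The parts catalog case might look something like the following sketch. Everything specific here is assumed for illustration: the `inventory` table, the `<catalog>` markup, the SKUs, and the `render_part` helper are all hypothetical, with SQLite standing in for the relational database and an XML string standing in for the document management system.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical product descriptions, as they might come from a document repository.
DESCRIPTIONS_XML = """
<catalog>
  <part sku="P-100"><name>Gasket</name><description>Seals the pump housing.</description></part>
  <part sku="P-200"><name>Impeller</name><description>Moves fluid through the pump.</description></part>
</catalog>
"""


def render_part(conn, catalog, sku):
    """Build one HTML fragment from two sources: XML content and relational data."""
    part = catalog.find(f"part[@sku='{sku}']")
    price, stock = conn.execute(
        "SELECT price, in_stock FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    name = escape(part.findtext("name"))
    description = escape(part.findtext("description"))
    return (
        '<div class="part">'
        f"<h2>{name}</h2><p>{description}</p>"
        f"<p>Price: ${price:.2f} ({stock} in stock)</p>"
        "</div>"
    )


# Hypothetical price and inventory data in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (sku TEXT PRIMARY KEY, price REAL, in_stock INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("P-100", 4.5, 120), ("P-200", 38.0, 7)],
)

catalog = ET.fromstring(DESCRIPTIONS_XML.strip())
page = render_part(conn, catalog, "P-100")
print(page)
```

The only thing tying the two sources together is the shared SKU, which is the "knitting" an organization has to maintain when the content and the business data live in separate repositories.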

Creating such a unified view of both unstructured and structured data is precisely where the major vendors see their offerings headed. Oracle, for instance, talks about "unifying...business data...and XML content," and IBM talks about "combining XML...and the power of data integration." And all of them, Microsoft included, are embracing the broader notions of Web services, where content and data are integrated over the Internet, using loosely-coupled components and XML as the all-purpose glue.

Besides the single-source publishing problem, other factors are driving the need for XML storage. ZapThink's research points to the growth in Web services, the increased use of XML for messaging, and the need for improved searching and querying of the XML. Taken together, these drivers suggest a growing need for storage technologies that provide more sophisticated management of XML data.

Looking Under the Hood
Database vendors are working to make their products function better with XML. While the products and approaches differ--and the big three all have both new commercial offerings and significant R&D under wraps--the approaches have some things in common.

To varying extents, they all rely on technologies for mapping the XML data to the relational fields, and back again. For example, IBM is developing "Extenders" for its DB2 database that will allow developers to map XML data to DB2 tables, and back again, and Microsoft has a programming facility called SQLXML for mapping and querying between, as the name suggests, SQL and XML. Oracle would argue--and industry analysts would tend to agree--that their mapping technologies are more deeply embedded in the product, especially with Oracle 9i, Release 2, which is now generally available.
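To make the idea of mapping "and back again" concrete, here is a hand-rolled sketch of the round trip. It does not use the DB2 Extenders, SQLXML, or Oracle's facilities; it assumes SQLite, a made-up `<orders>` document, and two hypothetical tables, and the function names `shred` and `publish` are my own.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are invented.
ORDERS_XML = (
    "<orders>"
    "<order id='1001'><customer>Acme</customer>"
    "<item sku='P-100' qty='2'/><item sku='P-200' qty='1'/></order>"
    "</orders>"
)


def shred(conn, xml_text):
    """Map each <order> and <item> to a row; the nesting becomes a foreign key."""
    for order in ET.fromstring(xml_text).findall("order"):
        order_id = int(order.get("id"))
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order_id, order.findtext("customer"))
        )
        for item in order.findall("item"):
            conn.execute(
                "INSERT INTO items VALUES (?, ?, ?)",
                (order_id, item.get("sku"), int(item.get("qty"))),
            )


def publish(conn):
    """Rebuild the XML hierarchy from the two tables."""
    root = ET.Element("orders")
    orders = conn.execute("SELECT id, customer FROM orders ORDER BY id").fetchall()
    for order_id, customer in orders:
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "customer").text = customer
        items = conn.execute(
            "SELECT sku, qty FROM items WHERE order_id = ? ORDER BY rowid",
            (order_id,),
        ).fetchall()
        for sku, qty in items:
            ET.SubElement(order, "item", sku=sku, qty=str(qty))
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE items (order_id INTEGER, sku TEXT, qty INTEGER)")
shred(conn, ORDERS_XML)

rebuilt = publish(conn)
round_trip_ok = ET.tostring(rebuilt) == ET.tostring(ET.fromstring(ORDERS_XML))
print(round_trip_ok)
```

This works neatly because the sample is regular, data-like XML. A long technical document with mixed content and deep, irregular nesting would need many more tables and would be much harder to reassemble faithfully, which is the fit problem described earlier.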

The major vendors all fully support the more stable XML standards, such as the core XML syntax, though they vary in their support of emerging standards. So the different products can parse the XML, at least against a document type definition (DTD), and in some cases against an XML schema. The products can also use XPath to traverse the hierarchical structure of the XML, but many have stopped short of supporting newer, developing standards, such as XQuery and other emerging standards for querying. Ron Schmelzer, senior analyst at ZapThink, follows XML data storage closely, and sees such standard support as critical to differentiating the various product offerings. Whereas relational database systems use SQL for querying, Schmelzer points out that SQL simply doesn't work "as well with the hierarchical nature of XML documents." As a result, said Schmelzer, "a number of initiatives exist to deal with XML-centric data query, insertion, and update operations."
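For readers who have not used XPath, a brief sketch may help show why path-based traversal suits hierarchical content. It uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath, and a made-up manual as sample data; it is not how any of the database products expose XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical technical manual; the markup is invented for illustration.
MANUAL_XML = """<manual>
  <chapter num="1"><title>Safety</title>
    <section><title>Power</title><para>Disconnect power first.</para></section>
  </chapter>
  <chapter num="2"><title>Service</title>
    <section><title>Seals</title><para>Replace the gasket yearly.</para></section>
    <section><title>Bearings</title><para>Grease every six months.</para></section>
  </chapter>
</manual>"""

root = ET.fromstring(MANUAL_XML)

# Titles of chapters only (direct children), not of the sections below them.
chapter_titles = [t.text for t in root.findall("./chapter/title")]

# Section titles within the chapter whose num attribute is 2.
service_sections = [t.text for t in root.findall("./chapter[@num='2']/section/title")]

# Every paragraph at any depth in the document.
all_paras = [p.text for p in root.findall(".//para")]

print(chapter_titles)
print(service_sections)
print(len(all_paras))
```

Each expression describes a position in the hierarchy rather than a join between tables, which is the distinction Schmelzer is drawing.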

And all of the vendors support emerging programming languages and application programming interfaces (APIs). The big three support Java for database access and connectivity, and emerging APIs for processing XML, such as the Document Object Model (DOM) and the Simple API for XML (SAX). The emphasis, correctly, seems to be on giving software developers a ready toolkit for accessing, manipulating, retrieving, and updating XML data, and quickly transforming it to other forms--HTML, relational, other forms of XML, and so on.
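The difference between the two APIs is easiest to see side by side. The sketch below uses the DOM and SAX implementations in Python's standard library rather than the Java toolkits the vendors ship, and the feed markup and function names are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample data.
FEED_XML = (
    "<feed><entry><title>First</title></entry>"
    "<entry><title>Second</title></entry></feed>"
)


def titles_with_dom(xml_text):
    """DOM: load the whole document into a tree, then navigate it."""
    doc = minidom.parseString(xml_text)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to parse events as they stream past; no tree is built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


dom_titles = titles_with_dom(FEED_XML)
sax_titles = titles_with_sax(FEED_XML)
print(dom_titles, sax_titles)
```

DOM is the more convenient choice when a program needs to move around in or modify a document. SAX takes more code but keeps memory use flat, which matters for the multi-megabyte documents discussed earlier.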

Integration as a Crutch?
This last point--integration--is a focus for several of the vendors, notably IBM and Microsoft, both of which are heavily invested in marketing software development tools and methodologies. IBM as well has a huge professional services business, a large chunk of which is dedicated to database and XML integration. The rollout of the Web has meant an explosion of database integration and access, and the continued growth of XML will only accelerate this trend.

Oracle's Shimp, among others, would caution that application integration is only part of the problem, and that there is underlying and more fundamental data analysis and modeling that needs to be done. In a situation where the data stores have multiplied (often for reasons of expediency), Shimp reasons that simply integrating the various databases may be a "crutch" to avoid the harder work that is being left undone. Indeed, the increased mix of data types--relational, nonrelational; structured, unstructured; and XML especially--has brought a new challenge to organizations. This challenge is to truly analyze all the data and develop a more unified and comprehensive data model. XML isn't so much a new problem as a new and complex dimension of an existing problem.

The Data Model is the Key
Yet while all organizations would be wise to invest in this kind of comprehensive data modeling, the organization that has a lot of XML data indeed has some unique problems on its hands, and perhaps extra motivation to take a step back and analyze things. By its very nature, XML data is going to be different, and is going to require some different integration and handling. If you have various business databases, and a large store of XML, you likely are going to require at least separate instances of a relational database. For example, you could have all of your business or transactional data in one database, tuned to maximize the performance of that data. Your XML data could then reside in a second relational database that supports XML, such as the products from Oracle, IBM, and Microsoft.

These companies would likely argue that a single-vendor solution is preferable, and that, of course, their solution would be best. However, the reality is that you likely have many data sources already, from different vendors, and will likely live with some of these for some time to come. So while a comprehensive data model and more monolithic solution may be in your future, you will likely still have to knit some things together to create a comprehensive solution, at least for now.

Posted by Bill Trippe at 9:31 PM

January 15, 2004

What to Consider in Evaluating Databases for XML Storage

As I dig into the XML and relational repositories for the CMSWatch work I am doing, I find myself asking the vendors to provide detailed demonstrations of the products. In order to get an apples-to-apples comparison, I am asking for a demo that touches on certain consistent points and capabilities. The following is how I described it in correspondence with one of the vendors.

I need to understand how Product X can be used to store XML content.

This is best done by a demonstration of the product that shows a typical project:
--How the database is designed
--How the content is loaded and administered
--How the content is then accessed and updated
--How programmers typically will interact with, query, and manipulate the content
--Specifically how the XML will be stored in the database

Since this is focused on XML, I would also like to learn:
--How Product X supports the various XML technologies (especially XSLT, XQuery, XPath)
--How Product X's functionality compares with both relational and XML-specific repositories.

I have also considered asking them to work with the same data, but that may not be completely necessary. Any thoughts?

Posted by Bill Trippe at 9:08 PM

January 12, 2004

The Future of Content Delivery: Services-Oriented Architectures

For an upcoming Transform magazine article, I will be writing about Services-Oriented Architectures and how they are beginning to change document delivery. What is the future likely to bring? Will SOAs change content delivery (I think they will), and how will they?

Ideally, the piece will address how things will change, and the implications of such change. I would like to get some concrete and live examples of what some organizations are trying to do today, and how the vendors are supporting such activities.

This ties into areas such as business process management, workflow, and content delivery. What about organizations where the delivery of content is a critical step in a business process? Examples that come to mind are insurance claims, mortgage applications, other kinds of financial vehicles, and so forth.

Some of the vendors that come to mind are IBM and FileNet, along with the other enterprise content management players (Documentum, Vignette, Interwoven, Stellent).

Have some insight or experience with this? Feel free to contact me offline, or post your thoughts here.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 9:20 PM

January 6, 2004

Will XForms Matter?

My recent column on XForms is now live on the Transform Magazine site. To briefly quote:

XML was born when a bunch of very smart people realized that HTML, while easy to use and widely deployed, wasn't a robust enough technology to build the Web infrastructure and interfaces of the future. While there is still plenty of HTML around, many of the underpinnings of the Web are now based on XML. Virtually every mid- to large-sized organization uses XML to store, transform or integrate various data sources that end up on the Web. The emerging XForms standard was born out of much the same motivation as the language on which it is based; a bunch of very smart people realized HTML-based forms were not a long-term solution to building user interfaces and data collection and validation tools for the Web. As a result, the World Wide Web Consortium (W3C) established a forms working group years ago. Their first formal recommendation, XForms 1.0, was issued in November.

Also on XForms specifically and EForms in general: I will be moderating a session on "Electronic Forms and Content Management" at the upcoming Gilbane Conference on Content Management, to be held March 24-26 in Los Angeles. Confirmed speakers thus far are Chuck Myers from Adobe and Micah Dubinko from Cardiff. Chuck is a great speaker and is Technology Strategist at Adobe. Micah literally wrote the book on XForms.

Posted by Bill Trippe at 9:03 PM

January 4, 2004

Does Context Rule?

In the fall of 2000, there was a spate of e-book conferences--two in New York, one week apart, for example--and the same sorts of arguments about the advantages of digital content for publishers were once again trotted out. There's the lower publication costs point, together with the felling of fewer trees angle. There's the faster time to market due to virtual distribution across the Web, and because the aforementioned trees don't have to get chopped, chewed, and rolled out for printing presses. There's the "digital document is better" docket, where the tried and true search and retrieval achievements are pointed to, along with other usability improvements such as the ability to cut and paste, annotate, and customize dynamic documents. Updating information, integrating information, navigating information, and disseminating information are all part of the "digital is better" formation.

E-books are still with us, of course, but they never lived up to their hype. I remember sitting in one of these e-book conferences and trying to decide which was the better metaphor--e-book as 8-track tape, or e-book as videotext.

Just because these arguments can be mapped across a few decades--back to the online information services of the 1970s, through the first blush of CD-ROM in the 1980s, and right up to the Internet and Web and enterprise portals of today--doesn't take away from the force of these convictions. On the other hand, after so long a time this argument has been made--and as variously applied as it has been--there's a certain impulse to say, Been there, done that.

In fact, despite the presence of new digital content delivery platforms in the form of e-book readers, there is little new coming out of such conferences about e-books that goes much further than offering--ironically--an electronic analog of the print book. Never mind some of the new wrinkles being brought to bear in the digital publishing scene, of which digital rights management (DRM) has been thrust to the fore, right along with (and in the case of XrML, in combination with) XML-based content tagging and management systems.

As important as DRM and standards-based content management are to rational, efficient, and cost-effective document and information serving, and yes, even if that document is a book, there remains one challenge that often still comes up short: getting users of digital document systems the exactly right content these users need at the exactly right time these users need it. While it is a great idea to get any content seeker the content he or she seeks, most of the real action of managing digital information is taking place within companies that have a real ROI interest to motivate good content handling, and among these businesses' partners and value chain participants.

Giving Content in Context

For enterprises wishing to benefit from the creation and management of content portals, the challenge is clear. Systems that manage content without managing the context fail.

Searching for content is a frustratingly difficult and easily overwhelming exercise. This is true even as search engines are increasingly bolstered by technologies and processes to help make them more effective--spiders, meta-data, standardized taxonomies, and human editorial intervention. The problem of course is an ever-greater avalanche of data. The projections for simply and effectively finding content are dire, and hardly a case of Chicken Little; for example, where there were fewer than 200,000 web sites in 1995, there are, a half-decade later, 22 million, and these numbers don't include most intranet sites that are closed off to Web indexing efforts by firewalls.

The Web's promise (among others) is to improve communication both within and outside the enterprise. To succeed, however, customizing the content delivered to employees, partners, and customers becomes important.

Getting Personal about Content

Personalization requires enterprises to have the means to capture information--the term "profiles" is typically used--about the information users. These profiles need to be useful in directing specific content to those profiled, which means that an enterprise also needs to know about its own content and, if used, third party content.

There are many elements that can be used to deliver content in context. These include:

• Registering content meta-data for Web and enterprise-wide search engines
• Implementing effective search engines (e.g., relevancy)
• Collecting and managing profiles of site users (e.g., personalization engines)
• Creating and maintaining taxonomies of content (e.g., subject classifications)
• Identifying communities of interest (e.g., portals)
• Enabling pass-along content delivery (e.g., superdistribution using DRM)
• Sending email content offers/links to profiled users

Some companies rely on powerful search engines that possess tools such as relevancy ranking, natural language query, built-in thesauruses, contextual hit results, and other improvements to electronic searching. Other companies simply rely on self-selection by their content users, on the assumption--as in the case of many enterprise and vertical portals--that the focus of the site carries enough implied context. The more effective solutions, of course, are those that use as many contextual content delivery strategies as possible.

The more robust, detailed, and accurate the meta-data, the easier it is to find content in huge content bases, return hits, and serve the content itself. If such search effectiveness is tied to personalization profiles that track a content user's interests and requirements and content delivery mechanisms, content delivered in context becomes powerful indeed.

But for enterprises today, perhaps the biggest benefit is gained by mastering how enterprise content can be served into specific contexts within the business process and partner chains, to deliver more on the promise of automation. Look for such management of content (which could be called "syndication") to play a growing role in tying the information of the enterprise to the many different parts of the enterprise's business actions.

(My thanks to David Guenette, who collaborated with me on an earlier version of this article.)

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 10:21 AM
