XML and Print Publishing

August 20, 2003

One of the traditional arguments for document and content management is that, “everyone’s ‘second business’ is publishing.” That is, regardless of the nature of your business or organization, you are in the information creation and distribution business, so you would be wise to automate it.

If that traditional argument still holds true, then everyone’s second business is still publishing--but now both in print and on the Web. Why? To paraphrase Mark Twain, the death of print has been greatly exaggerated. Indeed, print is not going away, even as the Web becomes an essential channel for organizational communications of all types--marketing, sales, and customer support, to name a few.

The result is a new, compound requirement for organizations--to efficiently manage the flow of content into printed form, while at the same time getting this same content out to the Web. The task is made more challenging as organizations try to do this multichannel publishing economically--and with a mix of platforms for content creation, print production, and Web distribution.

If this problem sounds vaguely familiar, it is. Multichannel publishing is a more common problem because of the Web, but it is not an entirely new problem. Since the 1980s, organizations have been looking to distribute their information in ways besides print. CD-ROM was a popular format at one point, but it was overtaken by the Web--with its ubiquity and low cost of entry for basic communication.

Not only is the problem an old one, but the solution happens to be as well. In the 1980s and 1990s, organizations looked to an ISO standard called SGML--the Standard Generalized Markup Language. The promise of SGML was that you could capture content in a way that was format neutral--and then publish it to as many formats as you needed. "In the early days of SGML it was considered a breakthrough to mark up a document in a way that let it be published on more than one imaging device," noted Jon Parsons, Director of Product Marketing at XyEnterprise, a longtime vendor of content management and electronic publishing technology. "Then came the idea that that same generically marked content could also be published in a browsable version on CD."

Indeed, some organizations implemented SGML-based publishing systems, and a few were able to realize significant productivity gains from this approach. In the end, though, SGML proved to be too expensive and too complex for the average organization. Just as CD-ROM gave way to the Web, SGML would also give way to a generalized markup language that was more suited to the Web.

Enter XML

The eXtensible Markup Language (XML) was conceived by the World Wide Web Consortium as a "lighter weight" SGML that was more suitable for the wide distribution and HTML-oriented browsers of the Web. The thinking--correct then and correct now--was that HTML was too irregular and too format-oriented, and SGML was too complex. XML, then, emerged as a relatively simple way to encode content in a format-neutral manner that would allow multichannel publishing from a single source.

As XyEnterprise's Parsons observed, "What's consistent is the idea that adding intelligence with granular mark-up and lots of metadata creates flexibility, increases efficiency through content reuse, and meets the goal of 'write once, use many.'" In fact, it is this ability to reuse content that is so powerful, and where organizations see the most dramatic return on investment (ROI). According to Parsons, "We've seen astounding ROI from single-source implementations in aerospace and automotive technical documentation, legal publishing, defense-related maintenance information, e-learning companies, and other markets."

Deja vu All Over Again?

If you have been in this business for a while, this is now sounding all too familiar. Is XML simply the latest all-purpose technology to fix the same problem that SGML never really solved? Well, in a word, no. XML may be heavily based on SGML, but it is succeeding where SGML didn't for many important reasons.

--Most significantly, XML is a key piece of all major software development platforms and components. This begins at the database, where major vendors such as Oracle, Microsoft, and IBM have made XML features a key part of their product roadmap. XML then permeates all the key applications and platforms--portal software, enterprise application integration, application servers, and, yes, content management. This is a significant change from SGML, which was only supported by a much smaller number of specialized products.
--As a result, XML is widely understood by programmers, who use XML in their daily work. This is becoming truer as organizations use XML-based approaches such as Web Services to tie existing and new applications together. Doug Tidwell, XML Evangelist at IBM, has pointed out that XML is seen as "the universal data access language" for the Web. Again, this is an enormous change from SGML, which was understood by a small cadre of specialists, and never became a part of the programmer's toolkit the way XML has.

Why is XML so much more useful and widespread than SGML? While there are some advantages to the XML language itself over SGML (mainly, it is lighter weight and easier for programmers to parse and process), the more important factor is that XML is supported by many important related standards and technologies. This begins with the transformation language XSLT (Extensible Stylesheet Language Transformations), which allows programmers to easily map XML to other formats (including HTML, other XML vocabularies, and document formats such as PostScript). But it also includes the XML Path Language (XPath), which is used to access specific objects within an XML document, and the Document Object Model (usually referred to as the DOM), which is an industry-standard programming interface for XML documents. The result is a ready toolkit for programmers to create, access, update, and transform XML data from one form to another.
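As a rough illustration of that toolkit, the sketch below uses Python's standard library to load a small XML fragment, select elements with the limited XPath subset that xml.etree.ElementTree supports, and update an attribute through the DOM interface in xml.dom.minidom. The element and attribute names (catalog, part, number, status) are invented for the example and do not come from any real vocabulary.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical parts-catalog fragment, made up for this illustration.
SAMPLE = """<catalog>
  <part number="A-100" status="active"><description>Bracket</description></part>
  <part number="B-200" status="retired"><description>Gasket</description></part>
</catalog>"""

# XPath-style access: ElementTree supports only a subset of XPath,
# but it is enough to select elements by attribute value.
root = ET.fromstring(SAMPLE)
active = [p.get("number") for p in root.findall("./part[@status='active']")]
descriptions = [d.text for d in root.findall(".//description")]

# DOM access: walk the document tree and update an attribute in place.
dom = minidom.parseString(SAMPLE)
for part in dom.getElementsByTagName("part"):
    if part.getAttribute("number") == "B-200":
        part.setAttribute("status", "active")
updated = dom.documentElement.toxml()

print(active)
print(descriptions)
print(updated)
```

The same selections and updates could be written in any language with XPath and DOM support; Python is used here only because its standard library includes both.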

Indeed, to date, XML has become much more of a general-purpose data representation tool for programming than a markup language for document encoding. But it is still ideally suited for encoding content for single-source publishing, and industry experts say the time is right to begin leveraging XML in the enterprise. "Information technologists have understood the value of managing a single source of information that can be used in multiple ways for some time," said Frank Gilbane, editor of the Gilbane Report (www.gilbane.com). "The problem has been that the benefits were not apparent to business managers, and it was simply too difficult and expensive to accomplish. Today's need to deliver synchronized information to multiple channels (print, web, wireless, etc.) is something all business managers understand. This business need has also driven technology development and adoption to a point where single-source strategies, especially XML-based, should always be considered."

XML for Everything?

Gilbane's careful emphasis--that XML-based single sourcing should be considered--is precisely the right advice. In other words, don't drop everything and convert all your content to XML. As Parsons from XyEnterprise observed, "Successful single source solutions require careful analysis of the content, a clear focus on defined and measurable business objectives, and solid software support at each step in the workflow." So a reasonable first step would be to understand the business objectives tied to single sourcing--what do you hope to gain from single sourcing, and how will you know if you have achieved the objective?

For one engineering firm that I work with, the business objective was to make all their key documents available in print and on the Web--and as soon after updates occurred as possible. They employ a group of 16 technical writers and editors who are responsible for incorporating all updates into a document database of over 70,000 pages. When the documents only had to be available in print, this was a manageable but somewhat slow process. Updates could take several months to appear in a reprinted report. When they began to also produce HTML versions of the documents for distribution over the Web, the delays--and costs for contract help--only increased. They implemented an XML-based system for print and Web publishing with the goal of reducing the time for an update to be distributed--while maintaining current staffing levels. Two years into the project, they have dramatically shortened turnaround time and are producing print and Web versions of their documents with the same staff.

I advise clients to look first at a key business objective for their content, and then to undertake a pilot single-sourcing project that could support that business objective. For example, the business objective could be to make customers more self-sufficient in the customer support process. The content tie-in could be to make key service bulletins, heretofore only available in print, also available for download in a searchable HTML database.

Consider a Pilot Project

The pilot project could be as simple as encoding a small sampling of content in XML, and then designing processes for print and Web rendering. You would begin by analyzing the content for its suitability for single sourcing. In XML parlance, this involves creating a Document Type Definition (DTD) or XML Schema that defines the content elements--how they are used, what content or subordinate elements they consist of, and what attributes they carry. For example, a technical document may include a number of sequenced tasks, whereas a parts catalog may include part numbers and descriptions. Writing a DTD or schema is the formal expression of these elements. It's a marriage of the often well-understood but perhaps not formally codified rules of your content and the formal structure of XML encoding. It's important in a pilot project to keep this analysis relatively simple and high-level; remember this is a proof of concept.
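To make the idea of codifying content rules concrete, here is a minimal sketch for a hypothetical "task" document: a title followed by one or more steps, with a required id attribute. The DTD text is included only for reference, since Python's standard library does not validate against DTDs; the check_task function is a hand-written stand-in that enforces the same rules, and all element names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# What a DTD for this hypothetical content might look like.
# Shown for reference only; it is not parsed or enforced below.
DTD_SKETCH = """
<!ELEMENT task  (title, step+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT step  (#PCDATA)>
<!ATTLIST task  id ID #REQUIRED>
"""

def check_task(xml_text):
    """Return a list of rule violations for one <task> document."""
    task = ET.fromstring(xml_text)
    if task.tag != "task":
        return ["root element must be <task>"]
    problems = []
    if "id" not in task.attrib:
        problems.append("task is missing the required id attribute")
    children = [child.tag for child in task]
    if not children or children[0] != "title":
        problems.append("task must begin with a title")
    steps = children[1:]
    if not steps or any(tag != "step" for tag in steps):
        problems.append("title must be followed by one or more steps")
    return problems

GOOD = ('<task id="t1"><title>Replace filter</title>'
        '<step>Open housing</step><step>Swap filter</step></task>')
BAD = "<task><title>Replace filter</title></task>"

print(check_task(GOOD))
print(check_task(BAD))
```

In a real project a validating parser would do this work directly from the DTD or schema; the point of the sketch is only that the informal rules of the content become explicit and checkable.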

To see what XML encoding is like for a business user, you could have an experienced user test an XML editing tool such as Corel's XMetaL. This could give you a sense of the learning curve some users may face, and could also give you some metrics for future reference. (Keep in mind, though, that a full system may use a variety of tools and processes for the XML encoding, such as forms interfaces, so the actual tagging processes will likely differ.)

Once you have the XML-encoded content, you would need some means to render print and HTML versions of the content for distribution. Assuming you have kept the DTD or schema relatively simple, a programmer can quickly create an XSLT stylesheet for the HTML output. XSLT, together with its companion language XSL-FO (XSL Formatting Objects), can be used to create the print output.
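The single-source principle behind this step can be sketched without an XSLT processor. The example below is not XSLT; it is a plain Python stand-in that renders one hypothetical XML task two ways, as an HTML fragment for the Web and as numbered plain text standing in for print. The element names and both rendering functions are assumptions made for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical source content, encoded once.
SOURCE = """<task id="t1">
  <title>Replace the filter</title>
  <step>Open the housing.</step>
  <step>Swap the filter &amp; close the housing.</step>
</task>"""

def to_html(task):
    """Render the task as an HTML fragment for the Web channel."""
    title = html.escape(task.findtext("title", default=""))
    items = "".join(
        f"<li>{html.escape(step.text or '')}</li>"
        for step in task.findall("step")
    )
    return f"<h1>{title}</h1><ol>{items}</ol>"

def to_print_text(task):
    """Render the same task as numbered plain text, standing in for print."""
    lines = [task.findtext("title", default="").upper()]
    lines += [
        f"{number}. {step.text or ''}"
        for number, step in enumerate(task.findall("step"), start=1)
    ]
    return "\n".join(lines)

task = ET.fromstring(SOURCE)
web_page = to_html(task)
print_copy = to_print_text(task)

print(web_page)
print(print_copy)
```

An update to the source would flow to both outputs on the next run, which is the property an XSLT and XSL-FO pipeline provides in a production system.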

You would then have sample content, sample print and HTML output, and some metrics--the time it took to create the content, the informal DTD, and the associated stylesheets. Armed with this, you would be well positioned to plan a larger implementation--either with available in-house resources or by working with a vendor or system integrator.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at August 20, 2003 9:08 PM
