August 28, 2003

Planning for Content Management

For this year's Gilbane Report Conference on Content Management at Seybold, I am co-chairing the Projects Track with Tony Byrne of CMSWatch. I am especially looking forward to the session on Planning and Choosing a CMS, which will feature Rita Warren of the consultancy ZiaContent and Dana Hallman, who has been managing a major CMS implementation for the US General Services Administration.

Even though there are plenty of content management systems that have been installed, I still fear that there are not enough successful, mature implementations out there. Too often projects have bogged down, stopping short of completion. Either not all the features have been implemented, not all the content has been digitized and placed under management, or not all the users have been equipped with tools to begin managing their content. The result? A lack of critical mass, and a system that falls well short of its business goals.

So it's worth considering the big questions. What is the purpose of the system? What is the scope of the initial project, and what is the long-range vision for the system? What will constitute success, and how will you measure it?

Rita Warren has an excellent perspective on the business implications of implementing content management technology; her presentation at Seybold last year was one of the most useful of the conference. Rita has provided the following abstract for this year's presentation, which she has titled, "The Artful Balancing Act of Choosing a CMS."

Abstract
(Courtesy of Rita Warren, ZiaContent)

It's human nature to want to jump right in on a project and see tangible results. In the case of a content management project, one of the first ways to see progress is to say "look, we've bought this software." It's not surprising, then, that many organizations make software product selection the first milestone of their content management initiative. What they may not realize, however, is that the real progress on the project occurs when you get a solid grip on the business issues that are driving the need for content management, a vision for what the future state of your managed content will look like, and a clear understanding of the budgetary and technical constraints that will necessarily limit your CMS choices.

At the opposite end of the spectrum from the CMS "impulse buy" is the tendency to fall into analysis paralysis, spending so much time figuring out what the problems are that you never actually get to a solution. The truth is, there are some very basic business problems that content management can help solve. By looking at your content management business goals relative to core CMS functionality, it's fairly easy to nail down a set of criteria that will streamline the software selection process. The key to successfully planning and choosing a CMS is to balance the time spent on business analysis with the software due diligence effort, while minimizing risk.

This session will be held on Tuesday, September 9 at the Moscone Conference Center in San Francisco.


Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 8:40 PM

August 20, 2003

XML and Print Publishing

One of the traditional arguments for document and content management is that, "everyone's 'second business' is publishing." That is, regardless of the nature of your business or organization, you are in the information creation and distribution business, so you would be wise to automate it.

If that traditional argument still holds true, then everyone's second business is still publishing--but now both in print and on the Web. Why? To paraphrase Mark Twain, the death of print has been greatly exaggerated. Indeed, print is not going away, even as the Web becomes an essential channel for organizational communications of all types--marketing, sales, and customer support to name a few.

The result is a new, compound requirement for organizations--to efficiently manage the flow of content into printed form, while at the same time getting this same content out to the Web. The task is made more challenging as organizations try to do this multichannel publishing economically--and with a mix of platforms for content creation, print production, and Web distribution.

If this problem sounds vaguely familiar, it is. Multichannel publishing is a more common problem because of the Web, but it is not an entirely new problem. Since the 1980s, organizations have been looking to distribute their information in ways besides print. CD-ROM was a popular format at one point, but it was overtaken by the Web--with its ubiquity and low cost of entry for basic communication.

Not only is the problem an old one, but the solution happens to be as well. In the 1980s and 1990s, organizations looked to an ISO standard called SGML--the Standard Generalized Markup Language. The promise of SGML was that you could capture content in a way that was format neutral--and then publish it to as many formats as you needed. "In the early days of SGML it was considered a breakthrough to mark up a document in a way that let it be published on more than one imaging device," noted Jon Parsons, Director of Product Marketing at XyEnterprise, a longtime vendor of content management and electronic publishing technology. "Then came the idea that that same generically marked content could also be published in a browsable version on CD."

Indeed, some organizations implemented SGML-based publishing systems, and a few were able to realize significant productivity gains from this approach. In the end, though, SGML proved to be too expensive and too complex for the average organization. Just as CD-ROM gave way to the Web, SGML would also give way to a generalized markup language that was more suited to the Web.

Enter XML

The eXtensible Markup Language (XML) was conceived by the World Wide Web Consortium as a "lighter weight" SGML that was more suitable for the wide distribution and HTML-oriented browsers of the Web. The thinking--correct then and correct now--was that HTML was too irregular and too format-oriented, and SGML was too complex. XML, then, emerged as a relatively simple way to encode content in a format-neutral manner that would allow multichannel publishing from a single source.

As XyEnterprise's Parsons observed, "What's consistent is the idea that adding intelligence with granular mark-up and lots of metadata creates flexibility, increases efficiency through content reuse, and meets the goal of 'write once, use many.'" In fact, it is this ability to reuse content that is so powerful, and where organizations see the most dramatic return on investment (ROI). According to Parsons, "We've seen astounding ROI from single-source implementations in aerospace and automotive technical documentation, legal publishing, defense-related maintenance information, e-learning companies, and other markets."

Deja vu All Over Again?

If you have been in this business for a while, this is now sounding all too familiar. Is XML simply the latest all-purpose technology to fix the same problem that SGML never really solved? Well, in a word, no. XML may be heavily based on SGML, but it is succeeding where SGML didn't for many important reasons.

--Most significantly, XML is a key piece of all major software development platforms and components. This begins at the database, where major vendors such as Oracle, Microsoft, and IBM have made XML features a key part of their product roadmap. XML then permeates all the key applications and platforms--portal software, enterprise application integration, application servers, and, yes, content management. This is a significant change from SGML, which was only supported by a much smaller number of specialized products.
--As a result, XML is widely understood by programmers, who use XML in their daily work. This is becoming truer as organizations use XML-based approaches such as Web Services to tie existing and new applications together. Doug Tidwell, XML Evangelist at IBM, has pointed out that XML is seen as "the universal data access language" for the Web. Again, this is an enormous change from SGML, which was understood by a small cadre of specialists, and never became a part of the programmer's toolkit the way XML has.

Why is XML so much more useful and widespread than SGML? While there are some advantages to the XML language itself over SGML (mainly, it is lighter weight and easier for programmers to parse and process), the more important factor is that XML is supported by many important related standards and technologies. This begins with the transformation language XSLT (Extensible Stylesheet Language Transformations), which allows programmers to easily map XML to other formats (including HTML, other XML vocabularies, and document formats such as PostScript). But it also includes the XML Path Language (XPath), which is used to access specific objects within an XML document, and the Document Object Model (usually referred to as the DOM), which is an industry-standard programming interface for XML documents. The result is a ready toolkit for programmers to create, access, update, and transform XML data from one form to another.
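
To make that toolkit a little more concrete, here is a minimal sketch in Python, using only the standard library. The sample bulletin and its element names are hypothetical, invented purely for illustration. The first function pulls out specific elements with an XPath-style query (ElementTree supports only a subset of XPath), and the second updates the document through the DOM interface.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document; element and attribute names are illustrative only.
SAMPLE = """<bulletin id="SB-001">
  <title>Pump Maintenance</title>
  <task seq="1">Shut off power.</task>
  <task seq="2">Drain the housing.</task>
</bulletin>"""


def task_texts(xml_text):
    """Return the text of every <task> child, selected with an XPath-style path."""
    root = ET.fromstring(xml_text)
    return [task.text for task in root.findall("./task")]


def retitle(xml_text, new_title):
    """Change the <title> text through the DOM interface and return the updated XML."""
    dom = minidom.parseString(xml_text)
    title = dom.getElementsByTagName("title")[0]
    title.firstChild.data = new_title  # firstChild is the text node inside <title>
    return dom.documentElement.toxml()


if __name__ == "__main__":
    print(task_texts(SAMPLE))
    print(retitle(SAMPLE, "Pump Service"))
```

The point is less the particular library than the fact that querying and updating a document takes a handful of lines in a mainstream language.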

Indeed, to date, XML has become much more of a general-purpose data representation tool for programming than a markup language for document encoding. But it is still ideally suited for encoding content for single-source publishing, and industry experts say the time is right to begin leveraging XML in the enterprise. "Information technologists have understood the value of managing a single source of information that can be used in multiple ways for some time," said Frank Gilbane, editor of the Gilbane Report (www.gilbane.com). "The problem has been that the benefits were not apparent to business managers, and it was simply too difficult and expensive to accomplish. Today's need to deliver synchronized information to multiple channels (print, web, wireless, etc.) is something all business managers understand. This business need has also driven technology development and adoption to a point where single-source strategies, especially XML-based, should always be considered."

XML for Everything?

Gilbane's careful emphasis--that XML-based single sourcing should be considered--is precisely the right advice. In other words, don't drop everything and convert all your content to XML. As Parsons from XyEnterprise observed, "Successful single source solutions require careful analysis of the content, a clear focus on defined and measurable business objectives, and solid software support at each step in the workflow." So a reasonable first step would be to understand the business objectives tied to single sourcing--what do you hope to gain from single sourcing, and how will you know if you have achieved the objective?

For one engineering firm that I work with, the business objective was to make all their key documents available in print and on the Web--and as soon after updates occurred as possible. They employ a group of 16 technical writers and editors who are responsible for incorporating all updates into a document database of over 70,000 pages. When the documents only had to be available in print, this was a manageable but somewhat slow process. Updates could take several months to appear in a reprinted report. When they began to also produce HTML versions of the documents for distribution over the Web, the delays--and costs for contract help--only increased. They implemented an XML-based system for print and Web publishing with the goal of reducing the time for an update to be distributed--while maintaining current staffing levels. Two years into the project, they have dramatically shortened turnaround time and are producing print and Web versions of their documents with the same staff.

I advise clients to look first at a key business objective for their content, and then to undertake a pilot single-sourcing project that could support that business objective. For example, the business objective could be to make customers more self-sufficient in the customer support process. The content tie-in could be to make key service bulletins, heretofore only available in print, also available for download in a searchable HTML database.

Consider a Pilot Project

The pilot project could be as simple as encoding a small sampling of content in XML, and then designing processes for print and Web rendering. You would begin by analyzing the content for its suitability for single sourcing. In XML parlance, this involves creating a Document Type Definition (DTD) or XML Schema that defines the content elements--how they are used, what content or subordinate elements they consist of, and what attributes they share. For example, a technical document may include a number of sequenced tasks, whereas a parts catalog may include part numbers and descriptions. Writing a DTD or schema is the formal expression of these elements. It's a marriage of the often well-understood but perhaps not formally codified rules of your content and the formal structure of XML encoding. It's important in a pilot project to keep this analysis relatively simple and high-level; remember this is a proof of concept.
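
As a rough illustration of what such an analysis captures, the sketch below writes a tiny content model as a Python dictionary and checks sample documents against it. This is a stand-in for a real DTD or XML Schema, not a replacement for one: the element names (procedure, title, task) and the rules are hypothetical, and the check covers only allowed children and required attributes, not element order or how many times an element may appear.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: for each element, the child elements it may contain
# and the attributes it must carry. A real DTD or XML Schema would state these rules formally.
CONTENT_MODEL = {
    "procedure": {"children": {"title", "task"}, "required_attrs": {"id"}},
    "title": {"children": set(), "required_attrs": set()},
    "task": {"children": set(), "required_attrs": {"seq"}},
}


def check(element, model=CONTENT_MODEL):
    """Return a list of problems found in the element and its descendants."""
    rules = model.get(element.tag)
    if rules is None:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for attr in sorted(rules["required_attrs"] - set(element.attrib)):
        problems.append(f"<{element.tag}> is missing attribute '{attr}'")
    for child in element:
        if child.tag not in rules["children"]:
            problems.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            problems.extend(check(child, model))
    return problems


GOOD = '<procedure id="p1"><title>Replace filter</title><task seq="1">Open panel.</task></procedure>'
BAD = '<procedure><title>Replace filter</title><note>Careful</note><task>Open panel.</task></procedure>'

if __name__ == "__main__":
    print(check(ET.fromstring(GOOD)))
    for problem in check(ET.fromstring(BAD)):
        print(problem)
```

Even a toy model like this forces the useful questions: which elements exist, what they may contain, and which attributes are mandatory.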

To see what XML encoding is like for a business user, you could have an experienced user test an XML editing tool such as Corel's XMetaL. This could give you a sense of the learning curve some users may face, and could also give you some metrics for future reference. (Keep in mind, though, that a full system may use a variety of tools and processes for the XML encoding, such as forms interfaces, so the actual tagging processes will likely differ.)

Once you have the XML-encoded content, you would need some means to render print and HTML versions of the content for distribution. Assuming you have kept the DTD or schema relatively simple, a programmer can quickly create an XSLT stylesheet for the HTML output. XSLT, together with its companion language XSL-FO (XSL Formatting Objects), can be used to create the print output.
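
The single-source idea can be sketched in a few lines. The example below is deliberately not XSLT or XSL-FO: Python's standard library does not include an XSLT processor, so plain Python functions stand in for the two stylesheets. The source document and its element names are hypothetical, and the "print" rendering is just formatted plain text rather than real page composition.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source document; element names are illustrative only.
SOURCE = """<procedure id="p1">
  <title>Replace filter</title>
  <task seq="1">Open the panel.</task>
  <task seq="2">Swap the filter &amp; close the panel.</task>
</procedure>"""


def to_html(xml_text):
    """Render the procedure as an HTML heading followed by an ordered list."""
    root = ET.fromstring(xml_text)
    items = "".join(f"<li>{escape(task.text)}</li>" for task in root.findall("task"))
    return f"<h1>{escape(root.findtext('title'))}</h1><ol>{items}</ol>"


def to_print_text(xml_text):
    """Render the same procedure as plain text, a simple stand-in for print output."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    for task in root.findall("task"):
        lines.append(f"{task.get('seq')}. {task.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html(SOURCE))
    print(to_print_text(SOURCE))
```

Both renderings read the same source, so a correction made once in the XML shows up in every output the next time it is generated.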

You would then have sample content, sample print and HTML output, and some metrics--the time it took to create the content, the informal DTD, and the associated stylesheets. Armed with this, you would be well positioned to plan a larger implementation--either with available in-house resources or by working with a vendor or system integrator.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 9:08 PM

August 19, 2003

Content Management and the Enterprise

The growth of content management technology has far outpaced the technologies that were forebears to content management, such as document management and knowledge management.

Perhaps more significantly, content management has a broader and more important role in organizations than these other technologies. In the case of document management, the technology was often relegated to departmental roles; in the case of knowledge management, it often failed to move beyond pilot installations. Content management is taking on a central role within the enterprise.

Indeed, the very term “content management” has been edged out by “enterprise content management” as analysts, journalists, and IT professionals draw more and more direct connections between the content of the organization and the many, varied applications and interfaces that enterprises are deploying around an Internet-based infrastructure.

The bottom line is that content management solutions must deal with a broad spectrum of challenges, beginning with the effective management of many types of content and ending with the integration of this content into a wide and growing number of applications.

Content is becoming part of the complex synthesis of once-independent business processes, even while these business processes shift outside the enterprise’s four walls, and move towards integration with those of its customers, partners, and suppliers. Furthermore, enterprises are making not just content but also operational data available to consumers, customers, and business partners via the Web, in what many call the “extended enterprise.”

The Internet has made the Web into a huge business community, and with the development of the Java programming language and the standardization on XML for data interchange, the movement of content and data from back office applications onto the Web and from the Web into back office applications has increased tremendously. When it comes to the integration of content management in the new extended enterprise, two fundamental questions arise:

· How can the enterprise leverage its existing infrastructure, applications, and data?
· How can the enterprise pursue content management integration with least effort and most success?

While much is being claimed for the “extended enterprise,” the fact is that for very many businesses today, content management needs remain simple. These companies’ needs can be met by literally hundreds of products available today, ranging from application service providers that, in effect, create, manage, and serve a company’s content for them, to basic HTML tools such as Microsoft FrontPage and Macromedia Dreamweaver that help Web designers and Webmasters create Web pages for serving through the company’s own Web server, to standalone Web content management platforms that provide much higher functionality, a wider range of contributor interfaces, and workflow and versioning administration.

But because of the growing role of the Internet in commerce, enterprises cannot stay with simple content management without peril. It behooves the strategically minded enterprise to look at content management tools that not only meet the needs of relatively simple content environments today, but which can also anticipate and address the more complex challenges of fusing content and other business processes.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 5:50 PM

August 18, 2003

XML and the Writing Process

I have always liked a quote attributed to Tim Berners-Lee, early in the life of the Web. Commenting on HTML, he supposedly said, "Who would want to type this stuff?" Without knowing the context of his remarks, I have to guess he said it amidst a discussion of tools—and the need for tools to make the author's life easier in order for the Web to flourish as a medium.

Well, here we are many years later; have things really changed all that much?

There are all kinds of HTML editors, of course. You can save HTML out of your Word documents and such. Moreover, XML authoring of content is increasing, and with it has come an increase in the number of XML editing tools.

Despite this growth in tools, authoring for the Web remains, by and large, a fairly difficult proposition. The tool I am currently using (Movable Type) allows me to do rudimentary authoring (paragraphs and some formatting essentially) without resorting to hand coding. But the easiest way for me to accomplish slightly more complex formatting (lists, for example) is to dive back into HTML coding. Moreover, the editing window I use within the Movable Type application does not have several basic editorial tools I use pretty heavily--spell checking, a dictionary, and a thesaurus to start with. So I find myself writing in a word processor, then copying text over to the Movable Type window, reformatting it as necessary... really, there has to be an easier way than this, and I am not even talking about writing more lengthy documents or more complex text (with tables, math, and figures).

There are some emerging tools that are beginning to cover this gap. Ektron, for example, has its EWebEditPro and EWebEditPro+XML. These both provide an ActiveX control that offers a WYSIWYG editing interface where otherwise the user would face a plain text interface to XML and/or HTML. And Office 11 is adding further support for XML.

The ultimate tool would combine a familiar word processing interface, including tools such as spell-checking and a dictionary, with an ability to automatically embed the appropriate HTML (and ideally XML) markup. Along with the content creation itself, users would be able to easily add, review, and update metadata related to the content. This would be a rudimentary set of functions for content creation, and should be the starting point for a solid tool.

Posted by Bill Trippe at 3:41 PM

August 15, 2003

XML and Content Categorization

I have begun researching an upcoming article for EContent Magazine on the role that XML is playing in content categorization tools and approaches. This seems to be a quickly widening and changing world. I am sure I will touch on certain core approaches such as RDF; I am not so sure yet if I will be talking about Topic Maps.

The larger vendors in this space seem to include Autonomy, NStein, and Verity on the tools side, and content management vendors such as Documentum and Stellent. It will be interesting to see what kinds of customer case studies they will be able to provide for the research and validation. It will also be interesting to see if XML is core to their categorization approach, is one of several approaches, or is an output or byproduct of their approach.

Some of my early research, and a reference from colleague Bob Boeri, point me to the Medical Subject Headings (MeSH) system, and what looks to be a pretty comprehensive effort at the National Library of Medicine to create XML-encoded public databases that use MeSH encoding. Such databases are beginning to answer the "chicken and egg" problem XML initiatives have encountered. Many initiatives represent great ideas and well thought-out approaches to implementing them, only to wither on the vine for lack of real data. Such critical masses of data seem to be emerging in areas such as scientific research and financial services.
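
To show why XML-encoded subject headings are so useful for categorization, here is a minimal sketch that builds a heading-to-article index from a couple of records. The record structure and element names are hypothetical, invented for this illustration; they are not the National Library of Medicine's actual format, and the headings are used only as sample values.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical records; these element names are NOT the National Library of Medicine's schema.
RECORDS = """<records>
  <article id="a1"><title>Asthma in children</title>
    <heading>Asthma</heading><heading>Child</heading></article>
  <article id="a2"><title>Adult asthma therapy</title>
    <heading>Asthma</heading><heading>Adult</heading></article>
</records>"""


def index_by_heading(xml_text):
    """Map each subject heading to the ids of the articles tagged with it."""
    index = defaultdict(list)
    for article in ET.fromstring(xml_text).findall("article"):
        for heading in article.findall("heading"):
            index[heading.text].append(article.get("id"))
    return dict(index)


if __name__ == "__main__":
    for heading, article_ids in index_by_heading(RECORDS).items():
        print(heading, article_ids)
```

Because the headings are explicit markup rather than something inferred from the text, building a browsable category index takes no linguistic analysis at all.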

Some other resources I will be exploring include the following:

(These last two were both written by Eric van der Vlist, cited on the www.xml.com Web site as an ODP editor and publisher and editor of XMLfr.)

Posted by Bill Trippe at 6:00 PM

August 14, 2003

Welcome

Web sites suffer from a number of maladies, but the most common one, by far, is atrophy. I have battled this in my own practice. I am so busy advising others on how to do things with their content that I never get around to maintaining my primary Web site nearly as much as I should. Articles that I wrote last month won't appear as a link until months from now. Meanwhile, two-year-old articles, some of them hopelessly outdated, continue to be prominently featured.

So I have decided to try using a blogging tool as a means of keeping my primary site up to date. In addition, I will be trying some new features, many of which are only in the idea phase.

As a consultant and writer, I am fascinated with how much content the average person creates and consumes in the normal course of doing business. For example, I purchased a new notebook computer and began using it four months ago. As of today, the "My Documents" folder contains 2439 files totalling 349 MB of data. My quick analysis tells me that about 150 MB of that content was transferred over in bulk from another computer. The remaining 200 MB has been created or accumulated by me in the course of my work.

That's a lot of content.

One project is pretty typical, I think, of the constellation of content that one creates, consumes, or otherwise accumulates in the course of doing work. For a report that I am writing, I have accumulated 88 files, spread over three folders, amounting to 14 MB of information. My written output—including correspondence, outlines, summaries of interviews and research, and various drafts of the document and its sections—totals 2 MB. The remaining content is source material that I am researching and citing. Most of these are in PDF, HTML, or PowerPoint format. Only one (at 1.4 MB) is a space hog; the rest range from 75 to 300 KB. The final report, without graphics, will probably be around 200 KB, perhaps around 60 pages.

To date, not a single electron regarding this project has appeared on my site or anywhere else on the Web.

So perhaps a blogging tool can begin to solve this problem. Perhaps it can give me an easier, less painful way of bringing content from the hidden world of my C: drive to readers who are potentially interested. We shall see. I welcome your feedback, comments, and response.

Bill Trippe

Posted by Bill Trippe at 8:35 AM
