November 25, 2003

Nice Example of SVG, Curious Results?

One of my clients, Houghton Mifflin, has begun using SVG to create animated maps of Civil War battlefields. An example of the work is available online. I love this example, but I am puzzled by the varying results from different combinations of browsers and operating systems. Please take a look, and enjoy the map. If you run into some strange behavior, feel free to post a comment or email me at btrippe@nmpub.com.

Posted by Bill Trippe at 9:04 PM

Build vs. Buy in Content Management Systems

The cms-list listserv has had an interesting discussion lately on build vs. buy when it comes to content management systems. As one poster correctly noted today, build vs. buy is a simplistic way of stating the question. Most CMSs require extensive customization, and the work needed to make a CMS fit your needs has to be thought of, at the very least, as a substantial extension to a given CMS technology.

I wrote a white paper about the build vs. buy question a couple of years ago, but I think it still holds up fairly well. It was written for a particular vendor (Enigma) to distribute, but most of it is neutral.

Posted by Bill Trippe at 8:57 PM | Comments (2)

November 20, 2003

Another Archiving Tale

In the course of researching the use of metadata standards for the long-term archiving of information, I spent time researching GILS--the Government Information Locator Service. The following is a brief case study on how the state of Texas is using GILS for government records.

The state of Texas built a GILS-based system to provide easy access to information from over 180 state agencies. The agencies operated as isolated information silos and could not communicate with one another. It is important to note that the Texas archivists were only interested in making sure that metadata would be created for the legacy data, and that the data would be available from a centrally searchable location. The project did not deal with the conversion of legacy data from one format into another, but it is considered here because it was concerned with the conversion of Web pages into resources that could be archived and repeatedly accessed over time, even if certain Web pages were no longer being maintained online. This gives a user who needs it, for legal or cultural reasons, complete access to the collection of pages.

When the state of Texas first instituted the project, it relied on the participating agencies to alert archivists to any changes on their Web sites. In this manner, 8,000 records, each representing either a single Web page or a collection of Web pages (whatever the agency considered to be a Web "publication"), were harvested for archiving.

However, the Texas archivists have found that depending on the state agencies is not reliable; many records are missed. They are currently upgrading their software to include a "harvesting" application that will automatically "crawl" the state's Web sites and identify any changed, deleted, or added publications that need to be archived and tracked. Kevin Marsh of the Texas State Library and Archives Commission (TSLAC), who is overseeing the project, said that a test of the new system harvested 32,000 records, a four-fold improvement over the labor-intensive older system.
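
To make the idea of identifying changed, deleted, or added publications concrete, here is a minimal sketch in Python. It is not how the Texas harvester works (I do not know its internals); it simply assumes that each crawl produces a snapshot mapping a URL to a hash of the page content, and that two snapshots can be compared. The function names `fingerprint` and `diff_snapshots` are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable hash of a page's content (an assumed change signal)."""
    return hashlib.sha256(content).hexdigest()


def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two crawl snapshots of the form {url: fingerprint}.

    Returns the URLs that were added, changed, or deleted between crawls.
    """
    added = sorted(url for url in current if url not in previous)
    deleted = sorted(url for url in previous if url not in current)
    changed = sorted(
        url for url in current
        if url in previous and current[url] != previous[url]
    )
    return {"added": added, "changed": changed, "deleted": deleted}
```

A harvester built along these lines would archive everything in the "added" and "changed" lists and flag the "deleted" entries as non-current records.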

Harvested publications are preserved on a TSLAC server and published on request through a server at the University of North Texas (UNT). TRAIL records for currently online publications or Web sites are linked directly to the Web sites in question. Non-current records are moved into TSLAC's Electronic Depository Program and matching publications are moved to the UNT server. Users can search by subject, agency, keyword, and descriptor fields, as well as by date range and full text. Additionally, MARC records are automatically generated and provided to UNT for their catalog.

TRAIL is based on Blue Angel Technology's MetaStar Enterprise Suite. This software provides the following functionality:
• Data entry.
• Database management.
• Database search and retrieval utilizing Z39.50 (an ISO standard defining a protocol for computer-to-computer information retrieval).
• User gateway design and management.

MetaStar Enterprise utilizes Oracle 8i as the underlying database. The data is formatted and manipulated with XML tagging. MetaStar Enterprise also utilizes PCDocs Fulcrum technology for harvesting data directly from web servers.

This software was selected for the following capabilities:
• Z39.50 compliance.
• Ability to work effectively with multiple metadata formats including Dublin Core.
• Capability to blend targeted record searching and full-text harvest and searching.

TRAIL runs on two Sun Enterprise 450 servers under Solaris 7.

The State of Texas estimated that the Blue Angel Technologies solution cost less than one-quarter of what it would have cost to develop the system internally, and took far less time. Just as importantly, the cost to maintain the system is extremely low. Currently, less than one full-time person within the Texas State Library and Archives Commission manages TRAIL.

Lessons learned include:
• GILS metadata is difficult to capture.
• Limited updating and maintenance of GILS records is necessary.
• No clear agreement could be reached on the adequacy of GILS record data elements (perhaps the richer structure provided by EAD could allay this problem).
• Different types of resources are represented in GILS records, and the user community is sometimes confused by:
o An inordinately high degree of user sophistication required to exploit GILS.
o Users were interested in or expecting to gain access to full text.
o GILS records were hard to read, contained unnecessary information, and were not linked to the actual source identified.
o Variances existed in the extent of information contained in GILS records.
o The service seemed qualitatively and quantitatively unpredictable and uneven.

Posted by Bill Trippe at 4:08 PM

November 18, 2003

Important Emerging Trends in XML and Content Management?

For an upcoming Gilbane Report article, I am going to be writing on important trends in XML and what impact they will have on content management technology. Part of this has been spurred by the blizzard of announcements coming out of the W3C this month, including updates and last calls related to XQuery, XSLT, and XPath.

It is also driven by all the product announcements, and recent improvements and changes to various core open source tools.

I would love your thoughts on this. What is important among all of the recent announcements and changes? What will have an impact on content management, and what will not?

Posted by Bill Trippe at 3:42 PM | Comments (3)

November 17, 2003

Some Fine-Tuning

This represents the 41st entry in this blog, and it has now been in existence for almost three months. I began the blog with the idea that my current primary Web site does not manage to reflect my recent professional activity. Three months into it, I do think the blog is fulfilling that goal--it does reflect the bulk of my recent activity. There is one irony, though, which I am sure is not unique to me--the busier I am, the less time I have to work on the blog.

A few observations thus far. Please post a comment or email me directly with your thoughts.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 2:12 PM

November 14, 2003

Gilbane Report Tutorial on the State of the Art in XML Content Management

As part of the upcoming XML 2003 conference, I will be presenting a day-long tutorial, Gilbane Report Tutorial on the State of the Art in XML Content Management. As described in the brochure and on the Web site:

As conference attendees know, XML is one of the most important technologies for any kind of information management today. It is important for data applications and critical for content applications. Application integration requires content and information integration, and XML facilitates the sharing and management of both. XML is also key to building applications that assemble and deliver content to multiple media channels such as the web, mobile devices, as well as paper media and CD-ROMs. But how, specifically, should businesses use XML for applications that manage content? What methodologies make sense? What strategies have early adopters used and which have been successful? This day-long session will provide attendees with an overview of current approaches to XML-based content management, and incorporate the lessons learned from case studies presented by guest speakers who have implemented solutions.

Prerequisites: Background in XML and content management, including a basic understanding of XML terminology, concepts, syntax, and related standards.

I will have some additional speakers over the course of the day, who will address different aspects of XML and content management. To date, I have confirmed:

Some of the topics to be covered include XML repositories; XML transformation and publishing; metadata, taxonomies, and topic maps; and SVG.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 10:23 AM

November 11, 2003

Houghton Mifflin eReference

One of my clients, Houghton Mifflin, has launched a new e-commerce Web site for their eReference product line. eReference is a downloadable version of the Fourth Edition of the American Heritage Dictionary and a companion thesaurus, Roget's II: The New Thesaurus. eReference has the full databases of each of these two fine books, with a number of interactive features including search, spell correction, and spoken pronunciations.

The databases have been created and maintained in XML, and the electronic version stores XML-encoded entries that are converted to HTML on the fly for display and printing. The database, supporting software, and multimedia elements make this, in my opinion, the best tool of its kind on the market. The eReference tool is downloaded to your hard drive, and will eventually accommodate other reference works that Houghton is producing. My congratulations to Houghton Mifflin on the successful launch of this new product.
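
For readers curious what converting XML-encoded entries to HTML on the fly can look like, here is a minimal sketch in Python. The element names (`entry`, `headword`, `pos`, `definition`) and the function `entry_to_html` are my own invention for illustration; they are not Houghton Mifflin's actual schema or code.

```python
import xml.etree.ElementTree as ET
from html import escape


def entry_to_html(entry_xml: str) -> str:
    """Render one dictionary entry (hypothetical schema) as an HTML fragment."""
    entry = ET.fromstring(entry_xml)
    headword = escape(entry.findtext("headword", default=""))
    pos = escape(entry.findtext("pos", default=""))
    # Each <definition> element becomes one numbered list item.
    items = "".join(
        f"<li>{escape(d.text or '')}</li>" for d in entry.findall("definition")
    )
    return (
        f'<div class="entry"><h2>{headword}</h2>'
        f"<p><i>{pos}</i></p><ol>{items}</ol></div>"
    )


sample = (
    "<entry><headword>archive</headword><pos>n.</pos>"
    "<definition>A place for storing records.</definition>"
    "<definition>A collection of such records.</definition></entry>"
)
html_fragment = entry_to_html(sample)
```

The appeal of this arrangement is that the stored entry stays neutral, and the display markup can change without touching the database.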

Posted by Bill Trippe at 3:01 PM

Webinar on Electronic Delivery of Documentation

I delivered the Webinar on the electronic delivery of documentation today. This was hosted by the TechDoc Community of Practice. Brian Travis of Architag kicked things off and kept the ball rolling throughout. A PDF of the slides can be downloaded here. The slides can also be downloaded from the Architag Web site, which will also shortly have an MP3 of the audio for the presentation.

Among other things, it was nice to have a real-world discussion of one of my favorite topics, SVG.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 2:50 PM

November 10, 2003

New Book on XForms

In the course of my research on XForms, InfoPath, and Adobe, I had occasion to interview Micah Dubinko from Cardiff. Cardiff is one of the established vendors in the eForms space, and Micah has been a key contributor to the W3C XForms working group. Micah has published a new O'Reilly book, XForms Essentials. This is the first authoritative book on an important new topic, and it has been reviewed extensively by his peers. I started reading the book last week, and it is excellent.

For other books that I recommend, please see my primary Web site.

Bill Trippe
btrippe@nmpub.com

Posted by Bill Trippe at 1:19 PM

November 7, 2003

Upcoming Webinar on Delivering Documentation Electronically

I have mentioned the new TechDoc Community of Practice hosted by IdeaAlliance. At their invitation, I will be giving a Webinar next Tuesday, the 11th, on Electronic Delivery. To register, please click here. The slides will be available shortly, but an outline of the presentation follows:

Assumptions
--A growing need to produce multichannel output
--A desire to do this economically
--A mix of platforms for print production, web production

Some Views from 50,000 Feet
--Print still counts and PDF is often the first electronic choice
--Platform support still drives choices of approach (Windows, HTML, Java Help)
--A given group faces its own mix of electronic delivery requirements
--E.g., software vendor who provides print, PDF, Help, including HTML and Java help

Many Delivery Options
--PDF for screen viewing and remote printing
--Help formats, including Windows Help (more legacy now), Java Help, HTML-based Help
--Flat HTML files, Templated HTML tied to some kind of delivery engine, XHTML
--XML/RSS for Syndication
--Wireless delivery through WAP, SVG variants (SVG Tiny and SVG Basic)

Other Delivery Requirements
--Delivery to and integration with customer support, CRM
--Integration with engineering systems (CAD/CAM), source code control, logistics support, ERP
--Specialized electronic delivery, such as IETM in the DOD

IETMs Specifically
--Interactive Electronic Technical Manual
--DOD Standard
--Well established concept, growing in actual use and complexity
--Classified from Class 0 to 5
--Ranging from Class 0 (imaged pages and little or no navigation)
--Class 2 is indexed, scrollable, hyperlinked
--Class 5 is an integrated database capable of dynamic content presentation and integration with other systems

IETMs and the Rest of Us
--It's a useful taxonomy
--They made good technology choices (first SGML now XML)
--In some ways, any company with complex products to document is trending toward IETM-like functionality
--DOD has a long commitment to XML and a realized and growing ROI (SGML before that)
--Navy Preventive Maintenance System

How Have Groups Automated
--Smaller groups tend to be authoring tool centric
--Word, Frame, and immediate add-ons
--Organic growth over time
--Larger groups tend to go with more centralized automation, including some with XML

Current Challenges
--Both small groups and large can end up with the silo problem
--Dedicated repositories of material
--Unique processes for creating different formats
--Dedicated workflow for each format
--This is a workable solution
--Each delivery channel can be accommodated
--But not very efficient or scalable

Where Automation Begins to Pay
--Repurposing content whole cloth into other formats
--Reuse of modular content for dynamic publishing
--Management of content modules for more controlled revisions, translation and localization

How to Grow Beyond Tools
--The answer is modular management of content in a standard, generic data structure (yes, XML)
--Adoption of a Minimum Reusable Unit (MRU) that supports all required outputs
--In maintenance manuals, this could be the task, for example
--In software manuals, this could be at a functional or command level
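
A minimal sketch of the modular idea, in Python: the same task module is stored once and assembled into more than one output. The module IDs, element names, and the `assemble` function are hypothetical, chosen only to illustrate the Minimum Reusable Unit concept.

```python
import xml.etree.ElementTree as ET

# Each task is stored once as a standalone XML module (hypothetical content).
MODULES = {
    "check-oil": '<task id="check-oil"><title>Check the oil level</title></task>',
    "replace-filter": '<task id="replace-filter"><title>Replace the filter</title></task>',
}


def assemble(title: str, module_ids: list, modules: dict) -> ET.Element:
    """Build a manual by pulling the named modules, in order, from the store."""
    manual = ET.Element("manual")
    ET.SubElement(manual, "title").text = title
    for module_id in module_ids:
        # Parse a fresh copy so each manual gets its own element tree.
        manual.append(ET.fromstring(modules[module_id]))
    return manual


# Two outputs reuse the same "check-oil" module without duplicating its source.
service_manual = assemble("Service Manual", ["check-oil", "replace-filter"], MODULES)
quick_reference = assemble("Quick Reference", ["check-oil"], MODULES)
```

A revision to the "check-oil" module then reaches both outputs the next time they are assembled.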

What About Graphics
--Some repurposing
--Versions for print versus versions for Web
--Little or no reuse
--Where graphic components are assembled into larger or compound graphics
--Some use of CAD/CAM libraries in heavy industry, aerospace
--Great promise of SVG

Scalable Vector Graphics
--W3C Recommendation
--XML vocabulary for 2D vector graphics and animation
--Modular in design, accessible via the DOM
--Perfect for reuse of graphic components
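
To illustrate the point about reuse of graphic components, here is a minimal sketch in Python that defines one marker inside `defs` and places it several times with `use`, both of which are part of the SVG 1.1 Recommendation. The marker shape, its `id`, and the `build_map` function are hypothetical and have nothing to do with any client's actual maps.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_map(positions: list) -> str:
    """Return an SVG document that draws one shared marker at each (x, y)."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "200", "height": "100"})
    # Define the reusable component once.
    defs = ET.SubElement(svg, f"{{{SVG_NS}}}defs")
    marker = ET.SubElement(defs, f"{{{SVG_NS}}}g", {"id": "unit-marker"})
    ET.SubElement(
        marker, f"{{{SVG_NS}}}rect", {"width": "12", "height": "8", "fill": "navy"}
    )
    # Reference it wherever it is needed instead of repeating the geometry.
    for x, y in positions:
        ET.SubElement(
            svg,
            f"{{{SVG_NS}}}use",
            {f"{{{XLINK_NS}}}href": "#unit-marker", "x": str(x), "y": str(y)},
        )
    return ET.tostring(svg, encoding="unicode")


svg_text = build_map([(10, 10), (60, 40), (120, 70)])
```

Because the result is ordinary XML, the same tree can be edited through the DOM, so changing the marker definition changes every placement at once.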

Complexity of SVG
--Still not in the browser
--Microsoft has been quiet about this
--Adobe, Corel have plug-ins
--Supported for viewing in latest Acrobat
--Some people are doing on-the-fly conversion of SVG to PDF, HTML, other formats
--There are tools for conversion to JPEG
--Adobe, Savage Software, others
--A near-term problem that should be solved
--Also not unreasonable to require the plug-in, in some applications
--Boeing is now using SVG in some applications

Posted by Bill Trippe at 12:04 PM

Adobe, InfoPath, XForms, and eForms

The latest version of The Gilbane Report contains my initial analysis of the recent changes in the eForms space, including, notably, the release of InfoPath and the approval of XForms as a W3C recommendation. I have never taken a keen interest in the eForms world, but the vendor announcements and the standards efforts are important.

To quote briefly from the article:

Electronic forms (eForms) have always represented a significant piece of the Enterprise Content Management puzzle. On one end of the marketplace, eForms have been implemented to replace traditional paper processes, such as in government and paper-intensive industries such as financial services. In a number of other applications, such as Web Content Management, eForms are the de facto user interface for such tasks as content entry, editing and system administration.

As eForms have proliferated in both of these types of applications and others, the functional and architectural requirements for eForms have grown. Where early eForms were successful merely for capturing and perhaps storing data, it didn't take long for developers to want to manipulate and work with the captured data. On the end of the market where dedicated eForms tools were being used to automate paper processes, such development typically involved working with the proprietary data structures and programming interfaces of the eForms vendor. In applications such as Web content management, the functionality and architecture of eForms were bounded mainly by HTML and related technologies such as JavaScript. As organizations have moved toward application server-centric architectures such as J2EE and .NET, both the proprietary approaches and HTML-based forms have failed to keep up.

Posted by Bill Trippe at 8:11 AM

November 6, 2003

Full-Text Indexing of Books at Amazon

Amazon.com has rolled out a new feature that indexes the full text of about 100,000 books. I did some basic testing with the DRM book I co-wrote, and am pleased with the results. The idea of so much "finished" text being available on the Web is an intriguing one. This does bring the Web a small step closer to being an interconnected network of essential human knowledge, and it will be interesting to see how people end up using the search.

Posted by Bill Trippe at 3:29 PM | Comments (2)

November 4, 2003

TechDoc Community of Practice

The IdeaAlliance has launched a new community of practice for technical documentation. Given the IdeaAlliance's focus on standards for technology, it makes sense that this new group will focus on XML and other standards for documentation. The TechDoc CoP has kicked off its efforts with a Webinar series. I attended one of the first Webinars today--an excellent presentation from XML industry veteran Brian Travis and Empolis CTO Colin Kingsbury. The TechDoc CoP will also be sponsoring a number of presentations at XML 2003 in Philadelphia.

Posted by Bill Trippe at 10:31 PM

November 3, 2003

XML and the Technologies for Taxonomy Development and Support

I have a new article in EContent Magazine that asks and answers the question, "Can XML Drive Taxonomies and Categorization?" The answer is yes, of course. As I suggest in the lead to the article:

If you google "XML," you do get a stunning 20.5 million hits, which is about four times as many as "Britney," but--sensibly--half as many as "God." So I guess XML falls short of omniscience. Still, the prevalence of XML has led to its being a too-ready answer to seemingly every question about information technology in general and content management in particular. The assumption seems to be that, no matter the requirement or problem, XML is the answer.

As always, the answer is in the details. Please see the article for more.

Posted by Bill Trippe at 8:06 PM
