XML and Content Categorization

August 15, 2003

I have begun researching an upcoming article for EContent Magazine on the role that XML is playing in content categorization tools and approaches. This seems to be a rapidly widening and changing field. I am sure I will touch on certain core approaches such as RDF; I am not yet sure whether I will be talking about Topic Maps.

The larger vendors in this space seem to include Autonomy, NStein, and Verity on the tools side, and Documentum and Stellent on the content management side. It will be interesting to see what kinds of customer case studies they will be able to provide for the research and validation. It will also be interesting to see whether XML is core to their categorization approach, is one of several approaches, or is an output or byproduct of their approach.

Some of my early research, along with a reference from colleague Bob Boeri, points me to the Medical Subject Headings (MeSH) system, and to what looks like a pretty comprehensive effort at the National Library of Medicine to create XML-encoded public databases that use MeSH encoding. Such databases are beginning to answer the "chicken and egg" problem that XML initiatives have encountered. Many initiatives represent great ideas and well-thought-out approaches to implementing them, only to wither on the vine for lack of real data. Critical masses of such data seem to be emerging in areas such as scientific research and financial services.

Some other resources I will be exploring include the following:

(These last two were both written by Eric van der Vlist, cited on the www.xml.com Web site as an ODP editor and publisher and editor of XMLfr.)

Posted by Bill Trippe at August 15, 2003 6:00 PM
