DITA: Too Much Information?

Friday, August 6th, 2010

There is a reality television program at the moment that follows hoarders, those poor folks who cannot help amassing stuff, either by collecting it or through their inability to throw anything away. The value of the stuff is immaterial; they just feel happier knowing that they have it, stored somewhere, even if it’s more or less inaccessible to them. The amount of stuff they have is often limited only by the space they have available to store it, and once they’ve filled their house they may go on to rent storage units so they can keep even more stuff.

DITA and Hoarding

DITA (Darwin Information Typing Architecture) began as IBM’s stab at producing standardized mechanisms for hoarding, or rather for “authoring, producing, and delivering technical information,” and is now maintained as an OASIS standard. It needn’t be only technical information – the same principles can be applied to all sorts of written information, and the system is designed to be extensible. At its core, it means storing information at a fairly granular level (individual information units or “topics”) along with meta-information that describes each topic in case you want to re-use all or part of it in future. It’s the foundation for content management and repurposing; other tools can then assemble and transform these topics as required for specific audiences, purposes, and output formats.

There are a number of related efforts, all focused on the task of labeling and organizing information. So by now we’re starting to understand what it takes to hoard all the information an organization produces, and to build software that helps us identify and retrieve it in future. In that regard – managing and retrieving our data – we’re better off than the hoarders who have so much stuff they couldn’t tell you what they have or where it is. But we do have one major problem in common with them: it’s hard to distinguish between information that’s likely to be useful and information that we will likely never reference again.

Information Assets and Liabilities

Since storage is cheap, it’s tempting to keep all the information we produce. Every document, internal and external, every memo, e-mail message, meeting agenda and minutes – everything. And this gets more attractive as we develop DITA-related tools that help streamline the cataloguing and storing process. But this mass warehousing approach suggests that all our information is either of equal value, or of equally unknown value. And more is not always better, especially when it comes to rapid search and retrieval of information in future.

One thing DITA and other information typing technologies cannot help with is ranking information in terms of likely relevance. Relevance is likely to be specific to your particular topic domain or specialization, but in order to avoid hoarding and amassing too much undifferentiated information we need some standardization in the way we approach assigning value to topics.

Imputed versus Apparent Relevance

We can impute value, which pretty much means making a good guess, at the time we produce information, as to whether we might reference it again in future. Writers can do this based on the feedback they receive from subject matter experts (SMEs) when they gather the information; they can get a pretty good feel for how important or controversial topics are, and how much interest there is in them. They can also use reviewer comments to distinguish the hot topics from the less interesting ones, and assign the more debated information a higher relevance.

There’s also the apparent value of information, which is well-known to those engaged in search engine optimization. This is indicated by how well linked the information is – how much other information references it or is referenced by it. Topics at the center of a web are more likely to have lasting relevance than those that attract less attention, and we can imagine automated tools that track links and assign each topic a dynamic relevance score representing this more measurable, apparent value.
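
As a rough illustration of apparent value, here is a minimal Python sketch that scores topics by how many links they make and receive. Everything in it is an assumption for the sake of the example: the function name apparent_relevance, the topic names, and the choice to weight incoming and outgoing links equally. It is not part of DITA or of any particular tool.

```python
from collections import defaultdict


def apparent_relevance(links):
    """Score each topic by counting the links it makes and receives.

    `links` is an iterable of (source_topic, target_topic) pairs.
    Equal weighting of both directions is an assumption of this sketch.
    """
    scores = defaultdict(int)
    for source, target in links:
        if source == target:
            continue  # a topic referencing itself tells us nothing
        scores[source] += 1  # the topic references other information
        scores[target] += 1  # the topic is referenced by other information
    return dict(scores)


# Hypothetical topics and cross-references, invented for illustration.
links = [
    ("install-guide", "licensing"),
    ("faq", "licensing"),
    ("licensing", "support-contacts"),
    ("release-notes-2009", "faq"),
]

scores = apparent_relevance(links)

# Most-linked topics first; ties are broken alphabetically.
ranked = sorted(scores, key=lambda topic: (-scores[topic], topic))
```

With these sample links, the licensing topic sits at the center of the web and comes out on top. A real tool would probably weight incoming references more heavily than outgoing ones, as search engines do.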

Based on their likely relevance, then, topics can also be assigned a “time to live” number that determines how slowly or rapidly their likely relevance expires. This value ticks down toward zero (or “irrelevant”) for as long as the topic remains untouched; once it reaches zero, the topic might be removed from the system.
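
The “time to live” idea can be sketched in a few lines of Python. This is only one possible reading of it: the Topic class, the 30-days-per-relevance-point rule, and the decision to reset the countdown whenever a topic is touched are all assumptions made for the example, not features of DITA.

```python
from dataclasses import dataclass

DAYS_PER_POINT = 30  # assumed: each relevance point buys 30 days of life


@dataclass
class Topic:
    topic_id: str
    relevance: int      # hypothetical score; higher means more relevant
    ttl_days: int = 0   # days remaining before the topic counts as irrelevant

    def touch(self):
        """Reset the countdown when the topic is referenced or edited."""
        self.ttl_days = self.relevance * DAYS_PER_POINT

    def age(self, days):
        """Count down while the topic sits untouched; never go below zero."""
        self.ttl_days = max(0, self.ttl_days - days)

    @property
    def expired(self):
        return self.ttl_days == 0


def sweep(topics):
    """Split topics into those to keep and candidates for removal."""
    keep = [t for t in topics if not t.expired]
    stale = [t for t in topics if t.expired]
    return keep, stale


# Hypothetical topics: a well-linked one and a throwaway memo.
licensing = Topic("licensing", relevance=3)
memo = Topic("lunch-memo", relevance=1)
for topic in (licensing, memo):
    topic.touch()   # start the clock at creation time
for topic in (licensing, memo):
    topic.age(45)   # 45 days pass with nobody touching either topic

keep, stale = sweep([licensing, memo])
```

After 45 untouched days the more relevant topic still has half its life left, while the memo has expired and lands in the removal list. Whether an expired topic is deleted or merely archived is a policy choice the sketch leaves open.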

What’s the Payback?

The payback for DITA and similar technologies is clear. If we don’t learn from past topics, decisions, meetings, and e-mail threads, then we are doomed to repeat them. Building a library of tried and tested, edited, and legally vetted information topics has clear value. But we need to develop a hierarchy of information such that the information most likely to be valuable is the most thoroughly tagged and categorized, and information that proves to be of little use can be quietly and efficiently sent to a “stack” or jettisoned for good. Only by doing this can we make the most efficient use of the time and expertise of the writers, editors, and program managers who are tasked with maintaining that core company asset – information.

More Information

This topic, and more, is further discussed in an upcoming issue of rewrite, where we provide short, immediately useful and engaging articles relating to obtaining, organizing, transforming and producing written information. Take a look and see if a subscription might be useful for you; in the meantime, thanks for your interest in the readytext blog.