January 1999

Practical Taxonomies

Hard-won wisdom for creating a workable knowledge classification system

By Sarah L. Roberts-Witt

The only thing harder than finding any particular item of information may be finding it again. As organizations and individuals struggle with the intractable problem of data smog, the issue isn't so much acquiring the information in the first place as remembering just where it was left.

The solution is a knowledge classification system, including a taxonomy structure for navigating it, that categorizes all the information the organization chooses to track in a logical manner so that it can be reliably accessed by anyone in the organization.

Sounds easy, right? Just ask Yahoo! The opening page of Yahoo! provides an entrée to one of the largest, most familiar and most frequently accessed knowledge classification systems in the world, fronted by a basic but highly usable taxonomy. "We continually revisit and re-evaluate the directory structure to make sure it still works for the people who use it," said Srinija Srinivasan, Yahoo!'s editor-in-chief. "We are all about determining which differences we can carve out, as well as defining which similarities are meaningful."

When it comes to creating a knowledge-classification system, it is fair to say that the majority of organizations don't face the challenges Yahoo! does in terms of information volume. Conversely, most organizations don't have the luxury of attacking the problem with an army of human indexers and editors. But there is a common thread: the need to tame the seemingly endless influx of data in a way that is usable, effective and simple.

Inxight's LinguistX analyses and summarizes the content of documents, allowing automated classification.

The need for information to have structure readily recognized by humans obviously predates the digital era. Think of organizational charts, topic indexes, outlines, product catalogs, family trees, food chains and animal and plant taxonomies. All are arranged to provide human context, the threshold for knowledge-based action.

But in this age of information glut, increasingly competitive markets and high turnover, the need for businesses to put such systems in place has taken on greater and greater urgency. "Companies know they need to do this," said Don DePalma, senior analyst at Forrester Research, Inc. "But there are major issues involved in putting the information somewhere that anyone in the organization can find it."

Formally categorizing organizational knowledge is an involved and often difficult undertaking that requires asking, and answering, some tough questions. What types of data and information should the system index? Can it be fully automated or will dedicated knowledge professionals be required to make it really hum? How can such a system possibly serve the needs of every knowledge worker within an organization? Should it try to? Is the current corporate culture one that fosters knowledge sharing or must it be reshaped?

The answers to these questions vary from company to company, and, in some cases, from department to department. To understand the challenges faced by organizations of all types as they embark on knowledge classification efforts, we surveyed managers from organizations that have already made the trek.

"We have clients spending $100 million putting content online but not analyzing or organizing it. They struggle under the weight of available information." - Mark Post, vice president, Renaissance Worldwide

According to knowledge managers for the State of Washington, Ernst & Young and KPMG, as well as a range of expert knowledge consultants and product marketers, the news isn't all bad. As slippery as the problem is, a number of clear principles are emerging that, used wisely, can guide a company through the process of designing and implementing a practical knowledge classification system. In addition, more and better software tools and consulting services are becoming available to provide assistance.

Emergent relationships

As with many exercises in process improvement, the first step in organizing knowledge involves difficult self-inspection. The members of the organization must, on some level, admit there's a problem. Acknowledging that existing knowledge is inaccessible, unusable or both, even if an intranet or similar solution is already in place, will take a company a long way in its quest toward a successful knowledge classification system.

"We have customers coming to us all the time saying, 'I need a way that I can link knowledge and available information,'" said Mark Post, vice president of the knowledge management solutions group at Renaissance Worldwide. "We also have clients spending over $100 million putting content online but not analyzing and organizing it, so that people struggle under the weight of available information."

Sometimes the need for classification is self-evident. For Phil Coombs, project director for the government information locator project at the Washington State Library, the increasing volume of state and local government information posted to the Internet and the increasing demand for that information by Web-connected members of the public made it clear that something needed to be done.

"Our first step when we set out to tackle this project was to recognize that information exists and lots of it," said Coombs. "At the time, the Internet and the Web were new, and the government was making tons of information available, so it was clear that there was enough data to fuel this thing." His solution, Find-it! Washington, has proven to be such a success that today, courtesy of the federal government, Coombs is advising state government organizations around the country about his methodologies.

Other times, the need to develop a classification system may sneak up on organizations, even those fully steeped in a knowledge culture.

"KPMG has been in the business of managing knowledge for a long time. We had always done knowledge-sharing at meetings or training sessions or via private library," said Nilesh K. Shah, partner in charge of Tax Knowledge Management. What the firm lacked was a centralized, user-friendly, contextual system to unearth, catalog and link information for KPMG's 5,000 tax professionals.

Two years ago, a series of partial efforts culminated in one centralized initiative to classify all the firm's tax information in one system, an effort that paid off in the release last year of the KPMG Tax Knowledge System. According to Shah, "Our motto was 'Invent Wheel Once.'"

Once the need for knowledge classification has been recognized, the next task is to define the system requirements. The most important decision to be made concerns the system's intended audience.

"You have to figure out the problem you're trying to solve," said Glenn Kelman, vice president of marketing at Plumtree Software, makers of Plumtree Server. "Knowledge management and classification are enabling strategies. They are not answers in and of themselves."

For Ernst & Young, figuring out how to tap into and organize the vast amounts of unstructured knowledge its army of consultants possessed was of paramount importance. Doing so could shave time off the delivery of services, create new products and in turn boost profits. The answer, KnowledgeWeb, was born in 1993.

"Knowledge is imperative for a consulting firm," said Giovanni Piazza, director with Ernst & Young's Center for Business Knowledge and one of the original KnowledgeWeb architects. "We had to work hard to figure out how to empower the individual consultants by providing this information." Apparently, they succeeded. The KnowledgeWeb currently contains approximately half a million documents and serves more than 100,000 consultants worldwide.

But the real proof is in the number of people who use it. According to Piazza, the KnowledgeWeb received three million hits for the month of October 1998 alone.

Despite the common notion that providing consultants with instant access to a firm's cumulative knowledge could prove valuable in productivity and profit, the Ernst & Young and KPMG systems are very different in terms of scope and intended audiences.

Ernst & Young's KnowledgeWeb served the broad audience of associated consultants across practices and geographies. "Overwhelmingly, Ernst & Young professionals don't see the office, so portability of knowledge was and is important for us," said Piazza. "But we also had to figure out how to share the knowledge among different cultures."

The KPMG system, on the other hand, was designed for a more specialized group of tax professionals, so breadth of classification was less important than depth in the relevant fields.

In the case of Find-it! Washington, Coombs said that the key step was making the audience-focus decision early and single-mindedly sticking to it. In doing so, he and his team were able to create a limited but definitive structure and vocabulary for document classification. "At its most atomic level, this limited set of terms allows people to at least have a chance of finding what they're looking for," he said.

According to Forrester's DePalma, organizations have a greater chance of success if they begin with a problem that is manageable, measurable and has a well-defined scope. "Large-scale efforts that are launched as a Big Bang kind of thing rarely succeed," he said. "We often suggest that organizations pick a customer-facing problem, such as cutting customer-service calls in half, because you can easily quantify that."

Broad and flat

Charting the Course to Knowledge Classification

Determining an organization's need for a knowledge classification system and settling on the business problem that needs to be addressed are preliminary hurdles. Once those obstacles have been overcome, the hard work of actually developing the classification system can finally begin. "The taxonomy defines the terms you're interested in and then you have to determine what makes up those terms," said Thomas Trimmer, president of grapeVine Technologies. "Though figuring out where to start can be frustrating, a good taxonomy is recognized as a central part of a knowledge management system."

A taxonomy, such as a hierarchical tree of terms, is a structure for providing guidance over a classification system. It shows the user the groupings and relationships that can emerge from information in many patterns. The user can, for instance, either zoom down to fine-grained levels to pin down exact items, or they can scan and highlight different parts of a system, enabling the potential creation of knowledge through open associations of relevant material.
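The two modes of use described above, zooming down to an exact item and scanning across a whole branch, can be illustrated with a small sketch. This is only one way to model a hierarchical tree of terms; the class name, method names, category labels and document names are all hypothetical and do not come from any product mentioned in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One term in a hierarchical taxonomy (hypothetical illustration)."""
    term: str
    documents: list[str] = field(default_factory=list)
    children: dict[str, TaxonomyNode] = field(default_factory=dict)

    def add_path(self, path: list[str], document: str) -> None:
        """File a document under a path of terms, creating nodes as needed."""
        node = self
        for term in path:
            node = node.children.setdefault(term, TaxonomyNode(term))
        node.documents.append(document)

    def drill_down(self, path: list[str]) -> TaxonomyNode:
        """Zoom to a fine-grained node; raises KeyError for an unknown term."""
        node = self
        for term in path:
            node = node.children[term]
        return node

    def scan(self) -> list[str]:
        """Collect every document at or below this node (the broad view)."""
        found = list(self.documents)
        for child in self.children.values():
            found.extend(child.scan())
        return found


# Made-up terms and documents, purely for illustration.
root = TaxonomyNode("All knowledge")
root.add_path(["Tax", "State"], "sales-tax-memo.doc")
root.add_path(["Tax", "Federal"], "depreciation-guide.doc")
root.add_path(["Consulting"], "client-proposal.doc")

exact = root.drill_down(["Tax", "State"]).documents  # pin down exact items
broad = root.drill_down(["Tax"]).scan()              # scan a whole branch
```

Here `exact` holds only the state tax memo, while `broad` gathers everything filed anywhere under "Tax", which is the kind of open association across related material that the tree makes possible.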

There are several widely recognized approaches for developing taxonomies. A priori categorization simply means pre-defining the categories and terms, and free categorization implies mining the data you have to see what categories bubble up. A third option begins with standard taxonomy templates that are then customized for the particular application. For example, grapeVine offers a series of starter templates optimized for specific industries.

No matter which approach is employed to determine the taxonomy's lexicon, shooting for perfection could result in disaster. Simplicity and practicality are at the core of effectively rendered taxonomies. Broad, flat taxonomies are more effective than deep vertical ones, for example. And a straightforward classification scheme, as opposed to an abstruse one, will make users much happier. "It's not yet clear what will work best for knowledge classification, but there is the danger that if you have a complicated approach, people will never use the system," said Nick Walker, development director at PC Docs/Fulcrum. "People shy away from time-consuming and complicated systems."

Ernst & Young stayed on the side of simplicity by going with a taxonomy that was flat and reasonably close to the firm's business practices. "The risk with taxonomy is that you fall prey to the search for perfection," said Piazza. "But you have to remember that a good taxonomy should describe a reduced set of attributes so that it's meaningful to everyone. You can do amazing things with five or six attributes, each one of which has 20 values or so."
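The arithmetic behind Piazza's remark is worth spelling out. As a rough sketch, assume each document takes exactly one value for each attribute (an assumption made here for illustration, not a description of how KnowledgeWeb works). The function name below is hypothetical.

```python
def distinct_profiles(attribute_count: int, values_per_attribute: int) -> int:
    """Number of distinct attribute combinations a document can carry,
    assuming exactly one value is chosen per attribute."""
    return values_per_attribute ** attribute_count


def vocabulary_size(attribute_count: int, values_per_attribute: int) -> int:
    """Total number of terms users have to learn."""
    return attribute_count * values_per_attribute


five_attrs = distinct_profiles(5, 20)  # 20**5 = 3,200,000 combinations
six_attrs = distinct_profiles(6, 20)   # 20**6 = 64,000,000 combinations
terms_to_learn = vocabulary_size(6, 20)  # only 120 terms in total
```

Under that assumption, a vocabulary of roughly 100 to 120 terms can distinguish millions of document profiles, which is why a flat scheme with a handful of attributes can stay meaningful to everyone without giving up much precision.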

The same held true for Coombs' project. He and his group were pushed to comply with the State of Washington's information indexing standard, but instead they landed on their own narrow set of indexing terms. Even though Find-it! Washington covers nearly any type of state or local government information imaginable, the public-facing taxonomy has only 25 categories.

Allowing for the system's organic evolution and growth will also increase the odds of success. For example, one giant taxonomy may not work for certain organizations. Creating smaller, more specialized taxonomies for individual departments or even communities of practice may be a better solution. Or perhaps different tool sets or classification values are in order.

Just add humans

KM software from grapeVine maintains custom category lists and provides taxonomy templates by industry.

At Ernst & Young, Piazza and other knowledge-management architects are constantly developing better, more streamlined ways of using the information on the KnowledgeWeb. "Two years ago we were working on how to improve the submission and development of filtered knowledge," said Piazza. "Now we're pushing for a more integrated environment."

Though replicating a Yahoo!-like system of dedicated indexers and manual processes is typically unrealistic, planning to rely exclusively on technology to keep a classification system buzzing is an equally unattainable goal. "We would love to use more automated processes to streamline our work, but we will never rely on them completely," said Yahoo!'s Srinivasan. "Machines and technology may be frightfully efficient and consistent, but they can't replace the knowledge in people's heads."

Plumtree's Kelman agrees that a combination of automation and the human touch is the only way to go. "Too much faith is placed in technology; companies think that by purchasing technology they're purchasing a classification system," said Kelman. "People have to be involved in the solution or it will never work."

For its classification system, Find-it! Washington relies heavily on automated processes, but it also depends on at least some submitters being conscientious enough to embed indexing tags within the documents they post. If the author of a document has taken the time to fill in and code the index fields, it ensures that the document will be correctly classified.

By contrast, untagged documents are scanned for related terms and then compared against categorization rules, which enables them to be placed in the proper information bucket at least most of the time. However, there is a built-in margin of error. "The accuracy of automated classification is limited because we cannot choose the same terms as the author," said Coombs. "But incremental improvements and refinements can make a huge difference."
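The general idea of trusting author-supplied tags first and falling back to term-matching rules can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual Find-it! Washington implementation: the categories, rule terms, function name and sample texts are all invented for the example.

```python
import re

# Hypothetical categorization rules: category -> terms that suggest it.
CATEGORY_RULES = {
    "Transportation": {"highway", "ferry", "transit", "road"},
    "Taxes": {"tax", "revenue", "levy"},
    "Parks": {"park", "trail", "campground"},
}


def classify(text, author_tags=None):
    """Return (categories, method). Author tags win; otherwise match rule terms."""
    if author_tags:
        # The author filled in the index fields, so trust them as given.
        return list(author_tags), "author tags"

    # Untagged document: scan for words and score each category by overlap.
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(terms & words) for cat, terms in CATEGORY_RULES.items()}
    best = max(scores.values())
    if best == 0:
        return [], "unclassified"
    return [cat for cat, score in scores.items() if score == best], "rules"


tagged = classify("Annual report", author_tags=["Taxes"])
untagged = classify("Ferry schedules and highway closures for the holiday weekend")
missed = classify("Motor vehicle excise fees are due in April")
```

The third call shows the margin of error Coombs describes: the document is about a tax, but the author never used any of the rule terms, so the rules cannot place it. Adding "excise" to the hypothetical "Taxes" rule would be one of the incremental refinements that improve accuracy over time.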

At Ernst & Young, automation does not play as prominent a role. Though automated processes conduct some initial document classification, categorization is usually set in motion via a submission form that is completed by the Ernst & Young professional who introduces a document to the KnowledgeWeb. After documents are submitted, subject-matter experts or knowledge workers examine the document (referred to at Ernst & Young as unfiltered knowledge) and determine how the information within can and should be used. The filtered and refined data that enters the KnowledgeWeb is called a Knowledge Object. "Ernst & Young classification is done primarily by people because of the tacit knowledge that comes with that," concluded Piazza.

For its Tax Knowledge System, KPMG also uses subject-matter experts to filter and review the content. Shah says that certain aspects of the data must be sanitized, primarily for confidentiality purposes, before being made fully available. On the other hand, KPMG uses automation to process the inputs from external news feeds so that relevant articles are classified within the existing taxonomy.

Choose your weapon

As companies determine the right balance between automated and manual methods, they will find a number of software suppliers inhabiting the space of document and data classification and taxonomy creation, some of which have been fine-tuning their technologies through several releases. Among them is Aptex, a subsidiary of HNC Software. The Aptex product family, which includes ResourceMiner and Convectis, is based on content-mining technology. "Content mining does well with numbers, jargon and technical data," says John Gaffney, Aptex's vice-president of marketing. "It's different from natural language processing because it doesn't use phrase and syntactical recognition." Aptex's products have been heavily used for email classification.

Another prominent player is Inxight, a spin-off of the famed Xerox Palo Alto Research Center (PARC). Inxight's product offerings include its LinguistX natural language processing platform as well as the Hyperbolic Tree software and Summarizer Plus, which can extract summaries from documents so that users don't have to view an entire document or even be in the application that created the document.

Also recently in the spotlight is newcomer grapeVine Technologies. Its grapeVine for (Lotus) Notes and grapeVine for Compass provide rules-based document discovery and classification, as well as collaboration components, in addition to its set of vertical taxonomy templates. Autonomy, Plumtree and PC Docs/Fulcrum also have applicable product offerings as well as significant product developments in the pipe.

In addition to software tools, numerous consulting firms feature knowledge management areas of practice with expertise in taxonomy creation. Renaissance Worldwide, KPMG, Ernst & Young, Deloitte & Touche, Delphi Group and grapeVine Technologies, just to name a few, offer such services.

Experts agree that no cookie-cutter solution exists; every organization must face up to the hard work of self-inspection, analysis and system development to create a workable knowledge classification system. If present conditions are any indication, however, the amount of available information will continue to increase, as will the amount of time wasted in the search for that which is valuable. Nevertheless, available tools and services and the hard-won wisdom of those who have already made the journey provide useful accelerators.

Creating and acquiring knowledge will always be the hard work of business. By developing practical taxonomies, organizations shouldn't also have to struggle with finding that knowledge the second time around.

Sarah L. Roberts-Witt is a freelance writer specializing in knowledge management and business intelligence technologies.

Knowledge Management, 29160 Heathercliff Road, Suite 200, Malibu, CA 90265. Phone: 310-589-3100

© 1998 CurtCo Freedom Group