January 1999
Practical Taxonomies
Hard-won wisdom for creating a workable knowledge classification system
By Sarah L. Roberts-Witt
The only thing harder than finding any particular item of information may be finding it again. As organizations and
individuals struggle over the intractable problem of data smog, the issue isn't so much acquiring the information in the
first place, but remembering just where it was left.
The solution is a knowledge classification system (including a taxonomy structure for navigating the system) that
categorizes all the information the organization chooses to track in a logical manner so that it can be reliably
accessed by anyone in the organization.
Sounds easy, right? Just ask Yahoo! The opening page of Yahoo! provides an entrée to one of the largest, most
familiar and most frequently accessed knowledge classification systems in the world, fronted by a basic but highly
usable taxonomy. "We continually revisit and re-evaluate the directory structure to make sure it still works for the
people who use it," said Srinija Srinivasan, Yahoo!'s editor-in-chief. "We are all about determining which differences
we can carve out, as well as defining which similarities are meaningful."
When it comes to creating a knowledge-classification system, it is fair to say that the majority of organizations
don't face the challenges Yahoo! does in terms of information volume. Conversely, most organizations don't have the
luxury of attacking the problem with an army of human indexers and editors. But there is a common thread: the need
to tame the seemingly endless influx of data in a way that is usable, effective and simple.
Inxight's LinguistX analyzes and summarizes the content of documents, allowing automated classification.
The need for information to have structure readily recognized by humans obviously predates the digital era. Think
of organizational charts, topic indexes, outlines, product catalogs, family trees, food chains and animal and plant
taxonomies. All are arranged to provide human context, the threshold for knowledge-based action.
But in this age of information glut, increasingly competitive markets and high turnover, the need for businesses to
put such systems in place has taken on greater and greater urgency.
"Companies know they need to do this," said Don DePalma, senior analyst at Forrester Research, Inc. "But there are
major issues involved in putting the information somewhere that anyone in the organization can find it."
Formally categorizing organizational knowledge is an involved and often difficult undertaking that requires asking,
and answering, some tough questions. What types of data and information should the system index? Can it be fully
automated or will dedicated knowledge professionals be required to make it really hum? How can such a system
possibly serve the needs of every knowledge worker within an organization? Should it try to? Is the current corporate
culture one that fosters knowledge sharing or must it be reshaped?
The answers to these questions vary from company to company, and, in some cases, from department to department.
To understand the challenges faced by organizations of all types as they embark on knowledge classification efforts,
we surveyed managers from organizations that have already made the trek.
According to knowledge managers for the State of Washington, Ernst & Young and KPMG, as well as a range of
expert knowledge consultants and product marketers, the news isn't all bad. As slippery as the problem is, a number
of clear principles are emerging that, used wisely, can guide a company through the process of designing and
implementing a practical knowledge classification system. In addition, more and better software tools and consulting
services are becoming available to provide assistance.
Emergent relationships
As with many exercises in process improvement, the first step in organizing knowledge involves difficult
self-inspection. The members of the organization must, on some level, admit there's a problem. Acknowledging that
existing knowledge is inaccessible, unusable or both, even if an intranet or similar solution is already in place, will
take a company a long way in its quest toward a successful knowledge classification system.
"We have customers coming to us all the time saying, 'I need a way that I can link knowledge and available
information,'" said Mark Post, vice president of the knowledge management solutions group at Renaissance
Worldwide. "We also have clients spending over $100 million putting content online but not analyzing and organizing
it, so that people struggle under the weight of available information."
Sometimes the need for classification is self-evident. For Phil Coombs, project director for the government information
locator project at the Washington State Library, the increasing volume of state and local government information
posted to the Internet and the increasing demand for that information by Web-connected members of the public made
it clear that something needed to be done.
"Our first step when we set out to tackle this project was to recognize that information exists and lots of it," said
Coombs. "At the time, the Internet and the Web were new, and the government was making tons of information
available, so it was clear that there was enough data to fuel this thing." His solution, Find-it! Washington, has proven
to be such a success that today, courtesy of the federal government, Coombs is advising state government
organizations around the country about his methodologies.
Other times, the need to develop a classification system may sneak up on an organization, even one fully steeped in a
knowledge culture.
"KPMG has been in the business of managing knowledge for a long time. We had always done knowledge-sharing at
meetings or training sessions or via private library," said Nilesh K. Shah, partner in charge of Tax Knowledge
Management. What the firm lacked was a centralized, user-friendly, contextual system to unearth, catalog and link
information for KPMG's 5,000 tax professionals.
Two years ago, a series of partial efforts culminated in one centralized initiative to classify all the firm's tax
information in one system, an effort that paid off in the release last year of the KPMG Tax Knowledge System.
According to Shah, "Our motto was 'Invent Wheel Once.'"
Once the need for knowledge classification has been recognized, the next task is to define the system requirements.
The most important decision to be made concerns the system's intended audience.
"You have to figure out the problem you're trying to solve," said Glenn Kelman, vice president of marketing at
Plumtree Software, maker of Plumtree Server. "Knowledge management and classification are enabling strategies.
They are not answers in and of themselves."
For Ernst & Young, figuring out how to tap into and organize the vast amounts of unstructured knowledge its army of
consultants possessed was of paramount importance. Doing so could shave time off the delivery of services, create
new products and in turn boost profits. The answer, KnowledgeWeb, was born in 1993.
"Knowledge is imperative for a consulting firm," said Giovanni Piazza, director with Ernst & Young's Center for
Business Knowledge and one of the original KnowledgeWeb architects. "We had to work hard to figure out how to
empower the individual consultants by providing this information." Apparently, they succeeded. The KnowledgeWeb
currently contains approximately half a million documents and serves more than 100,000 consultants worldwide.
But the real proof is in the number of people who use it. According to Piazza, the KnowledgeWeb received three
million hits for the month of October 1998 alone.
Despite the common notion that providing consultants with instant access to a firm's cumulative knowledge could
prove valuable in productivity and profit, the Ernst & Young and KPMG systems are very different in terms of scope
and intended audiences.
Ernst & Young's KnowledgeWeb serves a broad audience of associated consultants across practices and
geographies. "Overwhelmingly, Ernst & Young professionals don't see the office, so portability of knowledge was and
is important for us," said Piazza. "But we also had to figure out how to share the knowledge among different cultures."
The KPMG system, on the other hand, was designed for a more specialized group of tax professionals, so breadth of
classification was less important than depth in the relevant fields.
In the case of Find-it! Washington, Coombs said that the key step was making the audience-focus decision early and
single-mindedly sticking to it. In doing so, he and his team were able to create a limited but definitive structure and
vocabulary for document classification. "At its most atomic level, this limited set of terms allows people to at least have
a chance of finding what they're looking for," he said.
According to Forrester's DePalma, organizations have a greater chance of success if they begin with a problem that is
manageable, measurable and has a well-defined scope. "Large-scale efforts that are launched as a Big Bang kind of
thing rarely succeed," he said. "We often suggest that organizations pick a customer-facing problem, such as cutting
customer-service calls in half, because you can easily quantify that."
Broad and flat
Determining an organization's need for a knowledge classification system and settling on the business problem
that needs to be addressed are preliminary hurdles. Once those obstacles have been overcome, the hard work of
actually developing the classification system can finally begin.
"The taxonomy defines the terms you're interested in and then you have to determine what makes up those terms,"
said Thomas Trimmer, president of grapeVine Technologies. "Though figuring out where to start can be frustrating, a
good taxonomy is recognized as a central part of a knowledge management system."
A taxonomy, such as a hierarchical tree of terms, is a structure for providing guidance over a classification system. It
shows the user the groupings and relationships that can emerge from information in many patterns. The user can, for
instance, either zoom down to fine-grained levels to pin down exact items, or they can scan and highlight different
parts of a system, enabling the potential creation of knowledge through open associations of relevant material.
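The two ways of moving through a taxonomy described above can be sketched in a few lines of Python. This is only an illustration: the category names are invented, and the helper names zoom and scan are hypothetical rather than taken from any product mentioned in this article.

```python
# Hypothetical taxonomy: a hierarchical tree of terms stored as nested dicts.
# The category names are invented for illustration.
TAXONOMY = {
    "Tax": {
        "Corporate": {"Mergers": {}, "Transfer Pricing": {}},
        "Personal": {"Estate": {}},
    },
    "Audit": {
        "Internal": {},
        "External": {},
    },
}


def zoom(tree, path):
    """Walk down the tree one term at a time and return the subtree reached."""
    node = tree
    for term in path:
        node = node[term]  # raises KeyError if the term is not in the taxonomy
    return node


def scan(tree, depth):
    """Return every term found at the given depth (1 = top level)."""
    if depth == 1:
        return sorted(tree)
    terms = []
    for subtree in tree.values():
        terms.extend(scan(subtree, depth - 1))
    return sorted(terms)


# Zooming pins down the fine-grained terms under one branch.
narrow = sorted(zoom(TAXONOMY, ["Tax", "Corporate"]))
# Scanning shows everything at one level, across all branches.
broad = scan(TAXONOMY, 2)
print(narrow)
print(broad)
```

Zooming answers "what exactly sits under this heading?", while scanning puts terms from different branches side by side, which is where unexpected associations tend to surface.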
There are several widely recognized approaches for developing taxonomies. A priori categorization simply means
pre-defining the categories and terms, and free categorization implies mining the data you have to see what
categories bubble up. A third option begins with standard taxonomy templates that are then customized for the
particular application. For example, grapeVine offers a series of starter templates optimized for specific industries.
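As a rough illustration of free categorization, the sketch below counts how many documents each term appears in and proposes the recurring terms as candidate categories. The sample documents, the stop list and the two-document threshold are all assumptions made for the example; commercial tools use far more sophisticated analysis.

```python
from collections import Counter
import re

# Invented sample documents; a real corpus would be far larger.
DOCUMENTS = [
    "Quarterly tax filing deadlines for corporate clients",
    "Corporate tax audit checklist",
    "Audit planning memo for retail clients",
    "Retail pricing survey results",
]

# A tiny, assumed stop list of words too common to be useful categories.
STOP_WORDS = {"for", "the", "of", "and", "a"}


def candidate_categories(documents, min_documents=2):
    """Return terms that occur in at least min_documents documents."""
    document_frequency = Counter()
    for text in documents:
        # Use a set so a term counts once per document, however often it repeats.
        words = set(re.findall(r"[a-z]+", text.lower()))
        document_frequency.update(words - STOP_WORDS)
    return sorted(
        term for term, count in document_frequency.items()
        if count >= min_documents
    )


candidates = candidate_categories(DOCUMENTS)
print(candidates)
```

The output is only a list of candidates. Someone who knows the business still has to decide which terms deserve to become categories, which should be merged and which are noise.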
No matter which approach is employed to determine the taxonomy's lexicon, shooting for perfection could result in
disaster. Simplicity and practicality are at the core of effectively rendered taxonomies. Broad, flat taxonomies are
more effective than deep vertical ones, for example. And a straightforward classification scheme as opposed to an
obscure one will make users much happier. "It's not yet clear what will work best for knowledge classification, but
there is the danger that if you have a complicated approach, people will never use the system," said Nick Walker,
development director at PC Docs/Fulcrum. "People shy away from time-consuming and complicated systems."
Ernst & Young stayed on the side of simplicity by going with a taxonomy that was flat and reasonably close to the
firm's business practices. "The risk with taxonomy is that you fall prey to the search for perfection," said Piazza. "But
you have to remember that a good taxonomy should describe a reduced set of attributes so that it's meaningful to
everyone. You can do amazing things with five or six attributes, each one of which has 20 values or so."
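The arithmetic behind that claim is easy to check. The sketch below assumes, for simplicity, that every document is assigned exactly one value for each attribute; the function names are illustrative only.

```python
VALUES_PER_ATTRIBUTE = 20  # "20 values or so" per attribute, as quoted above


def distinct_combinations(attribute_count, values=VALUES_PER_ATTRIBUTE):
    """Number of distinct classifications if each attribute takes one value."""
    return values ** attribute_count


def terms_to_learn(attribute_count, values=VALUES_PER_ATTRIBUTE):
    """Total vocabulary a user has to recognize across all attributes."""
    return values * attribute_count


for count in (5, 6):
    print(count, terms_to_learn(count), distinct_combinations(count))
```

Under that assumption, five attributes give 3.2 million possible combinations from a vocabulary of only 100 terms, and six attributes give 64 million from 120 terms. A small, flat vocabulary can therefore discriminate among a very large body of documents.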
The same held true for Coombs' project. He and his group were pushed to comply with the State of Washington's
information indexing standard, but instead they landed on their own narrow set of indexing terms. Even though
Find-it! Washington covers nearly any type of state or local government information imaginable, the public-facing
taxonomy has only 25 categories.
Allowing for the system's organic evolution and growth will also increase the odds for success. For example, one
giant taxonomy may not work for certain organizations. Creating smaller, more specialized taxonomies for individual
departments or even communities of practice may be a better solution. Or perhaps different tool sets or classification
values are in order.
Just add humans
KM software from grapeVine maintains custom category lists and provides taxonomy templates by industry.
At Ernst & Young, Piazza and other knowledge-management architects are constantly developing better, more
streamlined ways of using the information on the KnowledgeWeb. "Two years ago we were working on how to
improve the submission and development of filtered knowledge," said Piazza. "Now we're pushing for a more
integrated environment."
Though replicating a Yahoo!-like system of dedicated indexers and manual processes is typically unrealistic, planning to
rely exclusively on technology to keep a classification system buzzing is an equally unattainable goal. "We would
love to use more automated processes to streamline our work, but we will never rely on them completely," said
Yahoo!'s Srinivasan.
"Machines and technology may be frightfully efficient and consistent, but they can't replace the knowledge in people's
heads."
Plumtree's Kelman agrees that a combination of automation and the human touch is the only way to go. "Too much
faith is placed in technology; companies think that by purchasing technology they're purchasing a classification
system," said Kelman. "People have to be involved in the solution or it will never work."
For its classification system, Find-it! Washington relies heavily on automated processes, but it also depends on at least
some submitters being conscientious enough to embed indexing tags within the documents they post. If the author of
a document has taken the time to fill in and code the index fields, it ensures that the document will be correctly
classified.
By contrast, untagged documents are scanned for related terms and then compared against categorization rules,
which enables them to be placed in the proper information bucket at least most of the time. However, there is a
built-in margin of error. "The accuracy of automated classification is limited because we cannot choose the same terms as the
author," said Coombs. "But incremental improvements and refinements can make a huge difference."
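The two-path approach described above, where an author-supplied tag is trusted when present and term-matching rules are the fallback, might be sketched as follows. This is not the Find-it! Washington implementation: the categories, rule terms, field names and the classify function are all hypothetical.

```python
import re

# Hypothetical categorization rules: category -> terms that suggest it.
RULES = {
    "Transportation": {"highway", "ferry", "transit", "road"},
    "Education": {"school", "teacher", "tuition", "student"},
}


def classify(document):
    """Return a category name, or None if no rule matches.

    `document` is a dict with a "text" field and an optional "index_tag"
    field filled in by the author (both field names are assumptions).
    """
    tag = document.get("index_tag")
    if tag in RULES:
        return tag  # a valid author-supplied tag always wins

    # Untagged: count how many rule terms from each category appear in the text.
    words = set(re.findall(r"[a-z]+", document["text"].lower()))
    scores = {category: len(words & terms) for category, terms in RULES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


tagged = classify({"index_tag": "Education", "text": "Ferry schedule changes"})
untagged = classify({"text": "Winter ferry and highway closures"})
missed = classify({"text": "Pupils return to classrooms"})
print(tagged, untagged, missed)
```

The last document shows the margin of error Coombs describes. It is plainly about education, but its author chose none of the terms in the rules, so it goes unclassified until someone adds "pupils" or "classrooms" to the rule set. Refinements of that kind are the incremental improvements he refers to.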
At Ernst & Young, automation does not play as prominent a role. Though automated processes conduct some initial
document classification, categorization is usually set in motion via a submission form that is completed by the Ernst &
Young professional who introduces a document to the KnowledgeWeb. After documents are submitted, subject-matter
experts or knowledge workers examine the document (referred to at Ernst & Young as unfiltered knowledge) and
determine how the information within can and should be used. The filtered and refined data that enters the
KnowledgeWeb is called a Knowledge Object. "Ernst & Young classification is done primarily by people because of
the tacit knowledge that comes with that," concluded Piazza.
For its Tax Knowledge System, KPMG also uses subject-matter experts to filter and review the content. Shah says that
certain aspects of the data must be sanitized, primarily for confidentiality purposes, before being made fully available.
On the other hand, KPMG uses automation to process the inputs from external news feeds so that relevant articles are
classified within the existing taxonomy.
Choose your weapon
As companies determine the right balance between automated and manual methods, they will find a number of
software suppliers inhabiting the space of document and data classification and taxonomy creation, some of which
have been fine-tuning their technologies through several releases.
Among them is Aptex, a subsidiary of HNC Software. The Aptex product family, which includes ResourceMiner and
Convectis, is based on content-mining technology. "Content mining does well with numbers, jargon
and technical data," said John Gaffney, Aptex's vice president of marketing. "It's different from natural language
processing because it doesn't use phrase and syntactical recognition." Aptex's products have been heavily used for
email classification.
Another prominent player is Inxight, a spin-off of the famed Xerox Palo Alto Research Center (PARC). Inxight's product
offerings include its LinguistX natural language processing platform as well as the Hyperbolic Tree
software and Summarizer Plus, which can extract summaries from documents so that users don't
have to view an entire document or even be in the application that created the document.
Also recently in the spotlight is newcomer grapeVine Technologies. Its grapeVine for (Lotus) Notes
and grapeVine for Compass provide rules-based document discovery and classification, as well as
collaboration components, in addition to its set of vertical taxonomy templates. Autonomy, Plumtree and PC
Docs/Fulcrum also have applicable product offerings as well as significant product developments in the pipeline.
In addition to software tools, numerous consulting firms feature knowledge management areas of practice with
expertise in taxonomy creation. Renaissance Worldwide, KPMG, Ernst & Young, Deloitte & Touche, Delphi Group and
grapeVine Technologies, just to name a few, offer such services.
Experts agree that no cookie-cutter solution exists; every organization must face up to the hard work of
self-inspection, analysis and system development to create a workable knowledge classification system. If present
conditions are any indication, however, the amount of available information will continue to increase, as will the
amount of time wasted in the search for that which is valuable. Nevertheless, available tools and services, along with the
hard-won wisdom of those who have already made the journey, provide useful accelerators.
Creating and acquiring knowledge will always be the hard work of business. By developing practical taxonomies,
organizations can at least avoid the struggle of finding that knowledge the second time around.
Sarah L. Roberts-Witt is a freelance writer specializing in knowledge management and business intelligence
technologies.