January 1999

In Search of KM's Holy Grail: Natural Language Processing

When it comes to the hard computational work of scanning, sorting and classifying documents, a method known as statistical processing has been carrying the bulk of the load. At its most basic level, statistical processing classifies information by tracking how often certain words and phrases occur. But even when more sophisticated algorithms that take the proximity of words into account are applied, this technology makes no attempt to understand a document's meaning. A competing technology known as natural language processing, however, is ready to tackle the arduous task of interpreting what the document's author is actually saying. "Natural language processing extracts concepts and content as opposed to just extracting words," said Ian Hershey, director of advanced products at Palo Alto-based Inxight. "This allows for real information or knowledge extraction."
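The article does not describe any vendor's algorithm. The following is only a minimal Python sketch, assuming a simple keyword-profile approach, of the two statistical ideas mentioned above: counting how often words occur and checking whether two words appear near each other. The function names, category names and keyword lists are hypothetical, chosen for illustration. Note that nothing in it looks at meaning, which is exactly the limitation natural language processing aims to overcome.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_by_frequency(text: str, profiles: dict[str, set[str]]) -> str:
    """Pick the category whose keywords occur most often in the text.

    Only raw counts are used; word order and meaning are ignored.
    On a tie, the category listed first in `profiles` wins.
    """
    counts = Counter(tokenize(text))
    scores = {cat: sum(counts[w] for w in words) for cat, words in profiles.items()}
    return max(scores, key=scores.get)


def within_distance(text: str, a: str, b: str, window: int) -> bool:
    """Return True if words a and b occur within `window` tokens of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


if __name__ == "__main__":
    # Hypothetical keyword profiles for two categories.
    profiles = {
        "finance": {"stock", "market", "shares"},
        "sports": {"game", "team", "score"},
    }
    doc = "Shares fell as the stock market reacted to the team's poor game."
    print(classify_by_frequency(doc, profiles))         # finance (3 hits vs. 2)
    print(within_distance(doc, "stock", "market", 1))   # True: adjacent words
    print(within_distance(doc, "shares", "game", 3))    # False: 12 tokens apart
```

The proximity check is one way a statistical system can treat "stock market" as a phrase rather than two unrelated words, but it still has no notion of what the sentence is about.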

The technology is just beginning to come into its own in the knowledge management and document classification arena, as companies such as Inxight, Microsoft, IBM and InQuizit Technologies roll out commercial products built on it. Microsoft, for one, has incorporated part of its MindNet semantic knowledge database into the Office 97 grammar checker. And Inxight, a Xerox Palo Alto Research Center spin-off, recently unveiled LinguistX, a full-fledged natural language processing platform.

Several factors are converging to spur the commercial emergence of this technology. For starters, natural language processing has, after many years of research, matured to a marketable point. Though its roots date back to government intelligence work in the 1950s, the foundation of modern natural language processing didn't surface until the 1970s, when researchers discovered that language patterns could be encapsulated in mathematical formulas. The subsequent rise of computational linguistics, which focuses on applying the rules and nuances of language within the constraints of a computing environment, has also propelled development forward. In addition, gains in processing power are bringing this computationally intensive technology closer to real-world applications.

However significant the advances of recent decades, the technology as it exists today is far from perfect. "At Inxight, we're still just trying to recognize linguistic patterns, but the next step, the Holy Grail, is for the computer to truly understand language," said Hershey. "The problem is that human language is both ambiguous and complex, and lots of the meaning in natural language comes from outside knowledge."

Though even optimists predict that classification systems based on natural language processing are at least three to five years away, these are definitely times to watch as the technology makes inroads into other product areas, such as search engines and document summarization. "Once the technology starts showing up in content management and file systems, you'll see the economy of scale," said Don DePalma, senior analyst at Forrester Research. "That will give these natural-language guys an entrée into selling more back-end processing systems."

Knowledge Management • 29160 Heathercliff Road, Suite 200, Malibu, CA 90265 • Phone: 310-589-3100
