Medical Document Categorization Using a Priori Knowledge

Lukasz Itert¹,², Wlodzislaw Duch¹,³ and John Pestian²
¹Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
²Department of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio, USA, and
³School of Computer Engineering, Nanyang Technological University, Singapore.

Abstract.

A significant part of medical data remains stored as unstructured text. Semantic search requires the introduction of markup tags. Experts use their background knowledge to categorize new documents, and knowing the category of a document helps to disambiguate words and acronyms. A model of document similarity that includes a priori knowledge and captures the intuition of an expert is introduced. It has only a few parameters, which may be evaluated using linear programming techniques. Applied to the categorization of medical discharge summaries, this approach provided a simpler and much more accurate model than alternative text categorization approaches.

Reference: Itert L, Duch W, Pestian J. Medical document categorization using a priori knowledge. Lecture Notes in Computer Science, Vol. 3696, pp. 641-646, 2005.
