Influence of a priori Knowledge on Medical Document Categorization

Lukasz Itert (1,2), Wlodzislaw Duch (1,3) and John Pestian (2)
(1) Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
(2) Department of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio, USA.
(3) School of Computer Engineering, Nanyang Technological University, Singapore.


A significant part of medical data is still stored as unstructured text, and semantic search requires the introduction of markup tags. Medical concepts discovered in hospital discharge summaries are used to create several feature spaces. Experts use their background knowledge to categorize new documents, and knowing the category of a document helps them disambiguate words and acronyms. A model of document similarity to reference sources that captures some of these expert intuitions is introduced. Parameters of the model are determined using linear programming techniques. This approach is applied to the categorization of medical discharge summaries, yielding a simpler and more accurate model than alternative text categorization approaches.
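The core idea of scoring a document by its similarity to per-category reference sources can be illustrated with a minimal sketch. It assumes that each document and each reference source has already been reduced to a list of medical concept identifiers, and it uses a weighted cosine similarity with nearest-reference assignment. This is an illustration only, not the authors' model: the paper determines the model parameters with linear programming, and that fitting step is not reproduced here. The per-concept `weights` are simply passed in (defaulting to 1.0), and all function names and data below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def concept_vector(concepts: List[str]) -> Counter:
    """Bag-of-concepts frequency vector for one document or reference source."""
    return Counter(concepts)


def weighted_cosine(doc: Counter, ref: Counter, weights: Dict[str, float]) -> float:
    """Cosine similarity in which concept k contributes with weight weights.get(k, 1.0).

    Weights are assumed non-negative (hypothetical stand-ins for the
    parameters the paper fits with linear programming).
    """
    keys = set(doc) | set(ref)
    # Counter returns 0 for missing keys, so concepts absent from one side add nothing.
    dot = sum(weights.get(k, 1.0) * doc[k] * ref[k] for k in keys)
    norm_doc = math.sqrt(sum(weights.get(k, 1.0) * doc[k] ** 2 for k in doc))
    norm_ref = math.sqrt(sum(weights.get(k, 1.0) * ref[k] ** 2 for k in ref))
    if norm_doc == 0 or norm_ref == 0:
        return 0.0
    return dot / (norm_doc * norm_ref)


def categorize(
    doc_concepts: List[str],
    references: Dict[str, List[str]],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, Dict[str, float]]:
    """Assign the category whose reference source is most similar to the document."""
    weights = weights or {}
    doc = concept_vector(doc_concepts)
    scores = {
        category: weighted_cosine(doc, concept_vector(ref), weights)
        for category, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference sources: concepts describing each category.
references = {
    "asthma": ["wheezing", "albuterol", "bronchospasm", "cough"],
    "pneumonia": ["fever", "infiltrate", "cough", "antibiotic"],
}
label, scores = categorize(["cough", "wheezing", "albuterol"], references)
print(label)  # asthma
```

Raising the weight of a concept increases its influence on both the overlap and the norms, which is roughly the kind of expert-like emphasis on characteristic concepts that a fitted set of parameters could encode.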

Reference: Itert L, Duch W, Pestian J, Influence of a priori Knowledge on Medical Document Categorization, IEEE Symposium on Computational Intelligence in Data Mining, IEEE Press, April 2007, pp. 163-170.
