Wlodzislaw Duch1,2,
Pawel Matykiewicz1,3,
John Pestian3
1Department of Informatics,
Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
2School of Computer Engineering,
Nanyang Technological University, Singapore.
3Department of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio, USA, and
Abstract.
Brain processes responsible for understanding language are approximated by spreading activation in semantic networks, providing enhanced representations that involve concepts not found directly in the text. Approximation of this process is of great practical and theoretical interest. Snapshots of the activations of various concepts in the brain, spreading through an associative network, may be captured in a vector model. Medical ontologies are used to identify concepts of specific semantic types in the text and to add related concepts to each of them, providing expanded vector representations. To avoid rapid growth of the extended feature space, after each step only the most useful features, those that improve document clustering, are retained. Short hospital discharge summaries are used to illustrate how this process works on real, very noisy data. Results show significantly improved clustering and classification accuracy. Although better approximations to the spreading of neural activations may be devised, the practical approach presented in this paper helps to discover pathways used by the brain to process specific concepts.
Keywords: Natural language processing; Semantic networks; Spreading activation networks; Medical ontologies; vector models in NLP
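The expand-then-prune loop outlined in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `RELATED` dictionary stands in for a medical ontology, documents are plain sets of concepts, and `cluster_score` (mean Jaccard similarity within a label minus the mean across labels) is a hypothetical stand-in for whatever clustering criterion the authors use.

```python
from itertools import combinations

# Hypothetical toy relations standing in for a medical ontology (assumption).
RELATED = {
    "asthma": ["wheezing", "bronchus"],
    "bronchitis": ["bronchus", "cough"],
    "pneumonia": ["lung", "fever"],
    "otitis": ["ear", "fever"],
}


def expand(doc_concepts, related):
    """One spreading step: concepts found in the text plus their direct neighbours."""
    expanded = set(doc_concepts)
    for concept in doc_concepts:
        expanded.update(related.get(concept, []))
    return expanded


def jaccard(a, b):
    """Jaccard similarity of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_score(docs, labels):
    """Assumed criterion: mean same-label similarity minus mean cross-label similarity."""
    same, diff = [], []
    for i, j in combinations(range(len(docs)), 2):
        bucket = same if labels[i] == labels[j] else diff
        bucket.append(jaccard(docs[i], docs[j]))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(same) - mean(diff)


def expand_and_select(docs, labels, related):
    """Add ontology-derived features one at a time, keeping only those that raise the score."""
    base = [set(d) for d in docs]
    full = [expand(d, related) for d in base]
    # Candidate features are those introduced by the expansion step.
    candidates = sorted(set().union(*full) - set().union(*base))
    kept = [set(d) for d in base]
    for feature in candidates:
        trial = [k | ({feature} if feature in f else set()) for k, f in zip(kept, full)]
        if cluster_score(trial, labels) > cluster_score(kept, labels):
            kept = trial
    return kept


docs = [["asthma"], ["bronchitis"], ["pneumonia"], ["otitis"]]
labels = ["airway", "airway", "infection", "infection"]
selected = expand_and_select(docs, labels, RELATED)
```

In this toy run only the added concepts shared by documents with the same label survive the pruning, which is the intended effect of retaining features that improve clustering; a real system would repeat the step several times and use a proper ontology and clustering measure.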
Neural Networks 21(10), 1500-1510, 2008
Preprint for comments in PDF, 771 KB.