Wlodzislaw Duch1,2, Pawel Matykiewicz1,3, John Pestian3
1Department of Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Torun, Poland.
2School of Computer Engineering, Nanyang Technological University, Singapore.
3Department of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio, USA.
Brain processes responsible for understanding language are approximated by spreading activation in semantic networks, providing enhanced representations that involve concepts not found directly in the text. Approximation of this process is of great practical and theoretical interest. Snapshots of the activations of various concepts spreading through an associative network in the brain may be captured in a vector model. Medical ontologies are used to identify concepts of specific semantic types in the text and to add related concepts to each of them, providing expanded vector representations. To avoid rapid growth of the extended feature space, after each step only the most useful features, those that improve document clustering, are retained. Short hospital discharge summaries are used to illustrate how this process works on real, very noisy data. Results show significantly improved clustering and classification accuracy. Although better approximations to the spreading of neural activations may be devised, the practical approach presented in this paper helps to discover pathways used by the brain to process specific concepts.
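The expansion step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, decay factor, and retention threshold are hypothetical, and the toy "ontology" stands in for relations drawn from a real medical ontology.

```python
# Sketch (hypothetical names/parameters) of one step of spreading activation:
# each concept found in a document activates its ontology neighbours, and only
# expansion features with sufficient activation are retained, limiting growth
# of the extended feature space.

def expand_features(doc_concepts, related, decay=0.5, threshold=0.25):
    """doc_concepts: {concept: weight}; related: {concept: [neighbours]}."""
    activation = dict(doc_concepts)
    for concept, weight in doc_concepts.items():
        for neighbour in related.get(concept, []):
            # neighbours receive a decayed share of the source activation
            activation[neighbour] = activation.get(neighbour, 0.0) + weight * decay
    # keep only sufficiently activated features (a stand-in for the paper's
    # criterion of retaining features that improve clustering)
    return {c: w for c, w in activation.items() if w >= threshold}

# Toy ontology relations and a two-concept document vector
ontology = {"aspirin": ["analgesic", "NSAID"], "fever": ["symptom"]}
doc = {"aspirin": 1.0, "fever": 0.4}
expanded = expand_features(doc, ontology)
```

Here "analgesic" and "NSAID" enter the representation with activation 0.5, while the weakly activated "symptom" (0.2) falls below the threshold and is pruned, keeping the feature space compact.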
Keywords: Natural language processing; Semantic networks; Spreading activation networks; Medical ontologies; Vector models in NLP
Neural Networks 21(10), 1500-1510, 2008