Identification, expansion, and disambiguation of acronyms in biomedical texts

    Authors

    D. B. Bracewell; S. Russell; A. S. Wu

    Keywords

    acronyms; text cleansing; information retrieval; natural language processing; Computer Science, Information Systems; Computer Science, Theory & Methods

    Abstract

    With the ever-growing amount of biomedical literature, there is an increasing desire to use sophisticated language processing algorithms to mine these texts. To use these algorithms, we must first deal with acronyms, abbreviations, and misspellings. In this paper we look at identifying, expanding, and disambiguating acronyms in biomedical texts. We break the task into three modular steps: identification, expansion, and disambiguation. For identification we use a hybrid approach composed of a naive Bayesian classifier and a couple of handcrafted rules, which achieves 99.96% accuracy with a small training set. We divide expansion into two categories, local and global. For local expansion we use windowing and longest common subsequence to generate the possible expansions. Global expansion requires an acronym database. To disambiguate the different candidate expansions we use WordNet and semantic similarity. Overall we obtain recall and precision of over 91%.

    Journal Title

    Parallel and Distributed Processing and Applications - ISPA 2005 Workshops

    Volume

    3759

    Publication Date

    1-1-2005

    Document Type

    Article

    Language

    English

    First Page

    186

    Last Page

    195

    WOS Identifier

    WOS:000233739300021

    ISSN

    0302-9743 (ISSN); 3-540-29770-7 (ISBN)
