ANSWER: Approximate name search with errors in large databases by a novel approach based on prefix-dictionary

    Authors

    O. Kursun; A. Koufakou; A. Wakchaure; M. Georgiopoulos; K. Reynolds; R. Eaglin

    Abbreviated Journal Title

    Int. J. Artif. Intell. Tools

    Keywords

    alias finding; data mining; data querying; data sharing; dirty data; duplicate elimination; edit distance; fuzzy name matching; record matching; soundex; Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications

    Abstract

    The obvious need for using modern computer networking capabilities to enable the effective sharing of information has resulted in data-sharing systems, which store and manage large amounts of data. These data need to be effectively searched and analyzed. More specifically, in the presence of dirty data, a search for specific information by a standard query (e.g., search for a name that is misspelled or mistyped) does not return all needed information, as required in homeland security, criminology, and medical applications, amongst others. Different techniques, such as soundex, phonix, n-grams, and edit distance, have been used to improve the matching rate in these name-matching applications. These techniques have demonstrated varying levels of success, but there is a pressing need for name-matching approaches that provide high levels of accuracy in matching names, while at the same time maintaining low computational complexity. In this paper, such a technique, called ANSWER, is proposed and its characteristics are discussed. Our results demonstrate that ANSWER possesses high accuracy as well as high speed, and is superior to other techniques for retrieving fuzzy name matches in large databases.
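
    As an illustration of the edit-distance baseline named in the abstract, the following is a minimal sketch of fuzzy name matching by Levenshtein distance. It is not the ANSWER algorithm, whose prefix-dictionary design is not described in this record. The function names `edit_distance` and `fuzzy_matches`, the `max_errors` threshold, and the sample names are assumptions made for the example. The linear scan over all names is the kind of cost that motivates faster approaches.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, names: Iterable[str], max_errors: int = 1) -> List[str]:
    """Return every name within max_errors edits of query (case-insensitive).
    Hypothetical helper: scans the whole list, so cost grows with database size."""
    q = query.lower()
    return [n for n in names if edit_distance(q, n.lower()) <= max_errors]


if __name__ == "__main__":
    database = ["Johnson", "Jonson", "Jackson", "Smith"]  # made-up sample data
    print(fuzzy_matches("Jonson", database, max_errors=1))
```

    An exact-match query for "Jonson" would miss "Johnson"; allowing one edit returns both spellings.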

    Journal Title

    International Journal on Artificial Intelligence Tools

    Volume

    15

    Issue/Number

    5

    Publication Date

    1-1-2006

    Document Type

    Article

    Language

    English

    First Page

    839

    Last Page

    848

    WOS Identifier

    WOS:000241502500011

    ISSN

    0218-2130
