Keywords

natural language processing, concept extraction, question answering, knowledge acquisition, knowledge representation, concept network

Abstract

Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real-world information extraction contexts. The task is to go beyond the representation of text documents as "bags of words" or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text, and to organize it in a manner that supports reasoning about the concepts represented. The issue presented is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means for acquiring from text documents both the underlying concepts and their interrelationships. Specifically, a syntax-based formalism for representing atomic propositions that are extracted from text documents is presented, together with a method for constructing a network of concept nodes for indexing such logical forms based on the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. This formalism was implemented in the Semantic Extractor (SEMEX) research tool and was tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which operated upon the AQUAINT corpus of newswire documents. After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems.
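
The idea of indexing atomic propositions by the discourse entities they contain, then answering a question pattern that carries a free variable, can be illustrated with a minimal Java sketch. Everything here is an assumption made for illustration: the class name, the reduction of a proposition to a (subject, relation, object) triple, the `?` prefix for free variables, and the toy facts are hypothetical and are not taken from the SEMEX sources. Boolean combinations of patterns and WordNet-based matching are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the SEMEX implementation.
final class ConceptNetworkSketch {

    // An atomic proposition, simplified here to a (subject, relation, object) triple.
    static final class Proposition {
        final String subject;
        final String relation;
        final String object;

        Proposition(String subject, String relation, String object) {
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }
    }

    // Concept network: each discourse entity indexes the propositions that mention it.
    private final Map<String, List<Proposition>> index = new HashMap<>();

    void add(String subject, String relation, String object) {
        Proposition p = new Proposition(subject, relation, object);
        index.computeIfAbsent(subject, k -> new ArrayList<>()).add(p);
        if (!object.equals(subject)) {
            index.computeIfAbsent(object, k -> new ArrayList<>()).add(p);
        }
    }

    // Assumed convention: a term starting with '?' is a free variable.
    private static boolean isVariable(String term) {
        return term.startsWith("?");
    }

    // Matches one question pattern in which exactly one of subject/object is a
    // free variable, and returns every binding found for that variable.
    List<String> answer(String subject, String relation, String object) {
        List<String> answers = new ArrayList<>();
        // Start the search at the concept node of the bound argument.
        String anchor = isVariable(subject) ? object : subject;
        for (Proposition p : index.getOrDefault(anchor, Collections.emptyList())) {
            if (!p.relation.equals(relation)) {
                continue;
            }
            if (isVariable(subject) && p.object.equals(object)) {
                answers.add(p.subject);
            } else if (isVariable(object) && p.subject.equals(subject)) {
                answers.add(p.object);
            }
        }
        return answers;
    }
}
```

With the toy facts ("Ottawa", "capital_of", "Canada") and ("Canada", "located_in", "North America") added, the pattern ("?x", "capital_of", "Canada") binds `?x` to "Ottawa". Only the propositions indexed under "Canada" are scanned, which is the point of indexing by entity.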

Graduation Date

2006

Semester

Spring

Advisor

Gomez, Fernando

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0000985

URL

http://purl.fcla.edu/fcla/etd/CFE0000985

Language

English

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

Glinos_Demetrios_G_200605_Phd_Attachment_1.pdf (740 kB)
KstAnswer.java

Glinos_Demetrios_G_200605_PhD_Attachment_2.pdf (679 kB)
KstChunk.java; KstConcept.java; KstDictionary.java

Glinos_Demetrios_G_200606_PhD_Attachment_3.pdf (678 kB)
KstDiscourse.java; KstExtract.java; KstFile.java; KstSplit.java; KstTuple.java

Glinos_Demetrios_G_200605_PhD_Attachment_4.pdf (703 kB)
KstGroup.java; semex.java; semexJNI.java; semexJNI.c; semexJNI.h

Glinos_Demetrios_G_200605_PhD_Attachment_5.pdf (632 kB)
Kst.Util.java
