Keywords
natural language processing, concept extraction, question answering, knowledge acquisition, knowledge representation, concept network
Abstract
Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real-world information extraction contexts. The task is to go beyond representing text documents as "bags of words" or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text and to organize it in a manner that supports reasoning about the concepts represented. The central problem is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means of acquiring from text documents both the underlying concepts and their interrelationships. Specifically, it presents a syntax-based formalism for representing atomic propositions extracted from text documents, together with a method for constructing a network of concept nodes that indexes these logical forms by the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. The formalism was implemented in the Semantic Extractor (SEMEX) research tool and tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which used the AQUAINT corpus of newswire documents. After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems.
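The indexing and question-pattern ideas described in the abstract can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name ConceptNetworkSketch, the reduction of an atomic proposition to a subject/predicate/object triple, and the "?" prefix marking a free variable are hypothetical and are not taken from SEMEX or its Kst* source files. WordNet-based matching and Boolean combinations of patterns are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not the SEMEX implementation. */
final class ConceptNetworkSketch {

    /** An atomic proposition, simplified here to a subject/predicate/object triple. */
    static final class Proposition {
        final String subject;
        final String predicate;
        final String object;

        Proposition(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Concept nodes: each discourse entity maps to the propositions that mention it. */
    private final Map<String, List<Proposition>> index = new HashMap<>();

    /** Indexes a proposition under both of its discourse entities. */
    void add(Proposition p) {
        index.computeIfAbsent(p.subject, k -> new ArrayList<>()).add(p);
        if (!p.object.equals(p.subject)) {
            index.computeIfAbsent(p.object, k -> new ArrayList<>()).add(p);
        }
    }

    /** Assumed convention: a term starting with '?' is a free variable. */
    static boolean isVariable(String term) {
        return term.startsWith("?");
    }

    /**
     * Matches a question pattern with exactly one free variable (subject or object)
     * against the index and returns the values bound to that variable.
     */
    List<String> answer(Proposition pattern) {
        boolean subjectFree = isVariable(pattern.subject);
        boolean objectFree = isVariable(pattern.object);
        List<String> answers = new ArrayList<>();
        if (subjectFree == objectFree) {
            return answers; // zero or two free variables: outside this sketch
        }
        // Look up candidates through the entity that the pattern does specify.
        String key = subjectFree ? pattern.object : pattern.subject;
        for (Proposition p : index.getOrDefault(key, Collections.<Proposition>emptyList())) {
            if (!p.predicate.equals(pattern.predicate)) {
                continue;
            }
            if (subjectFree && p.object.equals(pattern.object)) {
                answers.add(p.subject);
            } else if (objectFree && p.subject.equals(pattern.subject)) {
                answers.add(p.object);
            }
        }
        return answers;
    }
}
```

In this sketch, a question such as "Who discovered radium?" would be posed as the pattern ("?who", "discover", "radium"), and the lookup starts from the concept node for "radium" rather than scanning every stored proposition. A fuller treatment along the lines of the abstract would also accept predicates and entities related through WordNet synonyms, hypernyms, hyponyms, and antonyms.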
Graduation Date
2006
Semester
Spring
Advisor
Gomez, Fernando
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0000985
URL
http://purl.fcla.edu/fcla/etd/CFE0000985
Language
English
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Glinos, Demetrios, "Syntax-based Concept Extraction For Question Answering" (2006). Electronic Theses and Dissertations. 793.
https://stars.library.ucf.edu/etd/793
KstAnswer.java
Glinos_Demetrios_G_200605_PhD_Attachment_2.pdf (679 kB)
KstChunk.java; KstConcept.java; KstDictionary.java
Glinos_Demetrios_G_200606_PhD_Attachment_3.pdf (678 kB)
KstDiscourse.java; KstExtract.java; KstFile.java; KstSplit.java; KstTuple.java
Glinos_Demetrios_G_200605_PhD_Attachment_4.pdf (703 kB)
KstGroup.java; semex.java; semexJNI.java; semexJNI.c; semexJNI.h
Glinos_Demetrios_G_200605_PhD_Attachment_5.pdf (632 kB)
Kst.Util.java