Keywords
Natural language processing (Computer science); Parsing (Computer grammar); Sequential machine theory
Abstract
Larger-first partial parsing is a primarily top-down approach to partial parsing that is the opposite of current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations, primarily in descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that this larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving a higher accuracy than the easy-first approach. The automata of each network were developed by an empirical analysis of several sources and are presented here in detail.
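The cascade described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the two toy automata, their transition tables, the labels CLAUSE and NP, and the names Token, DFA, and run_cascade are all hypothetical, and the part-of-speech tags are assumed to follow Penn Treebank conventions. The sketch shows only the general idea of applying deterministic finite-state automata over part-of-speech tags, larger relations first, with each automaton adding one level of structural tags.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    word: str
    pos: str                                   # part-of-speech tag from a tagger
    tags: list = field(default_factory=list)   # structural tags, outermost first


class DFA:
    """A toy deterministic automaton over part-of-speech tags (hypothetical)."""

    def __init__(self, label, start, accepting, transitions):
        self.label = label
        self.start = start
        self.accepting = accepting
        self.transitions = transitions          # {(state, pos_tag): next_state}

    def longest_match(self, tokens, i):
        """Return the exclusive end index of the longest accepted span at i, or None."""
        state, best, j = self.start, None, i
        while j < len(tokens):
            state = self.transitions.get((state, tokens[j].pos))
            if state is None:
                break
            j += 1
            if state in self.accepting:
                best = j
        return best

    def apply(self, tokens):
        """Scan left to right and append this automaton's label to matched tokens."""
        i = 0
        while i < len(tokens):
            end = self.longest_match(tokens, i)
            if end is None:
                i += 1
                continue
            for tok in tokens[i:end]:
                tok.tags.append(self.label)
            i = end


# Assumed toy grammar: a clause is IN (DT JJ*)? NN VBD; a base phrase is (DT|JJ)* NN+.
CLAUSE = DFA("CLAUSE", 0, {4}, {
    (0, "IN"): 1, (1, "DT"): 2, (1, "NN"): 3,
    (2, "JJ"): 2, (2, "NN"): 3, (3, "VBD"): 4,
})
NP = DFA("NP", 0, {2}, {
    (0, "DT"): 1, (0, "JJ"): 1, (0, "NN"): 2,
    (1, "JJ"): 1, (1, "NN"): 2, (2, "NN"): 2,
})

# Larger-first ordering: the clause automaton runs before the phrase automaton.
CASCADE = [CLAUSE, NP]


def run_cascade(tokens, cascade):
    for automaton in cascade:
        automaton.apply(tokens)
    return tokens


if __name__ == "__main__":
    sentence = [("the", "DT"), ("cat", "NN"), ("ran", "VBD"), ("because", "IN"),
                ("the", "DT"), ("big", "JJ"), ("dog", "NN"), ("barked", "VBD")]
    for tok in run_cascade([Token(w, p) for w, p in sentence], CASCADE):
        print(tok.word, tok.pos, tok.tags)
```

Because the clause automaton runs first, a token such as "dog" receives the larger CLAUSE tag before the smaller NP tag, so each token's tag list reads from the outermost relation inward. The four networks named in the abstract would each contribute their own automata in such a cascade; only two stages are sketched here.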
Notes
If this is your thesis or dissertation and you want to learn how to access it, or would like more information about readership statistics, contact us at STARS@ucf.edu
Graduation Date
2003
Advisor
Gómez, Fernando
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Department of Electrical Engineering and Computer Science
Degree Program
Electrical Engineering and Computer Science
Format
Pages
215 p.
Language
English
Rights
Written permission granted by copyright holder to the University of Central Florida Libraries to digitize and distribute for nonprofit, educational purposes.
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
Identifier
DP0000760
Subjects
Dissertations, Academic -- Engineering; Engineering -- Dissertations, Academic
STARS Citation
Van Delden, Sebastian Alexander, "Larger-first partial parsing" (2003). Retrospective Theses and Dissertations. 1059.
https://stars.library.ucf.edu/rtd/1059
Contributor (Linked data)
University of Central Florida. College of Engineering and Computer Science (Q7895235)
University of Central Florida. College of Engineering and Computer Science [VIAF]
University of Central Florida. College of Engineering and Computer Science [LC]
Accessibility Status
Searchable text