Keywords

Natural language processing (Computer science); Parsing (Computer grammar); Sequential machine theory

Abstract

Larger-first partial parsing is a primarily top-down approach to partial parsing that is opposite to current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations, primarily in descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that this larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving a higher accuracy than the easy-first approach. The automata of each network were developed by an empirical analysis of several sources and are presented here in detail.
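The cascade idea in the abstract can be illustrated with a minimal sketch. It assumes a sentence that has already been part-of-speech tagged and runs two toy automata in larger-first order, a clause-level one followed by a phrase-level one, each appending one level of structural tag to the tokens it covers. Everything named here is hypothetical and chosen only for illustration: the `Token` and `DFA` classes, the `run_cascade` function, the `SUBCL` and `NP` labels, the Penn-style tags, and both transition tables. None of it reproduces the dissertation's actual networks.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    word: str
    pos: str
    # Structural tags, outermost (largest relation) first.
    tags: list = field(default_factory=list)


class DFA:
    """A deterministic automaton over part-of-speech symbols (illustrative)."""

    def __init__(self, label, start, accepting, transitions):
        self.label = label              # structural tag this automaton assigns
        self.start = start
        self.accepting = accepting      # set of accepting states
        self.transitions = transitions  # (state, pos) -> next state

    def longest_match(self, tokens, i):
        """Return the exclusive end of the longest accepted span starting at i, or None."""
        state, end = self.start, None
        for j in range(i, len(tokens)):
            state = self.transitions.get((state, tokens[j].pos))
            if state is None:           # no transition: the automaton rejects here
                break
            if state in self.accepting:
                end = j + 1
        return end


def run_cascade(tokens, cascade):
    """Apply each automaton in order; earlier automata add the outer tags."""
    for dfa in cascade:
        i = 0
        while i < len(tokens):
            end = dfa.longest_match(tokens, i)
            if end is None:
                i += 1
                continue
            for tok in tokens[i:end]:
                tok.tags.append(dfa.label)
            i = end
    return tokens


# Hypothetical clause automaton: IN (DT | JJ | NN)* VBD
clause_dfa = DFA(
    "SUBCL", 0, {2},
    {(0, "IN"): 1, (1, "DT"): 1, (1, "JJ"): 1, (1, "NN"): 1, (1, "VBD"): 2},
)

# Hypothetical base noun phrase automaton: (DT | JJ)* NN+
phrase_dfa = DFA(
    "NP", 0, {2},
    {(0, "DT"): 1, (0, "JJ"): 1, (0, "NN"): 2,
     (1, "DT"): 1, (1, "JJ"): 1, (1, "NN"): 2, (2, "NN"): 2},
)

sentence = [
    Token("the", "DT"), Token("dog", "NN"), Token("barked", "VBD"),
    Token("because", "IN"), Token("the", "DT"), Token("cat", "NN"),
    Token("ran", "VBD"),
]

# Larger-first: the clause automaton runs before the phrase automaton.
tagged = run_cascade(sentence, [clause_dfa, phrase_dfa])
for tok in tagged:
    print(tok.word, tok.pos, tok.tags)
```

Because the clause automaton runs first, a token inside the subordinate clause carries the clause tag as its outer level and any phrase tag as an inner level, which is one simple way to represent a hierarchy of structural tags per token.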

Notes

If this is your thesis or dissertation and you want to learn how to access it, or want more information about readership statistics, contact us at STARS@ucf.edu.

Graduation Date

2003

Advisor

Gómez, Fernando

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Department of Electrical Engineering and Computer Science

Degree Program

Electrical Engineering and Computer Science

Format

PDF

Pages

215 p.

Language

English

Rights

Written permission granted by copyright holder to the University of Central Florida Libraries to digitize and distribute for nonprofit, educational purposes.

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

Identifier

DP0000760

Subjects

Dissertations, Academic -- Engineering; Engineering -- Dissertations, Academic

Accessibility Status

Searchable text
