Abstract
Text summarization is a rapidly growing field that has seen many recent innovations. End-to-end models built on the sequence-to-sequence architecture achieve high scores on standard datasets according to automatic metrics. However, they frequently generate summaries that are factually inconsistent with the original article. This is a critical problem that must be solved before such summaries can be used in real-world applications. In addition, these models do not generalize well to new domains, especially those with few training examples. In this dissertation, we propose to explicitly separate the two steps of content selection and surface realization in summarization. Content selection is the process of choosing important words, phrases, or sentences from the document. Surface realization is the transformation of the selected content into a coherent, grammatical summary. This paradigm more closely follows how humans summarize: a person will often first identify the important ideas in an article (content selection) and then write a summary based on those ideas (surface realization). We make several contributions to the summarization field using this paradigm of separate content selection and surface realization steps. First, we present two techniques focusing on content selection: a model that ranks both single sentences and pairs of sentences in a unified space, and a cascade approach that highlights salient words and phrases within sentences. Second, we present several studies on sentence fusion in summarization: an analysis of how well state-of-the-art summarizers perform sentence fusion, a dataset containing points of correspondence between sentences, and a method that uses these points of correspondence to improve sentence fusion.
Finally, we introduce two methods with separate content selection and surface realization steps for multi-document summarization: a technique that adapts single-document summarizers to the multi-document setting using the Maximal Marginal Relevance (MMR) algorithm, and a conceptual framework that models asynchronous endorsement between synopses and documents.
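For readers unfamiliar with MMR, the sketch below shows the standard greedy selection rule in its generic form. At each step it picks the candidate that maximizes lam * relevance(c) - (1 - lam) * max similarity(c, s) over already selected items s, trading off importance against redundancy. This is only an illustration of the textbook criterion, not the dissertation's adaptation of single-document summarizers. The names `mmr_select` and `jaccard`, the word-overlap Jaccard similarity, the query-based relevance, and the weight `lam=0.6` are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Hypothetical similarity: Jaccard overlap of lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_select(
    candidates: Sequence[str],
    relevance: Callable[[str], float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.6,
) -> List[str]:
    """Greedy MMR: select up to k candidates, balancing relevance vs. redundancy."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c: str) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance(c) - (1 - lam) * redundancy
        best = max(remaining, key=score)  # ties go to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage: sentences pooled from several hypothetical source documents.
sentences = [
    "the storm hit the coast on monday",
    "the storm hit the coast monday night",  # near-duplicate of the first
    "officials ordered evacuations along the coast",
]
query = "storm coast evacuations"
picked = mmr_select(sentences, lambda s: jaccard(s, query), jaccard, k=2)
print(picked)  # the near-duplicate second sentence is skipped
```

All three sentences are equally relevant to the query here, so the second pick is decided by the redundancy penalty. That is the property that makes MMR a natural fit when many input documents repeat the same facts.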
Graduation Date
2020
Semester
Fall
Advisor
Liu, Fei
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0008346; DP0023783
URL
https://purls.library.ucf.edu/go/DP0023783
Language
English
Release Date
December 2021
Length of Campus-only Access
1 year
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Lebanoff, Logan, "Separating Content Selection from Surface Realization in Neural Text Summarization" (2020). Electronic Theses and Dissertations, 2020-2023. 375.
https://stars.library.ucf.edu/etd2020/375