Abstract

Text summarization is a rapidly growing field with many new innovations. End-to-end models using the sequence-to-sequence architecture achieve high scores according to automatic metrics on standard datasets. However, they frequently generate summaries that are factually inconsistent with the original article, a vital problem to be solved before the summaries can be used in real-world applications. In addition, they are not generalizable to new domains, especially those with few training examples. In this dissertation, we propose to explicitly separate the two steps of content selection and surface realization in summarization. Content selection is the process of choosing important words/phrases/sentences from the document. Surface realization is the transformation of the selected content into a coherent, grammatical text summary. This paradigm more closely follows human patterns of summarization, as a human will often find important ideas within the article (content selection), and then write out a summary based on those ideas (surface realization). We make several contributions to the summarization field using this paradigm of separate content selection and surface realization steps. First, we present two techniques focusing on content selection: a model that can rank both single sentences and pairs of sentences in a unified space and a cascade approach that highlights salient words/phrases from sentences. Second, we present several studies on sentence fusion in summarization: an analysis of the quality of state-of-the-art summarizers for performing sentence fusion, a dataset containing points of correspondence between sentences, and a method utilizing these points of correspondence to improve sentence fusion. Finally, we introduce two methods with separate content selection and surface realization steps for multi-document summarization: a technique to adapt single document summarizers to the multi-document setting based on the Maximal Marginal Relevance (MMR) algorithm and a conceptual framework to model asynchronous endorsement between synopses and documents.

Notes

If this is your thesis or dissertation and you want to learn how to access it, or want more information about readership statistics, contact us at STARS@ucf.edu.

Graduation Date

2020

Semester

Fall

Advisor

Liu, Fei

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0008346; DP0023783

URL

https://purls.library.ucf.edu/go/DP0023783

Language

English

Release Date

December 2021

Length of Campus-only Access

1 year

Access Status

Doctoral Dissertation (Open Access)

STARS Citation

Lebanoff, Logan, "Separating Content Selection from Surface Realization in Neural Text Summarization" (2020). Electronic Theses and Dissertations, 2020-2023. 375.
https://stars.library.ucf.edu/etd2020/375

Included in

Computer Sciences Commons
