Scopus Export 2010-2014

Discovering Similar Passages Within Large Text Documents

Demetrios Glinos, University of Central Florida

Keywords

passage retrieval; plagiarism detection; text alignment

Abstract

We present a novel general method for discovering similar passages within large text documents based on adapting and extending the well-known Smith-Waterman dynamic programming local sequence alignment algorithm. We extend that algorithm for large document analysis by defining: (a) a recursive procedure for discovering multiple non-overlapping aligned passages within a given document pair; (b) a matrix splicing method for processing long texts; (c) a chaining method for combining sequence strands; and (d) an inexact similarity measure for determining token matches. We show that an implementation of this method is computationally efficient and produces very high precision with good recall for several types of order-based plagiarism and that it achieves higher overall performance than the best reported methods against the PAN 2013 text alignment test corpus. © 2014 Springer International Publishing.
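For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the standard Smith-Waterman local alignment applied to word tokens. It is not the paper's implementation: it uses exact token equality rather than the inexact similarity measure, finds only the single best-scoring passage, and omits the recursive, matrix-splicing, and chaining extensions. The function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are illustrative assumptions, not values taken from the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Textbook Smith-Waterman over two token lists.

    Returns (best_score, (a_start, a_end), (b_start, b_end)), where the
    spans are end-exclusive token indices of the best local alignment.
    Scoring values are illustrative assumptions, not the paper's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # skip a token in a
                H[i][j - 1] + gap,    # skip a token in b
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1

    return best, (i, best_pos[0]), (j, best_pos[1])


source = "the quick brown fox jumps".split()
suspicious = "a lazy dog saw the quick brown fox run".split()
score, src_span, susp_span = smith_waterman(source, suspicious)
print(score, source[src_span[0]:src_span[1]], susp_span)
```

The full score matrix costs time and memory proportional to the product of the two document lengths, which is presumably why a splicing method is needed for long texts.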

Publication Date

1-1-2014

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

8685 LNCS

Number of Pages

98-109

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1007/978-3-319-11382-1_10

Copyright Status

Unknown

Scopus ID

84906777083 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84906777083

STARS Citation

Glinos, Demetrios, "Discovering Similar Passages Within Large Text Documents" (2014). Scopus Export 2010-2014. 9206.
https://stars.library.ucf.edu/scopus2010/9206

This document is currently not available here.
