Title

A Hybrid Architecture for Plagiarism Detection: Notebook for PAN at CLEF 2014

Abstract

We present a hybrid plagiarism detection architecture that handles the two principal forms of text plagiarism. For order-preserving plagiarism, such as paraphrasing and modified cut-and-paste, it contains a text alignment component that is robust against changes in word choice and phrasing that do not alter the basic ordering. For non-order-based plagiarism, such as random phrase reordering and summarization, it contains a two-stage cluster detection component. The first stage identifies a maximal passage in the suspect document that is related to the source document, while the second stage determines whether the suspect passage corresponds to the entire source document or just to a passage within it. Three implementations of this architecture, sharing a common text alignment component and using three different cluster detection components, participated in the PAN 2014 Text Alignment task and achieved very high precision, recall, and overall plagiarism detection scores.
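
The two paths described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size, the overlap threshold, and the use of Python's difflib and plain vocabulary overlap are all assumptions chosen only to make the idea concrete.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def aligned_fraction(suspect, source):
    """Order-preserving path (assumed stand-in): fraction of suspect tokens
    that lie in in-order matching blocks shared with the source."""
    a, b = tokens(suspect), tokens(source)
    if not a:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(a)


def related_span(suspect, source, window=5, min_overlap=0.6):
    """Stage 1 of the order-free path (assumed stand-in): the outermost token
    span of the suspect covered by windows whose words mostly occur in the
    source. Returns (start, end) token indices, or None if nothing matches."""
    a, vocab = tokens(suspect), set(tokens(source))
    hits = []
    for start in range(max(len(a) - window + 1, 1)):
        chunk = a[start:start + window]
        if chunk and sum(w in vocab for w in chunk) / len(chunk) >= min_overlap:
            hits.append(start)
    if not hits:
        return None
    return hits[0], min(hits[-1] + window, len(a))


def source_coverage(suspect, source, span):
    """Stage 2 (assumed stand-in): share of the source vocabulary found in the
    suspect span. High coverage hints that the whole source is involved (for
    example a summary); low coverage hints at a passage within the source."""
    span_words = set(tokens(suspect)[span[0]:span[1]])
    vocab = set(tokens(source))
    if not vocab:
        return 0.0
    return len(vocab & span_words) / len(vocab)


source = "the quick brown fox jumps over the lazy dog"
paraphrase = "the quick red fox leaps over the lazy dog"
shuffled = "lazy dog the over jumps fox brown quick the"

print(aligned_fraction(paraphrase, source))  # high: ordering is preserved
print(aligned_fraction(shuffled, source))    # low: ordering is destroyed
span = related_span(shuffled, source)
print(span, source_coverage(shuffled, source, span))
```

In this toy setting the alignment score falls sharply once the phrases are reordered, while the order-free path still flags the shuffled text, which is the motivation the abstract gives for combining the two components.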

Publication Date

1-1-2014

Publication Title

CEUR Workshop Proceedings

Volume

1180

Pages

958-965

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

Scopus ID

84981275126 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84981275126

