Ranked Document Retrieval For Multiple Patterns
Keywords
Compressed suffix array; Succinct encoding; Suffix array; Suffix tree; Weighted ancestor query
Abstract
Let D={T1,T2,…,TD} be a collection of D documents having n characters in total. Given two patterns P and Q, and an integer k>0, we consider the following queries. • top-k forbidden pattern query: Among all documents containing P, but not Q, report the k documents most relevant to P.• top-k two pattern query: Among all documents that contain both P and Q, report the k documents most relevant to P.For the above two queries, we provide a linear space index with O(|P|+|Q|+nk) query time, under some standard relevance functions such as PageRank and TermFrequency. The document listing version of the above two problems asks to report all t documents that either contain P, but not Q, or contain both P and Q, depending on the query type. As a corollary of the top-k result, we obtain a linear space and O(|P|+|Q|+nt) query time solution for the document listing problems. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario when the query consists of more than two patterns. Finally, we present space-efficient indexes for these problems.
Publication Date
10-25-2018
Publication Title
Theoretical Computer Science
Volume
746
Number of Pages
98-111
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1016/j.tcs.2018.06.029
Copyright Status
Unknown
Socpus ID
85049522666 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85049522666
STARS Citation
Biswas, Sudip; Ganguly, Arnab; Shah, Rahul; and Thankachan, Sharma V., "Ranked Document Retrieval For Multiple Patterns" (2018). Scopus Export 2015-2019. 10460.
https://stars.library.ucf.edu/scopus2015/10460