A Context-Driven Extractive Framework For Generating Realistic Image Descriptions
Keywords
context discovery; heterogeneous information fusion; image semantics; textual image description
Abstract
Automatic image annotation methods are extremely beneficial for image search, retrieval, and organization systems. The lack of strict correlation between semantic concepts and visual features, referred to as the semantic gap, is a huge challenge for annotation systems. In this paper, we propose an image annotation model that incorporates contextual cues collected from sources both intrinsic and extrinsic to images, to bridge the semantic gap. The main focus of this paper is a large real-world data set of news images that we collected. Unlike standard image annotation benchmark data sets, our data set does not require human annotators to generate artificial ground truth descriptions after data collection, since our images already include contextually meaningful and real-world captions written by journalists. We thoroughly study the nature of image descriptions in this real-world data set. News image captions describe both the visual contents and the contexts of images. Auxiliary information sources are also available with such images in the form of news articles and metadata (e.g., keywords and categories). The proposed framework extracts contextual cues from available sources of different data modalities and transforms them into a common representation space, i.e., the probability space. Predicted annotations are later transformed into sentence-like captions through an extractive framework applied over news articles. Our context-driven framework outperforms the state of the art on the collected data set of approximately 20 000 items, as well as on a previously available smaller news images data set.
Publication Date
2-1-2017
Publication Title
IEEE Transactions on Image Processing
Volume
26
Issue
2
Pages
619-632
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/TIP.2016.2628585
Copyright Status
Unknown
Scopus ID
85012924131 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85012924131
STARS Citation
Tariq, Amara and Foroosh, Hassan, "A Context-Driven Extractive Framework For Generating Realistic Image Descriptions" (2017). Scopus Export 2015-2019. 6072.
https://stars.library.ucf.edu/scopus2015/6072