Convolutional DLSTM for Crowd Scene Understanding
Keywords
CNN; Crowd Scene; end-to-end; LSTM
Abstract
With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. However, visual ambiguities and occlusions, high density, low mobility, and scene semantics make this problem very challenging. In this paper, we propose an end-to-end deep architecture, Convolutional DLSTM (ConvDLSTM), for crowd scene understanding. ConvDLSTM consists of GoogLeNet Inception V3 convolutional neural networks (CNN) and stacked differential long short-term memory (DLSTM) networks. Unlike traditional non-end-to-end solutions, which separate feature extraction from parameter learning, ConvDLSTM uses a unified deep model to optimize the parameters of the CNN and the RNN jointly. It thus has the potential to generate a more harmonious model. The proposed architecture takes sequential raw image data as input and does not rely on tracklet or trajectory detection. It thus has clear advantages over traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios of high density and low mobility. Taking advantage of the semantic representation of the CNN and the memory states of the LSTM, ConvDLSTM can effectively analyze both crowd scene and motion information. Existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be 'deep in time'. ConvDLSTM, however, models spatial and temporal information in a unified architecture and achieves 'deep in space and time'. Extensive performance studies on the Violent-Flows and CUHK Crowd datasets show that the proposed technique significantly outperforms state-of-the-art methods.
Publication Date
12-28-2017
Publication Title
Proceedings - 2017 IEEE International Symposium on Multimedia, ISM 2017
Volume
2017-January
Number of Pages
61-68
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/ISM.2017.19
Copyright Status
Unknown
Scopus ID
85045876703 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85045876703
STARS Citation
Zhuang, Naifan; Ye, Jun; and Hua, Kien A., "Convolutional DLSTM for Crowd Scene Understanding" (2017). Scopus Export 2015-2019. 7474.
https://stars.library.ucf.edu/scopus2015/7474