Convolutional Nonlinear Differential Recurrent Neural Networks For Crowd Scene Understanding
Keywords
CNDRNN; CNN; crowd scene; nonlinear dRNN
Abstract
With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogLeNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Unlike traditional non-end-to-end solutions, which separate feature extraction from parameter learning, CNDRNN uses a unified deep model to optimize the parameters of the CNN and the RNN jointly, and thus has the potential to produce a more harmonious model. The proposed architecture takes sequences of raw images as input and does not rely on tracklet or trajectory detection. It therefore has clear advantages over traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios with high density and low mobility. By combining a CNN and an RNN, CNDRNN can effectively analyze crowd semantics: the CNN models the semantic information of the crowd scene, while the nonlinear differential RNN models the motion information. In the differential RNN, the individual and increasing orders of the derivative of states (DoS) progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamic patterns, with deeper stacked layers modeling higher orders of DoS. Finally, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be "deep in time." Our proposed CNDRNN, by contrast, models spatial and temporal information in a unified architecture and achieves "deep in space and time."
Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.
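The abstract's notion of increasing orders of derivative of states (DoS) can be illustrated with a minimal sketch. It assumes, as in the original differential RNN formulation, that the continuous derivative of the LSTM internal state is approximated by discrete backward differences (first order s_t - s_{t-1}, second order s_t - 2*s_{t-1} + s_{t-2}, and so on). The function names, the plain-list state representation, and the simplified gate are hypothetical illustrations only; this record does not describe CNDRNN's nonlinear formulation, so this is not the authors' implementation.

```python
import math
from math import comb


def derivative_of_states(states, order):
    """Discrete n-th order derivative of states (DoS) via backward differences.

    states: list of per-time-step state vectors (lists of floats), s_0..s_T.
    Returns a list aligned with `states`; entries for t < order are None,
    because the n-th backward difference needs `order` previous states.
    """
    dos = []
    for t in range(len(states)):
        if t < order:
            dos.append(None)
            continue
        dim = len(states[t])
        d = [0.0] * dim
        # n-th order backward difference: sum_k (-1)^k * C(n, k) * s_{t-k}
        for k in range(order + 1):
            coeff = (-1) ** k * comb(order, k)
            for i in range(dim):
                d[i] += coeff * states[t - k][i]
        dos.append(d)
    return dos


def dos_gate(dos, weights, bias):
    """Hypothetical, simplified gate: elementwise sigmoid(w_i * dos_i + bias).

    A real differential LSTM gate would also take the input x_t and the
    hidden state h_{t-1} through learned weight matrices; those terms are
    omitted here to isolate the role of the DoS signal.
    """
    return [1.0 / (1.0 + math.exp(-(w * d + bias))) for w, d in zip(weights, dos)]


# Toy 1-D state trajectory s_t = t**2 (hypothetical values).
states = [[0.0], [1.0], [4.0], [9.0]]
first = derivative_of_states(states, 1)   # [None, [1.0], [3.0], [5.0]]
second = derivative_of_states(states, 2)  # [None, None, [2.0], [2.0]]
```

A constant second-order DoS for a quadratic trajectory shows why higher orders capture different dynamics: the first order tracks velocity-like change, the second order tracks acceleration-like change, which is the kind of progressively higher-level motion signal the abstract attributes to deeper stacked layers.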
Publication Date
12-1-2018
Publication Title
International Journal of Semantic Computing
Volume
12
Issue
4
Pages
481-500
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1142/S1793351X18400196
Copyright Status
Unknown
Scopus ID
85058818779 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85058818779
STARS Citation
Zhuang, Naifan; Kieu, The Duc; Ye, Jun; and Hua, Kien A., "Convolutional Nonlinear Differential Recurrent Neural Networks For Crowd Scene Understanding" (2018). Scopus Export 2015-2019. 9822.
https://stars.library.ucf.edu/scopus2015/9822