Deep cross-layer activation features for visual recognition
Convolutional Neural Networks (CNNs), which now dominate image analysis tasks, are feed-forward models that capture increasingly complex data structures and patterns across successive hidden layers of the network. However, the common practice of using only the activation features of the last network layer inevitably creates a visual recognition bottleneck, because the discriminative features for objects of varying complexity need not all reside in the same layer. To address this, a novel frequency domain analysis of the feature maps within the same layer as well as across different network layers is proposed. In this way, the proposed method exploits the knowledge stored in the CNN more efficiently and helps identify the most discriminative features for each individual object type. Experimental results on a large-scale real-world Closed-Circuit Television (CCTV) surveillance dataset and on the PASCAL VOC 2012 dataset demonstrate the efficiency of the proposed approach.
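To make the idea of a frequency domain analysis over several layers concrete, the following is a minimal illustrative sketch and not the method proposed here, whose details the abstract does not give. It assumes each layer's activations are available as a NumPy array of shape (channels, height, width), takes the 2D Fourier magnitude of every feature map, and keeps a small low-frequency block so that layers with different spatial sizes yield comparable descriptors. The function names `spectral_descriptor` and `cross_layer_descriptor` and the block size `k` are hypothetical.

```python
import numpy as np


def spectral_descriptor(fmap, k=4):
    """Low-frequency Fourier magnitudes of each feature map.

    fmap: array of shape (C, H, W), with H >= k and W >= k.
    Returns an array of shape (C, k * k).
    Assumption: a k x k block around the zero frequency is a usable summary.
    """
    # 2D FFT over the spatial axes, one spectrum per channel.
    spec = np.fft.fft2(fmap, axes=(-2, -1))
    # Shift the zero frequency to index (H // 2, W // 2) and take magnitudes.
    mag = np.abs(np.fft.fftshift(spec, axes=(-2, -1)))
    c, h, w = mag.shape
    top = h // 2 - k // 2
    left = w // 2 - k // 2
    block = mag[:, top:top + k, left:left + k]
    return block.reshape(c, -1)


def cross_layer_descriptor(fmaps, k=4):
    """Concatenate per-layer spectral descriptors into one vector.

    fmaps: list of (C_l, H_l, W_l) arrays taken from different layers.
    Returns a 1D array of length sum(C_l) * k * k.
    """
    parts = [spectral_descriptor(f, k).ravel() for f in fmaps]
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for activations of two layers with different resolutions.
    layers = [rng.standard_normal((8, 14, 14)), rng.standard_normal((16, 7, 7))]
    print(cross_layer_descriptor(layers).shape)
```

A descriptor of this kind could then be passed to any classifier. How features are actually selected per object type is left open in this sketch, since the abstract does not specify it.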