Multi-Modal Region Selection Approach for Training Object Detectors
Our purpose in this work is to boost the performance of object classifiers learned using the self-training paradigm. We exploit the multi-modal nature of tagged images found in social networks to optimize the process of region selection when retraining the initial model. More specifically, the proposed approach uses a small number of manually labelled regions to train the initial object detection classifiers. Then, a large number of loosely tagged images, pre-segmented by an automatic segmentation algorithm, are used to enhance the initial training set with additional image regions. However, in contrast to the typical case of self-training, where image regions are selected based solely on how well they fit the original classification model, our approach optimizes this selection by making combined use of both visual and textual information. The experimental results show that the object detection classifiers generated using the proposed approach outperform those generated using the typical self-training paradigm.
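The selection step described above can be sketched as a simple fusion of the classifier's visual confidence with a tag-based textual relevance score. The following is a minimal illustrative sketch, not the paper's exact formulation: the function names, the linear fusion weight `alpha`, and the acceptance threshold `tau` are all assumptions introduced here for clarity.

```python
def textual_score(image_tags, concept, synonyms):
    """Illustrative tag relevance: 1.0 if the target concept (or a
    synonym) appears among the image's user tags, else 0.0."""
    vocabulary = {concept, *synonyms}
    return 1.0 if vocabulary & set(image_tags) else 0.0

def combined_score(visual_conf, text_rel, alpha=0.5):
    """Hypothetical linear fusion of the visual classifier's confidence
    and the textual relevance of the region's parent image."""
    return alpha * visual_conf + (1.0 - alpha) * text_rel

def select_regions(candidates, concept, synonyms, alpha=0.5, tau=0.7):
    """Keep candidate regions whose fused score reaches the threshold tau.

    Each candidate is a tuple (region_id, visual_confidence, image_tags),
    where visual_confidence comes from the initial classifier and
    image_tags from the loosely tagged source image.
    """
    selected = []
    for region_id, visual_conf, tags in candidates:
        score = combined_score(
            visual_conf, textual_score(tags, concept, synonyms), alpha
        )
        if score >= tau:
            selected.append((region_id, score))
    return selected

# Example: regions from tagged images, scored against the concept "dog".
candidates = [
    ("r1", 0.9, ["dog", "park"]),   # high visual conf, matching tag
    ("r2", 0.9, ["city", "night"]), # high visual conf, no matching tag
    ("r3", 0.4, ["puppy"]),         # low visual conf, matching synonym
]
picked = select_regions(candidates, "dog", {"puppy"})
```

Under this sketch, a visually confident region from an unrelated image (`r2`) is rejected, while a weaker detection supported by a matching tag (`r3`) survives, which is the intended effect of combining the two modalities during self-training.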