MSIDX: Multi-Sort Indexing for Efficient Content-based Image Search and Retrieval
This paper presents Multi-Sort Indexing (MSIDX), a novel approximate indexing scheme for efficient content-based image search and retrieval. The proposed scheme analyzes high-dimensional image descriptor vectors by exploiting the value cardinalities of their dimensions. A dimension’s value cardinality, an inherent characteristic of descriptor vectors, is the number of discrete values that occur in that dimension. Value cardinalities vary significantly across descriptor extraction methods, and the quantization and normalization techniques used during extraction also have a strong impact on them. Since dimensions with high value cardinalities have more discriminative power, a multiple sort algorithm reorders the descriptors’ dimensions according to their value cardinalities, in order to increase the probability that two similar images lie within a close constant range. The expected bounds of this constant range are defined in detail through deterministic and probabilistic analyses. The proposed scheme is suitable (a) for real-time indexing of images and (b) for searching and retrieving relevant images with an efficient query processing algorithm. In experiments with five real datasets, we show the superiority of the proposed approach over hashing methods that are also suitable for approximate similarity search.
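The idea outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "multiple sort" means a lexicographic sort of the descriptors after their dimensions are reordered by descending value cardinality, and that a query inspects a fixed window of neighbours around its own position in that sorted order. The function names, the `window` parameter, and the squared Euclidean re-ranking are hypothetical choices made only for illustration.

```python
from bisect import bisect_left


def value_cardinalities(vectors):
    """Return the number of distinct values observed in each dimension."""
    dims = len(vectors[0])
    return [len({v[d] for v in vectors}) for d in range(dims)]


def build_index(vectors):
    """Reorder dimensions by descending cardinality, then sort lexicographically."""
    cards = value_cardinalities(vectors)
    # sorted() is stable, so dimensions with equal cardinality keep their order.
    order = sorted(range(len(cards)), key=lambda d: -cards[d])
    # Each entry pairs the reordered descriptor with the original image id.
    keyed = sorted((tuple(v[d] for d in order), i) for i, v in enumerate(vectors))
    return order, keyed


def query(index, vectors, q, window, k):
    """Approximate k-NN: scan a fixed window around the query's sorted position."""
    order, keyed = index
    q_key = tuple(q[d] for d in order)
    # The id -1 sorts before every real id, so ties land at the leftmost match.
    pos = bisect_left(keyed, (q_key, -1))
    lo, hi = max(0, pos - window), min(len(keyed), pos + window)
    candidates = [i for _, i in keyed[lo:hi]]

    # Assumed re-ranking step: squared Euclidean distance on the full vectors.
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(vectors[i], q))

    return sorted(candidates, key=dist)[:k]


# Toy descriptors: the second dimension takes five distinct values,
# the other two take only two each.
descriptors = [
    [0, 10, 1],
    [0, 52, 1],
    [1, 11, 1],
    [1, 50, 2],
    [0, 90, 1],
]
index = build_index(descriptors)
print(value_cardinalities(descriptors))  # [2, 5, 2]
print(index[0])                          # [1, 0, 2]
print(query(index, descriptors, [1, 51, 2], window=1, k=2))  # [3, 1]
```

In this toy example the high-cardinality dimension becomes the primary sort key, so descriptors 3 and 1, whose values in that dimension are closest to the query's, end up adjacent to the query position and are the only candidates examined.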