Vasileios Mezaris
Electrical and Computer Engineer, Ph.D.


Page contents:
- Latest research, from the multimedia analysis perspective: Video temporal decomposition to sub-shots, shots, scenes and regions of interest; Video concept detection and concept-based / cross-modal retrieval; Video summarization and thumbnail selection; Image/video quality, aesthetics and forensic assessment; Video event detection; Multimedia organization tools and applications
- Latest research, from the AI / machine learning perspective: Deep learning for image/video understanding; Learning with uncertainty; Subclass methods for dimensionality reduction and learning; Multi-label and multi-task learning; Cascades and other classifier combinations

Latest research, from the multimedia analysis perspective:
Video temporal decomposition to sub-shots, shots, scenes and regions of interest. We develop fast and accurate techniques for temporally fragmenting a video into its constituent parts (sub-shots, shots, scenes), often by combining a multitude of low- and high-level audio-visual cues. Our optimized software implementations run several times faster than real-time on a simple PC. We use our temporal decomposition methods as the basis for several applications, such as an online service for video fragmentation and reverse image search. We also develop techniques for identifying the spatio-temporal regions of interest within a video, to enable video smart-cropping for video re-purposing and re-use applications. [csvt11] [csvt12] [icassp14] [mm17b] [mmm18a] [mmm19d] [icip21]
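The basic idea behind shot-level fragmentation can be sketched as follows: a hard cut shows up as a large jump in the distance between consecutive frame histograms. This is a minimal illustration, not our optimized implementation (which combines many more cues); the function name, bin count and threshold are illustrative.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.5):
    """Detect hard cuts by thresholding the total-variation distance
    between consecutive per-frame intensity histograms.
    frames: iterable of H x W x 3 uint8 arrays.
    Returns the frame indices where a new shot is assumed to start."""
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=32, range=(0, 256))
        hist = hist / hist.sum()
        if prev_hist is not None and 0.5 * np.abs(hist - prev_hist).sum() > threshold:
            boundaries.append(i)
        prev_hist = hist
    return boundaries
```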

Video concept detection and concept-based / cross-modal retrieval. We develop techniques for annotating video fragments with concept labels. For this, we work on and combine different video representations, new machine learning methods (such as methods coming from the domains of deep learning, multi-task learning and multi-label learning), new classifier combination methods (e.g. optimized cascades), and dimensionality reduction methods for dealing with very high-dimensional features. We also examine how concept detection methods should be evaluated in the related but distinct problems of concept-based video indexing and video annotation. We further develop methods for cross-modal / ad-hoc video search. [csvt14] [tetc15] [icip15a] [mmm16b] [icip16a] [mm16a] [mmm17] [icmr17a] [mm17a] [csvt19] [mmm20b] [icme20] [icmr20] [ism20] [iccvw21]
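In its simplest multi-label form, concept annotation scores one fragment's feature vector against many concept classifiers at once, with an independent sigmoid output per concept. A bare-bones sketch (the classifier shape and names are illustrative, not the methods of the cited papers):

```python
import numpy as np

def concept_scores(feature, W, b):
    """Score one video fragment against K concept classifiers at once.
    feature: d-dim vector; W: d x K weight matrix; b: K biases.
    Multi-label setting: each concept gets an independent sigmoid score,
    so several concepts can be assigned to the same fragment."""
    return 1.0 / (1.0 + np.exp(-(feature @ W + b)))
```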

Video summarization and thumbnail selection. We develop deep learning architectures for automatic video summarization and thumbnail selection. We focus on architectures based on Generative Adversarial Networks (GANs), and we investigate the introduction of learning components such as compression layers and attention mechanisms, and stepwise training strategies. Our architectures are capable of learning in a fully unsupervised manner, i.e., without the need for ground-truth data such as human-generated video summaries. The resulting trained models can produce representative summaries for unseen videos, exhibiting state-of-the-art performance. [mm19] [mmm20a] [mm20] [csvt21] [icmr21] [ism21a]; see also a survey: [pieee21]
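Once fragment importance scores have been learned, assembling the summary is typically a budgeted selection problem. As a minimal sketch (a greedy knapsack on score density; real pipelines may solve the selection exactly):

```python
def select_summary(scores, durations, budget):
    """Assemble a summary from scored video fragments under a duration budget.
    scores: importance per fragment; durations: length of each fragment.
    Greedy knapsack on score density; returns chosen indices in temporal order."""
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] / durations[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + durations[i] <= budget:
            chosen.append(i)
            used += durations[i]
    return sorted(chosen)
```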

Image/video quality, aesthetics and forensic assessment. We develop methods for the assessment of the quality and the aesthetics of visual content. These include features and classification methods for no-reference blur assessment in natural images; the use of basic rules of photography for aesthetics assessment of natural images; and features and learning techniques (e.g., learning under uncertainty) for the aesthetic quality assessment of user-generated video. We have also assembled and released annotated datasets to facilitate the evaluation of such methods. In addition, we develop methods for the forensic analysis of videos, to determine whether a video has been manipulated. [icip14] [icip15b] [icip16b] [mmm19a] [mmm19b]
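For intuition, a classic no-reference sharpness proxy (a common baseline, not the specific features of the cited papers) is the variance of the Laplacian response: blurry images produce weak edge responses and hence low variance.

```python
import numpy as np

def laplacian_variance(gray):
    """No-reference sharpness proxy: variance of the Laplacian response.
    gray: 2-D float array (grayscale image). Lower values suggest blur.
    The 4-neighbour Laplacian is computed on the interior pixels only."""
    lap = (-4 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())
```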

Video event detection. We develop methods for annotating video fragments with complex event labels. This involves employing and extending our concept-detection techniques, especially our top-performing subclass dimensionality reduction methods and their highly-optimized GPU implementations. We also develop techniques for zero-shot video annotation (associating textual resources with visual fragments, by text analysis and application of trained concept detectors on the video), video annotation using very few positive examples, and learning under uncertainty. [icmr14] [mm15a] [jivc15] [mmm16a] [mm16b] [icmr17b] [cvprw21]; see also a couple of surveys: [mtap14] [jivc16]
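The zero-shot idea above, relating an event's textual description to pre-trained concept detectors, can be sketched as follows. Crude word overlap stands in for the text analysis used in practice, and all names are illustrative:

```python
def zero_shot_event_score(detector_scores, concept_names, event_description):
    """Score a video for an unseen event by weighting the responses of
    pre-trained concept detectors by the textual relevance of each
    concept to the event description (crude word overlap here)."""
    event_words = set(event_description.lower().split())
    score = 0.0
    for name, s in zip(concept_names, detector_scores):
        overlap = len(set(name.lower().split()) & event_words)
        score += overlap * s
    return score
```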

Multimedia organization tools and applications. We develop techniques and tools for the organization and re-use of multimedia content and for supporting novel applications such as digital preservation and forgetting. These include interactive tools for object re-detection, and automatic techniques for the clustering, summarization and event-based organization of media items, the temporal synchronization and the contextualization of photo collections, social event detection, human-brain-inspired photo selection methods, and video search engines. [icmr14] [icme15a] [icme15b] [icme15c] [mtap15] [mm15b] [mmm16c] [icmr17c] [icmr17d] [mm17b] [mmm18b] [MMmag18] [mmm19c] [mmm19e] [mmm19f] [mmm21] [imx21] [ism21b]

Latest research, from the AI / machine learning perspective:
Deep learning for image/video understanding. We develop new learning methods based on deep convolutional neural networks (DCNNs) and generative adversarial networks (GANs). These include, for instance, a new DCNN-based transfer learning method that uses multi-task learning in the target domain, in order to learn the relations between tasks jointly, and also incorporates the label correlations between pairs of tasks. Our methods also include fine-tuning strategies and layer extensions to optimize the fine-tuning of pre-trained deep neural networks, and the use of deep learning in matching visual and textual information, and in video summarization. [mm16a] [mmm17] [icmr17a] [icmr17b] [csvt19] [mmm19a] [mmm19b] [mm19] [mmm20a] [mmm20b] [icme20] [icmr20] [mm20] [ism20] [csvt21] [cvprw21] [icmr21] [ism21a]; see also a couple of surveys: [tomm17] [pieee21]
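The joint-learning idea can be sketched as an objective that adds, on top of the per-task losses, a penalty pulling the parameters of correlated tasks towards each other. This is a simplified illustration, not the exact formulation of the cited papers; the correlation matrix and weighting are assumptions.

```python
import numpy as np

def multi_task_objective(losses, task_params, task_corr, lam=0.1):
    """Joint multi-task objective: sum of per-task losses plus a penalty
    that pulls the parameter vectors of correlated tasks together.
    task_corr[i][j] in [0, 1] encodes how related tasks i and j are."""
    reg = 0.0
    T = len(task_params)
    for i in range(T):
        for j in range(i + 1, T):
            diff = task_params[i] - task_params[j]
            reg += task_corr[i][j] * float(diff @ diff)
    return float(sum(losses)) + lam * reg
```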

Learning with uncertainty. We develop linear and kernel-based methods for learning from uncertain data. These are a family of extensions of the Support Vector Machine classifier, which we call Support Vector Machine with Gaussian Sample Uncertainty. They treat input data as multi-dimensional distributions, rather than single points in a multi-dimensional input space. These distributions typically express our uncertainty about the measurements in the input space and their relation to the underlying noise-free data. [mmm16a] [icip16b] [fg17] [pami18]
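The key consequence of treating an input as a Gaussian rather than a point: for a linear classifier the margin itself becomes a Gaussian random variable, so the probability of landing on the correct side of the hyperplane has a closed form. A sketch of that consequence only, not the SVM-GSU training procedure:

```python
import math
import numpy as np

def prob_correct_side(w, b, mu, cov, y):
    """For x ~ N(mu, cov) and label y in {-1, +1}, return the probability
    that the linear classifier sign(w.x + b) classifies x correctly.
    The margin y*(w.x + b) is Gaussian with mean m and std s below."""
    m = y * (float(w @ mu) + b)
    s = math.sqrt(float(w @ cov @ w))
    return 0.5 * (1.0 + math.erf(m / (s * math.sqrt(2.0))))
```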

Subclass methods for dimensionality reduction and learning. We develop very efficient subclass-based dimensionality reduction and learning methods, for dealing with high-dimensional data. These include a linear Subclass Support Vector Machine (SSVM) classifier, and extensions of Subclass Discriminant Analysis, such as the Mixture Subclass Discriminant Analysis (MSDA), Fractional Step MSDA (FSMSDA), Kernel MSDA (KMSDA) and Accelerated Generalised SDA (AGSDA). We also developed an optimized GPU software implementation of our AGSDA method. [spl11] [spl12] [tnnls13] [mm15a] [mm16b] [mm17a]
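The core of subclass discriminant analysis is to split each class into subclasses and seek directions that separate subclass means while keeping each subclass compact. A bare-bones sketch with subclass assignments given in advance (no clustering step, none of the accelerations of the cited methods):

```python
import numpy as np

def sda_direction(X, sub_labels):
    """One discriminant direction from between/within subclass scatter.
    X: N x d data; sub_labels: N x 2 array of (class, subclass) per sample.
    Returns the unit-norm leading direction of pinv(Sw) @ Sb."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))  # between-subclass scatter
    Sw = np.zeros((d, d))  # within-subclass scatter
    for s in np.unique(sub_labels, axis=0):
        mask = np.all(sub_labels == s, axis=1)
        Xs = X[mask]
        ms = Xs.mean(axis=0)
        Sb += mask.sum() * np.outer(ms - mu, ms - mu)
        Sw += (Xs - ms).T @ (Xs - ms)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    w = vecs[:, np.argmax(vals.real)].real
    return w / np.linalg.norm(w)
```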

Multi-label and multi-task learning. Exploiting concept correlations is a promising way of boosting the performance of concept detection systems aimed at concept-based video indexing or annotation. We develop multi-label learning methods, such as an improved method for employing stacked models that captures concept correlations in the last layer of the stack. Besides concept correlations, concept models for different concepts can be related at the feature representation or the task-parameter level, i.e., the parameters of the binary classifiers learned from the training data. Motivated by this, we also develop multi-task learning methods that exploit task relations. [mmm14] [tetc15] [icip16a]
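The stacking idea can be sketched as a second layer that refines each concept's independent first-layer score with correlation-weighted evidence from all the other concepts. An illustration only, not the exact stacked model of the cited papers:

```python
import numpy as np

def refine_with_correlations(first_layer_scores, corr):
    """Second stacking layer: each concept's refined score is a normalized,
    correlation-weighted combination of all first-layer scores.
    corr: K x K matrix; corr[i][j] says how predictive concept j is of i."""
    corr = np.asarray(corr, dtype=float)
    scores = np.asarray(first_layer_scores, dtype=float)
    return (corr @ scores) / corr.sum(axis=1)
```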

Cascades and other classifier combinations. In image/video annotation problems, many candidate features are typically available, and combining features based on deep convolutional neural networks with other visual descriptors can significantly boost performance. We develop algorithms for efficiently combining multiple features and detectors, such as a subclass recoding error-correcting output codes (SRECOC) method for learning and combining subclass detectors, and a cascade-based algorithm that dynamically selects, orders and combines many base classifiers that are trained independently using different features. [icme13] [icip15a] [mmm16b]
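The early-exit behaviour that makes cascades efficient can be sketched as follows: stages run in order (cheapest first) and evaluation stops as soon as one stage is decisive. The thresholds here are illustrative; the cited methods also learn the stage selection and ordering.

```python
def cascade_predict(x, stages, reject_below=0.2, accept_above=0.8):
    """Run classifier stages in order, stopping as soon as one is decisive.
    stages: list of functions x -> score in [0, 1], cheapest first.
    Returns (final score, number of stages actually evaluated)."""
    score = 0.5
    for k, stage in enumerate(stages, start=1):
        score = stage(x)
        if score <= reject_below or score >= accept_above:
            return score, k
    return score, len(stages)
```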

© 2015-2022 Vasileios Mezaris