Vasileios Mezaris
Electrical and Computer Engineer, Ph.D.

Page contents:
- Latest image/video analysis research: Video temporal decomposition to shots and scenes; Video concept detection; Video event detection and recounting; Image/video quality and aesthetics assessment; Image/video organization tools and applications
- Latest machine learning research: Deep learning for image/video understanding; Learning with uncertainty; Subclass methods for dimensionality reduction and learning; Multi-label and multi-task learning; Cascades and other classifier combinations
- Other research: Some older research directions

LATEST RESEARCH, from the image/video analysis perspective:
Video temporal decomposition to shots and scenes. We develop fast and accurate techniques for temporally fragmenting a video into its constituent shots (i.e., sequences of consecutive frames taken without interruption by a single camera), and for subsequently grouping the shots into scenes (i.e., sequences of consecutive shots that form the basic story-telling unit of the video) by combining a multitude of low- and high-level audio-visual cues. Our optimized software implementation of shot and scene segmentation runs several times faster than real-time on a simple PC. [csvt11] [csvt12] [icassp14]
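
As a rough illustration of the shot segmentation step only, the following minimal Python sketch flags abrupt cuts by thresholding the difference between the intensity histograms of consecutive frames. It is a generic baseline and not the method of the cited papers; the function names, the 16-bin histogram and the 0.5 threshold are assumptions made for this example. Gradual transitions and the grouping of shots into scenes are not covered.

```python
def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a non-empty frame given as a flat list of 0-255 values."""
    hist = [0.0] * bins
    for value in frame:
        hist[min(value * bins // 256, bins - 1)] += 1.0
    total = float(len(frame))
    return [count / total for count in hist]


def detect_shot_boundaries(frames, threshold=0.5):
    """Return the indices i at which frame i starts a new shot (abrupt cuts only).

    The threshold is a hypothetical value; in practice it would be tuned on data.
    """
    if not frames:
        return []
    boundaries = []
    previous = gray_histogram(frames[0])
    for index in range(1, len(frames)):
        current = gray_histogram(frames[index])
        # Half the L1 distance between normalized histograms lies in [0, 1].
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


def boundaries_to_shots(boundaries, num_frames):
    """Turn boundary indices into half-open (start, end) frame ranges, one per shot."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return list(zip(starts, ends))
```

For example, three dark frames followed by two bright frames would yield a single boundary at index 3, i.e., two shots covering frames 0-2 and 3-4.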

Video concept detection. We develop techniques for annotating video fragments with concept labels. For this, we work on and combine different video representations (e.g., video tomographs), new machine learning methods (such as methods coming from the domains of deep learning, multi-task learning and multi-label learning), new classifier combination methods (e.g., optimized cascades), and dimensionality reduction methods for dealing with very high-dimensional features. We also examine the evaluation of concept detection methods in the somewhat different problems of concept-based video indexing and video annotation. [csvt14] [tetc15] [icip15a] [mmm16b] [icip16a] [mm16a] [mmm17]

Video event detection and recounting. We develop methods for annotating video fragments with complex event labels. This involves employing and extending our concept-detection techniques, especially our top-performing subclass dimensionality reduction methods and their highly-optimized GPU implementations. We also develop techniques for zero-shot video annotation (associating textual resources with visual fragments, by text analysis and application of trained concept detectors on the video), video annotation using very few positive examples, and learning under uncertainty. [icmr14] [mm15a] [jivc15] [mmm16a] [mm16b]; see also a couple of surveys: [mtap14] [jivc16]

Image/video quality and aesthetics assessment. We develop methods for the assessment of the quality and the aesthetics of visual content. These include features and classification methods for no-reference blur assessment in natural images; the use of basic rules of photography for aesthetics assessment of natural images; and features and learning techniques (i.e., learning under uncertainty) for the aesthetic quality assessment of user-generated video. We also assembled and released annotated datasets for facilitating the evaluation of such methods. [icip14] [icip15b] [icip16b]

Image/video organization tools and applications. We develop techniques and tools for the organization and re-use of multimedia content and for supporting novel applications such as digital preservation and forgetting. These include interactive tools for object re-detection, and automatic techniques for the clustering, summarization and event-based organization of media items, the temporal synchronization and the contextualization of photo collections, social event detection, human-brain-inspired photo selection methods, and video search engines. [icmr14] [icme15a] [icme15b] [icme15c] [mtap15] [mm15b] [mmm16c]

LATEST RESEARCH, from the machine learning perspective:
Deep learning for image/video understanding. We develop new learning methods based on deep convolutional neural networks (DCNNs). These include, for instance, a new DCNN-based transfer learning method that uses multi-task learning in the target domain to learn the relations between tasks jointly, and that also incorporates the label correlations between pairs of tasks. Our methods also include fine-tuning strategies and layer extensions to optimize the fine-tuning of pre-trained deep neural networks. [mm16a] [mmm17]

Learning with uncertainty. We develop linear and kernel-based methods for learning from uncertain data. These are a family of extensions of the Support Vector Machine classifier, which we call Support Vector Machine with Gaussian Sample Uncertainty. They treat input data as multi-dimensional distributions, rather than single points in a multi-dimensional input space. These distributions typically express our uncertainty about the measurements in the input space and their relation to the underlying noise-free data. [mmm16a] [icip16b]
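
To make the idea of treating an input as a distribution rather than a point more concrete, here is a minimal Python sketch that computes the expected hinge loss of a linear classifier when the input follows a Gaussian. It relies on the standard closed form for the expectation of max(0, z) with z Gaussian. The diagonal covariance, the choice of the hinge loss and all function names are assumptions made for this illustration; this is not presented as the objective used in the cited papers.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_hinge_loss(w, b, mean, variances, label):
    """E[max(0, 1 - y (w.x + b))] for x ~ N(mean, diag(variances)), y in {-1, +1}.

    The margin violation 1 - y (w.x + b) is itself Gaussian, with mean d and
    standard deviation s, so the expectation is d * Phi(d / s) + s * phi(d / s).
    """
    d = 1.0 - label * (sum(wi * mi for wi, mi in zip(w, mean)) + b)
    s = math.sqrt(sum(wi * wi * vi for wi, vi in zip(w, variances)))
    if s == 0.0:
        # No uncertainty: fall back to the ordinary hinge loss at the mean.
        return max(0.0, d)
    return d * normal_cdf(d / s) + s * normal_pdf(d / s)
```

With zero variances this reduces to the usual hinge loss; as the variance along the direction of w grows, a sample whose mean lies on the correct side of the margin still contributes a positive expected loss.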

Subclass methods for dimensionality reduction and learning. We develop very efficient subclass-based dimensionality reduction and learning methods, for dealing with high-dimensional data. These include a linear Subclass Support Vector Machine (SSVM) classifier, and extensions of Subclass Discriminant Analysis, such as the Mixture Subclass Discriminant Analysis (MSDA), Fractional Step MSDA (FSMSDA), Kernel MSDA (KMSDA) and Accelerated Generalised SDA (AGSDA). We also developed an optimized GPU software implementation of our AGSDA method. [spl11] [spl12] [tnnls13] [mm15a] [mm16b]

Multi-label and multi-task learning. Exploiting concept correlations is a promising way to boost the performance of concept detection systems that aim at concept-based video indexing or annotation. We develop multi-label learning methods, such as an improved method for employing stacked models that captures concept correlations in the last layer of the stack. Besides concept correlations, concept models for different concepts can be related at the feature representation level or at the task parameters level, i.e., the parameters of the binary classifiers learned from the training data. Motivated by this, we also develop multi-task learning methods that exploit task relations. [mmm14] [tetc15] [icip16a]

Cascades and other classifier combinations. In image/video annotation problems, we usually have lots of features that we could use, and combining features based on deep convolutional neural networks with other visual descriptors can significantly boost performance. We develop algorithms for efficiently combining multiple features and detectors, such as a subclass recoding error-correcting outputs (SRECOC) method for learning and combining subclass detectors, and a cascade-based algorithm that dynamically selects, orders and combines many base classifiers that are trained independently using different features. [icme13] [icip15a] [mmm16b]
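
The general idea of a classifier cascade can be sketched in a few lines of Python: base classifiers are applied one after another, and a sample is rejected early once its running average score drops below a stage threshold, so that later (typically more expensive) classifiers are skipped. This is a generic sketch with a fixed stage order and hand-set thresholds, whereas the algorithm described above selects and orders the stages itself; `run_cascade`, the score functions and the thresholds are hypothetical.

```python
def run_cascade(sample, stages):
    """Apply the stages in order and stop early on rejection.

    stages is a list of (score_fn, reject_threshold) pairs, where score_fn maps
    a sample to a score in [0, 1]. Returns (running_mean_score, stages_used).
    """
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    total = 0.0
    running_mean = 0.0
    for used, (score_fn, reject_threshold) in enumerate(stages, start=1):
        total += score_fn(sample)
        running_mean = total / used
        if running_mean < reject_threshold:
            # Early rejection: the remaining classifiers are never evaluated.
            return running_mean, used
    return running_mean, len(stages)
```

A sample that scores poorly on a cheap first-stage detector is discarded after one evaluation, while a promising sample is scored by every stage and receives the average of all stage scores.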

OTHER RESEARCH; some older research directions:
Spatial context for semantic image analysis. We develop and evaluate different approaches to utilizing object-level spatial contextual information (i.e., fuzzy directional relations between image regions) for semantic image analysis. Techniques based on a Genetic Algorithm (GA), Binary Integer Programming (BIP) and an Energy-Based Model (EBM) are introduced in order to estimate an optimal semantic image interpretation, after classification results are computed using solely visual features. The advantages of each technique are theoretically and experimentally investigated. Evaluations are carried out on six datasets of varying problem complexity. [cviu11]

Event-based indexing of multimedia. We develop a joint content-event model for the automatic indexing of multimedia content with events. This model treats events as first class entities and provides a referencing mechanism for automatically linking event elements, represented using the event part of the model, with content segments, described using the content part of the model. The referencing mechanism uses a large number of trained visual concept detectors to describe content segments with model vectors, and the subclass discriminant analysis algorithm to derive a discriminant subspace of this concept space, facilitating the indexing of content segments with event elements. [eimm10] [cbmi11]

Local Invariant Feature Tracks for concept detection in video. We extract tracks of interest points (a.k.a. "key-point trajectories" or "feature trajectories") throughout the shot and encode each of them using a Local Invariant Feature Track (LIFT) descriptor. This jointly captures the appearance of the local image region and its long-term trajectory. We use the LIFT descriptors of each shot to generate a "Bag-of-Spatiotemporal-Words" model for it, which describes the shot using a vocabulary of "similar in appearance and similarly moving" local regions. [icip10] [watch a track formation example (animated gif, ~8MB)]
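
The final step, turning the descriptors of a shot into a bag-of-words histogram, can be sketched as follows in Python. The sketch assumes that each descriptor is already a fixed-length numeric vector and that a vocabulary of codewords is available (for instance from clustering training descriptors, which is an assumption here); how a LIFT descriptor itself is computed is not shown, and the function names are hypothetical.

```python
def nearest_codeword(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean distance)."""
    best_index, best_distance = 0, float("inf")
    for index, word in enumerate(vocabulary):
        distance = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def bag_of_words(descriptors, vocabulary):
    """Normalized histogram of codeword assignments for one shot."""
    histogram = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, vocabulary)] += 1.0
    total = sum(histogram)
    # A shot with no descriptors yields an all-zero histogram.
    return [count / total for count in histogram] if total else histogram
```

Each shot is then represented by one histogram whose length equals the vocabulary size, regardless of how many tracks were extracted from it.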

Statistical motion processing for video semantic classification. We use the kurtosis of the optical flow motion estimates for identifying which motion values originate from true motion rather than measurement noise. In this way activity areas are detected, and the motion energy distribution within each activity area is subsequently approximated with a low-degree polynomial function that compactly represents the most important characteristics of motion. This, together with a complementary set of features that highlight particular spatial attributes of the motion signal, is used as input to Hidden Markov Models (HMMs) that associate each shot with one semantic class. [csvt09]
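
A minimal Python sketch of the kurtosis test is given below. It assumes that measurement noise is approximately Gaussian, so that its kurtosis is close to 3, and marks a location as active when the kurtosis of its motion values over time deviates from 3 by more than a threshold. The decision rule, the threshold of 1.0 and the function names are assumptions made for this illustration and are not taken from [csvt09]; the polynomial approximation and the HMM stage are not shown.

```python
def kurtosis(values):
    """Kurtosis m4 / m2^2 of a sequence; a Gaussian signal gives a value near 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        # Constant signal: kurtosis is undefined. Reporting the Gaussian value
        # treats such a location as inactive (a choice made for this sketch).
        return 3.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def activity_mask(motion_series, threshold=1.0):
    """motion_series[p] holds the motion values observed at location p over time.

    Returns one boolean per location: True where the values look non-Gaussian.
    """
    return [abs(kurtosis(series) - 3.0) > threshold for series in motion_series]
```

A location whose motion values are mostly zero with an occasional large displacement has a kurtosis well above 3 and is marked as active, whereas a location that only shows noise-like fluctuations is not.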

Medical image analysis. We developed fully automated techniques for the detection of the lumen and media-adventitia borders in Intravascular Ultrasound (IVUS) images. Intensity information, as well as the result of texture analysis, generated by means of a multilevel Discrete Wavelet Frames decomposition, are used in two different techniques for contour initialization. For subsequently producing smooth contours, three techniques based on low-pass filtering and Radial Basis Functions are introduced. [ubm08]

Ontology-based video analysis and retrieval. We developed a methodology based on real-time video object segmentation, object- and shot-ontologies, as well as a relevance feedback mechanism, enabling object-based retrieval in video collections. We also developed a knowledge-assisted semantic video object detection framework, in which semantic concepts of a domain are defined in an ontology; are then enriched with qualitative attributes, low-level features, spatial relations, and multimedia processing methods; and finally rules in F-logic describe how the multimedia analysis tools should be applied for detecting the ontology concepts. [csvt04a] [csvt05] [mtap06]

Spatiotemporal video segmentation. We developed fully automated methods for the segmentation of video into differently moving objects, including a method for the spatiotemporal segmentation of raw video data by exploiting the long-term trajectories of differently moving homogeneous regions, and a method for the real-time spatiotemporal segmentation of MPEG-2 compressed video by exploiting the MPEG-2 motion vectors of I- and P-frames as well as the DC coefficients of DCT-encoded blocks of I-frames. [csvt04a] [csvt04b]

Spatial image segmentation. We developed a fully automated method for the fast segmentation of images into spatial regions, for use in object-based multimedia applications such as ontology-based indexing and retrieval. The algorithm operates in the combined intensity–texture–position feature space, in order to produce connected regions that correspond to the depicted real-life objects. A fast multi-layer processing scheme restricts the iterative part of the segmentation procedure to a low-resolution version of the image, and then derives full-resolution segmentation masks by partial pixel re-classification (watch animation for images zebra, castle). [ijprai04] [jasp04]

© 2015 Vasileios Mezaris