Vasileios Mezaris
Electrical and Computer Engineer, Ph.D.


AC-SUM-GAN for Unsupervised Video Summarization. This is an implementation of our latest video summarization method, presented in our paper "AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization", IEEE Trans. on Circuits and Systems for Video Technology (IEEE TCSVT), 2020 (early access). This is, to date, our most complete and best-performing method for video summarization.
- Related publications [csvt20]
- Software package [download source code]

Structured Pruning of LSTMs. We provide the code for our paper "Structured Pruning of LSTMs via Eigenanalysis and Geometric Median for Mobile Multimedia and Deep Learning Applications", Proc. 22nd IEEE Int. Symposium on Multimedia (ISM), Dec. 2020. This code can be used for generating more compact LSTMs, which is very useful for mobile multimedia applications and deep learning applications in other resource-constrained environments.
- Related publications [ism20]
- Software package [download source code]

Video Summarization Evaluation: Performance over Random. We provide an implementation of our video summarization evaluation method presented in our publication "Performance over Random: A Robust Evaluation Protocol for Video Summarization Methods", Proc. 28th ACM Int. Conf. on Multimedia (ACM MM '20). This software can be used for evaluating automatically-generated video summaries using the Performance over Random (PoR) evaluation protocol.
- Related publications [acmmm2020]
- Software package [download source code]
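
As a rough illustration of the idea behind the protocol, the sketch below scores a summary against a reference with the F-score and then expresses that score relative to the average score of randomly generated summaries of the same length. The function names, the binary per-frame representation, the number of random trials and the ratio-as-percentage normalisation are all assumptions made for this example; the paper and the released package define the actual protocol.

```python
import random

def f_score(pred, ref):
    """F-score between two binary per-frame selections of equal length."""
    overlap = sum(p * r for p, r in zip(pred, ref))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)
    recall = overlap / sum(ref)
    return 2 * precision * recall / (precision + recall)

def random_summary(n_frames, n_selected, rng):
    """Binary selection with n_selected frames chosen uniformly at random."""
    chosen = set(rng.sample(range(n_frames), n_selected))
    return [1 if i in chosen else 0 for i in range(n_frames)]

def performance_over_random(pred, ref, n_trials=100, seed=0):
    """Score of pred as a percentage of the mean score of random summaries.

    Assumption for this sketch: the random baseline selects as many frames
    as pred does, and the result is a simple ratio times 100.
    """
    rng = random.Random(seed)
    n_selected = sum(pred)
    random_scores = [
        f_score(random_summary(len(ref), n_selected, rng), ref)
        for _ in range(n_trials)
    ]
    baseline = sum(random_scores) / n_trials
    if baseline == 0:
        raise ValueError("random baseline scored zero; use more trials")
    return 100.0 * f_score(pred, ref) / baseline
```

A value above 100 means the summary beats the random baseline on that reference; a value near 100 means it is no better than chance.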

Dual Encoding Attention Network for ad-hoc Video Search. We provide an implementation of our extended dual encoding network for ad-hoc video search, presented at ACM ICMR 2020. This network makes use of multiple encodings of the visual and textual content, as well as two different attention mechanisms.
- Related publications [icmr2020]
- Software package [download source code]

Fractional Step Discriminant Pruning for DCNNs. This is an implementation of our filter pruning framework for DCNNs, presented at the IEEE ICME 2020 Mobile Multimedia Computing Workshop. This framework compresses noisy or less discriminant filters in small fractional steps, utilizing a class-separability criterion and an asymptotic schedule for the pruning rate and scaling factor, so that the selected filters' weights are gradually reduced to zero.
- Related publications [icme2020]
- Software package [download source code]
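
The gradual shrinking described above can be pictured with a short sketch. It assumes a hypothetical geometric schedule in which the scaling factor applied to the selected filters approaches zero asymptotically, and it takes the set of selected filters as given. The class-separability criterion, the pruning-rate schedule and the actual scaling rule are those of the paper and the released code, not of this example.

```python
def scaling_schedule(n_steps, decay=0.5):
    """Hypothetical schedule: the factor decays geometrically towards zero."""
    return [decay ** (t + 1) for t in range(n_steps)]

def shrink_filters(filters, selected, factor):
    """Return a copy of filters with the selected ones scaled by factor.

    filters  : list of filters, each a flat list of weights
    selected : set of indices of the filters chosen for pruning
    factor   : scaling factor in [0, 1] applied to the selected filters
    """
    return [
        [w * factor for w in f] if i in selected else list(f)
        for i, f in enumerate(filters)
    ]

# Usage: the first filter is shrunk a little more at every step,
# while the second filter is left untouched.
original = [[1.0, -2.0], [0.5, 0.5]]
for factor in scaling_schedule(3):
    current = shrink_filters(original, {0}, factor)
```

In a real training loop the network would typically be fine-tuned between steps, so that the remaining filters can compensate for the ones being phased out.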

SUM-GAN-AAE for Unsupervised Video Summarization. We provide an implementation of the SUM-GAN-AAE deep learning architecture for automatic video summarization. This extends our previous SUM-GAN-sl architecture with an Attention Autoencoder, enabling the improved training of the overall model. Similarly to SUM-GAN-sl, this is an unsupervised learning method that is capable, after training, of producing representative summaries for unseen videos.
- Related publications [mmm2020]
- Software package [download source code]

Subclass deep neural networks. This is an extension of deep convolutional neural networks for classification, which selectively introduces subclasses in the training of the model. Specifically, we provide implementations of the VGG16 and Wide ResNet architectures that include a new criterion for identifying the so-called "neglected" classes during the training of the network, and a novel cost function that extends the cross-entropy loss using subclass partitions for boosting the generalization performance of the neglected classes.
- Related publications [mmm2020]
- Software package [download source code]

SUM-GAN-sl for Unsupervised Video Summarization. We provide an implementation of the SUM-GAN-sl deep learning architecture for automatic video summarization. Training is performed in a fully unsupervised manner without the need for ground-truth data (such as human-generated video summaries). After being trained, the SUM-GAN-sl model is capable of producing representative summaries for unseen videos, according to a user-specified time budget for the summary duration.
- Related publications [acmmm19]
- Software package [download source code]
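
One common way to honour a time budget in video summarization is to treat shot selection as a 0/1 knapsack problem: each shot has a duration and an importance score, and the summary keeps the subset with the highest total score whose duration fits the budget. The sketch below illustrates this under the assumption that shot-level scores and integer durations (e.g. in frames) are already available. The function name and its inputs are hypothetical, and the code is not taken from the released package.

```python
def select_shots(durations, scores, budget):
    """Pick the shots with maximum total score within a duration budget.

    durations : positive integer duration of each shot (e.g. in frames)
    scores    : importance score of each shot
    budget    : maximum total duration of the summary, same unit as durations
    Returns the sorted indices of the selected shots.
    """
    n = len(durations)
    # best[i][b]: best total score using the first i shots within b units
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d, s = durations[i - 1], scores[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]
            if d <= b and best[i - 1][b - d] + s > best[i][b]:
                best[i][b] = best[i - 1][b - d] + s
    # Walk back through the table to recover which shots were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= durations[i - 1]
    return sorted(chosen)
```

For example, with shot durations of 30, 50, 20 and 40 frames, scores of 0.9, 0.6, 0.5 and 0.8, and a budget of 70 frames, the function keeps the first and last shots.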

Fully convolutional deep networks in Keras. In this repository we provide an implementation of fully convolutional networks in Keras for the VGG16, VGG19, InceptionV3, Xception and MobileNetV2 models, for use in various image/keyframe annotation or classification tasks. We developed and used these deep networks in the context of assessing the aesthetic quality of images.
- Related publications [mmm19a]
- Software package [download source code]

DCNN for Multi-Label Video/Image Annotation (FVMTL-CCELC). This is a DCNN architecture for video/image concept annotation that exploits concept relations at two different levels: i) implicit relations, by learning concept-specific representations that are sparse, linear combinations of representations of latent concepts, and ii) explicit relations, by introducing a new cost term that explicitly models the correlations between concepts. The complete DCNN architecture can be trained end-to-end with standard back-propagation.
- Related publications [csvt19]
- Software package [download source code]

Support Vector Machine with Gaussian Sample Uncertainty (SVM-GSU). This is a novel maximum margin classifier that deals with uncertainty in data input; i.e., it allows each training example to be modeled as a multi-dimensional Gaussian distribution described by its mean vector and covariance matrix (the latter modeling the uncertainty), and defines a cost function that exploits the covariance information. Experimental results verify the effectiveness of this approach in various learning problems.
- Related publications [pami18]
- Software package [download source code]

Incremental Accelerated Kernel Discriminant Analysis. This is a novel incremental dimensionality reduction (DR) technique that offers excellent numerical stability and is specifically designed for use in incremental learning problems. Coupled with a linear SVM (LSVM) classifier, it offers state-of-the-art classification accuracy and an impressive training time speedup over batch AKDA (which is the current state-of-the-art) and also over traditional LSVM and kernel SVM (KSVM) methods.
- Related publications [mm17a]
- Software package [download software (~40MB zip file)]

InVID Verification Plugin. This web-browser plugin is developed as part of the InVID EU project, to help journalists verify videos on social networks. It allows users to quickly fragment videos from various platforms (Facebook, Instagram, YouTube, Twitter, Dailymotion) and extract keyframes, to perform reverse image search on the Google, Baidu or Yandex search engines, to collect contextual information for Facebook and YouTube videos, to enhance and explore keyframes through a magnifying lens, to perform advanced queries on Twitter, and to apply forensic filters on still images.
- Related publications [mm17b]
- Software package [download software]

Accelerated Kernel Subclass Discriminant Analysis and SVM combination: An efficient dimensionality reduction and classification method for very high-dimensional data. AKSDA is a new GPU-accelerated, state-of-the-art C++ library for supervised dimensionality reduction and classification, using multiple kernels. It greatly reduces the dimensionality of the input data, while at the same time increasing their linear separability. Used in conjunction with linear SVMs, it achieves state-of-the-art classification results, consistently higher than Kernel SVM approaches, at orders-of-magnitude shorter training times. AKSDA builds on our previous MSDA/GSDA/KMSDA methods.
- Related publications [mm15] [mm16]
(see also [tnnls13], [spl11] for more theoretical foundations)
- Software package [download software]

Image aesthetic quality assessment tools. This is a Matlab implementation of the feature extraction process for our Image Aesthetic Quality assessment method. Each image is represented according to a set of photographic rules, and five feature vectors are extracted, describing the image's simplicity, colorfulness, sharpness, pattern and composition.
- Related publications [icip15]
- Software package [download software]

Real-time video shot and scene segmentation. Software for the automatic temporal segmentation of videos into shots and scenes. The released software can detect both abrupt and gradual shot transitions with high accuracy, by jointly examining global and local image descriptors, and can then also group the shots into scenes. The whole video analysis process is more than seven times faster than real-time processing on an i7 PC, for the CPU version of the software. A GPU version is also available.
- Related publications [csvt11] [icassp14]
- Software package (v1.4.4, updated 10/4/2017) [download software]

Mixture Subclass Discriminant Analysis (MSDA) & Generalized Subclass Discriminant Analysis / Kernel Mixture Subclass Discriminant Analysis (GSDA/KMSDA) software. MSDA is a dimensionality reduction method that alleviates two shortcomings of Subclass Discriminant Analysis (SDA). In short, MSDA modifies the objective function of SDA, and utilizes a partitioning procedure to help with the discrimination of data with Gaussian homoscedastic subclass structure. KMSDA is the kernel extension of MSDA; GSDA is a faster version of KMSDA. We provide, for non-commercial use, an MSDA and a GSDA/KMSDA implementation in Matlab code.
- Related publications [tnnls13] [spl11]
- MSDA software package [download source code]
- GSDA/KMSDA software package [download source code]

GPU-accelerated LIBSVM. We have developed an open source package for GPU-assisted Support Vector Machine (SVM) training, based on the LIBSVM package. Kernel SVMs have gained wide acceptance in many fields of science, due to their accuracy. However, depending on the amount and nature of the training data, as well as on the cross-validation approach that is followed for optimizing the SVM parameters, the training of SVMs in practice can often become prohibitively slow. The modifications we implemented to the well-known LIBSVM package revolve around porting the computation of the kernel matrix elements to the GPU, so as to significantly decrease the processing time for SVM training without altering the classification results compared to the original LIBSVM.
- Related publication [wiamis11]
- GPU-LIBSVM package (updated Oct. 2013) [download source code]
- Short video (demo) [watch video]
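
To show the kind of computation that is being offloaded, the sketch below builds a kernel matrix on the CPU in plain Python, taking the RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) as an example. It is only an illustration: the function name is hypothetical, and the actual package performs this step on the GPU inside LIBSVM's training code.

```python
import math

def rbf_kernel_matrix(samples, gamma):
    """Return the n x n RBF kernel matrix for a list of feature vectors."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared Euclidean distance between samples i and j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            # The matrix is symmetric, so fill both halves at once.
            K[i][j] = K[j][i] = math.exp(-gamma * sq_dist)
    return K
```

Each entry depends only on one pair of samples, so the entries can be computed independently of one another, which is what makes this step a natural fit for parallel hardware.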

Scene segmentation evaluation utility. We have developed a unidimensional measure for evaluating the quality of the temporal segmentation of videos into scenes. The developed measure is called Differential Edit Distance (DED). DED satisfies the metric properties, and is shown to be fast and effective in evaluating the results of scene segmentation methods and also in helping to optimize such methods' parameters.
- Related publication [csvt12]
- Scene segmentation evaluation utility [download software]

© 2015-2021 Vasileios Mezaris