
Embedding Big Data in Graph Convolutional Networks

Deep learning architectures and Convolutional Neural Networks (CNNs) have made a significant impact in learning embeddings of high-dimensional datasets. In some cases, and especially in the case of high-dimensional graph data, the interlinkage of data points may be hard to model. Previous approaches to applying the convolution function on graphs, namely Graph Convolutional Networks (GCNs), presented neural network architectures that encode information about individual nodes along with their connectivity. Nonetheless, these methods face the same issue as traditional graph-based machine learning techniques, i.e., the requirement of full matrix computations. This requirement limits the applicability of GCNs to what the available computational resources can handle. In this paper, the following assumption is evaluated: training a GCN on multiple subsets of the full data matrix is possible and converges to the scores obtained by training on the full data matrix, thus lifting the aforementioned limitation. Following this outcome, different subset selection methodologies are also examined to evaluate the impact of the learning curriculum on the performance of the trained model on small as well as very large scale graph datasets.

Index Terms: Graph Convolutional Networks, Big Data, Large scale graphs, Node embeddings, Semi-supervised classification
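To make the idea of training on subsets of the full data matrix concrete, the following is a minimal sketch of one graph convolution applied to a sampled subgraph rather than to the whole graph. It assumes the commonly used renormalized propagation rule, relu(D^{-1/2}(A + I)D^{-1/2} X W), and uniform random node sampling; neither is confirmed here as the exact formulation or selection strategy used in this work. The function names, the toy graph, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I as D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops, so every degree is >= 1
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sample_subgraph(adj, features, size, rng):
    """Uniformly sample `size` nodes (an assumed strategy) and return the induced subgraph."""
    idx = rng.choice(adj.shape[0], size=size, replace=False)
    return idx, adj[np.ix_(idx, idx)], features[idx]


def gcn_layer(adj_norm, x, weight):
    """One graph convolution followed by ReLU."""
    return np.maximum(adj_norm @ x @ weight, 0.0)


# Toy undirected graph: 100 nodes, 8 input features, 4 hidden units (all hypothetical).
rng = np.random.default_rng(0)
n_nodes, n_features, n_hidden = 100, 8, 4
upper = np.triu(rng.random((n_nodes, n_nodes)) < 0.05, k=1)
adj = (upper | upper.T).astype(float)
features = rng.normal(size=(n_nodes, n_features))
weight = rng.normal(size=(n_features, n_hidden))

# Only a 20 x 20 adjacency block is normalized and multiplied, not the full 100 x 100 matrix.
idx, sub_adj, sub_x = sample_subgraph(adj, features, size=20, rng=rng)
sub_adj_norm = normalize_adjacency(sub_adj)
hidden = gcn_layer(sub_adj_norm, sub_x, weight)
```

In a training loop, a fresh subset would be drawn at each step and the shared weight matrix updated from the loss on that subset, so memory use scales with the subset size rather than with the full graph. Edges to nodes outside the sampled subset are dropped in this sketch, which is one reason the choice of subset selection method can affect the result.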