High-Performance Graph Algorithms Using Linear Algebra

January 15, 2020

On January 14, 2020, the DNDS Seminar Series hosted a talk by Gábor Szárnyas, research fellow of the Hungarian Academy of Sciences and the Budapest University of Technology and Economics. Gábor works with graph theory and graph processing techniques, and he talked about a recent development in this area: the GraphBLAS approach. 

One of the challenges we commonly face in network science is the limits of processing power. For graphs or networks, even simple algorithms on small data sets may take a very long time to run. As an example, Gábor mentioned computing the clustering coefficient on the Paradise Papers graph, with 2 million nodes and 3 million edges, which took him several days to complete. Unlike other workloads, such as machine learning and relational queries, graph processing cannot be significantly helped by improved CPU capacity, because the issue lies in the intrinsic structure of the data. Basically, computers are built to process linear or hierarchical data structures, but they do not perform nearly as well on random or graph-like structures.

In this context, the GraphBLAS strategy emerged as a simple, elegant solution. It started with the BLAS framework, dating back to 1974, which dealt with dense matrices. Building on that, GraphBLAS encodes graphs as sparse adjacency matrices and then uses matrix operations to express graph algorithms (see Fig. 1 for a simple example). Put differently, graph algorithms are decomposed into linear algebra operations over semirings, and this improves processing performance tremendously. Although a rich literature proposing linear-algebra formulations of graph algorithms has developed since the 1970s, there are very few practical implementations so far, and little consensus on how to translate graph algorithms into the language of linear algebra. This is the niche GraphBLAS has come to fill.

Figure 1. Graph traversal can be coded as matrix multiplication. Starting from node 7, marked with 1 in the vector v, multiplication by the adjacency matrix A returns what nodes we are able to reach after one step (nodes 3, 4, and 5). Multiplying this result again by matrix A shows us what nodes we can reach with 2 steps, and in how many ways (from node 7 we can reach node 6 in 2 steps, in 2 different ways: via node 3 or via node 5).
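The caption's computation can be sketched in a few lines of NumPy. Dense arrays here stand in for the sparse matrices GraphBLAS actually uses, and the graph below is reconstructed from the caption, keeping only the edges it implies (the full graph from the talk may contain more):

```python
import numpy as np

# Edges implied by the Fig. 1 caption: 7->3, 7->4, 7->5, 3->6, 5->6
# (nodes numbered 1..7, stored 0-based). A[i, j] = 1 means edge i -> j.
A = np.zeros((7, 7), dtype=int)
for src, dst in [(7, 3), (7, 4), (7, 5), (3, 6), (5, 6)]:
    A[src - 1, dst - 1] = 1

# One-hot row vector marking the starting node 7.
v = np.zeros(7, dtype=int)
v[6] = 1

step1 = v @ A       # nodes reachable in one step
step2 = step1 @ A   # nodes reachable in two steps, with path counts

print(np.flatnonzero(step1) + 1)  # -> [3 4 5]
print(step2[5])                   # -> 2 (node 6 reached via node 3 or node 5)
```

Note that ordinary matrix multiplication not only marks the reachable nodes but also counts the number of distinct paths, exactly as the caption describes.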

The theoretical foundation of GraphBLAS is very simple. First, matrix multiplication is used to express graph traversals. Then, for other graph problems, the formula is generalized to accommodate arbitrary operations, using arbitrary semirings. That is, the conventional addition and multiplication operations that make up matrix multiplication are swapped around and/or replaced by different operations, such as min or max. Gábor illustrated the approach step by step on a small graph, with two example algorithms: single-source shortest path and local clustering coefficient.
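For instance, replacing the usual (+, ×) with the (min, +) semiring turns the same vector-matrix product into a shortest-path relaxation step. Below is a minimal NumPy sketch of single-source shortest paths on a made-up 4-node weighted graph (not the graph from the talk); GraphBLAS provides such semirings natively:

```python
import numpy as np

INF = np.inf

# W[i, j] is the weight of edge i -> j; INF means "no edge".
# Small illustrative graph, chosen for this sketch.
W = np.array([
    [INF, 1.0, 4.0, INF],
    [INF, INF, 2.0, 5.0],
    [INF, INF, INF, 1.0],
    [INF, INF, INF, INF],
])

def min_plus(d, M):
    # Vector-matrix "product" over the (min, +) semiring:
    # elementwise + plays the role of multiplication, min that of addition.
    return np.min(d[:, None] + M, axis=0)

n = W.shape[0]
d = np.full(n, INF)
d[0] = 0.0  # distances from source node 0

# Relax n - 1 times, as in Bellman-Ford.
for _ in range(n - 1):
    d = np.minimum(d, min_plus(d, W))

print(d)  # -> [0. 1. 3. 4.]
```

Each iteration extends the shortest known paths by one edge, so after n - 1 iterations all shortest distances are settled (assuming no negative cycles).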

Some of the implemented graph algorithms, in increasing order of arithmetic intensity, are: traversal (BFS) and subgraph matching, centrality metrics (PageRank and betweenness), graph clustering, and shortest paths. For those interested in diving in, Gábor has put together a great list of resources on his GitHub page. He calls attention to the need to understand semirings, linear algebra computations and how they relate to graph operations, as well as some knowledge of C programming. However, quite thorough documentation is already available, as well as a Python wrapper.
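The local clustering coefficient, Gábor's second example, also has a compact matrix formulation: for an undirected graph, diag(A³) counts closed walks of length 3 through each node, i.e. twice the number of triangles there. A NumPy sketch on a made-up 4-node graph (again with dense arrays where GraphBLAS would use sparse ones):

```python
import numpy as np

# Undirected example graph: edges 0-1, 0-2, 1-2, 2-3 (one triangle: 0-1-2).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

deg = A.sum(axis=1)        # node degrees
tri2 = np.diag(A @ A @ A)  # 2 * (number of triangles at each node)
pairs = deg * (deg - 1)    # ordered pairs of neighbours
# Guard against division by zero for degree-0 or degree-1 nodes.
lcc = np.where(pairs > 0, tri2 / np.maximum(pairs, 1), 0.0)

print(lcc)  # node 2 closes 1 of its 3 possible neighbour pairs
```

In practice GraphBLAS would compute this with sparse masked matrix products rather than the full dense A³, which is what makes it feasible on graphs like the Paradise Papers network.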

Linear algebra is a powerful abstraction. Many graph algorithms can be expressed in concise linear-algebra formulations, with excellent processing performance. This is an active and promising avenue of research, and, moving forward, we can expect exciting new implementations of graph algorithms in GraphBLAS.

Blog post by Juliana Pereira