Network Centrality

Finding the most important pieces of information in a mass of data has always been a key task in any job. In network science the tools used to do this are known as "centrality measures". We look at the relationship between two of the oldest and most important examples, degree and closeness.
Published in Physics
Network Centrality
Like

Networks (for example see Coscia 2021) are a powerful way to organise the large data sets faced by modern researchers in many fields, from physics through biology to social sciences.  In their simplest form, the objects of interest are represented by nodes of a network, perhaps proteins in biology or people in social science. The edges of a network then represent relationships between pairs of nodes.

Zachary Karate Club Network
Zachary's Karate club network. The size of the node is proportional to the degree while the darker the colour of the node, the higher the closeness.  The shapes indicate the two clusters in the network.

One of the most common questions we ask about our data is what are the most important pieces of information?  In a network representation, this corresponds to finding the most important nodes. Not surprisingly, this was one of the earliest problems tackled in network science and the importance of nodes is measured by centrality measures mentioned above, developed in social network analysis since the 1950's. They are known as centrality measures because of the analogy with a city where most important institutions are found at the geographical centre of a city to enable the easiest access for the largest number of people. 

Finding the most important nodes is such a critical job that this problem has been studied many times and this has produced hundreds (if not thousands) of centrality measures. For instance, in 1998 the founders of Google developed their own centrality measure, PageRank (Brin & Page 1998), to rank web pages. All these centrality measures encode different interpretations of what is the "importance" of a node. There are so many centrality measures in the literature that there must be a large redundancy between them. David Schoch produced a periodic table of centrality measures which is a nice visualisation of this issue. To be more quantitative, one can measure the correlations between different values for centrality measures for the same node. These studies show that there are clear correlations between these measures but no clear pattern has emerged. As Valente et al. (2008) put it, “The amount of correlation between degree, betweenness, closeness, and eigenvector indicates that these measures are distinct, yet conceptually related”. This leaves researchers uncertain what to do when choosing a centrality measure for their problem.

Analytical Insights

In my recent Communications Physics paper with Bingsheng Chen "Linking the Network Centrality Measures Closeness and Degree" (Evans & Chen 2022) we looked at two of the oldest and most popular centrality measures: degree and closeness. The degree kv of a node v is simply the number of nearest neighbours, the number of nodes connected by a single edge to the node v. Closeness (Bavelas, 1950) is defined in terms of the length of the shortest path between two nodes. A path in a network is just a sequence of nodes where each node is connected by an edge to the previous node in the sequence. The length is the number of edges traversed as you move along the path. You can then calculate the average distance from a node v to all other nodes. The smaller this average value of distance is, the closer node v is to the other nodes in a network.  So a low value of this average indicates which nodes are the most central and, using the analogy of a city mentioned above, we might define these to be most important. Centrality measures are defined so that the larger the value of a  centrality measure, the more important a node is. So the centrality measure closeness cv for a node v is defined to be the inverse of the average distance to all other nodes.

Illustration of shortest path and degree
Example of degree, closeness and a shortest path in a small graph.  The degree of the red square node is three as it is connected to three other nodes. The distance (length of the shortest path) from the red square node is given next to each node. So the closeness of the red square node is 5/(1+1+1+2+2) = 5/7. One of the two shortest paths from the red square node to the red circular node (top left) is shown by the green edges. 

What Bingsheng and I suggested (Evans & Chen, 2022) is that you can do better for degree and closeness than the qualitative statement of Valente et al. (2008) that they are “conceptually related”.

In our work, we picture the shortest paths used to calculate closeness as a tree which is a network with no loops and where there is only one path between any two nodes. In our case, our shortest-path tree for node v has every node from the original network, the edges in the tree are a subset of the edges in the original network, and the path in the tree from the root node v to any other node is always a shortest path in the full network. Such trees can always be defined for each node though they are not in general unique. Shortest-path trees are easily created using standard breadth-first search algorithms.

Our key algebraic assumption is that the branches of these trees will look statistically similar for most networks. The reason for this is that the number of nodes at distance L from the root grows exponentially in most networks.  This is the reason why we see the phenomena of six degrees of separation in most networks. That is, even in very large networks, the length of the shortest path from any node to any other node is usually small, typically of order log(N) (where N is the number of nodes in the network) rather than N1/D, you would expect if the network is in a D-dimensional Euclidean space (e.g. if our network was a regular cubic lattice so D=3). 

Example of a Shortest Path Tree
One of the two possible shortest-path trees rooted on the square red node.  The edges in the tree are shown in green.  The two black dashed edges were in the original graph but are not part of the tree.  

We used the simplest model for the branches of the tree. Namely, we assume that on average the degree of each node in the tree is z. That is one edge links every node to the parent node that is one step closer to the root node v, while the remaining (z-1) nodes are one step further away. Given that this means the number of nodes which are L steps away from the root is growing in proportion to (z-1)L i.e. exponentially fast, we have to stop the growth somewhere and we do so by demanding that the total number of nodes in the tree is N, the number of nodes in the original network Since we know that the distance in the tree from any node to the root node is along a shortest path in the full network, we can use the shortest path tree to calculate the average distance from v to all other nodes and hence its inverse, the closeness centrality measure cv for node v.

So far the degree of node v has not appeared in our calculation.  However, the first step of the shortest path tree must be to the nearest neighbours. So the shortest path tree from node v starts with kv branches and after that we assume these branches are all statistically equivalent and the degree in the rest of the tree is on average z

Testing on Artificial Networks

Our simple analytical analysis suggests that for each node in a typical complex network, the inverse of the closeness is related linearly to the degree of that node. The only reason we can do these calculations is that we have made some sweeping generalisations in order to reduce our networks to simple trees which represent branching processes used widely in physics and science in general. Given the simple pictured used here, it would not be unreasonable to assume that even if our relationship between degree and closeness works for some simple numerical network models, the relationship will fail badly on most real-world networks.  And yet, if there is one thing our experience as physicists has taught us, it is that there are situations where we can ignore the messy microscopic details and still arrive at results that are broadly true on large scales across many different systems.  The derivation of the ideal gas law using ideas from statistical physics would be a classic example. So how good is our result on real examples?

Results for closeness and degree in three artificial networks.
Results showing degree on a log scale against the inverse of closeness for three artificial networks: Erdos-Reyni, Barabasi-Albert and the configuration model version of the same BA networks. Each model is used to produce three networks with average degree 10.0: one with 4000 nodes (top blue data), one with 2000 nodes (middle yellow points) and one with 1000 nodes (red bottom points). Data is log binned. The dashed line is the best linear fit to the data to each network.

The answer turns out that our relationship works far better than we thought it would do. On three simple artificial models for networks, we get fits with reduced chi-square values which are between 1.0 and 1.3. The good fit is clear visually in the figure above.

Testing on Real Data

The results for networks produced from artificial models are great but these models are a long way from what is found in real data. So the real surprise is that we find good fits we tried our relationship on over one hundred networks based on real data.  One of our referees suggested that since we had the code working for an initial set of 18 real-world networks, it would be relatively easy to look at more networks. So for the published version we used a python package graph-tool to download all the networks kept in the Netzschleude repository (both are maintained by Tiago Peixoto) and we processed all of the networks there provided no further programming effort was required.  A number of these networks were too big for our code or had some technical issue that broke our code (perhaps non-ASCII characters), but we ended up looking at a further 112 networks from real-world data sets without imposing any further bias. To our surprise that we still found our relationship held well (reduced chi-square of less than 2.0) for around two-thirds of these networks, while around 80% had a reduced chi-square below 3.0. Of those that failed, a quick look at their description would tell us that they were always unlikely to work well: they were very small, or very dense (that is every node is connected to most of the other nodes) or had some other special structural feature not covered in our simple picture.

Conclusions

So what does our result do for network science? First, our relationship is a clear quantitative example of how two centrality measures are related, in this case degree and closeness, two of the oldest and best-known examples. The fact that our relationship is non-linear highlights the fact that most previous studies look at possible linear correlations between centrality measures, e.g. by using Pearson correlation coefficients. If non-linear relationships exist between centrality measures, these would not be so clear and this might explain why the previous studies have not reached any very specific conclusions. Second, our results tell researchers that, to a first approximation, we can forget closeness as most of the information is already in the degree of a node which is much quicker to calculate.  However, the corollary of this is that if you do calculate closeness, you can now remove the general dependence of closeness on degree by comparing the measured value against the predicted value made using our formula. The difference from our predicted value of closeness is genuine new information in the closeness measurement that is not given by the degree value.

The final message is that the success of our method reminds us that for most networks, the exponential growth in the number of nodes as a function of the distance L from any node is the fundamental property that makes complex networks distinct from systems constrained by our three-dimensional world. 

References

Bavelas, A. 1950.  Communication patterns in task-oriented groups. The Journal of the Acoustical Society of America 22, 725–730.

Brin, S. & Page,  L. 1998. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN systems, 30, 1–7.

Coscia, M. 2021. The Atlas for the Aspiring Network Scientist.  https://doi.org/10.48550/arXiv.2101.00863

Evans, T. S. & Chen, B., 2022. Linking the Network Centrality Measures Closeness and Degree. Communications Physics, 5, 172, doi: http://doi.org/10.1038/s42005-022-00949-5

Schoch, D. 2016. Periodic table of network centrality. URL http://schochastics.net/sna/periodic.html

Valente, T. W., Coronges, K., Lakon, C. & Costenbader, E. 2008. How correlated are network centrality measures? Connections 28, 16.

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Subscribe to the Topic

Physics and Astronomy
Physical Sciences > Physics and Astronomy

Related Collections

With collections, you can get published faster and increase your visibility.

Quantum simulation of gauge theories: steps and strides

This focus collection captures a slice of current progress and perspectives---deepening the understanding of gauge theory quantum simulation, broadly defined, and driving the community toward large-scale quantum simulations of gauge theories.

Publishing Model: Open Access

Deadline: Apr 24, 2024

Ultrafast X-ray Spectroscopy

This collection aims to provide a description of the state-of-the-art in time-resolved X-ray spectroscopic measurements, including applications, technological developments and theoretical studies to explore ultrafast phenomena in isolated quantum systems.

Publishing Model: Open Access

Deadline: May 13, 2024