S the trace of a matrix. This distance takes into account
S the trace of a matrix. This distance takes into account the number of edges that are in common between the two networks. Here we propose an extension of this metric, which we call the Generalised Hamming Distance (GHD), defined asGHD(A, B ) =1 N(N – 1)aij – biji,j,(1)Montana et al. BMC Bioinformatics (2015) 16:Page 3 ofwhere aij and bij are mean-centred edge weights defined as aij = aij – 1 N(N – 1) aij ,i,jA non-parametric test for network comparisonbij = bij -1 N(N – 1)biji,jand i,j denotes summation over distinct i and j. The edge weights, which depend on the topology of the networks, provide a measure of connectivity between every pair of nodes i and j in A and B , respectively. When aij and bij are binary values indicating the presence or absence of an edge, i.e. are the elements of A and B, GHD(A, B ) is related to the HD. The specific node weights we propose here instead quantify the topological PD98059 site overlap (TO) between a pair of nodes by taking into account the local neighbourhood structure around those nodes [21]. In the literature, the TO measure has been successfully applied for the detection of communities in biological networks, PubMed ID:http://www.ncbi.nlm.nih.gov/pubmed/26780312 and there is empirical evidence that it carries biological meaning [7, 22]. We use the one-step TO between nodes i and j indicating whether they share direct connections to other nodes. The weights are obtained from the adjacency matrix as follows: aij =l=i,j Ail AljFor inferential purposes, we require computing the probability that a distance as extreme or more extreme than the observed GHD value could have been observed by chance only. By treating the GHD as a random variable with unknown sampling distribution, this probability can be estimated non-parametrically via permutation testing. First, we specify the null hypothesis as beingH0 : networks A and B are independent.(3)+ Aijl=j Ailminl=i Ail- Aij ,- Aij +,(2)By taking B as reference, each permutation consists of shuffling the labels of the nodes in A while keeping the edges unchanged. This generates a permuted network A that is isomorphic to A, and the exchangeability property holds. In turn, this signifies that the original and permuted networks are generated from the same underlying, but unspecified, model [23, 24]. Since all permutation networks are isomorphic, permuting the labels of the network is equivalent to shuffling rows and columns of the adjacency matrix, an approach that bears some similarity with Mantel’s test [25] for the comparison of two distance matrices. All the the N! possible permutations are then collected in a set , and for each a permuted GHD value is denoted as GHD (A , B ) = 1 N(N – 1) a(i)(j) – biji,j,when i = j, and otherwise aij = 1, and analogously for bij . The GHD sums the squared differences (aij – bij )2 over all pairs of nodes in the network. Note that the term l=i,j Ail Alj is a count of all vertexes (i, l, j) containing node pair (i, j). This term measures the connectivity information of each (i, j) pair plus their common onestep neighbours. The denominator in (2) can be written as min(di , dj ) + 1 – Aij , where di and dj represent the node degrees of i and j, respectively. It is roughly equal to the smaller of (di , dj ) and normalises aij such that 0 aij 1. A large discrepancy between aij and bij indicates a topological difference localised around that pair of nodes. By exploring the neighbourhood of each node, the proposed GHD can detect subtle topological changes with higher sensitivity compared to t.