Community detection in Julia.
This package inspired by louvain-igraph.
It relies on Graphs.jl
for it to function. Besides the relative
flexibility of the implementation, it also scales well, and can be run on graphs
of millions of nodes (as long as they can fit in memory).
The core function is
optimize_partition!
which finds the optimal partition using the louvain algorithm
for a number of different methods. The methods currently implemented are:
-
Modularity. This method compares the actual graph to the expected graph, taking into account the degree of the nodes [1]. The expected graph is based on a configuration null-model. Notice that we use the non-normalized version (i.e. we don't divide by the number of edges), so that this Modularity values generally does not fall between 0 and 1. The formal definition is
H = sum_ij (A_ij - k_i k_j / 2m) d(s_i, s_j),
where
A_ij = 1
if there is an edge between nodei
andj
,k_i
is the degree of nodei
ands_i
is the community of node i. -
RBConfiguration. This is an extension of modularity which includes a resolution parameter [2]. In general, a higher resolution parameter will lead to smaller communities. The formal definition is
H = sum_ij (A_ij - gamma k_i k_j / 2m) d(s_i, s_j),
where
gamma
is the resolution value, and the other variables are the same as for Modularity. -
RBER. A variant of the previous method that instead of a configuration null-model uses a Erdös-Rényi null-model in which each edge has the same probability of appearing [2]. The formal definition is
H = sum_ij (A_ij - gamma p) d(s_i, s_j),
where
p
is the density of the graph, and the other variables are the same as for Modularity, withgamma
a resolution parameter. -
CPM. This method compares to a fixed resolution parameter, so that it finds communities that have an internal density higher than the resolution parameter, and is separated from other communities with a density lower than the resolution parameter [3].The formal definition is
H = sum_ij (A_ij - gamma ) d(s_i, s_j),
with
gamma
a resolution parameter, and the other variables are the same as for Modularity. -
Significance. This is a probabilistic method based on the idea of assessing the probability of finding such dense subgraphs in an (ER) random graph [4]. The formal definition is
H = sum_c M_c D(p_c || p)
where
M_c
is the number of possible edges in communityc
, i.e.n_c (n_c - 1)/2
for undirected graphs and twice that for directed grahs withn_c
the size of communityc
,p_c
is the density of the communityc
, andp
the general density of the graph, andD(x || y)
is the binary Kullback-Leibler divergence. -
Surprise. Another probabilistic method, but rather than the probability of finding dense subgraphs, it focuses on the probability of so many edges within communities [5, 6]. The formal definition is
H = m D(q || <q>)
where
m
is the number of edges,q
is the proportion of edges within communities (i.e.sum_c m_c / m
) and<q>
is the expected proportion of edges within communities in an Erdős–Rényi graph. -
Infomap (Map Equation) A method based on infomation theory, you can explore the mechanics of the map equation in here
This package also implement label propagation algorithm and neighbor strength driven label propagation algorithm.
julia> Pkg.clone("git://github.com/afternone/CommunityDetection.jl.git")
To start, make sure to import the packages:
using CommunityDetection
using GraphPlot # for testing graph
We'll create a simple graph for testing purposes:
g = graphfamous("karate")
For simply finding a partition use:
# construct modularity partition
mp = mpartition(g)
optimize_partition!(mp)
In case you want to use a weighted graph
edge_weights = ones(num_edge(g))
mp = mpartition(g, edge_weights)
optimize_partition!(mp)
Please note that not all methods are necessarily capable of handling weighted graphs.
Notice that mp
now contains the membership of nodes
mp.membership
You can also find partition using infomap algorithm
# construct flow partition
fp = flow_partition(g)
optimize_partition!(fp)
Some other examples
# Significance method
mp = mpartition(g)
optimize_partition!(mp, method = :Significance)
# label propagation
membership1 = lpa(g)
# neighbor strength driven label propagation
membership2 = nsdlpa(g)
# calculate modularity
modularity(g, membership1)
# calculate NMI of two partition
nmi(membership1, membership2)
# calculate VOI of two partition
voi(membership1, membership2)
Please cite the references appropriately in case they are used.
- Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
- Newman, M. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
- Reichardt, J. & Bornholdt, S. Partitioning and modularity of graphs with arbitrary degree distribution. Physical Review E 76, 015102 (2007).
- Traag, V. A., Van Dooren, P. & Nesterov, Y. Narrow scope for resolution-limit-free community detection. Physical Review E 84, 016114 (2011).
- Traag, V. A., Krings, G. & Van Dooren, P. Significant scales in community structure. Scientific Reports 3, 2930 (2013).
- Aldecoa, R. & Marín, I. Surprise maximization reveals the community structure of complex networks. Scientific reports 3, 1060 (2013).
- Traag, V.A., Aldecoa, R. & Delvenne, J.-C. Detecting communities using Asymptotical Surprise. Forthcoming (2015).
- Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. & Onnela, J.-P. Community structure in time-dependent, multiscale, and multiplex networks. Science 328, 876–8 (2010).