-
Notifications
You must be signed in to change notification settings - Fork 183
Conversation
Current coverage is
|
visitor::SimpleGraphVisitor) # the visitor | ||
graph::SimpleGraph, # the graph | ||
queue::Vector{Int}, # an (initialized) queue that stores the active vertices | ||
colormap, # an (initialized) color-map to indicate status of vertices (0=unseen, 1=seen, 2=closed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This violates the LG principal of having type annotations for all parameters. Can we do something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Union{AbstractArray,Dict}
work? We could give that a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure we want to go fully typed here? we loose in flexibility and we gain nothing in performance
- relax type of edgecolormap
I made Dict{Int,Int}() the default for vertexcolormap, since some experiments with large erdos_renyi graph and |
I think we should have three methods on this function: one with no type constraints, another with an AbstractArray{T}, and the third with Dict{Int,T}. The current code can go in the unconstrained method and the other two methods would provide the "types as documentation" that @sbromberger wants. |
@jpfairbanks My initial reaction is that this is probably the worst of all possible solutions. I'd MAYBE be ok with I don't see the use case, though. Aside from some minor additional convenience, why are |
Do you dislike the type parameterization |
I'm honestly not a fan of either. What's wrong with integer colors? We use ints for coloring all over the place: partitions, biconnected components, .... this change will introduce an inconsistency AND make things more complex without an identified use case. |
I will leave the contained type for now and say it can stay as ints. Making the color map able to support both Dict and Vector is important for exactly the use case carlo has found. If you want to do a BFS but only go a constant number of hops, then you shouldn't allocate o(nv) memory. That is why we need to support some sparse vector for the colormap. Carlo has some experiments that show Dict works well enough. |
But here's the thing: if you don't want to allocate I'm missing something. What, specifically, is the use case with the limitation? All else being equal, we want to limit memory allocation. However, in this case, we're faced with a tradeoff: get more efficient memory allocs in limited, specific defined cases, but (on the other hand) increase the code complexity AND create a situation where we get HORRIBLE memory allocs in some other defined cases. I'm not sure this is a good tradeoff. |
the problem to me is not memory consumption, it is performance: the reason I started implementing neighborhood is that I need it inside an algorithm that has to call it many many times. Now, each call would imply allocating and deallocating O(nv) memory, and this would take most of the computational time. Unless we want to optionally pass to neighborhood an initialized colormap, which would be really a questionable design choice, the only solution here is to allow both dicts and vectors for colormap. I don't really see any problem in this: ####
julia> g=erdos_renyi(100_000,100_000)
{100000, 100000} undirected graph
# BEFORE
julia> @time is_bipartite(g)
0.045502 seconds (84.06 k allocations: 10.419 MB, 21.31% gc time)
false
julia> @time is_bipartite(g)
0.034474 seconds (84.06 k allocations: 10.419 MB)
false
#AFTER
julia> @time is_bipartite(g)
0.037780 seconds (84.96 k allocations: 9.797 MB)
false
julia> @time is_bipartite(g)
0.036506 seconds (84.96 k allocations: 9.797 MB)
false
#####
julia> g=erdos_renyi(1_000_000,10_000_000)
{1000000, 10000000} undirected graph
#BEFORE
julia> @time is_bipartite(g)
1.038482 seconds (74 allocations: 40.599 MB, 0.41% gc time)
false
julia> @time is_bipartite(g)
1.041845 seconds (74 allocations: 40.599 MB, 0.31% gc time)
false
#AFTER
julia> @time is_bipartite(g)
1.086443 seconds (119 allocations: 33.325 MB, 0.21% gc time)
false
julia> @time is_bipartite(g)
1.074744 seconds (119 allocations: 33.325 MB, 0.13% gc time)
false
# case in which it is actually bipartite
julia> g=erdos_renyi(100_000,10_000)
{100000, 10000} undirected graph
#BEFORE
julia> @time is_bipartite(g)
0.232985 seconds (481.32 k allocations: 1.243 GB, 18.40% gc time)
true
julia> @time is_bipartite(g)
0.278150 seconds (481.32 k allocations: 1.243 GB, 31.56% gc time)
#AFTER
julia> @time is_bipartite(g)
0.099805 seconds (484.05 k allocations: 167.945 MB, 34.27% gc time)
true
julia> @time is_bipartite(g)
0.080207 seconds (484.05 k allocations: 167.945 MB, 20.56% gc time)
true This results are not clear to me, they should be investigated, anyway in no case there is a regression and in one case there is a huge improvements using dicts. |
""" | ||
function components(labels::Vector{Int}) | ||
d = Dict{Int, Int}() | ||
c = Vector{Vector{Int}}() | ||
i = 1 | ||
for (v,l) in enumerate(labels) | ||
index = get(d, l, i) | ||
d[l] = index | ||
index = get!(d, l, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice.
@CarloLucibello your numbers are compelling, and my tests bear out what you're saying. Thanks for sticking with this. Please see more comments in code. I think we can do this so that it remains consistent with LG standards (which should probably be codified at some point). @jpfairbanks - I think I've been able to do this without type parameterization, of which I'm not a huge fan. Take a look? |
@CarloLucibello this actually shows a deeper problem in I know that I always think about the single connected component case and then do whatever it takes to apply that to each component. This strategy is efficient when the number of components is small. Based on reading the code, that is what happened is ############################################
# Test graph for bipartiteness #
############################################
type BipartiteVisitor <: SimpleGraphVisitor
bipartitemap::Vector{UInt8}
is_bipartite::Bool
end
BipartiteVisitor(n::Int) = BipartiteVisitor(zeros(UInt8,n), true)
function examine_neighbor!(visitor::BipartiteVisitor, u::Int, v::Int, vcolor::Int, ecolor::Int)
if vcolor == 0
visitor.bipartitemap[v] = (visitor.bipartitemap[u] == 1)? 2:1
else
if visitor.bipartitemap[v] == visitor.bipartitemap[u]
visitor.is_bipartite = false
end
end
return visitor.is_bipartite
end
"""Will return `true` if graph `g` is
[bipartite](https://en.wikipedia.org/wiki/Bipartite_graph).
"""
is_bipartite(g::SimpleGraph, s::Int) = _bipartite_visitor(g, s).is_bipartite
function is_bipartite(g::SimpleGraph)
cc = filter(x->length(x)>2, connected_components(g))
return all(x->is_bipartite(g,x[1]), cc)
end
function _bipartite_visitor(g::SimpleGraph, s::Int)
nvg = nv(g)
visitor = BipartiteVisitor(nvg)
traverse_graph(g, BreadthFirst(), s, visitor)
return visitor
end This code is correct and efficient when the number of connected components is small. The implementation with Dict alleviates one of these problems because We should fix this and add a note to the docs saying that if you notice terrible performance for many small connected components, then it is probably a bug. But we can also relax the colormap to be AbstractArray or Associative. |
@jpfairbanks if I'm understanding correctly, though, this isn't too big an issue since the sizes of the connected components sum up to nv(g) in any case, so the array allocation doesn't really impact memory, just the number of allocations which we have in any case. Is this not right? |
@sbromberger I think you are right so, with regard to vertexcolormap and edgecolormap, I wouldn't even bother parametrizing them, but from the abpve discussion it seems you are inclined to PS I didn't know the existence of Associative, nice to hear |
@sbromberger yeah the number of large allocations is the problem. using a dict reduces the sizes of each allocation and reusing one big array is optimal because it allocates the right amount of memory all at once. @CarloLucibello yes. |
Wait, no. Why |
- reimplement neighborhood using BFS
We still want some things to use Arrays, such as connected components where it is easy and efficient to use Array{Int, 1}. If you really want strict typing, |
function is_bipartite(g::SimpleGraph) | ||
cc = filter(x->length(x)>2, connected_components(g)) | ||
return all(x->is_bipartite(g,x[1]), cc) | ||
vmap = Dict{Int,Int}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be a Dict because it will be full at the end. A plain old array will work fine. The memory reuse is correct now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be full only in the case the graph is actually bipartite. It is hard to say what is the most efficient choice on average here
@@ -123,7 +133,7 @@ function discover_vertex!(vis::TarjanVisitor, v) | |||
end | |||
|
|||
function examine_neighbor!(vis::TarjanVisitor, v, w, w_color::Int, e_color::Int) | |||
if w_color > 0 # 1 means added seen, but not explored; 2 means closed | |||
if w_color >= 0 # >=0 means added seen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Careful. See #71, JuliaAttic/OldGraphs.jl#187, and JuliaAttic/OldGraphs.jl#188.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! thanks
This should be ready to merge. I know, it has 17 commits, but even with a careful squashing I wouldn't go with less then 10 since there are a lot of incremental fixes that are explained in the commit messages. I wouldn't go through the pain of cherry squashing, but I can come up with just one big commit if asked |
bump on this |
tests pass => 🚢 it |
Awesome. Thanks, Dr. @CarloLucibello! (and also Dr. @jpfairbanks for the review.) |
as a follow up of the discussion in #329
the crucial line here is the one that uses
get
to treat on the same footing arrays and dictionariesthe "add neighborhood" commit got in by mistake, I'll remove it laterthis also adds the neighborhood and egonet functions, based on breadth first search.
this also fixes #330