[graph-tool] Speeding up graph copy?

Zouhair zoohair at gmail.com
Mon Jan 20 19:43:27 CET 2020


Hi,

I'm using graph-tool in my project, and my use case is as follows:
1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
edges)
2) attach some properties to the vertices
3) for about ~5000 sets of *new* candidate vertices (each set has about
2-30 vertices)
  3.a - add new vertices to graph
  3.b - add edges as necessary between new vertices and old ones
  3.c - update the properties
  3.d - run an algorithm which depends on properties and edges, save result
to a list
  3.e - remove new vertices from graph and restore properties (i.e. undo
3.a-c)

Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d runs
fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing is 3.a
and 3.e (about 80% of runtime is spent there)

I've read through the docs and tried different approaches:

i- make a copy of the graph using g.copy() or Graph(g): this avoids steps
3.e, but copying the graph is much slower
ii - use a GraphView: adding vertices to the graph view adds them to the
original graph, so still need to do step 3.e
iii- use GraphView but only filter on the original vertices in the graph:
this avoids having to delete the vertices, but making GraphView with a
filter function is still slow (doing `gt.GraphView(g, lambda v: v <
n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
original graph keeps growing...
iv - do this using multiprocessing with the hope that pickling the object
avoids having to copy the graph, but it it appears that a different process
modifying the graph ends up affecting the graph that another process
receives ( because multiprocessing doesn't pickle the underlying C++
object/memory?)

In all cases, it's surprising that the bulk of the runtime is in making
graph copies as opposed to running the algorithm. The cleanest solution is
to take a copy of the graph and work on it to avoid having to later delete
vertices and restore properties, so I'm hoping that there is a way to speed
up graph copies?
I'm hoping for some input on whether there is a better way to achieve this
using graph-tool, or some pythonic magic to help speed up the copy
operation.

Thanks!

P.S. sorry if this is a repost - my initial post was rejected by nabble,
trying again directly.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.skewed.de/pipermail/graph-tool/attachments/20200120/b4c0c133/attachment.htm>


More information about the graph-tool mailing list