[graph-tool] Speeding up graph copy?

Zouhair zoohair at gmail.com
Wed Jan 22 06:54:25 CET 2020


>
>
> How are you doing steps 3.a and 3.e exactly?
>
> Are you calling g.add_vertex() and g.remove_vertex() repeatedly?
>

Yes I am. I assumed that's not the best thing but compared to copying the
graph, adding then removing the vertices was performing better.

I investigated a different approach whereas I allow additional vertices to
exist as long as I keep track of the original number of vertices (n_orig):
For a set of candidate vertices of size m
- if graph size is not large enough, add new vertices
- create a graph view that only keeps (n_orig + m)
- add edges as needed (this relies on vertex properties)
- do graph computation
- when done, remove edges (but not vertices)

When finished with all candidate sets, remove the extra vertices.

Now I am only spending ~25% of the time in resizing the graph (around 20%
in remove_edges, despite having set_fast_edge_removal(True)). That's still
more than I'd like, but it's an improvement from the 80% I had with adding
/ removing vertices. If you have other suggestions on how to tackle this
that'd be great (for example, I tried without remove_edges, but the edges
that get added to the graphview end up showing up in the original graph. Is
there a way to avoid that or to filter such that only the original edges
are kept in the graphview?)


as a minimal working example:
```
import graph_tool.all as gt
import numpy as np

g = gt.Graph(directed=False)
g.add_vertex(10)
original_edges = [[0, 1], [0, 4]]
g.add_edge_list(original_edges)

g.set_fast_edge_removal(True)
n_orig = g.num_vertices()

def foo(g, m, skipEdgeRemoval = False):
    n_act = g.num_vertices()
    n_extra = n_act - (n_orig+m)

    if n_extra < 0:
        print(f'adding {-n_extra} vertices ...')
        g.add_vertex(-n_extra)
        n_extra = 0
        n_act = g.num_vertices()

    filt = np.ones(n_act)
    if n_extra > 0:
        filt[(-n_extra):] = 0


    gv = gt.GraphView(g, filt)
    #print(f'filt = {filt}')

    new_vtx_ids = np.arange(n, n+m)
    print(f'indices of new vertices = {new_vtx_ids}')

    #Check that all edges are expected
    for e in list(gv.edges()):
        if [e.source(), e.target()] not in original_edges:
            raise Exception(f'{e.source()} -> {e.target()} was not in
original graph')


    #Add a bunch of random edges connecting to new candidates
    edge_list = []
    for s, t in zip(np.random.choice(range(n_orig), m),
                    np.random.choice(new_vtx_ids, m)):
        edge_list.append([s,t])
        #also updates some vertex properties, but that's easy/fast to undo,
not done in this MWE


    gv.add_edge_list(edge_list)

    print('Current edges:')
    for e in list(gv.edges()):
        print(f'{e.source()} -> {e.target()}')

    #computation on new graph goes here ...
    pass


    #print('Cleanup ...')
    #If cleanup is not done, this fails as the edges that were added to the
graphview
    #end up being added to the original graph as well :s
    if not skipEdgeRemoval:
        for e in edge_list:
            gv.remove_edge(gv.edge(e[0], e[1]))


foo(g, 1)
foo(g, 4)
foo(g, 3)
foo(g, 5)
#above is all good
foo(g, 3, True) #skip cleanup
foo(g, 4) #fails
```




> Message: 3
> Date: Tue, 21 Jan 2020 15:01:53 +0100
> From: Tiago de Paula Peixoto <tiago at skewed.de>
> To: Main discussion list for the graph-tool project
>         <graph-tool at skewed.de>
> Subject: Re: [graph-tool] Speeding up graph copy?
> Message-ID: <1eb5aeab-0a3d-d3fa-84e3-60734afdd360 at skewed.de>
> Content-Type: text/plain; charset="utf-8"
>
> Am 20.01.20 um 19:43 schrieb Zouhair:
> > Hi,
> >
> > I'm using graph-tool in my project, and my use case is as follows:
> > 1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
> > edges)
> > 2) attach some properties to the vertices
> > 3) for about ~5000 sets of *new* candidate vertices (each set has about
> > 2-30 vertices)
> > ? 3.a - add new vertices to graph
> > ? 3.b - add edges as necessary between new vertices and old ones
> > ? 3.c - update the properties
> > ? 3.d - run an algorithm which depends on properties and edges, save
> > result to a list
> > ? 3.e - remove new vertices from graph and restore properties (i.e. undo
> > 3.a-c)
> >
> > Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d
> > runs fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing
> > is 3.a and 3.e (about 80% of runtime is spent there)
>


>
> > iii- use GraphView but only filter on the original vertices in the
> > graph: this avoids having to delete the vertices, but making GraphView
> > with a filter function is still slow (doing `gt.GraphView(g, lambda v: v
> > < n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
> > original graph keeps growing...
>
> This can be improved by passing a boolean-valued vertex property map
> instead of a lambda function. In this way the GraphView creation becomes
> O(1) instead of O(N).
>
> Best,
> Tiago
>
> --
> Tiago de Paula Peixoto <tiago at skewed.de>
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: signature.asc
> Type: application/pgp-signature
> Size: 833 bytes
> Desc: OpenPGP digital signature
> URL: <
> https://lists.skewed.de/pipermail/graph-tool/attachments/20200121/4ead384b/attachment.asc
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> graph-tool mailing list
> graph-tool at skewed.de
> https://lists.skewed.de/mailman/listinfo/graph-tool
>
>
> ------------------------------
>
> End of graph-tool Digest, Vol 144, Issue 15
> *******************************************
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.skewed.de/pipermail/graph-tool/attachments/20200121/d27ee174/attachment.htm>


More information about the graph-tool mailing list