[graph-tool] Speeding up graph copy?

Zouhair zoohair at gmail.com
Wed Jan 22 08:17:31 CET 2020


Update:
I realized that GraphView has an efilt as well, so I've added an
edge_property to keep track of which edges were part of the original graph.

Oddly enough, this didn't improve things as it seems like the bottle neck
wasn't in edge removal (which is supposed to be O(1) with
set_fast_edge_removal(True)), but rather with constructing the edge object
using g.edge(src, tgt).
I tried using gv.add_edge() in a loop and saving the result (instead of
using add_edge_list). That made setting the edge properties at the end
fast, but adding the edges one by one ends up being as slow as
reconstructing the edge object.
What seems to have worked is to instead add another edge property map to
keep track of the new edges, and set that with add_edge_list, then copy it
over to the property map that gets used to the filtering.

With all of this in place this has pushed the graph "copying" down to 5% of
the routine, which seems reasonable for now. If someone has other
suggestions I'm all ears :)

Thanks again,
-z

Here's the MWE using efilt implementation:
```
import graph_tool.all as gt
import numpy as np
g = gt.Graph(directed=False)
g.add_vertex(10)
original_edges = [[0, 1], [0, 4]]
g.add_edge_list(original_edges)
g.edge_properties['orig'] = g.new_edge_property('bool', val=True)
g.edge_properties['oorig'] = g.new_edge_property('bool', val=True)


g.set_fast_edge_removal(True)
n_orig = g.num_vertices()

def foo(g, m):
    n_act = g.num_vertices()
    n_extra = n_act - (n_orig+m)

    if n_extra < 0:
        print(f'adding {-n_extra} vertices ...')
        g.add_vertex(-n_extra)
        n_extra = 0
        n_act = g.num_vertices()

    filt = np.ones(n_act)
    if n_extra > 0:
        filt[(-n_extra):] = 0


    gv = gt.GraphView(g, vfilt=filt, efilt=g.edge_properties['orig'])
    #print(f'filt = {filt}')

    new_vtx_ids = np.arange(n, n+m)
    print(f'indices of new vertices = {new_vtx_ids}')

    #Check that all edges are expected
    for e in list(gv.edges()):
        if [e.source(), e.target()] not in original_edges:
            raise Exception(f'{e.source()} -> {e.target()} was not in
original graph')


    #Add a bunch of random edges connecting to new candidates
    edge_list = []
    for s, t in zip(np.random.choice(range(n_orig), m),
                    np.random.choice(new_vtx_ids, m)):
        edge_list.append([s,t,False])

    #Can't set this directly to 'orig' since we are filtering on it, and if
it's false
    #the edge won't be added to this view!
    gv.add_edge_list(edge_list, eprops = [gv.edge_properties['oorig']])

    print('Current edges:')
    for e in list(gv.edges()):
        print(f'{e.source()} -> {e.target()}')

    #computation on new graph goes here ...
    pass


    g.edge_properties['orig'] = g.edge_properties['oorig']


foo(g, 1)
foo(g, 4)
foo(g, 3)
foo(g, 5)
foo(g, 3)
foo(g, 4)
```

On Tue, Jan 21, 2020 at 9:54 PM Zouhair <zoohair at gmail.com> wrote:

>
>> How are you doing steps 3.a and 3.e exactly?
>>
>> Are you calling g.add_vertex() and g.remove_vertex() repeatedly?
>>
>
> Yes I am. I assumed that's not the best thing but compared to copying the
> graph, adding then removing the vertices was performing better.
>
> I investigated a different approach whereas I allow additional vertices to
> exist as long as I keep track of the original number of vertices (n_orig):
> For a set of candidate vertices of size m
> - if graph size is not large enough, add new vertices
> - create a graph view that only keeps (n_orig + m)
> - add edges as needed (this relies on vertex properties)
> - do graph computation
> - when done, remove edges (but not vertices)
>
> When finished with all candidate sets, remove the extra vertices.
>
> Now I am only spending ~25% of the time in resizing the graph (around 20%
> in remove_edges, despite having set_fast_edge_removal(True)). That's still
> more than I'd like, but it's an improvement from the 80% I had with adding
> / removing vertices. If you have other suggestions on how to tackle this
> that'd be great (for example, I tried without remove_edges, but the edges
> that get added to the graphview end up showing up in the original graph. Is
> there a way to avoid that or to filter such that only the original edges
> are kept in the graphview?)
>
>
> as a minimal working example:
> ```
> import graph_tool.all as gt
> import numpy as np
>
> g = gt.Graph(directed=False)
> g.add_vertex(10)
> original_edges = [[0, 1], [0, 4]]
> g.add_edge_list(original_edges)
>
> g.set_fast_edge_removal(True)
> n_orig = g.num_vertices()
>
> def foo(g, m, skipEdgeRemoval = False):
>     n_act = g.num_vertices()
>     n_extra = n_act - (n_orig+m)
>
>     if n_extra < 0:
>         print(f'adding {-n_extra} vertices ...')
>         g.add_vertex(-n_extra)
>         n_extra = 0
>         n_act = g.num_vertices()
>
>     filt = np.ones(n_act)
>     if n_extra > 0:
>         filt[(-n_extra):] = 0
>
>
>     gv = gt.GraphView(g, filt)
>     #print(f'filt = {filt}')
>
>     new_vtx_ids = np.arange(n, n+m)
>     print(f'indices of new vertices = {new_vtx_ids}')
>
>     #Check that all edges are expected
>     for e in list(gv.edges()):
>         if [e.source(), e.target()] not in original_edges:
>             raise Exception(f'{e.source()} -> {e.target()} was not in
> original graph')
>
>
>     #Add a bunch of random edges connecting to new candidates
>     edge_list = []
>     for s, t in zip(np.random.choice(range(n_orig), m),
>                     np.random.choice(new_vtx_ids, m)):
>         edge_list.append([s,t])
>         #also updates some vertex properties, but that's easy/fast to
> undo, not done in this MWE
>
>
>     gv.add_edge_list(edge_list)
>
>     print('Current edges:')
>     for e in list(gv.edges()):
>         print(f'{e.source()} -> {e.target()}')
>
>     #computation on new graph goes here ...
>     pass
>
>
>     #print('Cleanup ...')
>     #If cleanup is not done, this fails as the edges that were added to
> the graphview
>     #end up being added to the original graph as well :s
>     if not skipEdgeRemoval:
>         for e in edge_list:
>             gv.remove_edge(gv.edge(e[0], e[1]))
>
>
> foo(g, 1)
> foo(g, 4)
> foo(g, 3)
> foo(g, 5)
> #above is all good
> foo(g, 3, True) #skip cleanup
> foo(g, 4) #fails
> ```
>
>
>
>
>> Message: 3
>> Date: Tue, 21 Jan 2020 15:01:53 +0100
>> From: Tiago de Paula Peixoto <tiago at skewed.de>
>> To: Main discussion list for the graph-tool project
>>         <graph-tool at skewed.de>
>> Subject: Re: [graph-tool] Speeding up graph copy?
>> Message-ID: <1eb5aeab-0a3d-d3fa-84e3-60734afdd360 at skewed.de>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Am 20.01.20 um 19:43 schrieb Zouhair:
>> > Hi,
>> >
>> > I'm using graph-tool in my project, and my use case is as follows:
>> > 1) build an undirected graph with vertices and edges (~1e5 vtx and ~5e6
>> > edges)
>> > 2) attach some properties to the vertices
>> > 3) for about ~5000 sets of *new* candidate vertices (each set has about
>> > 2-30 vertices)
>> > ? 3.a - add new vertices to graph
>> > ? 3.b - add edges as necessary between new vertices and old ones
>> > ? 3.c - update the properties
>> > ? 3.d - run an algorithm which depends on properties and edges, save
>> > result to a list
>> > ? 3.e - remove new vertices from graph and restore properties (i.e. undo
>> > 3.a-c)
>> >
>> > Steps 1) and 2) are going well. For step 3), the algorithm in step 3.d
>> > runs fast, 3.b and 3.c are also fast, however, the bottleneck I'm facing
>> > is 3.a and 3.e (about 80% of runtime is spent there)
>>
>
>
>>
>> > iii- use GraphView but only filter on the original vertices in the
>> > graph: this avoids having to delete the vertices, but making GraphView
>> > with a filter function is still slow (doing `gt.GraphView(g, lambda v: v
>> > < n_nodes_orig)` takes 75% of runtime, and the number of vertices in the
>> > original graph keeps growing...
>>
>> This can be improved by passing a boolean-valued vertex property map
>> instead of a lambda function. In this way the GraphView creation becomes
>> O(1) instead of O(N).
>>
>> Best,
>> Tiago
>>
>> --
>> Tiago de Paula Peixoto <tiago at skewed.de>
>>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: signature.asc
>> Type: application/pgp-signature
>> Size: 833 bytes
>> Desc: OpenPGP digital signature
>> URL: <
>> https://lists.skewed.de/pipermail/graph-tool/attachments/20200121/4ead384b/attachment.asc
>> >
>>
>> ------------------------------
>>
>> Subject: Digest Footer
>>
>> _______________________________________________
>> graph-tool mailing list
>> graph-tool at skewed.de
>> https://lists.skewed.de/mailman/listinfo/graph-tool
>>
>>
>> ------------------------------
>>
>> End of graph-tool Digest, Vol 144, Issue 15
>> *******************************************
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.skewed.de/pipermail/graph-tool/attachments/20200121/17db6812/attachment-0001.htm>


More information about the graph-tool mailing list