[graph-tool] Adding edge list from file too large to fit into memory

Tiago de Paula Peixoto tiago at skewed.de
Mon Feb 1 19:20:57 CET 2021


On 01.02.21 at 12:32, James Ruffle wrote:
> Dear Graph-Tool community,
> 
> I am trying to construct a graph with a large number of edges, using an 
> .npy file as an edge list.
> There are 125760 vertices and an edge list of length (7907725920, 2).
> 
> In order to use the .npy edge list file, I have had to load it as a 
> read-only memmap because, at a size of 126 GB, it is far too large to 
> load into memory. But when I call add_edge_list() on this memmap, I 
> think it is still being loaded into memory, as the RAM fills up and the 
> Python session crashes. I suppose the alternative is that the graph 
> object itself becomes too large to hold in memory, but with previous 
> large graphs I did not find this to be the problem. Does anybody have a 
> solution to this issue?

I'm not sure what kind of solution you are expecting. Do you want 
Graph.add_edge_list() *not* to load the edges into memory?

> Lastly, after I find a means to add this number of edges, I need to 
> assign weights to the edges, again from a memmap file due to its size, 
> which gives me the same problem. Any advice?
> 
> 
> *Sample code:*
> 
> import numpy as np
> from graph_tool.all import Graph
> 
> # prime the graph with the number of vertices
> g = Graph(directed=False)
> g.add_vertex(125760)
> 
> # load the edge list as a read-only memmap and add it
> idx_indi_mmap = np.load('idx_indi.npy', mmap_mode='r')
> idx_indi_mmap.shape  # (7907725920, 2)
> 
> g.add_edge_list(idx_indi_mmap)  # the script crashes here after filling the RAM
> 
> # then add weights by looking up each (source, target) pair in another memmap
> node_matrix = np.load('node_matrix.npy', mmap_mode='r')
> node_matrix.shape  # (125760, 125760)
> weights = node_matrix[idx_indi_mmap[:, 0], idx_indi_mmap[:, 1]]
> ew = g.new_edge_property("double")
> ew.a = weights
> g.ep['weight'] = ew
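A side note on the weight lookup: getting one weight per edge from a 2-D matrix needs two paired index arrays, one for sources and one for targets. Passing the (E, 2) edge array itself, as in `node_matrix[idx]`, selects whole rows instead and yields an array of shape (E, 2, N), which for this edge list would be vastly larger than the matrix. A toy illustration with a made-up 4x4 matrix and three edges:

```python
import numpy as np

node_matrix = np.arange(16.0).reshape(4, 4)  # toy 4x4 weight matrix
idx = np.array([[0, 1], [2, 3], [1, 3]])     # toy (E, 2) edge list

# Indexing with the (E, 2) array selects entire rows: shape (E, 2, N).
rows_only = node_matrix[idx]

# Paired index arrays pick exactly one entry per (source, target) pair.
paired = node_matrix[idx[:, 0], idx[:, 1]]

print(rows_only.shape)  # (3, 2, 4)
print(paired)           # values 1.0, 11.0, 7.0
```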

You might save some intermediary memory by loading the weights in one go 
with Graph.add_edge_list() (see the 'eprops' parameter), which takes a 
NumPy array with three columns: (source, target, weight).
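A minimal sketch of that approach, assuming the edge list is an (E, 2) integer array (such as the read-only memmap from the question), that the weights live in a dense (N, N) `node_matrix`, and that, as the graph-tool documentation describes, columns beyond the first two are written into the property maps passed via `eprops`. The helpers `weighted_edge_chunks` and `build_weighted_graph` are hypothetical, and feeding the edges in chunks is an assumption added here to keep the temporary (source, target, weight) array small; it does not reduce the memory needed by the finished graph.

```python
import numpy as np


def weighted_edge_chunks(edges, node_matrix, chunk_size=10_000_000):
    """Hypothetical helper: yield (k, 3) float64 blocks of (source, target, weight).

    `edges` is an (E, 2) integer array (e.g. a read-only np.memmap) and
    `node_matrix` an (N, N) array of weights; only one block of at most
    `chunk_size` rows is materialized in RAM at a time.
    """
    n_edges = edges.shape[0]
    for start in range(0, n_edges, chunk_size):
        # Slicing a memmap reads only this range of rows from disk.
        block = np.asarray(edges[start:start + chunk_size], dtype=np.int64)
        src, tgt = block[:, 0], block[:, 1]
        # One weight per (source, target) pair, via paired index arrays.
        w = np.asarray(node_matrix[src, tgt], dtype=np.float64)
        # float64 holds vertex indices exactly (they are far below 2**53).
        yield np.column_stack((src, tgt, w))


def build_weighted_graph(edges, node_matrix, n_vertices, chunk_size=10_000_000):
    """Hypothetical helper: add the chunks to a graph-tool Graph via 'eprops'."""
    from graph_tool.all import Graph  # imported lazily; requires graph-tool

    g = Graph(directed=False)
    g.add_vertex(n_vertices)
    weight = g.new_edge_property("double")
    for block in weighted_edge_chunks(edges, node_matrix, chunk_size):
        # The third column of each block fills the 'weight' property map.
        g.add_edge_list(block, eprops=[weight])
    g.ep["weight"] = weight
    return g
```

For the data in the question this would be called as `build_weighted_graph(idx_indi_mmap, node_matrix, 125760)`. Since the blocks are appended in order, edge indices follow the row order of the edge list, just as with a single add_edge_list() call.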

However, if you do not have enough RAM to hold the entire graph plus the 
weights in memory, this is not going to work either.
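A quick back-of-the-envelope check of the raw sizes involved, assuming int64 vertex indices (consistent with the 126 GB quoted for the edge list) and float64 weights and `node_matrix` entries (the matrix dtype is an assumption). These are only the input arrays; graph-tool's own adjacency structures come on top. Incidentally, the edge count equals N(N-1)/2, i.e. one row per unordered vertex pair, which (if there are no duplicate rows) describes a complete graph.

```python
N_VERTICES = 125_760
N_EDGES = 7_907_725_920  # rows in idx_indi.npy, as stated in the question

# One row per unordered vertex pair: N(N-1)/2.
is_all_pairs = N_EDGES == N_VERTICES * (N_VERTICES - 1) // 2

edge_list_bytes = N_EDGES * 2 * 8   # two int64 columns (assumed dtype)
weight_bytes = N_EDGES * 8          # one float64 weight per edge (assumed)
matrix_bytes = N_VERTICES ** 2 * 8  # dense float64 node_matrix (assumed)


def gigabytes(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


print(is_all_pairs)                                     # True
print(f"edge list:   {gigabytes(edge_list_bytes):.1f} GB")  # 126.5 GB
print(f"weights:     {gigabytes(weight_bytes):.1f} GB")     # 63.3 GB
print(f"node_matrix: {gigabytes(matrix_bytes):.1f} GB")     # 126.5 GB
```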

Best,
Tiago

-- 
Tiago de Paula Peixoto <tiago at skewed.de>

