[graph-tool] Questions about inference algorithms

Andrea Briega annbrial at gmail.com
Wed Jun 8 10:14:54 CEST 2016


Thank you very much, your answers have been really helpful. I am now on
the last step, model selection, and I would like to be sure that I am doing
it right. I compute the posterior odds ratio to compare two partitions as
e^-(dl1-dl2), where dl1 and dl2 are the higher and lower description
lengths, respectively. I obtained the description lengths using
‘state.entropy()’ for nested models and ‘state.entropy(dl=True)’ for
non-nested ones.
I have doubts about this because even small differences in description
length give values much lower than 0.01, so in most cases the evidence
supporting one of the models is decisive. I only get values higher than
0.01 if the difference in description length is lower than 5 units. With my
data (24,000 nodes and 5,000,000 edges) I always obtain decisive support,
both when I compare different models and when I compare different runs of
the same model. I wonder if this is right.
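
As a rough sanity check on the numbers above, here is a minimal sketch in
plain Python of the ratio e^-(dl1-dl2). It assumes the description lengths
are in nats (natural-log units). The function name posterior_odds and the
two description-length values are hypothetical, chosen only for
illustration, and nothing here calls graph-tool.

```python
import math


def posterior_odds(dl_high, dl_low):
    """Return exp(-(dl_high - dl_low)): the odds of the partition with the
    higher description length relative to the one with the lower.
    Assumes both description lengths are expressed in nats."""
    return math.exp(-(dl_high - dl_low))


# Hypothetical description lengths, for illustration only.
dl_a = 1000005.0
dl_b = 1000000.0

# Put the higher description length first, as in e^-(dl1-dl2).
ratio = posterior_odds(max(dl_a, dl_b), min(dl_a, dl_b))
print(ratio)

# Difference in description length at which the ratio equals exactly 0.01.
threshold = -math.log(0.01)
print(threshold)
```

Under that assumption the ratio falls below 0.01 once the difference
exceeds ln(100), about 4.6, which agrees with the observation that only
differences under roughly 5 units give values above 0.01.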

Thanks again,


Andrea

2016-05-19 9:55 GMT+02:00 Andrea Briega <annbrial at gmail.com>:

> Dear Mr Peixoto,
>
>
> I have just run a few analyses with the new version of your package, and my
> results change completely between v2.13 and v2.16.
>
> minimize_nested_blockmodel_dl is the function whose results change the most,
> and it is very difficult to find a biological explanation for the new results,
> mainly at the upper hierarchical levels.
> I would like to know the specific changes between the two versions in order
> to better understand my results. Is it possible to set any parameter
> so that this function runs as it did in v2.13? I used to run this
> function this way:
>
> state = minimize_nested_blockmodel_dl(g, pclabel=vprop_double,
> overlap=False, nonoverlap_init=False, deg_corr=True, layers=False)
>
> And I have run the new version of the function this way:
>
> state = minimize_nested_blockmodel_dl(g,
> state_args=dict(pclabel=vprop_double), overlap=False,
> nonoverlap_init=False,  deg_corr=True,  layers=False)
>
>
> Thank you very much,
>
> Andrea
>
> 2016-05-12 18:43 GMT+02:00 Andrea Briega <annbrial at gmail.com>:
>
>> Thank you very much! I was wrong, I meant "state =
>> gt.minimize_blockmodel_dl(g, pclabel=vprop_double)"; it was a mistake I
>> made while writing the email. So the key was to use "state_args". I had
>> tried it, but with a different notation, and obviously it didn't work. Now I
>> can go on!
>>
>> Thanks again,
>>
>>
>> Andrea
>>
>>
>> 2016-05-12 10:56 GMT+02:00 Andrea Briega <annbrial at gmail.com>:
>>
>>> Hi again,
>>>
>>> I have recently noticed the update of graph-tool and now I am a
>>> little bit confused about some changes. Sorry, I know my questions are very
>>> basic. I am not familiar with this language and I have some difficulties
>>> getting results.
>>>
>>> I am running inference algorithms to get the best model using different
>>> options for model selection. I want to set pclabel in the inference
>>> algorithms because I know a priori that my network is bipartite, and then
>>> I want to get the description length. Before the update I did it this way:
>>>
>>> vprop_double = g.new_vertex_property("int") # g is my network
>>> for i in range(0, 11772):
>>>     vprop_double[g.vertex(i)] = 1
>>> for i in range(11773, 214221):
>>>     vprop_double[g.vertex(i)] = 2
>>>
>>> state = gt.minimize_blockmodel_dl(g, pclabel=True)
>>>
>>> state.entropy(dl=True) # I am not sure this is the right way to get the
>>> description length.
>>>
>>> But now I have some problems. First of all, minimize_blockmodel_dl
>>> doesn't have a pclabel argument, so I don't know how to specify it in the
>>> inference algorithm. I have tried this:
>>>
>>> state.pclabel = vprop_double
>>>
>>> But I get the same result from "state.entropy(dl=True)" as before.
>>> Also, I get the same result from "state.entropy(dl=True)" and from
>>> "state.entropy()", and I don't understand why either.
>>>
>>> And finally, for NestedBlockState objects I don't know how to get the
>>> description length because entropy doesn't have a "dl" argument. For these
>>> objects, are the entropy and the description length the same?
>>>
>>> In conclusion, I don't know how to set pclabel or how to get the
>>> description length in hierarchical models, and I am not sure if I am
>>> getting it correctly in non-hierarchical ones.
>>>
>>> Sorry again for my basic questions but I can't go on because of these
>>> problems.
>>>
>>> Thank you very much!
>>>
>>> Best regards,
>>>
>>>
>>>
>>>
>>> Andrea
>>>
>>>
>>>
>>>
>>> 2016-05-10 11:41 GMT+02:00 Andrea Briega <annbrial at gmail.com>:
>>>
>>>> Thank you very much! Your answer has been really helpful, and now I
>>>> understand this much better. I'll think about the options you mentioned.
>>>>
>>>> Thanks again,
>>>>
>>>>
>>>> Andrea
>>>>
>>>> 2016-05-09 16:33 GMT+02:00 Andrea Briega <annbrial at gmail.com>:
>>>>
>>>>> Dear Dr Peixoto,
>>>>>
>>>>>
>>>>> I would like to ask some questions about the inference algorithms
>>>>> for identifying large-scale network structure via the statistical
>>>>> inference of generative models.
>>>>>
>>>>> The minimize_blockmodel algorithm takes an hour to finish on my
>>>>> network with 21000 nodes (as does the hierarchical version), and it takes
>>>>> two and a half days with overlap. However, I have also started a
>>>>> hierarchical analysis with overlap, and it has been running for 14 days.
>>>>> So my first question is: is this running time normal, or might there be
>>>>> a problem? Do you know how long it usually takes?
>>>>>
>>>>> Secondly, I have repeated some of these analyses with exactly the same
>>>>> options but I get different solutions (similar but different), so I wonder
>>>>> if the algorithm is heuristic (I thought it was exact).
>>>>>
>>>>> My last question regards bipartite analysis. I have two types
>>>>> of nodes in my network and I wonder if there is any analytical difference
>>>>> between running these algorithms with the bipartite option (clabel=True,
>>>>> and different labels in each group of nodes) and without it, because it
>>>>> seems that the program “knows” my network is bipartite in any case. If
>>>>> there are differences between bipartite and “unipartite” analysis
>>>>> (clabel=False), is it possible to compare their description lengths for
>>>>> model selection?
>>>>>
>>>>> Thank you very much for your help!
>>>>>
>>>>>
>>>>> Best regards,
>>>>>
>>>>>
>>>>>
>>>>> Andrea
>>>>>
>>>>
>>>>
>>>
>>
>