#### Computing persistence diagram in parallel

cai.507@...

Hi Dmitriy,

I am trying to compute persistence diagram for different graphs in parallel in python2. I use joblib to do multiprocessing computation. Here is a toy example:

def computePD(i):
import dionysus as d
import numpy as np
np.random.seed(42)
f1 = d.fill_rips(np.random.random((i+10, 2)), 2, 1)
m1 = d.homology_persistence(f1)
dgms1 = d.init_diagrams(m1, f1)
return dgms1
def get_dgms(n_jobs=1):
from joblib import delayed, Parallel
return Parallel(n_jobs=n_jobs)(delayed(computePD)(i) for i in range(10))
result = get_dgms(1)

However, when I change the n_jobs to 2(result = get_dgms(2)), I get a wired result.
[Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 6148914691236517774 points,
Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 0 points]

The correct result should be something like
[Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 1 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 4 points]

At first, I thought I didn't use joblib correctly so I change computePD to some simple function like the following
def computePD(i):
return i ** 2
then I call the same function with different n_jobs, the result is the same when I change the n_jobs, just as expected.

Again, I am not sure the problem lies in Dionysus or misuse of joblib. Any suggestions will be appreciated.

Best,
Chen

Dmitriy Morozov

Running your example in Python 3 explains the problem.

result = get_dgms(2)

reports:

MaybeEncodingError: Error sending result: '[Diagram with 2 points]'. Reason: 'TypeError("can't pickle dionysus._dionysus.Diagram objects",)'

It seems joblib tries to pickle the return values and persistence diagrams are not pickleable. Why Python 2 fails silently, I don't know. The long-term solution is for me to add pickling to the diagram class (can you please file an issue on GitHub, so I don't forget about it?).

The quick fix for you would be to return a list rather than a diagram. So change the return line in computePD to

return [(p.birth, p.death) for p in dgms1]

Dmitriy

On Wed, Jun 20, 2018 at 1:27 PM, wrote:

[Edited Message Follows]

Hi Dmitriy,

I am trying to compute persistence diagram for different graphs in parallel in python2. I use joblib to do multiprocessing computation. Here is a toy example:

def computePD(i):
import dionysus as d
import numpy as np
np.random.seed(42)
f1 = d.fill_rips(np.random.random((i+10, 2)), 2, 1)
m1 = d.homology_persistence(f1)
dgms1 = d.init_diagrams(m1, f1)
return dgms1
def get_dgms(n_jobs=1):
from joblib import delayed, Parallel
return Parallel(n_jobs=n_jobs)(delayed(computePD)(i) for i in range(10))
result = get_dgms(1)

However, when I change the n_jobs to 2(result = get_dgms(2)), I get a wired result.
[Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 6148914691236517774 points,
Diagram with 0 points,
Diagram with 6822207245174201887 points,
Diagram with 0 points]

The correct result should be something like
[Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 1 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 2 points,
Diagram with 4 points]

At first, I thought I didn't use joblib correctly so I change computePD to some simple function like the following
def computePD(i):
return i ** 2
then I call the same function with different n_jobs, the result is the same when I change the n_jobs, just as expected.

Again, I am not sure the problem lies in Dionysus or misuse of joblib. Any suggestions will be appreciated.

Best,
Chen

cai.507@...

Hi Dmitriy,

Thank you very much. The quick fix works well. But after computing n diagrams, I want to compute a n*n distance matrix in parallel. As you mentioned, the persistence diagrams are not pickleable, making it hard for parallel computing in python.

I put the request on github.

Best,
Chen

Dmitriy Morozov

Why is the distance matrix a problem? You can convert a list to a diagram inside your function. That's not expensive. You only return a number, so that's picklable. I don't think I follow.

On Fri, Jun 22, 2018, 8:29 PM <cai.507@...> wrote:
Hi Dmitriy,

Thank you very much. The quick fix works well. But after computing n diagrams, I want to compute a n*n distance matrix in parallel. As you mentioned, the persistence diagrams are not pickleable, making it hard for parallel computing in python.

I put the request on github.

Best,
Chen