#### Dionysus Wasserstein distances

Hi Dmitriy,

I have a question about computing the Wasserstein distance using Dionysus 2. If either of the diagrams has a point with death time inf and multiplicity greater than one, the Wasserstein distance returns inf. The persistence diagrams I am seeing from my data look like this:

```python
import dionysus as d

d1 = d.Diagram([(0, float('inf')),
                (0, 1.83587),
                (0, 3.29106),
                (0, 3.40428),
                (0, 2.3634),
                (0, 3.36883),
                (0, 2.40259),
                (0, float('inf')),
                (0, float('inf')),
                (0, float('inf')),
                (0, 3.26498),
                (0, 3.80988),
                (0, 3.25368),
                (0, 0.916199)])

d2 = d.Diagram([(0, float('inf')),
                (0, 2.1269),
                (0, 2.5984),
                (0, 2.70797),
                (0, 2.27383),
                (0, 1.60921),
                (0, 2.38453),
                (0, 2.16508),
                (0, 1.55326),
                (0, 2.06283),
                (0, 1.83689),
                (0, 1.57891),
                (0, 2.27507),
                (0, 2.94212)])

d.wasserstein_distance(d1, d2, q=2)
# returns: inf
```

```python
d3 = d.Diagram([(0, float('inf')),
                (0, 1.83587),
                (0, 3.29106),
                (0, 3.40428),
                (0, 2.3634),
                (0, 3.36883),
                (0, 2.40259),
                (0, 3.26498),
                (0, 3.80988),
                (0, 3.25368),
                (0, 0.916199)])

d.wasserstein_distance(d2, d3, q=2)
# returns: 2.6976122856140137
```

In this example, d3 is just d1 with the extra (0, inf) pairs removed. Is there an easy way to remove these multiplicities, or should I be looking at something else?
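The inf result is consistent with how points at infinity are usually handled: under the standard definition, a point with death = inf can be matched at finite cost only to another point at infinity (matching it to the diagonal costs infinity). d1 has four points at infinity and d2 has one, so three of them have no finite-cost partner; d2 and d3 each have exactly one, and the distance is finite. A minimal pure-Python check, assuming the diagram is available as a list of (birth, death) pairs (for a Dionysus diagram, e.g. `[(p.birth, p.death) for p in d1]`); the helper names are hypothetical, not part of Dionysus, and the data are shortened excerpts of d1 and d2:

```python
import math

def count_infinite(points):
    """Return how many (birth, death) pairs have an infinite death time."""
    return sum(1 for _, death in points if math.isinf(death))

def has_finite_matching(points_a, points_b):
    """Usual convention: a finite distance needs equal counts of infinite points
    (both diagrams taken in the same homology dimension)."""
    return count_infinite(points_a) == count_infinite(points_b)

inf = float('inf')
# Shortened excerpts of d1 and d2 (all four infinite points of d1 are kept)
d1_excerpt = [(0, inf), (0, 1.83587), (0, inf), (0, inf), (0, inf), (0, 0.916199)]
d2_excerpt = [(0, inf), (0, 2.1269), (0, 2.94212)]

print(count_infinite(d1_excerpt), count_infinite(d2_excerpt))  # 4 1
print(has_finite_matching(d1_excerpt, d2_excerpt))             # False
```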

Dmitriy Morozov

I'm not quite sure what you are after. If you want to reduce multiplicity of all points to one, you could convert a diagram to a set and then construct a new diagram out of it:

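```python
# Collapse every repeated point (finite or infinite) to a single copy
s = {(p.birth, p.death) for p in d1}
d1_unique = d.Diagram(list(s))  # new name, so the existing d2 is not overwritten
```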

I don't know of a situation where this actually makes sense, but presumably you have something in mind. If you want something more elaborate (like reducing multiplicity of only points at infinity), you'll have to tweak the above idea accordingly. But the point is that you can convert between standard python lists and persistence diagrams in a pretty straightforward way. And you can manipulate those lists in many different ways.
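The set approach also collapses repeated finite points. The "only points at infinity" variant can be sketched on plain (birth, death) lists by keeping every finite point and only the first infinite point for each birth value. `dedupe_infinite` is a hypothetical helper, not a Dionysus function; applied to the points of d1 from the question, it yields exactly the points of d3:

```python
import math

def dedupe_infinite(points):
    """Keep all finite points; keep one infinite point per birth value.

    The order of the remaining points is preserved.
    """
    seen_births = set()
    result = []
    for birth, death in points:
        if math.isinf(death):
            if birth in seen_births:
                continue  # drop a repeated point at infinity
            seen_births.add(birth)
        result.append((birth, death))
    return result

inf = float('inf')
d1_points = [(0, inf), (0, 1.83587), (0, 3.29106), (0, 3.40428), (0, 2.3634),
             (0, 3.36883), (0, 2.40259), (0, inf), (0, inf), (0, inf),
             (0, 3.26498), (0, 3.80988), (0, 3.25368), (0, 0.916199)]
d3_points = dedupe_infinite(d1_points)
print(len(d1_points), len(d3_points))  # 14 11

# With Dionysus (not run here), the round trip would look like:
# d3 = d.Diagram(dedupe_infinite([(p.birth, p.death) for p in d1]))
```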

I hope this helps.
Dmitriy

(Original question from Adam Spannaus, Fri, Jun 8, 2018 at 2:34 PM.)

nukpezah@...

Hi Dmitriy,
I also have a question about Wasserstein distances. I have 21 different NMR structures of the same protein, and I am trying to compute the pairwise Wasserstein distances in dimensions 0, 1, and 2. I label the persistence diagrams of the different NMR structures dgms1, dgms2, dgms3, ..., dgms21. I can compute the dimension 0 distances between dgms1 and all 21 persistence diagrams with the following code:
```python
import numpy as np
import dionysus as d

# dgms1, ..., dgms21 hold the diagrams of each structure, indexed by dimension
j = 0  # compute the zero-dimensional Wasserstein distances
A0 = np.array([])
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms1[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms2[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms3[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms4[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms5[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms6[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms7[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms8[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms9[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms10[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms11[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms12[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms13[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms14[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms15[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms16[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms17[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms18[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms19[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms20[j], q=2))
A0 = np.append(A0, d.wasserstein_distance(dgms1[j], dgms21[j], q=2))
A0
```
and it gives me the following result in the Jupyter notebook:

```
array([0.        , 0.86897045, 0.86734861, 0.87343866, 0.68319196,
       0.86747402, 0.86989343, 0.86605436, 0.86735857, 0.85884637,
       0.86786294, 0.87749594, 0.86279768, 0.86108643, 0.8721627 ,
       0.87351388, 0.86195225, 0.87039518, 0.86997241, 0.86407465,
       0.87141883])
```

But when I try to compute the dimension 0 Wasserstein distances between dgms2 and the 21 structures with the code below, it keeps running without giving me any result. I wonder if it has something to do with the data type of dgms?

```python
j = 0
A1 = np.array([])
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms1[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms2[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms3[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms4[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms5[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms6[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms7[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms8[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms9[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms10[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms11[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms12[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms13[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms14[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms15[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms16[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms17[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms18[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms19[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms20[j], q=2))
A1 = np.append(A1, d.wasserstein_distance(dgms2[j], dgms21[j], q=2))
```
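The repeated calls for each row can be generated with a loop, which also builds the full matrix of pairwise distances at once. A sketch, assuming the per-structure diagram collections are gathered into one Python list (`all_dgms` is a hypothetical name) and the distance is passed in as a function, so the loop itself does not depend on Dionysus. Because the Wasserstein distance is symmetric and a diagram is at distance 0 from itself, only the upper triangle is computed:

```python
def pairwise_distances(diagrams, dist):
    """Return an n x n list of lists with M[i][k] = dist(diagrams[i], diagrams[k]).

    Only pairs with i < k are computed; each value is mirrored into M[k][i],
    and the diagonal stays 0.0.
    """
    n = len(diagrams)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            M[i][k] = M[k][i] = dist(diagrams[i], diagrams[k])
    return M

# Hypothetical usage with Dionysus and NumPy (not run here):
# all_dgms = [dgms1, dgms2, dgms3]   # ...and so on, up to dgms21
# j = 0                              # homology dimension
# A = np.array(pairwise_distances([g[j] for g in all_dgms],
#                                 lambda a, b: d.wasserstein_distance(a, b, q=2)))
# Row A[0] then plays the role of A0, and row A[1] the role of A1.

# Stand-in demo: numbers instead of diagrams, absolute difference as the distance
M = pairwise_distances([1.0, 4.0, 6.0], lambda a, b: abs(a - b))
print(M)
```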

Dmitriy Morozov

Hi Jonathan,

From your code snippet I cannot tell what's going wrong. Your first and second lines (for dgms2) should return predictable results: the distance is symmetric, so (dgms2[j], dgms1[j]) should give the same answer as (dgms1[j], dgms2[j]) did in your first computation, and (dgms2[j], dgms2[j]) should be 0. I'd first try running the individual computations, without sticking the results in the array, to figure out what's going wrong.

But in short I can't tell what the problem is.

Best,
Dmitriy
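Following the suggestion to run the computations individually, a small loop can print each pair before computing it, so if the notebook hangs, the last line printed identifies the offending pair. It also checks the two properties mentioned above (zero self-distance and symmetry). `check_row` is a hypothetical helper, the distance is passed in as a function, and the symmetry tolerance is an assumption, since the library may compute the distance only approximately:

```python
import math
import time

def check_row(i, diagrams, dist, rel_tol=0.02):
    """Compute dist(diagrams[i], diagrams[k]) for every k, one pair at a time.

    Prints each pair before computing it (flush=True, so the text shows up
    immediately in a notebook) together with the elapsed time, warns if the
    self-distance is nonzero or if the reverse distance differs by more than
    rel_tol (an assumed tolerance), and returns the list of distances.
    """
    row = []
    for k in range(len(diagrams)):
        print(f"pair ({i}, {k}) ...", end=" ", flush=True)
        start = time.perf_counter()
        value = dist(diagrams[i], diagrams[k])
        print(f"{value:.6g} in {time.perf_counter() - start:.3f}s", flush=True)
        if k == i:
            if value != 0:
                print(f"  warning: distance from diagram {i} to itself is {value}")
        else:
            reverse = dist(diagrams[k], diagrams[i])
            if not math.isclose(value, reverse, rel_tol=rel_tol):
                print(f"  warning: reverse distance is {reverse}")
        row.append(value)
    return row

# Hypothetical usage with Dionysus (not run here); index 1 is dgms2 (0-based):
# row = check_row(1, [g[0] for g in all_dgms],
#                 lambda a, b: d.wasserstein_distance(a, b, q=2))

# Stand-in demo: numbers instead of diagrams
row = check_row(0, [1.0, 4.0, 6.0], lambda a, b: abs(a - b))
```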

(Original question from nukpezah@..., Fri, Nov 2, 2018 at 8:02 AM.)