Commit f234f54

Author: Nathaniel Saul
Merge pull request #12 from Cimagroup/master
Persistent entropy
2 parents e7401fa + e0b17ea commit f234f54

5 files changed: +322 -1 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ __pycache__
 .ipynb_checkpoints

 build
-dist
+dist
Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Finding significant differences with persistent entropy"
8+
]
9+
},
10+
{
11+
"cell_type": "code",
12+
"execution_count": 27,
13+
"metadata": {},
14+
"outputs": [],
15+
"source": [
16+
"import numpy as np\n",
17+
"import ripser\n",
18+
"import matplotlib.pyplot as plt\n",
19+
"import random\n",
20+
"from persim.persistent_entropy import persistent_entropy\n",
21+
"from scipy import stats"
22+
]
23+
},
24+
{
25+
"cell_type": "code",
26+
"execution_count": 1,
27+
"metadata": {},
28+
"outputs": [
29+
{
30+
"ename": "ModuleNotFoundError",
31+
"evalue": "No module named 'cechmate'",
32+
"output_type": "error",
33+
"traceback": [
34+
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
35+
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
36+
"\u001b[0;32m<ipython-input-1-fa4902125d99>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mcechmate\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
37+
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'cechmate'"
38+
]
39+
}
40+
],
41+
"source": [
42+
"import cechmate"
43+
]
44+
},
45+
{
46+
"cell_type": "markdown",
47+
"metadata": {},
48+
"source": [
49+
"This notebook shows how persistent entropy can be used to find significant differences in the geometric distribution of data. We will distinguish point clouds drawn from a normal distribution from point clouds drawn from a uniform distribution. Persistent entropy summarizes each point cloud as a single number, which lets us use a one-dimensional non-parametric statistical test instead of a multivariate test."
50+
]
51+
},
52+
{
53+
"cell_type": "markdown",
54+
"metadata": {},
55+
"source": [
56+
"## Construct the data\n",
57+
"We will generate a sample of 20 point clouds: 10 drawn from a normal distribution and 10 from a uniform one. Each point cloud is 2D and has 50 points."
58+
]
59+
},
60+
{
61+
"cell_type": "code",
62+
"execution_count": 28,
63+
"metadata": {
64+
"scrolled": true
65+
},
66+
"outputs": [],
67+
"source": [
68+
"# Normal point clouds\n",
69+
"mu = 0.5\n",
70+
"sigma = 0.25\n",
71+
"l1 = []\n",
72+
"for i in range(10):\n",
73+
"    d1 = np.random.normal(mu, sigma, (50,2))\n",
74+
"    l1.append(d1)\n",
75+
"# Uniform point clouds\n",
76+
"l2 = []\n",
77+
"for i in range(10):\n",
78+
"    d2 = np.random.random((50,2))\n",
79+
"    l2.append(d2)"
80+
]
81+
},
82+
{
83+
"cell_type": "code",
84+
"execution_count": 29,
85+
"metadata": {},
86+
"outputs": [
87+
{
88+
"data": {
89+
"image/png": "\n",
90+
"text/plain": [
91+
"<Figure size 432x288 with 1 Axes>"
92+
]
93+
},
94+
"metadata": {
95+
"needs_background": "light"
96+
},
97+
"output_type": "display_data"
98+
}
99+
],
100+
"source": [
101+
"# Example of normal and uniform point clouds\n",
102+
"plt.scatter(d1[:,0], d1[:,1], label=\"Normal distribution\")\n",
103+
"plt.scatter(d2[:,0], d2[:,1], label=\"Uniform distribution\")\n",
104+
"plt.axis('equal')\n",
105+
"plt.legend()\n",
106+
"plt.show()\n"
107+
]
108+
},
109+
{
110+
"cell_type": "markdown",
111+
"metadata": {},
112+
"source": [
113+
"## Calculate persistent entropy \n",
114+
"To calculate persistent entropy, we first need to generate the persistence diagrams. Note that we do not include the infinite bar in the computation of persistent entropy, since it gives no information about the point cloud."
115+
]
116+
},
117+
{
118+
"cell_type": "code",
119+
"execution_count": 30,
120+
"metadata": {},
121+
"outputs": [],
122+
"source": [
123+
"# Generate the persistence diagrams using ripser\n",
124+
"p = 0  # homology dimension of the diagrams to use\n",
125+
"dgm_d1 = []\n",
126+
"dgm_d2 = []\n",
127+
"for i in range(len(l1)):\n",
128+
"    dgm_d1.append(ripser.ripser(l1[i])['dgms'][p])\n",
129+
"    dgm_d2.append(ripser.ripser(l2[i])['dgms'][p])\n",
130+
"# Calculate their persistent entropy.\n",
131+
"e1 = persistent_entropy(dgm_d1)\n",
132+
"e2 = persistent_entropy(dgm_d2)"
133+
]
134+
},
135+
{
136+
"cell_type": "markdown",
137+
"metadata": {},
138+
"source": [
139+
"## Statistical test\n",
140+
"Finally, perform the statistical test that best suits your aim. In our case, we perform the Mann–Whitney U test. You can claim that the geometry of the two groups of point clouds differs if the p-value is smaller than the significance level α (usually α = 0.05)."
141+
]
142+
},
143+
{
144+
"cell_type": "code",
145+
"execution_count": 31,
146+
"metadata": {},
147+
"outputs": [
148+
{
149+
"data": {
150+
"text/plain": [
151+
"MannwhitneyuResult(statistic=3.0, pvalue=0.00021981937631328227)"
152+
]
153+
},
154+
"execution_count": 31,
155+
"metadata": {},
156+
"output_type": "execute_result"
157+
}
158+
],
159+
"source": [
160+
"stats.mannwhitneyu(e1, e2)"
161+
]
162+
}
163+
],
164+
"metadata": {
165+
"kernelspec": {
166+
"display_name": "Python 3",
167+
"language": "python",
168+
"name": "python3"
169+
},
170+
"language_info": {
171+
"codemirror_mode": {
172+
"name": "ipython",
173+
"version": 3
174+
},
175+
"file_extension": ".py",
176+
"mimetype": "text/x-python",
177+
"name": "python",
178+
"nbconvert_exporter": "python",
179+
"pygments_lexer": "ipython3",
180+
"version": "3.6.4"
181+
}
182+
},
183+
"nbformat": 4,
184+
"nbformat_minor": 2
185+
}
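
The notebook above calls `persistent_entropy` on lists of diagrams without showing what the quantity is. As a rough sketch, assuming the usual definition (Shannon entropy of the normalised bar lengths, with infinite bars dropped, as the notebook text describes), the helper below computes it for a single diagram with NumPy. The name `entropy_of_diagram` is hypothetical and not part of persim, and the use of the natural logarithm is an assumption; the library's implementation may differ in such details.

```python
import numpy as np


def entropy_of_diagram(dgm):
    """Shannon entropy of the finite bar lengths of one persistence diagram.

    Hypothetical helper for illustration only; not part of persim.
    `dgm` is an (n, 2) array-like of (birth, death) pairs.
    """
    dgm = np.asarray(dgm, dtype=float)
    finite = dgm[np.isfinite(dgm[:, 1])]    # drop the infinite bar(s)
    lengths = finite[:, 1] - finite[:, 0]   # lifetime of each bar
    lengths = lengths[lengths > 0]          # zero-length bars contribute nothing
    if lengths.size == 0:
        return 0.0
    p = lengths / lengths.sum()             # normalise lifetimes to probabilities
    return float(-np.sum(p * np.log(p)))    # natural log is an assumption


# Two equal finite bars plus one infinite bar: the entropy is ln(2).
example = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, np.inf]])
value = entropy_of_diagram(example)
print(value)  # ln(2), about 0.693
```

Because each diagram collapses to one scalar, the two groups of point clouds become two one-dimensional samples, which is what makes a univariate test applicable.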

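
The decision rule described in the notebook's statistical test cell can be sketched as follows. The two samples here are synthetic stand-ins generated with `np.linspace`, not the notebook's entropies, and the variable names `sample_a`, `sample_b`, `alpha` and `reject_null` are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two lists of entropies (made-up values,
# chosen only so that the two groups are clearly separated).
sample_a = np.linspace(3.30, 3.48, 10)
sample_b = np.linspace(3.50, 3.68, 10)

alpha = 0.05  # significance level

# State the alternative explicitly rather than relying on the default.
result = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")

# Reject the null hypothesis of equal distributions when p < alpha.
reject_null = result.pvalue < alpha
print(f"U = {result.statistic}, p = {result.pvalue:.4g}, reject H0: {reject_null}")
```

Passing `alternative` explicitly is a deliberate choice: the default for `mannwhitneyu` has differed between SciPy releases, so a bare call such as `stats.mannwhitneyu(e1, e2)` may report a one-sided or a two-sided p-value depending on the installed version.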