Commit 8137a53

Feature/survival probability (#6)
* Change testing from unittest to pytest
* Add survival probability method
* Update test to use parametrizations
* Test output index matches input.
* Fix method so input index persists.
* Update Gamestop Power Law notebook
* Update README.md
* Update README.md
1 parent 71e266e commit 8137a53

5 files changed, with 397 additions and 302 deletions.

README.md

Lines changed: 13 additions & 3 deletions
@@ -15,19 +15,29 @@ See the [notebooks/README.md](./notebooks/) for more detail.
 My favourite notebooks so far:
 * [Central Limit Theorem: How the sum of Uniform values is Gaussian](./notebooks/NB-22%20-%20Visual%20Central%20Limit%20Theorem.ipynb)
 * [S&P500: How geometric average return is impossible](./notebooks/Notebook-11%20-%20Ergodicity%20and%20S%26P500.ipynb)
+* [GameStop: January 2021 was not an outlier if you assume Power Law tails.](./notebooks/NB-25%20-%20Survival%20Plot%20-%20Gamestop.ipynb)
 
 ### Functions
 * `fattails.metrics.mad()`: Calculates mean absolute deviation.
+* `fattails.metrics.get_survival_probability()`: Calculate survival probabilities for a given dataset.
 
 Example:
 ```
 $ pip install fattails
 $ python
 
->>> import fattails
->>> from fattails.metrics import mad
->>> mad([1,2,3]) # Calculate Mean Absolute Deviation of [1,2,3]
+>>> import fattails.metrics as fattails
+>>>
+>>>
+>>> fattails.mad([1,2,3]) # Calculate Mean Absolute Deviation of [1,2,3]
 0.6666666666666666
+>>>
+>>>
+>>> fattails.get_survival_probability([1,2,3]) # Get survival probability for each value in your data
+0    0.75
+1    0.50
+2    0.25
+Name: survival_probability, dtype: float64
 ```
 
 ### Derivations
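
The values `0.75`, `0.50`, `0.25` in the README example follow from the "posts and fences" counting in `get_survival_probability()`: `n` sorted values split the number line into `n + 1` gaps, and the k-th smallest value is assigned `(n + 1 - k) / (n + 1)`. The plain-Python sketch below reproduces that arithmetic only; `gap_based_survival` is a hypothetical helper name and is not part of the `fattails` package.

```python
def gap_based_survival(n):
    """Hypothetical helper: survival probability for the k-th smallest of n values.

    Mirrors the arithmetic right_gap_count / gap_count, with gap_count = n + 1.
    """
    gap_count = n + 1  # n sorted values leave n + 1 gaps on the number line
    return [(gap_count - k) / gap_count for k in range(1, n + 1)]


print(gap_based_survival(3))  # [0.75, 0.5, 0.25]
```

Because the denominator is `n + 1` rather than `n`, no value is assigned a probability of exactly 0 or 1, so every point can still be drawn on a log-log survival plot.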

fattails/metrics.py

Lines changed: 61 additions & 1 deletion
@@ -1,4 +1,6 @@
+from copy import copy
 import numpy as np
+import pandas as pd
 
 def mad(x):
     """Calculate Mean Absolute Deviation
@@ -24,4 +26,62 @@ def mad(x):
 
     mad = np.mean(absolute_deviation)
 
-    return mad
+    return mad
+
+def get_survival_probability(arr):
+    """Calculate sample probability of X >= x, for each value in `arr`.
+
+    Duplicate values are treated as ascending values based on
+    their position in `arr`.
+
+    Parameters
+    ----------
+    arr : array_like
+        Numeric values on the real number line
+
+    Returns
+    -------
+    survival_probability_sr : Pandas Series
+    """
+    #---------------------------------------------------
+    # PREPARE
+    ## Sort values from low to high. Keep track of original
+    ## index and order.
+
+    arr = copy(arr) # Copy to avoid accidental mutation
+    sr = pd.Series(arr) # Ensure we have a pandas series
+
+    ## Keep a copy of the original index
+    input_index = sr.index.copy()
+
+    ## Create index of input order
+    df = sr.reset_index(name='input_values') # Keeps the input index as a column
+    df.index.name = 'input_order' # Name the new index
+
+    ## Sort from low to high and reindex
+    df = df.sort_values(by='input_values') # sort from low to high
+    df = df.reset_index()
+    df.index.name = 'sorted_order' # Name the new index
+
+    #---------------------------------------------------
+    # CALCULATE
+
+    # Label relative positions
+    gap_count = len(sr) + 1 # Think of the Posts and Fences analogy
+    df['left_gap_count'] = df.index + 1 # Count values <= x
+    df['right_gap_count'] = gap_count - df.left_gap_count # Count values >= x
+
+    # Get survival Probability
+    df['survival_probability'] = df.right_gap_count / gap_count
+
+    #---------------------------------------------------
+    # FORMAT THE OUTPUT
+
+    #Reset Input Order and Index
+    df = df.sort_values(by='input_order') # sort from low to high
+    df.index = input_index
+
+    #Extract the output series
+    survival_probability_sr = df.survival_probability
+
+    return survival_probability_sr
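
For readers who want to check the behaviour described in the commit message (output index matches the input; duplicates are ordered by position), here is a compact sketch of the same calculation. It is an assumption-laden alternative, not the committed code: `survival_probability_sketch` is a hypothetical name, and it swaps the sort-and-reindex steps for `pandas.Series.rank(method="first")`, which ranks ties in order of appearance.

```python
import pandas as pd


def survival_probability_sketch(values):
    """Hypothetical compact variant; not the function committed above."""
    sr = pd.Series(values)  # keeps the caller's index if one is supplied
    n = len(sr)
    # method="first" gives ranks 1..n and breaks ties by order of appearance
    rank = sr.rank(method="first")
    survival = (n + 1 - rank) / (n + 1)  # same n + 1 gap convention
    return survival.rename("survival_probability")


# Illustrative data with a custom index and a duplicated value
prices = pd.Series([30, 10, 30], index=["mon", "tue", "wed"])
result = survival_probability_sketch(prices)
print(result)
```

Here the two `30`s receive ranks 2 and 3 in input order, so `mon` gets 0.5, `tue` gets 0.75 and `wed` gets 0.25, and the result keeps the `mon`/`tue`/`wed` index.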
