- Processed file: D:/R.LTWB/.datasets/IDEAM_EDA/Pivot_TMX_CON.csv
- Execution date: 2022-11-07 13:46:21.382182
- Python version: 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]
- Python path: ['D:\R.LTWB\.src', 'D:\R.LTWB', 'D:\R.GISPython', 'D:\R.HydroTools', 'D:\R.GISPython.wiki']
- matplotlib version: 3.6.0
- pandas version: 1.4.3
- numpy version: 1.23.2
- Stations exclude: ['28017140', '25027020', '25027410', '25027490', '25027330', '25027390', '25027630', '25027360', '25027320', '16067010', '25027420']
- Print table sample: True
- Instructions & script: https://github.com/rcfdtools/R.LTWB/tree/main/Section03/Outlier
- License: https://github.com/rcfdtools/R.LTWB/blob/main/LICENSE.md
- Credits: r.cfdtools@gmail.com
Dataframe records head sample
| Fecha | 15015020 | 15065040 | 23215060 | 25025002 | 25025090 | 25025250 | 25025300 | 25025330 | 28015030 | 28015070 | 28025020 | 28025040 | 28025070 | 28025080 | 28025090 | 28025502 | 28035010 | 28035020 | 28035040 | 28035070 | 28045020 | 28045040 | 29065010 | 29065020 | 29065030 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1980-01-01 00:00:00 | 32.2 | nan | nan | nan | 33.6 | nan | nan | nan | nan | 32.4 | nan | nan | 34.8 | nan | nan | nan | nan | 34.6 | nan | nan | nan | nan | nan | nan | 33.2 |
| 1980-01-02 00:00:00 | 32.6 | nan | nan | nan | 33.4 | nan | nan | nan | nan | 29.8 | nan | 29.8 | 34 | nan | nan | nan | nan | 30.8 | nan | nan | nan | nan | nan | nan | 32.8 |
| 1980-01-03 00:00:00 | 33 | nan | nan | nan | 33.8 | nan | nan | nan | 34.3 | 32.4 | nan | 30 | nan | 35 | nan | nan | nan | 34.4 | nan | nan | nan | nan | nan | nan | 33.4 |
Dataframe records tail sample
| Fecha | 15015020 | 15065040 | 23215060 | 25025002 | 25025090 | 25025250 | 25025300 | 25025330 | 28015030 | 28015070 | 28025020 | 28025040 | 28025070 | 28025080 | 28025090 | 28025502 | 28035010 | 28035020 | 28035040 | 28035070 | 28045020 | 28045040 | 29065010 | 29065020 | 29065030 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-12-29 00:00:00 | nan | nan | nan | 35.4 | 34.8 | 34.2 | 36.6 | 33.2 | nan | 33.2 | 32.6 | nan | 34.8 | nan | 36 | nan | 35.6 | 34.8 | 37.2 | nan | nan | nan | nan | 35.2 | nan |
| 2021-12-30 00:00:00 | nan | nan | nan | 34.4 | 35 | 34 | 37.2 | nan | nan | 33.4 | 32.4 | nan | 35.2 | nan | 37 | nan | 35 | 35 | 38 | nan | nan | nan | nan | 34.4 | nan |
| 2021-12-31 00:00:00 | nan | nan | nan | 34.8 | 34.6 | 35.6 | 38 | nan | nan | 33.6 | 33.6 | nan | 36.4 | nan | 36.2 | nan | 36.6 | 35 | 38.4 | nan | nan | nan | nan | 37.6 | nan |
Data types and null-value counts per station in the initial file
| Parameter | 15015020 | 15065040 | 23215060 | 25025002 | 25025090 | 25025250 | 25025300 | 25025330 | 28015030 | 28015070 | 28025020 | 28025040 | 28025070 | 28025080 | 28025090 | 28025502 | 28035010 | 28035020 | 28035040 | 28035070 | 28045020 | 28045040 | 29065010 | 29065020 | 29065030 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dtype | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 |
| Nulls | 6482 | 11876 | 14381 | 7640 | 6855 | 3583 | 5275 | 7735 | 14214 | 2696 | 2375 | 12728 | 1943 | 8265 | 2388 | 3885 | 6899 | 3205 | 5917 | 15299 | 14291 | 14557 | 14448 | 5935 | 4804 |
General statistics table - Initial file
| Station | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 15015020 | 8859 | 33.0729 | 1.48932 | 25.4 | 32.2 | 33.2 | 34.2 | 38.4 |
| 15065040 | 3465 | 34.0803 | 1.83791 | 25.1 | 33 | 34.2 | 35.4 | 39.4 |
| 23215060 | 960 | 32.7978 | 1.77501 | 27 | 31.6 | 32.8 | 34 | 37.4 |
| 25025002 | 7701 | 34.3217 | 2.06657 | 23 | 33 | 34.2 | 35.6 | 44 |
| 25025090 | 8486 | 33.5555 | 1.8415 | 26.4 | 32.4 | 33.6 | 34.8 | 40 |
| 25025250 | 11758 | 33.787 | 2.07519 | 26.6 | 32.4 | 34 | 35 | 42.6 |
| 25025300 | 10066 | 34.5617 | 2.3698 | 25.2 | 32.8 | 34.6 | 36.2 | 41.8 |
| 25025330 | 7606 | 33.2243 | 2.0554 | 23.2 | 31.8 | 33 | 34.6 | 41.6 |
| 28015030 | 1127 | 34.5657 | 1.85704 | 26.3 | 33.4 | 34.6 | 35.8 | 40.4 |
| 28015070 | 12645 | 33.7017 | 2.18347 | 23.8 | 32.2 | 33.6 | 35.2 | 42.2 |
| 28025020 | 12966 | 32.6602 | 2.08594 | 24.6 | 31.3 | 32.7 | 34.2 | 39.7 |
| 28025040 | 2613 | 29.7907 | 1.74453 | 22.4 | 28.6 | 30 | 31 | 39.2 |
| 28025070 | 13398 | 34.4325 | 2.35604 | 24.1 | 32.8 | 34.4 | 36.2 | 42.8 |
| 28025080 | 7076 | 33.5969 | 2.05285 | 26.4 | 32.2 | 33.6 | 35 | 40 |
| 28025090 | 12953 | 34.3336 | 2.09333 | 23.8 | 33 | 34.4 | 35.8 | 42.3 |
| 28025502 | 11456 | 34.6493 | 2.18198 | 24.8 | 33.2 | 34.7 | 36.2 | 41.8 |
| 28035010 | 8442 | 35.0506 | 2.25909 | 25 | 33.6 | 35.2 | 36.6 | 42.4 |
| 28035020 | 12136 | 34.7338 | 2.21035 | 25.4 | 33.2 | 34.8 | 36.4 | 41.8 |
| 28035040 | 9424 | 36.1116 | 2.03179 | 26 | 35 | 36.4 | 37.4 | 42.4 |
| 28035070 | 42 | 37.1095 | 1.26738 | 34 | 36.6 | 37.4 | 38 | 39.2 |
| 28045020 | 1050 | 33.8226 | 2.13222 | 27.2 | 32.25 | 33.8 | 35.4 | 39.2 |
| 28045040 | 784 | 34.5605 | 2.25236 | 26 | 33 | 34.6 | 36 | 41.8 |
| 29065010 | 893 | 33.5259 | 1.58034 | 29.4 | 32.4 | 33.6 | 34.6 | 37.2 |
| 29065020 | 9406 | 34.1372 | 1.63127 | 26 | 33 | 34.2 | 35.2 | 39.8 |
| 29065030 | 10537 | 33.3129 | 1.4911 | 24.8 | 32.4 | 33.4 | 34.2 | 38.6 |
Since the data doesn't follow a normal distribution, outliers are identified with the interquartile range (IQR) method instead of the Z-score. With the IQR method, outliers are the data points that fall below Q1 - 1.5 IQR or above Q3 + 1.5 IQR. Q1 is typically the 25th percentile and Q3 the 75th percentile of the dataset (the quantiles used in this run are listed in the parameters below), and the IQR is the difference between them (Q3 - Q1).
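The IQR rule described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the script that produced this report: the function name `iqr_limits`, its parameters, and the sample values are hypothetical.

```python
import pandas as pd


def iqr_limits(series: pd.Series, q_low: float = 0.25, q_high: float = 0.75,
               k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier limits for one station column.

    q_low and q_high are the quantiles used as Q1 and Q3 (hypothetical
    parameter names); NaN values are ignored by Series.quantile.
    """
    q1 = series.quantile(q_low)
    q3 = series.quantile(q_high)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical daily maximum temperatures with one low and one high extreme
sample = pd.Series([33.0, 33.4, 32.8, 34.0, 33.6, 25.0, 33.2, 41.0, 33.8, 34.2])

lower, upper = iqr_limits(sample)
outliers = sample[(sample < lower) | (sample > upper)]
print(f"limits: [{lower:.2f}, {upper:.2f}]")
print(outliers.tolist())
```

Passing `q_low=0.175` and `q_high=0.825` widens the central band, which in turn moves both limits further from the median.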
Outliers parameters:
- mean: mean value
- std: standard deviation value
- q1: quantile 0.175
- q3: quantile 0.825
- IQR: interquartile range (q3-q1)
- OlLowerLim: outlier bottom limit (q1-1.5*IQR)
- OlUpperLim: outlier top limit (q3+1.5*IQR)
- OlMinVal: minimum outlier value found
- OlMaxVal: maximum outlier value found
- OlCount: number of outliers found
- CapLowerLim: capped lower limit for outlier replacement ( $\mu$ - 3.6 * $\sigma$ )
- CapUpperLim: capped upper limit for outlier replacement ( $\mu$ + 3.6 * $\sigma$ )
| Station | mean | std | q1 | q3 | IQR | OlLowerLim | OlUpperLim | OlMinVal | OlMaxVal | OlCount | CapLowerLim | CapUpperLim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15015020 | 33.0729 | 1.48932 | 31.8 | 34.4 | 2.6 | 35.7 | 38.3 | 25.4 | 38.4 | 17 | 27.7114 | 38.4345 |
| 15065040 | 34.0803 | 1.83791 | 32.4 | 35.8 | 3.4 | 37.5 | 40.9 | 25.1 | 27.1 | 5 | 27.4638 | 40.6967 |
| 23215060 | 32.7978 | 1.77501 | 31 | 34.4 | 3.4 | 36.1 | 39.5 | nan | nan | 0 | 26.4078 | 39.1879 |
| 25025002 | 34.3217 | 2.06657 | 32.4 | 36.2 | 3.8 | 38.1 | 41.9 | 23 | 44 | 5 | 26.882 | 41.7613 |
| 25025090 | 33.5555 | 1.8415 | 31.8 | 35 | 3.2 | 36.6 | 39.8 | 26.4 | 40 | 6 | 26.9261 | 40.1849 |
| 25025250 | 33.787 | 2.07519 | 32 | 35.4 | 3.4 | 37.1 | 40.5 | 26.6 | 42.6 | 15 | 26.3163 | 41.2576 |
| 25025300 | 34.5617 | 2.3698 | 32.4 | 36.8 | 4.4 | 39 | 43.4 | 25.2 | 25.4 | 2 | 26.0304 | 43.093 |
| 25025330 | 33.2243 | 2.0554 | 31.4 | 35.2 | 3.8 | 37.1 | 40.9 | 23.2 | 41.6 | 7 | 25.8248 | 40.6238 |
| 28015030 | 34.5657 | 1.85704 | 33 | 36.4 | 3.4 | 38.1 | 41.5 | 26.3 | 27.9 | 3 | 27.8803 | 41.251 |
| 28015070 | 33.7017 | 2.18347 | 31.6 | 35.8 | 4.2 | 37.9 | 42.1 | 23.8 | 42.2 | 3 | 25.8412 | 41.5622 |
| 28025020 | 32.6602 | 2.08594 | 30.6 | 34.6 | 4 | 36.6 | 40.6 | nan | nan | 0 | 25.1508 | 40.1696 |
| 28025040 | 29.7907 | 1.74453 | 28.2 | 31.2 | 3 | 32.7 | 35.7 | 22.4 | 39.2 | 4 | 23.5104 | 36.071 |
| 28025070 | 34.4325 | 2.35604 | 32.2 | 36.6 | 4.4 | 38.8 | 43.2 | 24.1 | 25.6 | 4 | 25.9508 | 42.9143 |
| 28025080 | 33.5969 | 2.05285 | 31.8 | 35.4 | 3.6 | 37.2 | 40.8 | 26.4 | 26.4 | 1 | 26.2066 | 40.9872 |
| 28025090 | 34.3336 | 2.09333 | 32.4 | 36.2 | 3.8 | 38.1 | 41.9 | 23.8 | 42.3 | 13 | 26.7976 | 41.8696 |
| 28025502 | 34.6493 | 2.18198 | 32.6 | 36.7 | 4.1 | 38.75 | 42.85 | 24.8 | 26.4 | 12 | 26.7942 | 42.5044 |
| 28035010 | 35.0506 | 2.25909 | 33 | 37.2 | 4.2 | 39.3 | 43.5 | 25 | 26.4 | 9 | 26.9179 | 43.1833 |
| 28035020 | 34.7338 | 2.21035 | 32.6 | 36.8 | 4.2 | 38.9 | 43.1 | 25.4 | 26 | 5 | 26.7766 | 42.6911 |
| 28035040 | 36.1116 | 2.03179 | 34.4 | 37.8 | 3.4 | 39.5 | 42.9 | 26 | 29.2 | 42 | 28.7971 | 43.426 |
| 28035070 | 37.1095 | 1.26738 | 36.235 | 38.2 | 1.965 | 39.1825 | 41.1475 | nan | nan | 0 | 32.547 | 41.6721 |
| 28045020 | 33.8226 | 2.13222 | 31.8 | 36 | 4.2 | 38.1 | 42.3 | nan | nan | 0 | 26.1466 | 41.4985 |
| 28045040 | 34.5605 | 2.25236 | 32.4 | 36.6 | 4.2 | 38.7 | 42.9 | 26 | 26 | 1 | 26.4519 | 42.669 |
| 29065010 | 33.5259 | 1.58034 | 32 | 35.2 | 3.2 | 36.8 | 40 | nan | nan | 0 | 27.8366 | 39.2151 |
| 29065020 | 34.1372 | 1.63127 | 32.8 | 35.6 | 2.8 | 37 | 39.8 | 26 | 28.4 | 20 | 28.2646 | 40.0097 |
| 29065030 | 33.3129 | 1.4911 | 32 | 34.6 | 2.6 | 35.9 | 38.5 | 24.8 | 38.6 | 51 | 27.945 | 38.6809 |
- Outliers identified file: Outlier_IQR_Pivot_TMX_CON.csv
- Outliers dropped file: Outlier_IQR_Drop_Pivot_TMX_CON.csv
- Outliers capped file: Outlier_IQR_Cap_Pivot_TMX_CON.csv
- Outliers imputed file: Outlier_IQR_Impute_Pivot_TMX_CON.csv
IQR - General statistics table - Capped file
| Station | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 15015020 | 8859 | 33.0737 | 1.48617 | 27.7114 | 32.2 | 33.2 | 34.2 | 38.4345 |
| 15065040 | 3465 | 34.082 | 1.83098 | 27.3 | 33 | 34.2 | 35.4 | 39.4 |
| 23215060 | 960 | 32.7978 | 1.77501 | 27 | 31.6 | 32.8 | 34 | 37.4 |
| 25025002 | 7701 | 34.3222 | 2.06183 | 26.8 | 33 | 34.2 | 35.6 | 41.8 |
| 25025090 | 8486 | 33.5557 | 1.84131 | 26.9261 | 32.4 | 33.6 | 34.8 | 40.1849 |
| 25025250 | 11758 | 33.7867 | 2.07458 | 26.3163 | 32.4 | 34 | 35 | 41.2576 |
| 25025300 | 10066 | 34.5619 | 2.36926 | 25.8 | 32.8 | 34.6 | 36.2 | 41.8 |
| 25025330 | 7606 | 33.2245 | 2.05156 | 25.8 | 31.8 | 33 | 34.6 | 40.6238 |
| 28015030 | 1127 | 34.5671 | 1.85119 | 27.8803 | 33.4 | 34.6 | 35.8 | 40.4 |
| 28015070 | 12645 | 33.7018 | 2.18237 | 25.4 | 32.2 | 33.6 | 35.2 | 42 |
| 28025020 | 12966 | 32.6602 | 2.08594 | 24.6 | 31.3 | 32.7 | 34.2 | 39.7 |
| 28025040 | 2613 | 29.7902 | 1.73652 | 23.5104 | 28.6 | 30 | 31 | 36.071 |
| 28025070 | 13398 | 34.4328 | 2.35485 | 25.8 | 32.8 | 34.4 | 36.2 | 42.8 |
| 28025080 | 7076 | 33.5968 | 2.05295 | 26.2066 | 32.2 | 33.6 | 35 | 40 |
| 28025090 | 12953 | 34.3342 | 2.09049 | 26.7976 | 33 | 34.4 | 35.8 | 41.8696 |
| 28025502 | 11456 | 34.6505 | 2.17731 | 26.6 | 33.2 | 34.7 | 36.2 | 41.8 |
| 28035010 | 8442 | 35.0518 | 2.25426 | 26.8 | 33.6 | 35.2 | 36.6 | 42.4 |
| 28035020 | 12136 | 34.7342 | 2.2088 | 26.4 | 33.2 | 34.8 | 36.4 | 41.8 |
| 28035040 | 9424 | 36.1131 | 2.02555 | 28.7971 | 35 | 36.4 | 37.4 | 42.4 |
| 28035070 | 42 | 37.1095 | 1.26738 | 34 | 36.6 | 37.4 | 38 | 39.2 |
| 28045020 | 1050 | 33.8226 | 2.13222 | 27.2 | 32.25 | 33.8 | 35.4 | 39.2 |
| 28045040 | 784 | 34.561 | 2.25023 | 26.2 | 33 | 34.6 | 36 | 41.8 |
| 29065010 | 893 | 33.5259 | 1.58034 | 29.4 | 32.4 | 33.6 | 34.6 | 37.2 |
| 29065020 | 9406 | 34.1386 | 1.6254 | 28.2646 | 33 | 34.2 | 35.2 | 39.8 |
| 29065030 | 10537 | 33.3167 | 1.47557 | 27.945 | 32.4 | 33.4 | 34.2 | 38.6809 |
IQR - General statistics table - Imputed file
| Station | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 15015020 | 8859 | 33.0828 | 1.46746 | 28 | 32.2 | 33.2 | 34.2 | 38.2 |
| 15065040 | 3465 | 34.0915 | 1.81361 | 27.3 | 33 | 34.2 | 35.4 | 39.4 |
| 23215060 | 960 | 32.7978 | 1.77501 | 27 | 31.6 | 32.8 | 34 | 37.4 |
| 25025002 | 7701 | 34.3251 | 2.05309 | 26.8 | 33 | 34.2 | 35.6 | 41.8 |
| 25025090 | 8486 | 33.5557 | 1.83285 | 27 | 32.4 | 33.6 | 34.8 | 39.8 |
| 25025250 | 11758 | 33.7797 | 2.05733 | 27 | 32.4 | 34 | 35 | 40.4 |
| 25025300 | 10066 | 34.5636 | 2.3662 | 25.8 | 32.8 | 34.6 | 36.2 | 41.8 |
| 25025330 | 7606 | 33.2215 | 2.03924 | 25.8 | 31.8 | 33 | 34.6 | 40.6 |
| 28015030 | 1127 | 34.5849 | 1.81864 | 28 | 33.5 | 34.6 | 35.8 | 40.4 |
| 28015070 | 12645 | 33.7025 | 2.17901 | 25.4 | 32.2 | 33.6 | 35.2 | 42 |
| 28025020 | 12966 | 32.6602 | 2.08594 | 24.6 | 31.3 | 32.7 | 34.2 | 39.7 |
| 28025040 | 2613 | 29.7902 | 1.71904 | 23.8 | 28.6 | 30 | 31 | 35.4 |
| 28025070 | 13398 | 34.4354 | 2.35028 | 25.8 | 32.8 | 34.4 | 36.2 | 42.8 |
| 28025080 | 7076 | 33.5979 | 2.05107 | 26.6 | 32.2 | 33.6 | 35 | 40 |
| 28025090 | 12953 | 34.3383 | 2.07681 | 27 | 33 | 34.4 | 35.8 | 41.2 |
| 28025502 | 11456 | 34.6587 | 2.16239 | 26.6 | 33.2 | 34.7 | 36.2 | 41.8 |
| 28035010 | 8442 | 35.0605 | 2.23855 | 26.8 | 33.6 | 35.2 | 36.6 | 42.4 |
| 28035020 | 12136 | 34.7375 | 2.20289 | 26.4 | 33.2 | 34.8 | 36.4 | 41.8 |
| 28035040 | 9424 | 36.1457 | 1.9655 | 29.4 | 35 | 36.4 | 37.4 | 42.4 |
| 28035070 | 42 | 37.1095 | 1.26738 | 34 | 36.6 | 37.4 | 38 | 39.2 |
| 28045020 | 1050 | 33.8226 | 2.13222 | 27.2 | 32.25 | 33.8 | 35.4 | 39.2 |
| 28045040 | 784 | 34.5714 | 2.23147 | 26.2 | 33.15 | 34.6 | 36 | 41.8 |
| 29065010 | 893 | 33.5259 | 1.58034 | 29.4 | 32.4 | 33.6 | 34.6 | 37.2 |
| 29065020 | 9406 | 34.1511 | 1.60262 | 28.6 | 33 | 34.2 | 35.2 | 39.8 |
| 29065030 | 10537 | 33.3416 | 1.42724 | 28.2 | 32.4 | 33.4 | 34.2 | 38.4 |
Method 2 - Outliers processing through empirical rule - ER or k-sigma ( $\mu$ - k * $\sigma$ ) with k = 3.6
The empirical rule, also referred to as the three-sigma rule or 68-95-99.7 rule, is a statistical rule which states that for a normal distribution, almost all observed data will fall within three standard deviations (denoted by $\sigma$) of the mean (denoted by $\mu$).
Outliers parameters:
- mean: mean value
- std: standard deviation value
- OlMinVal: minimum outlier value found
- OlMaxVal: maximum outlier value found
- OlCount: number of outliers found
- CapLowerLim: capped lower limit for outlier replacement ( $\mu$ - 3.6 * $\sigma$ )
- CapUpperLim: capped upper limit for outlier replacement ( $\mu$ + 3.6 * $\sigma$ )
| Station | mean | std | OlMinVal | OlMaxVal | OlCount | CapLowerLim | CapUpperLim |
|---|---|---|---|---|---|---|---|
| 15015020 | 33.0729 | 1.48932 | 25.4 | 27.6 | 13 | 27.7114 | 38.4345 |
| 15065040 | 34.0803 | 1.83791 | 25.1 | 27.4 | 7 | 27.4638 | 40.6967 |
| 23215060 | 32.7978 | 1.77501 | nan | nan | 0 | 26.4078 | 39.1879 |
| 25025002 | 34.3217 | 2.06657 | 23 | 44 | 8 | 26.882 | 41.7613 |
| 25025090 | 33.5555 | 1.8415 | 26.4 | 26.8 | 3 | 26.9261 | 40.1849 |
| 25025250 | 33.787 | 2.07519 | 41.4 | 42.6 | 5 | 26.3163 | 41.2576 |
| 25025300 | 34.5617 | 2.3698 | 25.2 | 25.8 | 3 | 26.0304 | 43.093 |
| 25025330 | 33.2243 | 2.0554 | 23.2 | 41.6 | 8 | 25.8248 | 40.6238 |
| 28015030 | 34.5657 | 1.85704 | 26.3 | 27.8 | 2 | 27.8803 | 41.251 |
| 28015070 | 33.7017 | 2.18347 | 23.8 | 42.2 | 13 | 25.8412 | 41.5622 |
| 28025020 | 32.6602 | 2.08594 | 24.6 | 25.1 | 5 | 25.1508 | 40.1696 |
| 28025040 | 29.7907 | 1.74453 | 22.4 | 39.2 | 3 | 23.5104 | 36.071 |
| 28025070 | 34.4325 | 2.35604 | 24.1 | 25.8 | 5 | 25.9508 | 42.9143 |
| 28025080 | 33.5969 | 2.05285 | nan | nan | 0 | 26.2066 | 40.9872 |
| 28025090 | 34.3336 | 2.09333 | 23.8 | 42.3 | 13 | 26.7976 | 41.8696 |
| 28025502 | 34.6493 | 2.18198 | 24.8 | 26.7 | 17 | 26.7942 | 42.5044 |
| 28035010 | 35.0506 | 2.25909 | 25 | 26.8 | 11 | 26.9179 | 43.1833 |
| 28035020 | 34.7338 | 2.21035 | 25.4 | 26.6 | 12 | 26.7766 | 42.6911 |
| 28035040 | 36.1116 | 2.03179 | 26 | 28.6 | 29 | 28.7971 | 43.426 |
| 28035070 | 37.1095 | 1.26738 | nan | nan | 0 | 32.547 | 41.6721 |
| 28045020 | 33.8226 | 2.13222 | nan | nan | 0 | 26.1466 | 41.4985 |
| 28045040 | 34.5605 | 2.25236 | 26 | 26.2 | 2 | 26.4519 | 42.669 |
| 29065010 | 33.5259 | 1.58034 | nan | nan | 0 | 27.8366 | 39.2151 |
| 29065020 | 34.1372 | 1.63127 | 26 | 28.2 | 18 | 28.2646 | 40.0097 |
| 29065030 | 33.3129 | 1.4911 | 24.8 | 27.8 | 39 | 27.945 | 38.6809 |
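The cap limits in the table above follow from each station's mean and standard deviation with k = 3.6. The sketch below reproduces the limits for station 15015020 from the rounded values printed in the table and then flags outliers in a small series. The helper `k_sigma_limits` and the sample series are hypothetical, and the use of the sample standard deviation (pandas default, `ddof=1`) is an assumption.

```python
import pandas as pd

K_SIGMA = 3.6  # multiplier reported in this section


def k_sigma_limits(mean: float, std: float, k: float = K_SIGMA) -> tuple[float, float]:
    """Return the (lower, upper) limits mean - k*std and mean + k*std."""
    return mean - k * std, mean + k * std


# Station 15015020, using the rounded mean and std printed in the table
low, high = k_sigma_limits(33.0729, 1.48932)
print(f"CapLowerLim ~ {low:.4f}, CapUpperLim ~ {high:.4f}")

# Hypothetical series: 100 ordinary readings plus one extreme low value
sample = pd.Series([33.0, 34.0] * 50 + [20.0])
s_low, s_high = k_sigma_limits(sample.mean(), sample.std())  # std uses ddof=1
flagged = sample[(sample < s_low) | (sample > s_high)]
print(flagged.tolist())
```

With a threshold as wide as 3.6 standard deviations, a single extreme value can only be flagged when the series is long enough, which is why the example uses about one hundred readings.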
- Outliers identified file: Outlier_ER_Pivot_TMX_CON.csv
- Outliers dropped file: Outlier_ER_Drop_Pivot_TMX_CON.csv
- Outliers capped file: Outlier_ER_Cap_Pivot_TMX_CON.csv
- Outliers imputed file: Outlier_ER_Impute_Pivot_TMX_CON.csv
ER - General statistics table - Capped file
| Station | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 15015020 | 8859 | 33.0738 | 1.48604 | 27.7114 | 32.2 | 33.2 | 34.2 | 38.4 |
| 15065040 | 3465 | 34.082 | 1.83074 | 27.4638 | 33 | 34.2 | 35.4 | 39.4 |
| 23215060 | 960 | 32.7978 | 1.77501 | 27 | 31.6 | 32.8 | 34 | 37.4 |
| 25025002 | 7701 | 34.3222 | 2.06175 | 26.882 | 33 | 34.2 | 35.6 | 41.7613 |
| 25025090 | 8486 | 33.5557 | 1.84107 | 26.9261 | 32.4 | 33.6 | 34.8 | 40 |
| 25025250 | 11758 | 33.7866 | 2.07391 | 26.6 | 32.4 | 34 | 35 | 41.2576 |
| 25025300 | 10066 | 34.5619 | 2.36917 | 26.0304 | 32.8 | 34.6 | 36.2 | 41.8 |
| 25025330 | 7606 | 33.2245 | 2.05154 | 25.8248 | 31.8 | 33 | 34.6 | 40.6238 |
| 28015030 | 1127 | 34.5671 | 1.85113 | 27.8803 | 33.4 | 34.6 | 35.8 | 40.4 |
| 28015070 | 12645 | 33.7019 | 2.18138 | 25.8412 | 32.2 | 33.6 | 35.2 | 41.5622 |
| 28025020 | 12966 | 32.6603 | 2.08556 | 25.1508 | 31.3 | 32.7 | 34.2 | 39.7 |
| 28025040 | 2613 | 29.7902 | 1.73642 | 23.5104 | 28.6 | 30 | 31 | 36.071 |
| 28025070 | 13398 | 34.4328 | 2.3548 | 25.9508 | 32.8 | 34.4 | 36.2 | 42.8 |
| 28025080 | 7076 | 33.5969 | 2.05285 | 26.4 | 32.2 | 33.6 | 35 | 40 |
| 28025090 | 12953 | 34.3342 | 2.09049 | 26.7976 | 33 | 34.4 | 35.8 | 41.8696 |
| 28025502 | 11456 | 34.6505 | 2.17712 | 26.7942 | 33.2 | 34.7 | 36.2 | 41.8 |
| 28035010 | 8442 | 35.0519 | 2.25416 | 26.9179 | 33.6 | 35.2 | 36.6 | 42.4 |
| 28035020 | 12136 | 34.7344 | 2.20819 | 26.7766 | 33.2 | 34.8 | 36.4 | 41.8 |
| 28035040 | 9424 | 36.1134 | 2.02448 | 28.7971 | 35 | 36.4 | 37.4 | 42.4 |
| 28035070 | 42 | 37.1095 | 1.26738 | 34 | 36.6 | 37.4 | 38 | 39.2 |
| 28045020 | 1050 | 33.8226 | 2.13222 | 27.2 | 32.25 | 33.8 | 35.4 | 39.2 |
| 28045040 | 784 | 34.5614 | 2.24905 | 26.4519 | 33 | 34.6 | 36 | 41.8 |
| 29065010 | 893 | 33.5259 | 1.58034 | 29.4 | 32.4 | 33.6 | 34.6 | 37.2 |
| 29065020 | 9406 | 34.1386 | 1.62529 | 28.2646 | 33 | 34.2 | 35.2 | 39.8 |
| 29065030 | 10537 | 33.3167 | 1.47534 | 27.945 | 32.4 | 33.4 | 34.2 | 38.6 |
ER - General statistics table - Imputed file
| Station | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 15015020 | 8859 | 33.0816 | 1.47176 | 27.8 | 32.2 | 33.2 | 34.2 | 38.4 |
| 15065040 | 3465 | 34.0954 | 1.80636 | 27.6 | 33 | 34.2 | 35.4 | 39.4 |
| 23215060 | 960 | 32.7978 | 1.77501 | 27 | 31.6 | 32.8 | 34 | 37.4 |
| 25025002 | 7701 | 34.3241 | 2.04776 | 27 | 33 | 34.2 | 35.6 | 41.6 |
| 25025090 | 8486 | 33.558 | 1.83685 | 27 | 32.4 | 33.6 | 34.8 | 40 |
| 25025250 | 11758 | 33.7835 | 2.06818 | 26.6 | 32.4 | 34 | 35 | 41.2 |
| 25025300 | 10066 | 34.5644 | 2.36459 | 26.4 | 32.8 | 34.6 | 36.2 | 41.8 |
| 25025330 | 7606 | 33.2225 | 2.03746 | 26 | 31.8 | 33 | 34.6 | 40.6 |
| 28015030 | 1127 | 34.579 | 1.82951 | 27.9 | 33.5 | 34.6 | 35.8 | 40.4 |
| 28015070 | 12645 | 33.7038 | 2.16677 | 26 | 32.2 | 33.6 | 35.2 | 41.2 |
| 28025020 | 12966 | 32.6632 | 2.08033 | 25.2 | 31.3 | 32.7 | 34.2 | 39.7 |
| 28025040 | 2613 | 29.7926 | 1.72333 | 23.8 | 28.6 | 30 | 31 | 36 |
| 28025070 | 13398 | 34.436 | 2.34909 | 26 | 32.8 | 34.4 | 36.2 | 42.8 |
| 28025080 | 7076 | 33.5969 | 2.05285 | 26.4 | 32.2 | 33.6 | 35 | 40 |
| 28025090 | 12953 | 34.3383 | 2.07681 | 27 | 33 | 34.4 | 35.8 | 41.2 |
| 28025502 | 11456 | 34.6622 | 2.15595 | 26.8 | 33.2 | 34.7 | 36.2 | 41.8 |
| 28035010 | 8442 | 35.0625 | 2.23493 | 27.2 | 33.6 | 35.2 | 36.6 | 42.4 |
| 28035020 | 12136 | 34.7423 | 2.19395 | 26.8 | 33.2 | 34.8 | 36.4 | 41.8 |
| 28035040 | 9424 | 36.1359 | 1.98325 | 28.8 | 35 | 36.4 | 37.4 | 42.4 |
| 28035070 | 42 | 37.1095 | 1.26738 | 34 | 36.6 | 37.4 | 38 | 39.2 |
| 28045020 | 1050 | 33.8226 | 2.13222 | 27.2 | 32.25 | 33.8 | 35.4 | 39.2 |
| 28045040 | 784 | 34.582 | 2.21129 | 27 | 33.2 | 34.6 | 36 | 41.8 |
| 29065010 | 893 | 33.5259 | 1.58034 | 29.4 | 32.4 | 33.6 | 34.6 | 37.2 |
| 29065020 | 9406 | 34.1499 | 1.60481 | 28.4 | 33 | 34.2 | 35.2 | 39.8 |
| 29065030 | 10537 | 33.3366 | 1.43854 | 28 | 32.4 | 33.4 | 34.2 | 38.6 |
The Z-score, also called the standard score, indicates whether a data value is greater or smaller than the mean and how far from the mean it lies. More specifically, it tells how many standard deviations a data point is from the mean: Z = ( x - $\mu$ ) / $\sigma$.
Although this method identifies the same outliers as Method 2 (empirical rule) when the Z-score threshold equals the k-sigma value, Method 3 also creates the table of Z-score values. Use this method to compare the outliers identified with different k-sigma values.
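A Z-score table makes it cheap to compare several thresholds, because the scores are computed once and only the comparison changes. The following sketch assumes a pivot-style DataFrame with one column per station; the column names `station_a` and `station_b`, the values, and the thresholds other than 3.6 are hypothetical.

```python
import pandas as pd


def z_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Column-wise Z-scores: (x - mean) / std, with std computed using ddof=1."""
    return (df - df.mean()) / df.std()


# Hypothetical pivot table: one column per station
df = pd.DataFrame({
    "station_a": [33.0, 34.0] * 50 + [20.0],  # contains one extreme low value
    "station_b": [30.0, 31.0] * 50 + [30.5],  # no extreme values
})

z = z_scores(df)

# Count the values flagged per station for several thresholds
counts = {k: (z.abs() > k).sum().to_dict() for k in (3.0, 3.6, 10.0)}
for k, per_station in counts.items():
    print(k, per_station)
```

Flagging `abs(Z) > k` is equivalent to testing whether a value lies outside the band from mean - k * std to mean + k * std, which is why the counts match the empirical rule for the same k.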
Outliers parameters:
- mean: mean value
- std: standard deviation value
- OlMinVal: minimum outlier value found
- OlMaxVal: maximum outlier value found
- OlCount: number of outliers found
- CapLowerLim: capped lower limit for outlier replacement ( $\mu$ - 3.6 * $\sigma$ )
- CapUpperLim: capped upper limit for outlier replacement ( $\mu$ + 3.6 * $\sigma$ )
| Station | mean | std | OlMinVal | OlMaxVal | OlCount | CapLowerLim | CapUpperLim |
|---|---|---|---|---|---|---|---|
| 15015020 | 33.0729 | 1.48932 | 25.4 | 27.6 | 13 | 27.7114 | 38.4345 |
| 15065040 | 34.0803 | 1.83791 | 25.1 | 27.4 | 7 | 27.4638 | 40.6967 |
| 23215060 | 32.7978 | 1.77501 | nan | nan | 0 | 26.4078 | 39.1879 |
| 25025002 | 34.3217 | 2.06657 | 23 | 44 | 8 | 26.882 | 41.7613 |
| 25025090 | 33.5555 | 1.8415 | 26.4 | 26.8 | 3 | 26.9261 | 40.1849 |
| 25025250 | 33.787 | 2.07519 | 41.4 | 42.6 | 5 | 26.3163 | 41.2576 |
| 25025300 | 34.5617 | 2.3698 | 25.2 | 25.8 | 3 | 26.0304 | 43.093 |
| 25025330 | 33.2243 | 2.0554 | 23.2 | 41.6 | 8 | 25.8248 | 40.6238 |
| 28015030 | 34.5657 | 1.85704 | 26.3 | 27.8 | 2 | 27.8803 | 41.251 |
| 28015070 | 33.7017 | 2.18347 | 23.8 | 42.2 | 13 | 25.8412 | 41.5622 |
| 28025020 | 32.6602 | 2.08594 | 24.6 | 25.1 | 5 | 25.1508 | 40.1696 |
| 28025040 | 29.7907 | 1.74453 | 22.4 | 39.2 | 3 | 23.5104 | 36.071 |
| 28025070 | 34.4325 | 2.35604 | 24.1 | 25.8 | 5 | 25.9508 | 42.9143 |
| 28025080 | 33.5969 | 2.05285 | nan | nan | 0 | 26.2066 | 40.9872 |
| 28025090 | 34.3336 | 2.09333 | 23.8 | 42.3 | 13 | 26.7976 | 41.8696 |
| 28025502 | 34.6493 | 2.18198 | 24.8 | 26.7 | 17 | 26.7942 | 42.5044 |
| 28035010 | 35.0506 | 2.25909 | 25 | 26.8 | 11 | 26.9179 | 43.1833 |
| 28035020 | 34.7338 | 2.21035 | 25.4 | 26.6 | 12 | 26.7766 | 42.6911 |
| 28035040 | 36.1116 | 2.03179 | 26 | 28.6 | 29 | 28.7971 | 43.426 |
| 28035070 | 37.1095 | 1.26738 | nan | nan | 0 | 32.547 | 41.6721 |
| 28045020 | 33.8226 | 2.13222 | nan | nan | 0 | 26.1466 | 41.4985 |
| 28045040 | 34.5605 | 2.25236 | 26 | 26.2 | 2 | 26.4519 | 42.669 |
| 29065010 | 33.5259 | 1.58034 | nan | nan | 0 | 27.8366 | 39.2151 |
| 29065020 | 34.1372 | 1.63127 | 26 | 28.2 | 18 | 28.2646 | 40.0097 |
| 29065030 | 33.3129 | 1.4911 | 24.8 | 27.8 | 39 | 27.945 | 38.6809 |
- Outliers Z-score values file: Outlier_ZScore_Value_Pivot_TMX_CON.csv
- Outliers identified file: Outlier_ZScore_Pivot_TMX_CON.csv
- Outliers dropped file: Outlier_ZScore_Drop_Pivot_TMX_CON.csv
- Outliers capped file: Outlier_ZScore_Cap_Pivot_TMX_CON.csv
- Outliers imputed file: Outlier_ZScore_Impute_Pivot_TMX_CON.csv
Z-score - General statistics table - Capped file
| Station | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 15015020 | 8859 | 33.0738 | 1.48604 | 27.7114 | 32.2 | 33.2 | 34.2 | 38.4 |
| 15065040 | 3465 | 34.082 | 1.83074 | 27.4638 | 33 | 34.2 | 35.4 | 39.4 |
| 23215060 | 960 | 32.7978 | 1.77501 | 27 | 31.6 | 32.8 | 34 | 37.4 |
| 25025002 | 7701 | 34.3222 | 2.06175 | 26.882 | 33 | 34.2 | 35.6 | 41.7613 |
| 25025090 | 8486 | 33.5557 | 1.84107 | 26.9261 | 32.4 | 33.6 | 34.8 | 40 |
| 25025250 | 11758 | 33.7866 | 2.07391 | 26.6 | 32.4 | 34 | 35 | 41.2576 |
| 25025300 | 10066 | 34.5619 | 2.36917 | 26.0304 | 32.8 | 34.6 | 36.2 | 41.8 |
| 25025330 | 7606 | 33.2245 | 2.05154 | 25.8248 | 31.8 | 33 | 34.6 | 40.6238 |
| 28015030 | 1127 | 34.5671 | 1.85113 | 27.8803 | 33.4 | 34.6 | 35.8 | 40.4 |
| 28015070 | 12645 | 33.7019 | 2.18138 | 25.8412 | 32.2 | 33.6 | 35.2 | 41.5622 |
| 28025020 | 12966 | 32.6603 | 2.08556 | 25.1508 | 31.3 | 32.7 | 34.2 | 39.7 |
| 28025040 | 2613 | 29.7902 | 1.73642 | 23.5104 | 28.6 | 30 | 31 | 36.071 |
| 28025070 | 13398 | 34.4328 | 2.3548 | 25.9508 | 32.8 | 34.4 | 36.2 | 42.8 |
| 28025080 | 7076 | 33.5969 | 2.05285 | 26.4 | 32.2 | 33.6 | 35 | 40 |
| 28025090 | 12953 | 34.3342 | 2.09049 | 26.7976 | 33 | 34.4 | 35.8 | 41.8696 |
| 28025502 | 11456 | 34.6505 | 2.17712 | 26.7942 | 33.2 | 34.7 | 36.2 | 41.8 |
| 28035010 | 8442 | 35.0519 | 2.25416 | 26.9179 | 33.6 | 35.2 | 36.6 | 42.4 |
| 28035020 | 12136 | 34.7344 | 2.20819 | 26.7766 | 33.2 | 34.8 | 36.4 | 41.8 |
| 28035040 | 9424 | 36.1134 | 2.02448 | 28.7971 | 35 | 36.4 | 37.4 | 42.4 |
| 28035070 | 42 | 37.1095 | 1.26738 | 34 | 36.6 | 37.4 | 38 | 39.2 |
| 28045020 | 1050 | 33.8226 | 2.13222 | 27.2 | 32.25 | 33.8 | 35.4 | 39.2 |
| 28045040 | 784 | 34.5614 | 2.24905 | 26.4519 | 33 | 34.6 | 36 | 41.8 |
| 29065010 | 893 | 33.5259 | 1.58034 | 29.4 | 32.4 | 33.6 | 34.6 | 37.2 |
| 29065020 | 9406 | 34.1386 | 1.62529 | 28.2646 | 33 | 34.2 | 35.2 | 39.8 |
| 29065030 | 10537 | 33.3167 | 1.47534 | 27.945 | 32.4 | 33.4 | 34.2 | 38.6 |
Z-score - General statistics table - Imputed file
| Station | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 15015020 | 8859 | 33.0816 | 1.47176 | 27.8 | 32.2 | 33.2 | 34.2 | 38.4 |
| 15065040 | 3465 | 34.0954 | 1.80636 | 27.6 | 33 | 34.2 | 35.4 | 39.4 |
| 23215060 | 960 | 32.7978 | 1.77501 | 27 | 31.6 | 32.8 | 34 | 37.4 |
| 25025002 | 7701 | 34.3241 | 2.04776 | 27 | 33 | 34.2 | 35.6 | 41.6 |
| 25025090 | 8486 | 33.558 | 1.83685 | 27 | 32.4 | 33.6 | 34.8 | 40 |
| 25025250 | 11758 | 33.7835 | 2.06818 | 26.6 | 32.4 | 34 | 35 | 41.2 |
| 25025300 | 10066 | 34.5644 | 2.36459 | 26.4 | 32.8 | 34.6 | 36.2 | 41.8 |
| 25025330 | 7606 | 33.2225 | 2.03746 | 26 | 31.8 | 33 | 34.6 | 40.6 |
| 28015030 | 1127 | 34.579 | 1.82951 | 27.9 | 33.5 | 34.6 | 35.8 | 40.4 |
| 28015070 | 12645 | 33.7038 | 2.16677 | 26 | 32.2 | 33.6 | 35.2 | 41.2 |
| 28025020 | 12966 | 32.6632 | 2.08033 | 25.2 | 31.3 | 32.7 | 34.2 | 39.7 |
| 28025040 | 2613 | 29.7926 | 1.72333 | 23.8 | 28.6 | 30 | 31 | 36 |
| 28025070 | 13398 | 34.436 | 2.34909 | 26 | 32.8 | 34.4 | 36.2 | 42.8 |
| 28025080 | 7076 | 33.5969 | 2.05285 | 26.4 | 32.2 | 33.6 | 35 | 40 |
| 28025090 | 12953 | 34.3383 | 2.07681 | 27 | 33 | 34.4 | 35.8 | 41.2 |
| 28025502 | 11456 | 34.6622 | 2.15595 | 26.8 | 33.2 | 34.7 | 36.2 | 41.8 |
| 28035010 | 8442 | 35.0625 | 2.23493 | 27.2 | 33.6 | 35.2 | 36.6 | 42.4 |
| 28035020 | 12136 | 34.7423 | 2.19395 | 26.8 | 33.2 | 34.8 | 36.4 | 41.8 |
| 28035040 | 9424 | 36.1359 | 1.98325 | 28.8 | 35 | 36.4 | 37.4 | 42.4 |
| 28035070 | 42 | 37.1095 | 1.26738 | 34 | 36.6 | 37.4 | 38 | 39.2 |
| 28045020 | 1050 | 33.8226 | 2.13222 | 27.2 | 32.25 | 33.8 | 35.4 | 39.2 |
| 28045040 | 784 | 34.582 | 2.21129 | 27 | 33.2 | 34.6 | 36 | 41.8 |
| 29065010 | 893 | 33.5259 | 1.58034 | 29.4 | 32.4 | 33.6 | 34.6 | 37.2 |
| 29065020 | 9406 | 34.1499 | 1.60481 | 28.4 | 33 | 34.2 | 35.2 | 39.8 |
| 29065030 | 10537 | 33.3366 | 1.43854 | 28 | 32.4 | 33.4 | 34.2 | 38.6 |
The drop files contain the database values without the identified outliers.
The capped files contain the database values with each outlier replaced by the calculated lower or upper cap limit. Lower outliers could be replaced with negative values because the limit is defined as (mean() - cap_multiplier * std()). In some cases, such as temperature analysis, upper outliers could be replaced with values higher than the original ones; you can try to fix this by changing the cap_multiplier parameter, which defines the width of the capping band.
The imputation method replaces each outlier with the mean value, computed from the data that still contains the original outliers.
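The three treatments can be sketched on a single station column as follows. This is an illustration under stated assumptions rather than the report's implementation: the function `treat_outliers` is hypothetical, outliers are detected with the k-sigma rule, and "dropping" is represented by setting the value to NaN so the date index stays intact.

```python
import pandas as pd


def treat_outliers(series: pd.Series, cap_multiplier: float = 3.6):
    """Return (dropped, capped, imputed) versions of one station column."""
    mu, sigma = series.mean(), series.std()
    low, high = mu - cap_multiplier * sigma, mu + cap_multiplier * sigma
    is_outlier = (series < low) | (series > high)

    dropped = series.mask(is_outlier)            # outliers become NaN
    capped = series.clip(lower=low, upper=high)  # outliers moved to the nearest cap limit
    imputed = series.mask(is_outlier, mu)        # outliers replaced by the original mean
    return dropped, capped, imputed


# Hypothetical series: 100 ordinary readings plus one extreme low value
sample = pd.Series([33.0, 34.0] * 50 + [20.0])
dropped, capped, imputed = treat_outliers(sample)

print(int(dropped.isna().sum()))   # number of values removed
print(round(capped.min(), 4))      # the extreme value now sits on the lower cap
print(round(imputed.iloc[-1], 4))  # the extreme value now equals the mean
```

Because the mean and standard deviation are computed before any replacement, the extreme value itself pulls the mean down and inflates the standard deviation, which is the effect the imputation note above refers to.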


