The PoC notebooks are located at https://github.com/Tatsuya-hasegawa/MSTICPy_utils/tree/main/analysis_rrcf_outliers. RRCF results can also be plotted with the existing function ("from msticpy.analysis.outliers import plot_outlier_results"), but it is very slow for RRCF. For instance, here is a DNS traffic anomaly case. The Isolation Forest result is shown below. The Robust Random Cut Forest result, also shown below, is smarter. Since scikit-learn does not support Robust Random Cut Forest, I implemented RRCF in msticpy in pure Python rather than as native compiled code. |
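For readers comparing the two algorithms, the sketch below illustrates the one tree-building step where RRCF and Isolation Forest differ: how a random cut is chosen. It is not the code from this PR. The names "choose_random_cut" and "split_points" are hypothetical and exist only for this illustration. In a robust random cut tree the cut dimension is drawn with probability proportional to that dimension's range, whereas Isolation Forest draws the dimension uniformly.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def choose_random_cut(points: Sequence[Point], rng: random.Random) -> Tuple[int, float]:
    """Return (dimension, cut value) for one random cut over `points`.

    Hypothetical helper for illustration only, not msticpy or rrcf API.
    """
    n_dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(n_dims)]
    highs = [max(p[d] for p in points) for d in range(n_dims)]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    if sum(spans) == 0:
        raise ValueError("all points are identical, so no cut can separate them")
    # RRCF: a dimension is picked with probability proportional to its range.
    # Isolation Forest would instead pick uniformly, e.g. rng.randrange(n_dims).
    dim = rng.choices(range(n_dims), weights=spans, k=1)[0]
    # The cut position is uniform within the chosen dimension's range.
    value = rng.uniform(lows[dim], highs[dim])
    return dim, value


def split_points(
    points: Sequence[Point], dim: int, value: float
) -> Tuple[List[Point], List[Point]]:
    """Partition `points` into those at or below the cut and those above it."""
    left = [p for p in points if p[dim] <= value]
    right = [p for p in points if p[dim] > value]
    return left, right


rng = random.Random(0)
sample = [(1.0, 0.0), (1.0, 10.0), (1.0, 3.0)]  # dimension 0 is constant
dim, value = choose_random_cut(sample, rng)
left, right = split_points(sample, dim, value)
```

Because dimension 0 has zero range in this sample, the range-weighted draw never selects it, so the cut always falls on dimension 1. A uniform draw could pick the constant dimension and produce a cut that separates nothing.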
ianhelle
left a comment
I'll have to trust your expertise on how this works and is implemented (which I do).
You might want to include one of the notebooks and add it to the notebooks folder.
I have a couple of lightweight comments - more about format than functionality.
|
Hi, thanks for all your advice. |
|
Hi Tatsuya, … at the top of the imports to ensure that Py3.8 supports this syntax. Also - not a hard requirement - if you have any documentation to add about this (now impressive) module, it would def make it more visible to others. Also if you want to publish an article somewhere (we have a msticpy blog that hasn't been used for ages), we can include that in the release and LinkedIn/X posts. |
c503721 to f237dbc
…to add_rrcf_outlier
…a/msticpy into add_rrcf_outlier
|
Hi Ian, Thank you for the advice. However, I'm not sure about compatibility with Python 3.8... Also, regarding the notebook documentation and article, I'm preparing some notebooks covering both the current IsolationForest outlier detection and this RRCF outlier detection, and I'm comparing their results on the same datasets. For time series data, Isolation Forest is known to be effective at detecting simple spikes, while RRCF is effective at detecting trend changes and correlations in multidimensional features. For non-time series data, Isolation Forest is also found to be faster and more accurate. |
This should work fine - we're using future annotations in multiple places. Am building now. |
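As background on the "future annotations" mentioned here: "from __future__ import annotations" (PEP 563) makes Python store annotations as strings instead of evaluating them, which is why builtin generics such as list[float] in a signature do not fail on Python 3.8. The sketch below is an illustration only. "score_points" is a hypothetical function, not part of msticpy, and the module source is compiled from a string purely so that the snippet runs wherever it is pasted; in a real module the import is simply the first statement in the file.

```python
import textwrap

# Hypothetical module source, for illustration only (not msticpy code).
MODULE_SOURCE = textwrap.dedent(
    '''
    from __future__ import annotations  # first statement after any module docstring

    def score_points(points: list[float], limit: float | None = None) -> dict[int, float]:
        """Return {index: value} for values above `limit` (all values if None)."""
        return {i: p for i, p in enumerate(points) if limit is None or p > limit}
    '''
)

namespace: dict = {}
exec(compile(MODULE_SOURCE, "<sketch>", "exec"), namespace)
score_points = namespace["score_points"]

# Annotations are kept as plain strings and never evaluated at definition time,
# so "list[float]" is harmless even where list[float] itself would raise.
annotations = score_points.__annotations__
result = score_points([0.1, 5.0, 0.3], limit=1.0)
```

Note that the future import only affects annotations. A type written inside a runtime expression, for example the first argument of typing.cast, is still evaluated, so on Python 3.8 it has to be quoted or written with the typing module's generics.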
|
OMG, I'm sorry. In addition, I'll add the joblib package to the pip requirements, the same as the rrcf package. |
|
No worries, take your time. :-) |
…the outlier algorisms
|
Hi Ian, I have finished my tasks and added some Jupyter notebooks! Best regards, |
|
Weird... |
|
Yeah - looks like something changed with maybe a mypy update and possibly a respx update. |
…od understanding of numpy
* add robust_random_cut_forest to outliers
* modified docstrings, typing to builtin and rrcf module install
* fixed max_samples parameter pass to RRCF class
* fixed CI/CD errors
* fixed the rest typing errors and add jupyter notebooks for comparing the outlier algorisms
* Fixing some mypy and test errors
* Fixing and/or supressing mypy warnings - I think it doesn't have a good understanding of numpy
* pylint fixes
* Fixing type annotation in cast for Py3.8 in sentinel_utils.py
* Fixing version checking logic
* Fixing typo in nbinit
---------
Co-authored-by: Ian Hellen <ianhelle@microsoft.com>


Hello Ian,
I implemented a Robust Random Cut Forest class and the outlier function.
As you know, RRCF is more useful for detecting anomalies in time series data.
Best regards,
Tatsuya