-
Notifications
You must be signed in to change notification settings - Fork 38
Exact MeLiF #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Exact MeLiF #56
Conversation
LastShekel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checkout numpy documentation, it should help you to improve code performance
| _RANDOM_STATE = 42 | ||
|
|
||
| def __init__(self, measures, kappa, estimator, score): | ||
| self._measures = measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation with argument explanation and types needed in functions
|
|
||
| for j in range(n_measures): | ||
| score = self._measures[j](X, y) | ||
| for i in range(n_features): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use numpy array assignment?
| maximum = planes.max() | ||
| min_max_diff = maximum - minimum | ||
|
|
||
| for i in range(shape[0]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again you could use array assignment like
(Planes-minimum)/min_max_diff
Or use bumpy minmaxscaler
| return normalized | ||
|
|
||
| def _kappa_filter(self, planes): | ||
| n_measures = len(self._measures) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are repeating this string, it is better to create class field with this value
| n_measures = len(self._measures) | ||
|
|
||
| indexed = [] | ||
| for i, plane in enumerate(planes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you could just assign enumerate to indexed without for loop
| kappa_indices = set() | ||
| for i in range(n_measures): | ||
| planes = sorted(planes, key=lambda p: p[1][i]) | ||
| kappa_indices.add(planes[-self._kappa][0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if -self._kappa exceeds list index bounds?
You better check it as early as posible
|
|
||
| filtered_indices = set() | ||
| for i in range(n_measures): | ||
| planes.sort(key=lambda p: p[1][i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that you already sorted it before
| planes.sort(key=lambda p: p[1][i]) | ||
|
|
||
| left = 0 | ||
| while planes[left][0] not in kappa_indices: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy where or count, should work faster
|
|
||
| intersection = SortedSet(key=cmp_to_key(_double_list_cmp)) | ||
| for k in range(dim): | ||
| for l in range(k + 1, dim): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely this could be optimised with numpy
4 nasted loops is too much
|
|
||
| for j in range(dim): | ||
| point = np.zeros(dim) | ||
| point[j] = planes[i][j] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be made with numpy diagonal array
Pull Request Template
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests or set link to that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Checklist: