
[ENH] Nice binning of time variables (Distributions, SOM) #4123

Merged
ajdapretnar merged 2 commits into biolab:master from janezd:discretize-datetime-labels
Oct 22, 2019

Conversation

@janezd
Contributor

@janezd janezd commented Oct 20, 2019

Fixes #3964.

Description of changes
  • provides a general time_binnings function, similar to decimal_binnings, which can be used in any visualization widget that discretizes numeric features
  • reverses the order of the binnings returned by decimal_binnings
  • improves the BinDefinition
  • uses the above in Distributions and SOM (the two widgets that already use decimal_binnings)
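
A minimal sketch of the idea behind such a time binning, not the code from this PR: try a list of round step sizes and keep those that give an acceptable number of bins. The name sketch_time_binnings, the candidate steps, and the use of plain epoch seconds are all assumptions made for illustration; calendar units such as months and years are left out.

```python
from datetime import timedelta

# Hypothetical candidate bin widths; the list used by the real function
# is not shown in this conversation.
CANDIDATE_STEPS = [
    timedelta(seconds=1), timedelta(seconds=5), timedelta(seconds=15),
    timedelta(minutes=1), timedelta(minutes=5), timedelta(minutes=15),
    timedelta(hours=1), timedelta(hours=6),
    timedelta(days=1), timedelta(days=7),
]


def sketch_time_binnings(timestamps, min_bins=2, max_bins=50):
    """Return (step, thresholds) pairs with min_bins <= bin count <= max_bins.

    `timestamps` are seconds since the epoch. Thresholds are aligned to
    multiples of the step, so bin edges fall on round times.
    """
    lo, hi = min(timestamps), max(timestamps)
    binnings = []
    for step in CANDIDATE_STEPS:
        width = step.total_seconds()
        start = lo // width * width          # round the first edge down
        n_bins = int((hi - start) // width) + 1
        if min_bins <= n_bins <= max_bins:
            thresholds = [start + i * width for i in range(n_bins + 1)]
            binnings.append((step, thresholds))
    return binnings
```

For data spanning a little over five hours, this sketch keeps only the 15-minute and 1-hour steps; finer steps give too many bins and coarser ones give a single bin.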
Includes
  • Code changes
  • Tests
  • Documentation

@janezd janezd force-pushed the discretize-datetime-labels branch from cc749d9 to f7d6551 Compare October 20, 2019 17:28
@codecov

codecov bot commented Oct 20, 2019

Codecov Report

Merging #4123 into master will increase coverage by 0.03%.
The diff coverage is 93.1%.

@@            Coverage Diff             @@
##           master    #4123      +/-   ##
==========================================
+ Coverage   85.62%   85.66%   +0.03%     
==========================================
  Files         388      389       +1     
  Lines       69435    69638     +203     
==========================================
+ Hits        59454    59654     +200     
- Misses       9981     9984       +3

@janezd janezd force-pushed the discretize-datetime-labels branch 3 times, most recently from 0b3c15d to 947facb Compare October 21, 2019 10:44
@ajdapretnar
Contributor

I ❤️ it!

@ajdapretnar
Contributor

So far I was not able to find any faults with it. It is responsive, labels are correct, output is as expected. This makes Distributions really useful for visualizing datetime data!

The only thing that doesn't seem to be implemented yet is the legend in SOM - it shows floats for datetime.

else:
    col = self.data.get_column_view(self.attr_color)[0].astype(float)
    self.thresholds = decimal_binnings(col, min_bins=4)[0][1:-1]
    binning = decimal_binnings(col, min_bins=4)[-1]

Contributor

I think you forgot to check whether self.attr_color is a Time Variable. This doesn't use time_binnings.
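
The check being asked for can be sketched as a dispatch on the variable type. The two classes below are stand-ins defined only for this example (they are not imported from Orange), and pick_binning_function is a hypothetical helper, not code from the PR.

```python
class ContinuousVariable:
    """Stand-in for a numeric variable type (illustration only)."""


class TimeVariable(ContinuousVariable):
    """Stand-in for a time variable type (illustration only)."""


def pick_binning_function(var, decimal_fn, time_fn):
    """Return time_fn for time variables and decimal_fn for other numeric ones.

    The isinstance check must test for the time type explicitly, because
    a time variable is also a continuous variable in this sketch.
    """
    return time_fn if isinstance(var, TimeVariable) else decimal_fn
```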

Contributor Author

Oops. I just ported it to the new signature of decimal_binnings. It works now.

I'm not sure I like the legend. It uses short labels.

Screenshot 2019-10-21 at 16 32 15

Perhaps the lower bound for each interval should be in the long form. That is, the last two should be 17 Apr - Jul and >= 17 Jul. The upper bound would be in short form, except of course when the longer form is needed, as in 16 Oct - 17 Jan.
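
The proposed labelling rule can be sketched as follows. This is an illustration under assumptions, not the PR's code: interval_labels is a hypothetical helper, thresholds are month starts given as date objects, and the label for the first open-ended bin ("< ...") is a guess, since the comment above only describes the last ones.

```python
from datetime import date

# Explicit month names avoid depending on the process locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def long_form(d):
    """Two-digit year plus month, e.g. '17 Apr'."""
    return f"{d.year % 100:02d} {MONTHS[d.month - 1]}"


def short_form(d):
    """Month only, e.g. 'Jul'."""
    return MONTHS[d.month - 1]


def interval_labels(thresholds):
    """Label the bins delimited by ascending `thresholds`.

    The lower bound is always in long form; the upper bound is in short
    form unless it falls in a different year than the lower bound.
    """
    labels = [f"< {long_form(thresholds[0])}"]
    for lower, upper in zip(thresholds, thresholds[1:]):
        same_year = lower.year == upper.year
        upper_text = short_form(upper) if same_year else long_form(upper)
        labels.append(f"{long_form(lower)} - {upper_text}")
    labels.append(f">= {long_form(thresholds[-1])}")
    return labels
```

With thresholds at October 2016, January 2017, April 2017 and July 2017, the sketch produces the labels mentioned above: 16 Oct - 17 Jan, 17 Apr - Jul and >= 17 Jul.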

Contributor Author

Like this.

Screenshot 2019-10-21 at 19 27 15


if not return_defs:
    bins = [get_bins(bin) for bin in bins]
def time_binnings(data, *, min_bins=2, max_bins=50, min_unique=5, add_unique=0):

Contributor

The outputs of time_binnings and decimal_binnings are not the same. The latter returns BinDefinition objects, which have thresholds and labels properties, while the former doesn't.

Contributor Author

This can't be. OWDistributions calls either time_binnings or decimal_binnings, stores the result and then doesn't care where the bins came from.

Contributor

But OWSOM doesn't.

Contributor Author

This was OWSOM's problem. Fixed.
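
The point of this thread, that a widget should not care which function produced the bins, can be sketched with a shared return type. BinDef below is a hypothetical stand-in with only the two fields named in the discussion; the real BinDefinition may carry more. The [1:-1] slice mirrors the one in the snippet reviewed earlier.

```python
from typing import NamedTuple, Sequence


class BinDef(NamedTuple):
    """Hypothetical stand-in for BinDefinition (illustration only)."""
    thresholds: Sequence[float]
    labels: Sequence[str]


def inner_thresholds(bin_def):
    """Drop the outermost edges, as a widget might when colouring by bin.

    Works the same for time-based and decimal binnings, because both are
    assumed to return the same structure.
    """
    return list(bin_def.thresholds[1:-1])
```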

@janezd janezd force-pushed the discretize-datetime-labels branch 2 times, most recently from 97ff40b to 04d9602 Compare October 21, 2019 14:30
@janezd janezd force-pushed the discretize-datetime-labels branch from 94fc142 to 29f8b43 Compare October 21, 2019 19:49
@ajdapretnar ajdapretnar merged commit 4b3f94c into biolab:master Oct 22, 2019

Development

Successfully merging this pull request may close these issues.

Distributions: binning datetime

2 participants