[FIX] Continuize: Disable normalizing sparse data#4379

Merged
markotoplak merged 1 commit into biolab:master from VesnaT:continuize_sparse
Feb 14, 2020

@VesnaT
Contributor

@VesnaT VesnaT commented Jan 30, 2020

Issue

Fixes #4378

Description of changes

Disable Normalize by span and Normalize by standard deviation radio buttons for sparse datasets.
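
As a rough illustration of the idea (not the widget code changed in this PR), the check could look like the sketch below. The option names and the `enabled_options` helper are hypothetical; only `scipy.sparse.issparse` is a real library call.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical option labels; the widget's real identifiers may differ.
NORMALIZE_OPTIONS = (
    "leave as is",
    "normalize by span",
    "normalize by standard deviation",
)
# Options assumed to be unsafe for sparse input and therefore disabled.
DISABLED_FOR_SPARSE = {"normalize by span", "normalize by standard deviation"}


def enabled_options(X):
    """Map each option label to True (selectable) or False (disabled)."""
    sparse = sp.issparse(X)
    return {
        name: not (sparse and name in DISABLED_FOR_SPARSE)
        for name in NORMALIZE_OPTIONS
    }


dense = np.eye(3)
sparse = sp.csr_matrix(dense)
print(enabled_options(dense))
print(enabled_options(sparse))
```

In a GUI, the resulting mapping would drive the enabled state of the corresponding radio buttons.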

Includes
  • Code changes
  • Tests
  • Documentation

@codecov

codecov bot commented Jan 30, 2020

Codecov Report

Merging #4379 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4379      +/-   ##
==========================================
+ Coverage   87.13%   87.14%   +<.01%     
==========================================
  Files         399      399              
  Lines       72901    72936      +35     
==========================================
+ Hits        63521    63557      +36     
+ Misses       9380     9379       -1

@markotoplak
Member

What kind of normalization can we still have with sparse data then? We could still do normalization that only divides or multiplies; we just have to avoid shifts (plus, minus).

Another thing: I never associated sparse data and discrete values, but yes, why not... Where do we get discrete sparse data in Orange? Is it directly read from a file or generated by some text-mining widget?

@ajdapretnar
Contributor

🤔 I think that if you have some discrete variables in the corpus before bag of words, they would inevitably get transformed into sparse alongside words.

@VesnaT
Contributor Author

VesnaT commented Jan 30, 2020

The example is described in the issue and yes, it does not make sense but it still should not crash.

@markotoplak
Member

Yes, but using appropriate operations for sparse data would be better than disabling options. Normalization by span could still be done for sparse data; it just should not be centered.
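
A minimal sketch of what uncentered normalization by span might look like, assuming the data is a SciPy sparse matrix. The `scale_by_span` helper is hypothetical and is not Orange's implementation; it only divides each column by its span and never shifts, so implicit zeros stay zero.

```python
import numpy as np
import scipy.sparse as sp


def scale_by_span(X):
    """Divide each column by its span (max - min) without shifting it."""
    X = sp.csc_matrix(X, dtype=float)
    # Column-wise extremes; implicit zeros are taken into account.
    col_min = X.min(axis=0).toarray().ravel()
    col_max = X.max(axis=0).toarray().ravel()
    span = col_max - col_min
    span[span == 0] = 1.0  # leave constant columns unchanged
    # Right-multiplying by a diagonal matrix rescales the columns;
    # zeros stay zeros, so the sparsity pattern is preserved.
    return X @ sp.diags(1.0 / span)


X = sp.csr_matrix(np.array([[0.0, 2.0, 0.0],
                            [4.0, 0.0, 0.0],
                            [0.0, 6.0, 5.0]]))
Y = scale_by_span(X)
print(Y.toarray())
```

Because nothing is subtracted, each column ends up with a span of 1 but is not necessarily mapped onto [0, 1]; that is the trade-off for keeping the matrix sparse.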

@VesnaT
Contributor Author

VesnaT commented Jan 30, 2020

Some types of normalization can be done using a Preprocess widget. Should these be added to the Continuize widget as well?

@janezd janezd added the "needs discussion" label (Core developers need to discuss the issue) Feb 13, 2020
@markotoplak markotoplak merged commit 0a65399 into biolab:master Feb 14, 2020

Development

Successfully merging this pull request may close these issues.

Continuize: Normalizing sparse data fails

4 participants