
[ENH] CSV Import: guess data types#4838

Merged
ales-erjavec merged 1 commit into biolab:master from PrimozGodec:csvimport-autovariable
Jun 5, 2020

Conversation

@PrimozGodec
Contributor

Issue

Implements #4794

Description of changes

Implemented a type-guessing strategy for CSV import that should match the strategy in io_utils.guess_data_type.

This PR adds another iteration over the columns. For each column, it infers the data type from the column's values. The expensive operations here are:

  • unique: pandas' unique is based on a hash table, so it is cheaper than NumPy's sort-based unique
  • casting time strings to date-time
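
To make the per-column pass concrete, here is a minimal standard-library sketch of this kind of guessing. It is not the code from this PR and not io_utils.guess_data_type: the function names, the `MAX_CATEGORIES` threshold, the missing-value markers, and the order of the checks are all assumptions chosen for illustration. A `set` stands in for the hash-based unique step.

```python
from datetime import datetime

# Hypothetical threshold: columns with at most this many distinct values
# are treated as categorical. The limit used by Orange may differ.
MAX_CATEGORIES = 5


def is_float(value):
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_iso_datetime(value):
    """True if the string parses as an ISO 8601 date or date-time."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def guess_column_type(values, missing=("", "?", "NA")):
    """Guess 'numeric', 'time', 'categorical' or 'text' for a column of strings."""
    present = [v for v in values if v not in missing]
    if not present:
        return "text"
    # Hash-based uniqueness: building a set needs no sorting, and the
    # parsing checks below then run only once per distinct value.
    unique = set(present)
    if all(is_float(v) for v in unique):
        return "numeric"
    if all(is_iso_datetime(v) for v in unique):
        return "time"
    if len(unique) <= MAX_CATEGORIES:
        return "categorical"
    return "text"


columns = {
    "age": ["31", "45.5", "", "28"],
    "visited": ["2020-06-02", "2020-06-05"],
    "sex": ["f", "m", "f", "f"],
}
guessed = {name: guess_column_type(col) for name, col in columns.items()}
```

Running the checks on the distinct values rather than on every cell keeps the extra pass cheap for columns with many repeated entries, which is the case where the cost of the unique step matters most.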

@ales-erjavec is this kind of guessing acceptable, or would it decrease the performance of the widget too much?

TODO

  • Tests

Includes

  • Code changes
  • Tests
  • Documentation

@codecov

codecov bot commented Jun 2, 2020

Codecov Report

Merging #4838 into master will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4838      +/-   ##
==========================================
+ Coverage   84.01%   84.07%   +0.05%     
==========================================
  Files         281      277       -4     
  Lines       56901    56487     -414     
==========================================
- Hits        47804    47490     -314     
+ Misses       9097     8997     -100     

@ales-erjavec
Contributor

is this kind of guessing acceptable, or would it decrease the performance of the widget too much?

I think it is ok.

@PrimozGodec
Contributor Author

PrimozGodec commented Jun 2, 2020

Ok. Then I will write a few tests before it is merged.

PrimozGodec force-pushed the csvimport-autovariable branch 2 times, most recently from 2d5cdc5 to e970535 (June 3, 2020 11:02)
PrimozGodec force-pushed the csvimport-autovariable branch from e970535 to 9c19f2f (June 3, 2020 11:16)
ales-erjavec changed the title from "CSV Import: guess data types" to "[ENH] CSV Import: guess data types" (Jun 5, 2020)
ales-erjavec merged commit 8ecbe68 into biolab:master (Jun 5, 2020)
PrimozGodec deleted the csvimport-autovariable branch (January 21, 2022 12:46)