Skip to content

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Nov 24, 2016

(I plan to fix #10381 and #12154 next, but first we need coherence on type(data_columns) - and Index seems to me the best option.)

@codecov-io
Copy link

codecov-io commented Nov 24, 2016

Current coverage is 85.21% (diff: 50.00%)

Merging #14728 into master will decrease coverage by 0.05%

@@             master     #14728   diff @@
==========================================
  Files           144        143     -1   
  Lines         50947      50804   -143   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43445      43293   -152   
- Misses         7502       7511     +9   
  Partials          0          0          

Powered by Codecov. Last update 5d0e157...1f70a6e

@jreback
Copy link
Contributor

jreback commented Nov 24, 2016

this is technically an API change because we currently store the data_columns as a pickled list (there is a separate issues on actually storing this as a table). I am surpised this didn't break.

So i would prefer to keep this as a list for now, though see if you can create a tests which fails on this.

@jreback jreback added the IO HDF5 read_hdf, HDFStore label Nov 24, 2016
@jreback
Copy link
Contributor

jreback commented Nov 24, 2016

the issue you reference is closed. so what issue is this fixing?

@toobaz
Copy link
Member Author

toobaz commented Nov 24, 2016

Regarding the issue: yes, it is closed, but not solved, it is different from the #11364 you referred to. I can open a new issue if you prefer. But the thing is: I am not solving #11364 here, I am solving #11412, which is different.

Regarding the data_columns storage: I had missed that indeed, can you give me a pointer to understand more (I don't see that use of pickle in pytables.py)?

My motivation was that in some cases data_columns already is an index (e.g. at line 4257 of pytables.py), but then I can certainly fix that (as I was doing in the previous #12252).

@jreback
Copy link
Contributor

jreback commented Nov 24, 2016

then are you solving this? #10381

@jreback
Copy link
Contributor

jreback commented Nov 24, 2016

https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L3129

you would need to be pretty careful about this (though not 100% sure it actually matters). It would be nice to be consistent, though everything is a list. yes Index has the same semantics. but I don't want to change this unless it stays consistent (which is already a list for almost everything else).

so bottom line is, keep it a list. If you have a repro where its NOT a list. then pls show that.

@toobaz
Copy link
Member Author

toobaz commented Nov 24, 2016

OK, new version works with lists rather than indices.

No, I'm not fixing #10381, otherwise I wouldn't have written that I plan to fix it when I opened the PR ...

@toobaz
Copy link
Member Author

toobaz commented Nov 25, 2016

The AppVeyor failure is a bug in AppVeyor, right?

@jreback
Copy link
Contributor

jreback commented Nov 25, 2016

I restarted. it sometimes does get stuck.

@jorisvandenbossche jorisvandenbossche changed the title Minitemsizefix BUG: Ensure min_itemsize is always a list (#11412) Nov 25, 2016
@toobaz
Copy link
Member Author

toobaz commented Nov 27, 2016

again...

@toobaz
Copy link
Member Author

toobaz commented Dec 3, 2016

ping




- Bug in ``HDFStore.append()`` with Series and 'index' appearing in ``min_itemsize`` (:issue:`11412`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a little bit more verbose as I don't understand what you are fixing.

data_columns = []
elif data_columns is True:
data_columns = obj.columns[:]
data_columns = list(obj.columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was just fixed in #14791

obj.columns = [name]
return super(AppendableSeriesTable, self).write(
obj=obj, data_columns=obj.columns, **kwargs)
obj=obj, data_columns=list(obj.columns), **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use obj.columns.tolist() to be consistent





Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put underneath the whatsnew for #14791

@toobaz
Copy link
Member Author

toobaz commented Dec 5, 2016

Stuck again

@jreback jreback added this to the 0.19.2 milestone Dec 5, 2016
@jreback jreback added the Bug label Dec 5, 2016
@jreback
Copy link
Contributor

jreback commented Dec 5, 2016

thanks!

@jreback jreback closed this in 53bf1b2 Dec 5, 2016
@toobaz toobaz deleted the minitemsizefix branch December 6, 2016 07:14
jorisvandenbossche pushed a commit that referenced this pull request Dec 15, 2016
closes #11412

Author: Pietro Battiston <[email protected]>

Closes #14728 from toobaz/minitemsizefix and squashes the following commits:

e25cd1f [Pietro Battiston] Whatsnew
b9bb88f [Pietro Battiston] Tests for previous commit
6406ee8 [Pietro Battiston] BUG: Ensure min_itemsize is always a list

(cherry picked from commit 53bf1b2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug IO HDF5 read_hdf, HDFStore

Projects

None yet

Development

Successfully merging this pull request may close these issues.

min_itemsize not working on MultiIndex columns for Series, with format="table" BUG: DataFrame.to_hdf doesn't pass along min_itemsize for index

3 participants