fix importing and exporting Session objects from/to CSV and Excel #802
Changes from all commits
26ab2ba
f9ce292
8cf1fc9
2348257
787f9ca
3c6b5e5
830388c
0844710
@@ -55,4 +55,11 @@ Miscellaneous improvements
 Fixes
 ^^^^^
 
-* fixed something (closes :issue:`1`).
+* fixed reading/exporting sessions containing two or more axes/groups
+  with the same name (or anonymous) from/to CSV, Excel and HDF files (closes :issue:`803`).
+
+* fixed NaNs and None labels appearing in axes and groups when reading/exporting sessions
+  from/to CSV and Excel files (closes :issue:`804`).
+
+* fixed importing/exporting anonymous and/or wildcard axes to CSV and Excel files
Comment: Was the problem present only when importing/exporting the axes objects themselves or arrays with such axes, or both?
+  (closes :issue:`805`).
@@ -1185,22 +1185,21 @@ def to_frame(self, fold_last_axis_name=False, dropna=None):
         b1 6 7
         """
         columns = pd.Index(self.axes[-1].labels)
+        axes_names = self.axes.display_names[:]
Comment: Are you sure using display_names here is a good idea? You'll get
I am very nervous about this change. I fear it must break "something" somewhere given that to_frame is used in a lot of places (at least indirectly). Pandas dataframes with "{0}*" as explicit name would be ugly, right?
         if not fold_last_axis_name:
-            columns.name = self.axes[-1].name
+            columns.name = axes_names[-1]
         if self.ndim > 1:
-            axes_names = self.axes.names[:-1]
+            _axes_names = axes_names[:-1]
             if fold_last_axis_name:
-                tmp = axes_names[-1] if axes_names[-1] is not None else ''
-                if self.axes[-1].name:
-                    axes_names[-1] = "{}\\{}".format(tmp, self.axes[-1].name)
+                _axes_names[-1] = "{}\\{}".format(_axes_names[-1], axes_names[-1])
             if self.ndim == 2:
-                index = pd.Index(data=self.axes[0].labels, name=axes_names[0])
+                index = pd.Index(data=self.axes[0].labels, name=_axes_names[0])
             else:
-                index = pd.MultiIndex.from_product(self.axes.labels[:-1], names=axes_names)
+                index = pd.MultiIndex.from_product(self.axes.labels[:-1], names=_axes_names)
         else:
             index = pd.Index([''])
             if fold_last_axis_name:
-                index.name = self.axes.names[-1]
+                index.name = axes_names[-1]
         data = np.asarray(self).reshape(len(index), len(columns))
         df = pd.DataFrame(data, index, columns)
         if dropna is not None:
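To make the reviewer's concern about the to_frame change concrete, here is a minimal pure-Python sketch of how index and column names come out once display names are used. `FakeAxis`, `display_names` and `frame_names` are hypothetical stand-ins written for illustration, not the library's code; the `'{i}'` placeholder for anonymous axes and the `'*'` suffix for wildcard axes are assumptions inferred from the `'{0}*'` values asserted in this PR's tests.

```python
class FakeAxis:
    """Hypothetical stand-in for an axis: a name (or None) and a wildcard flag."""

    def __init__(self, name=None, iswildcard=False):
        self.name = name
        self.iswildcard = iswildcard


def display_names(axes):
    """Assumed convention: '{i}' for an anonymous axis, '*' appended for a wildcard axis."""
    names = []
    for i, axis in enumerate(axes):
        name = axis.name if axis.name is not None else '{%d}' % i
        names.append((name + '*') if axis.iswildcard else name)
    return names


def frame_names(axes, fold_last_axis_name=False):
    """Mirror the name handling of the new to_frame code for ndim > 1.

    Returns (index_names, columns_name).
    """
    axes_names = display_names(axes)
    # the columns keep the last axis name unless it is folded into the index
    columns_name = None if fold_last_axis_name else axes_names[-1]
    index_names = axes_names[:-1]
    if fold_last_axis_name:
        index_names[-1] = "{}\\{}".format(index_names[-1], axes_names[-1])
    return index_names, columns_name


anonymous = [FakeAxis(iswildcard=True) for _ in range(3)]
print(frame_names(anonymous))
print(frame_names(anonymous, fold_last_axis_name=True))
```

Under these assumptions an array with three anonymous wildcard axes yields explicit names such as `'{0}*'` in the resulting dataframe, which is exactly the behaviour questioned in the comment.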
@@ -3454,6 +3454,17 @@ def test_from_series():
     assert_array_equal(res, expected)
 
 
+def test_to_frame():
+    # array containing anonymous axes
+    arr = ndtest((Axis(2), Axis(2), Axis(2)))
+    df = arr.to_frame()
+    assert df.index.name is None
+    assert df.index.names == ['{0}*', '{1}*']
Comment: Is it really what we want?
+    assert df.columns.name == '{2}*'
+    assert list(df.index.values) == [(0, 0), (0, 1), (1, 0), (1, 1)]
+    assert list(df.columns.values) == [0, 1]
+
+
 def test_from_frame():
     # 1) data = scalar
     # ================
@@ -3816,6 +3827,18 @@ def test_from_frame():
     assert la.axes.names == ['age', 'sex', 'time']
     assert_array_equal(la[0, 'F', :], [3722, 3395, 3347])
 
+    # 3C) 3 anonymous axes
+    # ====================
+    arr = ndtest((Axis(2), Axis(2), Axis(2)))
+    df = arr.to_frame()
+
+    la = from_frame(df)
+    assert la.ndim == 3
+    assert la.shape == (2, 2, 2)
+    for axis in la.axes:
+        assert axis.name is None
+        assert axis.iswildcard
+
     # 4) test sort_rows and sort_columns arguments
     # ============================================
     age = Axis('age=2,0,1,3')
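The 3C test expects a round trip: names written as `'{0}*'` must come back as anonymous wildcard axes. The sketch below shows one way such names could be decoded. `parse_display_name` is a hypothetical helper, not the actual from_frame implementation, and the `'{i}'` / `'*'` conventions are assumptions taken from the values asserted in the tests.

```python
import re

# a display name that is only a positional placeholder, e.g. '{0}' or '{12}'
_ANONYMOUS = re.compile(r'^\{\d+\}$')


def parse_display_name(display_name):
    """Return (name, iswildcard) decoded from a display name (assumed conventions)."""
    iswildcard = display_name.endswith('*')
    name = display_name[:-1] if iswildcard else display_name
    if _ANONYMOUS.match(name):
        # positional placeholder: the axis had no real name
        name = None
    return name, iswildcard


for raw in ['{0}*', '{2}', 'age', 'age*']:
    print(raw, parse_display_name(raw))
```

One consequence of encoding this information in the name is ambiguity: with this sketch, an axis genuinely named `age*` cannot be distinguished from a wildcard axis named `age`.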
Comment: The work you have put into this PR makes me wonder even more whether saving sessions to CSV and Excel (especially sessions containing non-LArray objects) is worth it. I fear it will give us an endless stream of problems for little benefit.
Comment: No one noticed the bug, which suggests that no one will notice if we drop the ability to save and load axes and groups in the CSV or Excel formats.
So, should I edit the title of the corresponding issue and drop Axis and Group objects when calling Session.save()/load()?
Comment: I am unsure where to draw the line/what's best:
I think we need to discuss this face to face, as it will be easier.