Failed to create flat dataframe #413

@delucchi-cmu

Bug report

I tried a silly thing today: I tried to create a nested data frame from a flat pandas dataframe, without actually wanting to nest anything. This call raises a StopIteration:

df = npd.NestedFrame.from_flat(flat_df, base_columns=flat_df.columns)

Full stacktrace:

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[7], line 11
      4 from astropy.coordinates import ICRS, Angle, SkyCoord
      6 flat_df = pd.read_csv(
      7     "https://ogle3.snad.space/api/v1/all",
      8     sep='\t',
      9 )
---> 11 df = npd.NestedFrame.from_flat(flat_df, base_columns=flat_df.columns)

File ~/.virtualenvs/oct/lib/python3.12/site-packages/nested_pandas/nestedframe/core.py:628, in NestedFrame.from_flat(cls, df, base_columns, nested_columns, on, name)
    626 if nested_columns is None:
    627     nested_columns = [col for col in df.columns if col not in base_columns]
--> 628 return out_df.join_nested(df[nested_columns], name=name)

File ~/.virtualenvs/oct/lib/python3.12/site-packages/nested_pandas/nestedframe/core.py:518, in NestedFrame.join_nested(self, obj, name, how, on, dtype)
    516     raise ValueError("Currently we only support a single column for 'on'")
    517 # Add sources to objects
--> 518 packed = pack(obj, name=name, on=on, dtype=dtype)
    519 new_df = self.copy()
    520 res = new_df.join(packed, how=how, on=on)

File ~/.virtualenvs/oct/lib/python3.12/site-packages/nested_pandas/series/packer.py:57, in pack(obj, name, index, on, dtype)
     34 """Pack a "flat" dataframe or a sequence of dataframes into a "nested" series.
     35 
     36 Parameters
   (...)     54     Output series.
     55 """
     56 if isinstance(obj, pd.DataFrame):
---> 57     nested = pack_flat(obj, name=name, on=on)
     58     if index is not None:
     59         nested.index = index

File ~/.virtualenvs/oct/lib/python3.12/site-packages/nested_pandas/series/packer.py:101, in pack_flat(df, name, on)
     99 # pandas knows when index is pre-sorted, so it would do nothing if it is already sorted
    100 sorted_flat = df.sort_index(kind="stable")
--> 101 return pack_sorted_df_into_struct(sorted_flat, name=name)

File ~/.virtualenvs/oct/lib/python3.12/site-packages/nested_pandas/series/packer.py:166, in pack_sorted_df_into_struct(df, name)
    163 packed_df = view_sorted_df_as_list_arrays(df)
    164 # No need to validate the dataframe, the length of the nested arrays is forced to be the same by
    165 # the view_sorted_df_as_list_arrays function.
--> 166 return pack_lists(packed_df, name=name, validate=False)

File ~/.virtualenvs/oct/lib/python3.12/site-packages/nested_pandas/series/packer.py:215, in pack_lists(df, name, validate)
    213 if all(chunk_length == chunk_lengths[0] for chunk_length in chunk_lengths):
    214     chunks = []
--> 215     num_chunks = next(iter(pa_chunked_arrays.values())).num_chunks
    216     for i in range(num_chunks):
    217         chunks.append(
    218             pa.StructArray.from_arrays(
    219                 [arr.chunk(i) for arr in pa_chunked_arrays.values()],
    220                 names=pa_chunked_arrays.keys(),
    221             )
    222         )

StopIteration:
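
The last frame suggests a likely mechanism: with every column passed as a base column, the `nested_columns` list built in `from_flat` is empty, so `pack_lists` seems to end up calling `next()` on an empty iterator. Below is a minimal plain-Python sketch of that mechanism. The column names are hypothetical, and the dict is only a stand-in for `pa_chunked_arrays` (assumed here to hold one entry per nested column; the real object holds pyarrow chunked arrays).

```python
# Hypothetical column names standing in for the CSV header.
columns = ["ID", "RA", "Decl"]
base_columns = list(columns)  # every column is declared a base column

# Same comprehension as core.py line 627 in the traceback above.
nested_columns = [col for col in columns if col not in base_columns]

# Stand-in for pa_chunked_arrays: assumed to have one entry per nested column.
pa_chunked_arrays = {col: None for col in nested_columns}

try:
    # Same access pattern as packer.py line 215.
    next(iter(pa_chunked_arrays.values()))
    raised = False
except StopIteration:
    raised = True

print(nested_columns, raised)  # prints: [] True
```

If this reading is right, the error is not specific to the OGLE data: any call where `base_columns` covers all columns would hit it.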

I would have expected this to create a nested frame with no nested columns (though I don't know if that's even a valid expectation).
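
One possible user-side workaround, until the library decides what the right behavior is, would be to check for the empty case before calling `from_flat`. The helper name `from_flat_allow_empty` and its `frame_cls` parameter are hypothetical, not part of nested_pandas. The sketch also assumes that `NestedFrame` can be constructed directly from a pandas DataFrame, which has not been verified against 0.6.3.

```python
def from_flat_allow_empty(flat_df, base_columns, frame_cls):
    """Hypothetical helper: call from_flat only when there is something to nest.

    frame_cls is expected to be nested_pandas.NestedFrame; it is passed in
    so the sketch does not depend on a particular library version.
    """
    base = list(base_columns)
    nested_columns = [col for col in flat_df.columns if col not in base]
    if not nested_columns:
        # Nothing to nest. ASSUMPTION: frame_cls accepts a DataFrame directly.
        return frame_cls(flat_df)
    return frame_cls.from_flat(flat_df, base_columns=base)
```

With nested_pandas imported as `npd`, the call from this report would become `from_flat_allow_empty(flat_df, flat_df.columns, npd.NestedFrame)`.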

Environment Information

nested_pandas.__version__ = '0.6.3'

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, and any applicable data others will need to reproduce the problem.
  • I have included information about my environment, including the version of this package (e.g. nested_pandas.__version__)
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.

Labels: bug, good first issue