Assign into preallocated array #22

The current method is to create numpy arrays (or lists) for a given chunk of a given block.

Creating a pandas dataframe from numpy arrays and then concatenating dataframes is inefficient in both memory and time. It would be much better to allocate a dataframe up front, as is [done in fastparquet](https://github.com/dask/fastparquet/blob/master/fastparquet/dataframe.py#L11), and assign into it. The dtypes come from the parsed global file header.
Any nested fields would be object dtype (although non-repeated structures could be flattened, also implemented in [fastparquet](https://github.com/dask/fastparquet/blob/master/fastparquet/schema.py#L53)).
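A rough sketch of the idea, not the fastparquet implementation: allocate one numpy array per column up front, assign each decoded chunk into a row slice, and build the dataframe once at the end so that no concatenation is needed. The helper names `preallocate` and `fill`, the column names, and the dtypes are all made up for illustration; in practice the dtypes would come from the file header.

```python
import numpy as np
import pandas as pd


def preallocate(dtypes, nrows):
    # One uninitialised array per column; nested fields would use dtype "O".
    return {name: np.empty(nrows, dtype=dt) for name, dt in dtypes.items()}


def fill(columns, start, chunk):
    # Copy one decoded chunk into rows [start, start + n);
    # return the index of the next free row.
    n = len(next(iter(chunk.values())))
    for name, values in chunk.items():
        columns[name][start:start + n] = values
    return start + n


# Hypothetical schema and two hypothetical decoded chunks (3 rows + 2 rows).
dtypes = {"id": "int64", "score": "float64", "name": "O"}
columns = preallocate(dtypes, 5)

row = 0
row = fill(columns, row, {"id": [1, 2, 3], "score": [0.5, 1.5, 2.5], "name": ["a", "b", "c"]})
row = fill(columns, row, {"id": [4, 5], "score": [3.5, 4.5], "name": ["d", "e"]})

# Build the dataframe once, after every chunk has been assigned.
df = pd.DataFrame(columns)
```

Whether pandas wraps these arrays without copying depends on the pandas version and on how columns are consolidated internally, so this sketch only avoids the per-chunk dataframe creation and the concatenation; it does not promise a zero-copy final step.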

Each Avro block states its record count and its size in bytes at its head, so the number of rows a block contributes is known before its records are decoded.
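As a minimal illustration of reading those two values: the Avro specification stores both as variable-length, zigzag-encoded longs at the start of each data block. The function names `read_long` and `read_block_header` below are placeholders and not part of this project's API.

```python
import io


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag-encoded) from a binary stream."""
    acc = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a variable-length integer")
        b = byte[0]
        acc |= (b & 0x7F) << shift   # the low 7 bits carry the payload
        if not b & 0x80:             # high bit clear: this was the last byte
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)   # undo the zigzag encoding


def read_block_header(stream):
    """Return (record count, byte size) from the head of an Avro data block."""
    nrecords = read_long(stream)
    nbytes = read_long(stream)
    return nrecords, nbytes


# A hand-built header: 3 records, 200 bytes of serialized data.
header = io.BytesIO(b"\x06\x90\x03")
nrecords, nbytes = read_block_header(header)
```

Summing the record counts over all blocks requires a pass over the block heads (seeking past `nbytes` plus the 16-byte sync marker each time), which is one possible way to size the allocation before decoding any records.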
