Open
Description
I'm using parquet on Windows 10 with two different parquet files for testing: one is snappy-compressed, the other is uncompressed.
Simple test code for reading:
with open(filename,'r') as f:
    for row in parquet.reader(f):
        print row
The uncompressed file throws this error:
  File "E:/PythonDir/Diverses/DataTest.py", line 23, in <module>
    for row in parquet.reader(f):
  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 426, in reader
    dict_items)
  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 275, in read_data_page
    raw_bytes = _read_page(fo, page_header, column_metadata)
  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 244, in _read_page
    page_header.uncompressed_page_size)
AssertionError: found 87 raw bytes (expected 367)
Reading the compressed file like that gives:
  File "E:/PythonDir/Diverses/DataTest.py", line 23, in <module>
    for row in parquet.reader(f):
  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 393, in reader
    footer = _read_footer(fo)
  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 71, in _read_footer
    footer_size = _get_footer_size(fo)
  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 64, in _get_footer_size
    tup = struct.unpack("<i", fo.read(4))
error: unpack requires a string argument of length 4
I can open both files with fastparquet 0.0.5 just fine so there's nothing wrong with the files.
What am I doing wrong?
Do I have to explicitly decompress the data with snappy, or does parquet do that by itself?
Can you in general provide some more documentation on the basic usage?
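One thing that may be worth ruling out (a guess based on the two tracebacks, not a confirmed diagnosis): the test code opens the file with mode 'r', which is text mode on Windows. Text mode translates "\r\n" sequences and, under Python 2, can stop reading at a 0x1A byte, so a read of binary data may return fewer bytes than requested. That would fit both "found 87 raw bytes (expected 367)" and the failed 4-byte unpack. The sketch below does not use the parquet package at all. The function name read_footer_size is hypothetical, and it relies only on the Parquet file layout, where the last 8 bytes are a 4-byte little-endian footer length followed by the magic bytes PAR1.

```python
import os
import struct


def read_footer_size(path):
    """Return the footer length stored in the last 8 bytes of a Parquet file.

    Hypothetical helper for illustration; it is not part of the parquet package.
    """
    # 'rb' opens the file in binary mode, so no newline translation or
    # end-of-file character handling is applied to the bytes that are read.
    with open(path, 'rb') as f:
        f.seek(-8, os.SEEK_END)  # 4-byte footer length + 4-byte magic
        tail = f.read(8)
    if len(tail) != 8 or tail[4:] != b'PAR1':
        raise ValueError('missing PAR1 magic at end of file: %r' % (path,))
    # "<i" is a little-endian 32-bit signed integer, the same format string
    # that appears in the second traceback.
    return struct.unpack('<i', tail[:4])[0]
```

If this returns a plausible size for both files, the equivalent change in the test code would be open(filename,'rb') instead of open(filename,'r').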