Two different errors when reading two different files #54

@Khris777

I'm using parquet on Windows 10 and have two different parquet files for testing: one is snappy-compressed, the other is uncompressed.

Simple test code for reading:

with open(filename,'r') as f:
    for row in parquet.reader(f):
        print row

The uncompressed file throws this error:

  File "E:/PythonDir/Diverses/DataTest.py", line 23, in <module>
	for row in parquet.reader(f):

  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 426, in reader
	dict_items)

  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 275, in read_data_page
	raw_bytes = _read_page(fo, page_header, column_metadata)

  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 244, in _read_page
	page_header.uncompressed_page_size)

AssertionError: found 87 raw bytes (expected 367)

Reading the compressed file the same way gives:

  File "E:/PythonDir/Diverses/DataTest.py", line 23, in <module>
	for row in parquet.reader(f):

  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 393, in reader
	footer = _read_footer(fo)

  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 71, in _read_footer
	footer_size = _get_footer_size(fo)

  File "C:\Users\my.name\AppData\Local\Continuum\Anaconda2\lib\site-packages\parquet\__init__.py", line 64, in _get_footer_size
	tup = struct.unpack("<i", fo.read(4))

error: unpack requires a string argument of length 4

I can open both files with fastparquet 0.0.5 just fine, so there's nothing wrong with the files themselves.

What am I doing wrong?
Do I have to explicitly uncompress the data with snappy, or does parquet do that by itself?
Could you also provide some more documentation on basic usage?
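
For reference, the second traceback fails in `struct.unpack("<i", fo.read(4))`, which means `fo.read(4)` returned fewer than 4 bytes. One possible explanation, which is an assumption and not something the tracebacks confirm, is the `'r'` mode in the test code: under Python 2 on Windows, text mode translates line endings and treats a 0x1A byte as end-of-file, so reads on binary data can come back short. Below is a minimal sketch of that footer-size read done in binary mode. It relies on the Parquet file layout, where a file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`. The helper name `read_footer_size` is hypothetical and not part of the parquet package.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte marker that closes every Parquet file


def read_footer_size(path):
    """Return the footer length stored in the last 8 bytes of a Parquet file.

    Hypothetical helper for illustration; not part of the parquet package.
    """
    # Open in binary mode ('rb') so no newline translation or
    # end-of-file interpretation is applied to the raw bytes.
    with open(path, "rb") as f:
        # Last 8 bytes: <4-byte little-endian footer length><b"PAR1">
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file, or the read was truncated")
    (footer_size,) = struct.unpack("<i", tail[:4])
    return footer_size
```

If text mode is indeed the cause, opening the file with `open(filename, 'rb')` before passing it to `parquet.reader` would be the first thing to try.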
