Issue with IncrementalDecoder and pipreqsnb #131437

@mauriciomm7

Description

Bug report

Bug description:

I am running pipreqsnb . and it ends up in the IncrementalDecoder class of the standard library's cp1252 codec (Lib\encodings\cp1252.py), which fails with this error:

  File "C:\Users\[USERNAME]\anaconda3\envs\pdfparser\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 124872: character maps to <undefined>
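
The error can probably be reproduced without pipreqsnb. The sketch below assumes the file being read is UTF-8 text containing a character whose encoding includes the byte 0x81; the sample character and the try_decode helper are illustrative, not taken from the failing file.

```python
# "Á" (U+00C1) encodes to the two bytes C3 81 in UTF-8.
data = "Á".encode("utf-8")


def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc}"


# cp1252 has no character assigned to byte 0x81, so strict decoding fails.
cp1252_result = try_decode(data, "cp1252")
# The same bytes decode cleanly as UTF-8.
utf8_result = try_decode(data, "utf-8")

print(cp1252_result)
print(utf8_result)
```

If this matches what happens in the issue, the cp1252 codec is behaving as designed and the real problem is that the file is opened with the Windows locale encoding instead of UTF-8.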

A suggested fix is to decode with an encoding that covers a broader range of characters, such as utf-8, and to specify how decoding errors are handled. The IncrementalDecoder class could be modified along these lines:

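import codecs
import encodings.cp1252


class IncrementalDecoder(codecs.IncrementalDecoder):
    def __init__(self, errors='ignore'):
        super().__init__(errors=errors)
        self.encoding = 'utf-8'

    def decode(self, input, final=False):
        try:
            # Decode strictly as utf-8 first so that invalid input raises
            # (the codec functions take errors as a positional argument)
            return codecs.getdecoder(self.encoding)(input, 'strict')[0]
        except UnicodeDecodeError:
            # Fall back to the cp1252 charmap; self.errors decides what
            # happens to bytes the table leaves undefined, such as 0x81
            table = encodings.cp1252.decoding_table
            return codecs.charmap_decode(input, self.errors, table)[0]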

But I'm not really sure this is right. Hopefully it solves my issue.
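
A less invasive option than changing the codec may be to pass the encoding explicitly wherever the file is opened. The following is a minimal sketch under that assumption; read_text_utf8 and the sample file are hypothetical and not part of pipreqsnb or the standard library.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path: Path) -> str:
    """Read a file as UTF-8 instead of the locale default (hypothetical helper)."""
    # errors="replace" substitutes U+FFFD for invalid bytes instead of raising.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    # Write a small UTF-8 file whose content includes the byte 0x81.
    sample = Path(tmp) / "sample.py"
    sample.write_bytes("name = 'Ángel'\n".encode("utf-8"))
    text = read_text_utf8(sample)

print(text)
```

When the code that calls open() cannot be edited, Python's UTF-8 mode (python -X utf8, or the PYTHONUTF8=1 environment variable) changes the default text encoding to UTF-8 and might have the same effect.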

CPython versions tested on:

3.13

Operating systems tested on:

Windows

Metadata

Assignees

No one assigned

    Labels

    stdlib (Standard Library Python modules in the Lib/ directory), topic-unicode, type-bug (An unexpected behavior, bug, or error)
