UnicodeEncodeError: 'charmap' codec can't encode character #166

@swahareddy

I ran this command:

    python photon.py -u "https://en.wikipedia.org/wiki/Tom_Crean_(explorer)" -l 2

and this was the output:

      ____  __          __
     / __ \/ /_  ____  / /_____  ____
    / /_/ / __ \/ __ \/ __/ __ \/ __ \
   / ____/ / / / /_/ / /_/ /_/ / / / /
  /_/   /_/ /_/\____/\__/\____/_/ /_/ v1.3.2

 Level 1: 1 URLs
 Progress: 1/1
 Level 2: 478 URLs
 Progress: 478/478
 Crawling 1 JavaScript files
 Progress: 1/1
Traceback (most recent call last):
  File "photon.py", line 385, in <module>
    writer(datasets, dataset_names, output_dir)
  File "C:\Users\Tejaswa\Documents\GitHub\Photon\core\utils.py", line 85, in writer
    out_file.write(str(joined.encode('utf-8').decode('utf-8')))
  File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0142' in position 17758: character maps to <undefined>
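
To illustrate what the traceback is saying, here is a minimal sketch that reproduces the failure outside Photon. It assumes the output file was opened without an explicit encoding, so Python on Windows fell back to cp1252 (the traceback's `cp1252.py` frame suggests this). The `text` sample is made up for illustration.

```python
# Hypothetical sample text containing U+0142, the character named in the traceback.
text = "Wis\u0142a"

# The encode/decode round trip on the failing line returns an equal str,
# so it does nothing to make the text writable.
round_tripped = text.encode("utf-8").decode("utf-8")
same = round_tripped == text

# cp1252 is a single-byte code page with no slot for U+0142,
# so encoding to it fails the same way the file write did.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    reason = exc.reason
```

If that assumption holds, the problem is the encoding chosen when the file is opened, not the string being written.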

I (think I) added a mapping in lib\encodings\cp1252.py by doing:

    .
    .
    '\xff'     #  0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
    '\u0142'     #  0xFF -> LATIN SMALL LETTER L WITH DIAERESIS
    )

    ### Encoding table
    encoding_table=codecs.charmap_build(decoding_table)

But I doubt this is correct, since the table's byte values already max out at 0xFF and there is no free slot for another character.

Is there any parameter to ignore such encoding problems that I can specify with photon itself? Or some underlying file to edit?
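
One possible workaround, sketched under assumptions: Photon's `writer` appears to open its output files without an explicit encoding, so opening them as UTF-8 (or telling the codec to skip unencodable characters) should avoid the error. The helper `write_text` below is hypothetical and is not part of Photon; it only shows the two options using the standard `open()` parameters `encoding` and `errors`.

```python
import os
import tempfile


def write_text(path, text, encoding="utf-8", errors="strict"):
    """Hypothetical helper (not Photon's API): write text with an explicit encoding."""
    with open(path, "w", encoding=encoding, errors=errors) as out_file:
        out_file.write(text)


# Hypothetical sample containing U+0142, the character from the traceback.
sample = "Wis\u0142a"
tmp_dir = tempfile.mkdtemp()

# Option 1: write as UTF-8, which can represent every Unicode character.
utf8_path = os.path.join(tmp_dir, "utf8.txt")
write_text(utf8_path, sample)
with open(utf8_path, encoding="utf-8") as f:
    utf8_result = f.read()

# Option 2: keep cp1252 but drop characters it cannot encode (lossy).
lossy_path = os.path.join(tmp_dir, "cp1252.txt")
write_text(lossy_path, sample, encoding="cp1252", errors="ignore")
with open(lossy_path, encoding="cp1252") as f:
    lossy_result = f.read()
```

Without editing any code, Python 3.7+ also has a UTF-8 mode that makes `open()` default to UTF-8: run `python -X utf8 photon.py ...` or set the environment variable `PYTHONUTF8=1`. Whether that is enough here depends on how Photon opens its files, which I have not verified.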

Thanks