Commit 4f7c5d6

aozora-bunko: Replace invalid and undef in Book#text encoding (#251)
GitHub: fixes GH-250

Fix the following errors:

```
<Encoding::UndefinedConversionError: "\xF8E" from Shift_JIS to UTF-8>
<Encoding::InvalidByteSequenceError: "\xE7" followed by "\f" on Shift_JIS>
```
1 parent ed86425 commit 4f7c5d6

1 file changed: 4 additions, 1 deletion
lib/datasets/aozora-bunko.rb

Lines changed: 4 additions & 1 deletion
@@ -94,7 +94,10 @@ def text
   downloader.download(text_file_output_path)
 
   @text = ZipExtractor.new(text_file_output_path).extract_first_file do |input|
-    input.read.encode(Encoding::UTF_8, normalize_encoding(text_file_character_encoding))
+    input.read.encode(Encoding::UTF_8,
+                      normalize_encoding(text_file_character_encoding),
+                      invalid: :replace,
+                      undef: :replace)
   end
 
   @text
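
The effect of the two new options can be reproduced outside the library with the byte sequences quoted in the commit message. The following is a minimal sketch, not code from red-datasets: the helpers `strict_error` and `lenient` are hypothetical names introduced here for illustration, and the strings are built by hand rather than read from an Aozora Bunko archive.

```ruby
# Byte sequences copied from the two error messages in the commit message.
sjis = Encoding::Shift_JIS
undefined_bytes = "\xF8E".b.force_encoding(sjis)  # well-formed, but has no UTF-8 mapping
invalid_bytes = "\xE7\f".b.force_encoding(sjis)   # lead byte followed by a non-trail byte

# Hypothetical helper: convert strictly and return the error class, or nil on success.
def strict_error(string)
  string.encode(Encoding::UTF_8)
  nil
rescue EncodingError => error
  error.class
end

# Hypothetical helper: convert with the same options this commit adds.
def lenient(string)
  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

p strict_error(undefined_bytes)          # error class raised without the options
p strict_error(invalid_bytes)            # error class raised without the options
p lenient(undefined_bytes)               # converted string with a replacement character
p lenient(invalid_bytes).valid_encoding? # the result is well-formed UTF-8
```

Because the destination encoding is UTF-8, `String#encode` substitutes U+FFFD for each byte sequence it cannot convert, so the text stays readable at the cost of losing the original characters at those positions.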
