Commit 91456ab

Optimize multi-byte character handling in processCarriageReturns
Refactor `processCarriageReturns` to simplify the detection of partially overwritten multi-byte characters (e.g., emojis, which JavaScript strings store as UTF-16 surrogate pairs). Remove the redundant `hasPartialChar` and `segment.length > 0` guards and flatten the nested condition into three commented checks for potential character corruption during carriage return processing. This improves readability and maintainability while preserving the original behavior of replacing a potentially corrupted character with a space. Also enforce consistent use of semicolons for improved code style.
1 parent 3027e90 commit 91456ab
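
The surrogate ranges that the commit message relies on can be illustrated with a small sketch. The helper names `isHighSurrogate` and `isLowSurrogate` are hypothetical and do not appear in the commit; the numeric ranges are the ones used in the diff.

```typescript
// Hypothetical helpers (not part of the commit) for classifying UTF-16 code units.
function isHighSurrogate(code: number): boolean {
	return code >= 0xd800 && code <= 0xdbff
}

function isLowSurrogate(code: number): boolean {
	return code >= 0xdc00 && code <= 0xdfff
}

// U+1F600 (grinning face) is stored as two UTF-16 code units: 0xD83D then 0xDE00.
const emoji = "\uD83D\uDE00"
const units = [emoji.charCodeAt(0), emoji.charCodeAt(1)]

console.log(emoji.length) // 2
console.log(isHighSurrogate(units[0]), isLowSurrogate(units[1])) // true true
```

Overwriting only one of the two code units leaves a lone surrogate behind, which is the corruption case the function guards against.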

1 file changed: +7 -13 lines changed

src/integrations/misc/extract-text.ts

@@ -288,22 +288,16 @@ export function processCarriageReturns(input: string): string {
 			} else {
 				// Partial overwrite - need to check for multi-byte character boundary issues
 				const potentialPartialChar = curLine.charAt(segment.length)
-
-				// Cache character code points to avoid repeated charCodeAt calls
-				const hasPartialChar = potentialPartialChar !== ""
 				const segmentLastCharCode = segment.length > 0 ? segment.charCodeAt(segment.length - 1) : 0
-				const partialCharCode = hasPartialChar ? potentialPartialChar.charCodeAt(0) : 0
+				const partialCharCode = potentialPartialChar.charCodeAt(0)
 
-				// Check if character is part of a multi-byte sequence (emoji or other Unicode characters)
-				// Detect surrogate pairs (high/low surrogates) to identify multi-byte characters
+				// Simplified condition for multi-byte character detection
 				if (
-					hasPartialChar &&
-					((segment.length > 0 &&
-						((segmentLastCharCode >= 0xd800 && segmentLastCharCode <= 0xdbff) ||
-							(partialCharCode >= 0xdc00 && partialCharCode <= 0xdfff))) ||
-						(curLine.length > segment.length + 1 &&
-							partialCharCode >= 0xd800 &&
-							partialCharCode <= 0xdbff))
+					(segmentLastCharCode >= 0xd800 && segmentLastCharCode <= 0xdbff) || // High surrogate at end of segment
+					(partialCharCode >= 0xdc00 && partialCharCode <= 0xdfff) || // Low surrogate at overwrite position
+					(curLine.length > segment.length + 1 &&
+						partialCharCode >= 0xd800 &&
+						partialCharCode <= 0xdbff) // High surrogate followed by another character
 				) {
 					// If a partially overwritten multi-byte character is detected, replace with space
 					const remainPart = curLine.substring(segment.length + 1)
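
One reason the `hasPartialChar` guard may be safe to drop in the surrogate range checks: `charAt` past the end of a string returns `""`, `"".charCodeAt(0)` is `NaN`, and every range comparison against `NaN` is false. The sketch below pulls the simplified condition into a standalone function to show this. The function name `splitsSurrogatePair` is hypothetical and is not part of the commit; the variable names mirror those in the diff.

```typescript
// Hypothetical standalone version of the simplified check from the diff.
// Returns true when writing `segment` over the start of `curLine` may leave
// half of a surrogate pair behind.
function splitsSurrogatePair(curLine: string, segment: string): boolean {
	const segmentLastCharCode = segment.length > 0 ? segment.charCodeAt(segment.length - 1) : 0
	// charAt() past the end yields "", and "".charCodeAt(0) is NaN,
	// so the range comparisons below are all false in that case.
	const partialCharCode = curLine.charAt(segment.length).charCodeAt(0)

	return (
		(segmentLastCharCode >= 0xd800 && segmentLastCharCode <= 0xdbff) || // High surrogate at end of segment
		(partialCharCode >= 0xdc00 && partialCharCode <= 0xdfff) || // Low surrogate at overwrite position
		(curLine.length > segment.length + 1 && partialCharCode >= 0xd800 && partialCharCode <= 0xdbff)
	)
}

// "a" + U+1F600 + "b": overwriting two code units cuts the emoji in half.
console.log(splitsSurrogatePair("a\uD83D\uDE00b", "XY")) // true
// Plain ASCII line: nothing to corrupt.
console.log(splitsSurrogatePair("abcd", "XY")) // false
// No character at the overwrite position: partialCharCode is NaN.
console.log(isNaN("".charCodeAt(0)), splitsSurrogatePair("ab", "XY")) // true false
```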
