-
Notifications
You must be signed in to change notification settings - Fork 16
Display error msg when a non-visible character is present in the csv #1200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Display error msg when a non-visible character is present in the csv #1200
Conversation
WalkthroughAdds detection and rejection of non-visible Unicode characters during CSV upload for both Zod3 and Zod4 flows; validates headers ( Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant W as Web App
participant UU as upload.ts
participant Z as Zod Parser
U->>W: Upload CSV
W->>UU: processInstrumentCSV(csv)
rect #f3f7ff
note right of UU: Trim headers (use .trim())\nHeader pre-checks for non-visible chars (subjectID/date)\nReturn localized UploadError on header failure
end
UU->>Z: parse validation schema (Zod3 or Zod4)
Z-->>UU: schema parse result / error
alt schema error
UU-->>W: localized UploadError (schema parse)
else schema OK
rect #f6fff5
loop per CSV row (rowNumber++)
UU->>UU: per-cell non-visible-char check\n(if found -> localized UploadError with row/col and U+ hex)
UU->>UU: trim/map fields according to schema order
end
end
UU-->>W: parsed data or first UploadError
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/web/src/utils/upload.ts (2)
585-587: Fix French string: missing closing quote around row number.- fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}` + fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}'`
939-941: Fix French string: missing closing quote around row number (Zod4 path).- fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}` + fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}'`
🧹 Nitpick comments (6)
apps/web/src/utils/upload.ts (6)
66-72: Simplify and harden the non‑visible character check (return boolean, cover more code points).Use .test(), return boolean, and include LRM/RLM (200E/200F) and WORD JOINER (2060).
-function nonVisibleCharChecker(entry: string | undefined) { - if (!entry) { - return null; - } - const nonVisibleCharCheck = /[\u200B-\u200D\uFEFF\u180E]/g.exec(entry); - return nonVisibleCharCheck; -} +function nonVisibleCharChecker(entry: string | undefined): boolean { + if (!entry) return false; + // Zero‑width: 200B..200F, WORD JOINER 2060, BOM FEFF, MVS 180E + return /[\u200B-\u200F\u2060\uFEFF\u180E]/.test(entry); +}
517-537: Remove redundant first‑row Subject/Date checks (covered by per‑cell loop and can trip on sample row).These checks duplicate the per‑cell validation and run before sample‑row removal. Drop them to avoid inconsistent behavior.
- const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); - - if (regexResultSubject !== null) { - return reject( - new UploadError({ - en: `Subject ID at row ${rowNumber} contains Non-visible characters`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - } - if (regexResultDate !== null) { - return reject( - new UploadError({ - en: `Date at row ${rowNumber} contains Non-visible characters`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - }
541-543: Make sample‑row detection resilient to hidden characters (MVS/ZW).*Strip non‑visible chars before comparing to sample tokens. This preserves compatibility with older templates containing MVS.
- dataLines[0][0]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && - dataLines[0][1]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[1] + dataLines[0][0]?.replace(/[\u200B-\u200F\u2060\uFEFF\u180E]/g, '').trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && + dataLines[0][1]?.replace(/[\u200B-\u200F\u2060\uFEFF\u180E]/g, '').trim() === INTERNAL_HEADERS_SAMPLE_DATA[1]
870-871: Fix comment typos and align terminology.- //return an error if non space charaters are found + // return an error if non‑visible characters are found
872-892: Remove redundant first‑row Subject/Date checks (same rationale as Zod3 path).- let rowNumber = 1; - - const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); - - if (regexResultSubject !== null) { - return reject( - new UploadError({ - en: `Subject ID at row ${rowNumber} contains Non-visible characters`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - } - if (regexResultDate !== null) { - return reject( - new UploadError({ - en: `Date at row ${rowNumber} contains Non-visible characters`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - } + let rowNumber = 1;
895-899: Same resilience for sample‑row detection in Zod4 path.- dataLines[0][0]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && - dataLines[0][1]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[1] + dataLines[0][0]?.replace(/[\u200B-\u200F\u2060\uFEFF\u180E]/g, '').trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && + dataLines[0][1]?.replace(/[\u200B-\u200F\u2060\uFEFF\u180E]/g, '').trim() === INTERNAL_HEADERS_SAMPLE_DATA[1]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(5 hunks)
🔇 Additional comments (1)
apps/web/src/utils/upload.ts (1)
605-607: LGTM: clearer schema‑parse failure copy with FR translation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/src/utils/upload.ts (1)
551-555: Guard against short rows (avoid crash on undefined cell values)
elements[i]!.trim()will throw if a row has fewer columns than headers. Default missing cells to empty string.- for (let i = 0; i < headers.length; i++) { - const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); - if (rawValue === '\n') { - continue; - } + for (let i = 0; i < headers.length; i++) { + const key = headers[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); + if (rawValue === '\n') continue;
🧹 Nitpick comments (8)
apps/web/src/utils/upload.ts (8)
66-72: Return a boolean from the hidden-character checkerMake the helper boolean and prefer
.testover.execto simplify call sites and avoid nullable checks.-function nonVisibleCharChecker(entry: string | undefined) { - if (!entry) { - return null; - } - const nonVisibleCharCheck = /[\u200B-\u200D\uFEFF\u180E]/g.exec(entry); - return nonVisibleCharCheck; -} +function nonVisibleCharChecker(entry?: string): boolean { + if (!entry) return false; + return /[\u200B-\u200D\uFEFF\u180E]/.test(entry); +}
517-537: Remove redundant pre-check on row 1 (avoid false positives with sample row)This duplicate check is unnecessary (per-cell validation below catches it) and can incorrectly flag the sample row. Remove it and keep
rowNumberinitialization.- let rowNumber = 1; - - const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); - - if (regexResultSubject !== null) { - return reject( - new UploadError({ - en: `Subject ID at row ${rowNumber} contains Non-visible characters`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - } - if (regexResultDate !== null) { - return reject( - new UploadError({ - en: `Date at row ${rowNumber} contains Non-visible characters`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - } + let rowNumber = 1;
539-543: Update stale comment to reflect current behaviorThe code no longer strips MVS; it just checks the sample row values. Update the comment.
- //remove sample data if included remove any mongolian vowel separators + // Remove sample data row if present if ( dataLines[0][0]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && dataLines[0][1]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[1]
583-585: Clarify parse error context (EN/FR phrasing)Shorten EN and fix FR prepositions.
- en: `${error.description.en} at column name: '${key}' and row number '${rowNumber}'`, - fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}'` + en: `${error.description.en} at column '${key}' and row ${rowNumber}`, + fr: `${error.description.fr} à la colonne '${key}' et à la ligne ${rowNumber}`
591-594: Improve FR translation for generic CSV error- en: `Error parsing CSV`, - fr: `Erreur avec la CSV` + en: `Error parsing CSV`, + fr: `Erreur lors de l'analyse du CSV`
867-890: Remove Zod4 pre-check and fix typoSame redundancy and sample-row false positive risk as Zod3; also fix “charaters”.
- //remove sample data if included (account for old mongolian vowel separator templates) - //return an error if non space charaters are found - - let rowNumber = 1; - - const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); - - if (regexResultSubject !== null) { - return reject( - new UploadError({ - en: `Subject ID at row ${rowNumber} contains Non-visible characters`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - } - if (regexResultDate !== null) { - return reject( - new UploadError({ - en: `Date at row ${rowNumber} contains Non-visible characters`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles` - }) - ); - } + // Remove sample data row if present + let rowNumber = 1;
935-937: Clarify parse error context (EN/FR phrasing) — Zod4- en: `${error.description.en} at column name: '${key}' and row number '${rowNumber}'`, - fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}'` + en: `${error.description.en} at column '${key}' and row ${rowNumber}`, + fr: `${error.description.fr} à la colonne '${key}' et à la ligne ${rowNumber}`
943-945: Improve FR translation for generic CSV error — Zod4- en: `Error parsing CSV`, - fr: `Erreur avec la CSV` + en: `Error parsing CSV`, + fr: `Erreur lors de l'analyse du CSV`
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🔇 Additional comments (3)
apps/web/src/utils/upload.ts (3)
558-567: Use boolean checker and polish messages (quotes, casing, FR preposition)Same change as previously suggested.
- //Check for non visible char in every row, return error if present - if (nonVisibleCharChecker(rawValue) !== null) { + // Return error if any non‑visible character is present + if (nonVisibleCharChecker(rawValue)) { return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains Non-visible characters`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles` + en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters`, + fr: `La valeur à la ligne ${rowNumber} et à la colonne '${key}' contient des caractères non visibles` }) ); }
603-605: LGTM: clearer schema parse failure messagingMessage content and FR translation look good.
911-919: Use boolean checker and polish messages (quotes, casing, FR preposition) — Zod4Same change as previously suggested.
- // Return error if any non‑visible character is present - if (nonVisibleCharChecker(rawValue) !== null) { + // Return error if any non‑visible character is present + if (nonVisibleCharChecker(rawValue)) { return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains Non-visible characters`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles` + en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters`, + fr: `La valeur à la ligne ${rowNumber} et à la colonne '${key}' contient des caractères non visibles` }) ); }
128853d to
d5f0ba7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/src/utils/upload.ts (1)
551-554: Guard against short rows to avoid.trim()on undefined (Zod3 path)Short rows will throw here; guard before trimming.
- const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim();
🧹 Nitpick comments (4)
apps/web/src/utils/upload.ts (4)
66-72: Return a boolean from nonVisibleCharChecker; prefer RegExp.test with Unicode flagSimplifies usage and avoids exec-with-g side effects.
-function nonVisibleCharChecker(entry: string | undefined) { - if (!entry) { - return null; - } - const nonVisibleCharCheck = /[\u200B-\u200D\uFEFF\u180E]/g.exec(entry); - return nonVisibleCharCheck; -} +function nonVisibleCharChecker(entry?: string): boolean { + return !!entry && /[\u200B-\u200D\uFEFF\u180E]/u.test(entry); +}
517-537: Use boolean checker and fix messages (Subject/Date pre-check)Remove temp vars; lowercase “non-visible”; improve FR wording.
- const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); - - if (regexResultSubject !== null) { + if (nonVisibleCharChecker(dataLines[0][0])) { return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains Non-visible characters`, + en: `Subject ID at row ${rowNumber} contains non-visible characters`, fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles` }) ); } - if (regexResultDate !== null) { + if (nonVisibleCharChecker(dataLines[0][1])) { return reject( new UploadError({ - en: `Date at row ${rowNumber} contains Non-visible characters`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles` + en: `Date at row ${rowNumber} contains non-visible characters`, + fr: `La date à la ligne ${rowNumber} contient des caractères non visibles` }) ); }
585-585: French phrasing nit in aggregated error“à la colonne … et à la ligne …” reads better.
- fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}'` + fr: `${error.description.fr} à la colonne '${key}' et à la ligne '${rowNumber}'`
935-937: French phrasing nit in aggregated error (Zod4)Align with Zod3 suggestion.
- fr: `${error.description.fr} au nom de colonne : '${key}' et numéro de ligne '${rowNumber}'` + fr: `${error.description.fr} à la colonne '${key}' et à la ligne '${rowNumber}'`
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🔇 Additional comments (8)
apps/web/src/utils/upload.ts (8)
541-543: LGTM: sample-row detection via trim is cleanerTrim-based comparison is a good replacement for MVS stripping.
558-567: Use boolean checker and polish messages (per‑cell Zod3 path)Lowercase “non-visible” and quote column name.
- //Check for non visible char in every row, return error if present - if (nonVisibleCharChecker(rawValue) !== null) { + // Return error if any non-visible character is present + if (nonVisibleCharChecker(rawValue)) { return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains Non-visible characters`, + en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles` }) ); }
603-605: LGTM: clearer schema-failure message (Zod3)Improved guidance is helpful.
868-890: Use boolean checker; fix comment and messages (Subject/Date pre-check, Zod4 path)Updates wording and removes temp vars.
- //return an error if non space charaters are found + // Return an error if non-visible characters are found - - let rowNumber = 1; - - const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); - - if (regexResultSubject !== null) { + let rowNumber = 1; + if (nonVisibleCharChecker(dataLines[0][0])) { return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains Non-visible characters`, + en: `Subject ID at row ${rowNumber} contains non-visible characters`, fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles` }) ); } - if (regexResultDate !== null) { + if (nonVisibleCharChecker(dataLines[0][1])) { return reject( new UploadError({ - en: `Date at row ${rowNumber} contains Non-visible characters`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles` + en: `Date at row ${rowNumber} contains non-visible characters`, + fr: `La date à la ligne ${rowNumber} contient des caractères non visibles` }) ); }
893-895: LGTM: sample-row detection via trim (Zod4)Consistent with Zod3 path.
911-919: Use boolean checker and polish messages (per‑cell Zod4 path)Lowercase “non-visible” and quote column name.
- // Return error if any non‑visible character is present - if (nonVisibleCharChecker(rawValue) !== null) { + // Return error if any non‑visible character is present + if (nonVisibleCharChecker(rawValue)) { return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains Non-visible characters`, + en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles` }) ); }
955-957: LGTM: clearer schema-failure message (Zod4)Matches Zod3.
903-910: Guard against short rows to avoid.trim()on undefined (Zod4 path)Same crash risk as Zod3 path.
- const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/src/utils/upload.ts (1)
552-566: Fix false negatives when BOM characters are present.
rawValueis trimmed before invokingnonVisibleCharChecker, butString.prototype.trim()strips some of the very characters we want to detect (e.g.\uFEFF/ zero-width no-break space, possibly at either edge). As a result, a cell containing only a leading/trailing BOM passes the new guard even though it should be rejected. Please run the checker on the original cell string before trimming (and keep the trimmed value for later parsing). This also lets us avoid the unsafeelements[i]!when the row is short.- const rawValue = elements[i]!.trim(); - if (rawValue === '\n') { - continue; - } - - //Check for non visible char in every row, return error if present - if (nonVisibleCharChecker(rawValue) !== null) { + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); + if (rawValue === '\n') { + continue; + } + + // Return error if any non-visible character is present + if (nonVisibleCharChecker(cell) !== null) { return reject( new UploadError({ en: `Value at row ${rowNumber} and column ${key} contains non-visible characters`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles` }) ); }
fe8460e to
1c21715
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
apps/web/src/utils/upload.ts (3)
519-537: Enhance error messages with character code.Per past feedback, error messages should indicate which character was detected. Currently users only know "non-visible characters" exist but not which one.
Apply this diff to include the character code:
- const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); - - if (regexResultSubject !== null) { + const matchSubject = nonVisibleCharChecker(dataLines[0][0]); + const matchDate = nonVisibleCharChecker(dataLines[0][1]); + + if (matchSubject !== null) { + const charCode = matchSubject[0]?.charCodeAt(0).toString(16).toUpperCase(); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } - if (regexResultDate !== null) { + if (matchDate !== null) { + const charCode = matchDate[0]?.charCodeAt(0).toString(16).toUpperCase(); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }
872-890: Enhance error messages with character code (Zod4 path).Same as Zod3: include the character code in error messages for better debugging.
Apply the same enhancement as suggested for lines 519-537 in the Zod3 path.
66-72: Simplify return type to boolean.The function returns a regex exec result but all call sites only check
!== null. Since the matched character info is never used in error messages, return a boolean instead:return /[\u200B-\u200D\uFEFF\u180E]/g.test(entry);The character selection (zero-width spaces, joiners, and BOM) is standard for CSV validation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🔇 Additional comments (5)
apps/web/src/utils/upload.ts (5)
539-545: Good fix for sample data detection.Using trimmed comparison instead of character stripping is cleaner and more robust.
603-605: French translation added.Good addition for consistency.
892-897: Good fix for sample data detection (Zod4 path).Consistent with Zod3 improvement.
952-954: French translation present.Consistent localization across both paths.
905-916: Check raw cell value before trimming (Zod4 path).The cell safety guard is good, but you're still checking
rawValue(trimmed) for non-visible characters. This misses leading/trailing characters like BOM. Checkcellbefore trimming, as flagged in past comments.Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); const cell = elements[i]; const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') continue; - // Return error if any non‑visible character is present - if (nonVisibleCharChecker(rawValue) !== null) { + // Check for non-visible character in raw cell before trimming + const match = nonVisibleCharChecker(cell); + if (match !== null) { + const charCode = match[0]?.charCodeAt(0).toString(16).toUpperCase(); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains non-visible characters`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles` + en: `The value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); }Based on past review comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
apps/web/src/utils/upload.ts (2)
518-538: Minor formatting inconsistency in French error message.Line 527 has an extra space before the character placeholder compared to line 535.
- fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}`Ensure consistent spacing around interpolated values in both French messages.
540-546: Update outdated comment.The comment references removing Mongolian vowel separators, but the code now uses trimmed string comparison.
- //remove sample data if included remove any mongolian vowel separators + // Remove sample data row if present if (
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🧰 Additional context used
🪛 Biome (2.1.2)
apps/web/src/utils/upload.ts
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
🔇 Additional comments (2)
apps/web/src/utils/upload.ts (2)
870-870: Fix typo."charaters" → "characters"
- //return an error if non space charaters are found + // Return an error if non-visible characters are foundBased on learnings (past comment flagged this typo).
559-568: Check raw cell value before trimming.Checking the trimmed
rawValueloses leading/trailing non-visible characters like BOM (\uFEFF). Checkelements[i]directly before trimming.for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) {Based on learnings (past comments flagged this issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/src/utils/upload.ts (1)
553-568: Check raw cell before trimming.Line 554 trims first, then line 560 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM
\uFEFF) are lost.for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { + const charCode = nonVisibleChars[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains non-visible characters ${nonVisibleChars[0]}`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles ${nonVisibleChars[0]}` + en: `Value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); }Based on learnings (past comments flagged this exact issue).
🧹 Nitpick comments (2)
apps/web/src/utils/upload.ts (2)
520-538: Include Unicode code point in error messages.Error messages show the character itself
${regexResultSubject[0]}, which may render as invisible. Format as hex code (e.g.,U+FEFF) for clarity:const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }Based on learnings (past comment asked for char code indication).
874-892: Include Unicode code point in error messages.Error messages show the character itself, which may be invisible. Format as hex code:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🧰 Additional context used
🪛 Biome (2.1.2)
apps/web/src/utils/upload.ts
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
🔇 Additional comments (2)
apps/web/src/utils/upload.ts (2)
542-546: LGTM: Sample data detection.Trimmed comparison correctly identifies sample data rows without character-specific stripping.
894-899: LGTM: Sample data detection.Trimmed comparison correctly identifies sample data rows, consistent with Zod3 path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
apps/web/src/utils/upload.ts (3)
520-538: Include character code in error messages.The error messages show the character but should include its Unicode code point for clarity:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `La date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }Based on learnings.
870-870: Fix typo."charaters" → "characters"
- //return an error if non space charaters are found + //return an error if non-space characters are foundBased on learnings.
874-892: Include character code in error messages.Mirror the Zod3 fix to include Unicode code points:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `La date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🧰 Additional context used
🪛 Biome (2.1.2)
apps/web/src/utils/upload.ts
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
[error] 70-70: Unexpected control character in a regular expression.
Control characters are unusual and potentially incorrect inputs, so they are disallowed.
(lint/suspicious/noControlCharactersInRegex)
🔇 Additional comments (1)
apps/web/src/utils/upload.ts (1)
559-568: Check raw cell before trimming.Line 554 trims first, then line 560 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM) are lost.
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { + const charCode = nonVisibleChars[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains non-visible characters ${nonVisibleChars[0]}`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles ${nonVisibleChars[0]}` + en: `Value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); }Based on learnings.
| const cell = elements[i]; | ||
| const rawValue = cell == null ? '' : cell.trim(); | ||
| if (rawValue === '\n') continue; | ||
| // Return error if any non‑visible character is present | ||
| const nonVisibleChars = nonVisibleCharChecker(rawValue); | ||
| if (nonVisibleChars !== null) { | ||
| return reject( | ||
| new UploadError({ | ||
| en: `Value at row ${rowNumber} and column ${key} contains non-visible characters ${nonVisibleChars[0]}`, | ||
| fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles ${nonVisibleChars[0]}` | ||
| }) | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check raw cell before trimming.
Line 908 trims first, then line 911 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM) are lost.
for (let i = 0; i < headers.length; i++) {
const key = headers[i]!.trim();
const cell = elements[i];
- const rawValue = cell == null ? '' : cell.trim();
if (rawValue === '\n') continue;
- // Return error if any non‑visible character is present
- const nonVisibleChars = nonVisibleCharChecker(rawValue);
+ // Check for non-visible character before trimming
+ const nonVisibleChars = nonVisibleCharChecker(cell);
if (nonVisibleChars !== null) {
+ const charCode = nonVisibleChars[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
return reject(
new UploadError({
- en: `Value at row ${rowNumber} and column ${key} contains non-visible characters ${nonVisibleChars[0]}`,
- fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles ${nonVisibleChars[0]}`
+ en: `Value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`,
+ fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})`
})
);
}
+ const rawValue = cell == null ? '' : cell.trim();Based on learnings.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In apps/web/src/utils/upload.ts around lines 907 to 919, the code trims the cell
before checking for non-visible characters so leading/trailing non-visible
characters (e.g., BOM) are lost; change the flow to first capture the original
cell string (e.g., rawBeforeTrim = cell == null ? '' : cell), run
nonVisibleCharChecker against rawBeforeTrim and return the UploadError if it
finds characters, then set the trimmed value (rawValue = rawBeforeTrim.trim())
for the subsequent logic (including the '\n' check) so trimming happens after
validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
apps/web/src/utils/upload.ts (2)
521-539: Include Unicode code point in error messages.Pre-validation correctly runs before trimming, but error messages show the raw character rather than its code point. Past feedback requested displaying char codes (e.g., "U+FEFF").
if (regexResultSubject !== null) { + const charCode = regexResultSubject[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }Based on learnings.
875-893: Include Unicode code point in error messages (Zod4 path).Mirror the Zod3 feedback: error messages should display the Unicode code point (e.g., "U+FEFF") rather than the raw character for clarity.
if (regexResultSubject !== null) { + const charCode = regexResultSubject[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🔇 Additional comments (3)
apps/web/src/utils/upload.ts (3)
66-74: Positive-match regex now permits non-Latin scripts.The regex correctly targets specific non-visible/format characters (zero-width spaces, BOM, soft hyphen, etc.) without blocking valid Cyrillic, CJK, Arabic, or other scripts. This addresses the critical feedback from past reviews.
908-920: Check raw cell before trimming to catch leading/trailing BOM (Zod4 path).Line 909 trims first, then line 912 checks the trimmed value. Leading/trailing non-visible characters are lost before validation.
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); const cell = elements[i]; - const rawValue = cell == null ? '' : cell.trim(); - if (rawValue === '\n') continue; - // Return error if any non‑visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { + const charCode = nonVisibleChars[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains non-visible characters ${nonVisibleChars[0]}`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles ${nonVisibleChars[0]}` + en: `Value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); } + const rawValue = cell == null ? '' : cell.trim(); + if (rawValue === '\n') continue;Based on learnings.
560-569: Check raw cell before trimming to catch leading/trailing BOM.Line 555 trims the cell, then line 561 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM at \uFEFF) are stripped before validation and won't be detected.
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { + const charCode = nonVisibleChars[0]?.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column ${key} contains non-visible characters ${nonVisibleChars[0]}`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles ${nonVisibleChars[0]}` + en: `Value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); }Based on learnings.
cdcba35 to
cd5b487
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
apps/web/src/utils/upload.ts (2)
560-570: Check raw cell before trimming.Line 561 checks
rawValue(already trimmed on line 555), so leading/trailing non-visible characters like BOM are lost. Checkelements[i]before trimming:for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ en: `Value at row ${rowNumber} and column '${key}' contains non-visible character(s) (U+${charCode})`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractère(s) non visibles (U+${charCode})` }) ); }Based on learnings.
913-922: Check raw cell before trimming (Zod4 path).Line 913 checks
rawValue(trimmed on line 910), losing leading/trailing non-visible characters. Checkcellbefore trimming:for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); const cell = elements[i]; - const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') continue; - // Return error if any non‑visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters (U+${charCode})`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles (U+${charCode})` }) ); } + const rawValue = cell == null ? '' : cell.trim();Based on learnings.
🧹 Nitpick comments (2)
apps/web/src/utils/upload.ts (2)
521-539: Include character codes in pre-validation errors.The pre-validation checks raw values correctly but error messages show the character directly rather than as a Unicode code point. Match the per-cell format:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }
876-894: Include character codes in pre-validation errors (Zod4 path).Mirror the Zod3 enhancement—format error messages with Unicode code points:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🔇 Additional comments (1)
apps/web/src/utils/upload.ts (1)
66-74: Good fix—positive pattern now allows non-Latin scripts.The regex correctly targets specific non-visible characters (zero-width spaces, BOM, soft hyphen, etc.) without blocking valid Cyrillic, CJK, or other Unicode text. The commented-out regex on line 71 can be removed to reduce clutter.
cd5b487 to
15f6843
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/src/utils/upload.ts (1)
554-570: Check raw cell before trimming.Line 555 trims first, then line 561 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM) are lost before validation runs.
Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ en: `Value at row ${rowNumber} and column '${key}' contains non-visible character(s) (U+${charCode})`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractère(s) non visibles (U+${charCode})` }) ); }Based on learnings.
♻️ Duplicate comments (2)
apps/web/src/utils/upload.ts (2)
909-922: Check raw cell before trimming.Line 910 trims first, then line 913 checks the trimmed value. Leading/trailing non-visible characters are lost.
Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); const cell = elements[i]; - const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') continue; - // Return error if any non‑visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non-visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters (U+${charCode})`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles (U+${charCode})` }) ); } + const rawValue = cell == null ? '' : cell.trim();Based on learnings.
521-539: Include character code in header error messages.The per-cell validation (lines 563-567) includes the Unicode code point (e.g.,
U+FEFF), but header validation only shows the raw character. For consistency and clarity, add the character code here as well.Apply this diff:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/web/src/utils/upload.ts(7 hunks)
🔇 Additional comments (5)
apps/web/src/utils/upload.ts (5)
66-74: Regex correctly targets non-visible characters.The positive pattern now matches specific format/zero-width characters without blocking valid non-Latin scripts. The function returning the match array (instead of boolean) is appropriate since downstream error messages extract the character code from
nonVisibleChars[0].
543-544: LGTM.Sample data detection correctly uses trimmed comparison after validation has already caught non-visible characters.
588-608: LGTM.French translations added for error context messages.
897-898: LGTM.Sample data detection matches Zod3 approach.
939-960: LGTM.French translations added for error context messages, matching Zod3 path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
apps/web/src/utils/upload.ts (4)
568-578: Check raw cell before trimming.Line 563 trims
elements[i], then line 569 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM) are lost before validation.Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ en: `Value at row ${rowNumber} and column '${key}' contains non-visible character(s) (U+${charCode})`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractère(s) non visibles (U+${charCode})` }) ); }Based on learnings.
917-930: Check raw cell before trimming.Line 918 trims the cell, then line 921 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM) are lost.
Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); const cell = elements[i]; - const rawValue = cell == null ? '' : cell.trim(); - if (rawValue === '\n') continue; - // Return error if any non‑visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Return error if any non‑visible character is present + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters (U+${charCode})`, fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles (U+${charCode})` }) ); } + const rawValue = cell == null ? '' : cell.trim(); + if (rawValue === '\n') continue;Based on learnings.
884-902: Include character code in header error messages.Same issue as Zod3 path: header validation shows only the raw character, while per-cell validation (lines 923-927) includes the Unicode code point.
Apply this diff:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }
529-547: Include character code in header error messages.Header validation (lines 532-547) displays only the raw character, while per-cell validation (lines 571-575) includes the Unicode code point. This inconsistency makes debugging harder.
Apply this diff:
const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); if (regexResultSubject !== null) { + const charCode = regexResultSubject[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { + const charCode = regexResultDate[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
apps/web/src/utils/upload.ts (4)
560-577: Guard against short rows and check raw cell before trimming.Calling trim() on undefined will throw; trimming also hides leading/trailing BOM and similar characters. Check the original cell first; then trim for parsing.
- const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); - if (rawValue === '\n') { - continue; - } - - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + const key = headers[i]!.trim(); + const cell = elements[i]; + // Check for non-visible characters before trimming (preserves leading/trailing BOM, etc.) + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { - const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); + const charCode = nonVisibleChars[0].codePointAt(0)?.toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column '${key}' contains non-visible character(s) (U+${charCode})`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractère(s) non visibles (U+${charCode})` + en: `The value at row ${rowNumber} and column '${key}' contains a non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et à la colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); } + const rawValue = cell == null ? '' : cell.trim(); + if (rawValue === '\n') { + continue; + }
526-546: Move sample-row removal before header checks; include code point and fix FR.Running the header checks before removing the sample row can raise false errors on legacy templates with hidden chars. Also include the Unicode code point and fix French grammar.
let rowNumber = 1; - const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); + // Remove sample data row first (handles old templates with hidden chars) + if ( + dataLines[0]?.[0]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && + dataLines[0]?.[1]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[1] + ) { + dataLines.shift(); + } + + const [subjectCell, dateCell] = dataLines[0] ?? []; + const regexResultSubject = nonVisibleCharChecker(subjectCell); + const regexResultDate = nonVisibleCharChecker(dateCell); if (regexResultSubject !== null) { + const code = regexResultSubject[0]?.codePointAt(0)?.toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains a non-visible character (U+${code})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${code})` }) ); } if (regexResultDate !== null) { + const code = regexResultDate[0]?.codePointAt(0)?.toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains a non-visible character (U+${code})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${code})` }) ); } - - //remove sample data if included remove any mongolian vowel separators - if ( - dataLines[0][0]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && - dataLines[0][1]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[1] - ) { - dataLines.shift(); - } + // sample data row already removed aboveAlso applies to: 548-552
879-901: Do sample-row removal first; include code point and fix FR.Mirror the Zod3 fix: remove the sample row before header validation; include Unicode code point in messages; fix French wording.
- //remove sample data if included (account for old mongolian vowel separator templates) - //return an error if non space characters are found - - let rowNumber = 1; - - const regexResultSubject = nonVisibleCharChecker(dataLines[0][0]); - const regexResultDate = nonVisibleCharChecker(dataLines[0][1]); + // remove sample data if included (account for old Mongolian vowel separator templates) + if ( + dataLines[0]?.[0]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && + dataLines[0]?.[1]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[1] + ) { + dataLines.shift(); + } + // return an error if non-space characters are found + let rowNumber = 1; + const [subjectCell, dateCell] = dataLines[0] ?? []; + const regexResultSubject = nonVisibleCharChecker(subjectCell); + const regexResultDate = nonVisibleCharChecker(dateCell); if (regexResultSubject !== null) { + const code = regexResultSubject[0]?.codePointAt(0)?.toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters ${regexResultSubject[0]}`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles ${regexResultSubject[0]}` + en: `Subject ID at row ${rowNumber} contains a non-visible character (U+${code})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${code})` }) ); } if (regexResultDate !== null) { + const code = regexResultDate[0]?.codePointAt(0)?.toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters ${regexResultDate[0]}`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles ${regexResultDate[0]}` + en: `Date at row ${rowNumber} contains a non-visible character (U+${code})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${code})` }) ); } - - if ( - dataLines[0][0]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[0] && - dataLines[0][1]?.trim() === INTERNAL_HEADERS_SAMPLE_DATA[1] - ) { - dataLines.shift(); - } + // sample data row already removed aboveAlso applies to: 903-907
916-929: Check raw cell before trimming in Zod4 path.Same issue: trimming first hides leading/trailing non-visible chars (BOM, etc.). Check the original cell, then trim for parsing.
const key = headers[i]!.trim(); const cell = elements[i]; - const rawValue = cell == null ? '' : cell.trim(); - if (rawValue === '\n') continue; - // Return error if any non‑visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Check for non‑visible character before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters (U+${charCode})`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles (U+${charCode})` + en: `The value at row ${rowNumber} and column '${key}' contains a non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et à la colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); } + const rawValue = cell == null ? '' : cell.trim(); + if (rawValue === '\n') continue;
🧹 Nitpick comments (1)
apps/web/src/utils/upload.ts (1)
66-81: Precompile the regex and clarify intent.Avoid recompiling the heavy RegExp on every call; hoist it to module scope. Also the name “zeroWidthCharacterExceptNewline” includes a wide control range, not just zero‑width. Consider a clearer name.
-function nonVisibleCharChecker(entry: string | undefined) { - if (!entry) { - return null; - } - const ansiEscapeCode = '[\u001b\u009b][[()#;?]*(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-PRZcf-nqry=><]', - zeroWidthCharacterExceptNewline = - '\u0000-\u0008\u000B-\u0019\u001b\u009b\u00ad\u200b\u2028\u2029\ufeff\ufe00-\ufe0f'; - - const zeroWidthCharactersExceptNewline = new RegExp( - '(?:' + ansiEscapeCode + ')|[' + zeroWidthCharacterExceptNewline + ']', - 'g' - ); - const nonVisibleCharCheck = zeroWidthCharactersExceptNewline.exec(entry); - - return nonVisibleCharCheck; -} +function nonVisibleCharChecker(entry: string | undefined) { + if (!entry) return null; + return NON_VISIBLE_RE.exec(entry); +}Add at module scope:
// Matches ANSI escape sequences and non-visible/control characters excluding \t, \n, \r, and printable ASCII const ANSI_ESCAPE = String.raw`[\u001b\u009b][[()#;?]*(?:\d{1,4}(?:;\d{0,4})*)?[0-9A-PRZcf-nqry=><]`; const NON_VISIBLE_RANGE = String.raw`\u0000-\u0008\u000B-\u0019\u001B\u009B\u00AD\u200B\u2028\u2029\uFEFF\uFE00-\uFE0F`; const NON_VISIBLE_RE = new RegExp(`(?:${ANSI_ESCAPE})|[${NON_VISIBLE_RANGE}]`, 'g');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/src/utils/upload.ts (1)
560-577: Run the non-visible check on the original cell value.We trim the cell before validation, so leading/trailing markers like
\uFEFFare stripped and slip through. Calling the checker on the raw cell also avoids trimmingundefinedwhen rows are short.for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + //Check for non visible char in every row, return error if present + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ en: `Value at row ${rowNumber} and column '${key}' contains non-visible character(s) (U+${charCode})`,
♻️ Duplicate comments (1)
apps/web/src/utils/upload.ts (1)
915-928: Check the untrimmed cell in the Zod4 path as well.Same issue here: trimming before validation drops leading/trailing non-visible characters (e.g., BOM), so they pass the check.
const key = headers[i]!.trim(); const cell = elements[i]; const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') continue; // Return error if any non-visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
apps/web/src/utils/upload.ts (2)
570-580: Check raw cell value before trimming.Line 565 trims
elements[i]before line 571 checks for non-visible characters. Leading/trailing non-visible characters (e.g., BOM\uFEFF) are stripped by.trim()and won't be detected. The header validation (lines 529-530) correctly checks raw values—apply the same pattern here.Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) {
921-933: Check raw cell value before trimming.Line 922 trims the cell, then line 925 checks the trimmed
rawValue. Leading/trailing non-visible characters are lost. You've already captured the raw cell in thecellvariable on line 921—use it for the check.Apply this diff:
const key = headers[i]!.trim(); const cell = elements[i]; const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') continue; // Return error if any non‑visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) {
🧹 Nitpick comments (2)
apps/web/src/utils/upload.ts (2)
66-82: Document the rationale for the chosen non-visible character set.The regex pattern covers ANSI escapes, control characters, zero-width characters, BOM, and variation selectors. However, a past reviewer asked "Why only these five chars?" (referencing an earlier version). Consider adding a comment explaining which character classes are matched and why these were chosen, to help future maintainers understand the coverage.
529-549: Consider using singular forms in error messages.Since you're reporting a single character code (the first match), the messages would read more naturally in singular form:
- English: "contains non-visible character" (not "character(s)")
- French: "contient un caractère non visible" (not "des caractères non visible(s)")
This applies to lines 536-537 and 545-546.
…ining non-visible characters
…, made check a function
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
28fab6d to
10e8c78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
apps/web/src/utils/__tests__/upload.test.ts (4)
61-64: Field-order updates are fine; notetoMatchObjectignores order.Reflecting schema order (notes before score) is OK, but
toMatchObjectdoesn’t depend on property ordering. Keep if it improves readability.Also applies to: 77-80
64-64: DRY up repeated instrument stubs.Consider tiny helpers to avoid repeating
as unknown as AnyUnilingualFormInstrument:const inst3 = (schema: z3.ZodTypeAny) => ({ validationSchema: schema } as unknown as AnyUnilingualFormInstrument); const inst4 = (schema: z4.ZodTypeAny) => ({ validationSchema: schema } as unknown as AnyUnilingualFormInstrument);Then use
inst3(z3.object({...}))/inst4(z4.object({...})).Also applies to: 95-95, 117-117, 136-136, 203-203, 237-237, 256-256, 280-280
200-203: Zod4 reorder looks good; ordering in expectations is cosmetic.Switching to
feedbackbeforescoreand updating expectations is consistent. Reminder:toMatchObjectis order-agnostic.Also applies to: 216-219
61-63: Add tests for non‑visible character rejection (headers and cells).Given the PR goal, add focused cases for both Zod3 and Zod4:
- Header with zero‑width char → reject with codepoint in message (e.g., U+200B), include row/column info.
- Cell value with bidi/format char (e.g., U+200E) → reject similarly.
Example sketches:
it('rejects non-visible char in header (Zod3)', async () => { const instrument = inst3(z3.object({ score: z3.number() })); const csv = unparse([['subjectID', `date\u200B`, 'score'], ['s1','2024-01-01','10']]); const file = new File([csv], 'data.csv', { type: 'text/csv' }); await expect(Zod3.processInstrumentCSV(file, instrument)).rejects.toThrow(/U\+200B/i); }); it('rejects non-visible char in cell (Zod4)', async () => { const instrument = inst4(z4.object({ notes: z4.string() })); const csv = unparse([['subjectID','date','notes'], ['s1','2024-01-01', `ok\u200E`] ]); const file = new File([csv], 'data.csv', { type: 'text/csv' }); await expect(Zod4.processInstrumentCSV(file, instrument)).rejects.toThrow(/U\+200E/i); });Happy to add coverage for U+180E and localized (FR) messages as well.
Also applies to: 200-203
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/web/src/utils/__tests__/upload.test.ts(12 hunks)apps/web/src/utils/upload.ts(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/web/src/utils/upload.ts
🔇 Additional comments (2)
apps/web/src/utils/__tests__/upload.test.ts (2)
37-46: Nice regression guard on key normalization.Reversing object keys in the input and asserting
multiKeys: ['name','age']plus alignedmultiValuesis a good way to enforce deterministic ordering.
92-95: Optional-before-required case reads well.LGTM. This verifies empty string →
undefinedfor optional, and preserves the required field.Also applies to: 107-109
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
apps/web/src/utils/upload.ts (2)
572-582: Check raw cell before trimming to catch leading/trailing non-visible characters.Line 567 trims the cell, then line 573 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM
\uFEFF) are lost. This was flagged in past reviews but remains unaddressed.Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); - const rawValue = elements[i]!.trim(); + const cell = elements[i]; + // Check for non-visible character in raw cell before trimming + const nonVisibleChars = nonVisibleCharChecker(cell); + if (nonVisibleChars !== null) { + const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); + return reject( + new UploadError({ + en: `Value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})` + }) + ); + } + const rawValue = cell == null ? '' : cell.trim(); if (rawValue === '\n') { continue; } - - //Check for non visible char in every row, return error if present - const nonVisibleChars = nonVisibleCharChecker(rawValue); - if (nonVisibleChars !== null) { - const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); - return reject( - new UploadError({ - en: `Value at row ${rowNumber} and column '${key}' contains non-visible character(s) (U+${charCode})`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractère(s) non visibles (U+${charCode})` - }) - ); - }Based on learnings.
927-936: Check raw cell before trimming to catch leading/trailing non-visible characters.Line 924 trims the cell, then line 927 checks the trimmed value. Leading/trailing non-visible characters (e.g., BOM
\uFEFF) are lost. This was flagged in past reviews but remains unaddressed.Apply this diff:
for (let i = 0; i < headers.length; i++) { const key = headers[i]!.trim(); const cell = elements[i]; - const rawValue = cell == null ? '' : cell.trim(); - if (rawValue === '\n') continue; - // Return error if any non‑visible character is present - const nonVisibleChars = nonVisibleCharChecker(rawValue); + // Return error if any non-visible character is present + const nonVisibleChars = nonVisibleCharChecker(cell); if (nonVisibleChars !== null) { const charCode = nonVisibleChars[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Value at row ${rowNumber} and column '${key}' contains non-visible characters (U+${charCode})`, - fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient des caractères non visibles (U+${charCode})` + en: `Value at row ${rowNumber} and column '${key}' contains non-visible character (U+${charCode})`, + fr: `La valeur à la ligne ${rowNumber} et colonne '${key}' contient un caractère non visible (U+${charCode})` }) ); } + const rawValue = cell == null ? '' : cell.trim(); + if (rawValue === '\n') continue;Based on learnings.
🧹 Nitpick comments (2)
apps/web/src/utils/upload.ts (2)
534-551: Refine error messages for consistency.Two minor issues:
- Use singular "character" since you're displaying a single char code.
- Remove double space in French Date message before "(U+".
Apply this diff:
const charCode = regexResultSubject[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible character(s) (U+${charCode})`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visible(s) (U+${charCode})` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { const charCode = regexResultDate[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible character(s) (U+${charCode})`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visible(s) (U+${charCode})` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) );
891-908: Use singular "character" in error messages.Since you display a single char code, use singular "character" instead of plural "characters".
Apply this diff:
const charCode = regexResultSubject[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Subject ID at row ${rowNumber} contains non-visible characters (U+${charCode})`, - fr: `L'ID du sujet à la ligne ${rowNumber} contient des caractères non visibles (U+${charCode})` + en: `Subject ID at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `L'ID du sujet à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) ); } if (regexResultDate !== null) { const charCode = regexResultDate[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'); return reject( new UploadError({ - en: `Date at row ${rowNumber} contains non-visible characters (U+${charCode})`, - fr: `Date à la ligne ${rowNumber} contient des caractères non visibles (U+${charCode})` + en: `Date at row ${rowNumber} contains non-visible character (U+${charCode})`, + fr: `Date à la ligne ${rowNumber} contient un caractère non visible (U+${charCode})` }) );
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
PR to deal with any non-visible character present in a csv template uploaded
new function to check string entry for non-visible chars
display error if one is present
closes issue #1196
Summary by CodeRabbit
New Features
Bug Fixes
Localization
Tests