
fix(editor): sanitize input spaces in data entry (BB-602)#1211

Open
AkshayJalageri wants to merge 1 commit into metabrainz:master from AkshayJalageri:BB-602-sanitize-inputs


@AkshayJalageri

Problem

As described in BB-602, user input in forms currently retains unnecessary whitespace. This leads to database entries with leading/trailing spaces or inconsistent double spacing in names and titles.

Solution

I have implemented a sanitization helper that processes form data before submission.

  • Trims leading and trailing whitespace.
  • Normalizes multiple spaces (including fullwidth spaces \u3000) into single spaces.
  • Recursively cleans nested objects and arrays.
  • Exception: The annotation field is explicitly skipped to preserve Markdown formatting.
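The list above can be sketched as a standalone TypeScript function. This is a minimal illustration under stated assumptions, not the PR's actual code: FormValue, normalizeSpaces and sanitizeFormValue are hypothetical names, and the keys to leave untouched are passed in as a parameter because the exact form-state key for annotations is an assumption here (the review below questions whether it is annotation or annotationSection).

```typescript
// Hypothetical value type for form state: strings, other primitives,
// arrays and plain objects, nested to any depth.
type FormValue =
	| string
	| number
	| boolean
	| null
	| undefined
	| FormValue[]
	| {[key: string]: FormValue};

// Collapse every run of whitespace into one U+0020 and trim both ends.
// JavaScript's \s already matches the fullwidth space U+3000.
function normalizeSpaces(text: string): string {
	return text.replace(/\s+/g, ' ').trim();
}

// Recursively normalize every string; subtrees under `preserveKeys`
// (assumed here to hold Markdown annotations) are copied unchanged.
function sanitizeFormValue(value: FormValue, preserveKeys: readonly string[]): FormValue {
	if (typeof value === 'string') {
		return normalizeSpaces(value);
	}
	if (Array.isArray(value)) {
		return value.map((item) => sanitizeFormValue(item, preserveKeys));
	}
	if (value !== null && typeof value === 'object') {
		const result: {[key: string]: FormValue} = {};
		for (const key of Object.keys(value)) {
			result[key] = preserveKeys.indexOf(key) !== -1 ?
				value[key] :
				sanitizeFormValue(value[key], preserveKeys);
		}
		return result;
	}
	// Numbers, booleans, null and undefined are returned unchanged.
	return value;
}

const cleaned = sanitizeFormValue({name: '  Foo\u3000Bar '}, ['annotationSection']);
// cleaned is {name: 'Foo Bar'}
```

Passing the preserved keys in, rather than hard-coding one name inside the walker, keeps the skipped subtree easy to adjust once the real form-state key is confirmed.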

Ticket

https://tickets.metabrainz.org/browse/BB-602

Copilot AI review requested due to automatic review settings December 22, 2025 13:39

Copilot AI left a comment


Pull request overview

This PR implements input sanitization for form data to prevent database entries with leading/trailing whitespace and inconsistent spacing. The changes introduce a sanitizeInput function that trims whitespace, normalizes multiple consecutive spaces (including fullwidth spaces), and recursively applies this sanitization to all form data before submission, with an intended exception for annotation fields to preserve Markdown formatting.

Key Changes

  • Added sanitizeInput function to normalize whitespace in text inputs
  • Implemented recursive sanitizePayload function to sanitize nested objects and arrays
  • Applied sanitization in all three submission paths: postSubmission, postUFSubmission, and submitSingleEntity

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Changed files:
  • src/client/helpers/utils.tsx: Adds the sanitizeInput utility function that normalizes whitespace and trims strings
  • src/client/entity-editor/submission-section/actions.ts: Implements sanitizePayload to recursively sanitize form data and integrates it into all submission workflows


}
if (_.isPlainObject(data)) {
return _.mapValues(data, (value, key) => {
if (key === 'annotation') { return value; }

Copilot AI Dec 22, 2025


The check for 'annotation' is incorrect. Based on the codebase structure, the actual key is 'annotationSection', not 'annotation'. The annotationSection object contains a 'content' property. This means annotation content will be sanitized, contradicting the stated intention to preserve formatting in annotations. The check should be 'annotationSection' instead of 'annotation'.

Copilot uses AI. Check for mistakes.
Member


Like Copilot says, I think this is incorrect.

Comment on lines +287 to +296
export function sanitizeInput(text: string): string {
  if (!text || typeof text !== 'string') {
    return '';
  }

  return text
    .replace(/\u3000/g, ' ') // Convert fullwidth space to normal space
    .replace(/\s+/g, ' ') // Merge double/multiple spaces into one
    .trim(); // Remove start/end spaces
}
No newline at end of file

Copilot AI Dec 22, 2025


This function uses space indentation instead of tab indentation, which violates the codebase conventions defined in .editorconfig. According to .editorconfig, .tsx files should use tab indentation (indent_style = tab). Please reformat this function to use tabs for consistency with the rest of the file.

Member

MonkeyDo left a comment


Hi, thanks for opening a PR!

All the unrelated auto-formatting changes make it quite hard to see what the actual changes are, but here is some preliminary feedback on what I could clearly see.

export const SET_SUBMITTED = 'SET_SUBMITTED';

export type Action = {
type: string,
Member


All those unrelated formatting changes are unwanted and should be removed.

*/
export function sanitizeInput(text: string): string {
if (!text || typeof text !== 'string') {
return '';
Member


Should this probably return text; instead?
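To make the question concrete, here is a hedged side-by-side sketch with hypothetical function names: the first function reproduces the guard from the PR's sanitizeInput, and the second changes only the early return to return text;. If a non-string value ever reaches the function at runtime (for example through a caller that does not check types first, which is an assumption here), the original guard silently turns it into an empty string.

```typescript
// Guard as written in the PR: falsy or non-string input becomes ''.
function sanitizeInputAsSubmitted(text: string): string {
	if (!text || typeof text !== 'string') {
		return '';
	}
	return text
		.replace(/\u3000/g, ' ')
		.replace(/\s+/g, ' ')
		.trim();
}

// Suggested variant: return the input unchanged instead, so '', null,
// undefined or a stray number pass through as they are.
function sanitizeInputSuggested(text: string): string {
	if (!text || typeof text !== 'string') {
		return text;
	}
	return text
		.replace(/\u3000/g, ' ')
		.replace(/\s+/g, ' ')
		.trim();
}

// Simulate an untyped value reaching the function at runtime.
const stray = 2025 as unknown as string;
sanitizeInputAsSubmitted(stray); // returns ''
sanitizeInputSuggested(stray); // returns 2025 (still a number)
```

The existing sanitize() helper quoted further down in this thread follows the same pass-through pattern with its isString check.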

* Recursively sanitizes data object strings.
* Skips 'annotation' field to preserve formatting.
*/
function sanitizePayload(data: any): any {
Member


So we already have a method to sanitize text, which a simple search of the codebase would have turned up...

Changes should be implemented there (I think only the trimming?) instead of creating a new method.

/**
 * This function replaces other whitespace characters with U+0020 and trims extra spaces
 * @param {string} text - text to sanitize
 * @returns {string} - sanitized text
 */
export function collapseWhiteSpaces(text:string):string {
	// replace any whitespace characters
	const spaceRegex = RegExp(/\s+/gi);
	const sanitizedText = text.replace(spaceRegex, '\u0020');
	return sanitizedText.trim();
}
/**
 * This function sanitizes text inputs
 * @param {string} text - text to sanitize
 * @returns {string} - sanitized text
 */
export function sanitize(text:string):string {
	if (!isString(text)) {
		return text;
	}
	// Unicode normalization (NFC) to convert text into its canonical composed form
	let sanitizeText = text.normalize('NFC');
	sanitizeText = collapseWhiteSpaces(sanitizeText);
	// eslint-disable-next-line no-control-regex
	// https://www.w3.org/TR/xml/#charsets remove invalid xml characters
	const invalidXMLRgx = RegExp(/[^\u0020-\uD7FF\uE000-\uFFFD]/gi);
	sanitizeText = sanitizeText.replace(invalidXMLRgx, '');
	// get rid of all control characters
	const ccRegex = RegExp(/[\u200B\u00AD\p{Cc}]/gu);
	sanitizeText = sanitizeText.replace(ccRegex, '');
	sanitizeText = collapseWhiteSpaces(sanitizeText);
	return sanitizeText;
}
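A quick check of the existing helper, copied into a standalone snippet, suggests it already covers the whitespace cases the PR targets: JavaScript's \s matches the fullwidth space U+3000 (a Unicode space separator), and collapseWhiteSpaces already calls trim(). Whether sanitize() is applied to every form field is not shown in this thread and remains an assumption.

```typescript
// Standalone copy of the existing collapseWhiteSpaces helper quoted above.
// The literal /\s+/g behaves the same as the original RegExp(/\s+/gi):
// the `i` flag has no effect on \s.
function collapseWhiteSpaces(text: string): string {
	// \s matches Unicode space separators, including U+3000, plus tabs and newlines.
	const spaceRegex = /\s+/g;
	const sanitizedText = text.replace(spaceRegex, '\u0020');
	return sanitizedText.trim();
}

const fullwidth = '\u3000 Foo\u3000\u3000Bar  Baz \u3000';
collapseWhiteSpaces(fullwidth); // returns 'Foo Bar Baz'
collapseWhiteSpaces('Foo\t\nBar'); // returns 'Foo Bar'
```

If this holds, the separate .replace(/\u3000/g, ' ') step in the PR's sanitizeInput is redundant with the existing helper.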

Additionally, the payload is already processed on the server side, which future-proofs this in case we ever want to allow submitting data through an API.

return body;
}
export async function processSingleEntity(formBody, JSONEntity, reqSession,
entityType, orm:any, editorJSON, derivedProps, isMergeOperation, transacting):Promise<any> {
const {Entity, Revision} = orm;
let body = sanitizeBody(formBody);
let currentEntity: {
aliasSet: {id: number} | null | undefined,
annotation: {id: number} | null | undefined,
bbid: string,
disambiguation: {id: number} | null | undefined,
identifierSet: {id: number} | null | undefined,
type: EntityTypeString
} | null | undefined = JSONEntity;
try {
// Determine if a new entity is being created
const isNew = !currentEntity;
// sanitize namesection inputs
body = sanitizeBody(body);
if (isNew) {
const newEntity = await new Entity({type: entityType})
.save(null, {transacting});
const newEntityBBID = newEntity.get('bbid');
body.relationships = _.map(
body.relationships,
({sourceBbid, targetBbid, ...others}) => ({

@MonkeyDo
Member

MonkeyDo commented Feb 9, 2026

I will also add this: we require you to disclose any use of LLM/AI agents; see our contribution guidelines.


Labels: None yet

Projects: None yet


2 participants
