Conversation

@abdimo101
Member

@abdimo101 abdimo101 commented Jan 14, 2026

Description

This PR implements encoding and decoding of metadata keys in sampleCharacteristics to handle special characters in MongoDB.

  • Added @Transform decorators in OutputSampleDto and UpdateSampleDto to encode/decode metadata keys.
  • Updated samples.controller.ts and samples.service.ts to use the DTOs.
  • Created migration script 20260114145500-encode-sample-metadatakeys.js to encode existing sample metadata keys in the database.
  • Added tests for metadata keys in Sample.js.
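As a rough illustration of the idea, a sketch of what such key encoding/decoding might look like (the function names and the percent-style escape scheme are assumptions, not necessarily what the PR implements):

```javascript
// Hypothetical sketch of metadata-key encoding: escape the characters that
// cause trouble in MongoDB/Elasticsearch (dot, dollar, space) so keys
// round-trip losslessly. Names and escape scheme are illustrative only.
const ESCAPES = { "%": "%25", ".": "%2E", $: "%24", " ": "%20" };

function encodeMetadataKeys(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [
      key.replace(/[%.$ ]/g, (ch) => ESCAPES[ch]),
      // Recurse into nested metadata objects so deep keys are encoded too
      value && typeof value === "object" && !Array.isArray(value)
        ? encodeMetadataKeys(value)
        : value,
    ]),
  );
}

function decodeMetadataKeys(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [
      decodeURIComponent(key),
      value && typeof value === "object" && !Array.isArray(value)
        ? decodeMetadataKeys(value)
        : value,
    ]),
  );
}

// encodeMetadataKeys({ "beam.energy": 42 })  -> { "beam%2Eenergy": 42 }
```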

Motivation

Fixes

  • Bug fixed (#X)

Changes:

  • changes made

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

@abdimo101 abdimo101 requested a review from a team as a code owner January 14, 2026 14:55
@Junjiequan Junjiequan force-pushed the encode-decode-samples-metadatakeys branch from a0d4f67 to 5aadb60 on January 16, 2026 08:56
Comment on lines 16 to 36
      try {
        encodedMetadata = encodeScientificMetadataKeys(metadata);
      } catch (err) {
        console.error(
          `Error encoding sampleCharacteristics for Sample (Id: ${sample._id}):`,
          err,
        );
        continue;
      }

      console.log(
        `Updating Sample (Id: ${sample._id}) with encoded sampleCharacteristics keys`,
      );
      await db
        .collection("Sample")
        .updateOne(
          { _id: sample._id },
          { $set: { sampleCharacteristics: encodedMetadata } },
        );
    };
  },
Member

Can you check if bulkWrite can be used here for better performance?

Member Author

I now use bulkWrite for both the dataset and sample migration scripts.
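For reference, a sketch of how the per-document updateOne calls from the snippet above could be batched (collection, field, and helper names are taken from that snippet; the op-builder itself is illustrative):

```javascript
// Collect one updateOne operation per sample instead of awaiting each update
// individually; a single bulkWrite then needs one round trip to MongoDB.
// `encodeScientificMetadataKeys` is the helper from the migration script and
// is only assumed to exist here with that name.
function buildBulkOps(samples, encodeScientificMetadataKeys) {
  const ops = [];
  for (const sample of samples) {
    let encodedMetadata;
    try {
      encodedMetadata = encodeScientificMetadataKeys(sample.sampleCharacteristics);
    } catch (err) {
      // Skip documents that fail to encode, as the original loop does
      console.error(
        `Error encoding sampleCharacteristics for Sample (Id: ${sample._id}):`,
        err,
      );
      continue;
    }
    ops.push({
      updateOne: {
        filter: { _id: sample._id },
        update: { $set: { sampleCharacteristics: encodedMetadata } },
      },
    });
  }
  return ops;
}

// One bulk call instead of one update per document:
// await db.collection("Sample").bulkWrite(buildBulkOps(samples, encodeScientificMetadataKeys));
```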

@minottic
Member

minottic commented Jan 26, 2026

I have to admit I never fully understood why this was needed for datasets... since I see this here now, could you please explain to me why we need this?

@abdimo101
Member Author

> I have to admit I never fully understood why this was needed for datasets... since I see this here now, could you please explain to me why we need this?

Hi @minottic, encoding is needed to handle special characters in metadata key names that break MongoDB and Elasticsearch. MongoDB does not allow dots in field names, and Elasticsearch has issues with spaces. Encoding these characters ensures that metadata keys work correctly regardless of how users name them.
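The dot case can be illustrated without SciCat at all: MongoDB's query language treats a dot in a field name as a path separator, so a literal key like "beam.energy" can never be addressed by a dotted query. A small plain-JavaScript sketch mimicking how Mongo resolves a dotted path:

```javascript
// Mimics MongoDB's dotted-path resolution: the path is walked segment by
// segment, so a literal top-level key "beam.energy" is never reached.
function resolveDottedPath(doc, path) {
  return path
    .split(".")
    .reduce((node, segment) => (node == null ? undefined : node[segment]), doc);
}

const nested = { beam: { energy: 42 } };   // what a dotted query can reach
const literal = { "beam.energy": 42 };     // what a user-supplied key stores

// resolveDottedPath(nested, "beam.energy")  -> 42
// resolveDottedPath(literal, "beam.energy") -> undefined
```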

@minottic
Member

minottic commented Jan 26, 2026

> Hi @minottic, encoding is needed to handle special characters in metadata key names that break MongoDB and Elasticsearch. MongoDB does not allow dots in field names, and Elasticsearch has issues with spaces. Encoding these characters ensures that metadata keys work correctly regardless of how users name them.

thanks for the explanation! but how did they end up in Mongo then (you wouldn't need the migration scripts if Mongo had failed to store them)?

@abdimo101
Member Author

> Hi @minottic, encoding is needed to handle special characters in metadata key names that break MongoDB and Elasticsearch. MongoDB does not allow dots in field names, and Elasticsearch has issues with spaces. Encoding these characters ensures that metadata keys work correctly regardless of how users name them.

> thanks for the explanation! but how did they end up in Mongo then (you wouldn't need the migration scripts if Mongo had failed to store them)?

MongoDB does store them, but it has limitations when field names contain dots or even dollar signs. For example, in the SciCat frontend, if a metadata key contains a dot, users won't be able to perform condition-filter searches using that specific metadata key.

@minottic
Member

> MongoDB does store them, but it has limitations when field names contain dots or even dollar signs. For example, in the SciCat frontend, if a metadata key contains a dot, users won't be able to perform condition-filter searches using that specific metadata key.

but why do you need the migration then? Isn't it enough to translate at run time for searches and responses?

@Junjiequan
Member

@minottic the migration is for old scientific metadata keys that contain dots, dollar signs, etc.
Mongo has poor support for keys that contain special characters.
Encode on write, decode on read is what we thought would be easier to maintain. Runtime query transformation can do the same job, but then we may need to include the transformer in many places, such as find, update, delete, aggregate, etc.
That being said, the migration script will be very slow for a large number of datasets, and it seems to be causing some trouble on your side... so if you have a better idea we could also test that out.
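To make the trade-off concrete, the runtime alternative would mean rewriting keys inside every incoming filter before it reaches Mongo, in each of those call sites. A hypothetical sketch (encodeKey and the filter shape are assumptions, not SciCat code):

```javascript
// Hypothetical runtime query transformation: rewrite metadata keys in a
// filter object just before querying. This logic would have to be wired into
// find, update, delete, aggregate, etc., which is the maintenance cost of
// this approach compared to encode-on-write/decode-on-read.
const PREFIX = "sampleCharacteristics.";

function encodeFilterKeys(filter, encodeKey) {
  return Object.fromEntries(
    Object.entries(filter).map(([key, value]) => [
      // Only rewrite keys under sampleCharacteristics; leave other fields and
      // operators alone. A real implementation would also have to decide
      // whether a dot in the remainder is a path separator or part of the
      // user's key.
      key.startsWith(PREFIX) ? PREFIX + encodeKey(key.slice(PREFIX.length)) : key,
      value,
    ]),
  );
}
```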

@minottic
Member

> @minottic the migration is for old scientific metadata keys that contain dots, dollar signs, etc.
> Mongo has poor support for keys that contain special characters.
> Encode on write, decode on read is what we thought would be easier to maintain. Runtime query transformation can do the same job, but then we may need to include the transformer in many places, such as find, update, delete, aggregate, etc.
> That being said, the migration script will be very slow for a large number of datasets, and it seems to be causing some trouble on your side... so if you have a better idea we could also test that out.

thanks for the explanation! It was indeed quite slow for us (thus the question), but now it's done, so I am ultimately fine with it (and also thanks for the bulkWrite!). We have very few samples, so this new migration should not be a problem for us.

Member

@minottic minottic left a comment


a small cosmetic comment, up to you whether to try it or not

const samplesObj = (samples as SampleDocument[]).map((sample) =>
  sample.toObject(),
);
return plainToInstance(OutputSampleDto, samplesObj);
Member

if you want to avoid the explicit call to plainToInstance and be more "NestJS native", you could use a serializer (thus simply returning samples). See here for example
