-
Notifications
You must be signed in to change notification settings - Fork 33
refactor: add encoding/decoding for samples metadata keys #2469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…tProject/scicat-backend-next into encode-decode-samples-metadatakeys
a0d4f67 to
5aadb60
Compare
| try { | ||
| encodedMetadata = encodeScientificMetadataKeys(metadata); | ||
| } catch (err) { | ||
| console.error( | ||
| `Error encoding sampleCharacteristics for Sample (Id: ${sample._id}):`, | ||
| err, | ||
| ); | ||
| continue; | ||
| } | ||
|
|
||
| console.log( | ||
| `Updating Sample (Id: ${sample._id}) with encoded sampleCharacteristics keys`, | ||
| ); | ||
| await db | ||
| .collection("Sample") | ||
| .updateOne( | ||
| { _id: sample._id }, | ||
| { $set: { sampleCharacteristics: encodedMetadata } }, | ||
| ); | ||
| }; | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if bulkWrite can be used here for better performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know use bulkWrite for both dataset and sample migrations scripts
|
I have to admit I never fully understood why this was needed for datasets... since I see this here now, could you please explain me why we need this? |
Hi @minottic, the reason for why encoding is needed, is to handle special characters in metadata key names that break MongoDB and Elasticsearch. MongoDB does not allow dots in field names, and Elasticsearch has issues with spaces. By encoding these characters, it ensures that metadata keys work correctly regardless of how users name them. |
thanks for the explanation! but how did they end up in mongo then (you wouldn't need the migration scripts if mongo did not failed to store them) ? |
MongoDB does store them, but it has limitations when field names contain dots or even dollar signs. For example in SciCat frontend, if a metadata key contains a dot, users wont be able to perform condition filter searches using that specific metadata key. |
but why do you need the migration then? Isn't it enough to translate at run time for search and returns? |
|
@minottic the migration is for old scientific Metada keys that contains dot or dollar etc. |
thanks for the explanation! It was indeed quite slow for us - thus the question -, but now it's done, so I am ultimately fine with it (and also thanks for the bulkWrite!). We have very few samples, so this new migration should not be a problem for us |
minottic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a small cosmetic comment, up to you if to try it or not
| const samplesObj = (samples as SampleDocument[]).map((sample) => | ||
| sample.toObject(), | ||
| ); | ||
| return plainToInstance(OutputSampleDto, samplesObj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to avoid the explicit call to plainToInstance and be more "nestJS native", you could use a serializer (thus simply returning samples). See here for example
Description
This PR implements encoding and decoding of metadata keys in
sampleCharacteristicsto handle special characters in MongoDB.@Transformdecorators inOutputSampleDtoandUpdateSampleDtoto encode/decode metadata keyssamples.controller.tsandsamples.service.tsto use DTOs.20260114145500-encode-sample-metadatakeys.jsto encode existing sample metadata keys in the databaseSample.jsMotivation
Fixes
Changes:
Tests included
Documentation
official documentation info