Conversation

@abdimo101
Member

@abdimo101 abdimo101 commented Jan 14, 2026

Description

This PR implements encoding and decoding of metadata keys in sampleCharacteristics to handle special characters in MongoDB.

  • Added @Transform decorators in OutputSampleDto and UpdateSampleDto to encode/decode metadata keys.
  • Updated samples.controller.ts and samples.service.ts to use the DTOs.
  • Created migration script 20260114145500-encode-sample-metadatakeys.js to encode existing sample metadata keys in the database.
  • Added tests for metadata keys in Sample.js.
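As a rough illustration of the idea, a sketch of what such key encoding/decoding might look like (the function names and the percent-style escape scheme are assumptions, not necessarily what the PR implements):

```javascript
// Hypothetical sketch of metadata-key encoding: escape the characters that
// cause trouble in MongoDB/Elasticsearch (dot, dollar, space) so keys
// round-trip losslessly. Names and escape scheme are illustrative only.
const ESCAPES = { "%": "%25", ".": "%2E", $: "%24", " ": "%20" };

function encodeMetadataKeys(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [
      key.replace(/[%.$ ]/g, (ch) => ESCAPES[ch]),
      // Recurse into nested metadata objects so deep keys are encoded too
      value && typeof value === "object" && !Array.isArray(value)
        ? encodeMetadataKeys(value)
        : value,
    ]),
  );
}

function decodeMetadataKeys(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [
      decodeURIComponent(key),
      value && typeof value === "object" && !Array.isArray(value)
        ? decodeMetadataKeys(value)
        : value,
    ]),
  );
}

// encodeMetadataKeys({ "beam.energy": 42 })  -> { "beam%2Eenergy": 42 }
```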

Motivation

Fixes

  • Bug fixed (#X)

Changes:

  • changes made

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

@abdimo101 abdimo101 requested a review from a team as a code owner January 14, 2026 14:55
@Junjiequan Junjiequan force-pushed the encode-decode-samples-metadatakeys branch from a0d4f67 to 5aadb60 on January 16, 2026 08:56
Comment on lines 16 to 36
      try {
        encodedMetadata = encodeScientificMetadataKeys(metadata);
      } catch (err) {
        console.error(
          `Error encoding sampleCharacteristics for Sample (Id: ${sample._id}):`,
          err,
        );
        continue;
      }

      console.log(
        `Updating Sample (Id: ${sample._id}) with encoded sampleCharacteristics keys`,
      );
      await db
        .collection("Sample")
        .updateOne(
          { _id: sample._id },
          { $set: { sampleCharacteristics: encodedMetadata } },
        );
    };
  },
Member

Can you check if bulkWrite can be used here for better performance?

Member Author

I now use bulkWrite for both the dataset and sample migration scripts.
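For reference, a sketch of how the per-document updateOne calls from the snippet above could be batched (collection, field, and helper names are taken from that snippet; the op-builder itself is illustrative):

```javascript
// Collect one updateOne operation per sample instead of awaiting each update
// individually; a single bulkWrite then needs one round trip to MongoDB.
// `encodeScientificMetadataKeys` is the helper from the migration script and
// is only assumed to exist here with that name.
function buildBulkOps(samples, encodeScientificMetadataKeys) {
  const ops = [];
  for (const sample of samples) {
    let encodedMetadata;
    try {
      encodedMetadata = encodeScientificMetadataKeys(sample.sampleCharacteristics);
    } catch (err) {
      // Skip documents that fail to encode, as the original loop does
      console.error(
        `Error encoding sampleCharacteristics for Sample (Id: ${sample._id}):`,
        err,
      );
      continue;
    }
    ops.push({
      updateOne: {
        filter: { _id: sample._id },
        update: { $set: { sampleCharacteristics: encodedMetadata } },
      },
    });
  }
  return ops;
}

// One bulk call instead of one update per document:
// await db.collection("Sample").bulkWrite(buildBulkOps(samples, encodeScientificMetadataKeys));
```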

@minottic
Member

minottic commented Jan 26, 2026

I have to admit I never fully understood why this was needed for datasets... since I see this here now, could you please explain to me why we need this?

@abdimo101
Member Author

> I have to admit I never fully understood why this was needed for datasets... since I see this here now, could you please explain to me why we need this?

Hi @minottic, encoding is needed to handle special characters in metadata key names that break MongoDB and Elasticsearch. MongoDB does not allow dots in field names, and Elasticsearch has issues with spaces. Encoding these characters ensures that metadata keys work correctly regardless of how users name them.
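The dot case can be illustrated without SciCat at all: MongoDB's query language treats a dot in a field name as a path separator, so a literal key like "beam.energy" can never be addressed by a dotted query. A small plain-JavaScript sketch mimicking how Mongo resolves a dotted path:

```javascript
// Mimics MongoDB's dotted-path resolution: the path is walked segment by
// segment, so a literal top-level key "beam.energy" is never reached.
function resolveDottedPath(doc, path) {
  return path
    .split(".")
    .reduce((node, segment) => (node == null ? undefined : node[segment]), doc);
}

const nested = { beam: { energy: 42 } };   // what a dotted query can reach
const literal = { "beam.energy": 42 };     // what a user-supplied key stores

// resolveDottedPath(nested, "beam.energy")  -> 42
// resolveDottedPath(literal, "beam.energy") -> undefined
```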

@minottic
Member

minottic commented Jan 26, 2026

> Hi @minottic, encoding is needed to handle special characters in metadata key names that break MongoDB and Elasticsearch. MongoDB does not allow dots in field names, and Elasticsearch has issues with spaces. Encoding these characters ensures that metadata keys work correctly regardless of how users name them.

thanks for the explanation! but how did they end up in Mongo then (you wouldn't need the migration scripts if Mongo had failed to store them)?

@abdimo101
Member Author

> Hi @minottic, encoding is needed to handle special characters in metadata key names that break MongoDB and Elasticsearch. MongoDB does not allow dots in field names, and Elasticsearch has issues with spaces. Encoding these characters ensures that metadata keys work correctly regardless of how users name them.

> thanks for the explanation! but how did they end up in Mongo then (you wouldn't need the migration scripts if Mongo had failed to store them)?

MongoDB does store them, but it has limitations when field names contain dots or even dollar signs. For example, in the SciCat frontend, if a metadata key contains a dot, users won't be able to perform condition-filter searches using that specific metadata key.

@minottic
Member

> MongoDB does store them, but it has limitations when field names contain dots or even dollar signs. For example, in the SciCat frontend, if a metadata key contains a dot, users won't be able to perform condition-filter searches using that specific metadata key.

but why do you need the migration then? Isn't it enough to translate at run time for searches and responses?

@Junjiequan
Member

@minottic the migration is for old scientific metadata keys that contain dots, dollar signs, etc.
Mongo has poor support for keys that contain special characters.
Encode on write, decode on read is what we thought would be easier to maintain. Runtime query transformation can do the same job, but then we may need to include the transformer in many places, such as find, update, delete, aggregate, etc.
That being said, the migration script will be very slow for a large number of datasets, and it seems to be causing some trouble on your side... so if you have a better idea we could also test that out.
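To make the trade-off concrete, the runtime alternative would mean rewriting keys inside every incoming filter before it reaches Mongo, in each of those call sites. A hypothetical sketch (encodeKey and the filter shape are assumptions, not SciCat code):

```javascript
// Hypothetical runtime query transformation: rewrite metadata keys in a
// filter object just before querying. This logic would have to be wired into
// find, update, delete, aggregate, etc., which is the maintenance cost of
// this approach compared to encode-on-write/decode-on-read.
const PREFIX = "sampleCharacteristics.";

function encodeFilterKeys(filter, encodeKey) {
  return Object.fromEntries(
    Object.entries(filter).map(([key, value]) => [
      // Only rewrite keys under sampleCharacteristics; leave other fields and
      // operators alone. A real implementation would also have to decide
      // whether a dot in the remainder is a path separator or part of the
      // user's key.
      key.startsWith(PREFIX) ? PREFIX + encodeKey(key.slice(PREFIX.length)) : key,
      value,
    ]),
  );
}
```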

@minottic
Member

> @minottic the migration is for old scientific metadata keys that contain dots, dollar signs, etc.
> Mongo has poor support for keys that contain special characters.
> Encode on write, decode on read is what we thought would be easier to maintain. Runtime query transformation can do the same job, but then we may need to include the transformer in many places, such as find, update, delete, aggregate, etc.
> That being said, the migration script will be very slow for a large number of datasets, and it seems to be causing some trouble on your side... so if you have a better idea we could also test that out.

thanks for the explanation! It was indeed quite slow for us (thus the question), but now it's done, so I am ultimately fine with it (and also thanks for the bulkWrite!). We have very few samples, so this new migration should not be a problem for us.

Member

@minottic minottic left a comment


a small cosmetic comment, up to you whether to try it or not

const samplesObj = (samples as SampleDocument[]).map((sample) =>
  sample.toObject(),
);
return plainToInstance(OutputSampleDto, samplesObj);
Member

if you want to avoid the explicit call to plainToInstance and be more "NestJS native", you could use a serializer (thus simply returning samples). See here for example
