Metadata JSON design by khetherin · Pull Request #30 · EBIvariation/convertGVFtoVCF

khetherin · 2026-02-12T16:40:04Z

No description provided.

tcezard

It's good progress, but there are several aspects that need to be improved.

convert_gvf_to_vcf/metadataJSON.py

tcezard · 2026-02-25T23:24:04Z

convert_gvf_to_vcf/metadataJSON.py

+        except Exception as e:
+            logger.warning(f"Database error: {e}")
+            logger.warning("Rolling back")
+            # rollback failed transaction
+            self.connection.rollback()
+            return {}


If you're not writing to the database, there is nothing to roll back.
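A minimal sketch of this point, assuming a purely read-only helper: on failure it logs and returns an empty result, with no rollback, because a plain SELECT changes nothing that would need undoing. The standard-library `sqlite3` module stands in for the project's real database driver, and `load_rows` and the `study` table are hypothetical names, not the PR's `load_from_db`.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def load_rows(connection, sql, params=()):
    """Run a read-only query and return its rows as a list of tuples.

    Hypothetical helper for illustration. On a database error it logs a
    warning and returns an empty list; no rollback is issued because the
    query does not modify any data.
    """
    try:
        cursor = connection.execute(sql, params)
        return cursor.fetchall()
    except sqlite3.Error as e:
        logger.warning("Database error: %s", e)
        return []


# In-memory SQLite database used purely as a stand-in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (accession TEXT, title TEXT)")
conn.execute("INSERT INTO study VALUES ('S1', 'Example study')")
conn.commit()

rows = load_rows(conn, "SELECT title FROM study WHERE accession = ?", ("S1",))
missing_table = load_rows(conn, "SELECT title FROM no_such_table")
```

Returning an empty result still hides the failure from the caller; whether to re-raise instead is a separate design choice.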

tcezard · 2026-02-25T23:26:21Z

convert_gvf_to_vcf/metadataJSON.py

+
+        )
+        project_title_dict = self.load_from_db(project_title_query.get_sql(quote_char=None))
+        project_title = next(iter(project_title_dict.values()))[0]


See my comment in load_from_db, but this seems very complicated for just getting the first element of a list.

Also, you never check whether the list contains anything; an empty list would raise an exception.

Change has been implemented in c0e2cd7 using validate_fetch_result to handle None.
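The check being asked for can be sketched as below. This is not the PR's `validate_fetch_result`, whose body is not shown here; `first_value` is a hypothetical helper, and it assumes the query result is a dict mapping column names to lists of values, which is the shape the `next(iter(...values()))[0]` expression implies.

```python
def first_value(field_name, result):
    """Return the first value of the first column in a query result.

    Assumes `result` maps column names to lists of values, for example
    {"TITLE": ["Example study"]}. Raises ValueError when the result is
    None, empty, or its first column has no rows.
    """
    if not result:
        raise ValueError(f"No result returned for {field_name}")
    first_column = next(iter(result.values()))
    if not first_column:
        raise ValueError(f"No rows returned for {field_name}")
    return first_column[0]


title = first_value("title", {"TITLE": ["Example study"]})
```

Raising a named error keeps the caller from hitting a bare `StopIteration` or `IndexError` far from the query that produced the empty result.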

tcezard · 2026-02-25T23:28:08Z

convert_gvf_to_vcf/metadataJSON.py

+    def _get_sample_pre_registered(self, study_accession, biosample_accession):
+        # requires analysisAlias, sampleinVCF, biosample_accession
+        sample_analysis_alias = self._fetch_analysis_alias(study_accession)
+        sample_sampleinvcf = "UNSPECIFIED_SAMPLE_IN_VCF"


We need to find the actual sample name otherwise we cannot link this sample to the data.

This change has been implemented in commit 27aa782 under the assumption sampleinVCF = sample_id.

khetherin · 2026-03-05T14:31:00Z

convert_gvf_to_vcf/metadataJSON.py

+        experiment_type = self.validate_fetch_result("experimentType", experiment_type_dict)
+        return experiment_type
+
+    def _fetch_reference_genome(self, study_accession):


Agree that this needs to be resolved. It is outside the scope of this ticket and will be addressed in another ticket (EVA-4096).

khetherin · 2026-03-05T14:32:51Z

convert_gvf_to_vcf/metadataJSON.py

+
+        for sample_status in sample_registration_statuses:
+            if sample_status.is_sample_preregistered:
+                sample_metadata = self._get_sample_pre_registered(study_accession, sample_status.sample_accession, sample_status.sample_id)


Agree with change on attribute error for namedtuple. This has been implemented in commit 847075d.
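For reference, a short illustration of how namedtuple fields are read. The reviewer's original remark is not shown here, so this is a general sketch rather than a reproduction of the bug; the `SampleStatus` type and its values are hypothetical, with field names mirroring the attributes used in the snippet above.

```python
from collections import namedtuple

# Hypothetical record type; the field names mirror the attributes used above.
SampleStatus = namedtuple(
    "SampleStatus", ["sample_id", "sample_accession", "is_sample_preregistered"]
)

status = SampleStatus("sample1", "ACCESSION_PLACEHOLDER", True)

# Fields are read with dot notation or by integer position.
accession = status.sample_accession
same_accession = status[1]

# String keys are not supported: tuples accept only integers or slices.
try:
    status["sample_accession"]
    string_key_error = None
except TypeError as e:
    string_key_error = type(e).__name__

# A misspelt field name raises AttributeError.
try:
    status.sample_acession
    misspelt_error = None
except AttributeError as e:
    misspelt_error = type(e).__name__
```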

khetherin · 2026-03-05T14:34:18Z

convert_gvf_to_vcf/metadataJSON.py

+                                   .select(ds.BIOPROJECT_ACCESSION)
+                                   .where(ds.STUDY_ACCESSION == study_accession)
+                                   )
+        project_accession_dict = self.load_from_db(project_accession_query.get_sql(quote_char=None))


Agree with what to do if project_accession_dict is empty. This has been implemented in commit c68f41b.

khetherin · 2026-03-05T14:37:39Z

convert_gvf_to_vcf/metadataJSON.py

+                                   .where(ds.STUDY_ACCESSION == study_accession)
+                                   )
+        project_accession_dict = self.load_from_db(project_accession_query.get_sql(quote_char=None))
+        project_accession = self.validate_fetch_result("projectAccession", project_accession_dict)


Change implemented in 0dc8338 using validate_fetch_result to handle None.

khetherin · 2026-03-05T14:40:49Z

convert_gvf_to_vcf/metadataJSON.py

+        analysis_description = self.validate_fetch_result("description", analysis_description_dict)
+        return analysis_description
+
+    def _fetch_experiment_type(self, study_accession):


Agree with the comment that this is not validated against the EVA enum. I suggest this is outside the scope of this ticket.

khetherin · 2026-03-05T14:44:06Z

convert_gvf_to_vcf/convertGVFtoVCF.py

+    parser.add_argument("study_accession", help="DGVa Study Accession")
    parser.add_argument("-a", "--assembly", help="FASTA assembly file")
    parser.add_argument("--log", help="Path to log file")
+    parser.add_argument("--config", help="Path to config file")


--config is optional but was treated as required: this has been fixed in commit a0b3d81 by checking for the presence of config before obtaining metadata.

khetherin · 2026-03-05T14:45:37Z

convert_gvf_to_vcf/convertGVFtoVCF.py

    parser = argparse.ArgumentParser()
    parser.add_argument("gvf_input", help="GVF input file.")
    parser.add_argument("vcf_output", help="VCF output file.")
+    parser.add_argument("json_output", help="JSON output file.")


Making json_output and study_accession optional: this change has been implemented in commit 147659f.
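The two argparse comments above can be sketched together as follows. The flag spellings `--json_output` and `--study_accession` and the helper `wants_metadata` are assumptions for illustration; the PR's final argument names are not shown here.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("gvf_input", help="GVF input file.")
    parser.add_argument("vcf_output", help="VCF output file.")
    # Hypothetical flag names: optional arguments default to None when omitted.
    parser.add_argument("--json_output", help="JSON output file.")
    parser.add_argument("--study_accession", help="DGVa Study Accession")
    parser.add_argument("--config", help="Path to config file")
    return parser


def wants_metadata(args):
    """Return True only when everything the metadata step needs was given."""
    return all([args.config, args.json_output, args.study_accession])


args_plain = build_parser().parse_args(["in.gvf", "out.vcf"])
args_full = build_parser().parse_args(
    ["in.gvf", "out.vcf", "--config", "db.yml",
     "--json_output", "out.json", "--study_accession", "STUDY1"]
)
```

With this guard, a plain GVF-to-VCF conversion runs without touching the database, and the metadata step is attempted only when all of its inputs are present.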

tcezard · 2026-03-09T14:40:56Z

convert_gvf_to_vcf/metadataJSON.py

+        # assumption the name: sampleinVCF = sample_id
+        sample_sampleinvcf = sample_id
+        sample_object = {
+            "analysisAlias": sample_analysis_alias,


Suggested change

"analysisAlias": sample_analysis_alias,

"analysisAlias": [sample_analysis_alias],
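A small sketch of the effect of this suggestion on the serialised output: wrapping the alias in a list makes `analysisAlias` a JSON array rather than a string. `build_sample` is a hypothetical helper, and the key names other than `analysisAlias` are assumptions based on the code comment earlier in this review.

```python
import json


def build_sample(analysis_alias, sample_id, biosample_accession):
    """Build one sample entry with analysisAlias as a list (hypothetical helper)."""
    return {
        "analysisAlias": [analysis_alias],  # serialised as a JSON array
        "sampleInVCF": sample_id,  # assumed key name
        "bioSampleAccession": biosample_accession,  # assumed key name
    }


sample = build_sample("analysis_1", "sample1", "ACCESSION_PLACEHOLDER")
sample_json = json.dumps(sample)
```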

tcezard · 2026-03-09T14:45:29Z

tests/test_metadataJSON.py

+        metadata_client._connection = mock_unhealthy
+        result = metadata_client.connection
+        # did you close the unhealthy connection
+        mock_unhealthy.close_assert_called_once()


Suggested change

mock_unhealthy.close_assert_called_once()

mock_unhealthy.close.assert_called_once()
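The reason this typo matters is that the misspelt line never fails. A `Mock` creates attributes on demand, so `close_assert_called_once` is just another auto-generated child mock that gets called, and nothing is asserted. A minimal demonstration using only `unittest.mock`:

```python
from unittest.mock import Mock

mock_unhealthy = Mock()

# Typo: Mock creates the attribute "close_assert_called_once" on demand and
# calls it. Nothing is asserted, so this line passes even though close()
# was never called.
mock_unhealthy.close_assert_called_once()

# Correct form: assert_called_once is a method of the child mock `close`,
# and it fails here because close() has not been called yet.
try:
    mock_unhealthy.close.assert_called_once()
    correct_form_failed = False
except AssertionError:
    correct_form_failed = True

# After a real call, the correct assertion passes.
mock_unhealthy.close()
mock_unhealthy.close.assert_called_once()
```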

tcezard · 2026-03-09T14:52:49Z

tests/test_metadataJSON.py

+        mock_connection_object_existing = Mock()
+        mock_connection.return_value = mock_connection_object_existing


These two lines are not used.

Suggested change

mock_connection_object_existing = Mock()

mock_connection.return_value = mock_connection_object_existing

convert_gvf_to_vcf/metadataJSON.py

added metadataJSON.py and query_mapper.yaml

bae1fbe

khetherin requested a review from tcezard February 12, 2026 16:40

tcezard reviewed Feb 12, 2026

convert_gvf_to_vcf/metadataJSON.py (resolved)

convert_gvf_to_vcf/etc/query_mapper.yaml (outdated, resolved)

convert_gvf_to_vcf/metadataJSON.py (outdated, resolved)

khetherin added 5 commits February 16, 2026 11:46

Merge remote-tracking branch 'origin/main' into metadataJSON

23e2b03

addressing design comments:clean up init function, connect to db once, merged create_json_file and write_json_file, moving away from manual queries towards using pypika to generate SQL

5db536c

submitter details added to JSON.

269000f

started to add project.

54698b2

set up unit tests

233fe07

khetherin requested a review from tcezard February 19, 2026 12:43

khetherin added 11 commits February 20, 2026 18:22

moved sql queries _fetch functions. started on samples. unit tests added.

34b18d2

test.config file

1ec5168

added _get_analysis, _get_files function. changed placeholders to UNSPECIFIED. added validate to fetch functions

12b0eeb

added unit tests

f408338

added gather_metadata function

2ac6b3b

edit fetching file name

81a505b

remove TODO

12cc212

requirements.txt: add oracledb

c714772

requirements.txt: add pypika

aebf429

change _get_sample_new

657f32c

removed query mapper

565cfd9

tcezard requested changes Feb 25, 2026

khetherin added 8 commits March 3, 2026 11:48

fix(metadataJSON.py): correct the dot notation for sample_status

847075d

fix(metadataJSON.py): change the return value

c68f41b

fix(metadataJSON.py): change how unregistered samples are handled

3ff41ef

fix(metadataJSON.py): change how preregistered samples are handled

27aa782

fix(metadataJSON.py): change use of next() to calling validate_fetch_result

0dc8338

fix(metadataJSON.py): change use of next() to calling validate_fetch_result

c0e2cd7

fix(convertGVFtoVCF.py): check if config is present before obtaining metadata

a0b3d81

fix(convertGVFtoVCF.py): make arguments optional

147659f

khetherin added 2 commits March 4, 2026 17:43

fix(metadataJSON.py): validate the project accession

bc1564b

fix(metadataJSON.py): change the sampleinvcf to match the sample id

7ced2e5

khetherin commented Mar 5, 2026

khetherin added 10 commits March 5, 2026 15:00

fix(test_metadataJSON.py): edit tests

060d535

fix(test_metadataJSON.py): remove place holder and raise ValueError for missing results

e917ca9

fix(test_metadataJSON.py): fetch methods changed to fetch what is named in the function name

f69c451

fix(test_metadataJSON.py): edit mock data to represent the real return type

7e112df

feat(gather_metadata.py): make gather_metadata its own executable

941ce7b

edit comment

36ecc87

fix(metadataJSON.py): improve error handling

5944163

fix(metadataJSON.py): correct typo

9c5de14

fix(metadataJSON.py): replace UNSPECIFIED placeholders with empty strings

ea96add

fix(metadataJSON.py): edit unit test

ec45803

khetherin requested a review from tcezard March 9, 2026 14:23

tcezard reviewed Mar 10, 2026

khetherin added 2 commits March 10, 2026 15:40

fix(metadataJSON.py, test_metadataJSON.py): fix typos, remove dead code

e85185d

fix(metadataJSON.py, test_metadataJSON.py): fix typos

1836722

tcezard self-requested a review March 13, 2026 09:48

tcezard approved these changes Mar 13, 2026

khetherin merged commit db8937a into EBIvariation:main Mar 16, 2026
1 check passed
