Skip to content

Metadata JSON design#30

Merged
khetherin merged 39 commits intoEBIvariation:mainfrom
khetherin:metadataJSON
Mar 16, 2026
Merged

Metadata JSON design#30
khetherin merged 39 commits intoEBIvariation:mainfrom
khetherin:metadataJSON

Conversation

@khetherin
Copy link
Collaborator

No description provided.

@khetherin khetherin requested a review from tcezard February 12, 2026 16:40
@khetherin khetherin requested a review from tcezard February 19, 2026 12:43
Copy link
Member

@tcezard tcezard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good progress but there are several aspect that needs to be improved.

Comment on lines +108 to +113
except Exception as e:
logger.warning(f"Database error: {e}")
logger.warning("Rolling back")
# rollback failed transaction
self.connection.rollback()
return {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not writing to the database there is nothing to rollback.


)
project_title_dict = self.load_from_db(project_title_query.get_sql(quote_char=None))
project_title = next(iter(project_title_dict.values()))[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment in load_from_db but this seems very complicated for just getting the first element of a list.

Also you never check to see if the list contains something which would raise a exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change has been implemented in c0e2cd7 using validate_fetch_result to handle None.

def _get_sample_pre_registered(self, study_accession, biosample_accession):
# requires analysisAlias, sampleinVCF, biosample_accession
sample_analysis_alias = self._fetch_analysis_alias(study_accession)
sample_sampleinvcf = "UNSPECIFIED_SAMPLE_IN_VCF"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to find the actual sample name otherwise we cannot link this sample to the data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has been implemented in commit 27aa782 under the assumption sampleinVCF = sample_id.

experiment_type = self.validate_fetch_result("experimentType", experiment_type_dict)
return experiment_type

def _fetch_reference_genome(self, study_accession):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this needs to be resolved, outside the scope of this ticket, this will be addressed in another ticket (EVA-4096)


for sample_status in sample_registration_statuses:
if sample_status.is_sample_preregistered:
sample_metadata = self._get_sample_pre_registered(study_accession, sample_status.sample_accession, sample_status.sample_id)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with change on attribute error for namedtuple. This has been implemented in commit 847075d.

.select(ds.BIOPROJECT_ACCESSION)
.where(ds.STUDY_ACCESSION == study_accession)
)
project_accession_dict = self.load_from_db(project_accession_query.get_sql(quote_char=None))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with what to do if project_accession_dict is empty. This has been implemented in commit c68f41b.

.where(ds.STUDY_ACCESSION == study_accession)
)
project_accession_dict = self.load_from_db(project_accession_query.get_sql(quote_char=None))
project_accession = self.validate_fetch_result("projectAccession", project_accession_dict)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change implemented in 0dc8338 using validate_fetch_result to handle None.

analysis_description = self.validate_fetch_result("description", analysis_description_dict)
return analysis_description

def _fetch_experiment_type(self, study_accession):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the comment about not being validated against the EVA enum. Suggest this is outside of the scope of this ticket

parser.add_argument("study_accession", help="DGVa Study Accession")
parser.add_argument("-a", "--assembly", help="FASTA assembly file")
parser.add_argument("--log", help="Path to log file")
parser.add_argument("--config", help="Path to config file")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--config is optional but treated as required = This has been implemented in commit a0b3d81 by checking for presence of config before obtaining metadata.

parser = argparse.ArgumentParser()
parser.add_argument("gvf_input", help="GVF input file.")
parser.add_argument("vcf_output", help="VCF output file.")
parser.add_argument("json_output", help="JSON output file.")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making json_output and study_accession optional: this change has been implemented in 147659f by making arguments optional.

@khetherin khetherin requested a review from tcezard March 9, 2026 14:23
# assumption the name: sampleinVCF = sample_id
sample_sampleinvcf = sample_id
sample_object = {
"analysisAlias": sample_analysis_alias,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"analysisAlias": sample_analysis_alias,
"analysisAlias": [sample_analysis_alias],

metadata_client._connection = mock_unhealthy
result = metadata_client.connection
# did you close the unhealthy connection
mock_unhealthy.close_assert_called_once()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mock_unhealthy.close_assert_called_once()
mock_unhealthy.close.assert_called_once()

Comment on lines +46 to +47
mock_connection_object_existing = Mock()
mock_connection.return_value = mock_connection_object_existing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This two lines are not used.

Suggested change
mock_connection_object_existing = Mock()
mock_connection.return_value = mock_connection_object_existing

@tcezard tcezard self-requested a review March 13, 2026 09:48
@khetherin khetherin merged commit db8937a into EBIvariation:main Mar 16, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants