Split VCF line in builder and storage object #18

Merged
tcezard merged 2 commits into EBIvariation:main from tcezard:Split-VCFline
Dec 2, 2025

Conversation

@tcezard (Member) commented Dec 1, 2025

No description provided.

@tcezard tcezard requested a review from khetherin December 1, 2025 13:34
pragmas_to_add.append(generate_vcf_header_unstructured_line(vcf_header_key, pragma_value))
for vcf_obj in list_of_vcf_objects:
pragmas_to_add.append(generate_vcf_header_unstructured_line("source", vcf_obj.source))
# FIXME: Why are we adding header from the VCF lines
Collaborator

This adds the source (displayed in the VCF header as: ##source=DGVa), which comes from the GVF feature line. Also, pragmas_to_add is probably a misleading name: these were GVF pragmas that have been converted to VCF header lines.
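
For illustration, the shape of the line discussed above can be sketched as follows. This is a minimal sketch, not the project's generate_vcf_header_unstructured_line: the helper name unstructured_header_line is hypothetical, and only the ##key=value layout of an unstructured VCF meta-information line is taken as given.

```python
def unstructured_header_line(key: str, value: str) -> str:
    """Format a key/value pair as an unstructured VCF meta-information line.

    Hypothetical helper for illustration; the real function in the PR may
    validate or escape its inputs differently.
    """
    return f"##{key}={value}"


# The source value comes from the GVF feature line, as described above.
print(unstructured_header_line("source", "DGVa"))  # prints ##source=DGVa
```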

unique_info_lines_to_add = list(dict.fromkeys(info_line for info_line in standard_lines_dictionary["INFO"] if info_line not in unique_info_lines_to_add))
unique_filter_lines_to_add = list(dict.fromkeys(filter_line for filter_line in standard_lines_dictionary["FILTER"] if filter_line not in unique_filter_lines_to_add))
unique_format_lines_to_add = list(dict.fromkeys(format_line for format_line in standard_lines_dictionary["FORMAT"] if format_line not in unique_format_lines_to_add))
# TODO: The addition of headers from the VCF lines should be done in the VCF builder
Collaborator

I agree with the TODO. I also think the function generate_vcf_header_metainfo could be simplified.
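
One possible direction for that simplification, sketched under assumptions: the three near-identical comprehensions could share a single order-preserving de-duplication helper. The name unique_in_order and the sample header lines are hypothetical; only the dict.fromkeys idiom comes from the diff.

```python
def unique_in_order(lines):
    """Return the lines with duplicates dropped, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))


# Hypothetical header lines, for illustration only.
info_lines = [
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position">',
    '##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Length">',
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End position">',
]
unique_info_lines = unique_in_order(info_lines)
```

The same helper could then be applied to the INFO, FILTER and FORMAT lists in turn, which would remove the repeated comprehension.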


# This is the main conversion logic
- def convert_gvf_features_to_vcf_objects(gvf_lines_obj_list, reference_lookup):
+ def convert_gvf_features_to_vcf_objects(gvf_lines_obj_list, reference_lookup, ordered_list_of_samples):
Collaborator

Having an ordered_list_of_samples is a good idea and will be useful in this function. The ordered list will need to come from the GVF pragmas, and it should be built early on.
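
To show why a fixed sample order matters, here is a minimal sketch of how per-sample values might be laid out as VCF sample columns. The function build_sample_columns, its parameters other than ordered_list_of_samples, and the sample names are all hypothetical; the only VCF convention relied on is that a missing value is written as ".".

```python
def build_sample_columns(ordered_list_of_samples, per_sample_values, format_keys):
    """Build one VCF sample column per sample, in a fixed order.

    per_sample_values maps sample name -> {FORMAT key: value}. Samples or
    keys with no value get ".", the VCF missing-value marker.
    """
    columns = []
    for sample in ordered_list_of_samples:
        values = per_sample_values.get(sample, {})
        columns.append(":".join(str(values.get(key, ".")) for key in format_keys))
    return columns


columns = build_sample_columns(
    ["sample_A", "sample_B"],          # order fixed once, up front
    {"sample_B": {"GT": "0/1"}},       # sample_A has no values on this line
    ["GT", "DP"],
)
# columns == [".:.", "0/1:."]
```

Because every line iterates over the same ordered list, each sample's values always land in the same column, even on lines where a sample has no data.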

self.info_dict = merged_info_dict
other_vcf_line.info_dict = merged_info_dict

def merge_vcf_values_for_format(self, other_vcf_line):
Collaborator

TODO: add test
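
A rough sketch of what such a test might look like, under heavy assumptions: the body of merge_vcf_values_for_format is not shown in this conversation, so StandInVcfLine, its format_dict attribute, and the merge behaviour below are hypothetical stand-ins. They mirror the info_dict pattern above, where both lines end up sharing the merged result.

```python
class StandInVcfLine:
    """Hypothetical stand-in; the real class in the PR may differ."""

    def __init__(self, format_dict):
        # Assumed shape: sample name -> {FORMAT key: value}.
        self.format_dict = format_dict

    def merge_vcf_values_for_format(self, other_vcf_line):
        # Assumed behaviour: union the per-sample values of both lines.
        merged = {}
        for source in (self.format_dict, other_vcf_line.format_dict):
            for sample, values in source.items():
                merged.setdefault(sample, {}).update(values)
        # Share the merged result, as the info_dict merge above does.
        self.format_dict = merged
        other_vcf_line.format_dict = merged


def test_merge_vcf_values_for_format():
    first = StandInVcfLine({"sample_A": {"GT": "0/1"}})
    second = StandInVcfLine({"sample_B": {"GT": "1/1"}})
    first.merge_vcf_values_for_format(second)
    expected = {"sample_A": {"GT": "0/1"}, "sample_B": {"GT": "1/1"}}
    assert first.format_dict == expected
    assert second.format_dict == expected
```

A real test would import the project's VCF line class instead of the stand-in and assert whatever merge semantics the implementation actually defines.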

@tcezard tcezard merged commit 550736e into EBIvariation:main Dec 2, 2025
1 check passed
2 participants