fixes to ingest all HGNC genes for uniprot mappings #290
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -61,7 +61,7 @@ def run(self, data_file: Union[Optional[Path], Optional[str]] = None, show_statu |
| | | |
| | | wallen_etal_df = pd.read_excel(input_file, skiprows=3, sheet_name=WALLEN_ETAL_TAB_NAME) |
| | | wallen_etal_df[FDR_COLUMN] = pd.to_numeric(wallen_etal_df[FDR_COLUMN], errors="coerce") |
| | | |
| | | import pdb;pdb.set_trace() |
| | | import pdb;pdb.set_trace() |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -9,6 +9,7 @@ |
| | | from pathlib import Path |
| | | |
| | | import pandas as pd |
| | | import requests |
| | | from tqdm import tqdm |
| | | |
| | | from kg_microbe.transform_utils.constants import ( |
| | | @@ -20,6 +21,7 @@ |
| | | EC_CATEGORY, |
| | | EC_PREFIX, |
| | | ENABLES, |
| | | GENE_CATEGORY, |
| | | GENE_TO_PROTEIN_EDGE, |
| | | GO_BIOLOGICAL_PROCESS_ID, |
| | | GO_BIOLOGICAL_PROCESS_LABEL, |
| | | @@ -37,6 +39,7 @@ |
| | | MONDO_XREFS_FILEPATH, |
| | | NCBI_CATEGORY, |
| | | NCBITAXON_PREFIX, |
| | | NODE_NORMALIZER_URL, |
| | | OMIM_PREFIX, |
| | | ONTOLOGIES_TREES_DIR, |
| | | PARTICIPATES_IN, |
| | | @@ -98,6 +101,7 @@ |
| | | RHEA_PARSED_COLUMN = "rhea_parsed" |
| | | DISEASE_PARSED_COLUMN = "disease_parsed" |
| | | GENE_PRIMARY_PARSED_COLUMN = "gene_primary_parsed" |
| | | GENE_NAME_PRIMARY_PARSED_COLUMN = "gene_name_primary_parsed" |
| | | GO_TERM_COLUMN = "GO_Term" |
| | | GO_CATEGORY_COLUMN = "GO_Category" |
| | | UNIPROT_ID_COLUMN = "Uniprot_ID" |
| | | @@ -106,6 +110,29 @@ |
| | | CHEBI_REGEX = re.compile(r'/ligand_id="ChEBI:(.*?)";') |
| | | GO_REGEX = re.compile(r"\[(.*?)\]") |
| | | |
| | | # Takes cure in the form PREFIX:ID |
| # Takes cure in the form PREFIX:ID | |
| # Takes curie in the form PREFIX:ID |
Copilot
AI
Aug 13, 2025
This line assumes the response JSON contains the `node_curie` key. If the API response has a different structure or the key is missing, the lookup raises a `KeyError`.
| entries = response.json()[node_curie] | |
| entries = response.json().get(node_curie) | |
| if entries is None: | |
| return None |
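The suggestion above can be sketched in isolation as follows. This is a minimal illustration, not the PR's code: `lookup_entries` is a hypothetical helper name, and the payloads are made-up dictionaries shaped like the response the comment describes (a mapping from the queried CURIE to its entry, assumed here to be possibly null).

```python
from typing import Any, Dict, Optional


def lookup_entries(payload: Dict[str, Any], node_curie: str) -> Optional[Dict[str, Any]]:
    """Return the entry stored under node_curie, or None if it is absent or null."""
    # dict.get returns None for a missing key instead of raising KeyError.
    entries = payload.get(node_curie)
    if entries is None:
        return None
    return entries


# Hypothetical payloads for illustration only.
found = lookup_entries({"HGNC:5": {"equivalent_identifiers": []}}, "HGNC:5")
missing = lookup_entries({}, "HGNC:5")
null_entry = lookup_entries({"HGNC:5": None}, "HGNC:5")
```

With `.get`, a missing key and an explicit null value both end up on the same `None` path, so the caller needs only one check.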
Copilot
AI
Aug 13, 2025
The condition `len(entries) > 1` does not make sense here: it counts the elements of `entries`, yet the next line accesses `entries["equivalent_identifiers"]`, which implies `entries` is a dictionary. The check should instead verify that `entries` is not None and contains the expected structure.
| if len(entries) > 1: # .strip().split("\n") | |
| if entries and "equivalent_identifiers" in entries and entries["equivalent_identifiers"]: |
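A structure check along the lines of this suggestion might look like the sketch below. The helper name `first_hgnc_identifier` is hypothetical, `HGNC_PREFIX = "HGNC:"` is an assumed stand-in for the PR's `HGNC_NEW_PREFIX` constant, and each item in `equivalent_identifiers` is assumed to be a dictionary with an `identifier` key, as the diff implies.

```python
from typing import Any, Dict, Optional

HGNC_PREFIX = "HGNC:"  # assumption: stand-in for the PR's HGNC_NEW_PREFIX constant


def first_hgnc_identifier(entries: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return the first HGNC identifier in entries, or None if there is none."""
    # Guard against None and against anything that is not a dictionary.
    if not isinstance(entries, dict):
        return None
    # A missing key, a null value, and an empty list are all treated as "no data".
    equivalents = entries.get("equivalent_identifiers")
    if not equivalents:
        return None
    for iden in equivalents:
        identifier = iden.get("identifier", "")
        if identifier.startswith(HGNC_PREFIX):
            return identifier
    # No identifier carried the HGNC prefix.
    return None


# Hypothetical entry for illustration only.
example = {
    "equivalent_identifiers": [
        {"identifier": "NCBIGene:1"},
        {"identifier": "HGNC:5"},
    ]
}
hgnc_id = first_hgnc_identifier(example)
```

Returning `None` explicitly after the loop keeps the "no HGNC match" case on the same path as the "no data" case, so callers handle a single sentinel.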
Copilot
AI
Aug 13, 2025
The function lacks proper error handling for HTTP requests. If the API is unavailable or returns an error status, the function will raise an exception without providing a meaningful error message to help with debugging.
| else: | |
| url = NODE_NORMALIZER_URL + node_curie | |
| try: | |
| # Make the HTTP request to NodeNormalizer | |
| response = requests.get(url, timeout=30) | |
| response.raise_for_status() | |
| # Write response to file if it contains data | |
| entries = response.json()[node_curie] | |
| if len(entries) > 1: # .strip().split("\n") | |
| for iden in entries["equivalent_identifiers"]: | |
| if iden["identifier"].split(":")[0] + ":" == HGNC_NEW_PREFIX: | |
| norm_node = iden["identifier"] | |
| return norm_node | |
| else: | |
| return None | |
| except requests.exceptions.RequestException as e: | |
| logging.error(f"HTTP request failed for node_curie '{node_curie}' at URL '{url}': {e}") | |
| return None | |
| except (ValueError, KeyError, TypeError) as e: | |
| logging.error(f"Error processing response for node_curie '{node_curie}' at URL '{url}': {e}") |
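The error-handling pattern in this suggestion can be exercised without network access by injecting the fetch step. The sketch below is an assumption-laden illustration rather than the PR's implementation: it swaps `requests` for the standard library's `urllib` so it runs without third-party packages, `normalize_to_hgnc`, `fetch_json`, and `stub_fetch` are hypothetical names, `EXAMPLE_BASE_URL` is a placeholder rather than the real `NODE_NORMALIZER_URL`, and `HGNC_PREFIX` is an assumed value for `HGNC_NEW_PREFIX`.

```python
import json
import logging
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Optional

EXAMPLE_BASE_URL = "https://example.org/normalize?curie="  # placeholder, not the real service URL
HGNC_PREFIX = "HGNC:"  # assumption: stand-in for the PR's HGNC_NEW_PREFIX constant


def fetch_json(url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Fetch a URL and decode its JSON body; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def normalize_to_hgnc(
    node_curie: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    base_url: str = EXAMPLE_BASE_URL,
) -> Optional[str]:
    """Return the HGNC identifier equivalent to node_curie, or None on any failure."""
    url = base_url + urllib.parse.quote(node_curie)
    try:
        payload = fetch(url)
        entries = payload.get(node_curie)
        if not entries:
            return None
        for iden in entries.get("equivalent_identifiers") or []:
            identifier = iden["identifier"]
            if identifier.startswith(HGNC_PREFIX):
                return identifier
        return None
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        logging.error("HTTP request failed for %r at %s: %s", node_curie, url, exc)
        return None
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        # Malformed JSON (ValueError) or an unexpected response shape.
        logging.error("Could not process response for %r at %s: %s", node_curie, url, exc)
        return None


def stub_fetch(url: str) -> Dict[str, Any]:
    """Hypothetical canned response used instead of a real HTTP call."""
    return {
        "NCBIGene:1": {
            "equivalent_identifiers": [
                {"identifier": "NCBIGene:1"},
                {"identifier": "HGNC:5"},
            ]
        }
    }


result = normalize_to_hgnc("NCBIGene:1", fetch=stub_fetch)
```

Passing the fetch step as a parameter is a design choice made here for testability; the PR itself calls `requests.get` directly.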
Commented-out code should be removed rather than left in the codebase. If this header writing logic is needed, it should be uncommented and used, otherwise it should be deleted.