Conversation
✅ Deploy Preview for strchive ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull Request Overview
This PR adds a JSON schema for STRchive-loci and introduces a new validation script along with dependency management updates. Key changes include:
- Addition of scripts/validate-loci.py to provide JSON schema validation.
- Updates to environment YML files to include the jsonschema dependency.
- Minor modifications to scripts/check-loci.py for data cleanup improvements.
Reviewed Changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| scripts/validate-loci.py | Adds a JSON schema validation script for STRchive loci JSON data. |
| scripts/setup-miniconda-patched-environment.yml | Updates environment dependencies to include jsonschema. |
| scripts/environment.yml | Updates environment dependencies to include jsonschema. |
| scripts/check-loci.py | Adjusts list field transformations and updates logging output. |
Files not reviewed (2)
- data/STRchive-loci.json: Language not supported
- workflow/Snakefile: Language not supported
Comments suppressed due to low confidence (1)
scripts/check-loci.py:211
- The f-string here has conflicting single quotes when accessing the 'id' key. Consider changing the outer string to use double quotes or the inner key to use alternate quoting (e.g., {record["id"]}).
sys.stderr.write(f'Updating {record['id']} {field} from {old} to {record[field]}\n')
|
I think a blank locus would be useful. I have been using this (as you have it now) to begin adding PLIN4 and have no complaints! Edit: maybe possible tags would be good to add too -- it took me some looking to figure out what tags I needed to put on PLIN4. |
laurelhiatt
left a comment
There was a problem hiding this comment.
I like the schema, I like the examples. beautiful
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Macayla-weiner I added some more tag examples. Let me know what you think. Are there other's you'd suggest including? |
Toromtomtom
left a comment
There was a problem hiding this comment.
Thanks for adding the schema! I suggest adding that the genomic coordinates are 1-based rather than 0-based. The wording is adopted from the VCF specification.
data/STRchive-loci.schema.json
Outdated
| "type": [ "string", "null" ] | ||
| }, | ||
| "start_hg38": { | ||
| "description": "Start position in hg38 reference genome", |
There was a problem hiding this comment.
| "description": "Start position in hg38 reference genome", | |
| "description": "Start position in hg38 reference genome, with the 1st base having position 1", |
data/STRchive-loci.schema.json
Outdated
| "type": [ "integer", "null" ] | ||
| }, | ||
| "stop_hg38": { | ||
| "description": "Stop position in hg38 reference genome", |
There was a problem hiding this comment.
| "description": "Stop position in hg38 reference genome", | |
| "description": "Stop position in hg38 reference genome, with the 1st base having position 1", |
data/STRchive-loci.schema.json
Outdated
| "type": [ "integer", "null" ] | ||
| }, | ||
| "start_hg19": { | ||
| "description": "Start position in hg19 reference genome", |
There was a problem hiding this comment.
| "description": "Start position in hg19 reference genome", | |
| "description": "Start position in hg19 reference genome, with the 1st base having position 1", |
data/STRchive-loci.schema.json
Outdated
| "type": [ "integer", "null" ] | ||
| }, | ||
| "stop_hg19": { | ||
| "description": "Stop position in hg19 reference genome", |
There was a problem hiding this comment.
| "description": "Stop position in hg19 reference genome", | |
| "description": "Stop position in hg19 reference genome, with the 1st base having position 1", |
data/STRchive-loci.schema.json
Outdated
| "type": [ "integer", "null" ] | ||
| }, | ||
| "start_t2t": { | ||
| "description": "Start position in chm13-T2T reference genome", |
There was a problem hiding this comment.
| "description": "Start position in chm13-T2T reference genome", | |
| "description": "Start position in chm13-T2T reference genome, with the 1st base having position 1", |
data/STRchive-loci.schema.json
Outdated
| "type": [ "integer", "null" ] | ||
| }, | ||
| "stop_t2t": { | ||
| "description": "Stop position in chm13-T2T reference genome", |
There was a problem hiding this comment.
| "description": "Stop position in chm13-T2T reference genome", | |
| "description": "Stop position in chm13-T2T reference genome, with the 1st base having position 1", |
|
Possible additional changes:
|
@Toromtomtom this is a critical point! These raw values should be bed-style 0-based. They get converted to 1-based for some applications, for example, when creating the UCSC links. I'll make this explicit. |
|
I think this is good to go. I'll leave it hanging for another 24 hours in case there are additional suggestions. Note, this reflects the state of the STRchive json format today. There are some planned breaking changes that will be introduced in a future PR. |
For #94
I've got some basic validation set up. Let me know what else would be useful. Maybe an option to create a new blank locus to make it easier to add new loci to the JSON?