Add JSON schema for STRchive-loci.json#126

Merged
hdashnow merged 7 commits into main from schema on Jun 6, 2025

Conversation

@hdashnow
Member

For #94

I've got some basic validation set up. Let me know what else would be useful. Maybe an option to create a new blank locus to make it easier to add new loci to the JSON?
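
One way the blank-locus idea could work is to derive an empty record from the schema itself. The sketch below assumes a hypothetical schema fragment with a top-level `properties` mapping; the `gene` field, the function name `blank_locus`, and the fallback rules are illustrative assumptions, not part of this PR.

```python
import json

# Hypothetical fragment in the style of the schema added in this PR.
SCHEMA = {
    "properties": {
        "id": {"type": "string"},
        "gene": {"type": ["string", "null"]},
        "start_hg38": {"type": ["integer", "null"]},
        "locus_tags": {"type": "array", "items": {"type": ["string", "null"]}},
    }
}


def blank_locus(schema):
    """Return a record with every schema property set to an empty value."""
    record = {}
    for name, spec in schema.get("properties", {}).items():
        types = spec.get("type", [])
        if isinstance(types, str):
            types = [types]  # normalise "string" to ["string"]
        if "array" in types:
            record[name] = []  # list fields start empty
        elif "null" in types:
            record[name] = None  # nullable fields start as null
        elif "string" in types:
            record[name] = ""  # required strings start blank
        else:
            record[name] = None  # anything else is left for the curator
    return record


if __name__ == "__main__":
    print(json.dumps(blank_locus(SCHEMA), indent=2))
```

Because the record is built from the schema, any field added to the schema later would appear in new blank loci without further changes to the helper.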

netlify bot commented Dec 15, 2024

Deploy Preview for strchive ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 4fcebef |
| 🔍 Latest deploy log | https://app.netlify.com/projects/strchive/deploys/68436a275d187900082d0a51 |
| 😎 Deploy Preview | https://deploy-preview-126--strchive.netlify.app |

@hdashnow hdashnow requested review from Copilot and laurelhiatt and removed request for Copilot April 18, 2025 17:06
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a JSON schema for STRchive-loci and introduces a new validation script along with dependency management updates. Key changes include:

  • Addition of scripts/validate-loci.py to provide JSON schema validation.
  • Updates to environment YML files to include the jsonschema dependency.
  • Minor modifications to scripts/check-loci.py for data cleanup improvements.

Reviewed Changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| scripts/validate-loci.py | Adds a JSON schema validation script for STRchive loci JSON data. |
| scripts/setup-miniconda-patched-environment.yml | Updates environment dependencies to include jsonschema. |
| scripts/environment.yml | Updates environment dependencies to include jsonschema. |
| scripts/check-loci.py | Adjusts list field transformations and updates logging output. |
Files not reviewed (2)
  • data/STRchive-loci.json: Language not supported
  • workflow/Snakefile: Language not supported
Comments suppressed due to low confidence (1)

scripts/check-loci.py:211

  • The f-string here has conflicting single quotes when accessing the 'id' key. Consider changing the outer string to use double quotes or the inner key to use alternate quoting (e.g., {record["id"]}).
sys.stderr.write(f'Updating {record['id']} {field} from {old} to {record[field]}\n')
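
A minimal sketch of the quoting fix described above, using a made-up `record`. Reusing the outer quote character inside an f-string replacement field is a syntax error before Python 3.12, so switching the outer quotes keeps the line portable. The `log_update` wrapper and the sample values are assumptions for illustration only.

```python
import io
import sys


def log_update(record, field, old, stream=sys.stderr):
    """Write the update message using double quotes outside, single inside."""
    stream.write(f"Updating {record['id']} {field} from {old} to {record[field]}\n")


# Illustrative values only, not taken from STRchive.
record = {"id": "EXAMPLE_LOCUS", "start_hg38": 99}
buf = io.StringIO()
log_update(record, "start_hg38", 100, stream=buf)
```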

@Macayla-weiner
Contributor

Macayla-weiner commented May 28, 2025

I think a blank locus would be useful. I have been using this (as you have it now) to begin adding PLIN4 and have no complaints!

Edit: maybe possible tags would be good to add too -- it took me some looking to figure out what tags I needed to put on PLIN4.

Contributor

@laurelhiatt laurelhiatt left a comment

I like the schema, I like the examples. beautiful

hdashnow and others added 3 commits June 2, 2025 19:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@hdashnow
Member Author

hdashnow commented Jun 3, 2025

> I think a blank locus would be useful. I have been using this (as you have it now) to begin adding PLIN4 and have no complaints!
>
> Edit: maybe possible tags would be good to add too -- it took me some looking to figure out what tags I needed to put on PLIN4.

@Macayla-weiner I added some more tag examples. Let me know what you think. Are there others you'd suggest including?

"locus_tags": {
  "description": "Tags for the locus, used for grouping similar loci and for flagging loci with specific characteristics",
  "examples": [ "somatic_instability", "contraction", "anticipation", "conflicting_evidence", "sparse_evidence", "maternal_expansion", "length_affects_penetrance", "length_affects_severity", "length_affects_onset" ],
  "type": "array",
  "items": {
    "type": [ "string", "null" ]
  }
},
"disease_tags": {
  "description": "Tags for the disease, used for grouping similar diseases",
  "examples": [ "ataxia" ],
  "type": "array",
  "items": {
    "type": [ "string", "null" ]
  }
}
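
Since `examples` is an annotation keyword in JSON Schema and does not restrict which values validate, a contributor may still want a quick way to see the suggested tags and spot likely typos. The sketch below works on an abridged copy of the fragment above; the helper names `example_tags` and `unlisted_tags` and the sample record are hypothetical.

```python
import json

# Abridged copy of the fragment quoted above; only some examples are kept.
SCHEMA_FRAGMENT = json.loads("""
{
  "locus_tags": {
    "examples": ["somatic_instability", "contraction"],
    "type": "array",
    "items": {"type": ["string", "null"]}
  },
  "disease_tags": {
    "examples": ["ataxia"],
    "type": "array",
    "items": {"type": ["string", "null"]}
  }
}
""")


def example_tags(properties):
    """Map each field that declares `examples` to its example values."""
    return {
        name: spec["examples"]
        for name, spec in properties.items()
        if "examples" in spec
    }


def unlisted_tags(properties, record):
    """Return, per field, the tags in `record` that are not schema examples.

    This is only a hint for reviewers: unlisted tags are still valid.
    """
    known = example_tags(properties)
    return {
        field: [tag for tag in (record.get(field) or []) if tag not in known[field]]
        for field in known
    }


# Made-up record with one misspelled tag.
sample = {"locus_tags": ["contraction", "contration"], "disease_tags": []}
hints = unlisted_tags(SCHEMA_FRAGMENT, sample)
```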

@Toromtomtom Toromtomtom left a comment

Thanks for adding the schema! I suggest adding that the genomic coordinates are 1-based rather than 0-based. The wording is adopted from the VCF specification.

"type": [ "string", "null" ]
},
"start_hg38": {
"description": "Start position in hg38 reference genome",

Suggested change
- "description": "Start position in hg38 reference genome",
+ "description": "Start position in hg38 reference genome, with the 1st base having position 1",

"type": [ "integer", "null" ]
},
"stop_hg38": {
"description": "Stop position in hg38 reference genome",

Suggested change
- "description": "Stop position in hg38 reference genome",
+ "description": "Stop position in hg38 reference genome, with the 1st base having position 1",

"type": [ "integer", "null" ]
},
"start_hg19": {
"description": "Start position in hg19 reference genome",

Suggested change
- "description": "Start position in hg19 reference genome",
+ "description": "Start position in hg19 reference genome, with the 1st base having position 1",

"type": [ "integer", "null" ]
},
"stop_hg19": {
"description": "Stop position in hg19 reference genome",

Suggested change
- "description": "Stop position in hg19 reference genome",
+ "description": "Stop position in hg19 reference genome, with the 1st base having position 1",

"type": [ "integer", "null" ]
},
"start_t2t": {
"description": "Start position in chm13-T2T reference genome",

Suggested change
- "description": "Start position in chm13-T2T reference genome",
+ "description": "Start position in chm13-T2T reference genome, with the 1st base having position 1",

"type": [ "integer", "null" ]
},
"stop_t2t": {
"description": "Stop position in chm13-T2T reference genome",

Suggested change
- "description": "Stop position in chm13-T2T reference genome",
+ "description": "Stop position in chm13-T2T reference genome, with the 1st base having position 1",

@hdashnow
Member Author

hdashnow commented Jun 3, 2025

Possible additional changes:

  • Suggest combining locus_structure and flank_motif
  • Remove mechanism source

@hdashnow
Member Author

hdashnow commented Jun 5, 2025

> Thanks for adding the schema! I suggest adding that the genomic coordinates are 1-based rather than 0-based. The wording is adopted from the VCF specification.

@Toromtomtom this is a critical point! These raw values should be BED-style 0-based. They get converted to 1-based for some applications, for example, when creating the UCSC links. I'll make this explicit.
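
The conversion mentioned here can be sketched as follows, assuming the stored values follow the usual BED convention (0-based start, end-exclusive stop). The function names and the way the link is assembled are assumptions for illustration; STRchive's own link-building code may differ.

```python
def bed_to_one_based(start, stop):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive.

    Only the start shifts: BED's exclusive stop already equals the
    1-based inclusive end position.
    """
    if start < 0 or stop < start:
        raise ValueError(f"invalid BED interval: {start}-{stop}")
    return start + 1, stop


def ucsc_link(chrom, start, stop, db="hg38"):
    """Build a UCSC Genome Browser URL from BED-style coordinates."""
    one_start, one_stop = bed_to_one_based(start, stop)
    return (
        "https://genome.ucsc.edu/cgi-bin/hgTracks"
        f"?db={db}&position={chrom}:{one_start}-{one_stop}"
    )


# Made-up coordinates, not a real STRchive locus.
link = ucsc_link("chr1", 99, 200)
```

Keeping the raw values in BED style and converting only at the point of display means a single off-by-one rule lives in one place instead of being repeated in every consumer.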

@hdashnow
Member Author

hdashnow commented Jun 5, 2025

I think this is good to go. I'll leave it hanging for another 24 hours in case there are additional suggestions.

Note: this reflects the state of the STRchive JSON format today. There are some planned breaking changes that will be introduced in a future PR.

@hdashnow hdashnow merged commit 6b5e450 into main Jun 6, 2025
2 checks passed
@hdashnow hdashnow deleted the schema branch June 6, 2025 22:38

5 participants