Merged
96 commits
e66bd6f
Support variable length of extra fields in bed
nakib103 Sep 29, 2025
d1782af
fix: add -f to index generation
ainefairbrother Oct 3, 2025
04d3de3
fix: optimise mem for VEP step
ainefairbrother Oct 3, 2025
0cc63d0
Merge pull request #43 from ainefairbrother/small-patches
nakib103 Oct 6, 2025
c319498
fix: adjust mem alloc
ainefairbrother Oct 7, 2025
7a07391
fix: adjust mem alloc
ainefairbrother Oct 7, 2025
f5a8d11
docs: remove inline comments
ainefairbrother Oct 7, 2025
363e03c
Added initial script for merging call and region files
likhitha-surapaneni Oct 1, 2025
ae9b490
Merge pull request #44 from ainefairbrother/mem-patches
nakib103 Oct 7, 2025
55b7437
Add logic for combining ALTs
likhitha-surapaneni Oct 14, 2025
628b3e9
Added logs and validation
likhitha-surapaneni Oct 14, 2025
a145ce1
Update post-handover test script and sanity DCs
nakib103 Oct 29, 2025
131dac1
skip bw source count test - too much time
nakib103 Oct 29, 2025
a162d39
Add missing module
nakib103 Oct 29, 2025
8642f28
Optionally skip xfail tests
nakib103 Oct 29, 2025
cc369cf
Add skip_xfail to run_datacheck
nakib103 Oct 29, 2025
cc2b038
handle special case
nakib103 Oct 29, 2025
34767e6
Provide more meaningful error message
nakib103 Oct 29, 2025
2813a2e
minor update
nakib103 Oct 30, 2025
6e41c0a
Fix symlink issue in track files
nakib103 Oct 30, 2025
02e1c1c
tweak params, update freq test tolerance
nakib103 Nov 4, 2025
44ab80e
typo
nakib103 Nov 4, 2025
4619491
Update minimise alllele function to match ensembl-vep repo
nakib103 Nov 4, 2025
c78fabb
Merge pull request #47 from nakib103/minimise_allele_upd
nakib103 Nov 5, 2025
c9c308a
humans need more time
nakib103 Nov 5, 2025
ed885fa
More verbose bb bw tests
nakib103 Nov 11, 2025
cdbeede
typos
nakib103 Nov 11, 2025
ccdc42a
Fix names in sv bed json
nakib103 Nov 18, 2025
50f2735
Merge pull request #42 from nakib103/feature/bed_fields
ainefairbrother Nov 18, 2025
6297709
Generate segdup track files for HPRC
nakib103 Nov 19, 2025
a3f0c03
Make extra fields optional
nakib103 Nov 21, 2025
556a679
Keep only ensembl like chromosome name
nakib103 Nov 21, 2025
613afe9
typos
nakib103 Nov 21, 2025
588cc66
Add chm13 prefix
nakib103 Nov 21, 2025
992dbbc
Add synonyms
nakib103 Nov 24, 2025
acbe255
Merge pull request #48 from nakib103/hprc_segdup
nakib103 Dec 1, 2025
6135cb2
feat: add clinvar to sources json
ainefairbrother Dec 1, 2025
b7c7da7
feat: add clinvar to sources json
ainefairbrother Dec 1, 2025
2a9495f
feat: add clinvar to sources json
ainefairbrother Dec 1, 2025
ce9b418
Merge pull request #49 from ainefairbrother/add-clinvar-source
nakib103 Dec 1, 2025
6f7ead2
fix: release ID param
ainefairbrother Dec 1, 2025
9948c07
Add script to update variant id in HGSV files
nakib103 Dec 2, 2025
e92a45a
Force create bgzip if already exists
nakib103 Dec 2, 2025
a96828e
Merge pull request #51 from nakib103/update_ids
nakib103 Dec 3, 2025
3d08e9d
fix: add conditional for the case when no population data is avail
ainefairbrother Dec 4, 2025
699b862
handle SVLEN for none, handle ALT; imporve xml parsing speed
likhitha-surapaneni Dec 5, 2025
f968f87
Merge pull request #46 from nakib103/script_update
nakib103 Dec 9, 2025
4520b48
Fix DC script missing logging setup
nakib103 Dec 10, 2025
9965eab
Check if chrom 1 exist before querying
nakib103 Dec 10, 2025
100a253
Add space
nakib103 Dec 10, 2025
41a14f2
Check gt is 1 before adding to haplotype
nakib103 Dec 10, 2025
cd4adf4
use bash variable instead of nextfow
nakib103 Dec 11, 2025
2d23629
Need to use cache version in vep config
nakib103 Dec 11, 2025
efed2f7
Keep the node id
nakib103 Dec 11, 2025
e023bcc
Need to use cache version in vep config
nakib103 Dec 11, 2025
b66be68
Merge pull request #54 from nakib103/cache_version
nakib103 Dec 11, 2025
bb93b53
Merge pull request #52 from nakib103/fix_dc
nakib103 Dec 12, 2025
2ac2cad
Proper genotype query and filter duplicate variants
nakib103 Dec 13, 2025
2005e37
Keep node id when updating vcf
nakib103 Dec 14, 2025
d10a7d5
Consider numeric deletion in SPDI and option for not keeping nodeid
nakib103 Dec 14, 2025
2edd326
Use ref and alt allele in the identifier
nakib103 Dec 14, 2025
309d48d
Update id in population_to_haplotype optionally
nakib103 Dec 14, 2025
3e8a5be
Use of logger
nakib103 Dec 14, 2025
a5b7297
Set log level
nakib103 Dec 14, 2025
8b6d8eb
Add MAGIC-16 population
nakib103 Dec 18, 2025
c286465
Update wheat pop name
nakib103 Dec 22, 2025
ca28e17
Merge pull request #55 from nakib103/wheat_freq
nakib103 Dec 22, 2025
c452ce2
Bugfix: multiple source construction
nakib103 Dec 23, 2025
857010a
Merge pull request #56 from nakib103/fix_multi_sources
nakib103 Dec 23, 2025
d04dec6
Add MAGIC-16 as source in sources_meta config
nakib103 Dec 25, 2025
eab195f
Merge pull request #57 from nakib103/wheat_freq
nakib103 Dec 25, 2025
dab55b4
Merge branch 'dev-v1.0' into plants
nakib103 Dec 25, 2025
88681b3
Merge pull request #58 from Ensembl/plants
nakib103 Dec 25, 2025
0dc5c80
fix: remove redundant check
ainefairbrother Jan 7, 2026
26d3de4
fix: revert to previous ver of summary_stats
ainefairbrother Jan 7, 2026
17df509
Merge pull request #50 from ainefairbrother/add-clinvar-source
nakib103 Jan 7, 2026
e115289
Added logs for mismatched region and call seq regions
likhitha-surapaneni Jan 7, 2026
cca6195
Minor changes
likhitha-surapaneni Jan 7, 2026
10219c5
Merge pull request #53 from nakib103/fix_hgsv_script
nakib103 Jan 8, 2026
eeba243
Use ALLELE_NUM instead of alleles for matching in summary_stats
nakib103 Jan 7, 2026
c067cab
More restrictive regex for human haplo gnomAD data
nakib103 Jan 7, 2026
c7f963c
Remove hard-coded way to get csq header index
nakib103 Jan 9, 2026
6c14eac
Fix derefencing issue
nakib103 Jan 9, 2026
da8ecfd
index need to be str
nakib103 Jan 12, 2026
bcfb540
Update according to review
nakib103 Jan 15, 2026
e91e7ca
Default allele number is 1
nakib103 Jan 15, 2026
ae0701b
Merge pull request #45 from likhitha-surapaneni/feature/merge_dbVar
nakib103 Jan 15, 2026
cd0793e
Merge pull request #59 from nakib103/ss_allele
nakib103 Jan 16, 2026
e5d92e7
Bugfix: wrong field extraction in vcf_to_bed
nakib103 Jan 21, 2026
be574f2
Query only region avail in VCF for bw - reduce time
nakib103 Jan 21, 2026
54adb9d
Add --no_rvariants option
nakib103 Jan 27, 2026
db412a3
Merge pull request #60 from nakib103/ss_allele
ainefairbrother Feb 9, 2026
e7866a5
Modify MAGIC-16 source metadata
nakib103 Feb 9, 2026
44827a2
Merge pull request #63 from nakib103/ena_modify
nakib103 Feb 9, 2026
7d1b00d
use gencode_primary filter only for human GRCh38
nakib103 Feb 23, 2026
1e538e2
Merge pull request #66 from nakib103/gencode_primary_filter
nakib103 Feb 23, 2026
71 changes: 0 additions & 71 deletions datachecks/conftest.py

This file was deleted.

56 changes: 21 additions & 35 deletions datachecks/run_datachecks.py
@@ -43,22 +43,18 @@ def parse_args(args=None):
  """
  parser = argparse.ArgumentParser()

- parser.add_argument("--dir", dest="dir", type=str, default=os.getcwd())
+ parser.add_argument("--dir", dest="dir", type=str, default=os.getcwd(), help="Directory containing pipeline output files")
  parser.add_argument("--input_config", dest="input_config", type=str)
  parser.add_argument(
- "-O", "--output_dir", dest="output_dir", type=str, default=os.getcwd()
+ "-O", "--output_dir", dest="output_dir", type=str, default=os.path.join(os.getcwd(), "output")
  )
- parser.add_argument("-M", "--mem", dest="memory", type=str, default="2000")
- parser.add_argument("-t", "--time", dest="time", type=str, default="01:00:00")
+ parser.add_argument("-M", "--mem", dest="memory", type=str, default="6000")
+ parser.add_argument("-t", "--time", dest="time", type=str, default="05:00:00")
  parser.add_argument(
  "-p", "--partition", dest="partition", type=str, default="production"
  )
- parser.add_argument(
- "--mail-user",
- dest="mail_user",
- type=str,
- default=getpass.getuser() + "@ebi.ac.uk",
- )
+ parser.add_argument("--mail-user", dest="mail_user", type=str)
+ parser.add_argument("--skip-xfail", dest="skip_xfail", action="store_true")
  parser.add_argument(type=str, nargs="?", dest="tests", default="./")

  return parser.parse_args(args)
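The effect of the revised defaults can be checked in isolation. The sketch below rebuilds only the options visible in this hunk; the helper name `build_parser` and the example address are illustrative and not part of the PR, and the positional argument is given the explicit name `tests` rather than `dest="tests"`, which should behave the same way.

```python
import argparse
import os


def build_parser():
    """Rebuild only the options visible in the hunk (illustrative helper, not in the PR)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dir", dest="dir", type=str, default=os.getcwd())
    parser.add_argument(
        "-O", "--output_dir", dest="output_dir", type=str,
        default=os.path.join(os.getcwd(), "output"),
    )
    parser.add_argument("-M", "--mem", dest="memory", type=str, default="6000")
    parser.add_argument("-t", "--time", dest="time", type=str, default="05:00:00")
    # No default any more: mail_user stays None unless the caller passes an address.
    parser.add_argument("--mail-user", dest="mail_user", type=str)
    parser.add_argument("--skip-xfail", dest="skip_xfail", action="store_true")
    # Optional positional argument; falls back to "./" when omitted.
    parser.add_argument("tests", type=str, nargs="?", default="./")
    return parser


# Parse an empty command line to inspect the defaults.
defaults = build_parser().parse_args([])

# Parse a command line that opts in to the new flag and supplies an address.
custom = build_parser().parse_args(
    ["--skip-xfail", "--mail-user", "someone@example.org", "tests/"]
)
```

With no arguments, `defaults.mail_user` is `None`, so downstream code has to handle a missing address explicitly instead of relying on a derived `@ebi.ac.uk` default.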
@@ -134,15 +130,16 @@ def main(args=None):
  input_config = args.input_config or None
  dir = args.dir
  output_dir = args.output_dir
+ tmp_dir = os.path.join(os.getcwd(), "tmp")

  # create output directory if not already exist
  os.makedirs(output_dir, exist_ok=True)
+ os.makedirs(tmp_dir, exist_ok=True)

  species_metadata = {}
  if input_config is not None:
  species_metadata = get_species_metadata(input_config)

- vcf_files = []
  api_outdir = os.path.join(dir, "api")
  track_outdir = os.path.join(dir, "tracks")
  for genome_uuid in os.listdir(api_outdir):
@@ -159,20 +156,26 @@ def main(args=None):
  bigbed = os.path.join(track_outdir, genome_uuid, "variant-details.bb")
  bigwig = os.path.join(track_outdir, genome_uuid, "variant-summary.bw")

- timestamp = int(datetime.datetime.now().timestamp())
- with open(f"dc_{timestamp}.sh", "w") as file:
+ skip_xfail_arg = ""
+ if args.skip_xfail:
+ skip_xfail_arg = "--skip_xfail"
+
+ script_file = os.path.join(tmp_dir, f"dc_{species}.sh")
+ with open(script_file, "w") as file:
  file.write("#!/bin/bash\n\n")

  file.write(f"#SBATCH --time={args.time}\n")
  file.write(f"#SBATCH --mem={args.memory}\n")
  file.write(f"#SBATCH --partition={args.partition}\n")
- file.write(f"#SBATCH --mail-user={args.mail_user}\n")
- file.write(f"#SBATCH --mail-type=END\n")
- file.write(f"#SBATCH --mail-type=FAIL\n")
+ if args.mail_user is not None:
+ file.write(f"#SBATCH --mail-user={args.mail_user}\n")
+ file.write("#SBATCH --mail-type=END\n")
+ file.write("#SBATCH --mail-type=FAIL\n")
  file.write("\n")

+ file.write("module load bcftools\n")
  file.write(
- f"pytest --source_vcf={source_vcf} --bigbed={bigbed} --bigwig={bigwig} --vcf={vcf} --species={species} {args.tests}\n"
+ f"pytest --tap --tb=short {skip_xfail_arg} --source_vcf={source_vcf} --bigbed={bigbed} --bigwig={bigwig} --vcf={vcf} --species={species} {args.tests}\n"
  )

  subprocess.run(
@@ -184,27 +187,10 @@ def main(args=None):
  f"{output_dir}/dc_{species}.out",
  "--error",
  f"{output_dir}/dc_{species}.err",
- f"dc_{timestamp}.sh",
+ script_file,
  ]
  )

- # subprocess.run([
- # "bsub",
- # f"-M{args.memory}",
- # "-q", f"{args.parition}",
- # "-J", f"dc_{species}",
- # "-oo", f"{output_dir}/dc_{species}.out",
- # "-eo", f"{output_dir}/dc_{species}.err",
- # f"pytest " + \
- # f"--source_vcf={source_vcf} " + \
- # f"--bigbed={bigbed} " + \
- # f"--bigwig={bigwig} " + \
- # f"--vcf={vcf} " + \
- # f"--species={species} " + \
- # f"{args.tests}"
- # ]
- # )
-

  if __name__ == "__main__":
  sys.exit(main())
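The batch-script logic in this diff can be sketched as a pure function, which makes the optional mail directives easy to test without touching the filesystem or calling `sbatch`. The helper `render_sbatch_script` and the example address are hypothetical. Because indentation is not visible on this page, the sketch assumes that all three mail directives are written only when an address is supplied.

```python
def render_sbatch_script(time, memory, partition, pytest_cmd, mail_user=None):
    """Return the text of a SLURM batch script (hypothetical helper, not in the PR)."""
    lines = [
        "#!/bin/bash",
        "",
        f"#SBATCH --time={time}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH --partition={partition}",
    ]
    # Assumption: the mail-user and both mail-type directives are emitted
    # together, and only when the caller provides an address.
    if mail_user is not None:
        lines += [
            f"#SBATCH --mail-user={mail_user}",
            "#SBATCH --mail-type=END",
            "#SBATCH --mail-type=FAIL",
        ]
    lines += ["", "module load bcftools", pytest_cmd]
    return "\n".join(lines) + "\n"


# Render once without an address and once with one.
without_mail = render_sbatch_script(
    "05:00:00", "6000", "production", "pytest --tap --tb=short ./"
)
with_mail = render_sbatch_script(
    "05:00:00", "6000", "production", "pytest --tap --tb=short ./",
    mail_user="someone@example.org",
)
```

Returning the script as a string and writing it to `tmp/dc_{species}.sh` in a separate step keeps the rendering testable, and the per-species file name avoids collisions between jobs submitted within the same second, which the old timestamp-based name did not.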