Skip to content

Commit 70ca3eb

Browse files
committed
The -s, --samples option was not working properly
Now is also supports sample negation as advertised in the manual page, e.g. `-s ^sample1,sample2` to include all samples but sample1 and sample2 Resolves #2380
1 parent 05621cf commit 70ca3eb

File tree

6 files changed

+198
-38
lines changed

6 files changed

+198
-38
lines changed

NEWS

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -12,15 +12,16 @@ Changes affecting the whole of bcftools, or multiple commands:
1212

1313
Changes affecting specific commands:
1414

15-
* bcftools +vrfs
16-
17-
- New experimental plugin for scoring variants and assess site noisiness (variant read frequency profiles)
18-
from a large number of unaffected parental samples
19-
2015
* bcftools annotate
2116

2217
- Fix Number in the header definition of transferred FILTER and ID tags (#2335)
2318

19+
* bcftools call
20+
21+
- The `-s, --samples` option was not working properly, now also supporting
22+
sample negation as advertised in the manual page, e.g. `-s ^sample1,sample2`
23+
to include all samples but sample1 and sample2 (#2380)
24+
2425
* bcftools consensus
2526

2627
- Preserve entire missing gVCF blocks with --missing (#2350)
@@ -127,6 +128,11 @@ Changes affecting specific commands:
127128

128129
- HTML/JavaScript visualization of bcftools/roh output and homozygosity rate.
129130

131+
* bcftools +vrfs
132+
133+
- New experimental plugin for scoring variants and assess site noisiness (variant read frequency profiles)
134+
from a large number of unaffected parental samples
135+
130136

131137
## Release 1.21 (12th September 2024)
132138

test/samples.1.out

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
##fileformat=VCFv4.2
2+
##FILTER=<ID=PASS,Description="All filters passed">
3+
##bcftoolsVersion=1.21-131-g05621cf-dirty+htslib-1.21-59-g45e1390-dirty
4+
##bcftoolsCommand=mpileup -Ou -f samples.fa samples.sam
5+
##reference=file://samples.fa
6+
##contig=<ID=NC_000074.7:89373943-89415103,length=41161>
7+
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
8+
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
9+
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
10+
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
11+
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
12+
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
13+
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
14+
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
15+
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
16+
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
17+
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
18+
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
19+
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
20+
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
21+
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (high-quality bases)">
22+
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
23+
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
24+
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
25+
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
26+
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
27+
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
28+
NC_000074.7:89373943-89415103 1 . G . 63.5869 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
29+
NC_000074.7:89373943-89415103 2 . G . 47.5868 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
30+
NC_000074.7:89373943-89415103 3 . T . 62.5869 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
31+
NC_000074.7:89373943-89415103 4 . G . 69.587 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
32+
NC_000074.7:89373943-89415103 5 . G . 62.5869 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
33+
NC_000074.7:89373943-89415103 6 . G . 71.587 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
34+
NC_000074.7:89373943-89415103 7 . T . 62.5869 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
35+
NC_000074.7:89373943-89415103 8 . G . 64.5869 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
36+
NC_000074.7:89373943-89415103 9 . G . 56.5868 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1
37+
NC_000074.7:89373943-89415103 10 . G . 68.5869 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:1

test/samples.2.out

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
##fileformat=VCFv4.2
2+
##FILTER=<ID=PASS,Description="All filters passed">
3+
##bcftoolsVersion=1.21-131-g05621cf-dirty+htslib-1.21-59-g45e1390-dirty
4+
##bcftoolsCommand=mpileup -Ou -f samples.fa samples.sam
5+
##reference=file://samples.fa
6+
##contig=<ID=NC_000074.7:89373943-89415103,length=41161>
7+
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
8+
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
9+
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
10+
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
11+
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
12+
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
13+
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
14+
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
15+
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
16+
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
17+
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
18+
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
19+
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
20+
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
21+
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (high-quality bases)">
22+
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
23+
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
24+
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
25+
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
26+
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
27+
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample3
28+
NC_000074.7:89373943-89415103 1 . G . 114.588 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
29+
NC_000074.7:89373943-89415103 2 . G . 73.587 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
30+
NC_000074.7:89373943-89415103 3 . T . 111.587 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
31+
NC_000074.7:89373943-89415103 4 . G . 129.588 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
32+
NC_000074.7:89373943-89415103 5 . G . 111.587 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
33+
NC_000074.7:89373943-89415103 6 . G . 134.588 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
34+
NC_000074.7:89373943-89415103 7 . T . 111.587 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
35+
NC_000074.7:89373943-89415103 8 . G . 116.588 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
36+
NC_000074.7:89373943-89415103 9 . G . 96.5873 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3
37+
NC_000074.7:89373943-89415103 10 . G . 126.588 . DP=6;MQ0F=0;AN=2;DP4=6,0,0,0;MQ=60 GT:AD 0/0:3

test/samples.vcf

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
##fileformat=VCFv4.2
2+
##FILTER=<ID=PASS,Description="All filters passed">
3+
##bcftoolsVersion=1.21-131-g05621cf-dirty+htslib-1.21-59-g45e1390-dirty
4+
##bcftoolsCommand=mpileup -Ou -f samples.fa samples.sam
5+
##reference=file://samples.fa
6+
##contig=<ID=NC_000074.7:89373943-89415103,length=41161>
7+
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
8+
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
9+
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
10+
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
11+
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
12+
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
13+
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
14+
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
15+
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
16+
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
17+
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
18+
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
19+
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
20+
##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h">
21+
##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">
22+
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
23+
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (high-quality bases)">
24+
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3
25+
NC_000074.7:89373943-89415103 1 . G <*> 0 . DP=6;I16=6,0,0,0,204,6936,0,0,360,21600,0,0,0,0,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,34:1,0 0,6,62:2,0 0,9,85:3,0
26+
NC_000074.7:89373943-89415103 2 . G <*> 0 . DP=6;I16=6,0,0,0,108,1944,0,0,360,21600,0,0,6,6,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,18:1,0 0,6,33:2,0 0,9,44:3,0
27+
NC_000074.7:89373943-89415103 3 . T <*> 0 . DP=6;I16=6,0,0,0,198,6534,0,0,360,21600,0,0,12,24,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,33:1,0 0,6,60:2,0 0,9,82:3,0
28+
NC_000074.7:89373943-89415103 4 . G <*> 0 . DP=6;I16=6,0,0,0,240,9600,0,0,360,21600,0,0,18,54,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,40:1,0 0,6,73:2,0 0,9,100:3,0
29+
NC_000074.7:89373943-89415103 5 . G <*> 0 . DP=6;I16=6,0,0,0,198,6534,0,0,360,21600,0,0,24,96,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,33:1,0 0,6,60:2,0 0,9,82:3,0
30+
NC_000074.7:89373943-89415103 6 . G <*> 0 . DP=6;I16=6,0,0,0,252,10584,0,0,360,21600,0,0,30,150,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,42:1,0 0,6,77:2,0 0,9,105:3,0
31+
NC_000074.7:89373943-89415103 7 . T <*> 0 . DP=6;I16=6,0,0,0,198,6534,0,0,360,21600,0,0,36,216,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,33:1,0 0,6,60:2,0 0,9,82:3,0
32+
NC_000074.7:89373943-89415103 8 . G <*> 0 . DP=6;I16=6,0,0,0,210,7350,0,0,360,21600,0,0,42,294,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,35:1,0 0,6,64:2,0 0,9,87:3,0
33+
NC_000074.7:89373943-89415103 9 . G <*> 0 . DP=6;I16=6,0,0,0,162,4374,0,0,360,21600,0,0,48,384,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,27:1,0 0,6,49:2,0 0,9,67:3,0
34+
NC_000074.7:89373943-89415103 10 . G <*> 0 . DP=6;I16=6,0,0,0,234,9126,0,0,360,21600,0,0,54,486,0,0;QS=3,0;MQ0F=0 PL:AD 0,3,39:1,0 0,6,71:2,0 0,9,97:3,0

test/test.pl

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -431,6 +431,10 @@
431431
run_test(\&test_vcf_head2,$opts,in=>'mpileup.2',out=>'head.1.out',args=>'-s0');
432432
run_test(\&test_vcf_head2,$opts,in=>'mpileup.2',out=>'head.2.out',args=>'-s1');
433433
run_test(\&test_vcf_head2,$opts,in=>'mpileup.2',out=>'head.3.out',args=>'-s2 -h2');
434+
run_test(\&test_vcf_call,$opts,in=>'samples',out=>'samples.1.out',args=>'-m -s sample1');
435+
run_test(\&test_vcf_call,$opts,in=>'samples',out=>'samples.1.out',args=>'-m -s ^sample2,sample3');
436+
run_test(\&test_vcf_call,$opts,in=>'samples',out=>'samples.2.out',args=>'-m -s sample3');
437+
run_test(\&test_vcf_call,$opts,in=>'samples',out=>'samples.2.out',args=>'-m -s ^sample1,sample2');
434438
run_test(\&test_vcf_call,$opts,in=>'call-ploidy.1',out=>'call-ploidy.1.1.out',args=>'-m --ploidy-file {PATH}/call-ploidy.1.txt -S {PATH}/call-ploidy.1.ped');
435439
run_test(\&test_vcf_call,$opts,in=>'mpileup',out=>'mpileup.1.out',args=>'-mv');
436440
run_test(\&test_vcf_call,$opts,in=>'mpileup',out=>'mpileup.2.out',args=>'-mg0');

vcfcall.c

Lines changed: 75 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -278,7 +278,12 @@ static char **parse_ped_samples(args_t *args, call_t *call, char **vals, int nva
278278
*/
279279
static void set_samples(args_t *args, const char *fn, int is_file)
280280
{
281-
int i, nlines;
281+
int i, nlines, negate = 0;
282+
if ( fn[0]=='^' )
283+
{
284+
negate = 1;
285+
fn++;
286+
}
282287
char **lines = hts_readlist(fn, is_file, &nlines);
283288
if ( !lines ) error("Could not read the file: %s\n", fn);
284289

@@ -300,43 +305,80 @@ static void set_samples(args_t *args, const char *fn, int is_file)
300305
for (i=0; i<bcf_hdr_nsamples(args->aux.hdr); i++) args->sample2sex[i] = dflt_sex_id;
301306

302307
int *old2new = (int*) malloc(sizeof(int)*bcf_hdr_nsamples(args->aux.hdr));
303-
for (i=0; i<bcf_hdr_nsamples(args->aux.hdr); i++) old2new[i] = -1;
304-
305308
int nsmpl = 0, map_needed = 0;
306-
for (i=0; i<nlines; i++)
309+
if ( !negate )
307310
{
308-
char *ss = lines[i];
309-
while ( *ss && isspace(*ss) ) ss++;
310-
if ( !*ss ) error("Could not parse: %s\n", lines[i]);
311-
if ( *ss=='#' ) continue;
312-
char *se = ss;
313-
while ( *se && !isspace(*se) ) se++;
314-
char x = *se, *xptr = se; *se = 0;
315-
316-
int ismpl = bcf_hdr_id2int(args->aux.hdr, BCF_DT_SAMPLE, ss);
317-
if ( ismpl < 0 ) { fprintf(stderr,"Warning: No such sample in the VCF: %s\n",ss); continue; }
318-
if ( old2new[ismpl] != -1 ) { fprintf(stderr,"Warning: The sample is listed multiple times: %s\n",ss); continue; }
319-
320-
ss = se+(x != '\0');
321-
while ( *ss && isspace(*ss) ) ss++;
322-
if ( !*ss ) ss = "2"; // default ploidy
323-
se = ss;
324-
while ( *se && !isspace(*se) ) se++;
325-
if ( se==ss ) { *xptr = x; error("Could not parse: \"%s\"\n", lines[i]); }
311+
for (i=0; i<bcf_hdr_nsamples(args->aux.hdr); i++) old2new[i] = -1;
312+
for (i=0; i<nlines; i++)
313+
{
314+
char *ss = lines[i];
315+
while ( *ss && isspace(*ss) ) ss++;
316+
if ( !*ss ) error("Could not parse: %s\n", lines[i]);
317+
if ( *ss=='#' ) continue;
318+
char *se = ss;
319+
while ( *se && !isspace(*se) ) se++;
320+
char x = *se, *xptr = se; *se = 0;
321+
322+
int ismpl = bcf_hdr_id2int(args->aux.hdr, BCF_DT_SAMPLE, ss);
323+
if ( ismpl < 0 ) { fprintf(stderr,"Warning: No such sample in the VCF: %s\n",ss); continue; }
324+
if ( old2new[ismpl] != -1 ) { fprintf(stderr,"Warning: The sample is listed multiple times: %s\n",ss); continue; }
325+
326+
ss = se+(x != '\0');
327+
while ( *ss && isspace(*ss) ) ss++;
328+
if ( !*ss ) ss = "2"; // default ploidy
329+
se = ss;
330+
while ( *se && !isspace(*se) ) se++;
331+
if ( se==ss ) { *xptr = x; error("Could not parse: \"%s\"\n", lines[i]); }
332+
333+
char *sex = ss;
334+
if ( ploidy_sex2id(args->ploidy,sex)<0 )
335+
{
336+
if ( sex[1]==0 && (sex[0]=='0' || sex[0]=='1' || sex[0]=='2') ) args->sample2sex[nsmpl] = -1*(sex[0]-'0');
337+
else error("[E::%s] The sex \"%s\" has not been declared in --ploidy/--ploidy-file\n",__func__,sex);
338+
}
339+
else
340+
args->sample2sex[nsmpl] = ploidy_add_sex(args->ploidy,sex);
326341

327-
char *sex = ss;
328-
if ( ploidy_sex2id(args->ploidy,sex)<0 )
342+
if ( ismpl!=nsmpl ) map_needed = 1;
343+
args->samples_map[nsmpl] = ismpl;
344+
old2new[ismpl] = nsmpl;
345+
nsmpl++;
346+
}
347+
if ( nsmpl!=bcf_hdr_nsamples(args->aux.hdr) ) map_needed = 1;
348+
}
349+
else
350+
{
351+
// negate: in this mode the default ploidy must be used for obvious reason - there is no way to
352+
// specify ploidy if the sample name is not shown
353+
for (i=0; i<bcf_hdr_nsamples(args->aux.hdr); i++) old2new[i] = 1; // by default keep the sample
354+
for (i=0; i<nlines; i++)
329355
{
330-
if ( sex[1]==0 && (sex[0]=='0' || sex[0]=='1' || sex[0]=='2') ) args->sample2sex[nsmpl] = -1*(sex[0]-'0');
331-
else error("[E::%s] The sex \"%s\" has not been declared in --ploidy/--ploidy-file\n",__func__,sex);
356+
char *ss = lines[i];
357+
while ( *ss && isspace(*ss) ) ss++;
358+
if ( !*ss ) error("Could not parse: %s\n", lines[i]);
359+
if ( *ss=='#' ) continue;
360+
char *se = ss;
361+
while ( *se && !isspace(*se) ) se++;
362+
*se = 0;
363+
364+
int ismpl = bcf_hdr_id2int(args->aux.hdr, BCF_DT_SAMPLE, ss);
365+
if ( ismpl < 0 ) { fprintf(stderr,"Warning: No such sample in the VCF: %s\n",ss); continue; }
366+
367+
old2new[ismpl] = 0; // do not keep this sample
368+
free(lines[i]);
332369
}
333-
else
334-
args->sample2sex[nsmpl] = ploidy_add_sex(args->ploidy,sex);
335-
336-
if ( ismpl!=nsmpl ) map_needed = 1;
337-
args->samples_map[nsmpl] = ismpl;
338-
old2new[ismpl] = nsmpl;
339-
nsmpl++;
370+
free(lines);
371+
lines = malloc(sizeof(*lines)*bcf_hdr_nsamples(args->aux.hdr));
372+
nsmpl = 0;
373+
for (i=0; i<bcf_hdr_nsamples(args->aux.hdr); i++)
374+
{
375+
if ( !old2new[i] ) continue;
376+
lines[nsmpl] = strdup(args->aux.hdr->samples[i]);
377+
args->samples_map[nsmpl] = i;
378+
old2new[i] = nsmpl;
379+
nsmpl++;
380+
}
381+
map_needed = 1;
340382
}
341383

342384
for (i=0; i<args->aux.nfams; i++)

0 commit comments

Comments
 (0)