Commit 35e2ef9

RF+ENH: reproin - allow for an empty key in protocols2fix to provide fixups for any study
For some data we cannot rely on a consistent study_description being present across scans. As a result, the same protocols2fix records have to be duplicated for multiple study descriptions that are not known in advance (actually for their hashes, which makes it even more difficult). With custom overloads of protocols2fix, however, it is often desirable to just provide global rewrite rules that can be applied to any collection, since the data did follow some convention. Now rules stored under an empty "hash" (the empty string key) are applied last, i.e. after any study-specific fixups.

As part of the solution I removed the preliminary check in fix_seqinfo that skipped the call to fix_dbic_protocol altogether; fix_dbic_protocol now skips keys that are missing from the dictionary instead of raising ValueError. It should be better this way.

I also removed the binding of series_spec_fields and protocols2fix in the function signature. This should allow easier overloading at the module level when needed.
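
The ordering described above (study-specific rules first, then the rules under the empty key) can be sketched in isolation. This is a simplified illustration, not the reproin code itself: `SeqInfo` here is a trimmed-down stand-in with only two fields, `apply_fixups` is a hypothetical helper, and the hash `"0123abcd"` and both regular expressions are invented for the example.

```python
import re
from collections import namedtuple

# Hypothetical, trimmed-down stand-in for heudiconv's seqinfo records.
SeqInfo = namedtuple("SeqInfo", ["protocol_name", "series_description"])

# Hypothetical rules: the hash and the regexes are invented for illustration.
protocols2fix = {
    "0123abcd": [(r"^anat_", "anat-")],  # applies only to the study with this hash
    "": [(r"T1$", "T1w")],               # empty key: applies to any study
}


def apply_fixups(seqinfo, study_hash, subsdict, keys=None):
    """Apply study-specific substitutions first, then the global ones."""
    if keys is None:
        keys = SeqInfo._fields
    for subs_key in (study_hash, ""):
        if subs_key not in subsdict:
            continue  # nothing registered for this scope
        for i, s in enumerate(seqinfo):
            fixed = {}
            for key in keys:
                value = getattr(s, key)
                for pattern, replacement in subsdict[subs_key]:
                    value = re.sub(pattern, replacement, value)
                fixed[key] = value
            # namedtuples are immutable, so build a replacement record
            seqinfo[i] = s._replace(**fixed)
    return seqinfo


known = apply_fixups([SeqInfo("anat_T1", "anat_T1")], "0123abcd", protocols2fix)
print(known[0].protocol_name)    # anat-T1w (study rule, then global rule)

unknown = apply_fixups([SeqInfo("anat_T1", "anat_T1")], "ffffffff", protocols2fix)
print(unknown[0].protocol_name)  # anat_T1w (global rule only)
```

A study whose hash has no entry still receives the global rewrite, which is what removes the need to duplicate records per hash.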
1 parent fdea77c commit 35e2ef9

1 file changed: +29 -20 lines changed

heudiconv/heuristics/reproin.py

Lines changed: 29 additions & 20 deletions
@@ -391,26 +391,38 @@ def fix_canceled_runs(seqinfo, accession2run=fix_accession2run):
     return seqinfo
 
 
-def fix_dbic_protocol(seqinfo, keys=series_spec_fields, subsdict=protocols2fix):
+def fix_dbic_protocol(seqinfo, keys=None, subsdict=None):
     """Ad-hoc fixup for existing protocols
     """
+    if subsdict is None:
+        subsdict = protocols2fix
+    if keys is None:
+        keys = series_spec_fields
+
     study_hash = get_study_hash(seqinfo)
 
-    if study_hash not in subsdict:
-        raise ValueError("I don't know how to fix {0}".format(study_hash))
-
-    # need to replace both protocol_name series_description
-    substitutions = subsdict[study_hash]
-    for i, s in enumerate(seqinfo):
-        fixed_kwargs = dict()
-        for key in keys:
-            value = getattr(s, key)
-            # replace all I need to replace
-            for substring, replacement in substitutions:
-                value = re.sub(substring, replacement, value)
-            fixed_kwargs[key] = value
-        # namedtuples are immutable
-        seqinfo[i] = s._replace(**fixed_kwargs)
+    # We will consider study specific (based on hash) and global (if key is "",
+    # ie empty string) and in that order substitutions
+    candidate_substitutions = (
+        ('study (%s) specific' % study_hash, study_hash),
+        ('global', ''),
+    )
+    for subs_scope, subs_key in candidate_substitutions:
+        if subs_key not in subsdict:
+            continue
+        substitutions = subsdict[subs_key]
+        lgr.info("Considering %s substitutions", subs_scope)
+        for i, s in enumerate(seqinfo):
+            fixed_kwargs = dict()
+            # need to replace both protocol_name series_description
+            for key in keys:
+                value = getattr(s, key)
+                # replace all I need to replace
+                for substring, replacement in substitutions:
+                    value = re.sub(substring, replacement, value)
+                fixed_kwargs[key] = value
+            # namedtuples are immutable
+            seqinfo[i] = s._replace(**fixed_kwargs)
 
     return seqinfo
 
@@ -420,10 +432,7 @@ def fix_seqinfo(seqinfo):
     """
     # add cancelme to known bad runs
     seqinfo = fix_canceled_runs(seqinfo)
-    study_hash = get_study_hash(seqinfo)
-    if study_hash in protocols2fix:
-        lgr.info("Fixing up protocol for {0}".format(study_hash))
-        seqinfo = fix_dbic_protocol(seqinfo)
+    seqinfo = fix_dbic_protocol(seqinfo)
     return seqinfo
 
 
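
The signature change in fix_dbic_protocol matters because Python evaluates default argument values once, when the `def` statement runs. The sketch below shows the difference with two hypothetical functions, `early_bound` and `late_bound`. The module-level `protocols2fix` here is a stand-in with invented rules, and rebinding it is assumed to be how a custom heuristic would overload the table.

```python
# Hypothetical module-level table, standing in for reproin's protocols2fix.
protocols2fix = {"": [("old", "new")]}


def early_bound(subsdict=protocols2fix):
    # The default object was captured once, when `def` executed.
    return subsdict


def late_bound(subsdict=None):
    # The module-level name is looked up again on every call.
    if subsdict is None:
        subsdict = protocols2fix
    return subsdict


# A custom heuristic overloads the table by rebinding the module-level name.
protocols2fix = {"": [("custom", "rule")]}

print(early_bound())  # {'': [('old', 'new')]}
print(late_bound())   # {'': [('custom', 'rule')]}
```

Mutating the original dictionary in place (for example with `update`) would be visible through both variants; only rebinding the name, as above, is missed by the early-bound default.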
