diff --git a/.gitignore b/.gitignore index 1fad859c..f3d74a9a 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,2 @@ -site_*.py *.pyc *~ diff --git a/README.md b/README.md index b88c5b05..50df689a 100644 --- a/README.md +++ b/README.md @@ -22,18 +22,37 @@ instance to the database. ### Installation -The 'genologics' directory should be made accessible in your Python path, -by whatever method suits your installation. +``` +pip install genologics +``` + +or, for the cutting-edge version: + +``` +pip install https://github.com/SciLifeLab/genologics/tarball/master +``` ### Usage -The client script imports the class Lims from the genologics.lims module, -and instantiates it with the required arguments: +The URL and credentials should be written in a new file at any +of these locations (checked in order of preference): + +``` +$HOME/.genologicsrc, .genologicsrc, genologics.conf, genologics.cfg +``` + +or, if installed system-wide: -- Base URI of the server, including the port number, but excluding - the '/api/v1' segment of the path. -- User name of the account on the server. -- Password of the account on the server. +``` +/etc/genologics.conf +``` + +``` +[genologics] +BASEURI=https://yourlims.example.com:8443 +USERNAME=your_username +PASSWORD=your_password +``` ### Example scripts @@ -43,13 +62,9 @@ NOTE: The example files rely on specific entities and configurations on the server, and use base URI, user name and password, so to work for your server, all these must be reviewed and modified. -### Caveats - -The interface has not been used much yet, so it is not properly debugged. +### Known bugs -Known issues: - Artifact state is part of its URL (as a query parameter). It is not entirely clear how to deal with this in the Lims.cache: Currently, an artifact that has the current state may be represented by a URL that includes the state, and another that does not contain it.
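The "first config file found wins" lookup described in the README can be sketched with the standard library; this is an illustrative sketch only, not the package's actual implementation — the repo's `genologics/config.py` uses Python 2's `ConfigParser`, while this sketch uses the Python 3 spelling, and the helper name `read_first_config` is hypothetical:

```python
# Sketch of the config lookup described in the README: the first file in the
# candidate list that exists and parses wins. read_first_config is a
# hypothetical name, not part of the genologics package.
import configparser
import os

CANDIDATE_PATHS = [
    os.path.expanduser('~/.genologicsrc'),
    '.genologicsrc',
    'genologics.conf',
    'genologics.cfg',
    '/etc/genologics.conf',
]

def read_first_config(paths=CANDIDATE_PATHS):
    """Return (BASEURI, USERNAME, PASSWORD) from the first readable file."""
    for path in paths:
        config = configparser.ConfigParser()
        if config.read([path]):  # read() returns the list of files it parsed
            return (config.get('genologics', 'BASEURI').rstrip(),
                    config.get('genologics', 'USERNAME').rstrip(),
                    config.get('genologics', 'PASSWORD').rstrip())
    raise IOError('no genologics configuration file found')
```

Reading one candidate at a time (rather than passing the whole list to a single `read()` call, where later files would override earlier ones) is what makes the first file win.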
- diff --git a/__init__.py b/__init__.py deleted file mode 100644 index a47e218d..00000000 --- a/__init__.py +++ /dev/null @@ -1,5 +0,0 @@ -"""Python interface to GenoLogics LIMS via its REST API. - -Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. -Copyright (C) 2012 Per Kraulis -""" diff --git a/examples/attach_delivery_report.py b/examples/attach_delivery_report.py new file mode 100644 index 00000000..1daf671a --- /dev/null +++ b/examples/attach_delivery_report.py @@ -0,0 +1,31 @@ +"""Python interface to GenoLogics LIMS via its REST API. + +Usage example: Attach customer delivery report to LIMS + + + +Roman Valls Guimera, Science for Life Laboratory, Stockholm, Sweden. +""" + +import codecs +from pprint import pprint +from genologics.lims import * + +# Login parameters for connecting to a LIMS instance. +from genologics.config import BASEURI, USERNAME, PASSWORD + +# Create the LIMS interface instance, and check the connection and version. +lims = Lims(BASEURI, USERNAME, PASSWORD) +lims.check_version() + +project = Project(lims, id="P193") + +print 'UDFs:' +pprint(project.udf.items()) + +print 'files:' +for file in project.files: + print file.content_location + +project.udf['Delivery Report'] = "http://example.com/delivery_note.pdf" +project.put() diff --git a/examples/get_application.py b/examples/get_application.py new file mode 100644 index 00000000..ad8e71a8 --- /dev/null +++ b/examples/get_application.py @@ -0,0 +1,26 @@ +"""Python interface to GenoLogics LIMS via its REST API. + +Usage example: Get the 'Application' UDF of a project + + + +Roman Valls Guimera, Science for Life Laboratory, Stockholm, Sweden. +""" + +import codecs +from pprint import pprint +from genologics.lims import * + +# Login parameters for connecting to a LIMS instance. +from genologics.config import BASEURI, USERNAME, PASSWORD + +# Create the LIMS interface instance, and check the connection and version.
+lims = Lims(BASEURI, USERNAME, PASSWORD) +lims.check_version() + +project = Project(lims, id="P193") + +print 'UDFs:' +pprint(project.udf.items()) + +print project.udf['Application'] diff --git a/examples/get_artifacts.py b/examples/get_artifacts.py index fd656625..3c640cdc 100644 --- a/examples/get_artifacts.py +++ b/examples/get_artifacts.py @@ -2,7 +2,7 @@ Usage example: Get artifacts and artifact info. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -12,8 +12,7 @@ from genologics.lims import Lims # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. lims = Lims(BASEURI, USERNAME, PASSWORD) diff --git a/examples/get_containers.py b/examples/get_containers.py index 78b4b900..99a5e117 100644 --- a/examples/get_containers.py +++ b/examples/get_containers.py @@ -2,7 +2,7 @@ Usage example: Get some containers. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -12,8 +12,7 @@ from genologics.lims import * # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. 
lims = Lims(BASEURI, USERNAME, PASSWORD) @@ -39,3 +38,17 @@ containertype = container.type print containertype, containertype.name, containertype.x_dimension, containertype.y_dimension + + + +containers = lims.get_containers(type='Illumina Flow Cell',state='Populated') +for container in containers: + print container.name + print container.id + print container.placements.keys() + arts=lims.get_artifacts(containername=container.name) + for art in arts: + print art.name + print art.type + print art.udf.items() + print art.parent_process.type.name diff --git a/examples/get_labs.py b/examples/get_labs.py index f80ef30c..6a737910 100644 --- a/examples/get_labs.py +++ b/examples/get_labs.py @@ -2,7 +2,7 @@ Usage example: Get labs and lab info. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -11,8 +11,7 @@ from genologics.lims import * # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. lims = Lims(BASEURI, USERNAME, PASSWORD) diff --git a/examples/get_processes.py b/examples/get_processes.py index 06941d3d..6e2d59eb 100644 --- a/examples/get_processes.py +++ b/examples/get_processes.py @@ -2,7 +2,7 @@ Usage example: Get some processes. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -10,8 +10,7 @@ from genologics.lims import * # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. 
lims = Lims(BASEURI, USERNAME, PASSWORD) diff --git a/examples/get_projects.py b/examples/get_projects.py index da62706b..47c102e7 100644 --- a/examples/get_projects.py +++ b/examples/get_projects.py @@ -2,7 +2,7 @@ Usage example: Get some projects. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -12,8 +12,7 @@ from genologics.lims import * # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. lims = Lims(BASEURI, USERNAME, PASSWORD) @@ -29,7 +28,7 @@ print len(projects), 'projects opened since', day # Get the project with the specified LIMS id, and print some info. -project = Project(lims, id='KRA61') +project = Project(lims, id='P193') print project, project.name, project.open_date, project.close_date print ' UDFs:' @@ -45,10 +44,6 @@ value = codecs.encode(value, 'UTF-8') print ' ', key, '=', value -print ' notes:' -for note in project.notes: - print note.uri, note.content - print ' files:' for file in project.files: print file.id diff --git a/examples/get_samples.py b/examples/get_samples.py index 33e1246a..e5ad890d 100644 --- a/examples/get_samples.py +++ b/examples/get_samples.py @@ -2,7 +2,7 @@ Usage examples: Get some samples, and sample info. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -10,8 +10,8 @@ from genologics.lims import * # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD + +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. 
lims = Lims(BASEURI, USERNAME, PASSWORD) diff --git a/examples/get_samples2.py b/examples/get_samples2.py index f4a32c5a..f66db09c 100644 --- a/examples/get_samples2.py +++ b/examples/get_samples2.py @@ -2,14 +2,14 @@ Usage examples: Get some samples, and sample info. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ from genologics.lims import * -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD lims = Lims(BASEURI, USERNAME, PASSWORD) lims.check_version() diff --git a/examples/set_project_queued.py b/examples/set_project_queued.py index ba4ea77d..2e407b1b 100644 --- a/examples/set_project_queued.py +++ b/examples/set_project_queued.py @@ -2,7 +2,7 @@ Example usage: Set the UDF 'Queued' of a project. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -12,8 +12,7 @@ from genologics.lims import * # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. -from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. lims = Lims(BASEURI, USERNAME, PASSWORD) diff --git a/examples/set_sample_name.py b/examples/set_sample_name.py index 670441b9..a147f149 100644 --- a/examples/set_sample_name.py +++ b/examples/set_sample_name.py @@ -2,7 +2,7 @@ Example usage: Set the name and a UDF of a sample. -NOTE: You need to set the BASEURI, USERNAME AND PASSWORD. + Per Kraulis, Science for Life Laboratory, Stockholm, Sweden. """ @@ -10,8 +10,7 @@ from genologics.lims import * # Login parameters for connecting to a LIMS instance. -# NOTE: Modify according to your setup. 
-from genologics.site_cloud import BASEURI, USERNAME, PASSWORD +from genologics.config import BASEURI, USERNAME, PASSWORD # Create the LIMS interface instance, and check the connection and version. lims = Lims(BASEURI, USERNAME, PASSWORD) diff --git a/genologics/__init__.py b/genologics/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/genologics/config.py b/genologics/config.py new file mode 100644 index 00000000..e5512b1d --- /dev/null +++ b/genologics/config.py @@ -0,0 +1,22 @@ +import os +import sys +import warnings + +import ConfigParser + +config = ConfigParser.SafeConfigParser() +try: + conf_file = config.read([os.path.expanduser('~/.genologicsrc'), '.genologicsrc', + 'genologics.conf', 'genologics.cfg', '/etc/genologics.conf']) + + # First config file found wins + config.readfp(open(conf_file[0])) + + BASEURI = config.get('genologics', 'BASEURI').rstrip() + USERNAME = config.get('genologics', 'USERNAME').rstrip() + PASSWORD = config.get('genologics', 'PASSWORD').rstrip() +except: + warnings.warn("Please make sure you've created your own Genologics configuration file (i.e: ~/.genologicsrc) as stated in README.md") + sys.exit(-1) + + diff --git a/entities.py b/genologics/entities.py similarity index 89% rename from entities.py rename to genologics/entities.py index 0874a181..1d9d6ff5 100644 --- a/entities.py +++ b/genologics/entities.py @@ -13,27 +13,33 @@ from xml.etree import ElementTree _NSMAP = dict( - artgr='http://genologics.com/ri/artifactgroup', art='http://genologics.com/ri/artifact', + artgr='http://genologics.com/ri/artifactgroup', cnf='http://genologics.com/ri/configuration', con='http://genologics.com/ri/container', ctp='http://genologics.com/ri/containertype', exc='http://genologics.com/ri/exception', file='http://genologics.com/ri/file', + inst='http://genologics.com/ri/instrument', lab='http://genologics.com/ri/lab', - perm='http://genologics.com/ri/permissions', prc='http://genologics.com/ri/process', 
prj='http://genologics.com/ri/project', prop='http://genologics.com/ri/property', + protcnf='http://genologics.com/ri/protocolconfiguration', + protstepcnf='http://genologics.com/ri/stepconfiguration', prx='http://genologics.com/ri/processexecution', + ptm='http://genologics.com/ri/processtemplate', ptp='http://genologics.com/ri/processtype', res='http://genologics.com/ri/researcher', - rgt='http://genologics.com/ri/reagent', ri='http://genologics.com/ri', + rt='http://genologics.com/ri/routing', rtp='http://genologics.com/ri/reagenttype', smp='http://genologics.com/ri/sample', + stg='http://genologics.com/ri/stage', + stp='http://genologics.com/ri/step', udf='http://genologics.com/ri/userdefined', - ver='http://genologics.com/ri/version') + ver='http://genologics.com/ri/version', + wkfcnf='http://genologics.com/ri/workflowconfiguration') for prefix, uri in _NSMAP.iteritems(): ElementTree._namespace_map[uri] = prefix @@ -222,11 +228,15 @@ def __setitem__(self, key, value): for node in self._elems: if node.attrib['name'] != key: continue type = node.attrib['type'].lower() + if value is None: pass elif type == 'string': if not isinstance(value, basestring): raise TypeError('String UDF requires str or unicode value') + elif type == 'str': + if not isinstance(value, basestring): + raise TypeError('String UDF requires str or unicode value') elif type == 'text': if not isinstance(value, basestring): raise TypeError('Text UDF requires str or unicode value') @@ -242,6 +252,10 @@ def __setitem__(self, key, value): if not isinstance(value, datetime.date): # Too restrictive? 
raise TypeError('Date UDF requires datetime.date value') value = str(value) + elif type == 'uri': + if not isinstance(value, basestring): + raise TypeError('URI UDF requires str or unicode value') + value = str(value) else: raise NotImplemented("UDF type '%s'" % type) if not isinstance(value, unicode): @@ -263,9 +277,9 @@ def __setitem__(self, key, value): raise NotImplementedError("Cannot handle value of type '%s'" " for UDF" % type(value)) if self._udt: - root = self._instance.root.find(nsmap('udf:type')) + root = self.instance.root.find(nsmap('udf:type')) else: - root = self._instance.root + root = self.instance.root elem = ElementTree.SubElement(root, nsmap('udf:field'), type=type, @@ -278,7 +292,7 @@ def __delitem__(self, key): del self._lookup[key] for node in self._elems: if node.attrib['name'] == key: - self._instance.root.remove(node) + self.instance.root.remove(node) break def items(self): @@ -298,13 +312,9 @@ class UdfDictionaryDescriptor(BaseDescriptor): _UDT = False def __get__(self, instance, cls): - try: - return self.value - except AttributeError: - instance.get() - self.value = UdfDictionary(instance, udt=self._UDT) - return self.value - + instance.get() + self.value = UdfDictionary(instance, udt=self._UDT) + return self.value class UdtDictionaryDescriptor(UdfDictionaryDescriptor): """An instance attribute containing a dictionary of UDF values @@ -320,15 +330,12 @@ class PlacementDictionaryDescriptor(TagDescriptor): """ def __get__(self, instance, cls): - try: - return self.value - except AttributeError: - instance.get() - self.value = dict() - for node in instance.root.findall(self.tag): - key = node.find('value').text - self.value[key] = Artifact(instance.lims,uri=node.attrib['uri']) - return self.value + instance.get() + self.value = dict() + for node in instance.root.findall(self.tag): + key = node.find('value').text + self.value[key] = Artifact(instance.lims,uri=node.attrib['uri']) + return self.value class
ExternalidListDescriptor(BaseDescriptor): @@ -394,6 +401,17 @@ def __get__(self, instance, cls): uri = node.find('container').attrib['uri'] return Container(instance.lims, uri=uri), node.find('value').text +class ReagentLabelList(BaseDescriptor): + """An instance attribute yielding a list of reagent labels""" + def __get__(self, instance, cls): + instance.get() + self.value = [] + for node in instance.root.findall('reagent-label'): + try: + self.value.append(node.attrib['name']) + except: + pass + return self.value class InputOutputMapList(BaseDescriptor): """An instance attribute yielding a list of tuples (input, output) @@ -402,16 +420,13 @@ class InputOutputMapList(BaseDescriptor): """ def __get__(self, instance, cls): - try: - return self.value - except AttributeError: - instance.get() - self.value = [] - for node in instance.root.findall('input-output-map'): - input = self.get_dict(instance.lims, node.find('input')) - output = self.get_dict(instance.lims, node.find('output')) - self.value.append((input, output)) - return self.value + instance.get() + self.value = [] + for node in instance.root.findall('input-output-map'): + input = self.get_dict(instance.lims, node.find('input')) + output = self.get_dict(instance.lims, node.find('output')) + self.value.append((input, output)) + return self.value def get_dict(self, lims, node): if node is None: return None @@ -519,6 +534,8 @@ class Researcher(Entity): def name(self): return u"%s %s" % (self.first_name, self.last_name) +class Reagent_label(Entity): + reagent_label = StringDescriptor('reagent-label') class Note(Entity): "Note attached to a project or a sample." 
@@ -547,7 +564,6 @@ class Project(Entity): researcher = EntityDescriptor('researcher', Researcher) udf = UdfDictionaryDescriptor() udt = UdtDictionaryDescriptor() - notes = EntityListDescriptor('note', Note) files = EntityListDescriptor(nsmap('file:file'), File) externalids = ExternalidListDescriptor() # permissions XXX @@ -649,10 +665,21 @@ class Artifact(Entity): samples = EntityListDescriptor('sample', Sample) udf = UdfDictionaryDescriptor() files = EntityListDescriptor(nsmap('file:file'), File) - # reagent_labels XXX + reagent_labels = ReagentLabelList() # artifact_flags XXX # artifact_groups XXX + def input_artifact_list(self): + """Returns the input artifacts of the parent process.""" + input_artifact_list=[] + try: + for tuple in self.parent_process.input_output_maps: + if tuple[1]['limsid'] == self.id: + input_artifact_list.append(tuple[0]['uri'])#['limsid']) + except: + pass + return input_artifact_list + def get_state(self): "Parse out the state value from the URI." parts = urlparse.urlparse(self.uri) diff --git a/lims.py b/genologics/lims.py similarity index 99% rename from lims.py rename to genologics/lims.py index 611da0fe..470a16f3 100644 --- a/lims.py +++ b/genologics/lims.py @@ -22,7 +22,7 @@ class Lims(object): "LIMS interface through which all entity instances are retrieved."
- VERSION = 'v1' + VERSION = 'v2' def __init__(self, baseuri, username, password): """baseuri: Base URI for the GenoLogics server, excluding diff --git a/genologics/lims_utils.py b/genologics/lims_utils.py new file mode 100644 index 00000000..62c81a96 --- /dev/null +++ b/genologics/lims_utils.py @@ -0,0 +1,15 @@ +#!/usr/bin/env python +from genologics.lims import * +from genologics.config import BASEURI, USERNAME, PASSWORD +lims = Lims(BASEURI, USERNAME, PASSWORD) + + +def get_run_info(fc): + fc_summary={} + for iom in fc.input_output_maps: + art = iom[0]['uri'] + lane = art.location[1].split(':')[0] + if not fc_summary.has_key(lane): + fc_summary[lane]= dict(art.udf.items()) #"%.2f" % val ----round?? + return fc_summary + diff --git a/scripts/LIMS2DB/flowcell_summary_uppload_LIMS.py b/scripts/LIMS2DB/flowcell_summary_uppload_LIMS.py new file mode 100644 index 00000000..a64a5874 --- /dev/null +++ b/scripts/LIMS2DB/flowcell_summary_uppload_LIMS.py @@ -0,0 +1,78 @@ +#!/usr/bin/env python + +"""Script to load run info from the LIMS process 'Illumina Sequencing (Illumina SBS) 4.0' +into the flowcell database in statusdb. + +Maya Brandi, Science for Life Laboratory, Stockholm, Sweden.
+""" +import sys +import os +import codecs +from optparse import OptionParser +from pprint import pprint +from genologics.lims import * +from genologics.config import BASEURI, USERNAME, PASSWORD +from datetime import date +import genologics.lims_utils as lims_utils +from statusDB_utils import * +import scilifelab.log +lims = Lims(BASEURI, USERNAME, PASSWORD) + +def main(flowcell, all_flowcells,days,conf): + """If all_flowcells: all runs run less than a moth ago are uppdated""" + today = date.today() + couch = load_couch_server(conf) + fc_db = couch['flowcells'] + if all_flowcells: + flowcells = lims.get_processes(type = 'Illumina Sequencing (Illumina SBS) 4.0') + for fc in flowcells: + try: + closed = date(*map(int, fc.date_run.split('-'))) + delta = today-closed + if delta.days < days: + flowcell_name = dict(fc.udf.items())['Flow Cell Position'] + dict(fc.udf.items())['Flow Cell ID'] + key = find_flowcell_from_view(fc_db, flowcell_name) + if key: + dbobj = fc_db.get(key) + dbobj["illumina"]["run_summary"] = lims_utils.get_sequencing_info(fc) + info = save_couchdb_obj(fc_db, dbobj) + LOG.info('flowcell %s %s : _id = %s' % (flowcell_name, info, key)) + except: + pass + elif flowcell is not None: + try: + flowcell_name = flowcell[1:len(flowcell)] + flowcell_position = flowcell[0] + fc = lims.get_processes(type = 'Illumina Sequencing (Illumina SBS) 4.0', + udf = {'Flow Cell ID':flowcell_name,'Flow Cell Position':flowcell_position})[0] + key = find_flowcell_from_view(fc_db, flowcell) + if key: + dbobj = fc_db.get(key) + dbobj["illumina"]["run_summary"] = lims_utils.get_sequencing_info(fc) + info = save_couchdb_obj(fc_db, dbobj) + LOG.info('flowcell %s %s : _id = %s' % (flowcell_name, info, key)) + except: + pass + +if __name__ == '__main__': + usage = "Usage: python flowcell_summary_upload_LIMS.py [options]" + parser = OptionParser(usage=usage) + + parser.add_option("-f", "--flowcell", dest="flowcell_name", default=None, + help = "eg: AD1TAPACXX. 
Don't use with -a flag.") + + parser.add_option("-a", "--all_flowcells", dest="all_flowcells", action="store_true", default=False, + help = "Uploads all LIMS flowcells into couchDB. Don't use with -f flag.") + + parser.add_option("-d", "--days", dest="days", type="int", default=30, + help="Runs older than DAYS days are not updated. Default is 30 days. Use with -a flag.") + + parser.add_option("-c", "--conf", dest="conf", + default=os.path.join(os.environ['HOME'],'opt/config/post_process.yaml'), + help = "Config file. Default: ~/opt/config/post_process.yaml") + + (options, args) = parser.parse_args() + + LOG = scilifelab.log.file_logger('LOG', options.conf, 'lims2db_flowcells.log') + main(options.flowcell_name, options.all_flowcells, options.days, options.conf) + diff --git a/scripts/LIMS2DB/helpers.py b/scripts/LIMS2DB/helpers.py new file mode 100644 index 00000000..f4aef53f --- /dev/null +++ b/scripts/LIMS2DB/helpers.py @@ -0,0 +1,22 @@ +#!/usr/bin/env python + +from datetime import date + +def comp_dates(a, b): + """Dates in isoformat. Is a < b?""" + a = date(*map(int, a.split('-') )) + b = date(*map(int, b.split('-') )) + delta = a - b + if delta.days < 0: + return True + else: + return False + +def delete_Nones(d): + "Deletes items with false-ish values (e.g. None) from the dict." + new_dict = {} + for key, val in d.items(): + if val: + new_dict[key] = val + if new_dict != {}: + return new_dict diff --git a/scripts/LIMS2DB/lims_utils.py b/scripts/LIMS2DB/lims_utils.py new file mode 100644 index 00000000..dc049e48 --- /dev/null +++ b/scripts/LIMS2DB/lims_utils.py @@ -0,0 +1,99 @@ +#!/usr/bin/env python + +"""A module with LIMS helper functions. + +Maya Brandi, Science for Life Laboratory, Stockholm, Sweden.
+""" + +from genologics.lims import * +from genologics.config import BASEURI, USERNAME, PASSWORD +lims = Lims(BASEURI, USERNAME, PASSWORD) + +"""process category dictionaries + +In the LIMS2DB context, processes are categorised into groups that define, +or are used to define a certain type of statusdb key. The categories and their +processes are defined here:""" + +INITALQC = {'63' : 'Quant-iT QC (DNA) 4.0', + '65' : 'Quant-iT QC (RNA) 4.0', + '66' : 'Qubit QC (DNA) 4.0', + '68' : 'Qubit QC (RNA) 4.0', + '24' : 'Customer Gel QC', + '20' : 'CaliperGX QC (DNA)', + '16' : 'Bioanalyzer QC (DNA) 4.0', + '18' : 'Bioanalyzer QC (RNA) 4.0', + '116' : 'CaliperGX QC (RNA)', + '48' : 'NanoDrop QC (DNA) 4.0'} +AGRINITQC = {'7' : 'Aggregate QC (DNA) 4.0', + '9' : 'Aggregate QC (RNA) 4.0'} +PREPSTART = {'10' : 'Aliquot Libraries for Hybridization (SS XT)', + '47' : 'mRNA Purification, Fragmentation & cDNA synthesis (TruSeq RNA) 4.0', + '33' : 'Fragment DNA (TruSeq DNA) 4.0'} +PREPEND = {'111' : 'Amplify Captured Libraries to Add Index Tags (SS XT) 4.0', + '109' : 'CA Purification'} +LIBVAL = {'62' : 'qPCR QC (Library Validation) 4.0', + '64' : 'Quant-iT QC (Library Validation) 4.0', + '67' : 'Qubit QC (Library Validation) 4.0', + '20' : 'CaliperGX QC (DNA)', + '17' : 'Bioanalyzer QC (Library Validation) 4.0'} +AGRLIBVAL ={'8': 'Aggregate QC (Library Validation) 4.0'} +SEQSTART = {'40' : 'Library Normalization (MiSeq) 4.0', + '39' : 'Library Normalization (Illumina SBS) 4.0'} +SEQUENCING = {'38' : 'Illumina Sequencing (Illumina SBS) 4.0', + '46' : 'MiSeq Run (MiSeq) 4.0'} + +def get_sequencing_info(fc): + """Input: a process object 'fc', of type 'Illumina Sequencing (Illumina SBS) 4.0', + Output: A dictionary where keys are lanes 1,2,...,8, and values are lane artifact udfs""" + fc_summary={} + for iom in fc.input_output_maps: + art = iom[0]['uri'] + lane = art.location[1].split(':')[0] + if not fc_summary.has_key(lane): + fc_summary[lane]= dict(art.udf.items()) #"%.2f" % val 
----round?? + return fc_summary + + + +def make_sample_artifact_maps(sample_name): + """ + outin: connects each out_art for a specific sample to its + corresponding in_art and process. A one-to-one relation. + + inout: connects each in_art for a specific sample to all its + corresponding out_arts and processes. A one-to-many relation.""" + outin = {} + inout = {} + artifacts = lims.get_artifacts(sample_name = sample_name) + for outart in artifacts: + try: + pro = outart.parent_process + inarts = outart.input_artifact_list() + for inart in inarts: + for samp in inart.samples: + if samp.name == sample_name: + outin[outart.id] = (pro, inart.id) + if not inout.has_key(inart.id): inout[inart.id] = {} + inout[inart.id][pro] = outart.id + except: + pass + return outin, inout + +def get_analyte_hist(analyte, outin, inout): + """Makes a history map of an analyte, using the inout-map + and outin-map of the corresponding sample.""" + history = {} + while outin.has_key(analyte): + hist_process, inart = outin[analyte] + for process, outart in inout[inart].items(): + if (process == hist_process) or (process.type.id in INITALQC.keys()) or (process.type.id in LIBVAL.keys()) or (process.type.id in AGRINITQC.keys()) or (process.type.id in AGRLIBVAL.keys()): + history[process.id] = {'date' : process.date_run, + 'id' : process.id, + 'outart' : outart, + 'inart' : inart, + 'type' : process.type.id, + 'name' : process.type.name} + analyte = inart + return history + diff --git a/scripts/LIMS2DB/load_status_from_google_docs.py b/scripts/LIMS2DB/load_status_from_google_docs.py new file mode 100644 index 00000000..a1dcc2d2 --- /dev/null +++ b/scripts/LIMS2DB/load_status_from_google_docs.py @@ -0,0 +1,97 @@ +#!/usr/bin/env python +import sys +import os +import time +from datetime import datetime +from uuid import uuid4 +import hashlib +from optparse import OptionParser +import logging +import bcbio.google +import scilifelab.google.project_metadata as pmeta +import bcbio.pipeline.config_utils as cl +from
bcbio.google import _to_unicode, spreadsheet +import couchdb + + +# GOOGLE DOCS +def get_google_document(ssheet_title, wsheet_title, client): + ssheet = bcbio.google.spreadsheet.get_spreadsheet(client, ssheet_title) + wsheet = bcbio.google.spreadsheet.get_worksheet(client, ssheet, wsheet_title) + content = bcbio.google.spreadsheet.get_cell_content(client,ssheet,wsheet) + ss_key = bcbio.google.spreadsheet.get_key(ssheet) + ws_key = bcbio.google.spreadsheet.get_key(wsheet) + return content, ws_key, ss_key + +def make_client(CREDENTIALS_FILE): + credentials = bcbio.google.get_credentials({'gdocs_upload': {'gdocs_credentials': CREDENTIALS_FILE}}) + client = bcbio.google.spreadsheet.get_client(credentials) + return client + +def get_column(ssheet_content, header, col_cond=0): + colindex='' + for j, row in enumerate(ssheet_content): + if colindex == '': + for i, col in enumerate(row): + if col_cond <= i and colindex == '': + if str(col).strip().replace('\n','').replace(' ','') == header.replace(' ',''): + colindex = i + else: + rowindex = j-1 + return rowindex, colindex + +# NAME STRIP +def strip_index(name): + indexes = ['_nxdual','_index','_rpi','_agilent','_mondrian','_haloht','_halo','_sureselect','_dual','_hht','_ss','_i','_r','_a','_m','_h'] + name = name.replace('-', '_').replace(' ', '') + for i in indexes: + name=name.split(i)[0] + return name + +def get_20158_info(client, project_name_swe): + versions = {"01": ['Sample name Scilife', "Total reads per sample", "Sheet1","Passed=P/ not passed=NP*"], + "02": ["Sample name (SciLifeLab)", "Total number of reads (Millions)","Sheet1", + "Based on total number of reads after mapping and duplicate removal"], + "03": ["Sample name (SciLifeLab)", "Total number of reads (Millions)","Sheet1", + "Based on total number of reads after mapping and duplicate removal "], + "05": ["Sample name (from Project read counts)", "Total number","Sheet1", + "Based on total number of reads","Based on total number of reads after mapping and 
duplicate removal"], + "06": ["Sample name (from Project read counts)", "Total number","Sheet1", + "Based on total number of reads","Based on total number of reads after mapping and duplicate removal"]} + info = {} + feed = bcbio.google.spreadsheet.get_spreadsheets_feed(client, project_name_swe + '_20158', False) + if len(feed.entry) != 0: + ssheet = feed.entry[0].title.text + version = ssheet.split(str('_20158_'))[1].split(' ')[0].split('_')[0] + content, ws_key, ss_key = get_google_document(ssheet, versions[version][2], client) + dummy, P_NP_colindex = get_column(content, versions[version][3]) + dummy, No_reads_sequenced_colindex = get_column(content, versions[version][1]) + row_ind, scilife_names_colindex = get_column(content, versions[version][0]) + if (version=="05")| (version=="06"): + dummy, P_NP_duprem_colindex = get_column(content, versions[version][4]) ## [version][4] for dup rem + else: + P_NP_duprem_colindex='' + for j, row in enumerate(content): + if (j > row_ind): + try: + sci_name = str(row[scilife_names_colindex]).strip() + striped_name = strip_index(sci_name) + no_reads = str(row[No_reads_sequenced_colindex]).strip() + if (P_NP_duprem_colindex!='') and (str(row[P_NP_duprem_colindex]).strip()!=''): + status = str(row[P_NP_duprem_colindex]).strip() + else: + status = str(row[P_NP_colindex]).strip() + info[striped_name] = [status,no_reads] + except: + pass + return info + +def get(project_ID): + CREDENTIALS_FILE = os.path.join(os.environ['HOME'], 'opt/config/gdocs_credentials') + CONFIG_FILE = os.path.join(os.environ['HOME'], 'opt/config/post_process.yaml') + CONFIG = cl.load_config(CONFIG_FILE) + client= make_client(CREDENTIALS_FILE) + info = get_20158_info(client, project_ID) + print info + return info + diff --git a/scripts/LIMS2DB/objectsDB.py b/scripts/LIMS2DB/objectsDB.py new file mode 100644 index 00000000..e58929bc --- /dev/null +++ b/scripts/LIMS2DB/objectsDB.py @@ -0,0 +1,413 @@ +#!/usr/bin/env python + +"""A module for building up the 
project objects that build up the project database on +statusdb with lims as the main source of information. + +Maya Brandi, Science for Life Laboratory, Stockholm, Sweden. +""" +import load_status_from_google_docs ### Temporary solution until 20158 implemented in LIMS!!! +import codecs +from scilifelab.google import _to_unicode, _from_unicode +from pprint import pprint +from genologics.lims import * +from helpers import * +from lims_utils import * +from statusDB_utils import * +from genologics.config import BASEURI, USERNAME, PASSWORD +import os +import couchdb +import bcbio.pipeline.config_utils as cl +import time +from datetime import date + + +lims = Lims(BASEURI, USERNAME, PASSWORD) +config_file = os.path.join(os.environ['HOME'], 'opt/config/post_process.yaml') +db_conf = cl.load_config(config_file)['couch_db'] +url = db_conf['maggie_login']+':'+db_conf['maggie_pass']+'@'+db_conf['maggie_url']+':'+str(db_conf['maggie_port']) +samp_db = couchdb.Server("http://" + url)['samples'] + +class ProjectDB(): + """Instances of this class hold a dictionary formatted for building up the project database on statusdb. + The information comes from different lims artifacts and processes. Detailed documentation of the + source of all values is found in: + https://docs.google.com/a/scilifelab.se/document/d/1OHRsSI9btaBU4Hb1TiqJ5wwdRqUQ4BAyjJR-Nn5qGHg/edit#""" + + def __init__(self, project_id): + self.lims_project = Project(lims,id = project_id) + preps = lims.get_processes(projectname = self.lims_project.name, type = AGRLIBVAL.values()) + runs = lims.get_processes(projectname = self.lims_project.name, type = SEQUENCING.values()) + self.preps = ProcessInfo(preps) + self.runs = ProcessInfo(runs) + try: + # Temporary solution until 20158 implemented in lims!!
+ googledocs_status = load_status_from_google_docs.get(self.lims_project.name) + except: + googledocs_status = {} + pass + self.project={'source' : 'lims', + 'open_date' : self.lims_project.open_date, + 'entity_type' : 'project_summary', + 'application' : None, + 'project_name' : self.lims_project.name, + 'project_id' : self.lims_project.id} + self.udf_field_conv={'Name':'name', + #'Queued':'queued', + 'Portal ID':'Portal_id', + 'Sample type':'sample_type', + 'Sequence units ordered (lanes)':'sequence_units_ordered_(lanes)', + 'Sequencing platform':'sequencing_platform', + 'Sequencing setup':'sequencing_setup', + 'Library construction method':'library_construction_method', + 'Bioinformatics':'bioinformatics', + 'Disposal of any remaining samples':'disposal_of_any_remaining_samples', + 'Type of project':'type', + 'Invoice Reference':'invoice_reference', + 'Uppmax Project Owner':'uppmax_project_owner', + 'Custom Capture Design ID':'custom_capture_design_id', + 'Customer Project Description':'customer_project_description', + 'Project Comment':'project_comment', + 'Delivery Report':'delivery_report'} + self.basic_udf_field_conv = {'Reference genome':'reference_genome', + 'Application':'application', + 'Uppmax Project':'uppnex_id', + 'Customer project reference':'customer_reference'} + for key, val in self.lims_project.udf.items(): + if self.udf_field_conv.has_key(key): + if self.project.has_key('details'): + self.project['details'][self.udf_field_conv[key]] = val + else: self.project['details'] = {self.udf_field_conv[key] : val} + elif self.basic_udf_field_conv.has_key(key): + self.project[self.basic_udf_field_conv[key]] = val + samples = lims.get_samples(projectlimsid = self.lims_project.id) + self.project['no_of_samples'] = len(samples) + if len(samples) > 0: + self.project['samples'] = {} + for samp in samples: + sampDB = SampleDB(samp.id, self.project['project_name'], + self.project['application'], self.preps.info, self.runs.info, googledocs_status) 
#googledocs_status Temporary solution until 20158 implemented in lims!! + self.project['samples'][sampDB.name] = sampDB.obj + self.project = delete_Nones(self.project) + +class ProcessInfo(): + """This class takes a list of process type names, e.g. 'Aggregate QC (Library Validation) 4.0', + and forms a dict with info about all processes of the type specified in runs which the + project has gone through. + + info = {24-8460:{'finish_date':'2013-04-20', + 'start_date', + 'run_id':'24-8460', + 'samples':{'P424_111':{in_art_id1 : [in_art1, out_art1], + in_art_id2: [in_art2, out_art2]}, + 'P424_115': ...}, + ...}, + '24-8480':...}""" + + def __init__(self, runs): + self.info = self.get_run_info(runs) + + def get_run_info(self, runs): + run_info = {} + for run in runs: + run_info[run.id] = {'start_date': run.date_run,'samples' : {}} + run_udfs = dict(run.udf.items()) + try: + run_info[run.id]['run_id'] = run_udfs["Run ID"] + except: + pass + try: + run_info[run.id]['finish_date'] = run_udfs['Finish Date'].isoformat() + except: + run_info[run.id]['finish_date'] = None + pass + in_arts=[] + for IOM in run.input_output_maps: + in_art_id = IOM[0]['limsid'] + in_art = Artifact(lims, id= in_art_id) + out_art_id = IOM[1]['limsid'] + out_art = Artifact(lims, id= out_art_id) + samples = in_art.samples + if in_art_id not in in_arts: + in_arts.append(in_art_id) + for samp in samples: + if not samp.name in run_info[run.id]['samples'].keys(): + run_info[run.id]['samples'][samp.name] = {} + run_info[run.id]['samples'][samp.name][in_art_id] = [in_art, out_art] + return run_info + + + +class SampleDB(): + """ + Instances of this class hold a dictionary formatted for building up the samples in the project + database on status db. The information comes from different lims artifacts and processes.
+ A detailed documentation of the source of all values is found in + https://docs.google.com/a/scilifelab.se/document/d/1OHRsSI9btaBU4Hb1TiqJ5wwdRqUQ4BAyjJR-Nn5qGHg/edit#""" + def __init__(self, sample_id, project_name, application = None, prep_info = [], run_info = [], googledocs_status = {}): # googledocs_status temporary solution untill 20158 implemented in lims!! + self.lims_sample = Sample(lims, id = sample_id) + self.name = self.lims_sample.name + self.application = application + self.outin, self.inout = make_sample_artifact_maps(self.name) + self.obj={'scilife_name' : self.name} + self.udf_field_conv = {'Name':'name', + 'Progress':'progress', + 'Sequencing Method':'sequencing_method', + 'Sequencing Coverage':'sequencing_coverage', + 'Sample Type':'sample_type', + 'Reference Genome':'reference_genome', + 'Pooling':'pooling', + 'Application':'application', + 'Read Length':'requested_read_length', + 'Control?':'control', + 'Sample Buffer':'sample_buffer', + 'Units':'units', + 'Customer Volume':'customer_volume', + 'Color':'color', + 'Customer Conc.':'customer_conc', + 'Customer Amount (ug)':'customer_amount_(ug)', + 'Customer A260:280':'customer_A260:280', + 'Conc Method':'conc_method', + 'QC Method':'qc_method', + 'Extraction Method':'extraction_method', + 'Customer RIN':'customer_rin', + 'Sample Links':'sample_links', + 'Sample Link Type':'sample_link_type', + 'Tumor Purity':'tumor_purity', + 'Lanes Requested':'lanes_requested', + 'Customer nM':'customer_nM', + 'Customer Average Fragment Length':'customer_average_fragment_length', + '-DISCONTINUED-SciLifeLab ID':'sciLifeLab_ID', + '-DISCONTINUED-Volume Remaining':'volume_remaining'} + self.basic_udf_field_conv = {'Customer Sample Name':'customer_name', + 'Reads Requested (millions)':'reads_requested_(millions)', + 'Insert Size':'average_size_bp', + 'Passed Initial QC':'incoming_QC_status'} + for key, val in self.lims_sample.udf.items(): + val=_to_unicode(_from_unicode(val)) + if 
self.udf_field_conv.has_key(key): + if self.obj.has_key('details'): + self.obj['details'][self.udf_field_conv[key]] = val + else: self.obj['details'] = {self.udf_field_conv[key] : val} + elif self.basic_udf_field_conv.has_key(key): + self.obj[self.basic_udf_field_conv[key]] = val + runs = self.get_sample_run_metrics(run_info) + if self.application == 'Finished library' : + preps = self.get_initQC_preps_and_libval_finished_lib(prep_info) + else: + preps = self.get_initQC_preps_and_libval(prep_info) + if preps: + if preps.has_key('library_prep'): + for prep in runs.keys(): + if preps['library_prep'].has_key(prep): + preps['library_prep'][prep]['sample_run_metrics'] = runs[prep] + self.obj['library_prep'] = self.get_prep_leter(preps['library_prep']) + if preps.has_key('initial_qc'): + self.obj['initial_qc'] = preps['initial_qc'] + try: + # Temporary solution until 20158 implemented in lims!! + self.obj['status'] = googledocs_status[self.name][0] + self.obj['m_reads_sequenced'] = googledocs_status[self.name][1] + except: + pass + delete_Nones(self.obj) + + def get_initQC_preps_and_libval_finished_lib(self, AgrLibQC_info): + """Input: AgrLibQC_info - instance of the ProcessInfo class with AGRLIBVAL processes as argument. + For each AGRLIBVAL process run on the sample, this function steps backward in the artifact history of the + output artifact of the AGRLIBVAL process to find the following information: + + initial_qc/start_date The date_run of the first of all INITALQC steps found in the artifact + history of the output artifact of one of the AGRINITQC steps + initial_qc/finish_date The date_run of the AGRINITQC step + + Preps are defined by the AGRINITQC step + + prep_status The qc_flag of the input artifact of process type AGRLIBVAL + library_validation/start_date First of all LIBVAL steps found in the artifact history + of the output artifact of one of the AGRLIBVAL steps + library_validation/finish_date date-run of AGRLIBVAL step + average_size_bp udf 
('Size (bp)') of the input artifact to the process AGRLIBVAL""" + sample_runs = {} + library_prep = {} + for run_id, run in AgrLibQC_info.items(): + if run['samples'].has_key(self.name): + for id , arts in run['samples'][self.name].items(): + inart = arts[0] + outart = arts[1] + history = get_analyte_hist(outart.id, self.outin, self.inout) + sample_runs['initial_qc'] = self.get_initial_qc_dates(history) + lib_val_dates = {'start_date': self.get_lib_val_start_dates(history), + 'finish_date': run['start_date']} + prep = {'prep_status':inart.qc_flag} + if dict(inart.udf.items()).has_key('Size (bp)'): + prep['average_size_bp'] = dict(inart.udf.items())['Size (bp)'] + if not library_prep.has_key('Finished'): + library_prep['Finished'] = delete_Nones(prep) + library_prep['Finished']['library_validation'] = {} + library_prep['Finished']['library_validation'][run_id] = delete_Nones(lib_val_dates) + sample_runs['library_prep'] = delete_Nones(library_prep) + return delete_Nones(sample_runs) + + def get_initQC_preps_and_libval(self, AgrLibQC_info): + """Input: AgrLibQC_info - instance of the ProcessInfo class with AGRLIBVAL processes as argument. + For each AGRLIBVAL process run on the sample, this function steps backward in the artifact history of the + output artifact of the AGRLIBVAL process to find the following information: + + initial_qc/start_date The date_run of the first of all INITALQC steps found in the artifact + history of the output artifact of one of the AGRINITQC steps + initial_qc/finish_date The date_run of the AGRINITQC step + + Preps are defined by the date of any PREPSTART step + + prep_status The qc_flag of the input artifact of process type AGRLIBVAL + prep_start_date The date-run of the PREPSTART step + prep_finished_date The date-run of a PREPEND step. + pre_prep_start_date The date-run of process 'Shear DNA (SS XT) 4.0'.
Only for + 'Exome capture' projects + library_validation/start_date First of all LIBVAL steps found in the artifact history + of the output artifact of one of the AGRLIBVAL steps + library_validation/finish_date date-run of AGRLIBVAL step + average_size_bp udf ('Size (bp)') of the input artifact to the process AGRLIBVAL""" + sample_runs = {} + library_prep = {} + for run_id, run in AgrLibQC_info.items(): + if run['samples'].has_key(self.name): + for id , arts in run['samples'][self.name].items(): + inart = arts[0] + outart = arts[1] + history = get_analyte_hist(outart.id, self.outin, self.inout) + sample_runs['initial_qc'] = self.get_initial_qc_dates(history) + lib_val_dates = {'start_date' : self.get_lib_val_start_dates(history), + 'finish_date' : run['start_date']} + prep = {'prep_status' : inart.qc_flag} + if dict(inart.udf.items()).has_key('Size (bp)'): + size_bp = dict(inart.udf.items())['Size (bp)'] + else: + size_bp = None + libPrep = None + for step, info in history.items(): + if info['type'] in PREPSTART.keys(): + if self.application !='Exome capture': + libPrep = info + prep['prep_start_date'] = info['date'] + elif info['type'] in PREPEND.keys(): + prep['prep_finished_date'] = info['date'] + elif info['type'] == '74': + libPrep = info + prep['pre_prep_start_date'] = info['date'] + if libPrep: + if not library_prep.has_key(libPrep['id']): + library_prep[libPrep['id']] = delete_Nones(prep) + library_prep[libPrep['id']]['library_validation'] = {} + library_prep[libPrep['id']]['library_validation'][run_id] = delete_Nones(lib_val_dates) + library_prep[libPrep['id']]['library_validation'][run_id]['average_size_bp'] = size_bp + sample_runs['library_prep'] = delete_Nones(library_prep) + return delete_Nones(sample_runs) + + def get_prep_leter(self, prep_info): + """Get preps and prep names; A,B,C... based on prep dates for sample_name.
+ Output: A dict where keys are prep_art_id and values are prep names.""" + dates = {} + prep_info_new = {} + preps_keys = map(chr, range(65, 65+len(prep_info))) + if len(prep_info) == 1: + prep_info_new['A'] = prep_info.values()[0] + else: + for key, val in prep_info.items(): + dates[key] = val['prep_start_date'] + for i, key in enumerate(sorted(dates,key= lambda x : dates[x])): + prep_info_new[preps_keys[i]] = prep_info[key] + return prep_info_new + + + def get_sample_run_metrics(self, SeqRun_info): + """Input: SeqRun_info - instance of the ProcessInfo class with SEQUENCING processes as argument. + For each SEQUENCING process run on the sample, this function steps backward in the artifact history of the + input artifact of the SEQUENCING process to find the following information: + + dillution_and_pooling_start_date date-run of SEQSTART step + sequencing_start_date date-run of SEQUENCING step + sequencing_finish_date udf ('Finish Date') of SEQUENCING step + sample_run_metrics_id The sample database (statusdb) _id for the sample_run_metrics + corresponding to the run, sample, lane in question. + samp_run_met_id = lane_date_fcid_barcode + date and fcid: from udf ('Run ID') of the SEQUENCING step. + barcode: from reagent-labels of output artifact from SEQSTART step. + lane: from the location of the input artifact to the SEQUENCING step + preps are defined as the id of the PREPSTART step in the artifact history. If application == 'Finished library', + prep is defined as "Finished".
These keys are used to connect the sequencing steps to the correct preps.""" + sample_runs = {} + for id, run in SeqRun_info.items(): + if run['samples'].has_key(self.name) and run.has_key('run_id'): + date = run['run_id'].split('_')[0] + fcid = run['run_id'].split('_')[3] + for id , arts in run['samples'][self.name].items(): + lane_art = arts[0] + outart = arts[1] + lane = lane_art.location[1].split(':')[0] + history = get_analyte_hist(lane_art.id, self.outin, self.inout) + for step , info in history.items(): + if info['type'] in SEQSTART.keys(): + art = Artifact(lims, id=info['outart']) + if len(art.reagent_labels) > 0: + barcode = self.get_barcode(art.reagent_labels[0]) + samp_run_met_id = '_'.join([lane, date, fcid, barcode]) + else: + samp_run_met_id = None + run_doc = {'dillution_and_pooling_start_date': info['date'], + 'sequencing_start_date': run['start_date'], + 'sequencing_finish_date': run['finish_date'], + 'sample_run_metrics_id': find_sample_run_id_from_view(samp_db, samp_run_met_id) } + run_doc = delete_Nones(run_doc) + key = None + if self.application == 'Finished library' : + key = 'Finished' + else: + for step , info in history.items(): + if info['type'] in PREPSTART.keys(): + key = info['id'] + if key: + if not sample_runs.has_key(key): + sample_runs[key] = {} + sample_runs[key][samp_run_met_id] = run_doc + return sample_runs + + def get_sample_status(self): + """ongoing,passed,aborted""" + ## Not yet implemented + + def get_barcode(self, name): + """Extracts barcode from artifact.reagent_labels""" + return name.split('(')[1].strip(')') + + def get_initial_qc_dates(self, history): + """Extracts run dates for processes of type AGRINITQC + from a history dict.""" + initial_qc_finish_date = None + for step , info in history.items(): + if (info['type'] in AGRINITQC.keys()) and info['date']: + if initial_qc_finish_date is None: + initial_qc_finish_date = info['date'] + elif comp_dates(initial_qc_finish_date, info['date']): + initial_qc_finish_date = info['date'] + if 
initial_qc_finish_date: + initial_qc_start_date = initial_qc_finish_date + for step , info in history.items(): + if (info['type'] in INITALQC) and info['date']: + if comp_dates(info['date'], initial_qc_start_date): + initial_qc_start_date = info['date'] + return {'start_date' : initial_qc_start_date, 'finish_date' : initial_qc_finish_date} + else: + return + + def get_lib_val_start_dates(self, history): + """Extracts run dates for processes of type LIBVAL + from a history dict.""" + lib_val_start_date = None + for step , info in history.items(): + if info['type'] in LIBVAL.keys(): + if lib_val_start_date == None: + lib_val_start_date = info['date'] + elif comp_dates(info['date'], lib_val_start_date): + lib_val_start_date = info['date'] + return lib_val_start_date diff --git a/scripts/LIMS2DB/project_summary_upload_LIMS.py b/scripts/LIMS2DB/project_summary_upload_LIMS.py new file mode 100644 index 00000000..a6bac5a1 --- /dev/null +++ b/scripts/LIMS2DB/project_summary_upload_LIMS.py @@ -0,0 +1,85 @@ +#!/usr/bin/env python + +"""Script to load project info from Lims into the project database in statusdb. + +Maya Brandi, Science for Life Laboratory, Stockholm, Sweden. 
+""" +import sys +import os +import codecs +from optparse import OptionParser +from statusDB_utils import * +from helpers import * +from pprint import pprint +from genologics.lims import * +from genologics.config import BASEURI, USERNAME, PASSWORD +import objectsDB as DB +from datetime import date +import scilifelab.log +lims = Lims(BASEURI, USERNAME, PASSWORD) + +def main(proj_name, all_projects, days, conf): + first_of_july = '2013-06-30' + today = date.today() + couch = load_couch_server(conf) + proj_db = couch['projects'] + if all_projects: + projects = lims.get_projects() + for proj in projects: + try: + closed = proj.close_date + closed = date(*map(int, proj.close_date.split('-'))) + delta = today-closed + delta = delta.days + except: + delta = 0 + opened = proj.open_date + if opened: + if comp_dates(first_of_july, opened) and (delta < days): + proj_time = time.time() + obj = DB.ProjectDB(proj.id) + key = find_proj_from_view(proj_db, proj.name) + obj.project['_id'] = find_or_make_key(key) + info = save_couchdb_obj(proj_db, obj.project) + LOG.info('project %s %s : _id = %s' % (proj.name, info, obj.project['_id'])) + else: + LOG.info('Open date missing for project %s' % proj.name) + elif proj_name is not None: + proj = lims.get_projects(name = proj_name) + if len(proj) == 0: + LOG.warning('No project named %s in Lims' % proj_name) + else: + proj = proj[0] + opened = proj.open_date + if opened: + if comp_dates(first_of_july, opened): + obj = DB.ProjectDB(proj.id) + key = find_proj_from_view(proj_db, proj.name) + obj.project['_id'] = find_or_make_key(key) + info = save_couchdb_obj(proj_db, obj.project) + LOG.info('project %s %s : _id = %s' % (proj_name, info, obj.project['_id'])) + else: + LOG.info('Open date missing for project %s' % proj.name) + +if __name__ == '__main__': + usage = "Usage: python project_summary_upload_LIMS.py [options]" + parser = OptionParser(usage=usage) + + parser.add_option("-p", "--project", dest="project_name", default=None, + help = 
"eg: M.Uhlen_13_01. Don't use with the -a flag.") + + parser.add_option("-a", "--all_projects", dest="all_projects", action="store_true", default=False, + help = "Upload all Lims projects into couchDB. Don't use with the -p flag.") + + parser.add_option("-d", "--days", dest="days", default=30, + help="Projects with a close_date older than DAYS days are not updated. Default is 30 days. Use with the -a flag.") + + parser.add_option("-c", "--conf", dest="conf", + default=os.path.join(os.environ['HOME'],'opt/config/post_process.yaml'), + help = "Config file. Default: ~/opt/config/post_process.yaml") + + (options, args) = parser.parse_args() + + LOG = scilifelab.log.file_logger('LOG', options.conf, 'lims2db_projects.log') + main(options.project_name, options.all_projects, options.days, options.conf) + diff --git a/scripts/LIMS2DB/statusDB_utils.py b/scripts/LIMS2DB/statusDB_utils.py new file mode 100644 index 00000000..8772ff8b --- /dev/null +++ b/scripts/LIMS2DB/statusDB_utils.py @@ -0,0 +1,82 @@ +#!/usr/bin/env python +from uuid import uuid4 +import time +from datetime import datetime +import couchdb +import bcbio.pipeline.config_utils as cl + +def load_couch_server(config_file):
 + """Loads couch server with settings specified in 'config_file'.""" + try: + db_conf = cl.load_config(config_file)['couch_db'] + url = db_conf['maggie_login']+':'+db_conf['maggie_pass']+'@'+db_conf['maggie_url']+':'+str(db_conf['maggie_port']) + couch = couchdb.Server("http://" + url) + return couch + except: + return None + +def find_or_make_key(key): + if not key: + key = uuid4().hex + return key + +def save_couchdb_obj(db, obj): + """Updates or creates the object obj in database db.""" + dbobj = db.get(obj['_id']) + time_log = datetime.utcnow().isoformat() + "Z" + if dbobj is None: + obj["creation_time"] = time_log + obj["modification_time"] = time_log + db.save(obj) + return 'created' + else: + obj["_rev"] = dbobj.get("_rev") + del dbobj["modification_time"] + obj["creation_time"] = 
dbobj["creation_time"] + if not comp_obj(obj, dbobj): + obj["modification_time"] = time_log + db.save(obj) + return 'updated' + return 'not updated' + +def comp_obj(obj, dbobj): + """Compares the two dictionaries obj and dbobj.""" + keys = list(set(obj.keys() + dbobj.keys())) + for key in keys: + if (obj.has_key(key)) and dbobj.has_key(key): + if (obj[key] != dbobj[key]): + return False + else: + return False + return True + +def find_proj_from_view(proj_db, project_name): + view = proj_db.view('project/project_name') + for proj in view: + if proj.key == project_name: + return proj.value + return None + +def find_samp_from_view(samp_db, proj_name): + view = samp_db.view('names/id_to_proj') + samps = {} + for doc in view: + if (doc.value[0] == proj_name)|(doc.value[0] == proj_name.lower()): + samps[doc.key] = doc.value[1:3] + return samps + +def find_flowcell_from_view(flowcell_db, flowcell_name): + view = flowcell_db.view('names/id_to_name') + for doc in view: + id = doc.value.split('_')[1] + if (id == flowcell_name): + return doc.key + +def find_sample_run_id_from_view(samp_db,sample_run): + view = samp_db.view('names/id_to_name') + for doc in view: + if doc.value == sample_run: + return doc.key + return None + + diff --git a/setup.cfg b/setup.cfg new file mode 100644 index 00000000..49a81075 --- /dev/null +++ b/setup.cfg @@ -0,0 +1,2 @@ +[egg_info] +tag_svn_revision = true diff --git a/setup.py b/setup.py new file mode 100644 index 00000000..368d32de --- /dev/null +++ b/setup.py @@ -0,0 +1,38 @@ +from setuptools import setup, find_packages +import sys, os + +version = '0.2.3' + +setup(name='genologics', + version=version, + description="Python interface to the GenoLogics LIMS (Laboratory Information Management System) server via its REST API.", + long_description="""A basic module for interacting with the GenoLogics LIMS server via its REST API.
+ The goal is to provide simple access to the most common entities and their attributes in a reasonably Pythonic fashion.""", + classifiers=[ + "Development Status :: 4 - Beta", + "Environment :: Console", + "Intended Audience :: Developers", + "Intended Audience :: Healthcare Industry", + "Intended Audience :: Science/Research", + "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", + "Operating System :: POSIX :: Linux", + "Programming Language :: Python", + "Topic :: Scientific/Engineering :: Medical Science Apps." + ], + keywords='genologics api rest', + author='Per Kraulis', + author_email='per.kraulis@scilifelab.se', + maintainer='Roman Valls Guimera', + maintainer_email='roman@scilifelab.se', + url='https://github.com/scilifelab/genologics', + license='GPLv3', + packages=find_packages(exclude=['ez_setup', 'examples', 'tests']), + include_package_data=True, + zip_safe=False, + install_requires=[ + "requests" + ], + entry_points=""" + # -*- Entry points: -*- + """, + )
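The created/updated decision in `save_couchdb_obj` above hinges on comparing the incoming object against the stored CouchDB document after stripping the bookkeeping timestamps, so that re-uploading an unchanged project does not bump `modification_time`. A minimal standalone sketch of that comparison logic (no CouchDB connection; `needs_update` is a hypothetical helper name, not part of this patch):

```python
def comp_obj(obj, dbobj):
    """True when the two dicts agree on every key present in either of them."""
    for key in set(obj) | set(dbobj):
        if key not in obj or key not in dbobj or obj[key] != dbobj[key]:
            return False
    return True

def needs_update(obj, dbobj):
    """Mirror save_couchdb_obj's update branch: drop the stored
    modification_time and carry creation_time over to the incoming
    object before comparing, so only real content changes count."""
    dbobj = dict(dbobj)
    dbobj.pop("modification_time", None)
    obj = dict(obj)
    obj["creation_time"] = dbobj.get("creation_time")
    return not comp_obj(obj, dbobj)
```

With this shape, an unchanged document compares equal once the timestamps are normalized, while any changed or added field triggers a new revision.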