1 | 1 | # Data Types |
2 | 2 |
| 3 | +## Contributing |
| 4 | + |
| 5 | +If you're writing data types for local tools, the data types |
| 6 | +can be added directly to your Galaxy instance, but if you're |
| 7 | +publishing tools to the Galaxy Tool Shed, we strongly recommend |
| 8 | +adding all referenced data types upstream to Galaxy itself. |
| 9 | +Basic information on contributing to Galaxy can be found in |
| 10 | +the |
| 11 | +[CONTRIBUTING documentation](https://github.com/galaxyproject/galaxy/blob/dev/CONTRIBUTING.md). |
| 12 | + |
| 13 | +The rest of this document covers the basics of creating and registering |
| 14 | +data types in Galaxy, but we've noticed some common pitfalls in simple |
| 15 | +implementations that can cause serious problems on production servers. |
| 16 | +When adding new data types to Galaxy, please copy the production checklist |
| 17 | +below into your PR and fill in your answers to help us avoid these problems |
| 18 | +in the future. |
| 19 | + |
| 20 | +``` |
| 21 | +## Production Data Types Checklist |
| 22 | +
| 23 | +- Could any of your ``sniff``, ``set_meta``, or ``set_peek`` implementations |
| 24 | + use an unbounded amount of memory? This most often happens when they |
| 25 | + read a whole file into memory. This is not allowed, even if you expect |
| 26 | + these files to typically be small. |
| 27 | + - [ ] No. |
| 28 | +- Does your ``sniff`` implementation read through the entirety of the file? This |
| 29 | + is never allowed for production data types. |
| 30 | + - [ ] No. |
| 31 | +- Does either your ``set_meta`` or your ``set_peek`` implementation read through |
| 32 | + the entire file? This is generally discouraged, but if you feel |
| 33 | + it substantially improves the usability of the datatype, please explain why |
| 34 | + below. |
| 35 | + - [ ] No. |
| 36 | + - [ ] Yes. *Replace THIS sentence with the reason.* |
| 37 | +- Does your datatype include metadata elements? In general, metadata should not |
| 38 | + take the place of, say, a data type viewer or visualization; metadata should be |
| 39 | + used by the tools that consume this datatype. |
| 40 | + - [ ] No. |
| 41 | + - [ ] Yes. *Replace THIS sentence with how tools will consume this metadata.* |
| 42 | +``` |
| 43 | + |
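As a rough illustration of the first two checklist items, here is a minimal sketch of a sniffer that inspects only a bounded prefix of the file, so its memory use and run time do not grow with file size. It is plain Python and is not wired into Galaxy's datatype classes; the `#myformat` header, the three-column rule, and the names `sniff_bounded`, `_read_bounded_line`, `MAX_LINES`, and `MAX_LINE_CHARS` are all hypothetical and exist only for this example.

```python
# Hypothetical tab-separated format used only for illustration:
# the first line is "#myformat", followed by rows with exactly three columns.
HEADER = "#myformat"
MAX_LINES = 100             # never inspect more than this many data lines
MAX_LINE_CHARS = 64 * 1024  # never read more than this many characters per line


def _read_bounded_line(handle):
    """Read one line capped at MAX_LINE_CHARS; return None if the cap is hit."""
    line = handle.readline(MAX_LINE_CHARS)
    if len(line) == MAX_LINE_CHARS and not line.endswith("\n"):
        return None  # over-long line: give up instead of reading further
    return line


def sniff_bounded(path: str) -> bool:
    """Return True if `path` looks like the hypothetical format.

    At most MAX_LINES + 1 lines are read, each capped at MAX_LINE_CHARS
    characters, and no line is kept after it has been checked.
    """
    # errors="replace" keeps binary input from raising UnicodeDecodeError.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        first = _read_bounded_line(handle)
        if first is None or first.rstrip("\r\n") != HEADER:
            return False
        rows = 0
        for _ in range(MAX_LINES):
            line = _read_bounded_line(handle)
            if line is None:
                return False
            if line == "":
                break  # end of file
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            if len(line.split("\t")) != 3:
                return False
            rows += 1
        return rows > 0
```

The design choice is to accept a small chance of a false positive (a file that only goes wrong after the first `MAX_LINES` rows) in exchange for a hard bound on work, which is the trade-off the checklist asks for.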
3 | 44 | ## Adding a New Data Type (Subclassed) |
4 | 45 |
5 | 46 | This specification describes the Galaxy source code |
@@ -480,78 +521,5 @@ For this reason, users are not allowed to change the datatype of dataset between |
480 | 521 |
481 | 522 | ## Galaxy Tool Shed - Data Types |
482 | 523 |
483 | | -**Note:** These are deprecated and shouldn't be used. |
484 | | - |
485 | | -### Including custom data types that subclass from Galaxy data types in the distribution |
486 | | - |
487 | | -If your repository includes tools that require data types that are not defined in the Galaxy distribution, you can include the required data types in the repository along with your tools, or you can create a separate repository to contain them. The repository must include a file named `datatypes_conf.xml`, which is modeled after the file named `datatypes_conf.xml.sample` in the Galaxy distribution. This section describes support for including data types that subclass from data types in the Galaxy distribution. Refer to the next section for details about data types that use your own custom class modules included in your repository. An example of this is the `datatypes_conf.xml` file in the [emboss_datatypes repository](http://toolshed.g2.bx.psu.edu/repository/browse_categories?sort=name&id=3ac79d5752c6d938&f-deleted=False&webapp=community&f-free-text-search=emboss&operation=view_or_manage_repository) in the main Galaxy tool shed, shown below. |
488 | | - |
489 | | - |
490 | | - |
491 | | -Tool shed repositories that include valid `datatypes_conf.xml` files will display the data types in the **Preview tools and inspect metadata by tool version** section of the view or manage repository page. |
492 | | - |
493 | | - |
494 | | - |
495 | | - |
496 | | -### Including custom data types that use class modules contained in your repository |
497 | | - |
498 | | -Including custom data types that use class modules included in your repository is a bit tricky. As part of your development process for tools that use data types that fall into this category, it is highly recommended that you host a local Galaxy tool shed. When your newly developed tools have proven to be functionally correct within your local Galaxy instance, you should upload them, along with all associated custom data types files and modules to your local tool shed to ensure that everything is handled properly within the tool shed. When your local tool shed repository is functionally correct, install your repository from your local tool shed to a local Galaxy instance to ensure that your tools and data types properly load both at the time of installation and when you stop and restart your Galaxy server. You should not upload your tools to the main Galaxy tool shed until you have confirmed that everything works by following these steps. |
499 | | - |
500 | | -To illustrate how this works, we'll use the [gmap repository](http://toolshed.g2.bx.psu.edu/repository/browse_categories?sort=name&id=4131098bea459833&f-deleted=False&webapp=community&f-free-text-search=gmap&operation=view_or_manage_repository) in the main Galaxy tool shed as an example. The `datatypes_conf.xml` file included in this repository looks something like the following. You'll probably notice that this file is modeled after the `datatypes_conf.xml.sample` file in the Galaxy distribution, but with some slight differences. |
501 | | - |
502 | | -Notice the `<datatypes_files>` tag set. This tag set contains `<datatype_file>` tags, each of which refers to the name of a class module file name within your repository (in this example, there is only one file named `gmap.py`), which contains the custom data type classes you've defined for your tools. |
503 | | - |
504 | | -In addition, notice the value of each `type` attribute in the `<datatype>` tags. The `:` separates the class module included in the repository (in this example, the class module is `gmap`) from the class name (`GmapDB`, `IntervalAnnotation`, etc.). It is critical that you make sure your datatype tag definitions match the classes you've defined in your class modules or the data type will not properly load into a Galaxy instance when your repository is installed. |
505 | | -```xml |
506 | | -<?xml version="1.0"?> |
507 | | -<datatypes> |
508 | | - <datatype_files> |
509 | | - <datatype_file name="gmap.py"/> |
510 | | - </datatype_files> |
511 | | - <registration> |
512 | | - <datatype extension="gmapdb" type="galaxy.datatypes.gmap:GmapDB" display_in_upload="False"/> |
513 | | - <datatype extension="gmapsnpindex" type="galaxy.datatypes.gmap:GmapSnpIndex" display_in_upload="False"/> |
514 | | - <datatype extension="iit" type="galaxy.datatypes.gmap:IntervalIndexTree" display_in_upload="True"/> |
515 | | - <datatype extension="splicesites.iit" type="galaxy.datatypes.gmap:SpliceSitesIntervalIndexTree" display_in_upload="True"/> |
516 | | - <datatype extension="introns.iit" type="galaxy.datatypes.gmap:IntronsIntervalIndexTree" display_in_upload="True"/> |
517 | | - <datatype extension="snps.iit" type="galaxy.datatypes.gmap:SNPsIntervalIndexTree" display_in_upload="True"/> |
518 | | - <datatype extension="gmap_annotation" type="galaxy.datatypes.gmap:IntervalAnnotation" display_in_upload="False"/> |
519 | | - <datatype extension="gmap_splicesites" type="galaxy.datatypes.gmap:SpliceSiteAnnotation" display_in_upload="True"/> |
520 | | - <datatype extension="gmap_introns" type="galaxy.datatypes.gmap:IntronAnnotation" display_in_upload="True"/> |
521 | | - <datatype extension="gmap_snps" type="galaxy.datatypes.gmap:SNPAnnotation" display_in_upload="True"/> |
522 | | - </registration> |
523 | | - <sniffers> |
524 | | - <sniffer type="galaxy.datatypes.gmap:IntervalAnnotation"/> |
525 | | - <sniffer type="galaxy.datatypes.gmap:SpliceSiteAnnotation"/> |
526 | | - <sniffer type="galaxy.datatypes.gmap:IntronAnnotation"/> |
527 | | - <sniffer type="galaxy.datatypes.gmap:SNPAnnotation"/> |
528 | | - </sniffers> |
529 | | -</datatypes> |
530 | | -``` |
531 | | - |
532 | | -**Modules that include custom datatype class definitions cannot use relative import references for imported modules.** To function correctly when your repository is installed in a local Galaxy instance, your class module imports must be defined as absolute from the galaxy subdirectory inside the Galaxy root's lib subdirectory. For example, assume the following import statements are included in our example `gmap.py` file. They certainly work within the Galaxy development environment when the gmap tools were being developed. |
533 | | -```python |
534 | | -import data |
535 | | -from data import Text |
536 | | -from metadata import MetadataElement |
537 | | -``` |
538 | | - |
539 | | -However, the above relative imports will not work when the `gmap.py` class module is installed from the Tool Shed into a local Galaxy instance because the modules will not be found due to the use of the relative imports. The developer must use the following approach instead. Notice that the imports are written such that they are absolute relative to the `~/lib/galaxy` subdirectory. |
540 | | -```python |
541 | | -import galaxy.datatypes.data |
542 | | -from galaxy.datatypes.data import Text |
543 | | -from galaxy.datatypes.metadata import MetadataElement |
544 | | -``` |
545 | | - |
546 | | -The use of `<converter>` tags contained within `<datatype>` tags is supported in the same way they are supported within the `datatypes_conf.xml.sample` file in the Galaxy distribution. |
547 | | -```xml |
548 | | -<datatype extension="ref.taxonomy" type="galaxy.datatypes.metagenomics:RefTaxonomy" display_in_upload="true"> |
549 | | - <converter file="ref_to_seq_taxonomy_converter.xml" target_datatype="seq.taxonomy"/> |
550 | | -</datatype> |
551 | | -``` |
552 | | - |
553 | | -### Including datatype converters and display applications |
554 | | - |
555 | | -To include your custom datatype converters or display applications, add the appropriate tag set to your repository's `datatypes_conf.xml` file in the same way that they are defined in the `datatypes_conf.xml.sample` file in the Galaxy distribution. |
556 | | - |
557 | | -If you include datatype converter files in your repository, all files (the disk file referred to by the value of the "file" attribute) must be located in the same directory in your repository hierarchy. Similarly, your datatype display application files must all be in the same directory in your repository hierarchy (although the directory can be a different directory from the one containing your converter files). This is critical because the Galaxy components that load these custom items assume each of them are located in the same directory. |
| 524 | +**Note:** Tool Shed data types are deprecated and shouldn't be used. If you find old |
| 525 | +documentation recommending them, please remove it. |