Update Functional_annotation_of_protein_sequences WF#1100
SantaMcCloud wants to merge 13 commits into galaxyproject:main
Hello @abretaud and @rlibouba, I made some modifications to your workflow. Its intended use is mostly still the same; these are the changes in short:
If any of the changes are not okay, feel free to comment here! This WF is meant to be added to a different WF, which is also the reason why some of the changes were made. @paulzierep or @bebatut can give you more details on this!
Test Results (powered by Planemo)
Test Summary
Failed Tests
Hmm, I don't know why the test is failing; I ran it without any error.
Okay, now I know why the test didn't run correctly. My VSC crashed, and some changes I had made in the test file weren't saved or were reverted... Now everything should be fine.
rlibouba
left a comment
Hi @SantaMcCloud, sorry for my late review. Thank you for these changes. That's okay with me. There is a description missing in the .ga file.
{
"description": "This workflow uses eggNOG mapper and Interproscan for functional annotation of protein sequences.",
"name": "input"
"description": "",
There is no description explaining what the workflow is used for. I can suggest: This workflow runs functional annotation tools to estimate the completeness of a metabolic pathway based on KOs from eggNOG
Thank you for the review, and sorry for the late response! I changed it :)
Pull request overview
This PR updates the “Functional annotation of protein sequences” Galaxy workflow to support collection-based inputs and adds KEGG pathway completeness calculation, alongside corresponding documentation, tests, and a version bump.
Changes:
- Switch workflow input from a single FASTA file to a FASTA collection and introduce parameters to enable/disable eggNOG / InterProScan.
- Add steps to derive KO lists from eggNOG output and run kegg_pathways_completeness.
- Update workflow tests, README, and changelog for the new behavior and release 0.2.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences.ga | Refactors workflow to accept collections, adds conditional execution and KEGG completeness steps, bumps release to 0.2. |
| workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences-tests.yml | Updates test input to a list collection and adds assertions for the new KEGG output. |
| workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/README.md | Updates input description for collection usage and documents the KEGG completeness output. |
| workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/CHANGELOG.md | Adds a 0.2 entry describing the workflow updates. |
Comments suppressed due to low confidence (11)
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/CHANGELOG.md:9
- The changelog entry has multiple grammatical issues (e.g., “run parallel”, inconsistent capitalization of bullet starts, and trailing whitespace after “InterProScan”). Please clean these up to keep release notes consistent and professional.
- Change the input to a collection such that multiple sequences can be run parallel
- add a subworkflow to open the option to choose which input eggNOG will receive
- add the option that eggNOG and/or InterProScan can be skipped
- add the option to swap the input type for InterProScan
- add toolshed.g2.bx.psu.edu/repos/iuc/kegg_pathways_completeness/kegg_pathways_completeness/1.3.0+galaxy0 to calculate the pathway completeness with KOs coming from eggNOG
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences.ga:58
- Input label contains a typo (“completness”) and will be user-facing in the workflow UI/tests. Please rename it to “Run eggNOG + completeness calculation” (or similar) and update the test file keys accordingly.
{
"description": "",
"name": "Run eggNOG + completness calculation"
}
],
"label": "Run eggNOG + completness calculation",
"name": "Input parameter",
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences.ga:777
- This workflow output label (“seeds”) is not human-readable per IWC workflow guidelines. Please change it to a descriptive, spaced label (and keep any downstream references consistent).
"workflow_outputs": [
{
"label": "seeds",
"output_name": "data_param",
"uuid": "084ec2d2-02b4-4ced-a436-8631477ff9e1"
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences.ga:843
- This workflow output label (“anno”) is not human-readable per IWC workflow guidelines. Please change it to a descriptive, spaced label (and keep any downstream references consistent).
"workflow_outputs": [
{
"label": "anno",
"output_name": "data_param",
"uuid": "2b5937dd-d4fd-4796-955e-b1849e75ccfa"
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences-tests.yml:17
- Job parameter key has a typo (“completness”). Since this key must match the workflow input label exactly, please fix the label spelling in the workflow and update this test key to match.
Run eggNOG + completness calculation: true
Run InterProScan: true
eggNOG mode select: proteins
InterProScan mode select: Protein
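A minimal sketch of how the key/label match described in the comment above could be checked automatically, assuming the `.ga` file has been parsed as JSON and the `job` block of the test file is available as a plain mapping. The helper name `unmatched_job_keys` and the small stand-in data are hypothetical and not part of this PR; the check only looks at top-level input steps.

```python
# Step types that Galaxy uses for workflow inputs in .ga files.
INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def unmatched_job_keys(workflow: dict, job: dict) -> list[str]:
    """Return job keys that do not match any top-level input label."""
    labels = {
        step.get("label")
        for step in workflow.get("steps", {}).values()
        if step.get("type") in INPUT_STEP_TYPES
    }
    return sorted(key for key in job if key not in labels)


# Hypothetical stand-in data: the workflow label is already corrected,
# while the test job still uses the misspelled key.
workflow = {
    "steps": {
        "0": {"type": "parameter_input", "label": "Run eggNOG + completeness calculation"},
        "1": {"type": "parameter_input", "label": "Run InterProScan"},
    }
}
job = {
    "Run eggNOG + completness calculation": True,
    "Run InterProScan": True,
}

# Lists every job key that would not be resolved to a workflow input.
print(unmatched_job_keys(workflow, job))
```

Running a check like this before pushing would surface the case where a label is renamed in the workflow but the test file is not updated (or the other way around).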
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences-tests.yml:46
- Output key “kegg_pathways_table” is not human-readable (underscores) and violates the workflow output labeling guidelines. After updating the workflow output label to a spaced, human-friendly label, update this test output key to match exactly.
kegg_pathways_table:
asserts:
- that: "has_text"
text: "module_accession completeness pathway_name pathway_class matching_ko missing_ko"
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/README.md:11
- The new input description is missing punctuation and is a bit ambiguous about the expected collection type. Consider clarifying that this is a Galaxy list collection containing one or more FASTA files, and end the sentence with a period.
## Input dataset
This workflow requires a collection with at least one file in fasta format
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/README.md:31
- The new KEGG section has several grammar issues that make it hard to understand (e.g., “calculate”, “tools … classified”, “state the class of”). Please revise this paragraph for correct grammar and clearer wording about what the KEGG completeness output contains.
## Output for KEGG Pathways completeness
This tool calculate the completeness of a pathway from KOs which are coming from eggNOG. Together with the calculated completeness the tools also classified the contig with a name and state the class of the pathway and also return the matching KOs and the missing KOs of the pathway.
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences.ga:282
- The parameter name/label “Selected sequenece type” is misspelled. Please correct it to “Selected sequence type” and ensure the corresponding input_connections key is updated consistently.
"Selected sequenece type": {
"id": 1,
"input_subworkflow_step_id": 0,
"output_name": "output"
},
"Sequence collection": {
"id": 2,
"input_subworkflow_step_id": 1,
"output_name": "output"
},
"when": {
"id": 0,
"output_name": "output"
}
},
"inputs": [],
"label": null,
"name": "Functional annotation of protein sequences subworkflow",
"outputs": [],
"position": {
"left": 575.8291134761635,
"top": 259.4494665177702
},
"subworkflow": {
"a_galaxy_workflow": "true",
"annotation": "",
"comments": [],
"format-version": "0.1",
"name": "Functional annotation of protein sequences subworkflow",
"report": {
"markdown": "\n# Workflow Execution Report\n\n## Workflow Inputs\n```galaxy\ninvocation_inputs()\n```\n\n## Workflow Outputs\n```galaxy\ninvocation_outputs()\n```\n\n## Workflow\n```galaxy\nworkflow_display()\n```\n"
},
"steps": {
"0": {
"annotation": "",
"content_id": null,
"errors": null,
"id": 0,
"input_connections": {},
"inputs": [
{
"description": "",
"name": "Selected sequenece type"
}
],
"label": "Selected sequenece type",
"name": "Input parameter",
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences.ga:1288
- The workflow output label “kegg_pathways_table” uses underscores and isn’t human-readable. Please change this to a human-friendly label with spaces (e.g., “KEGG pathways completeness table”) and update the corresponding key in the workflow test file.
"workflow_outputs": [
{
"label": "kegg_pathways_table",
"output_name": "kegg_pathways_table",
"uuid": "6012fdc2-9ebe-4fc9-913b-001ac8d2d9d2"
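The label comments above could be caught with a small script. This is only a sketch under assumptions: it treats "contains an underscore" as the sole sign of a non-human-readable label, so short labels such as "seeds" or "anno" would not be flagged. The function name `find_underscore_labels` and the example data are hypothetical; the traversal relies on the `steps`, `workflow_outputs`, and `subworkflow` keys visible in the quoted `.ga` snippets.

```python
def find_underscore_labels(workflow: dict) -> list[str]:
    """Collect workflow output labels that contain an underscore."""
    flagged = []
    for step in workflow.get("steps", {}).values():
        for output in step.get("workflow_outputs", []):
            label = output.get("label") or ""
            if "_" in label:
                flagged.append(label)
        # Subworkflow steps embed a complete workflow, so recurse into it.
        sub = step.get("subworkflow")
        if sub:
            flagged.extend(find_underscore_labels(sub))
    return flagged


# Hypothetical stand-in for a parsed .ga file (normally loaded with json.load).
example_workflow = {
    "steps": {
        "0": {"workflow_outputs": [{"label": "seeds", "output_name": "data_param"}]},
        "1": {
            "workflow_outputs": [],
            "subworkflow": {
                "steps": {
                    "0": {
                        "workflow_outputs": [
                            {"label": "kegg_pathways_table", "output_name": "kegg_pathways_table"}
                        ]
                    }
                }
            },
        },
    }
}

# Prints the labels that would need renaming under the underscore rule.
print(find_underscore_labels(example_workflow))
```

Any label reported here would also need the matching output key in the `-tests.yml` file renamed, since the two must agree exactly.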
workflows/genome_annotation/functional-annotation/functional-annotation-protein-sequences/Functional_annotation_of_protein_sequences.ga:992
- Dataset tag has a typo, which can lead to inconsistent tagging/searching. Please change the tag to the correctly spelled form.
"TagDatasetActionoutfile_tsv": {
"action_arguments": {
"tags": "interproscna-table"
},
"action_type": "TagDatasetAction",
Test Results (powered by Planemo)
Test Summary
Errored Tests
paulzierep
left a comment
I think you can only have one test file per WF:
as many [Planemo test file](https://planemo.readthedocs.io/en/latest/test_format.html) as workflow files, with the same name as the workflow file, but with a -tests.yml extension, e.g., consensus-from-variation-tests.yml;
But I think you can add multiple tests in that file; just add a new job item.
Will change it, thank you!
I think there was a connection problem in the test. I will restart it later to check it again. I can not test it on