Error model output by jordenrabasco · Pull Request #169 · qiime2/q2-dada2

jordenrabasco · 2024-09-22T19:39:25Z

This pull request is to resolve issue #158

Big changes:

Q2-DADA2 'denoise-' commands output a Collection[DADA2Stats] rather than a single DADA2Stats object.
New Q2-DADA2 action stats_viz visualizes every DADA2Stats object in the Collection[DADA2Stats] (denoising stats and error model stats) as a single visualization with a separate tab for each object.
Tests were updated to accommodate the new output type.
Tests were added for the visualization to make sure that all supporting files are generated.

gregcaporaso · 2024-09-26T18:41:49Z

Thanks @jordenrabasco!

Because this introduces a breaking change (changing the output to a Collection), we can't merge this before the 2024.10 release: we always give advance warning about upcoming breaking changes. We can mention in the 2024.10 release notes that this change is coming, and then review and merge it early in the 2025.4 development cycle so the new functionality is available in development versions of QIIME 2.

Sound ok? Let us know if you have any questions.

Todo:

add note about the breaking change to the 2024.10 release notes

jordenrabasco · 2024-09-27T13:45:19Z

@gregcaporaso Of course! That all sounds good to me. In the meantime, let me know if you need me to change anything with the implementation on my end to make it easier for you guys!

colinvwood · 2024-12-03T20:54:01Z

Hello @jordenrabasco, how do you feel about making the new visualizer display only the transition plots? I suggest this because we already have the metadata tabulate action to display the dada2 stats, which makes the "Denoised-Stats" tab in the new visualizer redundant. The metadata tabulate action is also more full-featured; for example, it displays the column datatype and the direction of sorting. Additionally, one might want to call metadata tabulate on the transition stats, so metadata tabulate will remain a key tool for inspecting dada2 output one way or the other.

One other quick thing I noticed while trying out these changes: try to name artifacts-and-visualizers-like-this.qza (no capitalization, and hyphens instead of underscores). This is just a convention we try to follow. I'll circle back for a more in-depth review once the visualizer has been ironed out.

colinvwood · 2024-12-03T20:59:33Z

By the way, if at all possible it would be best if we could avoid adding additional R software to this repository. In fact, we're planning to transition some of the R code here to python using the rpy2 library (see #172). This makes things much easier for us to maintain. For example, the internal_plotErrors function that you added to run_dada.R would be much preferred in python. Or probably even better yet, you could simply reach out to dada2's plotErrors function for this purpose.

jordenrabasco · 2024-12-04T15:54:25Z

Hi @colinvwood, I can change the names and start transferring things over to rpy2, no problem!

In terms of the tabs in the visualizer, I initially incorporated the tab format with dada2-stats and the error-model-stats, as it was requested in issue #158. Should I remove the visualizer tab format entirely or keep the framework?

colinvwood · 2024-12-04T16:31:54Z

Hello @jordenrabasco,

> In terms of the tabs in the visualizer, I initially incorporated the tab format with dada2-stats and the error-model-stats, as it was requested in issue #158

I see. In the interest of reducing code duplication I think it makes sense to keep them separate. If it's easy to do so, you could perhaps borrow the tabulate template from q2-metadata. That way the presentation matches what users are used to.

I also think the denoising stats and the transition frequencies are quite distinct sets of information, and most often users will likely only be interested in (and only able to immediately understand) the former. Keeping the two separate also means that the denoising stats are only able to be visualized one way--with metadata tabulate. This keeps things simple.

What's your opinion from an ease of use perspective--separate visualizers or a tabbed one? If you prefer the tabbed approach I think it would be best if we borrowed the tabulate functionality for the denoising stats tab.

> I can change the names and start transferring things over to rpy2, no problem!

Don't worry about rpy2. This is something that I would like to do in the future, but I think it only makes sense if the entire run_dada.R script gets converted all in one sweep. What would be helpful however is if the transitions plot were not implemented in R (because this will mean more refactoring to do later when the R script gets removed). Is there a reason we couldn't use the plotErrors function from the dada2 package (see here for an example)? It's quite straightforward to get an R plot into a qiime2 visualizer: you can just save the plot to disk in a temporary directory, copy it into the visualizer's output_dir, and reference it in the index.html file.

jordenrabasco · 2024-12-04T17:20:18Z

> What's your opinion from an ease of use perspective--separate visualizers or a tabbed one?

Personally, I think that having all of the diagnostic information readily available within the same visualization via tabs would be very useful to the user, as opposed to having it separated out and needing to be viewed separately. In my experience people generally only look at the diagnostics when things go wrong downstream, so I think having everything in one place could be handy. However, this may just be a personal preference of mine.

> If you prefer the tabbed approach I think it would be best if we borrowed the tabulate functionality for the denoising stats tab.

I thought about doing this but couldn't figure out how to incorporate the tabulate functionality within a tabbed visualizer. I could convert the visualizer into a pipeline but that would just output different .qzv objects for the denoising stats, and the error plot stats.

> Don't worry about rpy2. This is something that I would like to do in the future, but I think it only makes sense if the entire run_dada.R script gets converted all in one sweep.

Ah okay, my mistake. Sorry for misunderstanding!

> What would be helpful however is if the transitions plot were not implemented in R (because this will mean more refactoring to do later when the R script gets removed).

The transition plots themselves are generated in Python, in _visualizer.py; however, the preprocessing of the plot information happens in R, within internal_plotErrors. Do you want all the preprocessing to happen in Python as well?

> Is there a reason we couldn't use the plotErrors function from the dada2 package (see here for an example)? It's quite straightforward to get an R plot into a qiime2 visualizer: you can just save the plot to disk in a temporary directory, copy it into the visualizer's output_dir, and reference it in the index.html file.

plotErrors returns a ggplot2 object, and when processing the data in the denoise- actions I was uncertain of how to save that as a .qza file and then incorporate it into the tabbed visualizer in the stats-viz action. It is my understanding that the temporary dir is deleted at the end of a qiime2 action. Is this correct, or does it stay for the entirety of your qiime session? With this in mind I opted to preprocess the data, save the df as a .qza, and then generate the plot via Python in the visualizer itself. I didn't invoke the plotErrors function within the visualizer because I didn't want to expand the code base by adding an additional R script (I refactored the q2-dada2 repo a while ago to get rid of tangential and redundant R scripts). However, if you think this is the way to go, I can implement it this way (i.e. the visualizer calls Rscript to make the plot).

Also to be clear I am open to going with whatever you guys think is best!
Let me know what you think!

colinvwood · 2024-12-04T19:04:59Z

@jordenrabasco,

> Personally, I think that having all of the diagnostic information readily available within the same visualization via tabs would be very useful to the user

Sounds good, let's go forward with this approach.

> I thought about doing this but couldn't figure out how to incorporate the tabulate functionality within a tabbed visualizer. I could convert the visualizer into a pipeline but that would just output different .qzv objects for the denoising stats, and the error plot stats.

This should definitely be doable. Were there any specific problems you ran into that I could help with?

> plotErrors returns a ggplot2 object, and when processing the data in the denoise- actions I was uncertain of how to save that as a .qza file and then incorporate it into the tabbed visualizer in the stats-viz action.

The temporary directory does go away once you exit the context manager for it, but as long as you copy the saved file into output_dir within the context manager (or save the file right into the output dir) everything works. Take a look at this function as a reference. I think this is probably going to be the best approach because it's such a minimal amount of code--one plotErrors call and a few more lines to save the file into output_dir.
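
To make the pattern concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical `make_plot` callable that writes an image to the path it is given (for instance, a thin wrapper around an R call to plotErrors). The function name, the file name, and the HTML are illustrative and are not code from this PR.

```python
import shutil
import tempfile
from pathlib import Path


def write_plot_visualization(output_dir, make_plot):
    """Render a plot in a temporary directory, then publish it in output_dir.

    `make_plot` is a hypothetical callable (an assumption of this sketch)
    that writes an image file to the pathlib.Path it receives.
    """
    out = Path(output_dir)
    with tempfile.TemporaryDirectory() as tmp:
        plot_path = Path(tmp) / "error-plot.png"
        make_plot(plot_path)
        # Copy while the temporary directory still exists.
        shutil.copy(plot_path, out / plot_path.name)
    # The temporary directory is gone here; only the copy in output_dir remains.
    (out / "index.html").write_text(
        "<!DOCTYPE html>\n<html><body>\n"
        '<img src="error-plot.png" alt="DADA2 error model plot">\n'
        "</body></html>\n"
    )
```

In a real visualizer the index.html would more likely be rendered from a template than written by hand, but the ordering is the part that matters: the copy has to happen before the `with` block exits.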

Hopefully this is enough to get started. Let me know if you get stuck on anything.

jordenrabasco · 2024-12-15T21:02:17Z

Hi @colinvwood I am having trouble with a few things.

> This should definitely be doable. Were there any specific problems you ran into that I could help with?

I am a bit confused here. Do you mean you can render a .qzv file within an HTML tabbed framework index file, or is there another way to go about this?

> The temporary directory does go away once you exit the context manager for it, but as long as you copy the saved file into output_dir within the context manager (or save the file right into the output dir) everything works. Take a look at this function as a reference. I think this is probably going to be the best approach because it's such a minimal amount of code--one plotErrors call and a few more lines to save the file into output_dir.

I am having some issues changing the code over to the format you provided, more specifically saving the error plots via ggsave into the output dir. From my understanding, only a visualizer has output_dir as the default first arg; denoise_* is registered as a method, so it doesn't have access to the output_dir path. Do you know of a way around this?

Sorry if I am misunderstanding and thanks so much!

colinvwood · 2024-12-16T19:23:33Z

Hey @jordenrabasco,

> I am a bit confused here. Do you mean you can render a .qzv file within an HTML tabbed framework index file, or is there another way to go about this?

You'll have to do some work to make the tabulate visualizer (or the template which it extends) be tabbed. We use jinja2 to manage the various visualizer templates so if you don't have any experience with that you might want to take some time to read its documentation. It looks like the table itself is made with jQuery but you probably shouldn't have to mess with that. The tabulate visualizer is in the q2_metadata repository and the base templates are in the q2templates repository.

All of this is maybe an argument against making a single tabbed visualizer and for making two separate visualizers...

> I am having some issues changing the code over to the format you provided, more specifically saving the error plots via ggsave into the output dir. From my understanding, only a visualizer has output_dir as the default first arg; denoise_* is registered as a method, so it doesn't have access to the output_dir path. Do you know of a way around this?

You don't want to make the visualization in the denoise-* method, but in the visualization. My understanding was that the input needed by the plotErrors function was the new output that you added in this PR. Thus the visualizer would take this input, call plotErrors, and then save the figure to the output_dir. Let me know if I'm misunderstanding anything.

jordenrabasco · 2024-12-16T20:03:56Z

@colinvwood

> You'll have to do some work to make the tabulate visualizer (or the template which it extends) be tabbed. We use jinja2 to manage the various visualizer templates so if you don't have any experience with that you might want to take some time to read its documentation. It looks like the table itself is made with jQuery but you probably shouldn't have to mess with that. The tabulate visualizer is in the q2_metadata repository and the base templates are in the q2templates repository.

Ah okay, I understand now. I thought you wanted me to use the tabulate visualizer action itself within the code, instead of adapting its code to a tabbed format. Apologies for misunderstanding.

> All of this is maybe an argument against making a single tabbed visualizer and for making two separate visualizers...

Quite possibly. I will try to see if I can change the table over into tabulate format within a tabbed format and if not I will revisit this idea. If that's okay with you?

> You don't want to make the visualization in the denoise-* method, but in the visualization. My understanding was that the input needed by the plotErrors function was the new output that you added in this PR. Thus the visualizer would take this input, call plotErrors, and then save the figure to the output_dir. Let me know if I'm misunderstanding anything.

Ah okay, so the new output from the denoise functions is not the error model output but an augmented dataframe, to allow for some preprocessing and for saving both fwd/rev reads within the same obj, as QIIME doesn't allow for optional outputs. I can try to change things over to output the model information instead of the augmented dataframe; however, if I do this I will need to create another R script to import this data back into R and call plotErrors(). I considered this approach previously but didn't want to expand the code base further via another R file. Is this an approach that would work better for you guys?

Let me know what you think!

colinvwood · 2024-12-17T20:32:06Z

> Quite possibly. I will try to see if I can change the table over into tabulate format within a tabbed format and if not I will revisit this idea. If that's okay with you?

Of course, sounds good.

> Ah okay, so the new output from the denoise functions is not the error model output but an augmented dataframe, to allow for some preprocessing and for saving both fwd/rev reads within the same obj, as QIIME doesn't allow for optional outputs. I can try to change things over to output the model information instead of the augmented dataframe; however, if I do this I will need to create another R script to import this data back into R and call plotErrors(). I considered this approach previously but didn't want to expand the code base further via another R file. Is this an approach that would work better for you guys?

I think it's totally fine in this case because the R will be so minimal/boilerplate. Eventually it can be refactored using rpy2.

gregcaporaso · 2025-03-06T17:39:36Z

@jordenrabasco, are you still available to work on this for the 2025.4 release? We'd need to have it wrapped up in the next couple of weeks if so. If not, we can hold off till the next release (2025.10).

jordenrabasco · 2025-04-03T18:07:17Z

Hi @colinvwood,

> I vaguely remember the error plots being visible in a previous version of this PR,

I believe we talked about this option; however, we never implemented it.

> What was the reasoning behind the stats-viz visualizer being a separate registered entity (as opposed to the dada2-* functions outputting the visualization directly)?

To my knowledge there is not a way within QIIME 2 to save a visualization (.QZV file) as well as a QIIME 2 data artifact (.QZA file) outside of generating a pipeline. Additionally, the plots for reverse/forward reads are generated independently within the DADA2 workflow. This means that without the intermediate step of either concatenating the images or saving the data together and then parsing it later on (as we do now), an optional output would be needed (fwd plot, rev plot) which QIIME 2 does not allow for. Another reason is that if the data from the intermediate error_stats file is exported from QIIME 2 it will produce a file that is comprehensible to the user and present the values depicted in the graph. I felt this was of importance as it would allow the user to investigate their own data without external resources.

> A problem with this approach is that the new stats-viz errors if given the old dada2 stats file (which it seems like it should accept given the input semantic type).

If I am understanding correctly, you tried to input the dada2 read stats into the error stats visualizer? If this is the case the old dada2 stats file should error as it is not dada2 error data, but instead dada2 read retention data. A simple fix to avoid this confusion would be to rename the action error-stats-viz. Does this work on your end?

Let me know what you think!

colinvwood · 2025-04-03T18:22:42Z

Hello @jordenrabasco,

> To my knowledge there is not a way within QIIME 2 to save a visualization (.QZV file) as well as a QIIME 2 data artifact (.QZA file) outside of generating a pipeline.

I think a pipeline makes a lot of sense since this new visualizer is completely coupled to the dada2 method.

> This means that without the intermediate step of either concatenating the images or saving the data together and then parsing it later on (as we do now), an optional output would be needed (fwd plot, rev plot) which QIIME 2 does not allow for.

The visualizer can remain as is, but only be called from the pipeline (and not registered independently).

> If I am understanding correctly, you tried to input the dada2 read stats into the error stats visualizer? If this is the case the old dada2 stats file should error as it is not dada2 error data, but instead dada2 read retention data.

Right, I expected it to fail, but the issue is that it should not have run in the first place. An action/visualizer that accepts semantic type X should run on all instances of X. Essentially, the new error plot stats should be a different semantic type since it is not interchangeable with other instances of the semantic type it has been registered as.

The pipeline technically would resolve this issue, as the call to the visualizer would be encapsulated in it, but even so I think it should be registered as a separate semantic type for the reasons above.
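
A toy model of this point in plain Python. This is not QIIME 2's type system or plugin API: `Artifact`, `ACCEPTS`, `run`, the type name `ErrorModelStats`, and the column names are all hypothetical, chosen only to show why a distinct type lets the framework reject the wrong input before the visualizer body runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    semantic_type: str  # hypothetical stand-in for a registered semantic type
    columns: tuple      # column names of the table the artifact holds


# Hypothetical registry: action name -> the one semantic type it accepts.
ACCEPTS = {"error-stats-viz": "ErrorModelStats"}


def plot_error_stats(artifact):
    # The visualizer body assumes transition columns are present.
    if "A2C" not in artifact.columns:
        raise KeyError("missing transition column 'A2C'")
    return "plotted"


def run(action, artifact):
    """Reject a mismatched type before the action body ever runs."""
    expected = ACCEPTS[action]
    if artifact.semantic_type != expected:
        raise TypeError(
            f"{action} expects {expected}, got {artifact.semantic_type}"
        )
    return plot_error_stats(artifact)
```

With a separate type, passing read-retention stats fails at the type check with a clear message. If both tables shared one type, the same call would pass the check and fail later inside the visualizer body, which is the behavior described above.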

jordenrabasco · 2025-04-03T18:45:37Z

@colinvwood,

> Right, I expected it to fail, but the issue is that it should not have run in the first place. An action/visualizer that accepts semantic type X should run on all instances of X. Essentially, the new error plot stats should be a different semantic type since it is not interchangeable with other instances of the semantic type it has been registered as.

Ah I see, sorry for the misunderstanding!

> The pipeline technically would resolve this issue, as the call to the visualizer would be encapsulated in it, but even so I think it should be registered as a separate semantic type for the reasons above.

So just for clarity on my end, you would like me to convert all denoise-* actions into pipeline actions that will output the dada2 read stats obj, the dada2 error stats obj within the denoising stats folder and the dada2 error stats visualization as a separate output? Additionally you would like me to change the semantic type of the error stats obj and make it so that the error stats-viz action is only available internally to the pipeline actions?

colinvwood

Hello @jordenrabasco,

Yes, converting to pipelines is one option. The other is registering the error model stats as a different semantic type. Both of these solve the issue of the error plot visualizer throwing an error when given a valid input type.

jordenrabasco · 2025-04-09T17:03:04Z

Hi @colinvwood,

The minor changes involving kebab case and the naming conventions have been pushed; however, I am struggling to generate a new semantic type that is specific to the score-viz action.
Could you please provide some guidance on how to accomplish this?

-Jorden

colinvwood · 2025-05-27T17:38:35Z

Hello @jordenrabasco, sorry for the delay in getting back to you.

See here for an example of creating a new semantic type. See here for an example of registering a semantic type and linking it to a format.

Let me know if these examples answer your questions, or if there's anything that still isn't clear!

gregcaporaso · 2025-07-31T17:41:54Z

@jordenrabasco, need anything on this? Feel free to DM me on the forum if you want to discuss at all.

jordenrabasco · 2025-08-03T20:47:04Z

@gregcaporaso @colinvwood I think we should be all set! I just pushed the requested changes in the latest push.
Let me know if there are any changes once you look it over, apologies for the delay on this!

jordenrabasco · 2025-08-09T13:46:15Z

@colinvwood, the requested changes have been pushed. It looks like the build is failing on the automated tests, and I am unsure how to fix this. Let me know if there is anything else I need to do on my end!

colinvwood · 2025-08-16T00:01:33Z

Thanks @jordenrabasco 👍🏻

jordenrabasco added 10 commits September 18, 2024 11:39

initial commit for dada2 stats viz

ed6bf41

- generates new action to visualize dada2 stats in a distinct tab format - alters dada2 stats output to be a collection[DADA2Stat] obj -updates existing tests to accommodate this change in data type

update existing tests

1196869

-update existing tests to come into line with the new output collection[dada2stats] -adds tests for the output error stats table

tests_and_bug_fix

9aaa667

-adds tests for the error model vizualization - fixes bug with vizualizing ccs stats

Update _denoise.py

153ed85

yaml file update

4eaed14

yaml update 2

34f2c28

yaml update 3

8b755c6

yaml update 4

a583e4c

test update

418d156

-removing blanks from test files to fix errors in qiime CI

lint fix

685d8cb

hagenjp self-assigned this Oct 3, 2024

hagenjp removed their assignment Nov 14, 2024

colinvwood mentioned this pull request Dec 3, 2024

IMP: restructure dada2 invocation from R script to rpy2 #172

Open

colinvwood self-assigned this Dec 3, 2024

colinvwood mentioned this pull request Dec 3, 2024

BUG:Can not run qiime dada2 denoise-ccs without primers or adapters #163

Open

Merge branch 'qiime2:dev' into error_model_output

cfc0c02

colinvwood removed their assignment Feb 24, 2025

colinvwood moved this from In Development to In Review in 2025.4 🌻 Apr 3, 2025

colinvwood reviewed Apr 8, 2025

q2_dada2/_dada_stats/_visualizer.py

q2_dada2/_denoise.py

q2_dada2/_denoise.py

jordenrabasco added 2 commits April 9, 2025 11:50

minor_aesthetic_fixes

7113e14

- changes to kebab case - changes input name to be more explicit

attempt 1

82b395d

-attempts to make new semantic type within dada2 folder

colinvwood moved this from In Review to Needs Review in 2025.4 🌻 Apr 15, 2025

colinvwood moved this from Needs Review to In Review in 2025.4 🌻 Apr 15, 2025

hagenjp removed this from 2025.4 🌻 Apr 17, 2025

hagenjp added this to 2025.10 👻 Apr 17, 2025

github-project-automation bot moved this to Backlog in 2025.10 👻 Apr 17, 2025

hagenjp moved this from Backlog to In Review in 2025.10 👻 Apr 17, 2025

semantic type added

a59ca53

- semantic type added for error model

colinvwood reviewed Aug 6, 2025

q2_dada2/plugin_setup.py

q2_dada2/plugin_setup.py

q2_dada2/tests/test_denoise.py

change descriptions

dc40d36

-change descriptions

lizgehret and others added 4 commits August 14, 2025 10:36

Update meta.yaml

76f056a

change new output name, output descriptions

feb531f

refactor new base transition format and semantic type

6c590a3

small naming changes to new visualizer

4bb5d4d

colinvwood linked an issue Aug 15, 2025 that may be closed by this pull request

ENH: include error model plots in denoise-* output #158

Closed

colinvwood merged commit d2abd17 into qiime2:dev Aug 16, 2025
4 checks passed

github-project-automation bot moved this from In Review to Changelog Needed in 2025.10 👻 Aug 16, 2025

colinvwood moved this from Changelog Needed to Completed in 2025.10 👻 Aug 18, 2025
