Conversation
- generates new action to visualize dada2 stats in a distinct tab format
- alters dada2 stats output to be a collection[DADA2Stat] obj
- updates existing tests to accommodate this change in data type
- update existing tests to come into line with the new output collection[dada2stats]
- adds tests for the output error stats table
- adds tests for the error model visualization
- fixes bug with visualizing ccs stats
- removing blanks from test files to fix errors in qiime CI
|
Thanks @jordenrabasco! Because this introduces a breaking change (changing the output to a Sound ok? Let us know if you have any questions. Todo:
|
|
@gregcaporaso Of course! That all sounds good to me. In the meantime, let me know if you need me to change anything with the implementation on my end to make it easier for you guys! |
|
Hello @jordenrabasco, how do you feel about making the new visualizer only display the transition plots? I suggest this because we already have the One other quick thing I noticed while trying out these changes, try to name |
|
By the way, if at all possible it would be best if we could avoid adding additional R software to this repository. In fact, we're planning to transition some of the R code here to python using the |
|
Hi @colinvwood I can change the names and start transferring things over to rpy2 no problem! In terms of the tabs in the visualizer I initially incorporated the tab format with |
|
Hello @jordenrabasco,
I see. In the interest of reducing code duplication I think it makes sense to keep them separate. If it's easy to do so you could perhaps borrow the I also think the denoising stats and the transition frequencies are quite distinct sets of information, and most often users will likely only be interested in (and only able to immediately understand) the former. Keeping the two separate also means that the denoising stats are only able to be visualized one way--with What's your opinion from an ease of use perspective--separate visualizers or a tabbed one? If you prefer the tabbed approach I think it would be best if we borrowed the
Don't worry about rpy2. This is something that I would like to do in the future, but I think it only makes sense if the entire |
Personally, I think that having all of the diagnostic information readily available within the same visualization via tabs would be more useful to the user than having it separated out and needing to be viewed separately. In my experience people generally only look at the diagnostics when things go wrong downstream, so I think having everything in one place could be handy. However, this may also be a personal preference of mine.
I thought about doing this but couldn't figure out how to incorporate the tabulate functionality within a tabbed visualizer. I could convert the visualizer into a pipeline but that would just output different .qzv objects for the denoising stats, and the error plot stats.
Ah okay my mistake sorry for misunderstanding!
The transition plots themselves are generated in
Also to be clear I am open to going with whatever you guys think is best! |
Sounds good, let's go forward with this approach.
This should definitely be doable. Were there any specific problems you ran into that I could help with?
The temporary directory does go away once you exit the context manager for it, but as long as you copy the saved file into Hopefully this is enough to get started. Let me know if you get stuck on anything. |
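
A minimal sketch of the temporary-directory point above, using only the Python standard library. The destination is elided in the comment, so `persistent_dir` and the file name `error_plot.txt` are hypothetical stand-ins, and nothing here is QIIME 2 or q2-dada2 API. It only shows that a file copied out before the context manager exits survives the cleanup.

```python
import shutil
import tempfile
from pathlib import Path


def copy_out_of_tempdir(persistent_dir: Path) -> tuple[Path, Path]:
    """Create a file in a temporary directory and copy it somewhere durable.

    Returns (path of the surviving copy, path of the deleted temp file).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file standing in for a saved plot.
        tmp_file = Path(tmp) / "error_plot.txt"
        tmp_file.write_text("placeholder plot contents")
        # Copy while still inside the `with` block; the temporary
        # directory and everything in it is removed on exit.
        kept = Path(shutil.copy(tmp_file, persistent_dir))
    return kept, tmp_file
```

After the call, the returned copy still exists while the original temporary file does not, so anything that must outlive the context manager has to be copied before the block ends.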
|
Hi @colinvwood I am having trouble with a few things.
I am a bit confused here, do you mean you can render a .qzv file within an html tabbed framework index file or is there another way to go about this?
I am having some issues changing the code over to the format you provided, more specifically saving the error plots via ggsave into the output dir. From my understanding, only a visualizer has output_dir as the default first arg; denoise_* is registered as a method, so it doesn't have access to the output_dir address. Do you know of a way around this? Sorry if I am misunderstanding and thanks so much! |
|
Hey @jordenrabasco,
You'll have to do some work to make the tabulate visualizer (or the template which it extends) be tabbed. We use jinja2 to manage the various visualizer templates so if you don't have any experience with that you might want to take some time to read its documentation. It looks like the table itself is made with jQuery but you probably shouldn't have to mess with that. The All of this is maybe an argument against making a single tabbed visualizer and for making two separate visualizers...
You don't want to make the visualization in the |
Ah okay, I understand now. I thought you wanted me to use the tabulate visualizer action itself within the code, instead of augmenting the code so that it extends into a tabbed format. Apologies for misunderstanding.
Quite possibly. I will try to see if I can change the table over into tabulate format within a tabbed format, and if not I will revisit this idea, if that's okay with you?
Ah okay, so the new output from the denoise functions is not the error model output but an augmented dataframe, which allows for some preprocessing and for saving both fwd/rev reads within the same obj, as QIIME doesn't allow for optional outputs. I can try to change things over to output the model information instead of the augmented dataframe; however, if I do this I will need to create another R script to import this data back into R and call plotErrors(). I considered this approach previously but didn't want to expand the code base further via another R file. Is this an approach that would work better for you guys? Let me know what you think! |
Of course, sounds good.
I think it's totally fine in this case because the R will be so minimal/boilerplate. Eventually it can be refactored using |
|
@jordenrabasco, are you still available to work on this for the 2025.4 release? We'd need to have it wrapped up in the next couple of weeks if so. If not, we can hold off till the next release (2025.10). |
|
Hi @colinvwood,
I believe we talked about this option; however, we never implemented it.
To my knowledge, there is no way within QIIME 2 to save a visualization (.qzv file) as well as a QIIME 2 data artifact (.qza file) outside of generating a pipeline. Additionally, the plots for reverse/forward reads are generated independently within the DADA2 workflow. This means that without the intermediate step of either concatenating the images or saving the data together and then parsing it later on (as we do now), an optional output would be needed (fwd plot, rev plot), which QIIME 2 does not allow for. Another reason is that if the data from the intermediate error_stats file is exported from QIIME 2, it will produce a file that is comprehensible to the user and presents the values depicted in the graph. I felt this was important, as it would allow the user to investigate their own data without external resources.
If I am understanding correctly, you tried to input the dada2 read stats into the error stats visualizer? If this is the case the old dada2 stats file should error as it is not dada2 error data, but instead dada2 read retention data. A simple fix to avoid this confusion would be to rename the action error-stats-viz. Does this work on your end? Let me know what you think! |
|
Hello @jordenrabasco,
I think a pipeline makes a lot of sense since this new visualizer is completely coupled to the dada2 method.
The visualizer can remain as is, but only be called from the pipeline (and not registered independently).
Right, I expected it to fail, but the issue is that it should not have run in the first place. An action/visualizer that accepts semantic type X should run on all instances of X. Essentially, the new error plot stats should be a different semantic type, since it is not interchangeable with other instances of the semantic type it has been registered as. The pipeline technically would resolve this issue, as the call to the visualizer would be encapsulated in it, but even so I think it should be registered as a separate semantic type for the reasons above. |
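
The principle in the comment above can be illustrated with a toy sketch in plain Python. This is not the QIIME 2 plugin API: `Artifact`, `run_action`, and the type labels `ReadStats` and `ErrorModelStats` are all hypothetical names made up for the illustration. The point is only that giving the error model data its own type lets the framework reject the wrong input up front, instead of letting the action start and fail partway through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    semantic_type: str  # hypothetical type label, not a real QIIME 2 type
    payload: dict


def run_action(accepted_type, action, artifact):
    """Run `action` only if the artifact carries the accepted type."""
    if artifact.semantic_type != accepted_type:
        raise TypeError(
            f"expected {accepted_type!r}, got {artifact.semantic_type!r}"
        )
    return action(artifact)


def plot_error_model(artifact):
    """Stand-in for the error plot visualizer: needs a 'transitions' key."""
    return sorted(artifact.payload["transitions"])


# Two kinds of data that are not interchangeable.
read_stats = Artifact("ReadStats", {"retained": 10})
error_stats = Artifact("ErrorModelStats", {"transitions": ["A2C", "A2A"]})
```

With distinct labels, `run_action("ErrorModelStats", plot_error_model, read_stats)` raises `TypeError` before the action runs. If both artifacts shared one label, the same call would pass the type check and then fail inside the action with a `KeyError`, which is the situation the comment describes.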
Ah I see, sorry for the misunderstanding!
So just for clarity on my end, you would like me to convert all denoise-* actions into pipeline actions that will output the dada2 read stats obj, the dada2 error stats obj within the denoising stats folder and the dada2 error stats visualization as a separate output? Additionally you would like me to change the semantic type of the error stats obj and make it so that the error stats-viz action is only available internally to the pipeline actions? |
colinvwood
left a comment
Hello @jordenrabasco,
Yes, converting to pipelines is one option. The other is registering the error model stats as a different semantic type. Both of these solve the issue of the error plot visualizer throwing an error when given a valid input type.
- changes to kebab case
- changes input name to be more explicit
|
Hi @colinvwood, the minor changes involving kebab case and the naming conventions have been pushed; however, I am struggling to generate a new semantic type that is specific to the -Jorden |
|
Hello @jordenrabasco, sorry for the delay in getting back to you. See here for an example of creating a new semantic type. See here for an example of registering a semantic type and linking it to a format. Let me know if these examples answer your questions, or if there's anything that still isn't clear! |
|
@jordenrabasco, need anything on this? Feel free to DM me on the forum if you want to discuss at all. |
- semantic type added for error model
|
@gregcaporaso @colinvwood I think we should be all set! I just pushed the requested changes in the latest push. |
-change descriptions
|
@colinvwood, the requested changes have been pushed. It looks like the build is failing on the automated tests, and I am unsure how to fix this. Let me know if there is anything else I need to do on my end! |
|
Thanks @jordenrabasco 👍🏻 |
This pull request is to resolve issue #158
Big changes: