Commit 736350e

Add metadata section to vignette, run styler, R CMD check improvements
1 parent 7ce8d40 commit 736350e

17 files changed: 283 additions, 171 deletions

DESCRIPTION

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 Package: TidyMultiqc
 Type: Package
 Title: Converts 'MultiQC' Reports into Tidy Data Frames
-Version: 0.1.1
+Version: 1.0.0
 Author: Michael Milton
 Maintainer: Michael Milton <[email protected]>
 Description: Provides the means to convert 'multiqc_data.json' files,
@@ -14,7 +14,6 @@ Encoding: UTF-8
 Imports:
     assertthat,
     dplyr,
-    HistDat (>= 0.2.0),
     jsonlite,
     magrittr,
     purrr,
@@ -26,7 +25,8 @@ Suggests:
     testthat (>= 3.0.0),
     knitr,
     rmarkdown,
-    ggplot2
+    ggplot2,
+    HistDat
 Config/testthat/edition: 3
 RoxygenNote: 7.1.2
 Roxygen: list(markdown = TRUE)
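
Because this change moves HistDat from Imports to Suggests, any code that relies on it can no longer assume the package is installed. The sketch below shows the usual guard for an optional dependency. The helper name `has_suggested` is hypothetical and is not part of TidyMultiqc; how the vignette actually uses HistDat is not shown in this diff.

```r
# Hypothetical helper (not from the package): TRUE when an optional
# (Suggests) package is installed and its namespace can be loaded.
has_suggested <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Only take the optional code path when the suggested package is available.
if (has_suggested("HistDat")) {
  message("HistDat is installed; optional features that need it can run.")
} else {
  message("HistDat is not installed; skipping the optional features.")
}
```

`requireNamespace()` returns a logical rather than raising an error, so it works as a plain condition.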

NEWS.md

Lines changed: 2 additions & 1 deletion
@@ -2,8 +2,9 @@
 
 ## Breaking Changes
 
-* Removed the `plot_opts` key from the `load_multiqc` function. Instead, the plots are returns as list columns ie nested data frames inside the returned data frame. Users are then able to parse out summary statistics using normal `dplyr` and `tidyr` functions. Refer to the vignettes for examples. [[#1]](https://github.com/multimeric/TidyMultiqc/issues/1)
+* Removed the `plot_opts` key from the `load_multiqc` function. Instead, the plots are returned as list columns with nested data frames inside the returned data frame. Users are then able to parse out summary statistics using normal `dplyr` and `tidyr` functions. Refer to the vignettes for examples. [[#1]](https://github.com/multimeric/TidyMultiqc/issues/1)
 * Renamed "plots" to "plot" in the `sections` argument. This ensures consistency with the data frame column names for plots, which are "plot.XX"
+* `metadata.sample_id` is now always the first column in the data frame, even if you have provided a metadata function
 
 ## New Features
 
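
The first breaking change means plot data now arrives as a list column of nested data frames. A minimal sketch of summarising such a column with `dplyr`, `purrr` and `tidyr` follows. The tibble is built by hand and its column name `plot.example_histogram` and values are made up for illustration; a real report would come from `load_multiqc()`.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hand-built stand-in for a load_multiqc() result (assumed shape):
# one row per sample, with a list column of x/y data frames.
report <- tibble(
  metadata.sample_id = c("sample_a", "sample_b"),
  plot.example_histogram = list(
    tibble(x = c(1, 2, 3), y = c(10, 20, 10)),
    tibble(x = c(1, 2, 3), y = c(0, 5, 15))
  )
)

# Option 1: compute one summary statistic per nested data frame.
summarised <- report %>%
  mutate(total_count = map_dbl(plot.example_histogram, ~ sum(.x$y)))

# Option 2: expand the nested data frames into long form.
long <- report %>%
  unnest(cols = plot.example_histogram)
```

Option 1 keeps one row per sample, while option 2 yields one row per plotted point, which is usually the more convenient shape for `ggplot2`.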
R/internal_utils.R

Lines changed: 1 addition & 1 deletion
@@ -34,4 +34,4 @@ sanitise_column_name <- function(name) {
     stringr::str_to_lower()
 }
 
-ROW_IDENTIFIER = "metadata.sample_id"
+ROW_IDENTIFIER <- "metadata.sample_id"

R/multiqc.R

Lines changed: 18 additions & 13 deletions
@@ -95,36 +95,39 @@ parse_metadata <- function(parsed, samples, find_metadata) {
 #' @param find_metadata A single function that will be called with a sample name and the
 #' parsed JSON for the entire report and returns a named list of metadata fields for the sample.
 #' Refer to the vignette for an example.
-#' @param sections A string vector of zero or more sections to include in the output.
+#' @param sections A string vector of zero or more sections to include in the output.
 #' Each section can be:
 #' \describe{
 #' \item{"plot"}{Parse plot data. Note that you should also provide a list of plots via the `plots` argument}
 #' \item{"general"}{parse the general stat section}
 #' \item{"raw"}{Parse the raw data section}
 #' }
 #' This defaults to 'general', which tends to contain the most useful statistics
-#' @param plots A string vector, each of which contains the ID of a plot you
+#' @param plots A string vector, each of which contains the ID of a plot you
 #' want to include in the output. You can use [TidyMultiqc::list_plots()] to help here.
-#' @param plot_parsers. [Advanced] A named list of custom parser functions.
+#' @param plot_parsers **Advanced**. A named list of custom parser functions.
 #' The names of the list should correspond to plotly plot types, such as "xy_line", and the values should be functions
 #' that return a named list of named lists. For the return value, the outer list is named by the sample ID, and the inner list
 #' is named by the name of the column. Refer to the source code for some examples.
 #' @export
 #' @return A tibble (data.frame subclass) with QC data and metadata as columns, and samples as rows.
-#' Columns are named according to the respective section they belong to,
+#' Columns are named according to the respective section they belong to,
 #' and will always be listed in the following order:
-#' \item{`metadata.X`}{This column contains metadata for this sample.
+#' \item{`metadata.X`}{This column contains metadata for this sample.
 #' By default this is only the sample ID, but if you have provided the
-#'`find_metadata` argument, there may be more columns.}
+#' `find_metadata` argument, there may be more columns.}
 #' \item{`general.X`}{This column contains a generally useful summary statistic for each sample}
-#' \item{`plot.X`}{This column contains a data frame of plot data for each sample}
-#' \item{`raw.X`}{This column contains a raw summary statistic or value relating to each sample}
-#' }
+#' \item{`plot.X`}{This column contains a data frame of plot data for each sample.
+#' Refer to the plot parsers documentation (ie the `parse_X` functions) for more information on the output format. }
+#' \item{`raw.X`}{This column contains a raw summary statistic or value relating to each sample }
+#' @seealso [TidyMultiqc::parse_xyline_plot()] [TidyMultiqc::parse_bar_graph()]
 #' @examples
 #' load_multiqc(system.file("extdata", "wgs/multiqc_data.json", package = "TidyMultiqc"))
 load_multiqc <- function(paths,
                          plots = NULL,
-                         find_metadata = function(...) { list() },
+                         find_metadata = function(...) {
+                           list()
+                         },
                          plot_parsers = list(),
                          sections = "general") {
   assertthat::assert_that(all(sections %in% c(
@@ -141,7 +144,7 @@ load_multiqc <- function(paths,
         purrr::map(~ switch(.,
           general = parse_general(parsed),
           raw = parse_raw(parsed),
-          plot = parse_plots(parsed, plots = plots, plot_parsers=plot_parsers)
+          plot = parse_plots(parsed, plots = plots, plot_parsers = plot_parsers)
         )) %>%
         purrr::reduce(~ purrr::list_merge(.x, !!!.y), .init = list()) %>%
         purrr::imap(~ purrr::list_merge(.x, metadata.sample_id = .y))
@@ -152,14 +155,16 @@ load_multiqc <- function(paths,
       dplyr::bind_rows()
   }) %>%
     # Only arrange the columns if we have at least 1 column
-    `if`(
+    `if`(
       # Move the columns into the order: metadata, general, plot, raw
       ncol(.) > 0,
       (.) %>%
         dplyr::relocate(dplyr::starts_with("raw")) %>%
         dplyr::relocate(dplyr::starts_with("plot")) %>%
         dplyr::relocate(dplyr::starts_with("general")) %>%
-        dplyr::relocate(dplyr::starts_with("metadata")),
+        dplyr::relocate(dplyr::starts_with("metadata")) %>%
+        # Always put the sample ID at the start
+        dplyr::relocate(metadata.sample_id),
       .
     )
 }
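
The chain of `dplyr::relocate()` calls above works because each call moves its selection to the front, so the last call decides what ends up first. A minimal sketch on a hand-built tibble follows. Every column except `metadata.sample_id` is a made-up name, chosen so that the sample ID starts out last.

```r
library(dplyr)

# Hand-built stand-in for the merged report; column names other than
# metadata.sample_id are hypothetical.
report <- tibble(
  metadata.batch = "batch_1",
  general.total_reads = 1000,
  plot.example = list(tibble(x = 1:2, y = 3:4)),
  raw.some_value = 0.5,
  metadata.sample_id = "sample_a"
)

ordered <- report %>%
  # Each relocate() moves its selection to the front, so the sections
  # are listed in reverse of the desired final order.
  relocate(starts_with("raw")) %>%
  relocate(starts_with("plot")) %>%
  relocate(starts_with("general")) %>%
  relocate(starts_with("metadata")) %>%
  # Finally, pull the sample ID ahead of the other metadata columns.
  relocate(metadata.sample_id)

names(ordered)
```

Without the final `relocate()`, `metadata.batch` would stay ahead of `metadata.sample_id`, because `starts_with("metadata")` keeps the matching columns in their existing relative order.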

R/plot_parsers.R

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
#' Takes the JSON dictionary for an xyline plot, and returns a named list of
2+
#' data frames, one for each sample.
3+
#' @keywords internal
4+
#' @import rlang
5+
#' @keywords plot_parser
6+
#' @return A list of data frames, one for each sample.
7+
#' Each data frame will have two columns: x, and y.
8+
#' These correspond to the x and y coordinates in the plot.
9+
#' For example, for histogram data, the x values are values of the random
10+
#' variable, and the y values are the number of counts for that value.
11+
parse_xyline_plot <- function(plot_data, name) {
12+
# This only works on xyline plots
13+
assertthat::assert_that(plot_data$plot_type == "xy_line")
14+
15+
plot_data$datasets %>%
16+
purrr::map(function(dataset) {
17+
# For some reason there are two levels of nesting here
18+
dataset %>%
19+
kv_map(function(subdataset) {
20+
name <- stringr::str_c("plot", name, sep = ".")
21+
list(
22+
key = subdataset$name,
23+
value = subdataset$data %>%
24+
purrr::map_dfr(~ list(x = .[[1]], y = .[[2]])) %>%
25+
# Chop the multi-row data frame into one row
26+
tidyr::nest({{ name }} := tidyr::everything()) # %>%
27+
)
28+
})
29+
}) %>%
30+
purrr::reduce(~ purrr::list_merge(.x, !!!.y))
31+
}
32+
33+
#' Takes the JSON dictionary for a bar graph, and returns a named list of
34+
#' data frames, one for each sample.
35+
#' @keywords internal
36+
#' @import rlang
37+
#' @keywords plot_parser
38+
#' @return A list of data frames, one for each sample.
39+
#' Each data frame will have one column corresponding to each category in the bar chart.
40+
#' For example, for the plot "SnpEff: Counts by Genomic Region", we will have
41+
#' one column for the number of intron variants, one column for the number of exon variants, etc.
42+
#' This means that the number of columns will be fairly variable for different plots.
43+
parse_bar_graph <- function(plot_data, name) {
44+
# This only works on bar_graphs
45+
assertthat::assert_that(plot_data$plot_type == "bar_graph")
46+
47+
# Make a list of samples
48+
samples <- plot_data$samples[[1]] %>% purrr::flatten_chr()
49+
50+
colname <- stringr::str_c("plot", sanitise_column_name(name), sep = ".")
51+
52+
plot_data$datasets[[1]] %>%
53+
# First, build up a dictionary of samples -> dictionary of quality metrics
54+
purrr::map(function(dataset) {
55+
segment_name <- dataset$name
56+
dataset$data %>%
57+
# For this segment, each sample has a value
58+
kv_map(function(value, idx) {
59+
list(
60+
key = samples[[idx]],
61+
value = list(value) %>% purrr::set_names(sanitise_column_name(segment_name))
62+
)
63+
}, map_keys = TRUE)
64+
}) %>%
65+
purrr::reduce(utils::modifyList) %>%
66+
# Then, convert each inner dictionary to a tibble row
67+
purrr::map(tibble::as_tibble_row) %>%
68+
# And nest each df so that we only have 1 cell of output per sample
69+
purrr::map(~ tidyr::nest(., {{ colname }} := tidyr::everything()))
70+
}
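
Both parsers end by nesting a sample's whole data frame into a single cell whose column name is computed at run time. A minimal sketch of that step follows, using made-up points and a made-up plot name. It injects the name with `!!col :=` rather than the `{{ name }} :=` form in the diff; this is a deliberate substitution, since `!!` is the form documented for injecting a string held in an ordinary variable.

```r
library(tidyr)

# Hypothetical (x, y) pairs, shaped like the entries the xyline parser reads.
points <- list(list(1, 10), list(2, 20), list(3, 30))

# Build a two-column data frame with one row per point.
xy <- data.frame(
  x = vapply(points, function(p) p[[1]], numeric(1)),
  y = vapply(points, function(p) p[[2]], numeric(1))
)

# Column name assembled as "plot" + plot name; "example_plot" is made up.
col <- paste("plot", "example_plot", sep = ".")

# Collapse every column and row into one list-column cell named by `col`.
nested <- nest(xy, !!col := everything())
```

The result has one row and one list column, so each sample contributes exactly one cell to the final report, however many points its plot contains.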

R/plots.R

Lines changed: 9 additions & 71 deletions
Original file line numberDiff line numberDiff line change
@@ -1,67 +1,6 @@
1-
# Internal plot parsing functions
1+
# Plot parsing functions
22

3-
#' Takes the JSON dictionary for an xyline plot, and returns a named list of
4-
#' data frames, one for each sample.
5-
#' @keywords internal
6-
#' @import rlang
7-
#' @noRd
8-
parse_xyline_plot <- function(plot_data, name) {
9-
# This only works on xyline plots
10-
assertthat::assert_that(plot_data$plot_type == "xy_line")
11-
12-
plot_data$datasets %>%
13-
purrr::map(function(dataset) {
14-
# For some reason there are two levels of nesting here
15-
dataset %>%
16-
kv_map(function(subdataset) {
17-
name = stringr::str_c("plot", name, sep=".")
18-
list(
19-
key = subdataset$name,
20-
value = subdataset$data %>%
21-
purrr::map_dfr(~list(x=.[[1]], y=.[[2]])) %>%
22-
# Chop the multi-row data frame into one row
23-
tidyr::nest({{name}} := tidyr::everything()) #%>%
24-
)
25-
})
26-
}) %>%
27-
purrr::reduce(~ purrr::list_merge(.x, !!!.y))
28-
}
29-
30-
#' Takes the JSON dictionary for a bar graph, and returns a named list of
31-
#' data frames, one for each sample.
32-
#' @keywords internal
33-
#' @import rlang
34-
#' @noRd
35-
parse_bar_graph <- function(plot_data, name) {
36-
# This only works on bar_graphs
37-
assertthat::assert_that(plot_data$plot_type == "bar_graph")
38-
39-
# Make a list of samples
40-
samples <- plot_data$samples[[1]] %>% purrr::flatten_chr()
41-
42-
colname = stringr::str_c("plot", sanitise_column_name(name), sep = ".")
43-
44-
plot_data$datasets[[1]] %>%
45-
# First, build up a dictionary of samples -> dictionary of quality metrics
46-
purrr::map(function(dataset) {
47-
segment_name <- dataset$name
48-
dataset$data %>%
49-
# For this segment, each sample has a value
50-
kv_map(function(value, idx) {
51-
list(
52-
key = samples[[idx]],
53-
value = list(value) %>% purrr::set_names(sanitise_column_name(segment_name))
54-
)
55-
}, map_keys = TRUE)
56-
}) %>%
57-
purrr::reduce(utils::modifyList) %>%
58-
# Then, convert each inner dictionary to a tibble row
59-
purrr::map(tibble::as_tibble_row) %>%
60-
# And nest each df so that we only have 1 cell of output per sample
61-
purrr::map(~tidyr::nest(., {{colname}} := tidyr::everything()))
62-
}
63-
64-
DEFAULT_PLOT_PARSERS = list(
3+
DEFAULT_PLOT_PARSERS <- list(
654
xy_line = parse_xyline_plot,
665
bar_graph = parse_bar_graph
676
)
@@ -75,19 +14,18 @@ DEFAULT_PLOT_PARSERS = list(
7514
#' @noRd
7615
parse_plots <- function(parsed, plots, plot_parsers) {
7716
# Merge the default parsers with the user provided ones
78-
parsers = purrr::list_modify(DEFAULT_PLOT_PARSERS, !!!plot_parsers)
17+
parsers <- purrr::list_modify(DEFAULT_PLOT_PARSERS, !!!plot_parsers)
7918

8019
# Plot data is more complex
8120
parsed$report_plot_data %>%
8221
purrr::imap(function(plot_data, plot_name) {
8322
# Skip any plot not explicitly in this list, it's impossible to infer
8423
# what type of plot each is
8524
if (plot_name %in% plots || is.null(plots)) {
86-
parser = parsers[[plot_data$plot_type]]
87-
if (!is.null(parser)){
25+
parser <- parsers[[plot_data$plot_type]]
26+
if (!is.null(parser)) {
8827
parser(plot_data = plot_data, name = plot_name)
89-
}
90-
else {
28+
} else {
9129
warning(paste("No known (or provided) parser for a plot of type \"", plot_data$plot_type, "\""))
9230
}
9331
}
@@ -97,7 +35,7 @@ parse_plots <- function(parsed, plots, plot_parsers) {
9735
}
9836

9937
#' List the plot identifiers of all the plots in a given multiqc report
100-
#'
38+
#'
10139
#' @details The main use for this function is finding the plot identifiers
10240
#' that you will then pass into the `plots` argument of the [TidyMultiqc::load_multiqc()]
10341
#' function.
@@ -116,10 +54,10 @@ parse_plots <- function(parsed, plots, plot_parsers) {
11654
#' filepath <- system.file("extdata", "HG00096/multiqc_data.json", package = "TidyMultiqc")
11755
#' # This is the actual invocation
11856
#' list_plots(filepath)
119-
list_plots <- function(path){
57+
list_plots <- function(path) {
12058
jsonlite::read_json(path) %>%
12159
`$`("report_plot_data") %>%
122-
purrr::imap_dfr(function(plot, id){
60+
purrr::imap_dfr(function(plot, id) {
12361
list(
12462
id = id,
12563
title = plot$config$title
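
`parse_plots` merges the default parsers with any user-supplied ones and then dispatches on `plot_type`. A minimal sketch of that pattern follows. The parser bodies, the `heatmap` and `scatter` plot types, and the `dispatch` helper are all hypothetical stand-ins, not the package's code. The sketch also builds the warning with `paste0()`: `paste()` inserts a space between its arguments, so the message in the diff would print the plot type with a space on either side inside the quotes.

```r
library(purrr)

# Hypothetical stand-ins for the package's default parsers.
default_parsers <- list(
  xy_line = function(plot_data, name) "default xy_line parser",
  bar_graph = function(plot_data, name) "default bar_graph parser"
)

# A user overrides one parser and registers one for another plot type.
user_parsers <- list(
  bar_graph = function(plot_data, name) "custom bar_graph parser",
  heatmap = function(plot_data, name) "custom heatmap parser"
)

# User entries replace defaults with the same name; new names are appended.
parsers <- list_modify(default_parsers, !!!user_parsers)

# Hypothetical dispatcher: look up the parser by plot type, warn if missing.
dispatch <- function(plot_data, name) {
  parser <- parsers[[plot_data$plot_type]]
  if (is.null(parser)) {
    warning(paste0(
      "No known (or provided) parser for a plot of type \"",
      plot_data$plot_type, "\""
    ))
    return(NULL)
  }
  parser(plot_data = plot_data, name = name)
}

dispatch(list(plot_type = "bar_graph"), "example_plot")
```

Looking up a missing name in a list with `[[` returns `NULL`, which is what lets the `is.null()` check act as the fallback branch.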

_pkgdown.yml

Lines changed: 5 additions & 1 deletion
@@ -4,4 +4,8 @@ reference:
   desc: The public API to this package
 - contents:
   - load_multiqc
-  - list_plots
+  - list_plots
+- title: Plot Parsers
+  desc: These are internal functions that you will never need to call yourself, and aren't exported. However, these are used to document the format of the nested data frames for different types of plots.
+- contents:
+  - has_keyword("plot_parser")

man/load_multiqc.Rd

Lines changed: 11 additions & 8 deletions
Generated file; diff not rendered.

man/parse_bar_graph.Rd

Lines changed: 22 additions & 0 deletions
Generated file; diff not rendered.
