Skip to content
This repository was archived by the owner on Oct 31, 2025. It is now read-only.

Commit 74b7aaa

Browse files
attlihantonvsdata
andauthored
Version 0.1.0 (#24)
* bug fix: impute_simple succeeds when only one feature satifies NA limit * Updated logging (#6) * Add factor check to pairwise t-test (#11) * Update logging (#8) * Update logging Updated stats.R logging. * Updated logging Removed unnecessary comments * Updated logging Removed more unnecessary comments * Update logging Error messages are now logged in the log file * Update logging Fixed mentioned issues * Update to mz/rt plot and data reading (#7) * Updated mz/rt plot Added feature to filter plot by p-values and to drop low quality features from the plot for MetaboSet object. Added feature to set title and subtitle to the plot. * Updated data reading Function now checks for unique feature IDs and that mz/rt values make sense. Added feature to specify ID column from the excel file. * Fixed commented things Asked changes made, all seem reasonable. * Minor fix to scatter.R Fixed plotter function to always having data frame as input. * Check QC column for missing values * Create dummy injection order When merging, dummy injection order is created if injection orders differ between modes * Warn if QC contains NAs Checks that QC column doesn't contain NAs * Skip mz/rt limit check when validating MetaboSet * Added tests for changing feature/sample names * Approved suggestion * Update volcano plot (#10) * Update volcano plot Allowed labeling plots with ability to filter labels by p-value * Approve suggestion Code follows notame syntax * Update volcano plot labeling (#16) * Update volcano plot Allowed labeling plots with ability to filter labels by p-value * Approve suggestion Co-authored-by: Anton Klåvus <anton.klavus@iki.fi> * Update volcano plot Label visualization is now clean Co-authored-by: Anton Klåvus <anton.klavus@iki.fi> * Fix correlation test argument (#15) ... arguments will now be passed to cor.test() * Update .Rd files (#13) I noticed these won't update automatically * Enabled pairwise paired t-tests * Add stats cleaning function (#12) * Stats cleaning function * Enabled column renaming * Fix dummy injection error (#19) dummy_injection was not found during merging if x was already merged * Fixed bug in pairwise t-tests * Add PERMANOVA (#9) * Add PERMANOVA Basic wrapper for package called PERMANOVA. * Minor fixes Fixed one argument name and documentation * Fix test_logging (#18) * Fix test_logging expect_null didn't work * Add tests for logging * Update Cohen's D (#14) * Update Cohen's D Cohen's D can now be counted to all groups and time points at once * Minor fixes * Fixed tests to match new commits Fixed hardcoded values in tests, known warnings in testing and group order in log messages when calculating Cohen's D * Changed column names in tests Changed column names in tests to match new style * Changed column names Changed column names to match new uniform style * Changed column names in functions Changed column names in t-tests, Cohen's D and fold change to match new uniform style. Linear models do not follow that style. * Cleaned code * Updated dependencies * Fixed bugs (#20) * Fixed tests to match new commits Fixed hardcoded values in tests, known warnings in testing and group order in log messages when calculating Cohen's D * Changed column names in tests Changed column names in tests to match new style * Changed column names in functions Changed column names in t-tests, Cohen's D and fold change to match new uniform style. Linear models do not follow that style. * Cleaned code * Updated dependencies * Updated manuals and examples * Removed BatchCorrMetabolomics dependency As it's not updated regularly * Enabled pairwise paired t-tests (#17) * Enabled pairwise paired t-tests * Changed column names Changed column names to match new uniform style * Updated warning handling when testing * Updated Cohen's d error handling * Updated volcano plot example * Updated Cohen's D error handling and other minor changes (#23) * Updated warning handling when testing * Updated Cohen's d error handling * Updated volcano plot example * Version 0.1.0 Co-authored-by: Anton Klåvus <anton.klavus@iki.fi>
1 parent 9f634ef commit 74b7aaa

29 files changed

+1151
-308
lines changed

DESCRIPTION

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,12 @@
11
Package: notame
22
Type: Package
33
Title: Workflow for non-targeted LC-MS metabolic profiling
4-
Version: 0.0.11
4+
Version: 0.1.0
55
Authors@R: c(
66
person("Anton", "Klåvus", email = "anton.klavus@iki.fi", role = c("aut", "cre")),
77
person("Jussi", "Paananen", role = "aut"),
8-
person("Oskari", "Timonen", role = "aut"))
8+
person("Oskari", "Timonen", role = "aut"),
9+
person("Atte", "Lihtamo", role = "aut"))
910
Description: Automates common preprocessing steps in a LC-MS metabolomics workflow
1011
such as drift correction, quality control and common visualizations.
1112
License: MIT + file LICENSE
@@ -16,14 +17,15 @@ Depends:
1617
R (>= 3.5),
1718
Biobase,
1819
BiocGenerics,
20+
futile.logger,
1921
ggplot2,
2022
magrittr
2123
Imports:
2224
dplyr,
2325
foreach,
24-
futile.logger,
2526
grDevices,
2627
methods,
28+
openxlsx,
2729
tibble,
2830
tidyr
2931
Suggests:
@@ -42,7 +44,6 @@ Suggests:
4244
lmerTest,
4345
missForest,
4446
mixOmics,
45-
openxlsx,
4647
pcaMethods,
4748
PK,
4849
randomForest,
@@ -52,5 +53,5 @@ Suggests:
5253
RUVSeq,
5354
supraHex,
5455
testthat
55-
RoxygenNote: 7.1.2
56+
RoxygenNote: 7.2.1
5657
VignetteBuilder: knitr

NAMESPACE

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ export(align_batches)
88
export(assess_quality)
99
export(assign_cluster_id)
1010
export(citations)
11+
export(clean_stats_results)
1112
export(cluster_features)
1213
export(cohens_d)
1314
export(combined_data)
@@ -74,6 +75,7 @@ export(perform_logistic)
7475
export(perform_oneway_anova)
7576
export(perform_paired_t_test)
7677
export(perform_pairwise_t_test)
78+
export(perform_permanova)
7779
export(perform_repeatability)
7880
export(perform_t_test)
7981
export(plot_dendrogram)

R/batch_correction.R

Lines changed: 2 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -1,50 +1,16 @@
11
#' Batch correction
22
#'
3-
#' "Basic" batch correction by median? from BatchCorrMetabolomics::doBC
3+
#' DEPRECATED
44
#'
55
#' @param object a MetaboSet object
66
#' @param batch the column name for batch labels
77
#' @param ref the column name for reference sample labels
88
#' @param ref_label the label for reference samples
99
#' @param ... other parameters pased to doBC
1010
#'
11-
#' @return a MetaboSet object with the corrected abundances
12-
#'
13-
#' @examples
14-
#' \dontrun{
15-
#' batch_corrected <- dobc(merged_sample, batch = "Batch", ref = "QC", ref_label = "QC")
16-
#' # Evaluate batch correction
17-
#' pca_bhattacharyya_dist(merged_sample, batch = "Batch")
18-
#' pca_bhattacharyya_dist(batch_corrected, batch = "Batch")
19-
#' }
2011
#' @export
2112
dobc <- function(object, batch, ref, ref_label, ...) {
22-
23-
if (!requireNamespace("BatchCorrMetabolomics", quietly = TRUE)) {
24-
stop("Package \"BatchCorrMetabolomics\" needed for this function to work. Please install it from
25-
https://github.com/rwehrens/BatchCorrMetabolomics.",
26-
call. = FALSE)
27-
}
28-
add_citation("BatchCorrMetabolomics was used for batch correction:", citation("BatchCorrMetabolomics"))
29-
30-
ref_idx <- which(pData(object)[, ref] == ref_label)
31-
seq_idx <- object$Injection_order
32-
batch_idx <- pData(object)[, batch]
33-
34-
batch_corrected <- foreach::foreach(feature = featureNames(object), .combine = rbind) %dopar% {
35-
tmp <- BatchCorrMetabolomics::doBC(Xvec = exprs(object)[feature, ],
36-
ref.idx = ref_idx,
37-
batch.idx = batch_idx,
38-
seq.idx = seq_idx,
39-
minBsamp = 1,
40-
method = "lm",
41-
correctionFormula = "X ~ B")
42-
matrix(tmp, nrow = 1, dimnames = list(feature, names(tmp)))
43-
}
44-
45-
exprs(object) <- batch_corrected
46-
47-
object
13+
stop("This function is deprecated.")
4814
}
4915

5016
#' Remove Unwanted Variation

R/class_constructor.R

Lines changed: 74 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -5,14 +5,14 @@ log_text_if <- function(text, logif) {
55
}
66

77
# Helper function for checking integrity of pheno data
8-
check_pheno_data <- function(x, id_prefix, log_messages = FALSE) {
8+
check_pheno_data <- function(x, id_prefix, id_column = NULL, log_messages = FALSE) {
99
log_text_if("\nChecking sample information", log_messages)
1010

1111
# Check that Injection order is included
1212
if (!"Injection_order" %in% colnames(x)) {
1313
stop('"Injection_order" not found for the samples')
1414
}
15-
# No NAs allowd in Injection order
15+
# No NAs allowed in Injection order
1616
if (any(is.na(x$Injection_order))) {
1717
stop("Missing values in Injection_order")
1818
}
@@ -31,7 +31,18 @@ check_pheno_data <- function(x, id_prefix, log_messages = FALSE) {
3131
warning("QC column not found and can not be generated. Please create one before constructing a MetaboSet object.")
3232
}
3333
}
34-
34+
# If id_column is provided, try to change name of the column to "Sample_ID"
35+
if (!is.null(id_column)) {
36+
log_text_if("Checking provided sample ID column", log_messages)
37+
if (!id_column %in% colnames(x)) {
38+
log_text_if(paste0("ID column '", id_column, "' not found"), log_messages)
39+
} else if (!any(duplicated(x[, id_column])) && !any(is.na(x[, id_column]))) {
40+
x$Sample_ID <- x[, id_column]
41+
log_text_if(paste0("Column 'Sample_ID' created from ", id_column), log_messages)
42+
} else {
43+
log_text_if("Provided sample ID column is not valid", log_messages)
44+
}
45+
}
3546
# If Sample_ID is not provided explicitly, it will be created
3647
if (!"Sample_ID" %in% colnames(x)) {
3748
x$Sample_ID <- paste0(id_prefix, x$Injection_order)
@@ -101,7 +112,7 @@ looks_numeric <- function(x) {
101112

102113
# Check that all abundances look OK
103114
check_exprs <- function(exprs_, log_messages = FALSE) {
104-
log_text_if("Checking that feature abundances only contain numeric values", log_messages)
115+
log_text_if("Checking that feature abundances only contain numeric values", log_messages)
105116
# Check that all rows are full of numbers
106117
non_numerics <- exprs_ %>%
107118
apply(1, function(x){!looks_numeric(x)})
@@ -114,16 +125,34 @@ check_exprs <- function(exprs_, log_messages = FALSE) {
114125
exprs_
115126
}
116127

117-
check_feature_data <- function(feature_data, log_messages = FALSE) {
128+
check_feature_data <- function(feature_data, check_limits = TRUE, mz_limits = c(10, 2000), rt_limits = c(0, 20), log_messages = FALSE) {
129+
log_text_if("\nChecking feature information", log_messages)
118130
log_text_if("Checking that feature IDs are unique and not stored as numbers", log_messages)
119131
fid <- feature_data$Feature_ID
132+
if (any(duplicated(fid))) {
133+
stop("Feature_ID values are not unique")
134+
}
120135
if (any(is.na(fid))) {
121136
stop("Missing values in Feature IDs")
122137
}
123138
fid_num <- suppressWarnings(as.numeric(fid))
124139
if (any(!is.na(fid_num))) {
125140
stop("Numbers are not allowed as feature IDs")
126141
}
142+
fid_chr <- suppressWarnings(as.character(fid))
143+
if (any(grepl("^[[:digit:]]", fid_chr))) {
144+
stop("Feature IDs can not start with numbers")
145+
}
146+
if (check_limits) {
147+
log_text_if("Checking that m/z and retention time values are reasonable", log_messages)
148+
mz <- feature_data[, find_mz_rt_cols(feature_data)$mz_col]
149+
rt <- feature_data[, find_mz_rt_cols(feature_data)$rt_col]
150+
if (!(all(mz > mz_limits[1]) && all(mz < mz_limits[2])) ||
151+
!(all(rt > rt_limits[1]) && all(rt < rt_limits[2]))) {
152+
stop("Values in m/z or retention time columns are outside limits.")
153+
}
154+
}
155+
127156
feature_data
128157
}
129158

@@ -140,11 +169,14 @@ check_feature_data <- function(feature_data, log_messages = FALSE) {
140169
#'
141170
#' @param file path to the Excel file
142171
#' @param sheet the sheet number or name
172+
#' @param id_column character, column name for unique identification of samples
143173
#' @param corner_row integer, the bottom row of sample information, usually contains data file names and feature info column names. If set to NULL, will be detected automatically.
144174
#' @param corner_column integer or character, the corresponding column number or the column name (letter) in Excel. If set to NULL, will be detected automatically.
145175
#' @param id_prefix character, prefix for autogenerated sample IDs, see Details
146176
#' @param split_by character vector, in the case where all the modes are in the same Excel file, the column names of feature data used to separate the modes (usually Mode and Column)
147177
#' @param name in the case where the Excel file only contains one mode, the name of the mode, such as "Hilic_neg"
178+
#' @param mz_limits numeric vector of two, all m/z values should be in between these
179+
#' @param rt_limits numeric vector of two, all retention time values should be in between these
148180
#' @param skip_checks logical: skip checking data integrity. Not recommended, but sometimes useful when you
149181
#' just want to read the data in as is and fix errors later. NOTE: Sample_ID and QC columns will not be constructed.
150182
#' The data integrity checks need to be passed when contstructing MetaboSet objects.
@@ -162,16 +194,16 @@ check_feature_data <- function(feature_data, log_messages = FALSE) {
162194
#' The function will try to find columns for mass and retention time by looking at a few common alternatives,
163195
#' and throw an error if no matching column is found. Sample information needs to contain a row called "Injection_order",
164196
#' and the values need to be unique. In addition, a possible sample identifier row needs to be named "Sample_ID",
165-
#' and the values need to be unique, with an exception of QC samples: if there are any "QC" identifiers, they will
166-
#' be replaced with "QC_1", "QC_2" and so on. If a "Sample_ID" row is not found, it will be created using the \code{id_prefix}
167-
#' and injection order.
197+
#' or to be specified in \code{id_column}, and the values need to be unique, with an exception of QC samples:
198+
#' if there are any "QC" identifiers, they will be replaced with "QC_1", "QC_2" and so on.
199+
#' If a "Sample_ID" row is not found, it will be created using the \code{id_prefix} and injection order.
168200
#'
169201
#'
170202
#' @importFrom magrittr "%>%"
171203
#'
172204
#' @export
173-
read_from_excel <- function(file, sheet = 1, corner_row = NULL, corner_column = NULL,
174-
id_prefix = "ID_", split_by = NULL, name = NULL,
205+
read_from_excel <- function(file, sheet = 1, id_column = NULL, corner_row = NULL, corner_column = NULL,
206+
id_prefix = "ID_", split_by = NULL, name = NULL, mz_limits = c(10, 2000), rt_limits = c(0, 20),
175207
skip_checks = FALSE) {
176208

177209
if (!requireNamespace("openxlsx", quietly = TRUE)) {
@@ -232,10 +264,17 @@ read_from_excel <- function(file, sheet = 1, corner_row = NULL, corner_column =
232264

233265
# If the file only contains one mode, add the mode name as Split column
234266
if (!is.null(name)) {
235-
log_text(paste0("Assigning ", name, " as the value of the Split column for each feature"))
267+
log_text(paste0("Assigning ", name,
268+
" as the value of the Split column for each feature"))
236269
feature_data$Split <- name
237270
split_by <- "Split"
238271
} else { # Multiple modes in the file, create Split column to separate modes
272+
if (!all(split_by %in% colnames(feature_data))) {
273+
stop(paste0("Couldn't find column(s): ",
274+
paste(split_by[!(split_by %in% colnames(feature_data))],
275+
collapse = ", ")
276+
))
277+
}
239278
log_text(paste0("Creating Split column from ",
240279
paste0(split_by, collapse = ", ")))
241280
feature_data <- feature_data %>%
@@ -253,17 +292,25 @@ read_from_excel <- function(file, sheet = 1, corner_row = NULL, corner_column =
253292
best_classes() %>%
254293
dplyr::mutate_if(is.factor, as.character)
255294
rownames(feature_data) <- feature_data$Feature_ID
295+
log_text("Replacing dots (.) in feature information column names with underscores (_)")
296+
colnames(feature_data) <- gsub("[.]", "_", colnames(feature_data))
256297

257298
# Extract LC-MS measurements as matrix
258-
log_text(paste0("\nExtracting feature abundances from rows ", cr+1, " to ", nrow(dada),
299+
log_text(paste0("\nExtracting feature abundances from rows ", cr+1, " to ", nrow(dada),
259300
" and columns ", excel_columns[cc + 1], " to ", excel_columns[ncol(dada)]))
260301
exprs_ <- dada[(cr+1):nrow(dada), (cc+1):ncol(dada)]
261302

262303
# Skip checks
263304
if (!skip_checks) {
264-
pheno_data <- check_pheno_data(x = pheno_data, id_prefix = id_prefix, log_messages = TRUE)
305+
pheno_data <- check_pheno_data(x = pheno_data, id_prefix = id_prefix,
306+
id_column = id_column, log_messages = TRUE
307+
)
265308
exprs_ <- check_exprs(exprs_, log_messages = TRUE)
266-
feature_data <- check_feature_data(feature_data, log_messages = TRUE)
309+
feature_data <- check_feature_data(feature_data,
310+
mz_limits = mz_limits,
311+
rt_limits = rt_limits,
312+
log_messages = TRUE
313+
)
267314
}
268315

269316
rownames(exprs_) <- rownames(feature_data)
@@ -275,9 +322,9 @@ read_from_excel <- function(file, sheet = 1, corner_row = NULL, corner_column =
275322
# Helper function to search for mass and retention time column names
276323
find_mz_rt_cols <- function(feature_data) {
277324
# Find mass and retention time columns
278-
mz_tags <- c("mass", "average mz", "average.mz", "molecularweight", "molecular weight")
325+
mz_tags <- c("mass", "average mz", "average.mz", "molecularweight", "molecular weight", "average_mz")
279326
rt_tags <- c("retention time", "retentiontime", "average rt[(]min[)]",
280-
"average[.]rt[.]min[.]", "^rt$")
327+
"average[_]rt[_]min[_]", "average[.]rt[.]min[.]", "^rt$")
281328

282329
mz_col <- NULL
283330
for (tag in mz_tags) {
@@ -360,20 +407,22 @@ MetaboSet <- setClass("MetaboSet",
360407

361408
setValidity("MetaboSet",
362409
function(object) {
363-
if (!is.na(object@group_col) & !object@group_col %in% colnames(object@phenoData@data)) {
364-
paste0("Column '", object@group_col, "' not found in pheno data")
365-
} else if (!is.na(object@time_col) & !object@time_col %in% colnames(object@phenoData@data)) {
366-
paste("Column", object@time_col, "not found in pheno data")
367-
} else if (!is.na(object@subject_col) & !object@subject_col %in% colnames(object@phenoData@data)) {
368-
paste("Column", object@subject_col, "not found in pheno data")
410+
if (!is.na(group_col(object)) & !group_col(object) %in% colnames(pData(object))) {
411+
return(paste0("Column '", group_col(object), "' not found in pheno data"))
412+
} else if (!is.na(time_col(object)) & !time_col(object) %in% colnames(pData(object))) {
413+
return(paste("Column", time_col(object), "not found in pheno data"))
414+
} else if (!is.na(subject_col(object)) & !subject_col(object) %in% colnames(pData(object))) {
415+
return(paste("Column", subject_col(object), "not found in pheno data"))
369416
} else if (!all(c("Injection_order", "Sample_ID", "QC") %in% colnames(pData(object)))) {
370-
"Pheno data should contain columns Sample_ID, QC and Injection_order"
417+
return("Pheno data should contain columns Sample_ID, QC and Injection_order")
418+
} else if (any(is.na(pData(object)[, "QC"]))) {
419+
return("QC column should not contain NAs")
371420
} else if (!"Flag" %in% colnames(fData(object))) {
372-
"Flag column not found in fData"
421+
return("Flag column not found in fData")
373422
} else {
374423
x <- check_pheno_data(pData(object), id_prefix = "")
375424
x <- check_exprs(exprs(object))
376-
x <- check_feature_data(fData(object))
425+
x <- check_feature_data(fData(object), check_limits = FALSE)
377426
TRUE
378427
}
379428
})

R/logging.R

Lines changed: 11 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -4,23 +4,24 @@
44
#' Initialize a log file with the current data and time.
55
#' All major operations run after this will be logged to the specified file.
66
#'
7-
#' @section Warning:
8-
#' This overwrites the current contents of the file
9-
#'
107
#' @param log_file Path to the log file
118
#'
129
#' @examples
1310
#' file_name <- "~/log.txt"
1411
#' init_log(file_name)
1512
#' # Print the contents of the file
16-
#' scan(file_name, sep="\n", what = "chracter")
13+
#' scan(file_name, sep = "\n", what = "character")
1714
#'
1815
#' @seealso \code{\link{log_text}}, \code{\link{finish_log}}, \code{\link{log_state}}
1916
#'
2017
#' @export
2118
init_log <- function(log_file) {
2219
futile.logger::flog.appender(futile.logger::appender.tee(log_file), name = "notame")
23-
log_text(paste0("Starting logging"))
20+
log_text("Starting logging")
21+
# Pass errors to log
22+
options(error = function() {
23+
futile.logger::flog.error(geterrmessage(), name = "notame")
24+
})
2425
}
2526

2627
#' Log text to the current log file
@@ -35,13 +36,13 @@ init_log <- function(log_file) {
3536
#' init_log(file_name)
3637
#' log_text("Hello World!")
3738
#' # Print the contents of the file
38-
#' scan(file_name, sep="\n", what = "chracter")
39+
#' scan(file_name, sep = "\n", what = "character")
3940
#'
4041
#' @seealso \code{\link{init_log}}, \code{\link{finish_log}}, \code{\link{log_state}}
4142
#'
4243
#' @export
4344
log_text <- function(text) {
44-
futile.logger::flog.info(text)
45+
futile.logger::flog.info(text, name = "notame")
4546
}
4647

4748
#' Finish a log
@@ -52,8 +53,10 @@ log_text <- function(text) {
5253
#'
5354
#' @export
5455
finish_log <- function() {
56+
# Return default option for error
57+
options(error = NULL)
5558
# Log end of session info
5659
futile.logger::flog.info(paste("Finished analysis. ", date(), "\nSession info:\n", sep=""))
5760
futile.logger::flog.info(capture.output(sessionInfo()))
58-
futile.logger::flog.appender(futile.logger::appender.console(), name = "notame")
61+
invisible(futile.logger::flog.appender(futile.logger::appender.console(), name = "notame"))
5962
}

0 commit comments

Comments
 (0)