Commit 2005487

Wrote regarding better speedups for all of those functions in terms of having more rows vs having more columns after some testing (benchmarking) and research
1 parent ba96c07 commit 2005487

File tree

1 file changed: +25 -3 lines changed


man/openmp-utils.Rd

Lines changed: 25 additions & 3 deletions
@@ -45,6 +45,8 @@
 \item The checking and handling of undefined values (such as NaNs)
 }
 
+Since this function is used to find rows where a column's value falls within a specific range, it benefits more from parallelization when the input data consists of a large number of rows.
+
 \item\file{cj.c} - \code{\link{CJ}()}
 
 OpenMP is used here to parallelize:
@@ -55,6 +57,8 @@
 \item The creation of all combinations of the input vectors over the cross-product space
 }
 
+Given that the number of combinations increases exponentially as more columns are added, better speedup can be expected when dealing with a large number of columns.
+
 \item\file{coalesce.c} - \code{\link{fcoalesce}()}
 
 OpenMP is used here to parallelize:
@@ -64,10 +68,14 @@
 \item The conditional checks within parallelized loops
 }
 
+Significant speedup can be expected with a larger number of columns here, given that this function operates efficiently across multiple columns to find non-NA values.
+
 \item\file{fifelse.c} - \code{\link{fifelse}()}
 
 For logical, integer, and real types, OpenMP is being used here to parallelize loops that perform conditional checks along with assignment operations over the elements of the supplied logical vector based on the condition (\code{test}) and values provided for the remaining arguments (\code{yes}, \code{no}, and \code{na}).
 
+Better speedup can be expected with a larger number of columns here as well, given that this function operates column-wise with independent vector operations.
+
 \item\file{fread.c} - \code{\link{fread}()}
 
 OpenMP is used here to:
@@ -78,6 +86,8 @@
 }
 There are no explicit pragmas for parallelizing loops; instead, OpenMP is used here mainly to control access to shared resources (with the use of critical sections, for instance) in a multi-threaded environment.
 
+This function is highly optimized for reading and processing data with large numbers of both rows and columns, but the efficiency gains are more pronounced across rows.
+
 \item\file{forder.c}, \file{fsort.c}, and \file{reorder.c} - \code{\link{forder}()} and related
 
 OpenMP is used here to parallelize multiple operations that come together to sort a \code{data.table} using the Radix algorithm. These include:
@@ -88,26 +98,38 @@
 \item Creation of histograms which are used to sort data based on significant bits (each thread processes a separate batch of the data, computes the MSB of each element, and then increments the corresponding bins), with the distribution and merging of buckets (specific to \file{fsort.c})
 \item The process of reordering a vector or each column in a list of vectors (such as in a \code{data.table}) based on a given vector that dictates the new ordering of elements (specific to \file{reorder.c})
 }
+
+Better speedups can be expected when the input data contains a large number of rows, as the sorting complexity increases with more rows.

 \item\file{froll.c}, \file{frolladaptive.c}, and \file{frollR.c} - \code{\link{froll}()} and family
 
 OpenMP is used here to parallelize the loops that compute the rolling means (\code{frollmean}) and sums (\code{frollsum}) over a sliding window for each position in the input vector.
+
+These functions benefit more in terms of speedup when the data has a large number of columns, primarily due to the efficient (cache-friendly) memory access patterns used when processing each column's data sequentially in memory to compute the rolling statistic.

 \item\file{fwrite.c} - \code{\link{fwrite}()}
 
 OpenMP is used here primarily to parallelize the process of writing rows to the output file, but error handling and compression (if enabled) are also managed within the parallel region. Special attention is paid to thread safety and synchronization, especially in the ordered sections where output to the file and handling of errors is serialized to maintain the correct sequence of rows.
 
+Similar to \code{\link{fread}()}, this function is highly efficient at processing data in parallel with large numbers of both rows and columns, but it shows more notable speedups with an increased number of rows.
+
 \item\file{gsumm.c} - GForce in various places, see \link{GForce}
 
-Functions with GForce optimization are internally parallelized to speed up grouped summaries over a large \code{data.table}. OpenMP is used here to parallelize operations involved in calculating group-wise statistics like sum, mean, and median (implying faster computation of \code{sd}, \code{var}, and \code{prod} as well). The input data is split into batches (groups), and each thread processes a subset of the data based on them.
+Functions with GForce optimization are internally parallelized to speed up grouped summaries over a large \code{data.table}. OpenMP is used here to parallelize operations involved in calculating group-wise statistics like sum, mean, and median (implying faster computation of \code{sd}, \code{var}, and \code{prod} as well).
+
+These optimized grouping operations benefit more in terms of speedup if the input data contains a large number of rows (since they are often used to aggregate data across groups).

 \item\file{nafill.c} - \code{\link{nafill}()}
 
-OpenMP is being used here to parallelize the loop that fills missing values over columns of the input data. This includes handling different data types (double, integer, and integer64) and applying the designated filling method (constant, last observation carried forward, or next observation carried backward) to each column in parallel.
+OpenMP is being used here to parallelize the loop that fills missing values over columns of the input data. This includes handling different data types (double, integer, and integer64) and applying the designated filling method (constant, last observation carried forward, or next observation carried backward) to each column in parallel.
+
+Given its optimization for column-wise operations, better speedups can be expected when the input data consists of a large number of columns.

 \item\file{subset.c} - Used in \code{\link[=data.table]{[.data.table}} subsetting
 
-OpenMP is used here to parallelize the loops that perform the subsetting of vectors, with conditional checks and filtering of data.
+OpenMP is used here to parallelize the loops that perform the subsetting of vectors, with conditional checks and filtering of data.
+
+Since subset operations are usually row-dependent, better speedups can be expected when dealing with a large number of rows. However, this also depends on whether the computations are focused on rows or columns (as dictated by the subsetting criteria).
 
 \item\file{types.c} - Internal testing usage
