Commit 79d7b01

Wrote about function-specific use of OpenMP in parallelization for fread(), added a minor bit to the gsumm.c part
1 parent f3c5355 commit 79d7b01

1 file changed: +10 -1 lines changed

man/openmp-utils.Rd

Lines changed: 10 additions & 1 deletion
@@ -68,6 +68,15 @@
 For logical, integer, and real types, OpenMP is being used here to parallelize loops that perform conditional checks along with assignment operations over the elements of the supplied logical vector based on the condition (\code{test}) and values provided for the remaining arguments (\code{yes}, \code{no}, and \code{na}).
 
 \item\file{fread.c} - \code{\link{fread}()}
+
+Parallelism is used here to read and process data in chunks (blocks of lines/rows). Expect a significant speedup for large files, since chunks are processed by multiple threads concurrently. OpenMP is used here to:
+
+\itemize{
+\item Avoid race conditions or concurrent writes to the output \code{data.table} by using atomic operations on the string data
+\item Manage synchronized updates to the progress bar and serialize output to the console
+}
+There are no explicit pragmas for parallelizing loops; instead, OpenMP is used here to control access to shared resources (for instance, with critical sections) in a multi-threaded environment.
+
 \item\file{forder.c}, \file{fsort.c}, and \file{reorder.c} - \code{\link{forder}()} and related
 \item\file{froll.c}, \file{frolladaptive.c}, and \file{frollR.c} - \code{\link{froll}()} and family
 \item\file{fwrite.c} - \code{\link{fwrite}()}
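
The added \file{fread.c} text describes OpenMP being used to guard shared state with atomic operations and critical sections. As a rough illustration of those two constructs only, here is a minimal sketch. It is not code from \file{fread.c}: the names \code{progress_t}, \code{count_rows}, and \code{process_in_chunks} are hypothetical, and the \code{parallel for} loop is a simplification chosen to keep the example short, not a claim about how \code{fread} launches its threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared progress state; not a structure from fread.c. */
typedef struct {
    long rows_done;        /* total rows counted across all chunks */
    int  chunks_reported;  /* chunks that have reported progress   */
    int  last_percent;     /* last progress value "displayed"      */
} progress_t;

/* Count newline-terminated rows in buf[start, end). */
static long count_rows(const char *buf, size_t start, size_t end) {
    long n = 0;
    for (size_t i = start; i < end; i++)
        if (buf[i] == '\n') n++;
    return n;
}

/* Split buf into n_chunks byte ranges, count rows per chunk, and combine
   the results through an atomic update and a critical section. */
long process_in_chunks(const char *buf, size_t len, int n_chunks, progress_t *p) {
    assert(n_chunks > 0);
    long total = 0;

    #pragma omp parallel for schedule(dynamic)
    for (int c = 0; c < n_chunks; c++) {
        size_t start = len * (size_t)c / (size_t)n_chunks;
        size_t end   = len * (size_t)(c + 1) / (size_t)n_chunks;
        long rows = count_rows(buf, start, end);  /* thread-private work */

        /* A single scalar update: an atomic is enough. */
        #pragma omp atomic
        total += rows;

        /* Several fields must change together, so serialize the block. */
        #pragma omp critical
        {
            p->chunks_reported++;
            p->last_percent = 100 * p->chunks_reported / n_chunks;
        }
    }

    p->rows_done = total;
    return total;
}
```

The atomic protects one scalar update, while the critical section keeps a multi-field update consistent. Without OpenMP enabled at compile time the pragmas are ignored and the function runs serially with the same result.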
@@ -76,7 +85,7 @@
 
 \item\file{gsumm.c} - GForce in various places, see \link{GForce}
 
-Functions with GForce optimization are internally parallelized to speed up grouped summaries over a large \code{data.table}. OpenMP is used here to parallelize operations involved in calculating group-wise statistics like sum, mean, and median. The input data is split into batches (groups), and each thread processes a subset of the data based on them.
+Functions with GForce optimization are internally parallelized to speed up grouped summaries over a large \code{data.table}. OpenMP is used here to parallelize operations involved in calculating group-wise statistics like sum, mean, and median (implying faster computation of \code{sd}, \code{var}, and \code{prod} as well). The input data is split into batches (groups), and each thread processes a subset of the data based on them.
 
 \item\file{nafill.c} - \code{\link{nafill}()}
 
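
The \file{gsumm.c} text describes splitting the input into groups so that each thread processes a subset of the data. A minimal sketch of that idea for a grouped sum follows. It is not the implementation in \file{gsumm.c}: the function \code{grouped_sum} and its \code{start}/\code{size} arguments are hypothetical, and the rows are assumed to be already ordered by group.

```c
#include <assert.h>

/* Sum x within each group. Assumes rows are ordered by group, with group g
   occupying x[start[g]] .. x[start[g] + size[g] - 1]. Hypothetical layout,
   not the data structures used by gsumm.c. */
void grouped_sum(const double *x, const int *start, const int *size,
                 int ngrp, double *ans) {
    assert(ngrp >= 0);

    /* Each iteration touches a disjoint slice of x and writes only its own
       ans[g], so no synchronization between threads is required. */
    #pragma omp parallel for schedule(static)
    for (int g = 0; g < ngrp; g++) {
        double s = 0.0;
        for (int i = 0; i < size[g]; i++)
            s += x[start[g] + i];
        ans[g] = s;
    }
}
```

Because every group writes to its own output slot, the loop needs no atomics or critical sections. The trade-off is that very uneven group sizes can leave some threads idle under a static schedule.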
