Skip to content

Commit 79841ab

Browse files
authored
Update datatable-benchmarking.Rmd redtored line 102
1 parent d38139a commit 79841ab

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

vignettes/datatable-benchmarking.Rmd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,7 @@ Unless this is what you truly want to measure you should prepare input objects o
109109
Repeating a benchmark many times usually does not give the clearest picture for data processing tools. Of course, it makes perfect sense for more atomic calculations, but this is not a good representation of the most common way these tools will actually be used, namely for data processing tasks, which consist of batches of sequentially provided transformations, each run once.
110110
Matt once said:
111111

112-
Note: I'm very wary of benchmarks measured in anything under 1 second. Much prefer 10 seconds or more for a single run, achieved by increasing data size. A repetition count of 500 is setting off alarm bells. 3-5 runs should be enough to convince on larger data. Call overhead and time to GC affect inferences at this very small scale.
112+
> I'm very wary of benchmarks measured in anything under 1 second. Much prefer 10 seconds or more for a single run, achieved by increasing data size. A repetition count of 500 is setting off alarm bells. 3-5 runs should be enough to convince on larger data. Call overhead and time to GC affect inferences at this very small scale.
113113
114114
This is very valid. The smaller time measurement is the relatively bigger noise is. Noise generated by method dispatch, package/class initialization, etc. Main focus of benchmark should be on real use case scenarios.
115115

0 commit comments

Comments
 (0)