
Commit 5b5b4bf

Update vignettes/datatable-benchmarking.Rmd
Co-authored-by: Michael Chirico <[email protected]>
1 parent 6ee3826 commit 5b5b4bf


vignettes/datatable-benchmarking.Rmd

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ This document is meant to guide on measuring performance of `data.table`. Single
 
 ## General suggestions
 
-Lets assume you are measuring particular process. It is blazingly fast, it takes only microseonds to evalute.
+Let's assume you are measuring a particular process. It is blazingly fast, taking only microseconds to evaluate.
 What does it mean and how to approach such measurements?
 The smaller time measurements are, the relatively bigger call overhead is. Call overhead can be perceived as a noise in measurement due by method dispatch, package/class initialization, low level object constructors, etc. As a result you naturally may want to measure timing many times and take the average to deal with the noise. This is valid approach, but the magnitude of timing is much more important. What will be the impact of extra 5, or lets say 5000 microseconds if writing results to target environment/format takes a minute? 1 second is 1 000 000 microseconds. Does the microseconds, or even miliseconds makes any difference? There are cases where it makes difference, for example when you call a function for every row, then you definitely should care about micro timings. The point is that in most user's benchmarks it won't make difference. Most of common R functions are vectorized, thus you are not calling them for every row. If something is blazingly fast for your data and use case then perhaps you may not have to worry about performance and benchmarks. Unless you want to scale your process, then you should worry because if something is blazingly fast today it might not be that fast tomorrow, just because your process will receive more data on input. In consequence you should confirm that your process will scale.
 There are multiple dimensions that you should consider when examining scaling of your process.
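The context paragraph in the hunk above makes the point that microsecond-scale timings are dominated by call overhead while the end-to-end step usually dominates a real workflow. The following R sketch is an illustration of that point only; it is not part of this commit or of the vignette, and it assumes the microbenchmark package is installed.

```r
# Illustrative sketch only: not part of this commit or the vignette.
library(data.table)
library(microbenchmark)  # assumed available; any timing helper would do

DT <- data.table(x = runif(1e6), g = sample(letters, 1e6, TRUE))

# Microsecond-scale call: repeating it many times mainly averages out
# call overhead (dispatch, argument handling), i.e. measurement noise.
microbenchmark(DT[1L, x], times = 100L)

# The step that typically dominates a real workflow: writing results
# out to the target format. Its timing dwarfs the microseconds above.
system.time(fwrite(DT, tempfile(fileext = ".csv")))
```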
