
Commit 5b5b4bf

Update vignettes/datatable-benchmarking.Rmd
Co-authored-by: Michael Chirico <[email protected]>
1 parent 6ee3826 commit 5b5b4bf


vignettes/datatable-benchmarking.Rmd

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ This document is meant to guide on measuring performance of `data.table`. Single
 
 ## General suggestions
 
-Lets assume you are measuring particular process. It is blazingly fast, it takes only microseonds to evalute.
+Let's assume you are measuring a particular process. It is blazingly fast, taking only microseconds to evaluate.
 What does it mean and how to approach such measurements?
 The smaller time measurements are, the relatively bigger call overhead is. Call overhead can be perceived as a noise in measurement due by method dispatch, package/class initialization, low level object constructors, etc. As a result you naturally may want to measure timing many times and take the average to deal with the noise. This is valid approach, but the magnitude of timing is much more important. What will be the impact of extra 5, or lets say 5000 microseconds if writing results to target environment/format takes a minute? 1 second is 1 000 000 microseconds. Does the microseconds, or even miliseconds makes any difference? There are cases where it makes difference, for example when you call a function for every row, then you definitely should care about micro timings. The point is that in most user's benchmarks it won't make difference. Most of common R functions are vectorized, thus you are not calling them for every row. If something is blazingly fast for your data and use case then perhaps you may not have to worry about performance and benchmarks. Unless you want to scale your process, then you should worry because if something is blazingly fast today it might not be that fast tomorrow, just because your process will receive more data on input. In consequence you should confirm that your process will scale.
 There are multiple dimensions that you should consider when examining scaling of your process.
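The context paragraph in the hunk above makes the point that microsecond-scale timings are dominated by call overhead while the end-to-end step usually dominates a real workflow. The following R sketch is an illustration of that point only; it is not part of this commit or of the vignette, and it assumes the microbenchmark package is installed.

```r
# Illustrative sketch only: not part of this commit or the vignette.
library(data.table)
library(microbenchmark)  # assumed available; any timing helper would do

DT <- data.table(x = runif(1e6), g = sample(letters, 1e6, TRUE))

# Microsecond-scale call: repeating it many times mainly averages out
# call overhead (dispatch, argument handling), i.e. measurement noise.
microbenchmark(DT[1L, x], times = 100L)

# The step that typically dominates a real workflow: writing results
# out to the target format. Its timing dwarfs the microseconds above.
system.time(fwrite(DT, tempfile(fileext = ".csv")))
```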
