Hi, I'm trying to use your code as part of an algorithm on a large dataset using Scala and Apache Spark. I'm getting great results in terms of accuracy, but when I ran it on several samples of GPS tracklog data I got a very skewed distribution of durations:
| Metric | Min | 25th percentile | Median | 75th percentile | Max |
|---|---|---|---|---|---|
| Duration | 10 s | 1.2 min | 6.2 min | 12 min | 51 min |
| GC Time | 0.2 s | 2 s | 4 s | 4 s | 10 s |
| Input | 25.5 MB | 128.1 MB | 128.1 MB | 128.1 MB | 128.1 MB |
| Output | 18.4 MB | 93.3 MB | 93.6 MB | 93.7 MB | 94.0 MB |
I would like to know if this is expected behaviour for this algorithm, or if you have any tips and tricks for getting more stable results on any dataset (less variance, maybe at the cost of a higher average duration).