Fix the erroneous note on time complexity of the maxsum algorithm #272


The documentation for the max sum distance algorithm says (or can easily be read as saying) that the time complexity of the algorithm is `O(n^2)`. That is correct only for the body of the algorithm's main loop. The loop itself iterates over all possible combinations of `top_n` phrases among `nr_candidates` (usually `2 * top_n`).

It's a bit complicated to derive the actual time complexity. The loop body runs `C(nr_candidates, top_n)` times (where `C(n, k)` denotes the binomial coefficient, "n choose k"), each run with quadratic complexity. For the case when `nr_candidates == 2 * top_n`, I believe in asymptotic terms it ends up as `O(n^n)` (thus I marked it as "super-exponential" in the docs) but I'm not sure.

Anyway, this is much worse than quadratic complexity; look at this growth (with `nr_candidates == 2 * top_n`):

- `top_n == 5`: 252 combinations to evaluate;
- `top_n == 10`: about 185 thousand combinations;
- `top_n == 20`: about 138 billion combinations.

No wonder it somehow worked with `top_n = 10` but I never got it to work with `top_n = 20`.

This PR fixes the misleading notes on the time complexity of the algorithm and clarifies the explanation of the algorithm itself a bit. The docstring is still odd, since the note on complexity is written as if the argument `nr_candidates` did not exist and `2 * top_n` were always used (even though some examples in the docs set these arguments separately).
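
The combination counts above can be reproduced with a few lines of Python. This is only a sketch for checking the numbers, not the library's code: the helper name `n_combinations` is hypothetical, and the default of `2 * top_n` candidates is the assumption described in this PR. For comparison it also prints the Stirling estimate `4^n / sqrt(pi * n)` for the central binomial coefficient, which suggests the count grows exponentially in `top_n` when `nr_candidates == 2 * top_n`; treat that as a side note on the asymptotics rather than a settled bound for the whole algorithm.

```python
import math
from itertools import combinations


def n_combinations(top_n, nr_candidates=None):
    """Number of candidate subsets the main loop would have to evaluate.

    Hypothetical helper for illustration; assumes nr_candidates defaults
    to 2 * top_n, as described in the PR text.
    """
    if nr_candidates is None:
        nr_candidates = 2 * top_n
    return math.comb(nr_candidates, top_n)  # requires Python 3.8+


for top_n in (5, 10, 20):
    exact = n_combinations(top_n)
    # Stirling-based estimate of C(2n, n)
    estimate = 4 ** top_n / math.sqrt(math.pi * top_n)
    print(f"top_n={top_n:2d}: {exact:,} combinations (estimate {estimate:.3g})")

# Sanity check: brute-force enumeration matches the formula for a small case.
assert sum(1 for _ in combinations(range(10), 5)) == n_combinations(5)
```

Each of these combinations still costs a quadratic amount of work in the loop body, so the totals are larger than the counts alone suggest.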