Performance improvements #6
We are currently benchmarking reaction optimization algorithms, including EDBO+. While benchmarking, we noticed that each iteration of EDBO+ was time-consuming, which becomes a real problem when we need ~10,000 iteration steps per benchmarking function. After some performance profiling, we found and optimized a few sections to speed up the process.
The first improvement comes from optimizing how the training and testing indices are calculated in `EDBOplus.run()`. This sped up each `EDBOplus.run()` call by 40x. To do this, I removed a redundant calculation of `internal_df` and used a more efficient lookup of the rows containing "PENDING" to build it.

Additionally, I added an optional parameter to `EDBOplus.run()` called `write_extra_data` that makes writing the predictions file at each step optional. Skipping it saves time, since the file can be quite large depending on your reaction space. To preserve the current behavior, the default value is `True`, so the prediction file is still written unless you opt out.

I also included some comments about the details of the timing improvements for reference, but they can be removed if necessary.
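To illustrate the kind of change involved in the "PENDING" lookup, here is a minimal sketch assuming a pandas DataFrame in which objective values that have not been measured yet are stored as the string "PENDING". The helpers `split_train_test_loop` and `split_train_test` and the column names are hypothetical and are not EDBO+'s actual code; the sketch only contrasts a per-row Python loop with a single vectorized boolean mask.

```python
from typing import List, Tuple

import pandas as pd


def split_train_test_loop(df: pd.DataFrame, objective_cols: List[str]) -> Tuple[pd.Index, pd.Index]:
    """Slow baseline (hypothetical): inspect every row at the Python level."""
    train, test = [], []
    for idx, row in df.iterrows():
        if any(str(row[col]) == "PENDING" for col in objective_cols):
            test.append(idx)
        else:
            train.append(idx)
    return pd.Index(train), pd.Index(test)


def split_train_test(df: pd.DataFrame, objective_cols: List[str]) -> Tuple[pd.Index, pd.Index]:
    """Vectorized version (hypothetical): one boolean mask over the objective columns."""
    # Cast to str so numeric results and "PENDING" markers compare uniformly,
    # then flag a row as pending if any objective column is "PENDING".
    pending = df[objective_cols].astype(str).eq("PENDING").any(axis=1).to_numpy()
    return df.index[~pending], df.index[pending]


scope = pd.DataFrame({
    "temperature": [25, 50, 75, 100],
    "yield": [81.0, "PENDING", 64.5, "PENDING"],
})
train_idx, test_idx = split_train_test(scope, ["yield"])
print(train_idx.tolist(), test_idx.tolist())  # [0, 2] [1, 3]
```

Both helpers return the same split; the vectorized one avoids building a Series per row, which is the sort of overhead that grows with the size of the reaction space. Actual speedups depend on the data and pandas version.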