GH-41364: [GLib][Ruby] Allow passing thread pool to ExecutePlan#48462
GH-41364: [GLib][Ruby] Allow passing thread pool to ExecutePlan#48462kou merged 14 commits intoapache:mainfrom
Conversation
|
|
25a9d01 to
3bb6dad
Compare
c_glib/arrow-glib/thread-pool.cpp
Outdated
There was a problem hiding this comment.
| auto arrow_thread_pool_result = arrow::internal::ThreadPool::Make(num_threads); | |
| auto arrow_thread_pool_result = arrow::internal::ThreadPool::Make(n_threads); |
c_glib/arrow-glib/thread-pool.cpp
Outdated
There was a problem hiding this comment.
We should pass arrow::internal::ThraedPool as property instead of setting directly via priv like we did in GArrowArray:
| auto thread_pool = GARROW_THREAD_POOL(g_object_new(GARROW_TYPE_THREAD_POOL, NULL)); | |
| auto priv = GARROW_THREAD_POOL_GET_PRIVATE(thread_pool); | |
| priv->thread_pool = *arrow_thread_pool_result; | |
| auto arrow_thread_pool = *arrow_thread_pool_result; | |
| auto thread_pool = GARROW_THREAD_POOL(g_object_new(GARROW_TYPE_THREAD_POOL, "thread-pool", *arrow_thread_pool, nullptr)); |
If you don't know how to do it, can I push some commits to this branch?
There was a problem hiding this comment.
What it is the reason for doing it that way? I would like to learn.
I pushed a change, but I don't really understand what I'm doing. For example the property memory-pool appears in the documentation, which I find confusing: https://arrow.apache.org/docs/c_glib/arrow-glib/property.MemoryPool.memory-pool.html
There was a problem hiding this comment.
It's for sub class. If we do it in constructor, sub class needs to re-implement it in sub class' constructor.
If we do it in property setter, sub class can reuse it.
For example the property
memory-poolappears in the documentation, which I find confusing: https://arrow.apache.org/docs/c_glib/arrow-glib/property.MemoryPool.memory-pool.html
We need the documentation for it. It should say that this is for advanced users who use C++/GLib/sub class/....
There was a problem hiding this comment.
Could you rename to executer.{cpp,h,hpp}?
ThreadPool is one of executors.
There was a problem hiding this comment.
I renamed to executor.{cpp,h,hpp} which is what I assume you meant.
On that note, perhaps we should create a base class Executor, and have ExecuteContext accept that instead of a ThreadPool?
There was a problem hiding this comment.
I renamed to
executor.{cpp,h,hpp}which is what I assume you meant.
Oh, sorry. You're right.
On that note, perhaps we should create a base class
Executor, and haveExecuteContextaccept that instead of aThreadPool?
Yes. It's better.
There was a problem hiding this comment.
Add Executor base class.
c_glib/arrow-glib/compute.cpp
Outdated
There was a problem hiding this comment.
We should set thread_pool as property.
There was a problem hiding this comment.
I changed it, but that means we cannot create the arrow::compute::ExecContext until the property is set. Is that ok?
There was a problem hiding this comment.
If we want to avoid needless arrow::compute::ExecContext construction, we need to change
typedef struct GArrowExecuteContextPrivate_
{
arrow::compute::ExecContext context;
} GArrowExecuteContextPrivate;to
typedef struct GArrowExecuteContextPrivate_
{
std::shared_ptr<arrow::compute::ExecContext> context;
} GArrowExecuteContextPrivate;and call std::make_shared<arrow::compute::ExecContext>(...) in garrow_execute_context_set_property(PROP_THREAD_POOL).
c_glib/arrow-glib/compute.cpp
Outdated
There was a problem hiding this comment.
Let's use GArrowThreadPool here:
| std::shared_ptr<arrow::internal::ThreadPool> thread_pool; | |
| GArrowThreadPool *thread_pool; |
|
Thanks for the review @kou! I have applied you suggestions and it looks much better now. |
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit db9f556. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Aggregators like
firstandlastare unusable in Ruby because they don't work when the execution plan is executed using multiple threads.What changes are included in this PR?
This adds the
ThreadPoolclass to be able create a thread pool with a single thread. This can then be passed when creating anExecuteContext, which in turn can be passed when creating anExecutePlan.Are these changes tested?
A Ruby test that shows that the
firstaggregator works.Are there any user-facing changes?
A new
GArrowThreadPoolclass, and changed signatures of the functionsgarrow_execute_context_newandgarrow_execute_plan_new. However since the new arguments are nullable, it is backwards compatible for the Ruby API.This PR includes breaking changes to public APIs.