
GH-41364: [GLib][Ruby] Allow passing thread pool to ExecutePlan #48462

Merged
kou merged 14 commits into apache:main from stenlarsson:glib-execute-plan-threads
Dec 16, 2025


@stenlarsson (Contributor) commented Dec 11, 2025

Rationale for this change

Aggregate functions such as first and last are unusable from Ruby because they don't work when the execution plan is executed using multiple threads.

What changes are included in this PR?

This adds a ThreadPool class that makes it possible to create a thread pool with a single thread. The pool can be passed when creating an ExecuteContext, which in turn can be passed when creating an ExecutePlan.
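One way to see why a single-thread pool helps: with exactly one worker, queued tasks run one at a time in the order they were submitted, so an order-sensitive aggregate like first sees batches in a well-defined order. The C++ sketch below is a toy model built on that assumption. SingleThreadExecutor and FirstOverBatches are hypothetical names, and the sketch does not reproduce arrow::internal::ThreadPool or how Acero actually handles these aggregators.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy single-worker executor (hypothetical; not arrow::internal::ThreadPool).
// With exactly one worker, tasks run one at a time in submission order.
class SingleThreadExecutor {
 public:
  SingleThreadExecutor() : worker_([this] { Run(); }) {}

  ~SingleThreadExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains remaining tasks before exiting
  }

  void Spawn(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members exist
};

// A "first" aggregate over batches: the result is the first value processed.
int FirstOverBatches(const std::vector<int>& batches) {
  std::vector<int> seen;  // written only by the single worker thread
  {
    SingleThreadExecutor executor;
    for (int value : batches) {
      executor.Spawn([&seen, value] { seen.push_back(value); });
    }
  }  // destructor joins the worker, so reading seen afterwards is safe
  return seen.front();
}
```

With several workers pulling from the same queue, the order in which batches are processed would no longer be fixed, which is the intuition behind restricting the plan to one thread.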

Are these changes tested?

A Ruby test has been added that shows that the first aggregator works.

Are there any user-facing changes?

A new GArrowThreadPool class, and changed signatures for the functions garrow_execute_context_new() and garrow_execute_plan_new(). However, since the new arguments are nullable, the change is backward compatible for the Ruby API.

This PR includes breaking changes to public APIs.

@stenlarsson requested a review from @kou as a code owner on December 11, 2025 11:19
@github-actions commented:

⚠️ GitHub issue #41364 has been automatically assigned in GitHub to PR creator.

@stenlarsson force-pushed the glib-execute-plan-threads branch from 25a9d01 to 3bb6dad on December 11, 2025 11:24
Member

Suggested change:

- auto arrow_thread_pool_result = arrow::internal::ThreadPool::Make(num_threads);
+ auto arrow_thread_pool_result = arrow::internal::ThreadPool::Make(n_threads);

Comment on lines 83 to 85
Member

We should pass arrow::internal::ThreadPool as a property instead of setting it directly via priv, as we do in GArrowArray:

Suggested change:

- auto thread_pool = GARROW_THREAD_POOL(g_object_new(GARROW_TYPE_THREAD_POOL, NULL));
- auto priv = GARROW_THREAD_POOL_GET_PRIVATE(thread_pool);
- priv->thread_pool = *arrow_thread_pool_result;
+ auto arrow_thread_pool = *arrow_thread_pool_result;
+ auto thread_pool = GARROW_THREAD_POOL(g_object_new(GARROW_TYPE_THREAD_POOL, "thread-pool", &arrow_thread_pool, nullptr));

If you don't know how to do it, can I push some commits to this branch?

Contributor Author

What is the reason for doing it that way? I would like to learn.

I pushed a change, but I don't really understand what I'm doing. For example, the memory-pool property appears in the public documentation, which I find confusing: https://arrow.apache.org/docs/c_glib/arrow-glib/property.MemoryPool.memory-pool.html

Member

It's for subclasses. If we set it in the constructor, each subclass needs to re-implement that in its own constructor.

If we set it in the property setter, subclasses can reuse it.

> For example the property memory-pool appears in the documentation, which I find confusing: https://arrow.apache.org/docs/c_glib/arrow-glib/property.MemoryPool.memory-pool.html

We need documentation for it. It should say that this property is for advanced users who work with C++, GLib, subclasses, and so on.
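A plain C++ analogy for the point about subclasses, assuming nothing about the real arrow-glib code: GObject routes each property to the set_property handler of the class that installed it, so a subclass instance created with a "thread-pool" property gets the base class's handling for free. In the sketch, NativePool, PoolWrapper, LabeledPoolWrapper, and NewWithPool are hypothetical stand-ins, with NewWithPool playing the role of g_object_new().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::internal::ThreadPool.
struct NativePool {
  explicit NativePool(int n) : n_threads(n) {}
  int n_threads;
};

// Base wrapper: the native pool is attached through a generic property
// setter, playing the role of a GObject "thread-pool" property.
class PoolWrapper {
 public:
  virtual ~PoolWrapper() = default;

  void SetProperty(const std::string& name, std::shared_ptr<NativePool> value) {
    if (name == "thread-pool") {
      pool_ = std::move(value);
    }
  }

  const std::shared_ptr<NativePool>& pool() const { return pool_; }

 private:
  std::shared_ptr<NativePool> pool_;
};

// A subclass with extra state of its own but no pool-handling code.
class LabeledPoolWrapper : public PoolWrapper {
 public:
  std::string label = "subclass";
};

// Generic factory in the spirit of g_object_new(TYPE, "thread-pool", ..., NULL):
// construct any wrapper type, then apply the property through the base setter.
template <typename T>
std::unique_ptr<T> NewWithPool(std::shared_ptr<NativePool> pool) {
  auto object = std::make_unique<T>();
  object->SetProperty("thread-pool", std::move(pool));
  return object;
}
```

If the pool were instead assigned inside a base-class constructor function, as the original code did by writing to priv directly, every subclass constructor would have to repeat that assignment.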

Member

Could you rename to executer.{cpp,h,hpp}?
ThreadPool is one kind of executor.

Contributor Author

I renamed to executor.{cpp,h,hpp} which is what I assume you meant.

On that note, perhaps we should create a base class Executor, and have ExecuteContext accept that instead of a ThreadPool?

Member

> I renamed to executor.{cpp,h,hpp} which is what I assume you meant.

Oh, sorry. You're right.

> On that note, perhaps we should create a base class Executor, and have ExecuteContext accept that instead of a ThreadPool?

Yes, that's better.

Contributor Author

Added an Executor base class.
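A minimal sketch of the design agreed on in this thread, using hypothetical stand-in classes rather than the actual GLib or Arrow types: ExecuteContext depends only on an abstract Executor, so a thread pool or any other executor can be passed, and null means the default. This mirrors Arrow C++, where arrow::internal::ThreadPool derives from arrow::internal::Executor; the Capacity() method and the SerialExecutor class below are illustrative assumptions.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Abstract base: anything that can run tasks (hypothetical stand-in).
class Executor {
 public:
  virtual ~Executor() = default;
  // How many tasks can run at the same time.
  virtual int Capacity() const = 0;
};

// One concrete executor: a pool with a fixed number of threads.
class ThreadPool : public Executor {
 public:
  explicit ThreadPool(int n_threads) : n_threads_(n_threads) {}
  int Capacity() const override { return n_threads_; }

 private:
  int n_threads_;
};

// Another concrete executor that runs one task at a time.
class SerialExecutor : public Executor {
 public:
  int Capacity() const override { return 1; }
};

// The context accepts the base class, so any executor can be passed.
// A null executor stands for "use the default executor".
class ExecuteContext {
 public:
  explicit ExecuteContext(std::shared_ptr<Executor> executor = nullptr)
      : executor_(std::move(executor)) {}

  const Executor* executor() const { return executor_.get(); }

 private:
  std::shared_ptr<Executor> executor_;
};
```

Accepting the base class means a future executor type can be added without touching the ExecuteContext constructor again.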

Comment on lines 328 to 329
Member

We should set thread_pool as a property.

Contributor Author

I changed it, but that means we cannot create the arrow::compute::ExecContext until the property is set. Is that ok?

Member

If we want to avoid needless arrow::compute::ExecContext construction, we need to change

typedef struct GArrowExecuteContextPrivate_
{
  arrow::compute::ExecContext context;
} GArrowExecuteContextPrivate;

to

typedef struct GArrowExecuteContextPrivate_
{
  std::shared_ptr<arrow::compute::ExecContext> context;
} GArrowExecuteContextPrivate;

and call std::make_shared<arrow::compute::ExecContext>(...) in garrow_execute_context_set_property(PROP_THREAD_POOL).
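A sketch of the change described above, with hypothetical stand-ins for arrow::internal::Executor and arrow::compute::ExecContext so it compiles on its own: the private struct holds a std::shared_ptr that stays empty until the property setter builds the context. The real ExecContext constructor also takes a MemoryPool* before the executor argument; that is omitted here, and ExecuteContextSetProperty is only modeled on garrow_execute_context_set_property().

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; only the ownership pattern matters here.
struct Executor {
  int n_threads;
};

struct ExecContext {
  // In this sketch, nullptr means "use the default executor".
  explicit ExecContext(Executor* executor = nullptr) : executor(executor) {}
  Executor* executor;
};

// Private data holds a shared_ptr, so no ExecContext exists until the
// property setter runs.
struct ExecuteContextPrivate {
  std::shared_ptr<ExecContext> context;
};

enum { PROP_THREAD_POOL = 1 };

// Shape of a set_property handler: build the ExecContext once the executor
// is known, including when it is null.
void ExecuteContextSetProperty(ExecuteContextPrivate* priv,
                               unsigned int prop_id,
                               Executor* executor) {
  switch (prop_id) {
    case PROP_THREAD_POOL:
      priv->context = std::make_shared<ExecContext>(executor);
      break;
    default:
      // Unknown property ids are ignored in this sketch.
      break;
  }
}
```

Because the context is created inside the setter, passing a null pool still produces a context that falls back to the default executor, under the assumption stated in the ExecContext stand-in.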

Contributor Author

Ok, updated.

Member

Let's use GArrowThreadPool here:

Suggested change:

- std::shared_ptr<arrow::internal::ThreadPool> thread_pool;
+ GArrowThreadPool *thread_pool;

@stenlarsson requested a review from @kou on December 15, 2025 08:40
@stenlarsson (Contributor Author):

Thanks for the review, @kou! I have applied your suggestions and it looks much better now.

@kou (Member) left a comment:

+1

@kou merged commit db9f556 into apache:main on Dec 16, 2025
13 checks passed
@conbench-apache-arrow commented:

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit db9f556.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
