Skip to content

[fileio] Add file-io.allow-cache for RESTTokenFileIO#5054

Merged
JingsongLi merged 6 commits intoapache:masterfrom
JingsongLi:fileio_allow_cache
Feb 12, 2025
Merged

[fileio] Add file-io.allow-cache for RESTTokenFileIO#5054
JingsongLi merged 6 commits intoapache:masterfrom
JingsongLi:fileio_allow_cache

Conversation

@JingsongLi
Copy link
Copy Markdown
Contributor

Purpose

  1. Add file-io.allow-cache for RESTTokenFileIO, file io should be closed in RESTTokenFileIO, so it can not be cached.
  2. Improve HadoopCompliantFileIO, it should use Map.computeIfAbsent.
  3. We should use new Configuration(false) to not load files in configuration for improving performance.

Tests

API and Format

Documentation

<td><h5>resolving-file-io.enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether to enable resolving fileio, when this option is enabled, in conjunction with the table's property data-file.external-paths, Paimon can read and write to external storage paths, such as OSS or S3. In order to access these external paths correctly, you also need to configure the corresponding access key and secret key.</td>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fileio -> file io same with line 87?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is ok in description, we can just be careful for option key, it is API part.


@Override
public void close() {
if (!allowCache) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means, if allowCache, the fileIo need to close by outside?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not allow cache, the fileio should be closed.

If allow cache, the fileio is in the static cache, we cannot close them.

<td><h5>file-io.allow-cache</h5></td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>Whether to allow static cache in file io implementation. If not allowed, this means that there may be a large number of FileIO instances generated, enabling caching can lead to resource leakage.</td>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question here:
If enabling caching, the number of FileIO instances will be reduce. Why enabling caching can lead to resource leakage?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our original design was to place it in the cache of static variables, but for the case of REST Catalog, it may have a lot of FileIO generated, and there may be a FileIO for each table because each table has different file access permissions.

So if there are too many FileIOs in this situation, we cannot cache them in memory casually. If there are too many, it will lead to too many resources, that is, resource leakage.

@jerry-024
Copy link
Copy Markdown
Contributor

LGTM

@JingsongLi JingsongLi merged commit 155e51c into apache:master Feb 12, 2025
13 checks passed
JackeyLee007 pushed a commit to JackeyLee007/paimon that referenced this pull request Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants