Skip to content

feat: upload in multiple commits when many files#300

Open
drbh wants to merge 4 commits intomainfrom
prefer-upload-large-folder-when-needed
Open

feat: upload in multiple commits when many files#300
drbh wants to merge 4 commits intomainfrom
prefer-upload-large-folder-when-needed

Conversation

@drbh
Copy link
Collaborator

@drbh drbh commented Feb 27, 2026

This PR adds a path to use the upload_large_folder api when there are more than 200 files in the build output. This helps avoid timeouts when many files are in the build. Otherwise the normal upload_folder is preferred since it has a bit more flexibility around delete_patterns and the commit_message

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul
sayakpaul previously approved these changes Feb 28, 2026
Copy link
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test as well? No strong opinions, of course.

Comment on lines +79 to +90
if file_count > 200:
print(
f"⚠️ Found {file_count} files to upload, which exceeds the 200 file limit for a single commit. Deleting old build files and re-uploading the whole build folder to avoid hitting file limits."
)
kernel_root_dir = build_dir.parent
api.upload_large_folder(
repo_id=repo_id,
folder_path=kernel_root_dir,
revision=branch,
repo_type="model",
allow_patterns=["build/torch*"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not just remove old build files, since it can break our version contract. Also, if we automatically switch to a different upload type, the behavior should be exactly the same.

Is it possible to add delete_patterns to upload_large_folder?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately upload_large_folder does not contain delete_patterns. and I actually misunderstood the upload functionality and added an incorrect comment. The upload_large_folder function doesn't seem to delete any files. It upserts the files that match the allow pattern.

I fully agree they should act exactly the same, however don't we want to delete the older build within a version? For example my understanding is the upload_folder path deletes old files like

{'torch210-cxx11-cu130-x86_64-linux/**', '.cache/**', 'torch210-cxx11-cu126-x86_64-linux/**', 'torch29-cxx11-cu126-x86_64-linux/**', 'torch29-cxx11-cu128-x86_64-linux/**', 'torch210-cxx11-cu128-x86_64-linux/**', 'torch29-cxx11-cu130-x86_64-linux/**'}

so the old .so is removed and replaced with the new .so (since they will have unique suffixes). I think to fulfill this logic with the upload_large_folder there will need to be an explicit delete call made before or after the large folder upload.

another possible option is to avoid upload_large_folder all together and in the case of many files, we either retry the upload command a couple times (this works in practice) or we iterate over each build variant and upload each in its own commit (but this may still suffer if a single build variant is too large)

what do you think is the best approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need two different behaviors:

  • We need to if a variant torch210-cxx11-cu130-x86_64-linux is uploaded, we need to remove all stale files like you mention, e.g. the old .so file.
  • At the same time, if we e.g. upload only torch210-cxx11-cu130-x86_64-linux and torch210-cxx11-cu126-x86_64-linux, we should not remove torch29-cxx11-cu126-x86_64-linux.

So the logic is: remove stale files of build variants that are uploaded, but do not remove any files of build variants that are not uploaded.

I think to fulfill this logic with the upload_large_folder there will need to be an explicit delete call made before or after the large folder upload.

But I think deletes should be atomic or at least after the large upload to avoid having an undownloadable kernel if upload fails.

For now, for large folders doing one upload per build variant sounds like the best option? We guarantee the existing semantics and it might be enough for most cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, for large folders doing one upload per build variant sounds like the best option? We guarantee the existing semantics and it might be enough for most cases?

Good point! I agree this should cut it for now. If anything more advanced arises, we can also discuss a bit with the Hub team to figure out what's best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(cc @Wauplin just in case for viz)

Copy link
Contributor

@Wauplin Wauplin Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping! General note, never hesitate to ping me or @hanouticelina on such topics either in Github or directly on Slack so we can either advice on a solution or adapt huggingface_hub if some inconsistencies can be resolved in it.

Copy link
Contributor

@Wauplin Wauplin Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this specific case, the "why" upload_large_folder do not have delete_patterns it's because the method is meant for heavy uploads, with a resumable and multi-commit process. Primary purpose is for example to upload very large datasets once, not necessarily as an automated process. In such cases, deleting files as part of the upload is not really important (repo author can always clean-up their files afterwards). If integrating in a more automated workflow -typically in this PR-, then we would change a bit the API I think.

In your very specific case, I think the best is to use the low-level API create_commit with a list of CommitOperationAdd and CommitOperationDelete. Something like this should work:

# build list of delete ops from previous build
delete_operations = [CommitOperationDelete(...) for file in list_repo_files(...) if file ...]

# build list of add ops
files_to_add = [...]  # to implement
add_operations = [CommitOperationAdd(...) for file in files_to_add]

# commit by batch of 200 ops (nb to tweak?)
for chunk_operations in chunk_iterator(delete_operations + add_operations, n=200):
    create_commit(...operations=chunk_operations)

This is a bit more work but you have full flexibility and it works well. And you can adapt the commit message as you want. The good part is that you have exactly the same logic no matter the number of ops (i.e. no "if ... then use upload_folder else use upload_large_folder").

Let me know what you think!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach, thanks for the tip! I've updated to follow this pattern in the latest commits

delete_patterns=list(delete_patterns),
commit_message="Build uploaded using `kernels`.",
allow_patterns=["torch*"],
file_count = sum(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a separate function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I've moved this logic to a new function in the latest commit

@drbh
Copy link
Collaborator Author

drbh commented Mar 2, 2026

Should we add a test as well? No strong opinions, of course.

good idea! I've added a test to check that we enter the path that calls upload_large_folder in the kernels/tests/test_kernel_upload.py file. Thanks

*note, will update the test accordingly if we change the approach based on the question above.

Copy link
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some more comments. LMK if they're clear.

)
file_count = get_file_count_in_build(build_dir)

if file_count > 1_000:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit): 1_000 could be made a constant.

_get_hf_api().delete_repo(repo_id=REPO_ID)


def test_large_kernel_upload_uses_kernel_root_path(monkeypatch, tmp_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test is perfect!

print(
f"⚠️ Found {file_count} files to upload, which exceeds the 1,000 file limit for a single commit."
)
kernel_root_dir = build_dir.parent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call delete_files() here to maintain parity with the existing behaviour of upload_kernels().

repo_type="model",
allow_patterns=["build/torch*"],
)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above to avoid having 2 different upload logic #300 (comment)

@drbh drbh force-pushed the prefer-upload-large-folder-when-needed branch from c7e74ee to f0ea3b7 Compare March 11, 2026 15:52
@drbh
Copy link
Collaborator Author

drbh commented Mar 11, 2026

@Wauplin thanks for the suggestions! I've updated in the latest commits, would you mind taking a quick look to ensure that I'm using the api correctly. Thanks

@drbh drbh changed the title feat: prefer upload_large_folder when many files feat: upload in multiple commits when many files Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants