@lucylq lucylq commented Sep 10, 2025

Stack from ghstack (oldest at bottom):

Support multiple PTD files in Module. This change updates the following private variables in Module:

std::string data_path --> std::unordered_set<std::string> data_files_
std::unique_ptr<DataLoader> data_map_loader --> std::vector<std::unique_ptr<DataLoader>> data_map_loaders_
std::unique_ptr<NamedDataMap> data_map --> std::vector<std::unique_ptr<NamedDataMap>> named_data_maps_

It also introduces a new private variable. When there are multiple NamedDataMaps, they must be merged into one for use in a method and elsewhere. This merge is not implemented yet.

std::unique_ptr<NamedDataMap> merged_data_map_

The process of using a PTD file is:

std::string file --> wrapped in DataLoader --> wrapped in NamedDataMap.

Each stage can now hold multiple instances, one per PTD file.

This diff also introduces a new Module constructor that takes a std::unordered_set<std::string> named_data_map_paths_.
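
A minimal sketch of the file --> DataLoader --> NamedDataMap pipeline, for illustration only. It is not the ExecuTorch implementation: `FakeDataLoader`, `FakeNamedDataMap`, `ModuleSketch`, `load_data_maps`, and `num_data_maps` are hypothetical stand-ins, no file I/O is performed, and only the three member names are taken from this description.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DataLoader: it only records the file it wraps.
struct FakeDataLoader {
  explicit FakeDataLoader(std::string file_path) : path(std::move(file_path)) {}
  std::string path;
};

// Hypothetical stand-in for a NamedDataMap: it only records its loader.
struct FakeNamedDataMap {
  explicit FakeNamedDataMap(const FakeDataLoader* data_loader)
      : loader(data_loader) {}
  const FakeDataLoader* loader;  // Non-owning; the loader must outlive the map.
};

// Hypothetical stand-in for Module, using the member names from this PR.
class ModuleSketch {
 public:
  explicit ModuleSketch(std::unordered_set<std::string> named_data_map_paths)
      : data_files_(std::move(named_data_map_paths)) {}

  // Wraps each file in a loader, then wraps each loader in a map.
  void load_data_maps() {
    // Drop maps first, since they point into the loaders.
    named_data_maps_.clear();
    data_map_loaders_.clear();
    for (const std::string& file : data_files_) {
      data_map_loaders_.push_back(std::make_unique<FakeDataLoader>(file));
      named_data_maps_.push_back(
          std::make_unique<FakeNamedDataMap>(data_map_loaders_.back().get()));
    }
    // Invariant of the pipeline: one loader and one map per file.
    assert(data_map_loaders_.size() == data_files_.size());
    assert(named_data_maps_.size() == data_files_.size());
  }

  std::size_t num_data_maps() const { return named_data_maps_.size(); }

 private:
  std::unordered_set<std::string> data_files_;
  // Declared before the maps so that the maps are destroyed first.
  std::vector<std::unique_ptr<FakeDataLoader>> data_map_loaders_;
  std::vector<std::unique_ptr<FakeNamedDataMap>> named_data_maps_;
};
```

Because the paths are held in a `std::unordered_set`, passing the same path twice produces a single loader and a single map in this sketch.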

Differential Revision: D82059808

@lucylq lucylq requested a review from shoumikhin as a code owner September 10, 2025 16:44

pytorch-bot bot commented Sep 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14158

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e6ed284 with merge base f7c009e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

lucylq added a commit that referenced this pull request Sep 10, 2025
ghstack-source-id: 308798953
Pull Request resolved: #14158
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 10, 2025
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D82059808

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

lucylq added a commit that referenced this pull request Sep 11, 2025
Pull Request resolved: #14158

Support multiple PTD files in Module. Context: https://docs.google.com/document/d/19RLLdWNHQoRi8Ufz4oE-gGjOz0IShjN_NZi5jlgMBZI/edit?tab=t.0

This change updates the following private variables in Module:

```
std::string data_path --> std::unordered_set<std::string> data_files_
std::unique_ptr<DataLoader> data_map_loader --> std::vector<std::unique_ptr<DataLoader>> data_map_loaders_
std::unique_ptr<NamedDataMap> data_map --> std::vector<std::unique_ptr<NamedDataMap>> named_data_maps_
```
It also introduces a new private variable. When there are multiple NamedDataMaps, they must be merged into one for use in a method and elsewhere. This merge is not implemented yet.
```
std::unique_ptr<NamedDataMap> merged_data_map_
```

The process of using a PTD file is:
```
std::string file --> wrapped in DataLoader --> wrapped in NamedDataMap.
```
Each stage can now hold multiple instances, one per PTD file.

This diff also introduces a new Module constructor that takes a `std::unordered_set<std::string> named_data_map_paths_`.

TODO: add a MergedDataMap to extension/module that can merge all the data maps together.
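
One possible shape for the merge step in this TODO, as a hedged sketch rather than the planned design. `SimpleDataMap` and `merge_data_maps` are hypothetical names, the real `NamedDataMap` interface is not reproduced, and rejecting duplicate keys is an assumed policy that this PR does not specify. The sketch requires C++17.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a NamedDataMap: key -> payload.
using SimpleDataMap = std::unordered_map<std::string, std::string>;

// Merges all input maps into one. Returns std::nullopt if the same key
// appears in more than one input map (assumed policy: reject duplicates).
std::optional<SimpleDataMap> merge_data_maps(
    const std::vector<SimpleDataMap>& maps) {
  SimpleDataMap merged;
  for (const auto& map : maps) {
    for (const auto& [key, value] : map) {
      // emplace() leaves the map unchanged and reports false on a collision.
      if (!merged.emplace(key, value).second) {
        return std::nullopt;
      }
    }
  }
  return merged;
}
```

Rejecting duplicates keeps lookups unambiguous; a different policy, such as letting the first map win, would replace the early return.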

ghstack-source-id: 308975994
@exported-using-ghexport

Differential Revision: [D82059808](https://our.internmc.facebook.com/intern/diff/D82059808/)

@kimishpatel kimishpatel left a comment

Review automatically exported from Phabricator review in Meta.

lucylq added a commit that referenced this pull request Sep 30, 2025
Pull Request resolved: #14158

ghstack-source-id: 313188117
@exported-using-ghexport

Differential Revision: [D82059808](https://our.internmc.facebook.com/intern/diff/D82059808/)
@facebook-github-bot
@lucylq has exported this pull request. If you are a Meta employee, you can view the originating Diff in D82059808.

@facebook-github-bot facebook-github-bot merged commit b373abc into gh/lucylq/110/base Oct 1, 2025
139 of 148 checks passed
@facebook-github-bot facebook-github-bot deleted the gh/lucylq/110/head branch October 1, 2025 04:17
mergennachin pushed a commit that referenced this pull request Oct 1, 2025
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #14158 by
@lucylq
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/lucylq/110/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/lucylq/110/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/lucylq/110/orig
Differential Revision:
[D82059808](https://our.internmc.facebook.com/intern/diff/D82059808/)
@diff-train-skip-merge

Co-authored-by: lucylq <[email protected]>

lucylq commented Oct 6, 2025

@pytorchbot cherry-pick --onto release/1.0 -c examples

pytorch-bot bot commented Oct 6, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot cherry-pick: error: argument -c/--classification: invalid choice: 'examples' (choose from 'regression', 'critical', 'fixnewfeature', 'docs', 'release')

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Try @pytorchbot --help for more info.

pytorchbot added a commit that referenced this pull request Oct 6, 2025
(cherry picked from commit 421539e)
GregoryComer pushed a commit that referenced this pull request Oct 7, 2025
Labels: CLA Signed, fb-exported, meta-exported