Add option for custom filtering based on json #621
Merged

Commits (12):
- 33d781b: add json filtering (verityw)
- 2ee47c4: fix minor bugs (verityw)
- 9307027: add filter json generation code (verityw)
- de46c76: move option to remove last n in filter ranges to json generator (verityw)
- f1888e2: move filter_last_n_in_ranges to json (verityw)
- fc43fb1: update comments, readme (verityw)
- 4a3c5fd: remove old filter (verityw)
- ff850ef: update to follow required merge patterns (verityw)
- 00fea47: fix precommit formatting issues (verityw)
- a9e0fc4: subsume filter_last_n_in_ranges into json ranges (verityw)
- 5a93876: minor edits, requested by karl (verityw)
- 1f4506d: Merge branch 'main' into main (kpertsch)
```python
"""
Iterates through the DROID dataset and writes a json mapping from episode unique IDs to the
ranges of time steps that should be kept (all other steps are filtered out).

Specifically, we look for ranges of consecutive steps that contain at most min_idle_len
consecutive idle frames (default 7 -- since most DROID action-chunking policies execute the
first 8 actions of each chunk, filtering this way means the policy will not get stuck
outputting stationary actions). Additionally, we only keep non-idle ranges of length at least
min_non_idle_len (default 16 frames, roughly 1 second), and we remove the last
filter_last_n_in_ranges frames from the end of each range (those frames correspond to action
chunks containing many idle actions).

This leaves us with trajectory segments consisting of contiguous, significant movement.
Training on this filtered set yields policies that output fewer stationary actions
(i.e., get "stuck" in states less).
"""

import json
import os
from pathlib import Path

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tqdm import tqdm

os.environ["CUDA_VISIBLE_DEVICES"] = ""  # Set to the GPU you want to use, or leave empty for CPU

builder = tfds.builder_from_directory(
    # path to the `droid` directory (not its parent)
    builder_dir="<path_to_droid_dataset_tfds_files>",
)
ds = builder.as_dataset(split="train", shuffle_files=False)
# ignore_errors() returns a transformation that must be applied to the dataset;
# calling it on `ds` directly is a no-op
ds = ds.apply(tf.data.experimental.ignore_errors())

keep_ranges_path = "<path_to_where_to_save_the_json>"

min_idle_len = 7  # If more than this number of consecutive idle frames, filter all of them out
min_non_idle_len = 16  # If fewer than this number of consecutive non-idle frames, filter all of them out
filter_last_n_in_ranges = 10  # Remove this many frames from the end of each kept range

keep_ranges_map = {}
if Path(keep_ranges_path).exists():
    with Path(keep_ranges_path).open("r") as f:
        keep_ranges_map = json.load(f)
    print(f"Resuming from {len(keep_ranges_map)} episodes already processed")

for ep_idx, ep in enumerate(tqdm(ds)):
    recording_folderpath = ep["episode_metadata"]["recording_folderpath"].numpy().decode()
    file_path = ep["episode_metadata"]["file_path"].numpy().decode()

    key = f"{recording_folderpath}--{file_path}"
    if key in keep_ranges_map:
        continue

    joint_velocities = [step["action_dict"]["joint_velocity"].numpy() for step in ep["steps"]]
    joint_velocities = np.array(joint_velocities)

    # A frame is idle if the commanded joint velocities barely change from the previous frame
    is_idle_array = np.hstack(
        [np.array([False]), np.all(np.abs(joint_velocities[1:] - joint_velocities[:-1]) < 1e-3, axis=1)]
    )

    # Find which steps transition between idle and non-idle
    is_idle_padded = np.concatenate(
        [[False], is_idle_array, [False]]
    )  # Pad with False so idle runs touching either boundary still produce transitions

    is_idle_diff = np.diff(is_idle_padded.astype(int))
    is_idle_true_starts = np.where(is_idle_diff == 1)[0]  # +1 transitions --> non-idle to idle (idle run starts)
    is_idle_true_ends = np.where(is_idle_diff == -1)[0]  # -1 transitions --> idle to non-idle (idle run ends)

    # Find which steps correspond to idle segments of length at least min_idle_len
    true_segment_masks = (is_idle_true_ends - is_idle_true_starts) >= min_idle_len
    is_idle_true_starts = is_idle_true_starts[true_segment_masks]
    is_idle_true_ends = is_idle_true_ends[true_segment_masks]

    keep_mask = np.ones(len(joint_velocities), dtype=bool)
    for start, end in zip(is_idle_true_starts, is_idle_true_ends, strict=True):
        keep_mask[start:end] = False

    # Keep only non-idle ranges of length at least min_non_idle_len.
    # Same logic as above, but applied to keep_mask, which lets us filter out
    # contiguous kept ranges of length < min_non_idle_len
    keep_padded = np.concatenate([[False], keep_mask, [False]])

    keep_diff = np.diff(keep_padded.astype(int))
    keep_true_starts = np.where(keep_diff == 1)[0]  # +1 transitions --> going from filter out to keep
    keep_true_ends = np.where(keep_diff == -1)[0]  # -1 transitions --> going from keep to filter out

    # Find which steps correspond to non-idle segments of length at least min_non_idle_len
    true_segment_masks = (keep_true_ends - keep_true_starts) >= min_non_idle_len
    keep_true_starts = keep_true_starts[true_segment_masks]
    keep_true_ends = keep_true_ends[true_segment_masks]

    # Add mapping from episode unique ID key to list of non-idle ranges to keep
    keep_ranges_map[key] = []
    for start, end in zip(keep_true_starts, keep_true_ends, strict=True):
        keep_ranges_map[key].append((int(start), int(end) - filter_last_n_in_ranges))

    # Checkpoint progress periodically so the run can be resumed
    if ep_idx % 1000 == 0:
        with Path(keep_ranges_path).open("w") as f:
            json.dump(keep_ranges_map, f)

print("Done!")
with Path(keep_ranges_path).open("w") as f:
    json.dump(keep_ranges_map, f)
```
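The segmentation in the script hinges on one trick: pad a boolean mask with `False`, take `np.diff` of its integer cast, and read run boundaries off the +1/-1 transitions. A toy sketch of just that trick (the array below is made up for illustration):

```python
import numpy as np

# Toy idle mask: idle runs at indices [0, 2) and [5, 8)
is_idle = np.array([True, True, False, False, False, True, True, True, False])

# Pad with False so runs touching either boundary still produce transitions
padded = np.concatenate([[False], is_idle, [False]])
diff = np.diff(padded.astype(int))
starts = np.where(diff == 1)[0]   # False -> True: an idle run begins here
ends = np.where(diff == -1)[0]    # True -> False: the run ends here (exclusive)

segments = [(int(s), int(e)) for s, e in zip(starts, ends)]
print(segments)  # [(0, 2), (5, 8)]
```

Because the padding guarantees every run produces exactly one +1 and one -1, `starts` and `ends` always pair up, and `ends - starts` gives run lengths directly, which is what the script compares against `min_idle_len` and `min_non_idle_len`.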
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
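On the consumer side, a training pipeline would load the generated json and keep only steps that fall inside a listed range. A minimal sketch, assuming the key format from the script above; the example key and ranges here are hypothetical:

```python
import json

# Hypothetical contents of the generated json: key is
# f"{recording_folderpath}--{file_path}", value is a list of [start, end) ranges
keep_ranges_map = {
    "recordings/ep_0001--raw/ep_0001.h5": [[12, 80], [95, 140]],
}

def keep_step(key, step_idx, ranges_map):
    """Return True if step_idx falls inside any kept [start, end) range for this episode."""
    return any(start <= step_idx < end for start, end in ranges_map.get(key, []))

key = "recordings/ep_0001--raw/ep_0001.h5"
print(keep_step(key, 50, keep_ranges_map))  # True  (inside [12, 80))
print(keep_step(key, 85, keep_ranges_map))  # False (in the filtered-out gap)
```

Episodes absent from the map fall through to an empty range list and are dropped entirely, which matches the script's behavior of only emitting keys it has processed.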