Commit a1cd179

Bump version to 1.0.0 (#163)
* (chore) version bump to 1.0.0
* (docs) Update curator panel documentation and fix mkdocs formatting
1 parent af33599 commit a1cd179

9 files changed, +66 -69 lines changed

docs/img/curator-exclusion.png

-287 KB

docs/img/curator-panel.png

28.6 KB

docs/img/curator-preview.png

-180 KB

docs/img/curator-setup.png

-130 KB

docs/user-guide/curator-mode.md

Lines changed: 60 additions & 63 deletions
@@ -1,13 +1,13 @@
-# Image Generation Model Dataset Curator
+# Dataset Curation

-The Dataset Curator is a powerful feature in PhotoMapAI designed to help you select a diverse or representative subset of images from a large album. This is particularly useful for creating training datasets for LoRA (Low-Rank Adaptation) models or simply thinning out a large collection using CLIP embeddings as the driver.
+The **Dataset Curator** is a powerful feature in PhotoMapAI designed to help you select a diverse or representative subset of images from a large album. This is particularly useful for creating training datasets for LoRA (Low-Rank Adaptation) image generation/classification models or simply reducing the redundancy in a large collection of images.

 ![Curator Mode Panel](../img/curator-panel.png)

 ## Accessing the Curator

-1. Open an album in the grid view.
-2. Click the **Favorites** menu button (⭐) in the top-right corner.
+1. Open an album in the grid or semantic map view.
+2. Click the **Favorites** menu button (⭐) in the bottom-right of the window.
 3. Select **Curate** (pencil icon 📝) from the dropdown menu.

 The curator panel will appear and can be repositioned by dragging its title bar.
@@ -18,59 +18,66 @@ The curator offers two distinct algorithms for selecting images, selectable via

 ### Diversity (FPS)
 **Farthest Point Sampling** selects images that are as different from each other as possible.
-- **Best for:** Ensuring your dataset covers the widest possible range of visual concepts, lighting conditions, and angles.
-- **When to use:**
-- **High Quality Data:** FPS seeks outliers. In a "dirty" dataset, outliers are often blurry or broken images. In a "clean" dataset, outliers are your rare concepts (side profiles, dramatic lighting).
-- **Unbalanced Data:** If you have 50 full-body images and 10 close-ups, FPS will prioritize the close-ups to ensure the AI learns the rare concept, rather than just the common one.
-- **How it works:** It starts with a random image (or your excluded selection) and iteratively picks the image whose feature vector is farthest from the current set.
+
+* **Best for:** Ensuring your dataset covers the widest possible range of visual concepts, lighting conditions, and angles. Use it for:
+
+* **High Quality Data:** FPS seeks outliers. In a "dirty" dataset, outliers are often blurry or broken images. In a "clean" dataset, outliers are your rare concepts (side profiles, dramatic lighting).
+
+* **Unbalanced Data:** If you have 50 full-body images and 10 close-ups, FPS will prioritize the close-ups to ensure the AI learns the rare concept, rather than just the common one.
+
+* **How it works:** It starts with a random image (or your excluded selection) and iteratively picks the image whose feature vector is farthest from the current set.

 ### Blocks (K-Means)
 **K-Means Clustering** groups images into clusters and picks a representative image from each cluster.
-- **Best for:** Reducing redundancy while maintaining the overall distribution of the dataset (Representative Sampling).
-- **When to use:**
-- **Balanced Distribution:** If you have 50 full-body images and 10 close-ups, K-Means will select roughly 5 full-body images for every 1 close-up, preserving the original ratios of your dataset.
-- **How it works:** It divides your images into N clusters (where N is your target count) and selects the image closest to the mathematical center of each cluster.

+* **Best for:** Reducing redundancy while maintaining the overall distribution of the dataset (Representative Sampling). Use it for:
+
+* **Balanced Distribution:** If you have 50 full-body images and 10 close-ups, K-Means will select roughly 5 full-body images for every 1 close-up, preserving the original ratios of your dataset.

+* **How it works:** It divides your images into N clusters (where N is your target count) and selects the image closest to the mathematical center of each cluster.

 ## Workflow
+
+<img src="../../img/curator-setup.png" width="480" alt="Curator Panel Setup" class="img-hover-zoom">
+
 1. **Setup your UI**:
 - When the curator panel opens, the UMAP visualization automatically switches to grey mode - all points turn grey to make the colored selection overlays more visible.
 - Unclustered points (normally very faint) increase in opacity to match clustered points, providing a uniform background.
-- Recommend turning off "Show landmarks" and "Show hover thumbnails" in the UMAP controls for a cleaner view.
-
-![Curator Mode Setup](../img/curator-setup.png)
-
+- It is recommended to turn off "Show landmarks" and "Show hover thumbnails" in the UMAP controls for a cleaner view.
 2. **Set Target Count**: Choose how many images you want in your final set (e.g., 50, 150).
 3. **Set Iterations**:
 - Algorithms like FPS can be sensitive to the starting point. Running multiple iterations (Monte Carlo simulation) helps identify the "consensus" selections—images that are statistically important regardless of the random start.
 - **Recommendation:** Set to 20 iterations for analysis.
-4. **Run Selection**: Click **Select Training Set** to select a diverse distribution of images.
-- A yellow-and-white progress bar appears below the title, showing real-time progress (e.g., "Iteration 5/20").
+4. **Run Selection**: Click **Select Images** (circled button) to select a diverse distribution of images.
+- A yellow-and-white progress bar appears below the title, showing the progress of the selected algorithm.

-![Curator Mode Preview](../img/curator-preview.png)
+<img src="../../img/curator-preview.png" width="480" alt="After selecting images" class="img-hover-zoom">

 ### Stability Heatmap
 The results are displayed as a Stability Heatmap:
-- 🟣 **Magenta**: Core Outliers (Selected in >90% of runs). These are your most mathematically unique images.
-- 🔵 **Cyan**: Stable (Selected in >70% of runs).
-- 🟢 **Green**: Variable (Selected in <70% of runs). Edge cases that usually fill gaps.

-Unselected images will be dimmed. When you have an active curation selection, the "Exit Search" button appears, allowing you to clear the selection and return to normal view.
+* 🟣 **Magenta**: Core Outliers (Selected in >90% of runs). These are your most mathematically unique images.
+* 🔵 **Cyan**: Stable (Selected in >70% of runs).
+* 🟢 **Green**: Variable (Selected in <70% of runs). Edge cases that usually fill gaps.
+
+If you now open the grid view (by hiding or minimizing the semantic map window) you will see the selected images at full brightness, while others will be dimmed. Press the "Clear" button on the curator panel or the "X" button on the bottom right of the main window, in order to clear the search and return to the normal view.

 ## Refinement & Exclusion
-You can manually refine the selection by "Excluding" images. Excluding an image removes it from calculations and exports.
+You can manually refine the selection by excluding images. Excluding an image removes it from calculations and exports. This is commonly needed when your collection contains "garbage", such as blank or blurry images, that appear to the algorithm as interesting outliers.

 This allows for a "Drill Down" workflow:
+
 1. Run the analysis.
-2. If the top results (Magenta) are garbage (e.g., blurry images), Exclude them.
-3. Run Select Diverse Images again. The algorithm is forced to ignore the excluded images and find the next best candidates.

-- **Click-to-Exclude**: Toggle this mode and click images in the grid (or UMAP) to exclude/include them. Excluded images appear with a **Red Border**.
-- **Exclude Matches**: Bulk-exclude all images that meet a certain frequency threshold (e.g., >90%).
-- **Clear Exclusions**: Clear all exclusions and restart the analysis.
+2. If the top results (🟣 **Magenta**) are garbage, exclude them.

-![Exclusion Example](../img/curator-exclusion.png)
+3. Run **Select Images** again. The algorithm will be forced to ignore the excluded images and find the next best candidates.
+
+- **Click-to-Exclude**: Toggle this mode and click images in the grid (or UMAP) to exclude/include them. Excluded images appear as solid 🔴 **Red** circles. (See image below. Yellow arrows added for emphasis.)
+- **Exclude Matches**: Bulk-exclude all images that meet a certain frequency threshold (e.g., >90%).
+- **Clear Exclusions**: Clear all exclusions in order to start over.
+
+<img src="../../img/curator-exclusion.png" width="480" alt="After excluding garbage" class="img-hover-zoom">

 ## Recommended Workflows
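
Reviewer note: the Farthest Point Sampling, iteration, and Stability Heatmap behaviour documented in the hunk above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not PhotoMapAI's implementation. The function names (`farthest_point_sampling`, `selection_frequency`, `stability_label`), the use of Euclidean distance, the treatment of excluded images as simply skipped, and the handling of a frequency of exactly 70% are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def farthest_point_sampling(vectors, target_count, excluded=(), rng=None):
    """Greedy FPS over a list of equal-length float tuples (stand-in embeddings).

    Starts from a random non-excluded point, then repeatedly adds the point
    whose distance to its nearest already-selected point is largest.
    """
    rng = rng or random.Random()
    excluded = set(excluded)
    candidates = [i for i in range(len(vectors)) if i not in excluded]
    if not candidates or target_count <= 0:
        return []
    first = rng.choice(candidates)
    selected = [first]
    # remaining[i] = distance from point i to its nearest selected point
    remaining = {
        i: math.dist(vectors[i], vectors[first]) for i in candidates if i != first
    }
    while remaining and len(selected) < target_count:
        nxt = max(remaining, key=remaining.get)
        selected.append(nxt)
        del remaining[nxt]
        for i in remaining:
            remaining[i] = min(remaining[i], math.dist(vectors[i], vectors[nxt]))
    return selected


def selection_frequency(vectors, target_count, iterations=20, excluded=(), seed=0):
    """Run FPS from several random starts; return the fraction of runs per index."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(iterations):
        counts.update(farthest_point_sampling(vectors, target_count, excluded, rng))
    return {i: n / iterations for i, n in counts.items()}


def stability_label(freq):
    """Map a selection frequency to the documented heatmap colours.

    The docs give >90% (magenta), >70% (cyan) and <70% (green); assigning
    exactly 70% to green is an assumption of this sketch.
    """
    if freq > 0.9:
        return "magenta"
    if freq > 0.7:
        return "cyan"
    return "green"
```

With a tight cluster of near-duplicate points plus two distant outliers, the outliers are chosen in every run regardless of the random start, which is the "consensus" effect the iterations setting is meant to surface.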

@@ -87,40 +94,30 @@ This allows for a "Drill Down" workflow:
 2. Set **Target Count** to your desired training size (e.g., 150).
 3. Set **Iterations** to 20.
 4. Click **Select Training Set**.
-5. Review the selection. If you see images you don't want in your LoRA, **Exclude** them and run Select Diverse Images again to replace them with fresh alternatives.
-6. **Export Dataset**.
+5. Review the selection. If you see images you don't want in your training set, **Exclude** them and run **Select Images** again to replace them with fresh alternatives.
+6. Repeat as needed.

 ## Exporting
-![Ready To Export Example](../img/curator-readytoexport.png)
-Once you are satisfied with your selection (Magenta/Cyan/Green images):
+
+Once you are satisfied with your selection:
+
 1. Click the folder icon (📁) next to the **Export Path** field to browse for a destination folder.
 - The selected path is saved in your browser and persists across sessions.
-- The Export Dataset button remains disabled until a valid path is selected.
+
 2. Click **Export Dataset**.
-3. The system will copy the selected images (and associated text files) to the folder.
-4. Click the **CSV** button to export data on the included and excluded files.
-5. Click the **Set Favorites** button (⭐) to replace your current favorites with the curated selection.
-- The star button is disabled when there's no selection.
-- This provides quick access to your curated images for review.
-
-* *Note: Text files are also exported! If you have 0001.jpg and 0001.txt in the album, they will be exported together.*
-* *Note: Excluded (Red) images are NOT exported.*
-* *Note: Filename collisions (e.g. apple/01.jpg vs orange/01.jpg) are automatically handled by renaming.*
-
-## Clearing Results
-When you have an active curation selection, the "Exit Search" button becomes visible in the search panel. Click it to:
-- Clear the curation selection
-- Remove colored overlays from the UMAP
-- Return the UMAP to normal cluster colors
-- Hide the "Exit Search" button
-
-## Visual Feedback
-- **Panel Position**: The curator panel can be dragged by its title bar to any position on screen
-- **UMAP Integration**: When the panel is open, the UMAP automatically adjusts:
-- All points turn grey for better contrast with selection colors
-- Unclustered points become fully visible (opacity 0.75)
-- The current image marker (yellow dot) remains visible
-- **Progress Tracking**: Real-time iteration progress with accurate percentage display
-- **Button States**: All action buttons (Export, CSV, Set Favorites) are intelligently enabled/disabled based on selection state
-
-### Contact /u/AcadiaVivid on reddit or NMWave on github for more info on implementation.
+
+3. The system will copy the selected images (and associated text files, see below) to the folder. The original images will remain in place.
+
+4. Click the **CSV** button to export a tab-delimited inventory of the included and excluded files.
+
+At any point, you may also click the **Set Favorites** button (⭐) to replace your current favorites with the curated selection. This allows you to show and hide the selection conveniently using the **Favorites** menu, as well as to move the selected images to a new folder while preserving them in the index.
+
+## Notes
+
+* *Text files are also exported! If you have 0001.jpg and 0001.txt in the album, they will be exported together. This is useful for maintaining external text annotations of images.
+* *Excluded (Red) images are NOT exported.
+* *Filename collisions (e.g. apple/01.jpg vs orange/01.jpg) are automatically handled by renaming.
+
+## For More Information
+
+Contact */u/AcadiaVivid* on reddit or *NMWave* on github for assistance and information.
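
Reviewer note: the export rules stated in the Notes above (images are copied rather than moved, a same-named `.txt` file travels with its image, excluded images are skipped, and filename collisions are resolved by renaming) could look roughly like the following. This is a hedged sketch, not the project's export code. The function name `export_selection` and the `_1`, `_2` suffix scheme are hypothetical, since the documentation does not say how renamed files are named.

```python
import shutil
from pathlib import Path


def export_selection(selected, excluded, dest_dir):
    """Copy selected images (plus any same-stem .txt file) into dest_dir.

    Excluded paths are skipped. When two sources share a filename, the later
    one gets a numeric suffix (hypothetical scheme: 01.jpg, 01_1.jpg, ...).
    Originals are left in place because files are copied, not moved.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    excluded = {Path(p) for p in excluded}
    used_stems = set()
    exported = []
    for src in map(Path, selected):
        if src in excluded:
            continue
        # Pick a stem not yet used in this export and not already on disk.
        stem, n = src.stem, 1
        while stem in used_stems or (dest_dir / f"{stem}{src.suffix}").exists():
            stem = f"{src.stem}_{n}"
            n += 1
        used_stems.add(stem)
        target = dest_dir / f"{stem}{src.suffix}"
        shutil.copy2(src, target)
        # Keep the text annotation paired with its (possibly renamed) image.
        sidecar = src.with_suffix(".txt")
        if sidecar.exists():
            shutil.copy2(sidecar, dest_dir / f"{stem}.txt")
        exported.append(target)
    return exported
```

Renaming by stem rather than by full filename keeps an image and its text annotation under the same name after a collision, which matters if the export is used as a captioned training set.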

mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ nav:
 - Albums: user-guide/albums.md
 - Configuration: user-guide/configuration.md
 - Semantic Map: user-guide/semantic-map.md
-- Curator Mode: user-guide/curator-mode.md
+- Image Dataset Curation: user-guide/curator-mode.md
 - Updating PhotoMapAI: user-guide/upgrading.md
 - Keyboard Shortcuts: user-guide/keyboard-shortcuts.md
 - Running from Docker: docker.md

photomap/frontend/static/javascript/curation.js

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 import { bookmarkManager } from './bookmarks.js';
 import { createSimpleDirectoryPicker } from './filetree.js';
 import { updateSearchCheckmarks } from './search-ui.js';
-import { state } from './state.js';
 import { slideState } from './slide-state.js';
+import { state } from './state.js';
 import { highlightCurationSelection, setCurationMode, setUmapClickCallback, updateCurrentImageMarker } from './umap.js';
 import { hideSpinner, showSpinner } from './utils.js';

@@ -265,7 +265,7 @@ function setupEventListeners() {
     if (lockThresholdBtn) {
         lockThresholdBtn.onclick = () => {
             if (analysisResults.length === 0) {
-                setStatus("No analysis data. Run training set selection first.", "error");
+                setStatus("No analysis data. Run Select Images first.", "error");
                 return;
             }
271271

photomap/frontend/templates/modules/curation.html

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 <!-- Curation / LoRA Training Panel -->
 <div id="curationPanel" class="curation-panel hidden">
   <div class="curation-header">
-    <h3>Model Training Dataset Curator</h3>
+    <h3>Dataset Curator</h3>
     <button id="curationCloseBtn" class="close-icon">&times;</button>
   </div>

@@ -64,7 +64,7 @@ <h3>Model Training Dataset Curator</h3>

 <!-- Main Actions -->
 <div class="action-row" style="display: flex; gap: 5px; margin-bottom: 15px;">
-  <button id="curationRunBtn" class="btn-primary" style="flex: 2;">Select Training Set</button>
+  <button id="curationRunBtn" class="btn-primary" style="flex: 2;">Select Images</button>
   <button id="curationClearBtn" class="btn-secondary" style="flex: 1; background: #444;">Clear</button>
 </div>
7070

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "photomapai"
-version = "0.9.7"
+version = "1.0.0"
 description = "AI-based image clustering and exploration tool"
 authors = [
     { name = "Lincoln Stein", email = "[email protected]" }
