Commit 864c883

Merge pull request #227 from siapy/fix
2 parents d466307 + c6bb864 commit 864c883

19 files changed: +543 -7 lines changed

.pre-commit-config.yaml

Lines changed: 3 additions & 3 deletions
@@ -14,7 +14,7 @@ repos:
       - id: end-of-file-fixer
       - id: trailing-whitespace
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.11.6
+    rev: v0.11.13
     hooks:
       - id: ruff
         args:
@@ -23,7 +23,7 @@ repos:
         files: ^(siapy|tests)/
       - id: ruff-format
   - repo: https://github.com/gitleaks/gitleaks
-    rev: v8.24.3
+    rev: v8.27.2
     hooks:
       - id: gitleaks
   - repo: https://github.com/codespell-project/codespell
@@ -33,7 +33,7 @@ repos:
         additional_dependencies:
           - tomli
   - repo: https://github.com/compilerla/conventional-pre-commit
-    rev: v4.0.0
+    rev: v4.2.0
     hooks:
       - id: conventional-pre-commit
         stages: [commit-msg]

docs/concepts/datasets.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 ??? note "API Documentation"
     `siapy.datasets`

-The `datasets` module provides structured containers and utilities for transforming spectral image data into formats optimized for analysis and machine learning. It bridges the gap between raw spectral data and analytical workflows.
+The datasets module provides structured containers and utilities for transforming spectral image data into formats optimized for analysis and machine learning. It bridges the gap between raw spectral data and analytical workflows.

 ```python
 --8<-- "docs/concepts/src/datasets_01.py"

docs/concepts/features.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# Features

??? note "API Documentation"
    `siapy.features`

The features module provides automated feature engineering and selection capabilities specifically designed for spectral data analysis.

## Spectral Indices

??? api "API Documentation"
    [`siapy.features.spectral_indices`][siapy.features.spectral_indices]

Spectral indices are mathematical combinations of spectral bands that highlight specific characteristics of materials or conditions. The module provides functions to discover available indices and compute them from spectral data.

### Getting available indices

The `get_spectral_indices()` function returns all spectral indices that can be computed from the available bands:

```python
--8<-- "docs/concepts/src/features_01.py"
```

### Computing spectral indices

The `compute_spectral_indices()` function calculates spectral indices from DataFrame data:

```python
--8<-- "docs/concepts/src/features_02.py"
```

### Band mapping

When your data uses non-standard column names, use the `bands_map` parameter:

```python
--8<-- "docs/concepts/src/features_03.py:map"
```

## Automatic feature generation

??? api "API Documentation"
    [`siapy.features.AutoFeatClassification`][siapy.features.AutoFeatClassification]<br>
    [`siapy.features.AutoFeatRegression`][siapy.features.AutoFeatRegression]<br>
    [`siapy.features.AutoSpectralIndicesClassification`][siapy.features.AutoSpectralIndicesClassification]<br>
    [`siapy.features.AutoSpectralIndicesRegression`][siapy.features.AutoSpectralIndicesRegression]

### Mathematically extracted features

The AutoFeat classes provide deterministic wrappers around the AutoFeat library, which automatically generates and selects engineered features through symbolic regression.

```python
--8<-- "docs/concepts/src/features_04.py"
```

### Features extracted using spectral indices

These classes integrate spectral index computation with automated feature selection, offering end-to-end pipelines for identifying the most relevant spectral indices.

```python
--8<-- "docs/concepts/src/features_05.py"
```

## Integration with siapy entities

The features module integrates seamlessly with the siapy entity system.

```python
--8<-- "docs/concepts/src/features_06.py"
```

docs/concepts/optimizers.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Optimizers

??? note "API Documentation"
    `siapy.optimizers`

The optimizers module provides hyperparameter optimization capabilities for machine learning models used in spectral data analysis. It integrates with Optuna for efficient hyperparameter search and includes other evaluation tools.

## Tabular Optimizer

??? api "API Documentation"
    [`siapy.optimizers.optimizers.TabularOptimizer`][siapy.optimizers.optimizers.TabularOptimizer]<br>
    [`siapy.optimizers.configs.TabularOptimizerConfig`][siapy.optimizers.configs.TabularOptimizerConfig]

The `TabularOptimizer` class provides automated hyperparameter optimization for sklearn-compatible models using tabular spectral data.

```python
--8<-- "docs/concepts/src/optimizers_01.py"
```

## Trial Parameters

??? api "API Documentation"
    [`siapy.optimizers.parameters.TrialParameters`][siapy.optimizers.parameters.TrialParameters]<br>
    [`siapy.optimizers.parameters.IntParameter`][siapy.optimizers.parameters.IntParameter]<br>
    [`siapy.optimizers.parameters.FloatParameter`][siapy.optimizers.parameters.FloatParameter]<br>
    [`siapy.optimizers.parameters.CategoricalParameter`][siapy.optimizers.parameters.CategoricalParameter]

Trial parameters define the hyperparameter search space for optimization. You can specify integer, float, and categorical parameters:

```python
--8<-- "docs/concepts/src/optimizers_02.py"
```

## Scorers

??? api "API Documentation"
    [`siapy.optimizers.scorers.Scorer`][siapy.optimizers.scorers.Scorer]

Scorers define how model performance is evaluated during optimization.

### Cross-validation scorer

Use cross-validation for robust model evaluation:

```python
--8<-- "docs/concepts/src/optimizers_03.py"
```

### Hold-out scorer

Use hold-out validation for faster evaluation:

```python
--8<-- "docs/concepts/src/optimizers_04.py"
```

## Integration with siapy entities

The optimizers module integrates seamlessly with the siapy entity system.

```python
--8<-- "docs/concepts/src/optimizers_05.py"
```
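
For readers unfamiliar with search spaces, the following is a minimal sketch of what integer, float, and categorical parameters mean in practice. It uses only the Python standard library and plain random search as a stand-in; it does not use siapy's `TrialParameters` classes or Optuna, and the names `SEARCH_SPACE`, `sample`, `toy_score`, and `random_search` are hypothetical, introduced here for illustration only.

```python
import random

# Hypothetical search space: one entry per hyperparameter kind.
SEARCH_SPACE = {
    "n_estimators": ("int", 10, 200),
    "learning_rate": ("float", 0.001, 0.3),
    "criterion": ("categorical", ["gini", "entropy"]),
}


def sample(space, rng):
    """Draw one configuration from the search space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "int":
            config[name] = rng.randint(spec[1], spec[2])  # both bounds inclusive
        elif kind == "float":
            config[name] = rng.uniform(spec[1], spec[2])
        elif kind == "categorical":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"Unknown parameter kind: {kind}")
    return config


def toy_score(config):
    """Stand-in objective; a real scorer would train and evaluate a model."""
    return -abs(config["n_estimators"] - 100) - abs(config["learning_rate"] - 0.1)


def random_search(space, n_trials, seed=0):
    """Return the best (score, config) pair over n_trials random draws."""
    rng = random.Random(seed)
    trials = [sample(space, rng) for _ in range(n_trials)]
    return max(((toy_score(c), c) for c in trials), key=lambda pair: pair[0])


best_score, best_config = random_search(SEARCH_SPACE, n_trials=50)
print(best_score, best_config)
```

A dedicated optimizer replaces the random draws with a guided sampler and the toy objective with a cross-validation or hold-out score, but the shape of the search space stays the same.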

docs/concepts/src/features_01.py

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
from siapy.features.spectral_indices import get_spectral_indices

# Get indices computable from Red and Green bands
bands = ["R", "G"]
available_indices = get_spectral_indices(bands)
print(f"Found {len(available_indices)} indices")

# Display the names and long names of the available indices
for name, index in available_indices.items():
    print(f"{name}: {index.long_name}")
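
The discovery step can be pictured as a subset test: an index is computable when every band its formula needs is available. The sketch below illustrates that idea with a tiny hand-written catalog. The `CATALOG` dictionary and `computable_indices` function are hypothetical and are not siapy's implementation; the three formulas in the comments are the standard definitions of those indices.

```python
# Hypothetical mini-catalog: index name -> set of band acronyms its formula needs.
# Illustration only; this is not siapy's internal data structure.
CATALOG = {
    "NDVI": {"N", "R"},   # (N - R) / (N + R)
    "GNDVI": {"N", "G"},  # (N - G) / (N + G)
    "NGRDI": {"G", "R"},  # (G - R) / (G + R)
}


def computable_indices(available_bands, catalog=CATALOG):
    """Return the names of indices whose required bands are all available."""
    available = set(available_bands)
    return sorted(name for name, required in catalog.items() if required <= available)


print(computable_indices(["R", "G"]))       # only NGRDI needs nothing beyond R and G
print(computable_indices(["R", "G", "N"]))  # adding NIR unlocks NDVI and GNDVI
```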

docs/concepts/src/features_02.py

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
import numpy as np
import pandas as pd

from siapy.features.spectral_indices import compute_spectral_indices

# Create sample spectral data
np.random.seed(42)
data = pd.DataFrame(
    {
        "R": np.random.random(100),
        "G": np.random.random(100),
    }
)

indices_df = compute_spectral_indices(
    data=data,
    spectral_indices=["BIXS", "RI"],  # Indices to compute
)
print(f"Computed indices:\n{indices_df.head()}")

docs/concepts/src/features_03.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
import numpy as np
import pandas as pd

from siapy.features.spectral_indices import compute_spectral_indices

# --8<-- [start:map]
# Data with custom column names
custom_data = pd.DataFrame(
    {"red_band": np.random.random(100), "green_band": np.random.random(100), "nir_band": np.random.random(100)}
)

# Map custom names to standard band acronyms
bands_map = {"red_band": "R", "green_band": "G", "nir_band": "N"}

indices_df = compute_spectral_indices(data=custom_data, spectral_indices=["NDVI", "GNDVI"], bands_map=bands_map)

# --8<-- [end:map]
print(f"Computed indices with custom mapping:\n{indices_df.head()}")
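
To make the band mapping concrete, here is a minimal standard-library sketch that renames custom columns to band acronyms and then evaluates NDVI and GNDVI by hand, using their standard definitions, (N - R) / (N + R) and (N - G) / (N + G). The helpers `normalized_difference` and `compute_by_hand` are hypothetical names for this illustration; the sketch does not call siapy and makes no claim about how `compute_spectral_indices` works internally.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def compute_by_hand(rows, bands_map):
    """Rename custom columns to band acronyms, then compute NDVI and GNDVI per row."""
    results = []
    for row in rows:
        # Keep only mapped columns, keyed by their standard acronym.
        bands = {bands_map[name]: value for name, value in row.items() if name in bands_map}
        results.append(
            {
                "NDVI": normalized_difference(bands["N"], bands["R"]),
                "GNDVI": normalized_difference(bands["N"], bands["G"]),
            }
        )
    return results


rows = [
    {"red_band": 0.1, "green_band": 0.2, "nir_band": 0.5},
    {"red_band": 0.3, "green_band": 0.3, "nir_band": 0.3},
]
bands_map = {"red_band": "R", "green_band": "G", "nir_band": "N"}
for result in compute_by_hand(rows, bands_map):
    print(result)
```

The second row has equal reflectance in every band, so both normalized differences come out as zero.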

docs/concepts/src/features_04.py

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
import pandas as pd
from sklearn.datasets import make_classification

from siapy.features import AutoFeatClassification

# Generate sample data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
data = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
target = pd.Series(y)

# Create and configure AutoFeat
autofeat = AutoFeatClassification(
    random_seed=42,  # For reproducibility
    verbose=1,  # Show progress
)

# Fit and transform
features_engineered = autofeat.fit_transform(data, target)
print(f"Original features: {data.shape[1]}")
print(f"Engineered features: {features_engineered.shape[1]}")
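
The general generate-then-select idea behind automated feature engineering can be sketched in a few lines. The example below expands two columns with squares and pairwise products, then keeps the candidate most correlated with the target. This is a simplified illustration under stated assumptions: it does not reproduce the AutoFeat library's actual transformations or selection procedure, and the helpers `pearson`, `expand`, and `select_top` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def expand(columns):
    """Add squares and pairwise products of the original columns."""
    expanded = dict(columns)
    names = list(columns)
    for i, a in enumerate(names):
        expanded[f"{a}**2"] = [v * v for v in columns[a]]
        for b in names[i + 1:]:
            expanded[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    return expanded


def select_top(columns, target, k):
    """Keep the k columns with the largest absolute correlation to the target."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True)
    return ranked[:k]


rng = random.Random(42)
f0 = [rng.uniform(-1, 1) for _ in range(200)]
f1 = [rng.uniform(-1, 1) for _ in range(200)]
target = [a * b for a, b in zip(f0, f1)]  # depends only on the interaction term

candidates = expand({"f0": f0, "f1": f1})
print(select_top(candidates, target, k=1))
```

Neither original column is linearly related to the target here, so only the generated interaction term ranks first, which is the situation engineered features are meant to address.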

docs/concepts/src/features_05.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
import pandas as pd
from sklearn.datasets import make_classification

from siapy.features import AutoSpectralIndicesClassification
from siapy.features.helpers import FeatureSelectorConfig
from siapy.features.spectral_indices import get_spectral_indices

# Create spectral-like data
X, y = make_classification(n_samples=300, n_features=4, random_state=42)
data = pd.DataFrame(X, columns=["R", "G", "B", "N"])  # Red, Green, Blue, NIR
target = pd.Series(y)

# Get available spectral indices
available_indices = get_spectral_indices(["R", "G", "B", "N"])
print(f"Available indices: {len(available_indices)}")

# Configure feature selection
config = FeatureSelectorConfig(
    k_features=(5, 20),  # Select 5-20 best indices
    cv=5,  # Cross-validation for feature selection
    verbose=1,
)

# Create automated spectral indices classifier
auto_spectral = AutoSpectralIndicesClassification(
    spectral_indices=list(available_indices.keys()),
    selector_config=config,
    merge_with_original=True,  # Include original bands
)

# Fit and transform
enhanced_features = auto_spectral.fit_transform(data, target)
print(f"\nOriginal features: {data.shape[1]}")
print(f"Enhanced features: {enhanced_features.shape[1]}")

docs/concepts/src/features_06.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

from siapy.entities import Pixels, Signatures, SpectralImage
from siapy.features import AutoSpectralIndicesClassification
from siapy.features.helpers import FeatureSelectorConfig
from siapy.features.spectral_indices import compute_spectral_indices, get_spectral_indices

# Create a mock spectral image with 4 bands (Red, Green, Blue, Near-infrared)
rng = np.random.default_rng(seed=42)
image_array = rng.random((50, 50, 4))  # height, width, bands (R, G, B, N)
image = SpectralImage.from_numpy(image_array)

# Define region of interest (ROI) pixels for sampling
roi_pixels = Pixels.from_iterable(
    [(10, 15), (12, 18), (15, 20), (18, 22), (20, 25), (25, 30), (28, 32), (30, 35), (32, 38), (35, 40)]
)

# Extract spectral signatures from ROI pixels
signatures = image.to_signatures(roi_pixels)
print(f"Extracted {len(signatures)} signatures from the image")

# Convert signatures to DataFrame and assign standard band names
spectral_data = signatures.signals.df.copy()
spectral_data = spectral_data.rename(columns=dict(zip(spectral_data.columns, ["R", "G", "B", "N"])))

# Create synthetic classification labels for demonstration purposes
_, y = make_classification(n_samples=len(spectral_data), n_features=4, random_state=42)
target = pd.Series(y[: len(spectral_data)])

# Get all spectral indices that can be computed with available bands
available_indices = get_spectral_indices(["R", "G", "B", "N"])
print(f"Found {len(available_indices)} computable spectral indices")

# Method 1: Manually compute spectral indices
indices_df = compute_spectral_indices(
    data=spectral_data,
    spectral_indices=list(available_indices.keys())[:10],  # Use first 10 indices
)
print(f"Computed {indices_df.shape[1]} spectral indices")

# Method 2: Automated feature selection with spectral indices
# Configure the feature selector
config = FeatureSelectorConfig(
    k_features=5,  # Select 5 best performing indices
    cv=3,  # Use 3-fold cross-validation
    verbose=0,
)

# Create automated selector that finds optimal spectral indices
auto_spectral = AutoSpectralIndicesClassification(
    spectral_indices=list(available_indices.keys())[:15],  # Use first 15 indices as candidates
    selector_config=config,
    merge_with_original=False,  # Return only selected indices, not original bands
)

# Apply feature selection to find the best spectral indices
selected_features = auto_spectral.fit_transform(spectral_data, target)
print(f"Selected {selected_features.shape[1]} optimal spectral indices")

# Create new signatures object with selected features
enhanced_signatures = Signatures.from_signals_and_pixels(signals=selected_features, pixels=signatures.pixels)
print(f"Created enhanced signatures with shape: {enhanced_signatures.signals.df.shape}")

# Display results - the enhanced signatures contain only the most informative spectral indices
print(f"Enhanced signatures DataFrame:\n{enhanced_signatures.to_dataframe().head()}")
