Commit b907a01

Merge pull request #182 from jkmckenna/0.2.5
0.2.5
2 parents 762ca16 + 19e32c9 commit b907a01

32 files changed (+333 / -149 lines)

README.md

Lines changed: 0 additions & 19 deletions
@@ -12,22 +12,3 @@ The following CLI tools need to be installed and configured before using the inf
 1) [Dorado](https://github.com/nanoporetech/dorado) -> Basecalling, alignment, demultiplexing. Required for Nanopore SMF experiments, but not Illumina SMF experiments.
 2) [Minimap2](https://github.com/lh3/minimap2) -> Aligner if not using dorado. Support for other aligners could eventually be added if needed.
 3) [Modkit](https://github.com/nanoporetech/modkit) -> Extracting read level methylation metrics from the MM/ML tags in BAM files. Only required for direct modification detection SMF protocols.
-
-## Announcements
-
-### 12/02/25 - Version 0.2.3 is available through PyPI
-Version 0.2.3 provides the core smftools functionality through several command line commands (load, preprocess, spatial, hmm).
-
-### 11/05/25 - Version 0.2.1 is available through PyPI
-Version 0.2.1 makes the core workflow (smftools load) a command line tool that takes in an experiment_config.csv file for input/output and parameter management.
-
-### 05/29/25 - Version 0.1.6 is available through PyPI.
-Informatics, preprocessing, tools, plotting modules have core functionality that is approaching stability on MacOS(Intel/Silicon) and Linux(Ubuntu). I will work on improving documentation/tutorials shortly. The base PyTorch/Scikit-Learn ML-infrastructure is going through some organizational changes to work with PyTorch Lightning, Hydra, and WanDB to facilitate organizational scaling, multi-device usage, and logging.
-
-### 10/01/24 - More recent versions are being updated frequently. Installation from source over PyPI is recommended!
-
-### 09/09/24 - The version 0.1.1 package ([smftools-0.1.1](https://pypi.org/project/smftools/)) is installable through pypi!
-The informatics module has been bumped to alpha-phase status. This module can deal with POD5s and unaligned BAMS from nanopore conversion and direct SMF experiments, as well as FASTQs from Illumina conversion SMF experiments. Primary output from this module is an AnnData object containing all relevant SMF data, which is compatible with all downstream smftools modules. The other modules are still in pre-alpha phase. Preprocessing, Tools, and Plotting modules should be promoted to alpha-phase within the next month or so.
-
-### 08/30/24 - The version 0.1.0 package ([smftools-0.1.0](https://pypi.org/project/smftools/)) is installable through pypi!
-Currently, this package (smftools-0.1.0) is going through rapid improvement (dependency handling accross Linux and Mac OS, testing, documentation, debugging) and is still too early in development for widespread use. The underlying functionality was originally developed as a collection of scripts for single molecule footprinting (SMF) experiments in our lab, but is being packaged/developed to facilitate the expansion of SMF to any lab that is interested in performing these styles of experiments/analyses. The alpha-phase package is expected to be available within a couple months, so stay tuned!

docs/source/basic_usage.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ This command takes a user passed config file handling:
 - Experiment info (SMF modality, sequencer type, barcoding kit if nanopore, sample sheet with metadata mapping)
 - Options to override default workflow parameters from smftools/config. Params are handled from default.yaml -> modality_type.yaml -> user passed config.csv.
 
-![](docs/source/_static/smftools_informatics_diagram.png)
+![](_static/smftools_informatics_diagram.png)
 
 ## Preprocess Usage
 
@@ -23,7 +23,7 @@ This command performs preprocessing on the anndata object. It automatically runs
 smftools preprocess "/Path_to_experiment_config.csv"
 ```
 
-![](docs/source/_static/smftools_preprocessing_diagram.png)
+![](_static/smftools_preprocessing_diagram.png)
 
 ## Spatial Usage
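The two image fixes follow from how relative paths are resolved. Assuming the docs are built with Sphinx and MyST (the `(v0.1.0)=` target labels in the release notes suggest MyST), a relative image path is resolved against the directory of the page that contains it, so a repository-root path ends up with `docs/source/` prepended twice. The sketch below only does the path arithmetic with `pathlib`; `resolve_image` is a hypothetical helper, not part of smftools or Sphinx.

```python
from pathlib import PurePosixPath


def resolve_image(page: PurePosixPath, image_ref: str) -> PurePosixPath:
    """Hypothetical helper: join an image reference onto its page's directory."""
    return page.parent / image_ref


# Assumption: relative image paths resolve against the containing page's directory.
page = PurePosixPath("docs/source/basic_usage.md")

old = resolve_image(page, "docs/source/_static/smftools_informatics_diagram.png")
new = resolve_image(page, "_static/smftools_informatics_diagram.png")

print(old)  # docs/source/docs/source/_static/smftools_informatics_diagram.png
print(new)  # docs/source/_static/smftools_informatics_diagram.png
```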

docs/source/release-notes/0.1.0.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 (v0.1.0)=
 
 0.1.0 2024-08-30
-smftools initial release. Pre-Alpha phase. J McKenna
+smftools initial release. Pre-Alpha phase. J McKenna
+Currently, this package (smftools-0.1.0) is going through rapid improvement (dependency handling accross Linux and Mac OS, testing, documentation, debugging) and is still too early in development for widespread use. The underlying functionality was originally developed as a collection of scripts for single molecule footprinting (SMF) experiments in our lab, but is being packaged/developed to facilitate the expansion of SMF to any lab that is interested in performing these styles of experiments/analyses. The alpha-phase package is expected to be available within a couple months, so stay tuned!

docs/source/release-notes/0.1.1.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+(v0.1.1)=
+
+0.1.1 2024-09-09
+The informatics module has been bumped to alpha-phase status. This module can deal with POD5s and unaligned BAMS from nanopore conversion and direct SMF experiments, as well as FASTQs from Illumina conversion SMF experiments. Primary output from this module is an AnnData object containing all relevant SMF data, which is compatible with all downstream smftools modules. The other modules are still in pre-alpha phase. Preprocessing, Tools, and Plotting modules should be promoted to alpha-phase within the next month or so.

docs/source/release-notes/0.1.6.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+(v0.1.6)=
+
+0.1.6 2025-05-29
+Informatics, preprocessing, tools, plotting modules have core functionality that is approaching stability on MacOS (Intel/Silicon) and Linux (Ubuntu). Documentation/tutorials are still being improved. The base PyTorch/Scikit-Learn ML-infrastructure is going through organizational changes to work with PyTorch Lightning, Hydra, and WanDB to facilitate scaling, multi-device usage, and logging.

docs/source/release-notes/0.2.1.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+(v0.2.1)=
+
+0.2.1 2025-11-05
+Version 0.2.1 makes the core workflow (smftools load) a command line tool that takes in an experiment_config.csv file for input/output and parameter management.

docs/source/release-notes/0.2.3.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+(v0.2.3)=
+
+0.2.3 2025-12-02
+Version 0.2.3 provides the core smftools functionality through several command line commands (load, preprocess, spatial, hmm).
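The announcements removed from the README reappear as per-version pages under `docs/source/release-notes/`, each with the same four-line layout: a MyST target label such as `(v0.2.3)=`, a blank line, the version and date, then a one-paragraph summary. As a sketch of that layout only, the hypothetical helper below renders one page from its parts; it is not a tool shipped with smftools, and the trailing newline is an assumption.

```python
def render_release_note(version: str, date: str, summary: str) -> str:
    """Hypothetical helper: lay out one release-note page like the files above."""
    lines = [
        f"(v{version})=",     # MyST target label, e.g. (v0.2.3)=
        "",                   # blank line after the label
        f"{version} {date}",  # version and ISO 8601 date
        summary,              # one-paragraph summary
    ]
    return "\n".join(lines) + "\n"  # assumed: file ends with a newline


note = render_release_note(
    "0.2.3",
    "2025-12-02",
    "Version 0.2.3 provides the core smftools functionality through several "
    "command line commands (load, preprocess, spatial, hmm).",
)
print(note, end="")
```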

src/smftools/hmm/HMM.py

Lines changed: 18 additions & 3 deletions
@@ -10,6 +10,9 @@
 import torch.nn as nn
 from scipy.sparse import issparse
 
+from smftools.logging_utils import get_logger
+
+logger = get_logger(__name__)
 # =============================================================================
 # Registry / Factory
 # =============================================================================
@@ -1228,7 +1231,11 @@ def fit_em(
 self._normalize_emission()
 
 if verbose:
-    print(f"[SingleBernoulliHMM.fit] iter={it} ll_proxy={hist[-1]:.6f}")
+    logger.info(
+        "[SingleBernoulliHMM.fit] iter=%s ll_proxy=%.6f",
+        it,
+        hist[-1],
+    )
 
 if len(hist) > 1 and abs(hist[-1] - hist[-2]) < float(tol):
     break
@@ -1450,7 +1457,11 @@ def fit_em(
 self._normalize_emission()
 
 if verbose:
-    print(f"[MultiBernoulliHMM.fit] iter={it} ll_proxy={hist[-1]:.6f}")
+    logger.info(
+        "[MultiBernoulliHMM.fit] iter=%s ll_proxy=%.6f",
+        it,
+        hist[-1],
+    )
 
 if len(hist) > 1 and abs(hist[-1] - hist[-2]) < float(tol):
     break
@@ -1783,7 +1794,11 @@ def fit_em(
 self._normalize_trans_by_bin()
 
 if verbose:
-    print(f"[DistanceBinnedSingle.fit] iter={it} ll_proxy={hist[-1]:.6f}")
+    logger.info(
+        "[DistanceBinnedSingle.fit] iter=%s ll_proxy=%.6f",
+        it,
+        hist[-1],
+    )
 
 if len(hist) > 1 and abs(hist[-1] - hist[-2]) < float(tol):
     break
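Besides routing iteration progress through the module logger, the new calls pass values as `%`-style arguments instead of pre-building an f-string. With the standard `logging` module, those arguments are only interpolated when a record is actually created and formatted, so when INFO is disabled on the logger the string work is skipped entirely; the `%.6f` spec also renders the same text as the old `:.6f` spec. The sketch below demonstrates both points with a plain `logging.getLogger` stand-in, assuming `smftools.logging_utils.get_logger` returns a standard `logging.Logger` (its implementation is not part of this diff); `RenderCounter` is purely illustrative.

```python
import logging


class RenderCounter:
    """Illustrative argument that counts how often it is converted to text."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "value"


# Stand-in for smftools.logging_utils.get_logger(__name__) (assumed to return
# a standard logging.Logger; the real helper is not shown in this diff).
logger = logging.getLogger("example.hmm")
logger.setLevel(logging.WARNING)  # INFO records are filtered out

arg = RenderCounter()

# Old style: the f-string is built before the call, whatever the log level.
message = f"iter={arg}"
eager_renders = arg.renders

# New style: %-arguments are interpolated only if a record is emitted.
logger.info("iter=%s", arg)
lazy_renders = arg.renders - eager_renders

# The %-spec in the diff renders the same text as the old f-string spec.
ll_proxy = -1234.56789123
assert "ll_proxy=%.6f" % ll_proxy == f"ll_proxy={ll_proxy:.6f}"

print(eager_renders, lazy_renders)  # prints: 1 0
```

Note that the calls in `fit_em` are still wrapped in `if verbose:`, so both the `verbose` flag and the logger's level have to allow a progress line through.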

src/smftools/hmm/call_hmm_peaks.py

Lines changed: 28 additions & 10 deletions
@@ -3,6 +3,10 @@
 from pathlib import Path
 from typing import Any, Dict, Optional, Sequence, Union
 
+from smftools.logging_utils import get_logger
+
+logger = get_logger(__name__)
+
 
 def call_hmm_peaks(
     adata,
@@ -107,8 +111,10 @@ def call_hmm_peaks(
 candidates = [feature_key]
 
 if not candidates:
-    print(
-        f"[call_hmm_peaks] WARNING: no layers found matching '{feature_key}' in ref '{ref}'. Skipping."
+    logger.warning(
+        "[call_hmm_peaks] No layers found matching '%s' in ref '%s'. Skipping.",
+        feature_key,
+        ref,
     )
     continue
 
@@ -121,17 +127,22 @@ def call_hmm_peaks(
 
 for layer_name in candidates:
     if layer_name not in adata.layers:
-        print(
-            f"[call_hmm_peaks] WARNING: layer '{layer_name}' not in adata.layers; skipping."
+        logger.warning(
+            "[call_hmm_peaks] Layer '%s' not in adata.layers; skipping.",
+            layer_name,
         )
         continue
 
     # Dense layer data
     L = adata.layers[layer_name]
     L = L.toarray() if issparse(L) else np.asarray(L)
     if L.shape != (adata.n_obs, adata.n_vars):
-        print(
-            f"[call_hmm_peaks] WARNING: layer '{layer_name}' has shape {L.shape}, expected ({adata.n_obs}, {adata.n_vars}); skipping."
+        logger.warning(
+            "[call_hmm_peaks] Layer '%s' has shape %s, expected (%s, %s); skipping.",
+            layer_name,
+            L.shape,
+            adata.n_obs,
+            adata.n_vars,
         )
         continue
 
@@ -154,7 +165,11 @@ def call_hmm_peaks(
     peak_metric, prominence=peak_prom, distance=min_distance
 )
 if peak_indices.size == 0:
-    print(f"[call_hmm_peaks] No peaks for layer '{layer_name}' in ref '{ref}'.")
+    logger.info(
+        "[call_hmm_peaks] No peaks for layer '%s' in ref '%s'.",
+        layer_name,
+        ref,
+    )
     continue
 
 peak_centers = coordinates[peak_indices]
@@ -185,7 +200,7 @@ def call_hmm_peaks(
     safe_layer = str(layer_name).replace("/", "_")
     fname = output_dir / f"{tag}_{safe_layer}_{safe_ref}_peaks.png"
     fig.savefig(fname, bbox_inches="tight", dpi=200)
-    print(f"[call_hmm_peaks] Saved plot to {fname}")
+    logger.info("[call_hmm_peaks] Saved plot to %s", fname)
     plt.close(fig)
 else:
     fig.tight_layout()
@@ -285,8 +300,11 @@ def call_hmm_peaks(
 else:
     adata.var[any_col] = False
 
-print(
-    f"[call_hmm_peaks] Annotated {len(peak_centers)} peaks for layer '{layer_name}' in ref '{ref}'."
+logger.info(
+    "[call_hmm_peaks] Annotated %s peaks for layer '%s' in ref '%s'.",
+    len(peak_centers),
+    layer_name,
+    ref,
 )
 
 # global any-peak across all layers/refs
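The migration also splits messages by severity: skipped layers (no match, missing, or wrongly shaped) become `logger.warning`, while per-layer progress (no peaks, saved plot, annotated peaks) becomes `logger.info`. A caller can therefore keep the warnings and mute the progress lines by raising the logger's level. The sketch assumes `get_logger(__name__)` yields a standard `logging.Logger` named after the module (`smftools.hmm.call_hmm_peaks`); the in-memory `ListHandler`, the layer name, and the reference name are illustrative only.

```python
import logging


class ListHandler(logging.Handler):
    """Collects 'LEVEL: message' strings in memory (illustration only)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


# Assumed logger name: get_logger(__name__) inside smftools/hmm/call_hmm_peaks.py.
logger = logging.getLogger("smftools.hmm.call_hmm_peaks")
handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo out of any root-logger output
logger.setLevel(logging.WARNING)  # keep skip warnings, drop progress messages

# Hypothetical layer and reference names.
logger.warning(
    "[call_hmm_peaks] Layer '%s' not in adata.layers; skipping.", "hmm_peaks_layer"
)
logger.info(
    "[call_hmm_peaks] No peaks for layer '%s' in ref '%s'.", "hmm_peaks_layer", "ref_1"
)

print(handler.messages)
# prints: ["WARNING: [call_hmm_peaks] Layer 'hmm_peaks_layer' not in adata.layers; skipping."]
```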

src/smftools/hmm/display_hmm.py

Lines changed: 11 additions & 6 deletions
@@ -1,19 +1,24 @@
+from smftools.logging_utils import get_logger
+
+logger = get_logger(__name__)
+
+
 def display_hmm(hmm, state_labels=["Non-Methylated", "Methylated"], obs_labels=["0", "1"]):
     import torch
 
-    print("\n**HMM Model Overview**")
-    print(hmm)
+    logger.info("**HMM Model Overview**")
+    logger.info("%s", hmm)
 
-    print("\n**Transition Matrix**")
+    logger.info("**Transition Matrix**")
     transition_matrix = torch.exp(hmm.edges).detach().cpu().numpy()
     for i, row in enumerate(transition_matrix):
         label = state_labels[i] if state_labels else f"State {i}"
         formatted_row = ", ".join(f"{p:.6f}" for p in row)
-        print(f"{label}: [{formatted_row}]")
+        logger.info("%s: [%s]", label, formatted_row)
 
-    print("\n**Emission Probabilities**")
+    logger.info("**Emission Probabilities**")
     for i, dist in enumerate(hmm.distributions):
         label = state_labels[i] if state_labels else f"State {i}"
         probs = dist.probs.detach().cpu().numpy()
         formatted_emissions = {obs_labels[j]: probs[j] for j in range(len(probs))}
-        print(f"{label}: {formatted_emissions}")
+        logger.info("%s: %s", label, formatted_emissions)
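Because `display_hmm` now logs instead of printing, its overview only appears when the logger is enabled for INFO and has a handler attached; whether that holds by default depends on how `get_logger` configures smftools logging, which this diff does not show. The leading `\n` in the old section titles is also gone, since each log record is already emitted on its own line. The sketch below reproduces the transition-matrix section with a stand-in logger writing to an in-memory stream, using hypothetical probabilities and plain lists in place of a trained model and `torch` tensors.

```python
import io
import logging

# Stand-in for the module-level logger in display_hmm.py; assumed to be a
# standard logging.Logger (the real get_logger may configure it differently).
logger = logging.getLogger("example.display_hmm")
logger.propagate = False

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

state_labels = ["Non-Methylated", "Methylated"]
transition_matrix = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical probabilities


def log_transitions() -> None:
    """Mirror the transition-matrix section of display_hmm using plain lists."""
    logger.info("**Transition Matrix**")
    for i, row in enumerate(transition_matrix):
        formatted_row = ", ".join(f"{p:.6f}" for p in row)
        logger.info("%s: [%s]", state_labels[i], formatted_row)


logger.setLevel(logging.WARNING)
log_transitions()
hidden = stream.getvalue()  # nothing written: INFO is below WARNING

logger.setLevel(logging.INFO)
log_transitions()
shown = stream.getvalue()
print(shown, end="")
# prints:
# **Transition Matrix**
# Non-Methylated: [0.900000, 0.100000]
# Methylated: [0.200000, 0.800000]
```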
