Commit 705a03f

feat(deep_causality_algorithms): Parallelize mRMR feature selection algo.
This commit introduces parallel execution to the mRMR feature selection algorithm in the `deep_causality_algorithms` crate, significantly improving its performance on large datasets.

The core changes include:
- Parallelizing the initial and iterative feature selection loops in both `mrmr_features_selector` and `mrmr_features_selector_cdl` using the `rayon` crate.
- Guarding all parallel logic with the existing `parallel` feature flag, ensuring sequential execution remains the default.
- Correcting bugs in the test suite and benchmark data generation to ensure robust validation.
- Updating documentation to reflect the new parallel execution capability and its performance benefits.

The parallel implementation has been verified to be functionally equivalent to the sequential version and demonstrates a significant speedup, as validated by both synthetic benchmarks and a real-world case study on the ICU sepsis dataset.

Signed-off-by: Marvin Hansen <[email protected]>
1 parent 9199c39 commit 705a03f
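
The commit message describes guarding the parallel loops behind the `parallel` feature flag so that sequential execution stays the default. The following is a minimal sketch of that pattern, not the crate's implementation. It assumes a toy difference-form mRMR score (relevance minus mean redundancy) over precomputed values instead of the crate's scoring on a `CausalTensor`. The helper names `by_score`, `best_candidate`, and `select_features` are hypothetical; only `rayon` and the `parallel` flag come from the commit.

```rust
// Hypothetical sketch: helper names and the scoring rule are illustrative assumptions.
#[cfg(feature = "parallel")]
use rayon::prelude::*;

use std::cmp::Ordering;

/// Order (index, score) pairs by score, breaking ties in favour of the lower
/// index so the sequential and parallel paths always pick the same winner.
fn by_score(a: &(usize, f64), b: &(usize, f64)) -> Ordering {
    a.1.total_cmp(&b.1).then_with(|| b.0.cmp(&a.0))
}

/// Return the best-scoring candidate, or `None` if there are no candidates.
/// Only the iterator type differs between the two cfg branches.
fn best_candidate<F>(candidates: &[usize], score: F) -> Option<(usize, f64)>
where
    F: Fn(usize) -> f64 + Sync + Send,
{
    #[cfg(feature = "parallel")]
    let best = candidates
        .par_iter()
        .map(|&i| (i, score(i)))
        .max_by(by_score);

    #[cfg(not(feature = "parallel"))]
    let best = candidates
        .iter()
        .map(|&i| (i, score(i)))
        .max_by(by_score);

    best
}

/// Greedy selection: the first pick maximises relevance alone (initial loop);
/// later picks maximise relevance minus mean redundancy with the features
/// already selected (iterative loop).
fn select_features(relevance: &[f64], redundancy: &[Vec<f64>], k: usize) -> Vec<usize> {
    let mut remaining: Vec<usize> = (0..relevance.len()).collect();
    let mut selected: Vec<usize> = Vec::with_capacity(k);

    while selected.len() < k {
        let picked = best_candidate(&remaining, |i| {
            if selected.is_empty() {
                relevance[i]
            } else {
                let total: f64 = selected.iter().map(|&s| redundancy[i][s]).sum();
                relevance[i] - total / selected.len() as f64
            }
        });
        match picked {
            Some((i, _)) => {
                selected.push(i);
                remaining.retain(|&r| r != i);
            }
            None => break, // fewer than k features available
        }
    }
    selected
}
```

With `relevance = [0.9, 0.8, 0.1]` and features 0 and 1 mutually redundant at 0.95 while feature 2 has zero redundancy with both, `select_features(&relevance, &redundancy, 2)` picks feature 0 first and then feature 2, because feature 1 scores 0.8 - 0.95 against the already selected feature 0. The explicit tie-break in `by_score` is one way to keep the parallel and sequential results identical, since a parallel reduction does not process items in a fixed order.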

13 files changed (+465, -324 lines)

Cargo.lock

Lines changed: 4 additions & 156 deletions
Some generated files are not rendered by default.

deep_causality_algorithms/README.md

Lines changed: 1 addition & 2 deletions
@@ -52,8 +52,7 @@ reveal the nature of multi-variable interactions.
 * **Performance Optimized**:
 * **Algorithmic Capping**: Use the `MaxOrder` enum to limit the analysis to a tractable number of interactions (
 e.g., pairwise), reducing complexity from exponential `O(2^N)` to polynomial `O(N^k)`.
-* **Parallel Execution**: When compiled with the `parallel` feature flag, the main decomposition loop runs in
-parallel across all available CPU cores using `rayon`.
+* **Parallel Execution**: When compiled with the `parallel` feature flag, the main decomposition loop of the SURD algorithm and the feature selection loops of the mRMR algorithm run in parallel across all available CPU cores using `rayon`.
 
 ## Installation

deep_causality_algorithms/benches/mrmr_benchmark.rs

Lines changed: 29 additions & 10 deletions
@@ -4,7 +4,7 @@
  */
 
 use criterion::{Criterion, criterion_group, criterion_main};
-use deep_causality_algorithms::mrmr::mrmr_features_selector;
+use deep_causality_algorithms::mrmr::{mrmr_features_selector, mrmr_features_selector_cdl};
 use deep_causality_tensor::CausalTensor;
 
 fn generate_test_tensor(rows: usize, cols: usize) -> CausalTensor<f64> {
@@ -15,21 +15,40 @@ fn generate_test_tensor(rows: usize, cols: usize) -> CausalTensor<f64> {
     CausalTensor::new(data, vec![rows, cols]).unwrap()
 }
 
+fn generate_test_tensor_cdl(rows: usize, cols: usize) -> CausalTensor<Option<f64>> {
+    let mut data = Vec::with_capacity(rows * cols);
+    for i in 0..(rows * cols) {
+        if (i * 3 + i / 5) % 13 < 2 {
+            // A more complex pattern
+            data.push(None);
+        } else {
+            data.push(Some(i as f64));
+        }
+    }
+    CausalTensor::new(data, vec![rows, cols]).unwrap()
+}
+
 fn mrmr_benchmark(c: &mut Criterion) {
     let mut group = c.benchmark_group("mRMR Feature Selector");
 
-    let rows = 100;
-    let cols = 20;
-    let num_features_to_select = 5;
+    let rows = 1000;
+    let cols = 100;
+    let num_features_to_select = 10;
     let target_col = cols - 1;
 
-    // Benchmark the new implementation
-    group.bench_function("mrmr_features_selector_new_impl", |b| {
-        let tensor = generate_test_tensor(rows, cols);
+    // Benchmark the standard implementation
+    group.bench_function("mrmr_features_selector", |b| {
+        let mut tensor = generate_test_tensor(rows, cols);
+        b.iter(|| {
+            mrmr_features_selector(&mut tensor, num_features_to_select, target_col).unwrap();
+        });
+    });
+
+    // Benchmark the cdl implementation
+    group.bench_function("mrmr_features_selector_cdl", |b| {
+        let tensor = generate_test_tensor_cdl(rows, cols);
         b.iter(|| {
-            // Clone the tensor for each iteration to ensure a fresh state
-            let mut cloned_tensor = tensor.clone();
-            mrmr_features_selector(&mut cloned_tensor, num_features_to_select, target_col).unwrap();
+            mrmr_features_selector_cdl(&tensor, num_features_to_select, target_col).unwrap();
         });
     });
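
The predicate in `generate_test_tensor_cdl` decides which cells of the benchmark tensor are `None`. As a rough check of how sparse that makes the data, the sketch below mirrors the predicate and counts the missing cells. The helper names `is_missing`, `missing_count`, and `missing_fraction` are hypothetical and not part of the commit. The pattern repeats every 65 indices, so the missing share should settle near 2/13, or roughly 15%.

```rust
/// Mirror of the predicate used by `generate_test_tensor_cdl` in the benchmark diff.
fn is_missing(i: usize) -> bool {
    (i * 3 + i / 5) % 13 < 2
}

/// Number of `None` cells among the first `total` flat indices.
fn missing_count(total: usize) -> usize {
    (0..total).filter(|&i| is_missing(i)).count()
}

/// Share of `None` cells in a `rows x cols` tensor filled in flat index order.
fn missing_fraction(rows: usize, cols: usize) -> f64 {
    let total = rows * cols;
    missing_count(total) as f64 / total as f64
}
```

Calling `missing_fraction(1000, 100)` with the benchmark's dimensions gives the share of missing values that `mrmr_features_selector_cdl` has to handle in this synthetic data set.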
