Commit 3ec5a2a

Merge pull request #368 from marvin-hansen/main
Updated parquet to latest version
2 parents 0bc5d64 + 073069e commit 3ec5a2a

5 files changed: 174 additions, 63 deletions
AGENTS.md

Lines changed: 21 additions & 2 deletions
@@ -72,9 +72,28 @@ src/errors/mod.rs # contains each error type in a separate file
 src/traits/mod.rs # contains each trait in a separate file
 src/type/mod.rs # contains each type in a separate file
 
-As a rule, one type, one file.
+## One type, one Rust module.
 
-Optional src folders
+For very small types (total implementation in fewer than 25 lines), the type is stored in a file named after the snake_case form of the type name. For example:
+
+src/types/small_type.rs
+
+For more complex types, the type is stored in a folder module. For example,
+the type Uncertain is stored in:
+
+src/types/uncertain/mod.rs
+
+The mod.rs contains the type definition and constructors.
+
+When the type implements multiple traits, each trait is stored within
+a file named after the implemented trait or trait group. For example,
+when implementing PartialEq and Debug for type Uncertain, these would be in
+files:
+
+src/types/uncertain/uncertain_debug.rs
+src/types/uncertain/uncertain_part_eq.rs
+
+## Optional src folders
 src/extensions/mod.rs # contains type extensions i.e. a default impl for a trait
 src/utils/mod.rs # contains utils
 

Cargo.lock

Lines changed: 27 additions & 58 deletions
Some generated files are not rendered by default.

deep_causality_discovery/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ version = "0.2.0"
 # External dependencies
 [dependencies]
 csv = {version = "1.4", default-features = false}
-parquet = {version = "56", default-features = false}
+parquet = {version = "57", default-features = false}
 
 
 [dev-dependencies]

examples/case_study_icu_sepsis/Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -16,8 +16,8 @@ deep_causality_tensor = { path = "../../deep_causality_tensor" }
 
 
 # External dependencies
-arrow-array = {version = "56", default-features = false}
-parquet = { version = "56" , default-features = false, features = ["arrow"]}
+arrow-array = {version = "57", default-features = false}
+parquet = { version = "57", default-features = false, features = ["arrow"]}
 
 [features]
 default = []
Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# HKT Witness Types for Uncertain and MaybeUncertain

## 1. Introduction

This report investigates the feasibility and implications of introducing Higher-Kinded Type (HKT) witness types for `Uncertain<T>` and `MaybeUncertain<T>` within the `deep_causality_uncertain` crate. The goal is to enhance the composability and abstract programming capabilities of these types by integrating them with the functional programming traits (Functor, Applicative, Monad) provided by the `deep_causality_haft` crate. We will also discuss the sensible exclusion of the `Foldable` trait for these specific types.

## 2. `Uncertain<T>` as an HKT

The `Uncertain<T>` type represents a single value `T` with inherent uncertainty, modeled as a probability distribution. It is generic over a single type parameter `T`, making it a natural candidate for an HKT of kind `* -> *`.

### 2.1. `UncertainWitness` Definition

To integrate `Uncertain<T>` into the HKT system, a zero-sized witness type `UncertainWitness` would be defined:

```rust
pub struct UncertainWitness;
```

### 2.2. `HKT` Trait Implementation

`UncertainWitness` would implement the core `HKT` trait as follows:

```rust
impl HKT for UncertainWitness {
    type Type<T> = Uncertain<T>;
}
```

This implementation declares that `UncertainWitness` represents the `Uncertain<T>` type constructor, allowing generic functions to abstract over its "shape."
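
To make the witness pattern concrete, the following is a minimal, self-contained sketch. It assumes the `HKT` trait is a single generic associated type, as the snippet above suggests; the trait and the toy `Uncertain<T>` defined here are stand-ins, not the definitions from `deep_causality_haft` or `deep_causality_uncertain`. `OptionWitness`, `Reading`, and `demo_reading` are hypothetical names used only for illustration.

```rust
/// Stand-in for the `HKT` trait (assumed shape: one generic associated type).
pub trait HKT {
    /// The concrete type obtained by applying the type constructor to `T`.
    type Type<T>;
}

/// Toy placeholder for an uncertain value; it only stores candidate samples.
#[derive(Debug, Clone, PartialEq)]
pub struct Uncertain<T> {
    pub samples: Vec<T>,
}

/// Zero-sized witness standing for the `Uncertain<_>` type constructor.
pub struct UncertainWitness;

impl HKT for UncertainWitness {
    type Type<T> = Uncertain<T>;
}

/// A second witness (hypothetical) showing that the same trait covers `Option<_>`.
pub struct OptionWitness;

impl HKT for OptionWitness {
    type Type<T> = Option<T>;
}

/// Generic code names the constructor `W` once and applies it to
/// several element types.
pub struct Reading<W: HKT> {
    pub raw: W::Type<f64>,
    pub in_range: W::Type<bool>,
}

/// Instantiates `Reading` with the `Uncertain` constructor.
pub fn demo_reading() -> Reading<UncertainWitness> {
    Reading {
        raw: Uncertain { samples: vec![0.9, 1.1] },
        in_range: Uncertain { samples: vec![true, true] },
    }
}
```

Swapping `UncertainWitness` for `OptionWitness` in `Reading` changes both fields at once, which is the sense in which generic code abstracts over the "shape".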

### 2.3. Feasibility of Functional Traits (`Functor`, `Applicative`, `Monad`)

Implementing `Functor`, `Applicative`, and `Monad` for `UncertainWitness` (and thus for `Uncertain<T>`) is highly feasible and conceptually aligned with the type's probabilistic nature.

* **`Functor` (fmap):**
    * **Feasibility:** High. The `Uncertain<f64>::map` and `Uncertain<f64>::map_to_bool` methods already exist, demonstrating the ability to apply a function to the inner value of `Uncertain<T>` while preserving its uncertain context.
    * **Implementation:** `fmap` would involve creating a new `Uncertain<B>` whose computation node applies the given function `f: Fn(A) -> B` to the result of sampling the original `Uncertain<A>`.
    * **Benefit:** Allows for generic transformations of the uncertain value without altering its underlying probabilistic structure.

* **`Applicative` (pure, apply):**
    * **Feasibility:** High.
    * **`pure`:** The `Uncertain::<T>::point(value)` constructor already serves the purpose of lifting a pure, certain value into the `Uncertain` context.
    * **`apply`:** This operation would involve applying an `Uncertain<Func>` (where `Func` is a function type `Fn(A) -> B`) to an `Uncertain<A>`. The implementation would sample both the uncertain function and the uncertain argument, then apply the sampled function to the sampled argument.
    * **Benefit:** Enables combining independent uncertain computations in a structured way, particularly useful for operations with multiple uncertain inputs.

* **`Monad` (bind):**
    * **Feasibility:** High.
    * **Implementation:** `bind` would take an `Uncertain<A>` and a function `f: Fn(A) -> Uncertain<B>`. It would sample `Uncertain<A>`, apply `f` to the sampled value `A` to get an `Uncertain<B>`, and then sample from this resulting `Uncertain<B>`. This effectively chains dependent uncertain computations.
    * **Benefit:** Provides a powerful mechanism for sequencing uncertain operations where the outcome of one uncertain step influences the definition of the next, crucial for complex probabilistic workflows.
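
The sampling semantics described in the three bullets above can be sketched with a toy model. This is an illustration under stated assumptions, not the crate's implementation: here an uncertain value is simply a shareable sampling closure, whereas the real `Uncertain<T>` builds computation nodes. `Lcg`, `from_fn`, `uniform`, `demo`, and the inherent methods `fmap`, `apply`, and `bind` are hypothetical names; only `point` is taken from the text.

```rust
use std::rc::Rc;

/// Tiny deterministic generator so the sketch needs no external crate (hypothetical helper).
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    /// Advances the state and returns a float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Toy stand-in: an uncertain value is just a shareable sampling procedure.
pub struct Uncertain<T> {
    sampler: Rc<dyn Fn(&mut Lcg) -> T>,
}

impl<T> Clone for Uncertain<T> {
    fn clone(&self) -> Self {
        Uncertain { sampler: Rc::clone(&self.sampler) }
    }
}

impl<T: 'static> Uncertain<T> {
    pub fn from_fn(f: impl Fn(&mut Lcg) -> T + 'static) -> Self {
        Uncertain { sampler: Rc::new(f) }
    }

    pub fn sample(&self, rng: &mut Lcg) -> T {
        (self.sampler)(rng)
    }

    /// `pure`: lifts a certain value; every sample returns a clone of it.
    pub fn point(value: T) -> Self
    where
        T: Clone,
    {
        Self::from_fn(move |_| value.clone())
    }

    /// `fmap`: sample the source, then apply `f` to the sampled value.
    pub fn fmap<B: 'static>(&self, f: impl Fn(T) -> B + 'static) -> Uncertain<B> {
        let src = self.clone();
        Uncertain::from_fn(move |rng| f(src.sample(rng)))
    }

    /// `apply`: sample the uncertain function and the uncertain argument,
    /// then apply the sampled function to the sampled argument.
    pub fn apply<A: 'static, B: 'static>(&self, arg: &Uncertain<A>) -> Uncertain<B>
    where
        T: Fn(A) -> B,
    {
        let funcs = self.clone();
        let args = arg.clone();
        Uncertain::from_fn(move |rng| {
            let f = funcs.sample(rng);
            f(args.sample(rng))
        })
    }

    /// `bind`: sample the source, build the dependent `Uncertain<B>`, then sample it.
    pub fn bind<B: 'static>(&self, f: impl Fn(T) -> Uncertain<B> + 'static) -> Uncertain<B> {
        let src = self.clone();
        Uncertain::from_fn(move |rng| f(src.sample(rng)).sample(rng))
    }
}

/// Uniform distribution on [lo, hi) (hypothetical constructor).
pub fn uniform(lo: f64, hi: f64) -> Uncertain<f64> {
    Uncertain::from_fn(move |rng| lo + (hi - lo) * rng.next_f64())
}

/// Draws one sample from an fmap, an apply, and a bind pipeline.
pub fn demo() -> (f64, f64, f64) {
    let mut rng = Lcg::new(42);
    let x = uniform(0.0, 1.0);
    let shifted = x.fmap(|v| v + 10.0); // samples lie in [10, 11)
    let scaled = Uncertain::point(|v: f64| v * 2.0).apply(&x); // samples lie in [0, 2)
    let nested = x.bind(|v| uniform(v, v + 1.0)); // second stage depends on the first
    (shifted.sample(&mut rng), scaled.sample(&mut rng), nested.sample(&mut rng))
}
```

In this model `bind` differs from `fmap` only in that the function returns another distribution, which is then sampled with the same generator. That is the "chaining of dependent uncertain computations" described above.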

## 3. `MaybeUncertain<T>` as an HKT

The `MaybeUncertain<T>` type represents a value that is probabilistically present or absent. If present, its value is itself `Uncertain<T>`. It is also generic over a single type parameter `T`, making it an HKT of kind `* -> *`.

### 3.1. `MaybeUncertainWitness` Definition

A zero-sized witness type `MaybeUncertainWitness` would be defined:

```rust
pub struct MaybeUncertainWitness;
```

### 3.2. `HKT` Trait Implementation

`MaybeUncertainWitness` would implement the core `HKT` trait as follows:

```rust
impl HKT for MaybeUncertainWitness {
    type Type<T> = MaybeUncertain<T>;
}
```

### 3.3. Feasibility of Functional Traits (`Functor`, `Applicative`, `Monad`)

Implementing functional traits for `MaybeUncertainWitness` is feasible, but requires careful consideration of the probabilistic presence (`is_present: Uncertain<bool>`).

* **`Functor` (fmap):**
    * **Feasibility:** High.
    * **Implementation:** `fmap` would apply the function `f` to the inner `Uncertain<A>` only if `is_present` evaluates to `true` during sampling. Otherwise, it would yield `None`. The `is_present` flag would be propagated unchanged.
    * **Benefit:** Allows for transformations of the uncertain value while respecting its potential absence.

* **`Applicative` (pure, apply):**
    * **Feasibility:** High.
    * **`pure`:** Would create a `MaybeUncertain<T>` that is certainly present (`Uncertain::<bool>::point(true)`) and whose value is `Uncertain::<T>::point(value)`.
    * **`apply`:** This would involve applying a `MaybeUncertain<Func>` to a `MaybeUncertain<A>`. The `is_present` flags of both would be combined (e.g., using logical AND), and the function would be applied to the inner `Uncertain<A>` only if both are present.
    * **Benefit:** Combines independent uncertain computations, correctly propagating the possibility of absence.

* **`Monad` (bind):**
    * **Feasibility:** High.
    * **Implementation:** `bind` would take a `MaybeUncertain<A>` and a function `f: Fn(A) -> MaybeUncertain<B>`. It would first sample the `is_present` flag of `MaybeUncertain<A>`. If `false`, the result is `None`. If `true`, it samples the inner `Uncertain<A>`, applies `f` to get a `MaybeUncertain<B>`, and then samples this result.
    * **Benefit:** Enables sequencing dependent uncertain operations, where the presence or absence of a value at one step affects the next.
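
The propagation of absence described above can be sketched in the same toy style. As a simplifying assumption, this sketch collapses the `is_present` flag and the inner value into a single sampling closure that yields `Option<T>`; the real `MaybeUncertain<T>` keeps a separate `is_present: Uncertain<bool>`. `Lcg`, `from_fn`, `with_presence`, `always`, `demo`, and the inherent methods `fmap` and `bind` are hypothetical names.

```rust
use std::rc::Rc;

/// Tiny deterministic generator so the sketch needs no external crate (hypothetical helper).
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    /// Advances the state and returns a float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Toy model: one sampling procedure that yields `Some(value)` or `None`.
pub struct MaybeUncertain<T> {
    sampler: Rc<dyn Fn(&mut Lcg) -> Option<T>>,
}

impl<T> Clone for MaybeUncertain<T> {
    fn clone(&self) -> Self {
        MaybeUncertain { sampler: Rc::clone(&self.sampler) }
    }
}

impl<T: 'static> MaybeUncertain<T> {
    pub fn from_fn(f: impl Fn(&mut Lcg) -> Option<T> + 'static) -> Self {
        MaybeUncertain { sampler: Rc::new(f) }
    }

    pub fn sample(&self, rng: &mut Lcg) -> Option<T> {
        (self.sampler)(rng)
    }

    /// Present with probability `p`; when present, the value is drawn by `value`.
    pub fn with_presence(p: f64, value: impl Fn(&mut Lcg) -> T + 'static) -> Self {
        Self::from_fn(move |rng| {
            if rng.next_f64() < p {
                Some(value(rng))
            } else {
                None
            }
        })
    }

    /// Plays the role of `pure`: certainly present, certain value.
    pub fn always(value: T) -> Self
    where
        T: Clone,
    {
        Self::from_fn(move |_| Some(value.clone()))
    }

    /// `fmap`: transform the value when present; absence passes through untouched.
    pub fn fmap<B: 'static>(&self, f: impl Fn(T) -> B + 'static) -> MaybeUncertain<B> {
        let src = self.clone();
        MaybeUncertain::from_fn(move |rng| src.sample(rng).map(|a| f(a)))
    }

    /// `bind`: if the first stage is absent, stop; otherwise sample the dependent stage.
    pub fn bind<B: 'static>(
        &self,
        f: impl Fn(T) -> MaybeUncertain<B> + 'static,
    ) -> MaybeUncertain<B> {
        let src = self.clone();
        MaybeUncertain::from_fn(move |rng| match src.sample(rng) {
            Some(a) => f(a).sample(rng),
            None => None,
        })
    }
}

/// Exercises fmap and bind on a certainly present and a never present value.
pub fn demo() -> (Option<i32>, Option<i32>, Option<i32>) {
    let mut rng = Lcg::new(7);
    let present = MaybeUncertain::always(2);
    // A presence probability of 0.0 means the value is never there.
    let absent: MaybeUncertain<i32> = MaybeUncertain::with_presence(0.0, |_| 2);
    let times_ten = |x: i32| MaybeUncertain::always(x * 10);
    (
        present.fmap(|x| x + 1).sample(&mut rng), // Some(3)
        present.bind(times_ten).sample(&mut rng), // Some(20)
        absent.bind(times_ten).sample(&mut rng),  // None
    )
}
```

One design consequence worth noting: in `bind`, whether the result is present depends on the value sampled in the first stage, so presence and value have to be decided within the same sample. That is why this sketch uses a single closure rather than two independent ones.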

## 4. Exclusion of the `Foldable` Trait

While `Functor`, `Applicative`, and `Monad` are highly sensible for `Uncertain<T>` and `MaybeUncertain<T>`, the `Foldable` trait is generally **not a sensible inclusion** for these types.

### 4.1. Conceptual Mismatch

* **`Foldable`'s Purpose:** The `Foldable` trait is designed for data structures that represent *collections* of values (e.g., `Vec`, `Option` as a collection of 0 or 1 items, `BTreeMap` as a collection of key-value pairs). Its primary operation, `fold`, reduces this collection to a single summary value.
* **`Uncertain` and `MaybeUncertain`'s Nature:** `Uncertain<T>` and `MaybeUncertain<T>` represent *single probabilistic entities*, not collections. They encapsulate a single value (or the potential absence of one) whose exact state is unknown until sampled.

### 4.2. Loss of Information

* Applying a traditional `fold` operation to a single `Uncertain<T>` would force it into a single, certain value, thereby destroying the very uncertainty it is designed to model. This would be a lossy and semantically misleading operation.
* While one could conceive of "folding over samples" (e.g., summing up many samples to get an expected value), this is already covered by specific statistical methods like `expected_value()` or `standard_deviation()`. These are specialized aggregations that respect the probabilistic nature, rather than a generic `fold` that implies a structural reduction.
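
To illustrate what "folding over samples" amounts to, here is a minimal sketch of a Monte Carlo mean written as an explicit fold over repeated draws. It is not the crate's `expected_value()` method; the free function `expected_value`, the `Lcg` generator, and `demo` are hypothetical, and the function assumes `n > 0`.

```rust
/// Tiny deterministic generator so the sketch needs no external crate (hypothetical helper).
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    /// Advances the state and returns a float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Monte Carlo estimate of the mean: a fold over `n` fresh draws.
/// The fold runs over samples, not over the structure of the uncertain value.
/// Assumes `n > 0`.
pub fn expected_value(n: usize, mut draw: impl FnMut() -> f64) -> f64 {
    let sum = (0..n).fold(0.0_f64, |acc, _| acc + draw());
    sum / n as f64
}

/// Estimates the mean of a uniform distribution on [0, 1); the true mean is 0.5.
pub fn demo() -> f64 {
    let mut rng = Lcg::new(42);
    expected_value(10_000, || rng.next_f64())
}
```

The result is an estimate that depends on the sample count and the generator, which is why it is better exposed as a named statistical method than as a generic structural `fold`.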

### 4.3. Clarity and Idiomatic Usage

* Excluding `Foldable` maintains clarity about the nature of `Uncertain<T>` and `MaybeUncertain<T>` as single, probabilistic values rather than iterable collections.
* It prevents developers from attempting to use `fold` in ways that might be semantically inappropriate or lead to unexpected results given the probabilistic context.

## 5. Benefits of HKT Integration

Integrating `Uncertain<T>` and `MaybeUncertain<T>` into the HKT system via witness types and functional traits offers several significant advantages:

* **Enhanced Composability:** Allows these types to be seamlessly combined with other HKT-enabled types and abstractions within the `deep_causality` ecosystem and beyond.
* **Increased Genericity:** Enables writing more abstract and reusable code that operates uniformly over any type that exhibits Functor, Applicative, or Monadic behavior, regardless of its specific underlying structure.
* **Improved Code Clarity:** By adhering to well-established functional programming patterns, the code becomes more predictable and easier to reason about, especially when dealing with complex chains of uncertain computations.
* **Stronger Abstraction:** Promotes a higher level of abstraction, separating the "what" (the computation logic) from the "how" (the uncertainty propagation mechanism).
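
The genericity benefit can be illustrated with a function written once against a `Functor` trait and reused across witnesses. This is a sketch under assumptions: the `HKT` and `Functor` traits below are stand-ins whose signatures may differ from those in `deep_causality_haft`, and `OptionWitness`, `VecWitness`, and `to_percent` are hypothetical names. Standard-library containers are used so the example stays self-contained; an `UncertainWitness` would plug in the same way.

```rust
/// Stand-in for the `HKT` trait (assumed shape).
pub trait HKT {
    type Type<T>;
}

/// Stand-in for a `Functor` trait over witnesses (assumed signature).
pub trait Functor: HKT {
    fn fmap<A, B, F>(fa: Self::Type<A>, f: F) -> Self::Type<B>
    where
        F: FnMut(A) -> B;
}

pub struct OptionWitness;

impl HKT for OptionWitness {
    type Type<T> = Option<T>;
}

impl Functor for OptionWitness {
    fn fmap<A, B, F>(fa: Self::Type<A>, f: F) -> Self::Type<B>
    where
        F: FnMut(A) -> B,
    {
        fa.map(f)
    }
}

pub struct VecWitness;

impl HKT for VecWitness {
    type Type<T> = Vec<T>;
}

impl Functor for VecWitness {
    fn fmap<A, B, F>(fa: Self::Type<A>, f: F) -> Self::Type<B>
    where
        F: FnMut(A) -> B,
    {
        fa.into_iter().map(f).collect()
    }
}

/// Written once against `Functor`; the computation logic (multiply by 100)
/// is independent of how each container propagates it.
pub fn to_percent<W: Functor>(fraction: W::Type<f64>) -> W::Type<f64> {
    W::fmap::<f64, f64, _>(fraction, |x| x * 100.0)
}
```

Callers pick the container by naming the witness, for example `to_percent::<OptionWitness>(Some(0.25))` or `to_percent::<VecWitness>(vec![0.5, 1.0])`. The witness has to be given explicitly because the compiler cannot infer it from the argument type alone.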

## 6. Conclusion

The integration of `Uncertain<T>` and `MaybeUncertain<T>` with HKT witness types and the `Functor`, `Applicative`, and `Monad` traits from `deep_causality_haft` is highly feasible and recommended. This approach aligns with modern functional programming principles and will significantly enhance the utility and composability of the `deep_causality_uncertain` crate. The deliberate exclusion of the `Foldable` trait is sensible, as it avoids semantic confusion and respects the fundamental nature of these types as single probabilistic entities rather than collections.
