Commit ba7bd1e

doublegate and claude committed
chore: bump version to v2.3.4
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 8c72e54 commit ba7bd1e

21 files changed, +680 -233 lines changed

CHANGELOG.md

Lines changed: 52 additions & 0 deletions
@@ -9,6 +9,58 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ---
 
+## [2.3.4] - 2026-01-30 - Performance Optimizations & Security Hardening
+
+### Overview
+
+Release implementing 18 performance optimization proposals (T5.1-T5.18) identified in benchmark analysis, with focus on zero-copy operations, memory allocation reduction, and cryptographic hardening. Key improvements: WebSocket mimicry frame wrapping 55-85% faster, DoH tunnel creation 70-86% faster, frame pipeline 11-30% faster, message header deserialization 53% faster, Noise handshake 2.6% faster, plus security enhancements including `zeroize` for secure memory cleanup.
+
+### Changed
+
+#### Performance Optimizations - Obfuscation (wraith-obfuscation)
+- **T5.6: WebSocket mimicry frame wrapping**: 55-85% faster via pre-allocated buffers and 4-byte chunked XOR masking (1456B: 4.01 GiB/s → 7.45 GiB/s, 65KB: 3.08 GiB/s → 5.78 GiB/s)
+- **T5.7: WebSocket RNG optimization**: Struct-level `Mutex<SmallRng>` replacing per-call RNG creation for mask key generation
+- **T5.12: DoH tunnel query creation**: 70-86% faster via pre-allocated Vec and single allocation (244B: 12.8 GiB/s → 45.2 GiB/s, 512B: 12.3 GiB/s → 22.0 GiB/s)
+- **T5.13: DoH tunnel zero-copy parsing**: New `parse_dns_response_slice` API avoiding allocation for in-memory responses
+- **T5.14: DNS label length validation**: Added RFC compliance checks for 63-byte label length limits
+- **T5.15: DoH response bounds-checking**: Hardened response parsing against malformed data
+
+#### Performance Optimizations - Core (wraith-core)
+- **T5.4: Frame full pipeline optimization**: 11-30% faster via `Vec::with_capacity` and unsafe `set_len` eliminating redundant zero-initialization (1456B: 5.85 GiB/s → 7.62 GiB/s, 65KB: 8.04 GiB/s → 8.88 GiB/s)
+- **T5.8: Frame padding RNG optimization**: Thread-local `RefCell<SmallRng>` caching eliminating per-call RNG creation (3 call sites: `pad_to_multiple`, `Frame::new`, `Frame::build`)
+- **T5.9: Frame build delegation**: `build()` delegates to `build_into()` reducing code duplication and maintenance burden
+- **T5.10: Ratchet error path optimization**: `#[cold]` annotation on key-commitment parsing error path
+- **T5.11: Frame benchmarks expansion**: New `build_into_from_parts` benchmark covering 5 payload sizes (64B-65KB)
+
+#### Performance Optimizations - Crypto (wraith-crypto)
+- **T5.16: Message header deserialization**: 53% faster via direct buffer read and offset calculation (25.6 ns → 12.0 ns)
+- **T5.17: Noise handshake optimization**: 2.6% faster via reduced allocations and streamlined validation (25.1 us → 24.4 us)
+
+#### Security Hardening - Files (wraith-files)
+- **T5.18: Secure memory cleanup**: Added `zeroize` on `IncrementalTreeHasher` drop for secure erasure of in-progress hash state
+
+#### Regression Fixes During Development
+- **T5.1 (reverted)**: Read offset cursor pattern caused unbounded buffer growth in repeated `build_into` calls
+- **T5.2 (reverted)**: `Vec::with_capacity(1MB)` pre-allocation introduced 1MB allocation overhead per frame construction
+- **T5.4 (fixed)**: Eliminated zero-initialization overhead in frame build via unsafe `set_len` after explicit writes
+- **T5.5 (reverted)**: Removed `#[inline]` on `from_bytes()` - caused instruction cache pressure negating benefits
+
+### Added
+
+#### Dependencies
+- **wraith-files**: `zeroize` crate for secure memory cleanup of cryptographic material
+
+#### Documentation
+- **Benchmark Analysis**: `docs/testing/BENCHMARK-ANALYSIS-v2.3.4.md` - Comprehensive performance analysis and optimization proposals
+
+### Security
+
+- Secure memory cleanup in file hashing preventing potential key material leakage
+- Hardened DoH response parsing against malformed DNS packets
+- DNS label length validation per RFC specifications
+
+---
+
 ## [2.3.2] - 2026-01-29 - Benchmark-Driven Performance & Security Optimizations
 
 ### Overview
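
The T5.6 entry above credits part of the WebSocket speedup to "4-byte chunked XOR masking". The wraith-obfuscation source is not part of this diff, so the following is only a minimal sketch of that general technique. The function name `mask_in_place` is hypothetical and not taken from the crate.

```rust
/// Hypothetical helper (not the crate's API): XOR-mask `payload` in place
/// with a 4-byte WebSocket-style masking key, one 32-bit word at a time.
pub fn mask_in_place(payload: &mut [u8], key: [u8; 4]) {
    // Native-endian conversion on both sides keeps byte positions intact,
    // so the word-wise XOR equals a byte-wise XOR with key[i % 4].
    let key_word = u32::from_ne_bytes(key);

    let mut chunks = payload.chunks_exact_mut(4);
    for chunk in &mut chunks {
        let word = u32::from_ne_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]) ^ key_word;
        chunk.copy_from_slice(&word.to_ne_bytes());
    }

    // Up to three trailing bytes are masked individually. The remainder
    // starts at a multiple of four, so key index `i` is already aligned.
    for (i, byte) in chunks.into_remainder().iter_mut().enumerate() {
        *byte ^= key[i];
    }
}
```

Because XOR is its own inverse, calling the function twice with the same key restores the original payload, which makes the sketch easy to check against a byte-wise reference.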

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ Guidance for Claude Code when working with this repository.
 
 WRAITH (Wire-speed Resilient Authenticated Invisible Transfer Handler) is a decentralized secure file transfer protocol implemented in Rust.
 
-**Status:** v2.3.2 - Benchmark-Driven Performance & Security Optimizations
+**Status:** v2.3.4 - Performance Optimizations & Security Hardening
 
 ### Metrics
 | Metric | Value |

Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -39,7 +39,7 @@ exclude = [
 ]
 
 [workspace.package]
-version = "2.3.3"
+version = "2.3.4"
 edition = "2024"
 rust-version = "1.88"
 license = "MIT"
@@ -68,7 +68,7 @@ x25519-dalek = { version = "2.0", features = ["static_secrets"] }
 ed25519-dalek = { version = "2.1", features = ["rand_core"] }
 blake3 = "1.5"
 snow = "0.10"
-rand = "0.8"
+rand = { version = "0.8", features = ["small_rng"] }
 rand_core = { version = "0.6", features = ["getrandom"] }
 rand_distr = "0.4"
 zeroize = { version = "1.7", features = ["derive"] }
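
The changelog describes T5.18 as wiping in-progress hash state when `IncrementalTreeHasher` is dropped, using the `zeroize` crate listed above. That code is not shown in this diff. The sketch below only illustrates the wipe-on-drop idea with the standard library, so it is an assumption-laden stand-in and not the `zeroize` API. `HashState` and its field are hypothetical names.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical stand-in for a hasher holding sensitive intermediate state.
pub struct HashState {
    pub chaining_value: [u8; 32],
}

impl HashState {
    /// Overwrite the state with zeros using volatile writes, which the
    /// compiler is not allowed to elide as dead stores.
    pub fn wipe(&mut self) {
        for byte in self.chaining_value.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
            unsafe { ptr::write_volatile(byte, 0) };
        }
        // Keep the compiler from reordering later accesses before the wipe.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for HashState {
    fn drop(&mut self) {
        self.wipe();
    }
}
```

In real code the `zeroize` crate is the better choice, since it packages this pattern behind a reviewed interface. The sketch only shows why an explicit `Drop` implementation is needed: a plain struct is freed without being overwritten.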

README.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ A decentralized secure file transfer protocol optimized for high-throughput, low
 <a href="https://github.com/doublegate/WRAITH-Protocol/actions/workflows/codeql.yml"><img src="https://github.com/doublegate/WRAITH-Protocol/actions/workflows/codeql.yml/badge.svg" alt="CodeQL"></a>
 <a href="https://github.com/doublegate/WRAITH-Protocol/actions/workflows/release.yml"><img src="https://github.com/doublegate/WRAITH-Protocol/actions/workflows/release.yml/badge.svg" alt="Release"></a>
 <br>
-<a href="https://github.com/doublegate/WRAITH-Protocol/releases"><img src="https://img.shields.io/badge/version-2.3.2-blue.svg" alt="Version"></a>
+<a href="https://github.com/doublegate/WRAITH-Protocol/releases"><img src="https://img.shields.io/badge/version-2.3.4-blue.svg" alt="Version"></a>
 <a href="docs/security/SECURITY_AUDIT_v1.1.0.md"><img src="https://img.shields.io/badge/security-audited-green.svg" alt="Security"></a>
 <a href="https://www.rust-lang.org/"><img src="https://img.shields.io/badge/rust-1.88%2B-orange.svg" alt="Rust"></a>
 <a href="https://doc.rust-lang.org/edition-guide/rust-2024/index.html"><img src="https://img.shields.io/badge/edition-2024-orange.svg" alt="Edition"></a>

crates/wraith-core/benches/frame_bench.rs

Lines changed: 36 additions & 1 deletion
@@ -1,6 +1,6 @@
 use criterion::{Criterion, Throughput, criterion_group, criterion_main};
 use std::hint::black_box;
-use wraith_core::{FRAME_HEADER_SIZE, Frame, FrameBuilder, FrameType};
+use wraith_core::{FRAME_HEADER_SIZE, Frame, FrameBuilder, FrameType, build_into_from_parts};
 
 fn bench_frame_parse(c: &mut Criterion) {
     let frame_data = FrameBuilder::new()
@@ -306,6 +306,40 @@ fn bench_frame_full_pipeline(c: &mut Criterion) {
     group.finish();
 }
 
+fn bench_frame_build_into_from_parts(c: &mut Criterion) {
+    let sizes: Vec<(usize, &str)> = vec![
+        (64, "64_bytes"),
+        (256, "256_bytes"),
+        (512, "512_bytes"),
+        (1024, "1024_bytes"),
+        (1456, "1456_bytes"),
+    ];
+
+    let mut group = c.benchmark_group("frame_build_into_from_parts");
+
+    for (size, name) in sizes {
+        let payload_len = size.saturating_sub(FRAME_HEADER_SIZE);
+        let payload = vec![0x42; payload_len];
+
+        group.throughput(Throughput::Bytes(size as u64));
+        group.bench_function(name, |b| {
+            let mut buf = vec![0u8; size];
+            b.iter(|| {
+                build_into_from_parts(
+                    black_box(FrameType::Data),
+                    black_box(42),
+                    black_box(1000),
+                    black_box(0),
+                    black_box(&payload),
+                    black_box(&mut buf),
+                )
+            })
+        });
+    }
+
+    group.finish();
+}
+
 criterion_group!(
     benches,
     bench_frame_parse,
@@ -318,6 +352,7 @@ criterion_group!(
     bench_parse_implementations_by_size,
     bench_parse_throughput,
     bench_frame_build_into,
+    bench_frame_build_into_from_parts,
     bench_frame_full_pipeline
 );
 criterion_main!(benches);

crates/wraith-core/src/congestion.rs

Lines changed: 8 additions & 7 deletions
@@ -7,9 +7,9 @@ use std::collections::VecDeque;
 use std::time::{Duration, Instant};
 
 /// Maximum number of bandwidth samples to keep.
-/// 20 samples captures 2.5 full ProbeBw 8-cycle gains, providing
-/// more stable bandwidth estimates than the previous 10-sample window.
-const BW_WINDOW_SIZE: usize = 20;
+/// 30 samples captures ~3.75 full ProbeBw 8-cycle gains, providing
+/// more stable bandwidth estimates for high-throughput scenarios.
+const BW_WINDOW_SIZE: usize = 30;
 
 /// Maximum number of RTT samples to keep.
 /// 20 samples provides more stable min-RTT estimates across
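
The hunk above only changes the window size, and the code that consumes `BW_WINDOW_SIZE` is not shown. As a rough illustration of what a bounded sample window does, here is a minimal sketch built on `VecDeque`. It assumes the estimate is the maximum over the window, as in the BBR design. The `BwWindow` type and its methods are hypothetical.

```rust
use std::collections::VecDeque;

/// Window size taken from the diff above.
pub const BW_WINDOW_SIZE: usize = 30;

/// Hypothetical bounded window of bandwidth samples (bytes per second).
pub struct BwWindow {
    samples: VecDeque<u64>,
}

impl BwWindow {
    pub fn new() -> Self {
        Self { samples: VecDeque::with_capacity(BW_WINDOW_SIZE) }
    }

    /// Add a sample, evicting the oldest one once the window is full.
    pub fn push(&mut self, bw: u64) {
        if self.samples.len() == BW_WINDOW_SIZE {
            self.samples.pop_front();
        }
        self.samples.push_back(bw);
    }

    /// Assumed estimator: the largest sample still inside the window.
    pub fn max(&self) -> u64 {
        self.samples.iter().copied().max().unwrap_or(0)
    }

    pub fn len(&self) -> usize {
        self.samples.len()
    }
}
```

A larger window keeps an old peak around for longer, which steadies the estimate at the cost of reacting more slowly when the available bandwidth drops.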
@@ -122,9 +122,9 @@ impl BbrState {
         let now = Instant::now();
         Self {
             btl_bw: 0,
-            min_rtt: Duration::from_millis(100), // Initial estimate
-            pacing_gain: 2.89,                   // Startup gain (2/ln(2))
-            pacing_gain_fp: STARTUP_GAIN_FP,     // Fixed-point startup gain
+            min_rtt: Duration::from_millis(50), // Initial estimate (50ms for modern networks)
+            pacing_gain: 2.89,                  // Startup gain (2/ln(2))
+            pacing_gain_fp: STARTUP_GAIN_FP,    // Fixed-point startup gain
             cwnd_gain: 2.0,
             cwnd_gain_fp: CWND_GAIN_FP, // Fixed-point cwnd gain
             bdp: 0,
@@ -798,8 +798,9 @@ mod tests {
     fn test_bbr_cwnd_with_bdp() {
         let mut bbr = BbrState::new();
 
-        bbr.update_bandwidth(10_000_000, Duration::from_secs(1));
+        // Update RTT first so BDP calculation uses the actual RTT, not the initial estimate
         bbr.update_rtt(Duration::from_millis(100));
+        bbr.update_bandwidth(10_000_000, Duration::from_secs(1));
 
         let cwnd = bbr.cwnd();
         let expected_bdp = (10_000_000.0 * 0.1) as u64;
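
The reordering in this test matters because the bandwidth-delay product is bandwidth multiplied by the minimum RTT. The helper below is a hypothetical sketch of that arithmetic, not the crate's `BbrState` code, and `bdp_bytes` is an assumed name. It shows why the expected value depends on which RTT is in effect when the BDP is computed.

```rust
use std::time::Duration;

/// Hypothetical helper: bandwidth-delay product in bytes for a bandwidth
/// given in bytes per second and a minimum round-trip time.
pub fn bdp_bytes(bandwidth_bytes_per_sec: u64, min_rtt: Duration) -> u64 {
    // Multiply in 128-bit integers so large bandwidths cannot overflow.
    let product = u128::from(bandwidth_bytes_per_sec) * min_rtt.as_micros();
    (product / 1_000_000) as u64
}
```

With 10,000,000 bytes per second and the measured 100 ms RTT this gives 1,000,000 bytes, matching `expected_bdp` in the test. With the new 50 ms initial estimate it would give 500,000 bytes, which is presumably why the test now updates the RTT first.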

crates/wraith-core/src/frame.rs

Lines changed: 58 additions & 59 deletions
@@ -14,6 +14,15 @@
 use crate::FRAME_HEADER_SIZE;
 use crate::error::FrameError;
 use rand::Rng;
+use rand::SeedableRng;
+use rand::rngs::SmallRng;
+use std::cell::RefCell;
+
+thread_local! {
+    static PADDING_RNG: RefCell<SmallRng> = RefCell::new(
+        SmallRng::from_rng(&mut rand::thread_rng()).expect("RNG seeding failed")
+    );
+}
 
 /// Maximum payload size (9000 - header - auth tag = 8944)
 const MAX_PAYLOAD_SIZE: usize = 8944;
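
The hunk above caches one `SmallRng` per thread so that it is not re-created on every call. To show the same caching pattern without the `rand` dependency, the sketch below swaps in a tiny xorshift generator. `XorShift64`, `seed_from_clock`, and `fill_padding` are hypothetical names, and the clock-based seed is a simplification that is only acceptable for non-secret padding.

```rust
use std::cell::RefCell;
use std::time::{SystemTime, UNIX_EPOCH};

/// Minimal xorshift64 generator, a std-only stand-in for `SmallRng`.
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must never hold an all-zero state.
        Self { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Fill `dest` with generator output, eight bytes per step.
    pub fn fill(&mut self, dest: &mut [u8]) {
        for chunk in dest.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Simplified seed source (assumption): nanoseconds since the Unix epoch.
fn seed_from_clock() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(1)
}

thread_local! {
    // Created lazily, once per thread, on first use.
    static PADDING_RNG: RefCell<XorShift64> = RefCell::new(XorShift64::new(seed_from_clock()));
}

/// Fill a padding region using the cached per-thread generator.
pub fn fill_padding(buf: &mut [u8]) {
    // Equivalent to `with_borrow_mut`, spelled out for older toolchains.
    PADDING_RNG.with(|rng| rng.borrow_mut().fill(buf));
}
```

The design relies on the point made in the diff's own comments: the padding is AEAD-encrypted before transmission, so a fast non-cryptographic generator is considered sufficient there.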
@@ -64,29 +73,46 @@ pub enum FrameType {
     PathResponse = 0x0F,
 }
 
+/// Lookup table for constant-time frame type validation.
+/// Index 0x00 = Reserved, 0x01-0x0F = valid types, 0x10-0x1F = reserved, rest = invalid.
+/// Value 0 = invalid/reserved, non-zero = valid FrameType discriminant.
+static FRAME_TYPE_TABLE: [u8; 32] = [
+    0, // 0x00: Reserved
+    0x01, // 0x01: Data
+    0x02, // 0x02: Ack
+    0x03, // 0x03: Control
+    0x04, // 0x04: Rekey
+    0x05, // 0x05: Ping
+    0x06, // 0x06: Pong
+    0x07, // 0x07: Close
+    0x08, // 0x08: Pad
+    0x09, // 0x09: StreamOpen
+    0x0A, // 0x0A: StreamClose
+    0x0B, // 0x0B: StreamReset
+    0x0C, // 0x0C: WindowUpdate
+    0x0D, // 0x0D: GoAway
+    0x0E, // 0x0E: PathChallenge
+    0x0F, // 0x0F: PathResponse
+    0, 0, 0, 0, 0, 0, 0, 0, // 0x10-0x17: Reserved
+    0, 0, 0, 0, 0, 0, 0, 0, // 0x18-0x1F: Reserved
+];
+
 impl TryFrom<u8> for FrameType {
     type Error = FrameError;
 
     fn try_from(value: u8) -> Result<Self, Self::Error> {
-        match value {
-            0x00 => Err(FrameError::ReservedFrameType),
-            0x01 => Ok(Self::Data),
-            0x02 => Ok(Self::Ack),
-            0x03 => Ok(Self::Control),
-            0x04 => Ok(Self::Rekey),
-            0x05 => Ok(Self::Ping),
-            0x06 => Ok(Self::Pong),
-            0x07 => Ok(Self::Close),
-            0x08 => Ok(Self::Pad),
-            0x09 => Ok(Self::StreamOpen),
-            0x0A => Ok(Self::StreamClose),
-            0x0B => Ok(Self::StreamReset),
-            0x0C => Ok(Self::WindowUpdate),
-            0x0D => Ok(Self::GoAway),
-            0x0E => Ok(Self::PathChallenge),
-            0x0F => Ok(Self::PathResponse),
-            0x10..=0x1F => Err(FrameError::ReservedFrameType),
-            _ => Err(FrameError::InvalidFrameType(value)),
+        if value < 32 {
+            let entry = FRAME_TYPE_TABLE[value as usize];
+            if entry != 0 {
+                // SAFETY: entry matches a valid FrameType discriminant (0x01..=0x0F)
+                Ok(unsafe { core::mem::transmute::<u8, FrameType>(entry) })
+            } else if value == 0x00 || (0x10..=0x1F).contains(&value) {
+                Err(FrameError::ReservedFrameType)
+            } else {
+                Err(FrameError::InvalidFrameType(value))
+            }
+        } else {
+            Err(FrameError::InvalidFrameType(value))
         }
     }
 }
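
For comparison, the same table-driven lookup can be written without `transmute` by storing `Option<FrameType>` in the table. The sketch below is an illustrative alternative, not the commit's code. The enum is deliberately truncated to three variants for brevity, so values 0x04 to 0x0F are rejected here even though the real enum accepts them, and `frame_type_from_u8` is a hypothetical name.

```rust
/// Truncated stand-in for the real enum (only three variants shown).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum FrameType {
    Data = 0x01,
    Ack = 0x02,
    Control = 0x03,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    ReservedFrameType,
    InvalidFrameType(u8),
}

/// Table indexed by the wire value; `None` marks reserved or unknown values.
const FRAME_TYPE_TABLE: [Option<FrameType>; 4] = [
    None,                     // 0x00: Reserved
    Some(FrameType::Data),    // 0x01
    Some(FrameType::Ack),     // 0x02
    Some(FrameType::Control), // 0x03
];

/// Safe table lookup: no `unsafe`, and out-of-range indices yield `None`.
pub fn frame_type_from_u8(value: u8) -> Result<FrameType, FrameError> {
    match FRAME_TYPE_TABLE.get(value as usize).copied().flatten() {
        Some(frame_type) => Ok(frame_type),
        None if value == 0x00 || (0x10..=0x1F).contains(&value) => {
            Err(FrameError::ReservedFrameType)
        }
        None => Err(FrameError::InvalidFrameType(value)),
    }
}
```

This variant gives up the guarantee of a single byte load per lookup in exchange for having no `unsafe` block to audit. Whether that trade is worthwhile would need the same benchmarks the commit relies on.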
@@ -617,12 +643,9 @@ pub fn build_into_from_parts(
     // Write payload
     buf[FRAME_HEADER_SIZE..FRAME_HEADER_SIZE + payload_len].copy_from_slice(payload);
 
-    // Write random padding
+    // Write random padding using fast PRNG (padding is AEAD-encrypted)
     if padding_len > 0 {
-        rand::Rng::fill(
-            &mut rand::thread_rng(),
-            &mut buf[FRAME_HEADER_SIZE + payload_len..],
-        );
+        PADDING_RNG.with_borrow_mut(|rng| rng.fill(&mut buf[FRAME_HEADER_SIZE + payload_len..]));
     }
 
     Ok(total_size)
@@ -730,9 +753,10 @@ impl FrameBuilder {
         // Write payload
         buf[FRAME_HEADER_SIZE..FRAME_HEADER_SIZE + payload_len].copy_from_slice(&self.payload);
 
-        // Write random padding
+        // Write random padding using fast PRNG (padding is AEAD-encrypted)
         if padding_len > 0 {
-            rand::thread_rng().fill(&mut buf[FRAME_HEADER_SIZE + payload_len..]);
+            PADDING_RNG
+                .with_borrow_mut(|rng| rng.fill(&mut buf[FRAME_HEADER_SIZE + payload_len..]));
         }
 
         Ok(total_size)
@@ -743,41 +767,16 @@ impl FrameBuilder {
     /// # Errors
     ///
     /// Returns [`FrameError::PayloadOverflow`] if `total_size` is too small for header + payload.
+    #[allow(clippy::uninit_vec)]
     pub fn build(self, total_size: usize) -> Result<Vec<u8>, FrameError> {
-        let frame_type = self.frame_type.unwrap_or(FrameType::Data);
-        let payload_len = self.payload.len();
-
-        if total_size < FRAME_HEADER_SIZE + payload_len {
-            return Err(FrameError::PayloadOverflow);
-        }
-
-        let padding_len = total_size - FRAME_HEADER_SIZE - payload_len;
         let mut buf = Vec::with_capacity(total_size);
-
-        // Write header
-        buf.extend_from_slice(&self.nonce);
-        buf.push(frame_type as u8);
-        buf.push(self.flags.as_u8());
-        buf.extend_from_slice(&self.stream_id.to_be_bytes());
-        buf.extend_from_slice(&self.sequence.to_be_bytes());
-        buf.extend_from_slice(&self.offset.to_be_bytes());
-        #[allow(clippy::cast_possible_truncation)]
-        let payload_len_u16 = payload_len as u16;
-        buf.extend_from_slice(&payload_len_u16.to_be_bytes());
-        buf.extend_from_slice(&[0u8; 2]); // Reserved
-
-        // Write payload
-        buf.extend_from_slice(&self.payload);
-
-        // Write random padding using thread-local PRNG (fast, non-syscall).
-        // Padding bytes are encrypted by AEAD before transmission, so
-        // cryptographic-quality randomness is not required here.
-        if padding_len > 0 {
-            let start = buf.len();
-            buf.resize(start + padding_len, 0);
-            rand::thread_rng().fill(&mut buf[start..]);
-        }
-
+        // SAFETY: build_into() writes every byte of the buffer:
+        // - Header (28 bytes): nonce, type, flags, stream_id, sequence, offset, payload_len, reserved
+        // - Payload: copied from self.payload
+        // - Padding: filled with random bytes from PADDING_RNG
+        // Total = FRAME_HEADER_SIZE + payload_len + padding_len = total_size
+        unsafe { buf.set_len(total_size) };
+        self.build_into(&mut buf)?;
         Ok(buf)
     }
 }
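
The new `build()` calls `set_len` before the bytes are written and silences `clippy::uninit_vec`. A related way to skip zero-initialization, sketched below under simplified assumptions, is to write through `spare_capacity_mut()` first and call `set_len` only afterwards. `build_frame` is a hypothetical function with a flat header slice and a fixed padding byte. It is not the crate's `FrameBuilder`.

```rust
use std::mem::MaybeUninit;

/// Hypothetical sketch: assemble header + payload + padding into a fresh
/// buffer without zero-filling it first. Returns `None` when `total_size`
/// is too small for header and payload.
pub fn build_frame(header: &[u8], payload: &[u8], total_size: usize) -> Option<Vec<u8>> {
    let used = header.len() + payload.len();
    if total_size < used {
        return None;
    }

    let mut buf: Vec<u8> = Vec::with_capacity(total_size);
    {
        // The spare capacity is exposed as `MaybeUninit<u8>`, so no
        // reference to uninitialized `u8` values is ever created.
        let spare: &mut [MaybeUninit<u8>] = &mut buf.spare_capacity_mut()[..total_size];

        for (slot, byte) in spare.iter_mut().zip(header.iter().chain(payload.iter())) {
            slot.write(*byte);
        }
        // Fixed padding byte for the sketch; the real code uses random bytes.
        for slot in &mut spare[used..] {
            slot.write(0xAA);
        }
    }

    // SAFETY: the first `total_size` elements were all written above and
    // `total_size` does not exceed the capacity requested from the allocator.
    unsafe { buf.set_len(total_size) };
    Some(buf)
}
```

Here the length only ever covers initialized bytes, so no lint suppression is needed. The cost is that a helper such as `build_into(&mut [u8])` cannot be reused directly, which is the duplication T5.9 set out to remove.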

crates/wraith-core/src/node/obfuscation.rs

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -237,7 +237,9 @@ impl Node {
237237
// Use wraith-obfuscation DohTunnel for protocol mimicry
238238
// Note: DohTunnel creates DNS query packets with EDNS0 OPT records
239239
let tunnel = &self.inner.doh_tunnel;
240-
let wrapped = tunnel.create_dns_query("wraith.local", data);
240+
let wrapped = tunnel
241+
.create_dns_query("wraith.local", data)
242+
.map_err(|_| NodeError::Obfuscation("DNS label encoding failed".into()))?;
241243

242244
tracing::trace!(
243245
"Wrapped {} bytes as DoH (total: {} bytes)",
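
This hunk shows `create_dns_query` becoming fallible, which lines up with the 63-byte label check described under T5.14. The tunnel implementation and its error type are not shown in this diff, so the sketch below is an assumption throughout. `encode_dns_name`, `DnsNameError`, the simplified `create_dns_query`, and `wrap_as_doh` are hypothetical, and the query layout is reduced to an encoded name followed by the payload.

```rust
/// Maximum length of a single DNS label in bytes (RFC 1035).
pub const MAX_LABEL_LEN: usize = 63;

#[derive(Debug, PartialEq, Eq)]
pub enum DnsNameError {
    EmptyLabel,
    LabelTooLong(usize),
}

/// Stand-in for the node-level error seen in the diff.
#[derive(Debug, PartialEq, Eq)]
pub enum NodeError {
    Obfuscation(String),
}

/// Encode a dotted name as length-prefixed labels ending with a zero byte.
pub fn encode_dns_name(name: &str) -> Result<Vec<u8>, DnsNameError> {
    let mut out = Vec::with_capacity(name.len() + 2);
    for label in name.trim_end_matches('.').split('.') {
        let bytes = label.as_bytes();
        if bytes.is_empty() {
            return Err(DnsNameError::EmptyLabel);
        }
        if bytes.len() > MAX_LABEL_LEN {
            return Err(DnsNameError::LabelTooLong(bytes.len()));
        }
        out.push(bytes.len() as u8); // cannot truncate: len <= 63
        out.extend_from_slice(bytes);
    }
    out.push(0); // root label terminates the name
    Ok(out)
}

/// Greatly simplified query builder: encoded name followed by the payload.
pub fn create_dns_query(domain: &str, data: &[u8]) -> Result<Vec<u8>, DnsNameError> {
    let mut packet = encode_dns_name(domain)?;
    packet.extend_from_slice(data);
    Ok(packet)
}

/// Caller side, mirroring the `map_err` conversion in the diff.
pub fn wrap_as_doh(domain: &str, data: &[u8]) -> Result<Vec<u8>, NodeError> {
    create_dns_query(domain, data)
        .map_err(|_| NodeError::Obfuscation("DNS label encoding failed".into()))
}
```

Discarding the inner error with `|_|`, as the diff does, keeps the node error type small but loses the offending label length. Carrying it into the message would be one option if that detail is useful in logs.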
