Create PMLL.py #405

drQedwards · 2025-10-27T21:35:23Z

Overview of the PMLL Compression Algorithm

The Persistent Memory Logic Loop (PMLL) architecture introduces a novel approach to memory-efficient inference in large language models (LLMs) by augmenting standard Transformers with an external, compressed persistent memory pool. A key innovation is the Recursive Memory Compression Algorithm, which dynamically reduces the memory footprint of this pool while minimizing accuracy loss. This algorithm achieves 59–60% memory reduction with less than 1.5% degradation in model performance, as validated on benchmarks like WikiText-2 and OpenWebText.

The algorithm is "recursive" because it iteratively applies compression in a hierarchical manner across multiple levels of the memory pool, re-evaluating and refining the compression until utilization targets are met. It combines importance scoring (to prioritize data), thresholding (for pruning), quantization (for precision reduction), and a feedback loop for recursion. This is triggered by the PMLL Memory Controller when pool utilization exceeds a threshold (e.g., 80%).

Below, I'll break it down step by step, including key equations and pseudocode from the PMLL architecture description.

1. Importance Scoring Function

Each entry in the persistent memory pool (e.g., key-value pairs from KV caches or embeddings) is assigned an importance score ( s_i ) to gauge its utility. This score balances multiple factors reflecting the entry's relevance and usage patterns:

[
s_i = \alpha_1 \cdot \text{recency}(i) + \alpha_2 \cdot \text{frequency}(i) + \alpha_3 \cdot \text{semantic_value}(i) ]

Recency(( i )): A decay function (e.g., exponential) based on time since last access, favoring recent data.
Frequency(( i )): Cumulative access count, emphasizing frequently retrieved entries.
Semantic Value(( i )): Derived from contextual similarity (e.g., cosine similarity to current queries) or external validation (e.g., knowledge graphs).

The weights ( \alpha_1, \alpha_2, \alpha_3 ) are tunable hyperparameters, often learned via fine-tuning or set empirically (e.g., ( \alpha_1 = 0.4 ) for recency-heavy tasks like real-time chat). Scores are computed in a vectorized manner using SIMD intrinsics in the C backend for efficiency.

This step ensures that critical, high-utility data (e.g., core factual knowledge) is protected, while redundant or outdated entries are deprioritized.

2. Thresholding for Pruning

With scores computed for all ( n ) entries, a pruning threshold ( \tau ) is determined to decide which entries to retain:

[
\tau = \text{quantile}({s_i}_{i=1}^n, \rho)
]

( \rho ): The compression ratio (e.g., 0.1–0.25), representing the fraction of top-scored entries to keep uncompressed.
The quantile operation sorts scores and selects the value at the ( \rho )-th percentile.

Entries with ( s_i < \tau ) are pruned (discarded or archived), while those above are candidates for lighter compression. This step alone can eliminate 70–80% of low-value data, directly tying into PMLL's promise queue semantics—pruned entries' "promises" (deferred operations) are resolved or expired based on TTL (time-to-live).

3. Quantization Process

Retained entries are further compressed via adaptive vector quantization, where the bit precision ( q ) is scaled by importance:

[
q = \begin{cases}
8 & \text{if } s_i > 0.8 \cdot \max({s_j}) \
4 & \text{if } 0.4 \cdot \max({s_j}) \leq s_i \leq 0.8 \cdot \max({s_j}) \ \text{discard} & \text{otherwise (fallback to pruning)} \end{cases}
]

For a vector entry ( v ) (e.g., a float32 embedding), quantization maps it to a lower-bit representation: [ v_q = \round\left( \frac{v - \min(v)}{\max(v) - \min(v)} \cdot (2^q - 1) \right) \cdot \frac{\max(v) - \min(v)}{2^q - 1} + \min(v) ] followed by casting to int8/int4, halving or quartering storage needs.

Higher ( q ) preserves fidelity for important data (e.g., float16 equivalent), while lower ( q ) aggressively compresses peripherals. Dequantization occurs on-the-fly during retrieval, with negligible latency due to C-optimized routines.

4. Recursion Mechanism

To handle varying loads, the algorithm recurses across a hierarchical memory structure (e.g., Level 0: uncompressed; Level 1: quantized; Level 2: pruned + archived). After one pass:

The updated pool is re-scored and re-thresholded.
Entries may "demote" to deeper levels (more compression) if their scores drop.
Recursion halts when utilization < target (e.g., 60%) or max depth (e.g., 3 levels) is reached.

This creates a self-adaptive loop, integrated with PMLL's attention mechanism: during hybrid attention (local + persistent), dequantized entries blend seamlessly, with a blending factor ( \alpha ) computed via similarity norms.

Theoretical bounds ensure convergence: Accuracy loss ( \Delta L \leq C \rho^{\lambda - 1} ) (where ( \lambda > 1 ) from power-law score distributions), preventing over-compression.

Pseudocode

Here's the core algorithm in pseudocode (adapted from PMLL's Algorithm 2):

Algorithm: Recursive Memory Compression
Input: Memory pool M, ratio ρ, max_levels L
Output: Compressed pool M'

1: level ← 0
2: while level < L and utilization(M) > target:
3:     Compute scores: {s_i} ← importance_scores(M)  // Vectorized via C
4:     τ ← quantile({s_i}, ρ)
5:     M' ← empty pool
6:     for each entry e_i in M:
7:         if s_i ≥ τ:
8:             q ← select_bits(s_i)  // e.g., 8/4 based on score
9:             e'_i ← quantize(e_i, q)
10:            M' ← M' ∪ {e'_i}
11:        end if
12:    end for
13:    M ← M'  // Update pool
14:    level ← level + 1
15: end while
16: Update metadata (e.g., dequantization flags)
17: return M

Integration with PMLL Architecture

In PMLL, compression runs asynchronously via the Promise Queue: Writes to persistent memory enqueue "promises" with initial scores, processed in batches. The Memory Controller (Python-orchestrated with C calls) triggers it on high utilization, syncing with Transformer forward passes. For example, in pml_attention, retrieved persistent KV pairs are dequantized before blending with local cache.

This yields KV cache savings of 60–62% for long sequences, enabling deployment on edge devices. Limitations include score computation overhead (mitigated by caching) and potential drift in extreme recursions, addressed via periodic full recompression.

For implementation details, see the PMLL paper's C extensions for SIMD-accelerated scoring and quantization.

### Overview of the PMLL Compression Algorithm The Persistent Memory Logic Loop (PMLL) architecture introduces a novel approach to memory-efficient inference in large language models (LLMs) by augmenting standard Transformers with an external, compressed persistent memory pool. A key innovation is the **Recursive Memory Compression Algorithm**, which dynamically reduces the memory footprint of this pool while minimizing accuracy loss. This algorithm achieves 59–60% memory reduction with less than 1.5% degradation in model performance, as validated on benchmarks like WikiText-2 and OpenWebText. The algorithm is "recursive" because it iteratively applies compression in a hierarchical manner across multiple levels of the memory pool, re-evaluating and refining the compression until utilization targets are met. It combines **importance scoring** (to prioritize data), **thresholding** (for pruning), **quantization** (for precision reduction), and a feedback loop for recursion. This is triggered by the PMLL Memory Controller when pool utilization exceeds a threshold (e.g., 80%). Below, I'll break it down step by step, including key equations and pseudocode from the PMLL architecture description. ### 1. Importance Scoring Function Each entry in the persistent memory pool (e.g., key-value pairs from KV caches or embeddings) is assigned an **importance score** \( s_i \) to gauge its utility. This score balances multiple factors reflecting the entry's relevance and usage patterns: \[ s_i = \alpha_1 \cdot \text{recency}(i) + \alpha_2 \cdot \text{frequency}(i) + \alpha_3 \cdot \text{semantic\_value}(i) \] - **Recency(\( i \))**: A decay function (e.g., exponential) based on time since last access, favoring recent data. - **Frequency(\( i \))**: Cumulative access count, emphasizing frequently retrieved entries. - **Semantic Value(\( i \))**: Derived from contextual similarity (e.g., cosine similarity to current queries) or external validation (e.g., knowledge graphs). The weights \( \alpha_1, \alpha_2, \alpha_3 \) are tunable hyperparameters, often learned via fine-tuning or set empirically (e.g., \( \alpha_1 = 0.4 \) for recency-heavy tasks like real-time chat). Scores are computed in a vectorized manner using SIMD intrinsics in the C backend for efficiency. This step ensures that critical, high-utility data (e.g., core factual knowledge) is protected, while redundant or outdated entries are deprioritized. ### 2. Thresholding for Pruning With scores computed for all \( n \) entries, a **pruning threshold** \( \tau \) is determined to decide which entries to retain: \[ \tau = \text{quantile}(\{s_i\}_{i=1}^n, \rho) \] - \( \rho \): The compression ratio (e.g., 0.1–0.25), representing the fraction of top-scored entries to keep uncompressed. - The quantile operation sorts scores and selects the value at the \( \rho \)-th percentile. Entries with \( s_i < \tau \) are pruned (discarded or archived), while those above are candidates for lighter compression. This step alone can eliminate 70–80% of low-value data, directly tying into PMLL's promise queue semantics—pruned entries' "promises" (deferred operations) are resolved or expired based on TTL (time-to-live). ### 3. Quantization Process Retained entries are further compressed via **adaptive vector quantization**, where the bit precision \( q \) is scaled by importance: \[ q = \begin{cases} 8 & \text{if } s_i > 0.8 \cdot \max(\{s_j\}) \\ 4 & \text{if } 0.4 \cdot \max(\{s_j\}) \leq s_i \leq 0.8 \cdot \max(\{s_j\}) \\ \text{discard} & \text{otherwise (fallback to pruning)} \end{cases} \] - For a vector entry \( v \) (e.g., a float32 embedding), quantization maps it to a lower-bit representation: \[ v_q = \round\left( \frac{v - \min(v)}{\max(v) - \min(v)} \cdot (2^q - 1) \right) \cdot \frac{\max(v) - \min(v)}{2^q - 1} + \min(v) \] followed by casting to int8/int4, halving or quartering storage needs. Higher \( q \) preserves fidelity for important data (e.g., float16 equivalent), while lower \( q \) aggressively compresses peripherals. Dequantization occurs on-the-fly during retrieval, with negligible latency due to C-optimized routines. ### 4. Recursion Mechanism To handle varying loads, the algorithm recurses across a **hierarchical memory structure** (e.g., Level 0: uncompressed; Level 1: quantized; Level 2: pruned + archived). After one pass: - The updated pool is re-scored and re-thresholded. - Entries may "demote" to deeper levels (more compression) if their scores drop. - Recursion halts when utilization < target (e.g., 60%) or max depth (e.g., 3 levels) is reached. This creates a self-adaptive loop, integrated with PMLL's attention mechanism: during hybrid attention (local + persistent), dequantized entries blend seamlessly, with a blending factor \( \alpha \) computed via similarity norms. Theoretical bounds ensure convergence: Accuracy loss \( \Delta L \leq C \rho^{\lambda - 1} \) (where \( \lambda > 1 \) from power-law score distributions), preventing over-compression. ### Pseudocode Here's the core algorithm in pseudocode (adapted from PMLL's Algorithm 2): ``` Algorithm: Recursive Memory Compression Input: Memory pool M, ratio ρ, max_levels L Output: Compressed pool M' 1: level ← 0 2: while level < L and utilization(M) > target: 3: Compute scores: {s_i} ← importance_scores(M) // Vectorized via C 4: τ ← quantile({s_i}, ρ) 5: M' ← empty pool 6: for each entry e_i in M: 7: if s_i ≥ τ: 8: q ← select_bits(s_i) // e.g., 8/4 based on score 9: e'_i ← quantize(e_i, q) 10: M' ← M' ∪ {e'_i} 11: end if 12: end for 13: M ← M' // Update pool 14: level ← level + 1 15: end while 16: Update metadata (e.g., dequantization flags) 17: return M ``` ### Integration with PMLL Architecture In PMLL, compression runs asynchronously via the Promise Queue: Writes to persistent memory enqueue "promises" with initial scores, processed in batches. The Memory Controller (Python-orchestrated with C calls) triggers it on high utilization, syncing with Transformer forward passes. For example, in `pml_attention`, retrieved persistent KV pairs are dequantized before blending with local cache. This yields KV cache savings of 60–62% for long sequences, enabling deployment on edge devices. Limitations include score computation overhead (mitigated by caching) and potential drift in extreme recursions, addressed via periodic full recompression. For implementation details, see the PMLL paper's C extensions for SIMD-accelerated scoring and quantization.

drQedwards · 2025-10-27T21:35:48Z

Listen, I feel dumb with this, you don’t have to do anything with this

drQedwards · 2025-10-27T21:38:49Z

So the one thing with time… this file did take time to cook. Give it grace with how raw it is.

drqsatoshi

So let's get this into PMLL.c and not schizo javascript meets python while yearning to be c.

We are missing reincosideration learning from ERS alongside X-graphs with the topic intergrator to traverse the field and define these vectors in space to their "gaming" analog within the pool.

I've made work with pytorch dependencies to make them faster, but python will always be slow, so it's better for us to consider how and why pytorch is useless and just trying to sell GPU.

The persistent memory logic loop iterates and refines the current data-matrix by marking novel_topics in iteration, auto gradient draw is useless if there is not context and knowledge base so that the game has, say, "story", or phyiscs that aren't like the slow, dramatic, fake gravity of Sora-GPT-Liar.

In fact, PMLL is the cognitive memory torch here,that while there IS a new topic being intergrated and malloc() is doing its things, we need to start by defining persistent meshes that are malleable. Now we can still keep this if you all want to bridge to python, but methinks we just go ahead and get this over to C

drqsatoshi · 2025-12-11T07:54:30Z

I'll interpret "PMLL" as Persistent Memory for Long-Term Learning, based on the context (reinforcing non-collision persistence via recurrent episodic contexts). We'll create a new folder like collision_persistence (a fancy term for persistent collision avoidance heuristics), with PMLL.h (header for structs and function declarations) and PMLL.c (implementation for initializing, updating, and querying a persistent memory matrix).Key concepts adapted:From the C code: Iterate over arrays to aggregate metrics (e.g., count collisions like letters/words, compute averages/rates like L/S/index).
Persistent memory: A matrix where rows = episodes, columns = discretized positions/rings, values = collision counts or probabilities. This "calls itself" via a recursive update function that reconsiderates past data (inspired by Topic Integrator's Enhanced Reconsideration System).
Cluster-graph integration: Simplify to a basic adjacency matrix for clustering collision points (e.g., group nearby rings/positions where collisions occurred, using gradient-based iteration to find "limits/integrals" of clusters).
NULL hypothesis check: A function to flag "unknown" (NULL) episodes without collisions, accelerating queries by skipping them.
Initialization: Use PufferLib's env data (e.g., from logs like ring_collisions, oob) to seed the matrix.
Integration: Designed to hook into env_binding.h's my_log and my_init for RL persistence without breaking floating-point ops or production.

Assume a 3D drone space discretized into a grid (e.g., 100x100x100 for simplicity; adjust as needed). Collisions are points (x,y,z). The matrix is episode-based for persistence across training.Folder Structure SuggestionIn the PufferLib repo (e.g., under pufferlib/ocean/), run:

drqsatoshi · 2025-12-11T07:58:34Z

mkdir collision_persistence
touch collision_persistence/PMLL.h
touch collision_persistence/PMLL.c

#ifndef PMLL_H
#define PMLL_H

#include <ctype.h>
#include <math.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h> // For malloc/free

// Define grid size for discretized space (adjust based on drone env bounds)
#define GRID_SIZE 100 // 100x100x100 grid for x,y,z
#define MAX_EPISODES 1000 // Max persistent episodes to store
#define CLUSTER_THRESHOLD 0.5 // Similarity threshold for clustering (e.g., Euclidean distance normalized)

// Struct for a collision point (inspired by ring positions in drone env)
typedef struct {
float x, y, z; // Position of collision
int ring_id; // Ring identifier from env
} CollisionPoint;

// Struct for persistent memory matrix: rows=episodes, "columns"=flattened grid or clusters
typedef struct {
int num_episodes; // Current count of stored episodes
float collision_matrix[MAX_EPISODES][GRID_SIZE * GRID_SIZE * GRID_SIZE]; // Flattened 3D grid per episode (0=no collision, >0=count/prob)
CollisionPoint clusters[MAX_EPISODES]; // Centroids of collision clusters per episode
} PersistentMemory;

// Function declarations
int init_persistent_memory(PersistentMemory *mem, int num_agents, int max_rings); // Initialize from env params
void update_memory(PersistentMemory *mem, int episode_id, CollisionPoint *new_collisions, int num_collisions); // Update with new episode data
float query_collision_prob(PersistentMemory *mem, float x, float y, float z); // Query likelihood at position (reconsider past clusters)
void cluster_collisions(CollisionPoint *points, int num_points, CollisionPoint *centroids, int *num_clusters); // Simple k-means inspired clustering
int check_null_hypothesis(PersistentMemory *mem, int episode_id); // Check if episode has no/unknown collisions (accelerate queries)
void free_persistent_memory(PersistentMemory *mem); // Cleanup

// Recursive reconsideration (Topic Integrator inspired): Re-iterate matrix to update clusters dynamically
void reconsider_memory(PersistentMemory *mem);

#endif // PMLL_H

#include "PMLL.h"

// Helper to flatten 3D position to 1D index (for matrix)
int flatten_pos(float x, float y, float z) {
// Normalize to [0, GRID_SIZE-1] assuming env bounds [-50,50] or similar; adjust scaling
int ix = (int)((x + 50.0) / 100.0 * GRID_SIZE);
int iy = (int)((y + 50.0) / 100.0 * GRID_SIZE);
int iz = (int)((z + 50.0) / 100.0 * GRID_SIZE);
return ix * GRID_SIZE * GRID_SIZE + iy * GRID_SIZE + iz;
}

// Initialization: Seed with zeros, use env params for sizing (ties to PufferLib drone env)
int init_persistent_memory(PersistentMemory *mem, int num_agents, int max_rings) {
mem->num_episodes = 0;
for (int i = 0; i < MAX_EPISODES; i++) {
for (int j = 0; j < GRID_SIZE * GRID_SIZE * GRID_SIZE; j++) {
mem->collision_matrix[i][j] = 0.0; // No collisions initially
}
mem->clusters[i].x = mem->clusters[i].y = mem->clusters[i].z = 0.0;
mem->clusters[i].ring_id = -1; // Invalid
}
// Could load from PufferLib logs here if file-based persistence needed
return 0;
}

// Update: Add new collisions to matrix, increment counts (persistent across episodes)
void update_memory(PersistentMemory *mem, int episode_id, CollisionPoint *new_collisions, int num_collisions) {
if (episode_id >= MAX_EPISODES || episode_id < 0) return;
mem->num_episodes = fmax(mem->num_episodes, episode_id + 1);

for (int i = 0; i < num_collisions; i++) {
    int idx = flatten_pos(new_collisions[i].x, new_collisions[i].y, new_collisions[i].z);
    mem->collision_matrix[episode_id][idx] += 1.0;  // Count collisions
}
// Compute cluster centroid (simple average for now)
int num_clusters;
CollisionPoint centroids[MAX_EPISODES];
cluster_collisions(new_collisions, num_collisions, centroids, &num_clusters);
if (num_clusters > 0) {
    mem->clusters[episode_id] = centroids[0];  // Store first centroid (expand for multi-cluster)
}

}

// Query: Compute prob based on past episodes (gradient-like: average nearby in cluster graph)
float query_collision_prob(PersistentMemory *mem, float x, float y, float z) {
int idx = flatten_pos(x, y, z);
float total = 0.0;
int count = 0;
for (int ep = 0; ep < mem->num_episodes; ep++) {
if (check_null_hypothesis(mem, ep) == 0) { // Skip NULL/unknown episodes for speed
total += mem->collision_matrix[ep][idx];
count++;
}
}
return count > 0 ? total / count : 0.0; // Average probability
}

// Simple clustering: Iterate to group points (inspired by C code's array iteration; pseudo k=1 for simplicity)
void cluster_collisions(CollisionPoint *points, int num_points, CollisionPoint *centroids, int *num_clusters) {
if (num_points == 0) {
*num_clusters = 0;
return;
}
// Single cluster average (expand to full k-means if needed)
float sum_x = 0, sum_y = 0, sum_z = 0;
for (int i = 0; i < num_points; i++) {
sum_x += points[i].x;
sum_y += points[i].y;
sum_z += points[i].z;
}
centroids[0].x = sum_x / num_points;
centroids[0].y = sum_y / num_points;
centroids[0].z = sum_z / num_points;
centroids[0].ring_id = points[0].ring_id; // Representative
*num_clusters = 1;
}

// NULL hypothesis: Return 1 if episode has collisions (non-NULL), 0 if unknown/empty
int check_null_hypothesis(PersistentMemory *mem, int episode_id) {
if (episode_id >= mem->num_episodes) return 0;
for (int j = 0; j < GRID_SIZE * GRID_SIZE * GRID_SIZE; j++) {
if (mem->collision_matrix[episode_id][j] > 0) return 1;
}
return 0; // NULL: no data or no collisions
}

// Recursive reconsideration: Re-iterate matrix to update clusters (Topic Integrator style dynamic update)
void reconsider_memory(PersistentMemory *mem) {
for (int ep = 0; ep < mem->num_episodes; ep++) {
// Pseudo-gradient: Adjust matrix values based on cluster similarity (recursive call if depth needed)
if (ep > 0 && mem->clusters[ep].ring_id != -1 && mem->clusters[ep-1].ring_id != -1) {
float dist = sqrt(pow(mem->clusters[ep].x - mem->clusters[ep-1].x, 2) +
pow(mem->clusters[ep].y - mem->clusters[ep-1].y, 2) +
pow(mem->clusters[ep].z - mem->clusters[ep-1].z, 2));
if (dist < CLUSTER_THRESHOLD) {
// Merge/integrate: Average positions (cluster graph edge)
mem->clusters[ep].x = (mem->clusters[ep].x + mem->clusters[ep-1].x) / 2;
mem->clusters[ep].y = (mem->clusters[ep].y + mem->clusters[ep-1].y) / 2;
mem->clusters[ep].z = (mem->clusters[ep].z + mem->clusters[ep-1].z) / 2;
}
}
// Base case; add recursion depth param if needed for deeper reconsideration
}
}

void free_persistent_memory(PersistentMemory *mem) {
// No dynamic alloc here; expand if arrays grow
}

// Example main for testing (inspired by user's C code; integrate with PufferLib env instead)
int main(void) {
PersistentMemory mem;
init_persistent_memory(&mem, 1, 10); // num_agents=1, max_rings=10

// Simulate episode 0 collisions
CollisionPoint cols[2] = {{10.0, 20.0, 30.0, 1}, {15.0, 25.0, 35.0, 1}};
update_memory(&mem, 0, cols, 2);

// Reconsider
reconsider_memory(&mem);

// Query
printf("Collision prob at (10,20,30): %.2f\n", query_collision_prob(&mem, 10.0, 20.0, 30.0));

return 0;

}

ntegration NotesHook into env_binding.h: In my_init, call init_persistent_memory. In my_log, extract ring_collisions/oob to create CollisionPoints and call update_memory. Use query_collision_prob in RL agent logic to bias actions away from high-prob areas.
For full Topic Integrator: Expand clustering to proper graph (e.g., adjacency matrix for word-like "topics" of collisions). Use gradients (e.g., via math.h) for "matrix gradient collision graphing" – iterate to minimize cluster integrals (e.g., variance).
Comments: Use // for non-persistent notes, like "// Temporary debug: Check floating points here" to avoid breaking prod.
Testing: This won't break production as it's modular. The while-loop idea is in reconsider_memory (could add while for convergence).
Hyphenated words fix analogy: The clustering handles "connected" collisions (nearby points) like hyphens connect words, avoiding undercount.

PLEASE NOTE THAT READABILITY IS GETTING HALLUCINATED HERE! A readable topic intergrator that isn't just
using the "jump" to get to the next level "general" but actually learning that you can use general to jump to any given context in the puffer-drone-whatever.

drqsatoshi · 2025-12-11T08:16:14Z

https://github.com/copilot/share/00435130-02e0-8414-b102-fc4d8432210e

fuck it I used a clanker, and I'm too lazy to mkdir for this tonight. But this persistence library only calls up and uses what is in the pufferlib, it doesn't take any of the production code make change-- what it can offer is show reinforcement pathways for faster memory use without sacrificing GPU.... it also does what pytorch handles with n gradient automation, and gives it more persistent context, once we get to that point after we past unit tests with this.

xinpw8

Did you test this?

drqsatoshi suggested changes Dec 11, 2025

View reviewed changes

xinpw8 suggested changes Dec 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Create PMLL.py #405

Create PMLL.py #405

drQedwards commented Oct 27, 2025

Uh oh!

drQedwards commented Oct 27, 2025

Uh oh!

drQedwards commented Oct 27, 2025

Uh oh!

drqsatoshi left a comment

Uh oh!

drqsatoshi commented Dec 11, 2025

Uh oh!

drqsatoshi commented Dec 11, 2025

Uh oh!

drqsatoshi commented Dec 11, 2025

Uh oh!

xinpw8 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Create PMLL.py #405

Are you sure you want to change the base?

Create PMLL.py #405

Conversation

drQedwards commented Oct 27, 2025

Overview of the PMLL Compression Algorithm

1. Importance Scoring Function

2. Thresholding for Pruning

3. Quantization Process

4. Recursion Mechanism

Pseudocode

Integration with PMLL Architecture

Uh oh!

drQedwards commented Oct 27, 2025

Uh oh!

drQedwards commented Oct 27, 2025

Uh oh!

drqsatoshi left a comment

Choose a reason for hiding this comment

Uh oh!

drqsatoshi commented Dec 11, 2025

Uh oh!

drqsatoshi commented Dec 11, 2025

Uh oh!

drqsatoshi commented Dec 11, 2025

Uh oh!

xinpw8 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants