Commit 62d6f29 ("fix install"), 1 parent: 379de74

2,456 files changed: +817,996 / -28,377 lines


Agent/README.md

Lines changed: 52 additions & 0 deletions
# Agent Plan: Make Fork a Drop-in Replacement for Official CTranslate2 (pip install)

## Objective

Ensure that this fork of CTranslate2 can be installed via pip (from Git or a local path) and behaves as a true drop-in replacement for the official package, including building the Python native extension and matching the official Python API and directory layout.
---

## Step-by-Step Plan
### 1. **Audit and Sync Python Packaging**

- [ ] Compare the `python/` directory structure and contents with the official CTranslate2 repo.
- [ ] Ensure all packaging files (`setup.py`, `pyproject.toml`, `CMakeLists.txt`) match upstream or are compatible.
- [ ] Confirm all required `.cc` files and Python modules are present.
### 2. **Test Clean pip Install**

- [ ] In a fresh virtual environment, run:

  ```sh
  pip install git+https://github.com/NADOOIT/CTranslate2.git
  python -c "import ctranslate2; print(ctranslate2.__version__)"
  ```

- [ ] Confirm that `ctranslate2` imports and that the native extension (`_ext.*.so`) is present in `site-packages/ctranslate2`.
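The last check can be scripted. A minimal sketch, assuming the official layout where the compiled extension lives as `_ext.*` inside the package directory (the helper name is ours, not part of any API):

```python
from pathlib import Path

def native_extensions(pkg_dir):
    """List compiled extension files (e.g. _ext.cpython-311-darwin.so) in a package dir."""
    return sorted(p.name for p in Path(pkg_dir).glob("_ext*"))

# Against an installed package:
#   import ctranslate2
#   print(native_extensions(Path(ctranslate2.__file__).parent))
```

An empty result right after `pip install` means the native extension was never built or installed, which is exactly the failure mode step 3 below diagnoses.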
### 3. **Fix Any Build or Packaging Issues**

- [ ] If the build fails or no `.so` is installed, check:
  - CMake errors or missing dependencies
  - Extension sources and install paths
  - `setup.py`/`pyproject.toml` configuration
- [ ] Update packaging files or CMake as needed for compatibility.
### 4. **Automate and Document**

- [ ] Add or update a `README.md` in the fork with clear install/build instructions.
- [ ] Optionally, add a GitHub Actions workflow to test pip install and import on push/PR.
### 5. **(Optional) Build and Upload Wheels**

- [ ] Build a wheel (`pip wheel .` or `python -m build`) for your fork.
- [ ] Upload it to a private index or a GitHub Release for even easier installs.
---

## Deliverables

- This plan and all supporting scripts/docs in the `Agent/` folder.
- A fork that can be installed via pip as a drop-in replacement for the official CTranslate2.
- Documentation and (optionally) CI for ongoing compatibility.
---

## Progress Tracking

- [ ] Audit complete
- [ ] Clean pip install tested
- [ ] Issues fixed
- [ ] Documentation updated
- [ ] (Optional) Wheel built and/or CI added

Agent/plan_steps.md

Lines changed: 55 additions & 0 deletions
# Agent Execution Steps: Drop-in CTranslate2 Fork

This file tracks the concrete steps and findings as we execute the plan from README.md.

---
## 1. Audit and Sync Python Packaging

### a. Directory Structure
- [x] `python/` exists with `setup.py`, `pyproject.toml`, and `cpp/` sources.
- [x] `python/ctranslate2/` exists with `__init__.py`, `_ext.*.so`, and all required submodules.

### b. Packaging Files
- [x] `setup.py` present, with custom build logic for the C++/Metal extension.
- [x] `pyproject.toml` present, declares build dependencies.
- [ ] Compare these files line-by-line with upstream for subtle differences.
### c. CMake Integration
- [x] Top-level `CMakeLists.txt` present.
- [ ] Confirm the Python extension is built and installed to the correct location by CMake.

---
## 2. Test Clean pip Install
- [ ] Create a fresh virtual environment.
- [ ] Run `pip install .` from the `python/` directory.
- [ ] Verify that `ctranslate2` imports and `_ext.*.so` is present in `site-packages`.

---
## 3. Fix Any Build/Packaging Issues
- [ ] If the build fails, capture error logs and diagnose.
- [ ] If the `.so` is not installed, check the `setup.py`/CMake install logic.
- [ ] Update packaging or CMake files as needed.

---
## 4. Automate and Document
- [ ] Update `README.md` in the root or `python/` directory with install/build instructions.
- [ ] (Optional) Add a GitHub Actions workflow for an install/import test.

---
## 5. (Optional) Build and Upload Wheels
- [ ] Build a wheel for the fork.
- [ ] (Optional) Upload it to a private index or GitHub Release.

---
## Progress
- [x] Initial audit complete
- [ ] Upstream comparison
- [ ] Clean install tested
- [ ] Issues fixed (if any)
- [ ] Documentation/automation
Lines changed: 1 addition & 0 deletions
---

Testing/Temporary/LastTest.log

Lines changed: 3 additions & 0 deletions
Start testing: Apr 26 21:01 CEST
----------------------------------------------------------
End testing: Apr 26 21:01 CEST

maybe.md

Lines changed: 22 additions & 0 deletions
| Enhancement Name                              | Probable Speed-up (×)         |
|-----------------------------------------------|-------------------------------|
| GPU Offload (Metal / MPS)                     | 10–100×                       |
| FP16 (Half-Precision) Compute                 |                               |
| Hybrid CPU + GPU + ANE                        | 1.5–2×                        |
| ANE-Only Offload                              | 3–5× (for supported ops)      |
| Data-Transfer Minimization (shared MTLBuffer) | 1.1–2×                        |
| Kernel Tile & Threadgroup Tuning              | 1.1–1.5×                      |
| Batching & Operation Fusion                   | 1.1–2×                        |
| Auto-Tune Backend Selection                   | 1.1–1.5×                      |
| *Strassen / Coppersmith–Winograd Algorithm*   | *1.2–2×*                      |
| *Block Floating-Point Quantization*           | *2–4×*                        |
| *Spiking Neuromorphic Approximate GEMM*       | *5–10× (very high risk)*      |
| *Optical Co-processor Offload*                | *100–1000× (theoretical)*     |
| *Quantum Matrix Multiplication*               | *1,000–10,000× (theoretical)* |
| *In-Memory Resistive Computing*               | *100–1000× (early research)*  |
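As a rough illustration of the block floating-point row: values in a block share a single exponent, which makes storage and multiplies cheaper but crushes small values that share a block with large ones. A toy sketch, not the fork's implementation; the function name, 8-bit mantissa, and block size of 4 are our choices:

```python
import math

def bfp_quantize(xs, mantissa_bits=8, block=4):
    """Quantize a list in blocks that share one power-of-two exponent."""
    out = []
    for i in range(0, len(xs), block):
        chunk = xs[i:i + block]
        peak = max(abs(v) for v in chunk) or 1.0
        exp = math.ceil(math.log2(peak))           # shared exponent for the block
        scale = 2.0 ** (exp - (mantissa_bits - 1))
        out.extend(round(v / scale) * scale for v in chunk)
    return out

print(bfp_quantize([1.0, 0.5, 0.25, 0.125]))  # exact: all values fit the shared scale
print(bfp_quantize([1.0, 0.003, 0.0, 0.0]))   # 0.003 is crushed to 0.0
```

The second call shows the precision trade-off that makes this a "2–4×, some accuracy risk" entry rather than a free win.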
**Quantum Matrix Multiplication**
This refers to using quantum computing algorithms (e.g., the Harrow–Hassidim–Lloyd algorithm) to perform matrix multiplication exponentially faster than classical methods. In theory, a large-scale, fault-tolerant quantum computer could multiply matrices in polylogarithmic time, yielding speed-ups on the order of 1,000× to 10,000×. However, current quantum hardware is noisy, limited in qubit count, and lacks error correction, making practical quantum GEMM for real-world sizes unfeasible today.

**In-Memory Resistive Computing**
Also known as analog crossbar computing, this approach uses arrays of resistive memory (e.g., PCM, RRAM) to perform multiply–accumulate operations directly in memory cells. It can compute entire matrix–vector products in one analog pass, potentially offering 100×–1,000× speed-ups and energy savings. Yet prototype devices suffer from low precision, device variability, and integration challenges, so widescale, reliable in-memory GEMM remains early-stage research.
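Of the entries in the table, Strassen's algorithm is the one that is easy to sketch in plain code: it trades 8 recursive half-size multiplications for 7, giving O(n^2.807) instead of O(n^3). A self-contained toy version for square matrices whose size is a power of two (illustrative only, not the fork's GEMM):

```python
def strassen(A, B):
    """Strassen multiply for n x n matrices, n a power of two (toy illustration)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def split(M):
        return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
                [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

    def add(X, Y):
        return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def sub(X, Y):
        return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven products instead of eight: the source of the asymptotic win.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))

    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In practice the recursion is cut off at a base-case size where a tuned BLAS or Metal kernel takes over; the extra additions make naive Strassen slower than a good GEMM for small matrices, which is why the table hedges it at 1.2–2×.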

plot_benchmarks.py

Lines changed: 55 additions & 0 deletions
import os

import matplotlib.pyplot as plt
import pandas as pd

# Set paths to CSV files (update if needed)
root = os.path.dirname(os.path.abspath(__file__))
bench_ops_path = os.path.join(root, 'tests/metal/ops/build/benchmarks_ops.csv')
gemm_cpu_path = os.path.join(root, 'tests/metal/ops/build/gemm_cpu_bench.csv')
gemm_metal_path = os.path.join(root, 'tests/metal/ops/build/gemm_metal_bench.csv')


def plot_ops_benchmarks():
    df = pd.read_csv(bench_ops_path)

    # Plot GEMM speedup
    df_gemm = df[df['operator'] == 'GEMM']
    plt.figure(figsize=(8, 5))
    plt.title('GEMM: Metal vs CPU Speedup')
    plt.bar(df_gemm['size'], df_gemm['speedup'], color='royalblue')
    plt.xlabel('Matrix Size')
    plt.ylabel('Speedup (CPU ms / Metal ms)')
    plt.grid(True, axis='y')
    plt.tight_layout()
    plt.show()

    # Plot ReLU speedup
    df_relu = df[df['operator'] == 'ReLU']
    plt.figure(figsize=(8, 5))
    plt.title('ReLU: Metal vs CPU Speedup')
    plt.bar(df_relu['size'], df_relu['speedup'], color='orange')
    plt.xlabel('Input Size')
    plt.ylabel('Speedup (CPU ms / Metal ms)')
    plt.grid(True, axis='y')
    plt.tight_layout()
    plt.show()


def plot_gemm_cpu():
    df = pd.read_csv(gemm_cpu_path)
    # One line per matrix size: batch size vs. average time
    sizes = df['size'].unique()
    for size in sizes:
        df_size = df[df['size'] == size]
        plt.plot(df_size['batch'], df_size['avg_ms'], marker='o', label=f'Size {size}')
    plt.title('CPU GEMM: Batch Size vs Time')
    plt.xlabel('Batch Size')
    plt.ylabel('Avg Time (ms)')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()


def main():
    plot_ops_benchmarks()
    plot_gemm_cpu()
    # More plots for gemm_metal_bench.csv can be added when data is available.


if __name__ == '__main__':
    main()
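For reference, the `speedup` column the script reads from `benchmarks_ops.csv` is a plain ratio of timings. A synthetic sketch; the `cpu_ms`/`metal_ms` column names are our assumption, since only `operator`, `size`, and `speedup` appear in the script above:

```python
import pandas as pd

# Synthetic timings standing in for the real benchmark CSVs.
df = pd.DataFrame({
    "operator": ["GEMM", "GEMM", "ReLU"],
    "size": ["256", "512", "1024"],
    "cpu_ms": [10.0, 80.0, 2.0],
    "metal_ms": [2.0, 8.0, 1.0],
})

# Speedup exactly as the axis label defines it: CPU ms / Metal ms.
df["speedup"] = df["cpu_ms"] / df["metal_ms"]
print(df[["operator", "size", "speedup"]].to_string(index=False))
```

Writing this frame to CSV with `df.to_csv(bench_ops_path, index=False)` would give `plot_ops_benchmarks()` something to plot when no real benchmark output is available.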
