Refactor embedding size testing and migrate shell scripts to Rust
- Replaced the original shell script for embedding size testing with a Rust implementation in `src/core/embedding/embedding_bin.rs`.
- Added a new wrapper script `embedding_tool.sh` for embedding generation functionality using the Ollama API.
- Updated `test_embedding_size.sh` to ensure it calls the new Rust binary and handles command-line arguments for size testing.
- Implemented logging for embedding size tests and improved error handling.
- Created a verification script `verify_rust_migration.sh` to compare outputs of legacy shell scripts with new Rust implementations.
- Added test data file for migration verification and updated documentation regarding migration status and cleanup plans.
- Ensured Rust binaries are built before execution and added checks for the Ollama API status.
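
The output comparison described for `verify_rust_migration.sh` can be sketched in Rust as follows. This is a minimal illustration, not the script's actual logic: the function names `normalize`, `first_difference`, and `run_and_capture` are hypothetical, and the assumption that trailing whitespace and trailing blank lines should be ignored is ours, not the commit's.

```rust
use std::io;
use std::process::Command;

/// Split output into lines, trimming trailing whitespace on each line and
/// dropping trailing blank lines (assumed to be insignificant differences).
fn normalize(output: &str) -> Vec<String> {
    let mut lines: Vec<String> = output
        .lines()
        .map(|line| line.trim_end().to_string())
        .collect();
    while lines.last().map_or(false, |line| line.is_empty()) {
        lines.pop();
    }
    lines
}

/// Return the 1-based number of the first line where the two outputs differ,
/// or `None` if they match after normalization.
fn first_difference(legacy: &str, rust: &str) -> Option<usize> {
    let (a, b) = (normalize(legacy), normalize(rust));
    let shared = a.len().min(b.len());
    for i in 0..shared {
        if a[i] != b[i] {
            return Some(i + 1);
        }
    }
    // One output is a strict prefix of the other: the difference starts
    // right after the shared part.
    if a.len() != b.len() {
        Some(shared + 1)
    } else {
        None
    }
}

/// Run a program and capture its stdout as text. The exit status is not
/// inspected here; a real check would likely compare it as well.
#[allow(dead_code)]
fn run_and_capture(program: &str, args: &[&str]) -> io::Result<String> {
    let output = Command::new(program).args(args).output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

A caller would pass the captured output of the legacy script and of the Rust binary to `first_difference` and report a mismatch when it returns `Some(line)`.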
docs/migration/INGEST-MIGRATION.md: 60 additions & 10 deletions
@@ -1,8 +1,8 @@
-```markdown
+````markdown
 # Ingest Module Migration
 
-This document records the migration of the Rust ingest code from `rust_ingest/` to
-`src/ingest/`.
+This document records the migration of the Rust ingest code from `rust_ingest/`
+to `src/ingest/`.
 
 ## Migration Steps Completed
 
@@ -24,6 +24,7 @@ This document records the migration of the Rust ingest code from `rust_ingest/`
 ```bash
 ./scripts/dev/build-ingest.sh --test
 ```
+````
 
 2. Update any scripts that referenced the old `rust_ingest` directory
 
@@ -39,11 +40,14 @@ This document records the migration of the Rust ingest code from `rust_ingest/`
 1. All source code is now in the `src` directory, following standard conventions
 2. The code is organized by domain rather than technology
 3. Module boundaries are clearer in the new structure
-4. Future functionality can be added to the `src` directory with consistent organization
+4. Future functionality can be added to the `src` directory with consistent
+organization
 
 ## Shell Scripts Migration Plan
 
-This section outlines the plan to migrate essential shell scripts to Rust. The goal is to replace critical bash scripts with more maintainable, performant, and type-safe Rust implementations.
+This section outlines the plan to migrate essential shell scripts to Rust. The
+goal is to replace critical bash scripts with more maintainable, performant, and
+type-safe Rust implementations.
 
 ### Migration Candidates (Prioritized)
 
@@ -141,7 +145,7 @@ This section outlines the plan to migrate essential shell scripts to Rust. The g
 )
 // other subcommands
 .get_matches();
-
+
 // handle commands
 }
 ```
@@ -172,13 +176,57 @@ This section outlines the plan to migrate essential shell scripts to Rust. The g
 - ⏳ Create compatibility wrappers for all scripts
 - ⏳ Update documentation
 
-### Current Status (July 21, 2025)
+### Current Status (Updated)
+
+✅ **Successfully migrated text_chunker.sh to Rust**
 
-- Successfully migrated text_chunker.sh to Rust
 - Created a compatibility wrapper to maintain script interface
-- Implemented both character-based and semantic chunking strategies
-- Started work on the Ollama API client module
+- Implemented character-based, size-based, and semantic chunking strategies
 - Compiled and tested the text_chunker binary successfully
+- Original script backed up as `text_chunker.sh.legacy`
+- New implementation at `src/utils/chunking.rs` and `src/utils/chunker_bin.rs`
+
+✅ **Successfully migrated test_embedding_size.sh to Rust**
+
+- Implemented as part of a comprehensive embedding_tool CLI
+- Added test-sizes command with flexible configuration
+- Created detailed logging and reporting functionality
+- Original script backed up as `test_embedding_size.sh.legacy`
+- New implementation at `src/core/embedding/embedding_bin.rs`
+
+✅ **Created a robust Ollama API client module**
+
+- Implemented check_status functionality
+- Added embedding generation with timeout handling
+- Added support for chunked embeddings for long text
+- Implemented proper error handling and reporting
+- New implementation at `src/core/embedding/ollama_api.rs`
+
+✅ **Created wrapper scripts for backwards compatibility**
+
+- `text_chunker.sh` - Now a wrapper around the Rust implementation
+- `test_embedding_size.sh` - Now a wrapper around the Rust implementation
+- Automatic Rust binary rebuilding when source changes
+- Error handling and fallback mechanisms
+
+📝 **Created cleanup documentation**
+
+- Migration tracking document at `docs/migration/SCRIPT-CLEANUP-PLAN.md`
+- Implementation timeline and roadmap
+- Testing and verification strategies
+
+### Scripts Pending Migration
+
+The following scripts are still pending migration to Rust:
+
+1. 🔄 `ingest_chunked.sh` - Character-based chunking for data ingestion (High
+Priority)
+2. 🔄 `ingest_marvelai.sh` - Marvel AI data ingestion (Medium Priority)
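
The pending `ingest_chunked.sh` migration centers on character-based chunking. A minimal sketch of that step is shown here, assuming a fixed maximum chunk length counted in Unicode scalar values and no overlap between chunks. The names `chunk_by_chars` and `max_chars` are hypothetical and are not taken from `src/utils/chunking.rs`, whose actual interface is not shown in this diff.

```rust
/// Split `text` into consecutive chunks of at most `max_chars` characters.
/// Chunks are cut on `char` boundaries, so multi-byte UTF-8 sequences are
/// never split. Panics if `max_chars` is zero.
fn chunk_by_chars(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    // Collect into a Vec<char> so the input can be sliced by character
    // count rather than by byte offset.
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chars)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect()
}
```

Counting characters rather than bytes keeps the sketch safe for non-ASCII input, at the cost of one extra allocation for the intermediate `Vec<char>`.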