ALIGNMENT_INTEGRATION_WORKFLOW.md
Lines changed: 0 additions & 88 deletions
@@ -326,91 +326,3 @@ These allow users to:
- Track which features are aligned together via `alignment_group_id`
- Find the reference feature that was used for alignment
- Filter or analyze separately if needed

## Technical Implementation Details

### Precision Preservation for Large Feature IDs

Large integer feature IDs (e.g., `5,405,272,318,039,692,409`) require special handling to prevent precision loss during database operations and pandas DataFrame creation.

#### The Problem
- Feature IDs can exceed 2^53, the limit up to which float64 represents every integer exactly
- When pandas reads INTEGER columns from databases without explicit typing, it may infer float64 dtype (for example, when the column contains NULLs)
- This causes precision loss: `5,405,272,318,039,692,409` → `5,405,272,318,039,692,288` (float64 values at this magnitude are spaced 1,024 apart)
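
The rounding described above can be reproduced without pandas or a database. The following is a minimal sketch using only the Python standard library; it assumes nothing beyond the example ID quoted in the text, and the variable names are illustrative.

```python
import math

feature_id = 5405272318039692409   # example ID from the text

as_float = float(feature_id)       # the value a float64 column would hold
round_tripped = int(as_float)      # converting back does not recover the ID

print(feature_id > 2**53)          # True
print(round_tripped)               # 5405272318039692288
print(feature_id - round_tripped)  # 121
print(math.ulp(as_float))          # 1024.0 (spacing between adjacent float64 values here)
```

Because adjacent float64 values are 1,024 apart at this magnitude, up to 1,024 distinct feature IDs can collapse onto the same float, which is why the IDs need to stay in an integer type end to end.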

#### The Solution

SQL queries use explicit CAST operations in SELECT clauses (but NOT in JOIN conditions):
```sql
-- OSW (SQLite)
SELECT CAST(FEATURE.ID AS INTEGER) AS id,
       CAST(FEATURE_MS2_ALIGNMENT.REFERENCE_FEATURE_ID AS INTEGER) AS alignment_reference_feature_id
FROM ...

-- Parquet (DuckDB)
SELECT CAST(fa.REFERENCE_FEATURE_ID AS BIGINT) AS REFERENCE_FEATURE_ID
FROM ...
```
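
To see the SQLite form of this query in action, here is a small sketch using Python's built-in `sqlite3` module against an in-memory database. The two-table schema is a stripped-down stand-in, and the `ALIGNED_FEATURE_ID` join column, the `LEFT JOIN`, and the second feature ID are assumptions made for illustration; the original `FROM` clause is elided and is not reproduced here.

```python
import sqlite3

BIG_ID = 5405272318039692409    # example ID from the text, larger than 2**53
OTHER_ID = 5405272318039692410  # hypothetical second feature with no alignment row

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY);
    CREATE TABLE FEATURE_MS2_ALIGNMENT (
        ALIGNED_FEATURE_ID INTEGER,   -- assumed join column
        REFERENCE_FEATURE_ID INTEGER
    );
""")
conn.executemany("INSERT INTO FEATURE (ID) VALUES (?)", [(BIG_ID,), (OTHER_ID,)])
conn.execute("INSERT INTO FEATURE_MS2_ALIGNMENT VALUES (?, ?)", (BIG_ID, OTHER_ID))

# CAST appears only in the SELECT list; the join condition compares raw columns.
rows = conn.execute("""
    SELECT CAST(FEATURE.ID AS INTEGER) AS id,
           CAST(FEATURE_MS2_ALIGNMENT.REFERENCE_FEATURE_ID AS INTEGER)
               AS alignment_reference_feature_id
    FROM FEATURE
    LEFT JOIN FEATURE_MS2_ALIGNMENT
           ON FEATURE_MS2_ALIGNMENT.ALIGNED_FEATURE_ID = FEATURE.ID
    ORDER BY FEATURE.ID
""").fetchall()
conn.close()

for feature_id, reference_id in rows:
    print(feature_id, reference_id)
```

The first row comes back with both IDs as exact Python integers, and the second has `None` for the missing reference. Missing values like this are what can push a column toward float64 once it reaches pandas, which is where the nullable `Int64` conversion comes in.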

**Key Design Principles:**
1. **CAST in SELECT**: Ensures pandas reads columns as integers, preserving precision
2. **No CAST in JOIN**: Database can use indexes for fast lookups (~16 seconds vs 50 minutes)
3. **Post-query conversion**: After reading, convert to pandas Int64 dtype for nullable integer support
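
The second principle can be checked by asking SQLite for its query plan. The sketch below is an illustration under assumptions, not the project's actual query: the schema, the `ALIGNED_FEATURE_ID` join column, the index name, and the `plan` helper are all hypothetical. It compares a join on raw columns with the same join wrapped in CAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY);
    CREATE TABLE FEATURE_MS2_ALIGNMENT (
        ALIGNED_FEATURE_ID INTEGER,   -- assumed join column
        REFERENCE_FEATURE_ID INTEGER
    );
    CREATE INDEX idx_alignment_aligned_feature
        ON FEATURE_MS2_ALIGNMENT (ALIGNED_FEATURE_ID);
""")


def plan(join_condition):
    """Return the detail text of each EXPLAIN QUERY PLAN step (hypothetical helper)."""
    sql = f"""
        EXPLAIN QUERY PLAN
        SELECT CAST(FEATURE.ID AS INTEGER) AS id
        FROM FEATURE
        JOIN FEATURE_MS2_ALIGNMENT AS fa ON {join_condition}
    """
    # The fourth column of each plan row holds the human-readable step.
    return [row[3] for row in conn.execute(sql)]


plain_plan = plan("fa.ALIGNED_FEATURE_ID = FEATURE.ID")
cast_plan = plan(
    "CAST(fa.ALIGNED_FEATURE_ID AS INTEGER) = CAST(FEATURE.ID AS INTEGER)"
)

# A SEARCH step means SQLite looks rows up by key or index instead of scanning.
has_lookup_plain = any("SEARCH" in step for step in plain_plan)
has_lookup_cast = any("SEARCH" in step for step in cast_plan)

print(plain_plan, has_lookup_plain)
print(cast_plan, has_lookup_cast)
```

With raw columns in the join, the plan should contain a `SEARCH` step, meaning a keyed lookup per row. With CAST on both sides, SQLite sees expressions rather than indexed columns and falls back to scanning both tables. The exact plan wording varies between SQLite versions, so the check looks only for the `SEARCH` keyword.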