fix(data): prevent financial data hallucination with data integrity system
- XLSX extraction now preserves row/column structure using cell references
instead of flattening all cells into a single pipe-delimited line
- Add CSV/TSV processor with header detection, delimiter auto-detection,
and proper quoted-field parsing
- Add Data Integrity system prompt section (to both SAM Default and Minimal)
with a zero-tolerance policy for fabricated numerical data
- DocumentImportReminderInjector now adds a stronger warning when spreadsheet
data is imported, requiring search_memory before answering data queries
- Add CSV/TSV to the supported file types in the import system and file picker
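The CSV/TSV bullet can be illustrated with a minimal Swift sketch. It is not the commit's processor: `parseRecord`, `detectDelimiter`, `looksLikeHeader`, `ParsedTable`, and `parseDelimitedText` are hypothetical names. The sketch assumes the candidate delimiters are `,`, tab, `;`, and `|`, that a header row contains no numeric cells, and that no record contains an embedded line break.

```swift
// Hypothetical sketch: delimiter auto-detection, header detection, and
// quoted-field parsing for CSV/TSV text. Not the commit's actual processor.

/// Splits one record into fields. A field wrapped in double quotes may contain
/// the delimiter, and "" inside quotes stands for a literal quote character.
/// Assumption: records never span multiple lines.
func parseRecord(_ line: String, delimiter: Character) -> [String] {
    let chars = Array(line)
    var fields: [String] = []
    var current = ""
    var inQuotes = false
    var i = 0
    while i < chars.count {
        let c = chars[i]
        if inQuotes {
            if c == "\"" {
                if i + 1 < chars.count && chars[i + 1] == "\"" {
                    current.append("\"")  // escaped quote inside a quoted field
                    i += 1
                } else {
                    inQuotes = false      // closing quote
                }
            } else {
                current.append(c)
            }
        } else if c == "\"" {
            inQuotes = true               // opening quote
        } else if c == delimiter {
            fields.append(current)
            current = ""
        } else {
            current.append(c)
        }
        i += 1
    }
    fields.append(current)
    return fields
}

/// Picks the candidate that yields the same field count (> 1) on every sample
/// line, preferring the candidate that produces the most fields.
func detectDelimiter(_ sampleLines: [String]) -> Character {
    let candidates: [Character] = [",", "\t", ";", "|"]
    var best: Character = ","
    var bestCount = 0
    for candidate in candidates {
        let counts = sampleLines.map { parseRecord($0, delimiter: candidate).count }
        guard let first = counts.first, first > 1,
              counts.allSatisfy({ $0 == first }) else { continue }
        if first > bestCount {
            best = candidate
            bestCount = first
        }
    }
    return best
}

/// Heuristic: row 0 is a header if it has no numeric cells and row 1 has some.
func looksLikeHeader(_ first: [String], _ second: [String]) -> Bool {
    !first.contains(where: { Double($0) != nil })
        && second.contains(where: { Double($0) != nil })
}

struct ParsedTable {
    let delimiter: Character
    let header: [String]?
    let rows: [[String]]
}

/// Detects the delimiter from the first few lines, parses every record,
/// and separates the header row when the heuristic says there is one.
func parseDelimitedText(_ text: String) -> ParsedTable {
    let lines = text.split(whereSeparator: { $0.isNewline }).map(String.init)
    let delimiter = detectDelimiter(Array(lines.prefix(5)))
    var rows = lines.map { parseRecord($0, delimiter: delimiter) }
    var header: [String]? = nil
    if rows.count >= 2 && looksLikeHeader(rows[0], rows[1]) {
        header = rows.removeFirst()
    }
    return ParsedTable(delimiter: delimiter, header: header, rows: rows)
}
```

Detection counts fields with the quote-aware parser, so a delimiter inside a quoted cell such as `"South, East"` does not skew the vote. The header heuristic misfires on tables whose data rows are all text, so a real processor would likely need a fallback.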
+- Statistical data (percentages, counts, averages, totals)
+- Dates, amounts, or quantities from user documents
+- Any specific number that should come from imported data
+
+**MANDATORY PROTOCOL when user asks about data from imported documents:**
+1. FIRST: Use memory_operations with search_memory to look up the specific data
+2. VERIFY: Confirm the search results contain the actual numbers before responding
+3. CITE: Reference which document the data came from in your response
+4. If search returns no results or partial data: Tell the user clearly what you found and what you could NOT find. NEVER fill gaps with estimates or assumptions.
+
+**When data is NOT found:**
+- Say explicitly: "I searched the imported documents but could not find [specific data]"
+- Ask the user to clarify or provide the missing information
+- Suggest re-importing the document if it may not have been fully indexed
+
+**For calculations on imported data:**
+- ALWAYS retrieve the source numbers first via search_memory
+- Use math_operations for any computation (never do math in your head)
+- Show your work: state the source values and the calculation performed
+
+**VIOLATION: Presenting any number as fact without retrieving it from a document or the user providing it directly. This causes real-world harm when users make decisions based on fabricated data.**
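The VERIFY step in this prompt is enforced by instruction, not by code. As a hedged illustration of the same check, the Swift sketch below flags numbers in a draft answer that never appear in the retrieved `search_memory` text. `numericValues(in:)` and `unsupportedNumbers(in:retrieved:)` are hypothetical helpers, and the tokenizer assumes `,` is a thousands separator and `.` a decimal point.

```swift
// Hypothetical guard illustrating the VERIFY step. The commit enforces this
// through the system prompt; these helper names are assumptions.

/// Extracts numeric tokens such as "1,200.50", "18" (from "18%") or "-3.5",
/// dropping thousands separators so "1,200.50" and "1200.5" compare equal.
func numericValues(in text: String) -> [Double] {
    var values: [Double] = []
    var token = ""

    func flush() {
        // Drop trailing punctuation, e.g. a sentence-ending "." or ",".
        while let last = token.last, last == "." || last == "," {
            token.removeLast()
        }
        let cleaned = token.filter { $0 != "," }
        if let value = Double(cleaned) {
            values.append(value)
        }
        token = ""
    }

    for c in text {
        // A "-" only starts a token; elsewhere it acts as a separator.
        if c.isNumber || c == "." || c == "," || (c == "-" && token.isEmpty) {
            token.append(c)
        } else {
            flush()
        }
    }
    flush()
    return values
}

/// Returns the numbers in `draft` that none of the retrieved snippets contain.
func unsupportedNumbers(in draft: String, retrieved snippets: [String]) -> [Double] {
    let known = Set(snippets.flatMap { numericValues(in: $0) })
    return numericValues(in: draft).filter { !known.contains($0) }
}
```

Values computed with `math_operations` would also be flagged, so under this design the tool's results would have to be passed in alongside the retrieved snippets.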
@@ -167,6 +172,9 @@ public class DocumentImportSystem: ObservableObject {
         case _ where contentType.conforms(to: .image):
             processor = imageProcessor
 
+        case _ where isCSVDocument(contentType):
+            processor = csvProcessor
+
         case _ where isOfficeDocument(contentType):
             processor = officeProcessor
 
@@ -245,6 +253,12 @@ public class DocumentImportSystem: ObservableObject {
         logger.debug("SUCCESS: Document \(document.filename) (ID: \(document.id)) is now searchable via semantic memory in conversation: \(conversationId?.uuidString ?? "global")")
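Because a Swift `switch` runs the first `case` whose `where` clause holds, the position of the new CSV case relative to the image and Office cases determines which processor a file reaches. A rough sketch of that routing follows. `ProcessorKind`, `selectProcessor`, `isOfficeDocument(pathExtension:)`, and the extension lists are assumptions, and the content type is modeled as a UTI string plus file extension instead of the `UTType` the commit passes to `isCSVDocument(contentType)`.

```swift
// Hypothetical sketch of the routing in this hunk. The real isCSVDocument(_:)
// takes a UTType; here a content type is modeled as its UTI identifier plus a
// file extension so the example runs without Apple-only frameworks.

enum ProcessorKind { case image, csv, office, text }

/// Identifiers of UTType.commaSeparatedText and UTType.tabSeparatedText.
let csvIdentifiers: Set<String> = [
    "public.comma-separated-values-text",
    "public.tab-separated-values-text",
]

func isCSVDocument(identifier: String, pathExtension: String) -> Bool {
    csvIdentifiers.contains(identifier)
        || ["csv", "tsv"].contains(pathExtension.lowercased())
}

/// Assumption: Office formats are recognized by extension only.
func isOfficeDocument(pathExtension: String) -> Bool {
    ["docx", "xlsx", "pptx"].contains(pathExtension.lowercased())
}

/// Mirrors the shape of the switch in the diff: the first matching case wins.
func selectProcessor(identifier: String, pathExtension: String, isImage: Bool) -> ProcessorKind {
    switch identifier {
    case _ where isImage:
        return .image
    case _ where isCSVDocument(identifier: identifier, pathExtension: pathExtension):
        return .csv
    case _ where isOfficeDocument(pathExtension: pathExtension):
        return .office
    default:
        return .text
    }
}
```

Falling back to the file extension covers files whose reported type is a generic identifier such as `public.data`.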