Commit 4f738b3

docs: add english-arabic search feature documentation
1 parent 28e33c8 commit 4f738b3

1 file changed

Lines changed: 133 additions & 0 deletions

docs/english-arabic-search.md

@@ -0,0 +1,133 @@
# English to Arabic Translation Search

## Overview

This feature adds the ability to search the Quran using English words and their synonyms. Each English word is mapped to one or more Arabic roots, and the search then runs through the library's existing root search capability.

## Data Structure

```typescript
interface EnglishArabicConcept {
  english: string[]; // All English variants (synonyms) in one field
  arabic: string[];  // Arabic roots only - library handles the rest
}
```

### Example

```json
{
  "english": ["truth", "verity", "trueness", "accuracy"],
  "arabic": ["حق", "صدق"]
}
```

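A list of such concepts has to be indexed by English word before it can be queried. The following is a minimal sketch of one way to do that, not the library's actual loader: `buildEnglishArabicMap` is a hypothetical helper, and merging roots when a word appears in several concepts is an assumption.

```typescript
interface EnglishArabicConcept {
  english: string[];
  arabic: string[];
}

// Hypothetical helper: index every English variant to its concept's roots.
function buildEnglishArabicMap(
  concepts: EnglishArabicConcept[],
): Map<string, string[]> {
  const map = new Map<string, string[]>();
  for (const concept of concepts) {
    for (const word of concept.english) {
      const key = word.toLowerCase().trim();
      const existing = map.get(key) ?? [];
      // Assumption: if a word occurs in several concepts, merge and dedupe roots.
      map.set(key, Array.from(new Set(existing.concat(concept.arabic))));
    }
  }
  return map;
}

const englishArabicMap = buildEnglishArabicMap([
  { english: ["truth", "verity", "trueness", "accuracy"], arabic: ["حق", "صدق"] },
]);

console.log(englishArabicMap.get("verity")); // ["حق", "صدق"]
```

Keys are lowercased and trimmed so that they match the normalization applied to query tokens.
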
### Why This Structure Works

1. **Reduced Data Complexity** - All English synonyms for a concept are grouped in one field instead of being duplicated across entries
2. **Leverage Existing Features** - Only roots are stored in the `arabic` field; the library's built-in root search automatically finds all derived words
3. **Consistent with Phonetic Flow** - The English translation lookup runs at the same point as the phonetic lookup, creating a unified pipeline for non-Arabic queries

## How It Works

### Query Processing Flow

```
User Query: "truth verity"

Split into tokens: ["truth", "verity"]

For each token:

┌─────────────────────────────────────┐
│           isArabic(token)?          │
├──────────────┬──────────────────────┤
│     YES      │          NO          │
│      ↓       │          ↓           │
│ Pass         │ 1. English→Arabic    │
│ through      │    (NEW)             │
│ unchanged    │ 2. Phonetic          │
│              │    (existing)        │
└──────────────┴──────────────────────┘

If English found in translation map:
"truth" → ["حق", "صدق"]
"verity" → ["حق", "صدق"]

Dedupe and combine: ["حق", "صدق"]

Run search with root:true enabled
Library automatically finds:
- "الحق", "بالحق", "حقا" (from root "حق")
- "صدق", "صدقا", "بالصدق" (from root "صدق")
```

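The flow above can be sketched as a small function. This is an illustration under stated assumptions, not the library's code: `resolveSearchTerms` is a hypothetical name, `isArabic` is approximated with a single Unicode range check, and the phonetic fallback is left out.

```typescript
// Assumption: a token is Arabic if it contains a character from the basic
// Arabic Unicode block. The library's own isArabic may behave differently.
const isArabic = (token: string): boolean => /[\u0600-\u06FF]/.test(token);

// Hypothetical sketch: tokenize, translate each non-Arabic token, dedupe.
function resolveSearchTerms(
  query: string,
  englishArabicMap: Map<string, string[]>,
): string[] {
  const terms: string[] = [];
  for (const token of query.split(/\s+/).filter(Boolean)) {
    if (isArabic(token)) {
      terms.push(token); // Arabic tokens pass through unchanged
      continue;
    }
    const roots = englishArabicMap.get(token.toLowerCase());
    if (roots) terms.push(...roots);
    // The phonetic fallback would run here; it is omitted in this sketch,
    // so unmatched non-Arabic tokens are simply dropped.
  }
  return Array.from(new Set(terms)); // dedupe, keeping first-seen order
}

const map = new Map<string, string[]>([
  ["truth", ["حق", "صدق"]],
  ["verity", ["حق", "صدق"]],
]);

console.log(resolveSearchTerms("truth verity", map)); // ["حق", "صدق"]
```

Both tokens map to the same two roots, so the deduplicated result contains each root once.
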
## Implementation Details

### Integration Point

The English→Arabic translation is integrated at [search.ts#L158](file:///src/core/search.ts#L158), right before the existing phonetic lookup:

```typescript
if (!isArabic(token)) {
  const cleanToken = token.toLowerCase().trim();

  // 1. English → Arabic translation (NEW)
  let arabicRoots = englishArabicMap.get(cleanToken);
  if (arabicRoots) {
    // Only the first mapped root is returned here; root search expands it to derived words
    return arabicRoots[0];
  }

  // 2. Phonetic fallback (EXISTING)
  let arabicPossibilities = phoneticMap.get(cleanToken);
  // ...
}
```

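The excerpt returns `arabicRoots[0]`, while the query processing flow describes combining every mapped root. A hedged sketch of a lookup that keeps the same order (translation first, phonetic second) but hands back all candidates might look like this. `lookupNonArabicToken` and `TokenLookup` are hypothetical names, and the `Map<string, string[]>` shape of `phoneticMap` is an assumption.

```typescript
interface TokenLookup {
  source: "translation" | "phonetic" | "none";
  candidates: string[];
}

// Hypothetical variant of the lookup that returns every candidate.
function lookupNonArabicToken(
  token: string,
  englishArabicMap: Map<string, string[]>,
  phoneticMap: Map<string, string[]>, // assumed shape
): TokenLookup {
  const cleanToken = token.toLowerCase().trim();

  // 1. English → Arabic translation: return all mapped roots.
  const arabicRoots = englishArabicMap.get(cleanToken);
  if (arabicRoots && arabicRoots.length > 0) {
    return { source: "translation", candidates: arabicRoots.slice() };
  }

  // 2. Phonetic fallback, tried only when no translation exists.
  const phonetic = phoneticMap.get(cleanToken);
  if (phonetic && phonetic.length > 0) {
    return { source: "phonetic", candidates: phonetic.slice() };
  }

  return { source: "none", candidates: [] };
}

const englishArabicMap = new Map<string, string[]>([["truth", ["حق", "صدق"]]]);
const phoneticMap = new Map<string, string[]>([["bismillah", ["بسم الله"]]]);

console.log(lookupNonArabicToken("Truth", englishArabicMap, phoneticMap).source); // "translation"
console.log(lookupNonArabicToken("bismillah", englishArabicMap, phoneticMap).source); // "phonetic"
```

Returning the `source` alongside the candidates lets a caller decide whether to enable root search (for translations) or match the text directly (for transliterations).
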
### When Does Translation Run?

The English translation lookup **only runs for tokens that are NOT Arabic**:

```typescript
if (token && !isArabic(token)) {
  // Translation or phonetic lookup happens here
}
```

This means:
- Arabic queries → Direct search (no translation)
- English/Latin queries → English translation lookup first, then phonetic fallback

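The source does not show how `isArabic` is implemented. One plausible approach, sketched here as an assumption, is to test for characters in the Unicode Arabic blocks. `isArabicToken` and `routeToken` are hypothetical names used only for this illustration.

```typescript
// Unicode Arabic blocks: Arabic, Arabic Supplement, Arabic Extended-A,
// and Arabic Presentation Forms A and B.
const ARABIC_CHARS =
  /[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]/;

// Hypothetical stand-in for the library's isArabic check.
function isArabicToken(token: string): boolean {
  return ARABIC_CHARS.test(token);
}

type Route = "direct" | "translate-then-phonetic" | "skip";

// Mirrors the guard `token && !isArabic(token)` from the excerpt above.
function routeToken(token: string): Route {
  if (!token) return "skip"; // empty tokens are ignored
  return isArabicToken(token) ? "direct" : "translate-then-phonetic";
}

console.log(routeToken("الحق")); // "direct"
console.log(routeToken("truth")); // "translate-then-phonetic"
```

A token that mixes Latin and Arabic characters is treated as Arabic under this check; whether the library does the same is not stated in the source.
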
## Usage

```typescript
import { search } from 'quran-search-engine';

// Search using English word
const result = search('truth', quranData, morphologyMap, wordMap, {
  lemma: true,
  root: true, // Important: enables root-based search
  semantic: true
});

// The library will:
// 1. Look up "truth" in English→Arabic map
// 2. Find roots ["حق", "صدق"]
// 3. Search for all words derived from these roots
// 4. Return matching verses
```

## Comparison: Phonetic vs English Translation

| Feature | Phonetic | English Translation |
|---------|----------|---------------------|
| Input | Latin letters mimicking Arabic pronunciation | English words |
| Example | "bismillah" → "بسم الله" | "truth" → ["حق", "صدق"] |
| Type | Transliteration (sound-based) | Translation (meaning-based) |
| Data | Pre-computed phonetic mappings | English synonyms → Arabic roots |

## Future Enhancements

- Add a `category` field for filtering semantic groups
- Support English phrase mappings
- Integrate with the existing semantic search for concept expansion
