 The pattern system uses a "smallest set priority" strategy for resolving conflicts between overlapping patterns. This applies to title patterns, division patterns, and document type patterns. The pattern that matches the smallest set of URLs takes precedence.
 
 ## How It Works
-
 When multiple patterns match a URL, the system:
 1. Counts how many total URLs each pattern matches
 2. Compares the counts
 3. Applies the pattern that matches the fewest URLs
 
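The three steps above can be sketched in Python. This is only an illustration under stated assumptions: the source does not say how patterns are matched or how ties are broken, so the sketch uses the standard library's `fnmatchcase` as a stand-in matcher, and the names `count_matches`, `resolve`, and `corpus` are hypothetical.

```python
from fnmatch import fnmatchcase


def count_matches(pattern, urls):
    """Step 1: count how many URLs in the corpus a pattern matches."""
    return sum(1 for url in urls if fnmatchcase(url, pattern))


def resolve(url, patterns, urls):
    """Return the matching pattern with the smallest match count, or None."""
    candidates = [p for p in patterns if fnmatchcase(url, p)]
    if not candidates:
        return None
    # Steps 2 and 3: compare the counts and keep the pattern with the fewest.
    # Ties go to the earliest pattern here; the source does not define this.
    return min(candidates, key=lambda p: count_matches(p, urls))


# Hypothetical three-URL corpus, far smaller than a real one.
corpus = [
    "https://example.com/docs/overview.html",
    "https://example.com/docs/api/endpoints.html",
    "https://example.com/docs/api/v2/users.html",
]
patterns = ["*/docs/*", "*/docs/api/*", "*/docs/api/v2/*"]

winner = resolve("https://example.com/docs/api/v2/users.html", patterns, corpus)
print(winner)
```

Recounting on every lookup keeps the sketch short; a real system would presumably cache the counts per pattern.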
-### Example
+### Example Pattern Hierarchy
 ```
 Pattern A: */docs/*         # Matches 100 URLs
 Pattern B: */docs/api/*     # Matches 20 URLs
 Pattern C: */docs/api/v2/*  # Matches 5 URLs
 
-For URL "/docs/api/v2/users":
-- All patterns match
-- Pattern C wins (5 URLs < 20 URLs < 100 URLs)
+Example URLs and Which Patterns Apply:
+1. https://example.com/docs/overview.html
+   ✓ Matches Pattern A
+   ✗ Doesn't match Pattern B or C
+   Result: Pattern A applies (only match)
+
+2. https://example.com/docs/api/endpoints.html
+   ✓ Matches Pattern A
+   ✓ Matches Pattern B
+   ✗ Doesn't match Pattern C
+   Result: Pattern B applies (20 < 100 URLs)
+
+3. https://example.com/docs/api/v2/users.html
+   ✓ Matches Pattern A
+   ✓ Matches Pattern B
+   ✓ Matches Pattern C
+   Result: Pattern C applies (5 < 20 < 100 URLs)
 ```
 
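The three example URLs can be checked with a short script. This sketch takes the match counts (100, 20, 5) directly from the example rather than computing them, and it assumes shell-style wildcard matching via Python's `fnmatchcase`. `MATCH_COUNTS` and `winning_pattern` are hypothetical names, not part of the documented system.

```python
from fnmatch import fnmatchcase

# Match counts as stated in the example; a real system would compute them.
MATCH_COUNTS = {
    "*/docs/*": 100,
    "*/docs/api/*": 20,
    "*/docs/api/v2/*": 5,
}


def winning_pattern(url):
    """Pick the matching pattern with the lowest stated count, or None."""
    matching = [p for p in MATCH_COUNTS if fnmatchcase(url, p)]
    return min(matching, key=MATCH_COUNTS.get, default=None)


for url in (
    "https://example.com/docs/overview.html",
    "https://example.com/docs/api/endpoints.html",
    "https://example.com/docs/api/v2/users.html",
):
    print(url, "->", winning_pattern(url))
```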
 ## Pattern Types and Resolution
 
 ### Title Patterns
-```python
-# More specific title pattern takes precedence
-Pattern A: */docs/* → title="Documentation"  # 100 URLs
-Pattern B: */docs/api/* → title="API Reference"  # 20 URLs
-Result: URL gets title "API Reference"
+```
+Patterns:
+A: */docs/* → title="Documentation"  # Matches 100 URLs
+B: */docs/api/* → title="API Reference"  # Matches 20 URLs
+C: */docs/api/v2/* → title="V2 API Guide"  # Matches 5 URLs
+
+Example URLs:
+1. https://example.com/docs/getting-started.html
+   • Matches: Pattern A
+   • Result: title="Documentation"
+
+2. https://example.com/docs/api/authentication.html
+   • Matches: Patterns A, B
+   • Result: title="API Reference"
+
+3. https://example.com/docs/api/v2/oauth.html
+   • Matches: Patterns A, B, C
+   • Result: title="V2 API Guide"
 ```
 
 ### Division Patterns
-```python
-# More specific division assignment wins
-Pattern A: *.pdf → division="GENERAL"  # 500 URLs
-Pattern B: */specs/*.pdf → division="ENGINEERING"  # 50 URLs
-Result: URL gets division "ENGINEERING"
+```
+Patterns:
+A: *.pdf → division="GENERAL"  # Matches 500 URLs
+B: */specs/*.pdf → division="ENGINEERING"  # Matches 50 URLs
+C: */specs/2024/*.pdf → division="RESEARCH"  # Matches 10 URLs
+
+Example URLs:
+1. https://example.com/docs/report.pdf
+   • Matches: Pattern A
+   • Result: division="GENERAL"
+
+2. https://example.com/specs/architecture.pdf
+   • Matches: Patterns A, B
+   • Result: division="ENGINEERING"
+
+3. https://example.com/specs/2024/roadmap.pdf
+   • Matches: Patterns A, B, C
+   • Result: division="RESEARCH"
 ```
 
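In the division example, `*.pdf` matches full URLs that contain slashes, which suggests that `*` may span path separators. Python's `fnmatchcase` behaves that way, so it serves as a stand-in matcher in the sketch below; the system's actual matcher is not described in the source. `DIVISION_RULES` and `division_for` are hypothetical names, and the counts are the ones stated in the example.

```python
from fnmatch import fnmatchcase

# (pattern, division, stated match count) taken from the example above.
DIVISION_RULES = [
    ("*.pdf", "GENERAL", 500),
    ("*/specs/*.pdf", "ENGINEERING", 50),
    ("*/specs/2024/*.pdf", "RESEARCH", 10),
]


def division_for(url):
    """Return the division of the matching rule with the fewest matches."""
    matching = [rule for rule in DIVISION_RULES if fnmatchcase(url, rule[0])]
    if not matching:
        return None
    _pattern, division, _count = min(matching, key=lambda rule: rule[2])
    return division


print(division_for("https://example.com/specs/2024/roadmap.pdf"))
```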
 ### Document Type Patterns
-```python
-# Most specific document type classification applies
-Pattern A: */docs/* → type="DOCUMENTATION"  # 200 URLs
-Pattern B: */docs/data/* → type="DATA"  # 30 URLs
-Result: URL gets type "DATA"
+```
+Patterns:
+A: */docs/* → type="DOCUMENTATION"  # Matches 200 URLs
+B: */docs/data/* → type="DATA"  # Matches 30 URLs
+C: */docs/data/schemas/* → type="SCHEMA"  # Matches 8 URLs
+
+Example URLs:
+1. https://example.com/docs/guide.html
+   • Matches: Pattern A
+   • Result: type="DOCUMENTATION"
+
+2. https://example.com/docs/data/metrics.json
+   • Matches: Patterns A, B
+   • Result: type="DATA"
+
+3. https://example.com/docs/data/schemas/user.json
+   • Matches: Patterns A, B, C
+   • Result: type="SCHEMA"
+```
+
+## Special Cases
+
+### Mixed Pattern Types
+```
+When different pattern types overlap, each is resolved independently:
+
+URL: https://example.com/docs/api/v2/schema.json
+Matching Patterns:
+1. */docs/* → title="Documentation", 100 matches
+2. */docs/* → doc_type="DOCUMENTATION", 100 matches
+3. */docs/api/* → title="API Reference", 50 matches
+4. */docs/api/v2/* → division="ENGINEERING", 10 matches
+5. */docs/api/v2/*.json → doc_type="DATA", 3 matches
+
+Final Result:
+• title="API Reference" (from pattern 3, most specific title pattern)
+• division="ENGINEERING" (from pattern 4, only matching division pattern)
+• doc_type="DATA" (from pattern 5, most specific doc_type pattern)
 ```
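Independent resolution per pattern type can be sketched as a single pass that tracks the best rule for each field separately. This assumes each rule sets exactly one field and reuses the match counts stated in the example. `RULES` and `resolve_fields` are hypothetical names, and `fnmatchcase` stands in for the system's matcher, which the source does not specify.

```python
from fnmatch import fnmatchcase

# (pattern, field, value, stated match count) from the example above.
RULES = [
    ("*/docs/*", "title", "Documentation", 100),
    ("*/docs/*", "doc_type", "DOCUMENTATION", 100),
    ("*/docs/api/*", "title", "API Reference", 50),
    ("*/docs/api/v2/*", "division", "ENGINEERING", 10),
    ("*/docs/api/v2/*.json", "doc_type", "DATA", 3),
]


def resolve_fields(url):
    """Resolve each field separately, keeping the smallest-set rule per field."""
    best = {}  # field -> (count, value)
    for pattern, field, value, count in RULES:
        if not fnmatchcase(url, pattern):
            continue
        if field not in best or count < best[field][0]:
            best[field] = (count, value)
    return {field: value for field, (_count, value) in best.items()}


result = resolve_fields("https://example.com/docs/api/v2/schema.json")
print(result)
```

Keying the comparison on the field name is what keeps a narrow `doc_type` rule from overriding an unrelated `title` rule.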