Commit 6dac075

DOC-5804 updated spec based on implementation 'experience'
1 parent 01a38f8 commit 6dac075

1 file changed

+221
-34
lines changed

build/tcedocs/SPECIFICATION.md

Lines changed: 221 additions & 34 deletions
@@ -345,32 +345,186 @@ This allows documentation to include "Try this in Jupyter" links that launch int
345345

346346
The parser should implement the following logic in `build/components/example.py`:
347347

348-
1. **Detection**: After reading the `EXAMPLE:` line, check subsequent lines for `BINDER_ID` marker
349-
- Look for the pattern: `{comment_prefix} BINDER_ID {hash}`
350-
- Example for Python: `# BINDER_ID 6bbed3da294e8de5a8c2ad99abf883731a50d4dd`
351-
- Example for JavaScript: `// BINDER_ID 6bbed3da294e8de5a8c2ad99abf883731a50d4dd`
352-
353-
2. **Extraction**: Parse the hash value
354-
- Strip the comment prefix and `BINDER_ID` keyword
355-
- Trim whitespace from the remaining string
356-
- The result should be a 40-character Git commit SHA (hexadecimal)
357-
- Example regex pattern: `{comment_prefix}\s*BINDER_ID\s+([a-f0-9]{40})`
358-
359-
3. **Validation** (optional but recommended):
360-
- Verify the hash is exactly 40 characters
361-
- Verify it contains only hexadecimal characters (0-9, a-f)
362-
- Log a warning if validation fails but continue processing
363-
364-
4. **Storage**: Add to metadata
365-
- Store in the language-specific metadata object as `binderId`
366-
- Type: string
367-
- Only include the field if `BINDER_ID` marker was found
368-
- Do not set to null or empty string if not found - omit the field entirely
369-
370-
5. **Line Processing**: The `BINDER_ID` line should be treated like `EXAMPLE:`
371-
- It should NOT appear in the processed output file
372-
- It should NOT affect line number calculations for highlight/hidden ranges
373-
- Remove it during processing (similar to how `EXAMPLE:` line is handled)
348+
**1. Add Constant and Class Attribute**:
349+
350+
First, add the constant at the top of the file with other marker constants:
351+
```python
352+
BINDER_ID = 'BINDER_ID'
353+
```
354+
355+
Add the attribute to the `Example` class:
356+
```python
357+
class Example(object):
358+
    language = None
359+
    path = None
360+
    content = None
361+
    hidden = None
362+
    highlight = None
363+
    named_steps = None
364+
    binder_id = None  # Add this
365+
```
366+
367+
Initialize in `__init__`:
368+
```python
369+
self.binder_id = None
370+
```
371+
372+
**2. Compile Regex Pattern**:
373+
374+
In the `make_ranges()` method, add the regex pattern compilation alongside other patterns (after `exid` pattern):
375+
```python
376+
exid = re.compile(f'{PREFIXES[self.language]}\\s?{EXAMPLE}')
377+
binder = re.compile(f'{PREFIXES[self.language]}\\s?{BINDER_ID}\\s+([a-f0-9]{{40}})')
378+
go_output = re.compile(f'{PREFIXES[self.language]}\\s?{GO_OUTPUT}')
379+
```
380+
381+
**Pattern explanation**:
382+
- `{PREFIXES[self.language]}` - Language-specific comment prefix (e.g., `#` or `//`)
383+
- `\\s?` - At most one optional whitespace character after the comment prefix
384+
- `{BINDER_ID}` - The literal string "BINDER_ID"
385+
- `\\s+` - Required whitespace before hash
386+
- `([a-f0-9]{40})` - Capture group for exactly 40 lowercase hexadecimal characters
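
The pattern can be tried against sample lines in isolation. The following is a minimal sketch, not the parser itself: `PREFIXES` here is an assumed two-entry subset of the real table in `example.py`, and `extract_binder_id` is a hypothetical helper used only for illustration.

```python
import re

BINDER_ID = 'BINDER_ID'
# Assumed subset of the prefix table; the real PREFIXES in example.py may differ.
PREFIXES = {'python': '#', 'javascript': '//'}


def extract_binder_id(line, language):
    """Return the 40-character hash from a BINDER_ID marker line, or None."""
    pattern = re.compile(
        f'{PREFIXES[language]}\\s?{BINDER_ID}\\s+([a-f0-9]{{40}})'
    )
    match = pattern.search(line)
    return match.group(1) if match else None


sha = '6bbed3da294e8de5a8c2ad99abf883731a50d4dd'
print(extract_binder_id(f'# BINDER_ID {sha}', 'python'))       # full hash
print(extract_binder_id(f'// BINDER_ID {sha}', 'javascript'))  # full hash
print(extract_binder_id('# BINDER_ID 6bbed3d', 'python'))      # too short
```

Note that the pattern has no trailing anchor, so a hexadecimal run longer than 40 characters would still match on its first 40 characters.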
387+
388+
**3. Detection and Extraction**:
389+
390+
Add detection logic in the main processing loop, **after** the `EXAMPLE:` check and **before** the `GO_OUTPUT` check:
391+
392+
```python
393+
elif re.search(exid, l):
394+
    output = False
395+
    pass
396+
elif re.search(binder, l):
397+
    # Extract BINDER_ID hash value
398+
    match = re.search(binder, l)
399+
    if match:
400+
        self.binder_id = match.group(1)
401+
        logging.debug(f'Found BINDER_ID: {self.binder_id} in {self.path}:L{curr+1}')
402+
    output = False  # CRITICAL: Skip this line from output
403+
elif self.language == "go" and re.search(go_output, l):
404+
    # ... rest of processing
405+
```
406+
407+
**Critical implementation details**:
408+
- **Must set `output = False`**: This prevents the line from being added to the `content` array
409+
- **Placement matters**: Must be in the `elif` chain, not a separate `if` statement
410+
- **No `content.append(l)`**: The line is skipped entirely, just like `EXAMPLE:` lines
411+
- **Extract before setting output**: Get the hash value before marking the line to skip
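
The skip behavior described above can be sketched as a standalone function. This is a simplified illustration under stated assumptions: `strip_markers` is a hypothetical name, the comment prefix is passed in directly instead of being looked up in `PREFIXES`, and the real loop in `make_ranges()` handles many more markers.

```python
import re

EXAMPLE = 'EXAMPLE:'
BINDER_ID = 'BINDER_ID'


def strip_markers(lines, prefix='#'):
    """Return (content, binder_id) with EXAMPLE: and BINDER_ID lines removed."""
    exid = re.compile(f'{prefix}\\s?{EXAMPLE}')
    binder = re.compile(f'{prefix}\\s?{BINDER_ID}\\s+([a-f0-9]{{40}})')
    content = []
    binder_id = None
    for l in lines:
        output = True
        binder_match = binder.search(l)
        if exid.search(l):
            output = False  # marker line: never reaches content
        elif binder_match:
            binder_id = binder_match.group(1)  # extract first, then skip
            output = False
        if output:
            content.append(l)
    return content, binder_id


source = [
    '# EXAMPLE: landing',
    '# BINDER_ID 6bbed3da294e8de5a8c2ad99abf883731a50d4dd',
    'import redis',
]
content, binder_id = strip_markers(source)
print(content)
print(binder_id)
```

Because both marker branches sit in one `if`/`elif` chain, a line can match at most one marker, and only unmatched lines are appended.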
412+
413+
**4. Storage in Metadata**:
414+
415+
In `build/local_examples.py`, add the `binderId` field conditionally after creating the metadata dictionary:
416+
417+
```python
418+
example_metadata = {
419+
    'source': source_file,
420+
    'language': language,
421+
    'target': target_file,
422+
    'highlight': example.highlight,
423+
    'hidden': example.hidden,
424+
    'named_steps': example.named_steps,
425+
    'sourceUrl': None
426+
}
427+
428+
# Add binderId only if it exists
429+
if example.binder_id:
430+
    example_metadata['binderId'] = example.binder_id
431+
432+
examples_data[example_id][client_name] = example_metadata
433+
```
434+
435+
In `build/components/component.py`, add similarly after setting other metadata fields:
436+
437+
```python
438+
example_metadata['highlight'] = e.highlight
439+
example_metadata['hidden'] = e.hidden
440+
example_metadata['named_steps'] = e.named_steps
441+
example_metadata['sourceUrl'] = (
442+
    f'{ex["git_uri"]}/tree/{default_branch}/{ex["path"]}/{os.path.basename(f)}'
443+
)
444+
445+
# Add binderId only if it exists
446+
if e.binder_id:
447+
    example_metadata['binderId'] = e.binder_id
448+
449+
examples = self._root._examples
450+
```
451+
452+
**Why conditional addition**:
453+
- Only add the field when `binder_id` is set (the truthiness check skips both `None` and an empty string)
454+
- This keeps the JSON clean - examples without BinderHub links don't have the field
455+
- Avoids `null` or empty string values in the metadata
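
The effect of conditional addition on the serialized JSON can be seen in a small sketch. `build_metadata` is a hypothetical helper with a reduced field set, assumed here only for illustration; the real dictionaries are built inline as shown above.

```python
import json


def build_metadata(source, language, binder_id=None):
    """Build a metadata dict, adding binderId only when a hash was found."""
    metadata = {'source': source, 'language': language, 'sourceUrl': None}
    if binder_id:
        metadata['binderId'] = binder_id
    return metadata


with_id = build_metadata(
    'landing.py', 'python', '6bbed3da294e8de5a8c2ad99abf883731a50d4dd'
)
without_id = build_metadata('landing.py', 'python')
print(json.dumps(with_id, indent=2))     # includes "binderId"
print(json.dumps(without_id, indent=2))  # no "binderId" key at all
```

An unconditional assignment would instead serialize `None` as `"binderId": null`, which is the case the spec asks to avoid.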
456+
457+
**5. Line Processing Behavior**:
458+
459+
The `BINDER_ID` line is removed from output through the same mechanism as other marker lines:
460+
461+
- **How it works**: Setting `output = False` prevents the line from reaching the `else` block that calls `content.append(l)`
462+
- **Line number impact**: Because the line is never added to `content`, it doesn't affect line number calculations for steps, highlights, or hidden ranges
463+
- **Result**: The processed file is clean, containing only the actual code without any marker comments
464+
465+
**Common Pitfalls**:
466+
1. **Forgetting `output = False`**: The line will appear in processed output
467+
2. **Wrong placement in elif chain**: May not be detected or may interfere with other markers
468+
3. **Using `if` instead of `elif`**: Could cause multiple conditions to match
469+
4. **Not checking `if match`**: Could cause AttributeError if regex doesn't match
470+
5. **Adding field unconditionally**: Results in `"binderId": null` in JSON for examples without the marker
471+
472+
**6. Complete Example Flow**:
473+
474+
Here's a complete example showing how a file is processed:
475+
476+
**Input file** (`local_examples/client-specific/redis-py/landing.py`):
477+
```python
478+
# EXAMPLE: landing
479+
# BINDER_ID 6bbed3da294e8de5a8c2ad99abf883731a50d4dd
480+
import redis
481+
482+
# STEP_START connect
483+
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
484+
# STEP_END
485+
```
486+
487+
**Processing steps**:
488+
1. Line 1: `EXAMPLE:` detected → `output = False` → line skipped
489+
2. Line 2: `BINDER_ID` detected → extract hash `6bbed3da294e8de5a8c2ad99abf883731a50d4dd` → `output = False` → line skipped
490+
3. Line 3: `import redis` → no marker → added to `content` array at index 0
491+
4. Line 4: Empty line → added to `content` array at index 1
492+
5. Line 5: `STEP_START` detected → record step start at line 3 (len(content) + 1) → line skipped
493+
6. Line 6: Code → added to `content` array at index 2
494+
7. Line 7: `STEP_END` detected → record step range "3-3" → line skipped
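
The step-range arithmetic in these processing steps can be sketched as follows. This is an illustration under assumptions: `step_ranges` is a hypothetical function, it uses plain `startswith` checks where the real parser uses compiled regexes, and it starts from lines that already have `EXAMPLE:` and `BINDER_ID` removed.

```python
def step_ranges(lines, prefix='#'):
    """Return (content, named_steps) with 1-based ranges into content."""
    content = []
    named_steps = {}
    current = None  # (name, start_line) of the open step, if any
    for l in lines:
        stripped = l.strip()
        if stripped.startswith(f'{prefix} STEP_START'):
            # The step begins at the next line that will be appended.
            current = (stripped.split()[-1], len(content) + 1)
        elif stripped.startswith(f'{prefix} STEP_END'):
            if current:
                name, start = current
                named_steps[name] = f'{start}-{len(content)}'
                current = None
        else:
            content.append(l)
    return content, named_steps


source = [
    'import redis',
    '',
    '# STEP_START connect',
    "r = redis.Redis(host='localhost', port=6379, decode_responses=True)",
    '# STEP_END',
]
content, named_steps = step_ranges(source)
print(len(content))
print(named_steps)
```

Since marker lines are never appended, `len(content)` always counts processed lines, which is why the recorded range refers to the output file and not the source file.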
495+
496+
**Output file** (`examples/landing/local_client-specific_redis-py_landing.py`):
497+
```python
498+
import redis
499+
500+
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
501+
```
502+
503+
**Metadata** (`data/examples.json`):
504+
```json
505+
{
506+
"landing": {
507+
"Python": {
508+
"source": "local_examples/client-specific/redis-py/landing.py",
509+
"language": "python",
510+
"target": "examples/landing/local_client-specific_redis-py_landing.py",
511+
"highlight": ["1-3"],
512+
"hidden": [],
513+
"named_steps": {
514+
"connect": "3-3"
515+
},
516+
"sourceUrl": null,
517+
"binderId": "6bbed3da294e8de5a8c2ad99abf883731a50d4dd"
518+
}
519+
}
520+
}
521+
```
522+
523+
**Key observations**:
524+
- Both `EXAMPLE:` and `BINDER_ID` lines are removed from output
525+
- Line numbers in metadata refer to the processed file (after marker removal)
526+
- `binderId` is stored at the language level, not the example set level
527+
- The hash value is extracted cleanly without comment prefix or keyword
374528

375529
**Output Metadata** (stored in `examples.json`):
376530
- `highlight`: Line ranges to highlight (e.g., `["1-10", "15-20"]`)
@@ -1006,6 +1160,35 @@ ModuleNotFoundError: No module named 'pytoml'
10061160
2. Check `data/examples.json` has entry for that language
10071161
3. Ensure `label` field matches exactly (case-sensitive)
10081162

1163+
**BINDER_ID not extracted or appearing in output**:
1164+
- **Symptom 1**: `binderId` field missing from `data/examples.json`
1165+
- **Cause**: Regex pattern not matching the line
1166+
- **Debug**:
1167+
1. Check comment prefix matches language: `# BINDER_ID` for Python, `// BINDER_ID` for JavaScript
1168+
2. Verify hash is exactly 40 hexadecimal characters (lowercase a-f, 0-9)
1169+
3. Check for extra whitespace or special characters
1170+
4. Run with debug logging: `python3 build/local_examples.py --loglevel DEBUG`
1171+
5. Look for "Found BINDER_ID" message in logs
1172+
- **Fix**: Ensure format is exactly `{comment_prefix} BINDER_ID {40-char-hash}`
1173+
1174+
- **Symptom 2**: `BINDER_ID` line appears in processed output file
1175+
- **Cause**: `output = False` not set in detection logic
1176+
- **Fix**: Verify the `elif re.search(binder, l):` block sets `output = False`
1177+
- **Verify**: Check processed file in `examples/{example_id}/` - should not contain `BINDER_ID` line
1178+
1179+
- **Symptom 3**: `"binderId": null` in metadata
1180+
- **Cause**: Field added unconditionally instead of conditionally
1181+
- **Fix**: Only add field if `example.binder_id` is not None:
1182+
```python
1183+
if example.binder_id:
1184+
    example_metadata['binderId'] = example.binder_id
1185+
```
1186+
1187+
- **Symptom 4**: Wrong hash value extracted
1188+
- **Cause**: Regex capture group not matching correctly
1189+
- **Debug**: Check the regex pattern includes capture group: `([a-f0-9]{40})`
1190+
- **Fix**: Ensure using `match.group(1)` to extract the captured hash
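
When working through the symptoms above, a small diagnostic can narrow down why a marker line is not being matched. The sketch below is hypothetical and not part of the build scripts: `diagnose_binder_line` and its messages are assumptions, and it checks only the causes listed in this section.

```python
import re


def diagnose_binder_line(line, prefix='#'):
    """Return a short reason why a BINDER_ID line would not be extracted."""
    strict = re.compile(
        f'{re.escape(prefix)}\\s?BINDER_ID\\s+([a-f0-9]{{40}})'
    )
    if strict.search(line):
        return 'ok'
    if 'BINDER_ID' not in line:
        return 'no BINDER_ID keyword'
    if not line.lstrip().startswith(prefix):
        return 'wrong comment prefix'
    value = line.split('BINDER_ID', 1)[1].strip()
    if len(value) != 40:
        return f'hash has {len(value)} characters, expected 40'
    if re.fullmatch('[0-9a-f]{40}', value):
        return 'unexpected spacing around BINDER_ID keyword'
    if re.fullmatch('[0-9a-fA-F]{40}', value):
        return 'hash contains uppercase hex digits'
    return 'hash contains non-hexadecimal characters'


sha = '6bbed3da294e8de5a8c2ad99abf883731a50d4dd'
print(diagnose_binder_line(f'# BINDER_ID {sha}'))
print(diagnose_binder_line(f'// BINDER_ID {sha}'))
print(diagnose_binder_line('# BINDER_ID 6bbed3d'))
print(diagnose_binder_line(f'# BINDER_ID {sha.upper()}'))
```

The helper mirrors the spec's pattern for the `ok` case, so a line it accepts should also be accepted by the parser for the same prefix.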
1191+
10091192
### Performance Issues
10101193

10111194
**Build takes too long**:
@@ -1269,16 +1452,20 @@ In Markdown files:
12691452

12701453
| Marker | Purpose | Example | Notes |
12711454
|--------|---------|---------|-------|
1272-
| `EXAMPLE: id` | Define example ID | `# EXAMPLE: home_vecsets` | Must be first line |
1273-
| `BINDER_ID hash` | Define BinderHub commit hash | `# BINDER_ID 6bbed3da294e8de5a8c2ad99abf883731a50d4dd` | Optional, typically line 2. Used to generate interactive notebook links. Hash is a Git commit SHA from binder-launchers repo. |
1274-
| `HIDE_START` | Start hidden block | `# HIDE_START` | Code hidden by default |
1455+
| `EXAMPLE: id` | Define example ID | `# EXAMPLE: home_vecsets` | **Required**. Must be first line. Removed from processed output. |
1456+
| `BINDER_ID hash` | Define BinderHub commit hash | `# BINDER_ID 6bbed3da294e8de5a8c2ad99abf883731a50d4dd` | **Optional**. Typically line 2 (after EXAMPLE). Hash must be exactly 40 hexadecimal characters (Git commit SHA). Removed from processed output. Stored as `binderId` in metadata. Used to generate interactive Jupyter notebook links. |
1457+
| `HIDE_START` | Start hidden block | `# HIDE_START` | Code hidden by default, revealed with eye button |
12751458
| `HIDE_END` | End hidden block | `# HIDE_END` | Must close HIDE_START |
1276-
| `REMOVE_START` | Start removed block | `# REMOVE_START` | Code completely removed |
1459+
| `REMOVE_START` | Start removed block | `# REMOVE_START` | Code completely removed from output |
12771460
| `REMOVE_END` | End removed block | `# REMOVE_END` | Must close REMOVE_START |
1278-
| `STEP_START name` | Start named step | `# STEP_START connect` | Name is lowercase |
1279-
| `STEP_END` | End named step | `# STEP_END` | Must close STEP_START |
1280-
1281-
**Important**: All markers must use the correct comment prefix for the language (see [Language Mappings](#language-mappings)).
1461+
| `STEP_START name` | Start named step | `# STEP_START connect` | Name is lowercase. Removed from output. |
1462+
| `STEP_END` | End named step | `# STEP_END` | Must close STEP_START. Removed from output. |
1463+
1464+
**Important**:
1465+
- All markers must use the correct comment prefix for the language (see [Language Mappings](#language-mappings))
1466+
- Marker lines (`EXAMPLE:`, `BINDER_ID`, `STEP_START`, `STEP_END`, `HIDE_START`, `HIDE_END`, `REMOVE_START`, `REMOVE_END`) are **removed** from the processed output file
1467+
- Only the code between markers appears in the final processed file
1468+
- Line numbers in metadata (highlight, hidden, named_steps) refer to the processed file, not the source file
12821469

12831470
### Shortcode Parameter Reference
12841471
