rofe commented Oct 23, 2025

ES6 Import Statement Handling Fix

Problem

Previously, the JavaScript rewriter in pywb inserted its initialization code at the very beginning of JavaScript files via the first_buff mechanism. This broke ES6 modules, whose leading import statements have to stay ahead of the injected code.

Example of the Problem:

Before the fix:

// Injected code was inserted here, breaking ES6 modules
let window = _init('window');
import { foo } from 'module1';  // ERROR: import must come first!

Solution

Modified the StreamingRewriter class in pywb/rewrite/content_rewriter.py to detect ES6 import statements at the beginning of JavaScript files and insert the initialization code after all leading imports instead of before them.

Example After the Fix:

import { foo } from 'module1';  // Imports preserved at the top
import bar from 'module2';

// Injected code comes after imports
let window = _init('window');

console.log('test');

Changes Made

1. Added ES6 Import Detection Regex

Added a regex pattern to the StreamingRewriter class that matches:

  • Leading comments (both // and /* */ style)
  • Leading whitespace
  • One or more ES6 import statements
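
As a rough illustration of the three items above, a pattern along these lines could be written in Python. This is a minimal sketch, not the regex from the PR: the helper names _LEAD and _IMPORT and the exact pattern are assumptions, and it only recognises imports that end in a quoted module specifier.

```python
import re

# Hypothetical pattern for illustration; not the regex added in the PR.
# Leading whitespace, // comments and /* */ comments.
_LEAD = r"""(?:\s|//[^\n]*|/\*.*?\*/)*"""
# A static import ending in a quoted module specifier. The lookahead
# skips dynamic import(...) calls and import.meta.
_IMPORT = r"""import\b(?!\s*[.(])[^;'"]*(?:'[^'\n]*'|"[^"\n]*")[ \t]*;?"""

# One or more imports, each optionally preceded by whitespace or comments.
IMPORT_REGEX = re.compile(r"\A(?:" + _LEAD + _IMPORT + r")+", re.DOTALL)

source = (
    "// header comment\n"
    "/* block comment */\n"
    "import { foo } from 'module1';\n"
    'import bar from "module2";\n'
    "\n"
    "console.log('test');\n"
)
match = IMPORT_REGEX.match(source)
# Everything up to and including the last leading import.
leading_imports = source[: match.end()]
```

Anchoring with \A means an import that appears after other code is never matched, which keeps the original behavior for such files.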

2. New Method: _insert_with_import_check

Added a new method that:

  • Checks if the JavaScript content starts with import statements
  • If yes: inserts first_buff after all imports
  • If no: inserts first_buff at the beginning (original behavior)
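
The branching described above can be sketched as a standalone function. The function below is an assumption about the shape of the logic, not the method body from the PR; the regex is the same hypothetical pattern as in the detection step, and the extra newline before first_buff is an illustrative choice.

```python
import re

# Hypothetical pattern: leading whitespace/comments, then static imports.
_LEAD = r"""(?:\s|//[^\n]*|/\*.*?\*/)*"""
_IMPORT = r"""import\b(?!\s*[.(])[^;'"]*(?:'[^'\n]*'|"[^"\n]*")[ \t]*;?"""
IMPORT_REGEX = re.compile(r"\A(?:" + _LEAD + _IMPORT + r")+", re.DOTALL)


def insert_with_import_check(text, first_buff):
    """Return text with first_buff placed after any leading imports."""
    match = IMPORT_REGEX.match(text)
    if not match:
        # No leading imports: original behavior, inject at the start.
        return first_buff + text
    end = match.end()
    # Keep the import block intact and inject right after it.
    return text[:end] + "\n" + first_buff + text[end:]


module_src = "import { foo } from 'module1';\nconsole.log('test');"
rewritten = insert_with_import_check(module_src, "let window = _init('window');\n")
```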

3. Updated rewrite_complete Method

Modified to use the new _insert_with_import_check method for proper placement of injected code.

4. Updated rewrite_text_stream_to_gen Method

Enhanced the streaming version to handle ES6 imports:

  • Checks the first chunk for import statements
  • Only applies import detection for JavaScript files (not HTML, CSS, etc.)
  • Preserves backward compatibility for non-ES6 files
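
A minimal sketch of the streaming case, assuming the content arrives as an iterable of text chunks. The function name rewrite_text_stream, the is_javascript flag and the regex are hypothetical; the real rewrite_text_stream_to_gen also runs the rewriter over each chunk, which is omitted here.

```python
import re

# Hypothetical pattern: leading whitespace/comments, then static imports.
_LEAD = r"""(?:\s|//[^\n]*|/\*.*?\*/)*"""
_IMPORT = r"""import\b(?!\s*[.(])[^;'"]*(?:'[^'\n]*'|"[^"\n]*")[ \t]*;?"""
IMPORT_REGEX = re.compile(r"\A(?:" + _LEAD + _IMPORT + r")+", re.DOTALL)


def rewrite_text_stream(chunks, first_buff, is_javascript):
    """Yield chunks with first_buff injected into the first one."""
    chunks = iter(chunks)
    first_chunk = next(chunks, "")
    # Import detection only applies to JavaScript content.
    match = IMPORT_REGEX.match(first_chunk) if is_javascript else None
    if match:
        end = match.end()
        yield first_chunk[:end] + "\n" + first_buff + first_chunk[end:]
    else:
        # Non-JS content, or JS without leading imports: unchanged behavior.
        yield first_buff + first_chunk
    # Remaining chunks pass through untouched in this sketch.
    yield from chunks


js_out = "".join(
    rewrite_text_stream(["import a from 'a';\nlet x", " = 1;"], "INJECT;", True)
)
```

Because only the first chunk is inspected, an import block that spans a chunk boundary would be only partially detected in this sketch.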

5. Added Test Cases

Added test cases in pywb/rewrite/test/test_content_rewriter.py:

  • test_es6_imports_insertion_after_imports: Basic ES6 import handling
  • test_es6_imports_with_comments: Imports with leading comments
  • test_no_es6_imports_normal_insertion: Backward compatibility

Behavior Summary

Scenario                 | Injection Point
-------------------------|-----------------------------
ES6 imports at start     | After all imports
ES6 imports with comments| After comments and imports
No imports               | At the beginning (unchanged)
Import in middle         | At the beginning (unchanged)
Inline JS attributes     | At the beginning (unchanged)

Files Modified

  1. pywb/rewrite/content_rewriter.py

    • Added IMPORT_REGEX class variable
    • Added _insert_with_import_check() method
    • Modified rewrite_complete() to use new method
    • Modified rewrite_text_stream_to_gen() to handle streaming with imports

  2. pywb/rewrite/test/test_content_rewriter.py

    • Added test_es6_imports_insertion_after_imports()
    • Added test_es6_imports_with_comments()
    • Added test_no_es6_imports_normal_insertion()

Testing

All test scenarios pass:

  • ES6 imports are correctly detected and preserved at the start
  • Injection code is placed after imports
  • Comments before imports are preserved
  • Files without imports continue to work as before
  • Imports in the middle of files are not treated specially

Impact

This fix lets pywb rewrite JavaScript ES6 modules that begin with import statements without breaking them, while leaving the injection point unchanged for all other content.

Fix #964 (ES6 modules with imports broken)
