Bug Report: URLs in docstrings are incorrectly parsed as argument definitions
Summary
When a docstring contains a URL with a protocol (e.g., `https://`), lazydocs incorrectly interprets the colon in the protocol as an argument separator. The result is malformed markdown output with broken URLs and malformed HTML markup.
Environment
- lazydocs version: Latest (as of October 2025)
- Python version: 3.9+
- OS: macOS/Linux
Description
The issue occurs in `src/lazydocs/generation.py`, where the `_RE_ARGSTART` regex pattern (line 36) is too broad:
```python
_RE_ARGSTART = re.compile(r"^(.+):[ ]+(.{2,})$", re.IGNORECASE)
```
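This pattern matches any line that contains a colon followed by one or more spaces and at least two more characters. Because `(.+)` is greedy, everything up to the last such colon is captured as the argument name. When lazydocs processes docstring lines inside an Args/Parameters section, this causes URLs to be parsed incorrectly.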
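To see what the quoted pattern captures, here is a minimal standalone sketch that applies it to a few sample lines. The sample lines and the `arg_split` helper are illustrative only, not part of lazydocs. Because `(.+)` is greedy, group 1 runs up to the last colon that is followed by a space, so a URL appearing before such a colon ends up inside the "argument name". Note that, with the pattern exactly as quoted, `[ ]+` requires at least one space after the colon, so a line whose only colon is the one in `https://` does not match on its own; it may be worth confirming the pattern in the installed lazydocs version against the reproduction below.

```python
import re

# Pattern exactly as quoted from src/lazydocs/generation.py.
_RE_ARGSTART = re.compile(r"^(.+):[ ]+(.{2,})$", re.IGNORECASE)


def arg_split(line: str):
    """Hypothetical helper: return (name, description) if the line matches, else None."""
    match = _RE_ARGSTART.match(line)
    return match.groups() if match else None


samples = [
    "project_name: The name of the project.",                  # genuine argument
    "Docs at https://example.com/guide: read this first.",     # URL before a ': '
    "[User Settings](https://example.com/settings) in docs.",  # only colon is in '://'
]

for sample in samples:
    # Greedy (.+) means group 1 ends at the LAST colon followed by a space.
    print(repr(sample), "->", arg_split(sample))
```

The second sample shows the failure mode: the whole URL lands in group 1, which lazydocs then renders as the argument name.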
Steps to Reproduce
- Create a Python function with a docstring containing a URL in the Args section:
```python
def init(project_name: str):
    """Initialize the project.

    Args:
        project_name: The name of the project.
        To find or update your default entity, refer to [User Settings](https://docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) in the W&B Models documentation.
    """
    pass
```
- Run lazydocs on this file
- Observe the generated markdown
Expected Behavior
The URL should remain intact in the generated markdown:
```markdown
- **`project_name`**: The name of the project.
- To find or update your default entity, refer to [User Settings](https://docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) in the W&B Models documentation.
```
Actual Behavior
The URL is split at the colon in `https:` and the HTML markup is malformed:
```markdown
- **`project_name`**: The name of the project.
- <b>`To find or update your default entity, refer to [User Settings](https`</b>: //docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) in the W&B Models documentation.
```
Root Cause
In `generation.py`, lines 584-589, when a line matches `_RE_ARGSTART`, it is processed as an argument:
```python
elif arg_list and not literal_block and _RE_ARGSTART.match(line):
    # start of an exception-type block
    out.append(
        "- "
        + _RE_ARGSTART.sub(r"<b>`\1`</b>: \2", line)
    )
```
The substitution `` r"<b>`\1`</b>: \2" `` treats everything before the matched colon as the argument name, wrapping it in `<b>` tags and a code span, and everything after it as the description.
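The effect of that substitution can be checked in isolation with a small sketch that reuses the quoted pattern and replacement template. `render_arg_line` is a hypothetical stand-in for the quoted `elif` branch, not a lazydocs function, and it omits the indentation handling of the real code.

```python
import re

_RE_ARGSTART = re.compile(r"^(.+):[ ]+(.{2,})$", re.IGNORECASE)


def render_arg_line(line: str) -> str:
    """Hypothetical stand-in for the quoted branch: turn a matching line into an argument bullet."""
    if _RE_ARGSTART.match(line):
        # Same replacement template as the quoted lazydocs code.
        return "- " + _RE_ARGSTART.sub(r"<b>`\1`</b>: \2", line)
    # Non-matching lines pass through unchanged.
    return line


print(render_arg_line("Docs at https://example.com/guide: read this first."))
```

Because group 1 is placed inside a code span, any URL or markdown link captured there is rendered as literal code rather than as a clickable link.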
Proposed Fix
The `_RE_ARGSTART` pattern should be more restrictive so that it does not match lines containing URLs. Possible solutions:
- Exclude lines containing URLs: check whether the line contains `://` before applying the regex.
- Restrict argument names: limit what can be considered an argument name (e.g., only allow valid Python identifiers).
- Look for indentation patterns: real argument descriptions typically follow specific indentation patterns.
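Options 1 and 3 can be sketched as guard functions that run before the substitution. Both helpers below are hypothetical and only illustrate the idea under the assumption that the caller knows the indentation of the argument names; they are not existing lazydocs code.

```python
import re

_RE_ARGSTART = re.compile(r"^(.+):[ ]+(.{2,})$", re.IGNORECASE)


def is_arg_start_skip_urls(line: str) -> bool:
    """Option 1 (hypothetical helper): never start an argument on a line containing '://'."""
    return "://" not in line and _RE_ARGSTART.match(line) is not None


def is_arg_start_by_indent(raw_line: str, arg_indent: int) -> bool:
    """Option 3 (hypothetical helper): only a line at the argument indentation can start an argument."""
    indent = len(raw_line) - len(raw_line.lstrip())
    return indent == arg_indent and _RE_ARGSTART.match(raw_line.strip()) is not None


# A continuation line indented deeper than the argument names is not a new argument.
print(is_arg_start_by_indent("    project_name: The name of the project.", 4))           # True
print(is_arg_start_by_indent("        Docs at https://example.com/guide: read this.", 4))  # False
```

One trade-off of option 1: it also rejects a genuine argument whose description contains a URL, such as `url: https://example.com is the endpoint.`, so it would need to be combined with another check.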
Example fix for option 2:
```python
# Only match valid Python identifier-like patterns as argument names
_RE_ARGSTART = re.compile(r"^([\w\[\]_]+):[ ]+(.{2,})$", re.IGNORECASE)
```
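A quick way to evaluate option 2 is to run the proposed pattern against the same kinds of lines. The sketch below also includes a hypothetical variant that accepts a leading `*` or `**`, since Google-style docstrings commonly document `*args` and `**kwargs`, which the proposed character class rejects. The sample lines are illustrative.

```python
import re

# Option 2 exactly as proposed above.
PROPOSED = re.compile(r"^([\w\[\]_]+):[ ]+(.{2,})$", re.IGNORECASE)

# Hypothetical variant: also accept a leading * or ** (for *args / **kwargs).
# The `_` is dropped from the class because \w already includes it.
VARIANT = re.compile(r"^(\*{0,2}[\w\[\]]+):[ ]+(.{2,})$")

for line in [
    "project_name: The name of the project.",
    "Docs at https://example.com/guide: read this first.",
    "Note: see https://example.com/guide for details.",
    "*args: Extra positional arguments.",
]:
    proposed = PROPOSED.match(line)
    variant = VARIANT.match(line)
    print(repr(line))
    print("  proposed:", proposed.groups() if proposed else None)
    print("  variant: ", variant.groups() if variant else None)
```

Since a name can no longer contain spaces, colons, or slashes, a URL always stays intact in the description. Prose labels such as `Note:` still match, though, so this option narrows the problem rather than removing it entirely.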
Impact
This bug affects any documentation that includes URLs in docstring parameter sections, resulting in:
- Broken links in generated documentation
- Invalid HTML/Markdown that fails to parse in some renderers
- Poor documentation readability
Workaround
Users can currently work around this by:
- Moving URLs outside of Args/Parameters sections
- Post-processing the generated markdown to fix broken URLs
- Using a different documentation format for URLs
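The post-processing workaround can be sketched as a single regex substitution that targets the exact broken shape shown under Actual Behavior (`` <b>`...(https`</b>: //rest ``). `repair_split_urls` is a hypothetical helper, and it only re-joins the URL; it does not undo any other formatting lazydocs applied.

```python
import re

# Matches the broken shape: <b>`...http(s)`</b>: //rest-of-url
_SPLIT_URL = re.compile(r"<b>`([^`]*?https?)`</b>: //")


def repair_split_urls(markdown: str) -> str:
    """Hypothetical post-processing step: re-join URLs that were split at the scheme colon."""
    # Drop the <b>/code-span wrapper and restore '://'.
    return _SPLIT_URL.sub(r"\1://", markdown)


broken = (
    "- <b>`To find or update your default entity, refer to [User Settings](https`</b>: "
    "//docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) "
    "in the W&B Models documentation."
)
print(repair_split_urls(broken))
```

Applied to the actual output above, this yields the expected line; correctly rendered argument bullets such as `` <b>`project_name`</b>: ... `` are left untouched because they do not end in `http` or `https`.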