Broken output when a docstring has a string like https:// #95

@mdlinville

Bug Report: URLs in docstrings are incorrectly parsed as argument definitions

Summary

When a docstring contains a URL with a protocol (e.g., https://), lazydocs incorrectly interprets the colon in the protocol as an argument separator. The generated Markdown is malformed: the URL is split in two and the first half ends up inside <b> tags.

Environment

  • lazydocs version: Latest (as of October 2025)
  • Python version: 3.9+
  • OS: macOS/Linux

Description

The issue occurs in src/lazydocs/generation.py where the _RE_ARGSTART regex pattern (line 36) is too broad:

_RE_ARGSTART = re.compile(r"^(.+):[ ]+(.{2,})$", re.IGNORECASE)

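This pattern matches any line that contains a colon followed by one or more spaces and at least two more characters, and its first group (.+) places no restriction on what counts as an argument name. When processing docstrings in an Args/Parameters section, this causes lines containing URLs to be parsed as argument definitions.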

Steps to Reproduce

  1. Create a Python function with a docstring containing a URL in the Args section:
def init(project_name: str):
    """Initialize the project.
    
    Args:
        project_name: The name of the project.
        To find or update your default entity, refer to [User Settings](https://docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) in the W&B Models documentation.
    """
    pass
  2. Run lazydocs on this file
  3. Observe the generated markdown

Expected Behavior

The URL should remain intact in the generated markdown:

- **`project_name`**: The name of the project.
- To find or update your default entity, refer to [User Settings](https://docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) in the W&B Models documentation.

Actual Behavior

The URL is split at the colon of https://, and the text before it is wrapped in <b> tags and backticks as if it were an argument name:

- **`project_name`**: The name of the project.
- <b>`To find or update your default entity, refer to [User Settings](https`</b>: //docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) in the W&B Models documentation.

Root Cause

In generation.py lines 584-589, when a line matches _RE_ARGSTART, it's processed as an argument:

elif arg_list and not literal_block and _RE_ARGSTART.match(line):
    # start of an exception-type block
    out.append(
        "- "
        + _RE_ARGSTART.sub(r"<b>`\1`</b>: \2", line)
    )

The substitution r"<b>`\1`</b>: \2" treats everything before the colon as the argument name (wrapped in <b> tags and backticks) and everything after it as the description.
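The substitution can be tried in isolation with a few lines of Python. This is a minimal sketch, not lazydocs code: ARGSTART_QUOTED copies the pattern quoted above, and ARGSTART_LENIENT is a hypothetical variant that allows zero spaces after the colon. As quoted, the pattern needs at least one space after the colon, so the colon in https:// does not satisfy it by itself. The output shown under Actual Behavior is what the lenient variant produces, so it may be worth checking which form the installed version uses.

```python
import re

# Pattern as quoted in this report.
ARGSTART_QUOTED = re.compile(r"^(.+):[ ]+(.{2,})$", re.IGNORECASE)
# Hypothetical variant (an assumption, not taken from lazydocs) that
# allows zero spaces after the colon.
ARGSTART_LENIENT = re.compile(r"^(.+):[ ]*(.{2,})$", re.IGNORECASE)

# Replacement template as quoted in the Root Cause snippet.
REPLACEMENT = r"<b>`\1`</b>: \2"

url_line = (
    "To find or update your default entity, refer to "
    "[User Settings](https://docs.wandb.ai/guides/models/app/settings-page/"
    "user-settings/#default-team) in the W&B Models documentation."
)
arg_line = "project_name: The name of the project."


def render(pattern, line):
    """Return the list item for an argument match, or None if there is no match."""
    if pattern.match(line):
        return "- " + pattern.sub(REPLACEMENT, line)
    return None


quoted_arg = render(ARGSTART_QUOTED, arg_line)
quoted_url = render(ARGSTART_QUOTED, url_line)
lenient_url = render(ARGSTART_LENIENT, url_line)

print(quoted_arg)
print(quoted_url)
print(lenient_url)
```

In both variants the first group accepts arbitrary text, including spaces and Markdown link syntax, which is why a whole sentence can end up as the "argument name".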

Proposed Fix

The _RE_ARGSTART pattern should be more restrictive to avoid matching lines with URLs. Possible solutions:

  1. Exclude lines containing URLs: Check if the line contains :// before applying the regex
  2. Restrict argument names: Limit what can be considered an argument name (e.g., only allow valid Python identifiers)
  3. Look for indentation patterns: Real argument descriptions typically follow specific indentation patterns
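Option 1 could be sketched as a small guard that runs before the argument regex. The helper name colon_starts_url_scheme is hypothetical and not part of lazydocs, and base_url is a made-up argument used only for illustration. The sketch looks at the first colon only, on the assumption that a genuine argument line has its name: colon before any URL.

```python
def colon_starts_url_scheme(line: str) -> bool:
    """Return True if the first colon in the line is the start of '://'."""
    index = line.find(":")
    return index != -1 and line.startswith("://", index)


prose = (
    "To find or update your default entity, refer to "
    "[User Settings](https://docs.wandb.ai/guides/models/app/settings-page/"
    "user-settings/#default-team) in the W&B Models documentation."
)
argument = "project_name: The name of the project."
# Hypothetical argument whose description happens to mention a URL.
argument_with_url = "base_url: Endpoint to call, e.g. https://example.com"

skip_prose = colon_starts_url_scheme(prose)
skip_argument = colon_starts_url_scheme(argument)
skip_argument_with_url = colon_starts_url_scheme(argument_with_url)

print(skip_prose, skip_argument, skip_argument_with_url)
```

A line for which the helper returns True would be passed through as plain text instead of being handed to _RE_ARGSTART. Testing only the first colon, rather than rejecting every line that contains ://, keeps arguments whose description mentions a URL working.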

Example fix for option 2:

# Only match identifier-like names (word characters and square brackets) as argument names
_RE_ARGSTART = re.compile(r"^([\w\[\]_]+):[ ]+(.{2,})$", re.IGNORECASE)
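A quick check of the option 2 pattern against the two lines from the reproduction, plus two edge cases. This is a sketch that assumes the pattern is used exactly as written above. The **kwargs and Note lines are made-up inputs for illustration.

```python
import re

# Proposed pattern from option 2, copied from above.
ARGSTART_STRICT = re.compile(r"^([\w\[\]_]+):[ ]+(.{2,})$", re.IGNORECASE)

lines = {
    "argument": "project_name: The name of the project.",
    "url_prose": (
        "To find or update your default entity, refer to "
        "[User Settings](https://docs.wandb.ai/guides/models/app/settings-page/"
        "user-settings/#default-team) in the W&B Models documentation."
    ),
    # Hypothetical inputs, not taken from the report.
    "star_kwargs": "**kwargs: Extra options passed through.",
    "one_word_label": "Note: see https://example.com for details.",
}

# True where the line would be treated as the start of an argument.
matches = {name: bool(ARGSTART_STRICT.match(text)) for name, text in lines.items()}

for name, matched in matches.items():
    print(f"{name}: {matched}")
```

The pattern rejects the URL line because the text before the colon contains spaces. It also rejects names written as *args or **kwargs, and it still accepts a single-word label such as Note: inside an Args block, so the character class may need adjusting before this is adopted.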

Impact

This bug affects any documentation that includes URLs in docstring parameter sections, resulting in:

  • Broken links in generated documentation
  • Invalid HTML/Markdown that fails to parse in some renderers
  • Poor documentation readability

Workaround

Users can currently work around this by:

  1. Moving URLs outside of Args/Parameters sections
  2. Post-processing the generated markdown to fix broken URLs
  3. Using a different documentation format for URLs
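Workaround 2 could look roughly like the following. It is a sketch that assumes the broken lines have exactly the shape shown under Actual Behavior: a list item whose bold "name" ends in http or https, followed by a colon, a space, and //. The function name repair_split_urls is hypothetical, and other breakage patterns would need their own rules.

```python
import re

# Matches a list item where the "argument name" ends in a URL scheme and the
# "description" starts with "//", i.e. a URL that was split at its colon.
_SPLIT_URL = re.compile(r"^([ \t]*- )<b>`(.*\bhttps?)`</b>: (//.*)$", re.MULTILINE)


def repair_split_urls(markdown: str) -> str:
    """Rejoin URLs that were split into a bold 'name' and a description."""
    return _SPLIT_URL.sub(r"\1\2:\3", markdown)


broken = (
    "- **`project_name`**: The name of the project.\n"
    "- <b>`To find or update your default entity, refer to [User Settings](https`</b>: "
    "//docs.wandb.ai/guides/models/app/settings-page/user-settings/#default-team) "
    "in the W&B Models documentation.\n"
)
fixed = repair_split_urls(broken)
print(fixed)
```

Lines that do not match the split-URL shape, such as the project_name entry, pass through unchanged.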
