Skip to content

Conversation

@NarayanBavisetti
Copy link
Collaborator

@NarayanBavisetti NarayanBavisetti commented Aug 26, 2025

Description

introduced a new nh3 package to sanitize the HTML content in descriptions.|

Type of Change

  • Improvement (change that would cause existing functionality to not work as expected)

Summary by CodeRabbit

  • New Features
    • Automatic sanitization of description HTML across issues, projects, pages, workspaces, stickies, and drafts; sanitized content is persisted.
  • Bug Fixes
    • Consistent handling and clear errors for empty/oversized/invalid HTML and binary description data.
    • JSON-based description validation relaxed/removed across serializers.
  • Chores
    • Added an HTML sanitization library dependency.

Copilot AI review requested due to automatic review settings August 26, 2025 08:16
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 26, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Replaces custom regex-based HTML checks with NH3 sanitization in content_validator, changes validate_html_content to return (is_valid, error_msg, clean_html), removes JSON-content validation helpers, updates serializers to unpack the third return value and persist sanitized HTML when provided, and adds nh3 to requirements.

Changes

Cohort / File(s) Summary
HTML validation core
apps/api/plane/utils/content_validator.py
Replaces regex/tag checks with nh3.clean; signature changed to validate_html_content(html_content: str) returning (is_valid, error_message, clean_html); removes JSON validators and pattern constants; handles empty, oversize, exceptions and logs failures.
API serializers
apps/api/plane/api/serializers/issue.py, apps/api/plane/api/serializers/project.py
Remove validate_json_content usage; call validate_html_content and unpack (is_valid, error_msg, sanitized_html); on invalid raise generic ValidationError {"error": "html content is not valid"}; if sanitized_html present, write back to data["description_html"].
App serializers (draft/issue/page/project/workspace)
apps/api/plane/app/serializers/draft.py, apps/api/plane/app/serializers/issue.py, apps/api/plane/app/serializers/page.py, apps/api/plane/app/serializers/project.py, apps/api/plane/app/serializers/workspace.py
Remove JSON description validation; update HTML validation to unpack three-tuple from validate_html_content; on invalid raise generic errors; persist sanitized_html into attrs/return sanitized value; binary validation error messages standardized to "Invalid binary data".
Space serializer
apps/api/plane/space/serializer/issue.py
Remove JSON description validation; update HTML validation to unpack three-tuple and persist sanitized_html when provided; adjust binary error message handling.
Dependencies
apps/api/requirements/base.txt
Adds nh3==0.2.18 with comment # html sanitizer.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Serializer
  participant Validator as validate_html_content
  participant NH3

  Client->>Serializer: submit description_html
  Serializer->>Validator: validate_html_content(html)
  Validator->>NH3: nh3.clean(html)
  alt sanitization succeeds
    NH3-->>Validator: clean_html
    Validator-->>Serializer: (True, None, clean_html)
    alt clean_html present
      Serializer->>Serializer: set description_html = clean_html
    else
      Serializer->>Serializer: keep original description_html
    end
    Serializer-->>Client: proceed (validated)
  else sanitization fails
    Validator-->>Serializer: (False, error_msg, None)
    Serializer-->>Client: raise ValidationError({"error":"html content is not valid"})
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

ready to merge

Suggested reviewers

  • dheeru0198
  • sriramveeraghanta

Poem

A nibble of markup, nibble and sweep,
NH3 makes the HTML tidy and neat.
Serializers hop, sanitize in a bound,
Clean fields saved — hooray for safe ground! 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore-content-validation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

This comment was marked as outdated.

@NarayanBavisetti NarayanBavisetti changed the title chore: changed the html validation [WEB-4780] chore: changed the html validation Aug 26, 2025
@makeplane
Copy link

makeplane bot commented Aug 26, 2025

Pull Request Linked with Plane Work Items

Comment Automatically Generated by Plane

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/api/plane/app/serializers/page.py (1)

226-238: Good adoption of the 3-tuple; also mask internal sanitizer errors.

The nh3 integration and “prefer sanitized_html” logic look correct. However, bubbling the sanitizer’s raw error string to clients can expose internals. Return a generic message instead; log details server-side.

-        is_valid, error_message, sanitized_html = validate_html_content(value)
-        if not is_valid:
-            raise serializers.ValidationError(error_message)
+        is_valid, error_message, sanitized_html = validate_html_content(value)
+        if not is_valid:
+            # Avoid exposing internal sanitizer exceptions
+            raise serializers.ValidationError("Invalid or unsafe HTML content")
apps/api/plane/api/serializers/issue.py (1)

596-605: Comment HTML is parsed but not sanitized — add nh3 sanitization.

Issue comments accept HTML via comment_html. Without nh3, this path can store unsafe markup (XSS). Sanitize and prefer sanitized_html just like issues/pages.

     def validate(self, data):
-        try:
-            if data.get("comment_html", None) is not None:
-                parsed = html.fromstring(data["comment_html"])
-                parsed_str = html.tostring(parsed, encoding="unicode")
-                data["comment_html"] = parsed_str
-
-        except Exception:
-            raise serializers.ValidationError("Invalid HTML passed")
-        return data
+        # Sanitize first, then optionally normalize
+        try:
+            if data.get("comment_html") is not None:
+                is_valid, err, sanitized = validate_html_content(data["comment_html"])
+                if not is_valid:
+                    raise serializers.ValidationError({"comment_html": "Invalid or unsafe HTML content"})
+                if sanitized is not None:
+                    data["comment_html"] = sanitized
+                parsed = html.fromstring(data["comment_html"])
+                data["comment_html"] = html.tostring(parsed, encoding="unicode")
+        except Exception:
+            raise serializers.ValidationError({"comment_html": "Invalid HTML passed"})
+        return data
♻️ Duplicate comments (4)
apps/api/plane/space/serializer/issue.py (1)

302-303: Mask sanitizer internals in API errors

Same concern as other serializers: avoid surfacing raw sanitizer exception strings in API responses. Centralize via content_validator change (generic client message + server logging).

apps/api/plane/api/serializers/project.py (1)

225-226: Do not leak sanitizer exception details

Ensure clients see a generic error (size violations can remain specific), with details logged server-side via content_validator as suggested elsewhere.

To catch any other occurrences where assignment precedes validation, you can run:

#!/bin/bash
# Find all validate_html_content call sites with nearby writes and raises
rg -nC3 --type=py 'validate_html_content\s*\(' apps/api | sed -n '1,200p'
apps/api/plane/app/serializers/draft.py (1)

88-89: Harden error message exposure

As with other serializers, route detailed exceptions to logs via content_validator and keep client-facing messages generic.

apps/api/plane/app/serializers/workspace.py (1)

331-332: Prevent information disclosure in validation errors

Ensure content_validator returns a generic message on sanitizer failures; raise that here to avoid leaking internal exception text.

🧹 Nitpick comments (8)
apps/api/plane/utils/content_validator.py (1)

90-97: Make the signature accept Optional[str] (test passes None).

validate_html_content is annotated to accept str, but your new test feeds None and the function handles falsy inputs. Tighten the contract to Optional[str] and align the docstring for clarity and type-checkers.

-def validate_html_content(html_content: str):
+from typing import Optional, Tuple
+
+def validate_html_content(html_content: Optional[str]) -> Tuple[bool, Optional[str], Optional[str]]:
     """
-    Sanitize HTML content using nh3.
-    Returns a tuple: (is_valid, error_message, clean_html)
+    Sanitize HTML content using nh3.
+    Returns a tuple: (is_valid, error_message, sanitized_html)
     """
     if not html_content:
         return True, None, None
apps/api/plane/utils/test_nh3.py (2)

10-19: Import from the package, drop sys.path hacks.

The sys.path manipulation makes this a script rather than a deterministic test and can mask packaging issues. Import directly from the package namespace.

-# Add the parent directory to the path so we can import the module
-sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
-try:
-    from content_validator import validate_html_content
+try:
+    from plane.utils.content_validator import validate_html_content

22-89: Convert this script to real unit tests with assertions.

This file currently prints results; CI won’t catch regressions. Replace the ad-hoc runner with pytest tests (or Django TestCase) and assert both validity and the expected sanitized output for representative cases.

Example (pytest-style) replacement:

# supporting snippet (new test body)
import pytest
from plane.utils.content_validator import validate_html_content

@pytest.mark.parametrize(
    "html, expect_valid, expect_diff",
    [
        ("<p>Hello</p>", True, False),
        ("<script>alert(1)</script><p>Hi</p>", True, True),
        ("<p onclick='alert(1)'>X</p>", True, True),
        ("", True, False),
        (None, True, False),
    ],
)
def test_validate_html_content(html, expect_valid, expect_diff):
    is_valid, err, sanitized = validate_html_content(html)
    assert is_valid is expect_valid
    assert err is None
    if html:
        assert (sanitized != html) is expect_diff
    else:
        assert sanitized in (None, "")

If you’d like, I can translate this to your preferred test runner and file layout.

apps/api/plane/app/serializers/project.py (1)

84-90: LGTM; sanitized_html is persisted when available.

The triple-unpack and in-place data update are correct. Optionally, standardize the error surface by mapping sanitizer failures to a generic message (consistent with Page/Issue serializers) and logging details internally.

apps/api/plane/api/serializers/issue.py (1)

82-90: Sanitize before parsing/round-tripping via lxml.

Parsing first is not a sanitizer and can preserve unsafe nodes/attrs. Prefer nh3 sanitization first, then lxml normalization if you still want canonical HTML.

-        try:
-            if data.get("description_html", None) is not None:
-                parsed = html.fromstring(data["description_html"])
-                parsed_str = html.tostring(parsed, encoding="unicode")
-                data["description_html"] = parsed_str
-
-        except Exception:
-            raise serializers.ValidationError("Invalid HTML passed")
+        # Sanitize first to reduce attack surface for the parser
+        try:
+            if data.get("description_html") is not None:
+                is_valid, err, sanitized = validate_html_content(data["description_html"])
+                if not is_valid:
+                    raise serializers.ValidationError({"description_html": "Invalid or unsafe HTML content"})
+                if sanitized is not None:
+                    data["description_html"] = sanitized
+                # Optional: normalize with lxml after sanitization
+                parsed = html.fromstring(data["description_html"])
+                data["description_html"] = html.tostring(parsed, encoding="unicode")
+        except Exception:
+            raise serializers.ValidationError({"description_html": "Invalid HTML passed"})
apps/api/plane/space/serializer/issue.py (1)

298-307: Consider consolidating repeated sanitization logic

The validate block repeats across multiple serializers. A shared helper/mixin would reduce drift and simplify future changes.

Example mixin you can adopt across serializers:

class HtmlSanitizeMixin:
    GENERIC_HTML_ERROR = "Invalid HTML content."
    def sanitize_html_field(self, data: dict, field: str):
        if field in data and data[field]:
            is_valid, error_msg, clean = validate_html_content(str(data[field]))
            if not is_valid:
                # rely on content_validator to return generic messages
                raise serializers.ValidationError({field: error_msg or self.GENERIC_HTML_ERROR})
            if clean is not None:
                data[field] = clean

Then call: self.sanitize_html_field(data, "description_html").

apps/api/plane/api/serializers/project.py (2)

219-225: Minor ordering nit: assign sanitized HTML only after passing validation

Assigning sanitized_html before checking is_valid could mutate data in a future partial-failure case. Swap the order for clarity and safety.

Apply:

-                is_valid, error_msg, sanitized_html = validate_html_content(
-                    str(data["description_html"])
-                )
-                # Update the data with sanitized HTML if available
-                if sanitized_html is not None:
-                    data["description_html"] = sanitized_html
-            if not is_valid:
-                raise serializers.ValidationError({"description_html": error_msg})
+                is_valid, error_msg, sanitized_html = validate_html_content(
+                    str(data["description_html"])
+                )
+                if not is_valid:
+                    raise serializers.ValidationError({"description_html": error_msg})
+                if sanitized_html is not None:
+                    data["description_html"] = sanitized_html

215-227: Optional: centralize sanitization across serializers

Multiple serializers replicate the same pattern. Consider introducing a shared mixin/helper (see prior suggestion) to avoid drift and ease future changes to the sanitizer or messages.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0fe7da6 and d9c9e1b.

📒 Files selected for processing (10)
  • apps/api/plane/api/serializers/issue.py (1 hunks)
  • apps/api/plane/api/serializers/project.py (1 hunks)
  • apps/api/plane/app/serializers/draft.py (1 hunks)
  • apps/api/plane/app/serializers/issue.py (1 hunks)
  • apps/api/plane/app/serializers/page.py (1 hunks)
  • apps/api/plane/app/serializers/project.py (1 hunks)
  • apps/api/plane/app/serializers/workspace.py (1 hunks)
  • apps/api/plane/space/serializer/issue.py (1 hunks)
  • apps/api/plane/utils/content_validator.py (2 hunks)
  • apps/api/plane/utils/test_nh3.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-23T18:18:06.875Z
Learnt from: NarayanBavisetti
PR: makeplane/plane#7460
File: apps/api/plane/app/serializers/draft.py:112-122
Timestamp: 2025-07-23T18:18:06.875Z
Learning: In the Plane codebase serializers, workspace_id is not consistently passed in serializer context, so parent issue validation in DraftIssueCreateSerializer only checks project_id rather than both workspace_id and project_id. The existing project member authentication system already validates that users can only access projects they belong to, providing sufficient security without risking breaking functionality by adding workspace_id validation where the context might not be available.

Applied to files:

  • apps/api/plane/space/serializer/issue.py
  • apps/api/plane/app/serializers/draft.py
  • apps/api/plane/app/serializers/workspace.py
🧬 Code graph analysis (9)
apps/api/plane/app/serializers/page.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/utils/test_nh3.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/space/serializer/issue.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/app/serializers/issue.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/app/serializers/draft.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/api/serializers/project.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/app/serializers/workspace.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/app/serializers/project.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
apps/api/plane/api/serializers/issue.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (90-106)
🪛 GitHub Check: CodeQL
apps/api/plane/app/serializers/page.py

[warning] 234-234: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.

apps/api/plane/space/serializer/issue.py

[warning] 303-303: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
Stack trace information flows to this location and may be exposed to an external user.

apps/api/plane/app/serializers/issue.py

[warning] 141-141: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
Stack trace information flows to this location and may be exposed to an external user.

apps/api/plane/app/serializers/draft.py

[warning] 89-89: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
Stack trace information flows to this location and may be exposed to an external user.

apps/api/plane/app/serializers/workspace.py

[warning] 332-332: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
Stack trace information flows to this location and may be exposed to an external user.

apps/api/plane/api/serializers/issue.py

[warning] 102-102: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (5)
apps/api/plane/app/serializers/page.py (1)

204-225: Model field type validated for description_binary

I’ve confirmed that in apps/api/plane/db/models/page.py (line 30), description_binary is defined as a models.BinaryField(null=True). Since validate_description_binary returns raw bytes, assigning it directly to that field is correct—no charset or mojibake issues will occur. No further changes are needed here.

apps/api/plane/app/serializers/issue.py (1)

137-139: NH3 sanitization adoption looks good

Correctly unpacks (is_valid, error_msg, sanitized_html) and persists sanitized_html back to attrs only when provided. Ordering (validate → raise on invalid → assign sanitized) is safe and consistent.

Also applies to: 143-145

apps/api/plane/space/serializer/issue.py (1)

299-301: NH3 sanitization flow is correct

Triple-unpack and conditional persistence of sanitized_html are implemented correctly and align with other serializers.

Also applies to: 305-306

apps/api/plane/app/serializers/draft.py (1)

85-87: Looks good: safe sanitize-then-assign pattern

The triple-unpack, invalid check, and conditional assignment mirror the new contract correctly.

Also applies to: 91-92

apps/api/plane/app/serializers/workspace.py (1)

328-330: Sanitization handling is correct and consistent

Triple return handled properly; sanitized_html is persisted only when present.

Also applies to: 334-335

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
apps/api/plane/utils/content_validator.py (1)

89-96: Allow Optional[str] input and clarify empty-vs-None semantics.

Today the guard treats any falsy value (including None) as valid and returns (True, None, None), while the type hint says str. This mismatch can confuse static analysis and callers. Prefer Optional[str] and make the behavior explicit for None vs empty string.

Apply:

-def validate_html_content(html_content: str):
+def validate_html_content(html_content: "Optional[str]") -> "Tuple[bool, Optional[str], Optional[str]]":
@@
-    if not html_content:
-        return True, None, None
+    # None: treat as "not provided" → no sanitized_html
+    if html_content is None:
+        return True, None, None
+    # Empty string: explicitly return empty sanitized_html
+    if html_content == "":
+        return True, None, ""

Add these imports at the top of the file:

from typing import Optional, Tuple

Optionally, update the docstring to document the above semantics.

🧹 Nitpick comments (2)
apps/api/plane/utils/content_validator.py (2)

98-99: Size check is correct; avoid double work if upstream already bounds payloads.

Encoding to UTF-8 to measure bytes is fine. If upstream request parsing enforces a lower limit, consider reusing that constant to avoid drift. No action needed if this is your SSOT.


101-106: Add targeted tests for sanitizer coverage (XSS, attributes, link rel, size limit).

To prevent regressions, add unit tests asserting:

  • script/style/svg/math elements are removed (or escaped, depending on policy).
  • javascript: and data:text/html URLs are dropped or neutralized.
  • target="_blank" gets safe rel attributes.
  • Large (>10MB) input returns the size error.
  • Empty string and None behavior matches the documented contract.

I can scaffold pytest cases aligned with your chosen Cleaner policy.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d9c9e1b and 41e35c7.

📒 Files selected for processing (2)
  • apps/api/plane/utils/content_validator.py (2 hunks)
  • apps/api/requirements/base.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/utils/content_validator.py (1)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
🔇 Additional comments (1)
apps/api/plane/utils/content_validator.py (1)

91-93: All validate_html_content call sites now handle the 3-tuple return; no further action needed.

Verified via a repository-wide search that:

  • No remaining two-variable unpacking of validate_html_content exists.
  • All call sites unpack three values (is_valid, error_message, sanitized_html) appropriately.
  • There are no direct calls expecting only a boolean+message (e.g., is_valid, error_msg = …).

No edits required.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces HTML sanitization using the nh3 package to replace the previous pattern-matching validation approach for HTML content validation across the application.

Key changes:

  • Replaced regex-based HTML validation with nh3 library sanitization
  • Updated all serializers to handle sanitized HTML content and persist the cleaned version
  • Removed extensive malicious pattern detection code in favor of a trusted sanitization library

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
apps/api/requirements/base.txt Added nh3 HTML sanitization library dependency
apps/api/plane/utils/content_validator.py Replaced pattern-based validation with nh3 sanitization, updated function signature
apps/api/plane/space/serializer/issue.py Updated to handle sanitized HTML and persist cleaned content
apps/api/plane/app/serializers/workspace.py Updated to handle sanitized HTML and persist cleaned content
apps/api/plane/app/serializers/project.py Updated to handle sanitized HTML and persist cleaned content
apps/api/plane/app/serializers/page.py Updated to handle sanitized HTML and persist cleaned content
apps/api/plane/app/serializers/issue.py Updated to handle sanitized HTML and persist cleaned content
apps/api/plane/app/serializers/draft.py Updated to handle sanitized HTML and persist cleaned content
apps/api/plane/api/serializers/project.py Updated to handle sanitized HTML and persist cleaned content
apps/api/plane/api/serializers/issue.py Updated to handle sanitized HTML and persist cleaned content

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/api/plane/api/serializers/issue.py (1)

595-603: Ensure HTML is sanitized before normalization in IssueCommentSerializer

The current implementation in apps/api/plane/api/serializers/issue.py (lines 595–603) only parses and re-serializes incoming HTML but does not sanitize it, leaving an XSS risk if untrusted markup reaches the client. Before normalizing with lxml, you should run NH3 sanitization to strip or reject unsafe content.

Locations to update:

  • apps/api/plane/api/serializers/issue.py, within def validate(self, data): just before the existing try: block.

Proposed diff (insert before the try:):

     def validate(self, data):
-        try:
+        # Sanitize incoming HTML with NH3 first
+        if data.get("comment_html"):
+            is_valid, error_msg, sanitized_html = validate_html_content(data["comment_html"])
+            if not is_valid:
+                raise serializers.ValidationError(
+                    {"comment_html": "Invalid or unsafe HTML content"}
+                )
+            if sanitized_html is not None:
+                data["comment_html"] = sanitized_html
+
+        try:
             if data.get("comment_html", None) is not None:
                 parsed = html.fromstring(data["comment_html"])
                 parsed_str = html.tostring(parsed, encoding="unicode")
                 data["comment_html"] = parsed_str

Additional suggestions:

  • Consolidate error messaging so that all HTML validation errors use the same field-scoped format.
  • Scan other serializers for *_html fields and apply the same NH3 sanitization pattern to prevent similar gaps.
♻️ Duplicate comments (1)
apps/api/plane/utils/content_validator.py (1)

63-81: Good baseline; consider a reusable nh3.Cleaner with an explicit allowlist

Current nh3.clean defaults work, but an explicit Cleaner improves XSS guarantees, avoids policy drift across versions, and is marginally faster by reusing the sanitizer instance.

Apply within this function:

-        clean_html = nh3.clean(html_content)
+        clean_html = _HTML_CLEANER.clean(html_content)
         return True, None, clean_html

Add once at module scope:

# Reusable sanitizer with explicit policy
_HTML_CLEANER = nh3.Cleaner(
    tags={
        "a","abbr","b","blockquote","br","code","div","em","i","img","li","ol",
        "p","pre","span","strong","sub","sup","u","ul","h1","h2","h3","h4","h5","h6",
        "table","thead","tbody","tr","th","td","hr"
    },
    attributes={
        "*": {"title", "class"},
        "a": {"href", "title", "target", "rel"},
        "img": {"src", "alt", "title", "width", "height", "loading"},
        "div": {"data-*"},
        "span": {"data-*"},
        "table": {"border", "cellpadding", "cellspacing"},
    },
    url_schemes={"http", "https", "mailto"},
    strip=True,
    link_rel=True,
    allow_comments=False,
    # If inline styles are needed, explicitly constrain:
    # styles=nh3.CSSSanitizer(allowed_css_properties={"color","background-color","text-align"})
)
🧹 Nitpick comments (5)
apps/api/plane/space/serializer/issue.py (1)

293-303: NH3 sanitization wiring looks correct; consider field-scoped error for better UX

Triple-unpack and sanitization write-back are good. To surface the error on the specific field (useful for form validation and API clients), prefer a field-scoped error key.

-            if not is_valid:
-                raise serializers.ValidationError(
-                    {"error": "html content is not valid"}
-                )
+            if not is_valid:
+                raise serializers.ValidationError(
+                    {"description_html": "Invalid or unsafe HTML content"}
+                )
apps/api/plane/app/serializers/draft.py (1)

79-89: Good adoption of NH3; align error response to field for consistency

Sanitization flow is correct. Recommend switching to a field-scoped error to match binary validation and improve client-side error handling.

-            if not is_valid:
-                raise serializers.ValidationError(
-                    {"error": "html content is not valid"}
-                )
+            if not is_valid:
+                raise serializers.ValidationError(
+                    {"description_html": "Invalid or unsafe HTML content"}
+                )
apps/api/plane/api/serializers/issue.py (1)

92-101: Sanitization flow is correct; prefer field-scoped error for consistency

Logic and write-back look good. Align the error payload to the field for consistency with other validators.

-            if not is_valid:
-                raise serializers.ValidationError(
-                    {"error": "html content is not valid"}
-                )
+            if not is_valid:
+                raise serializers.ValidationError(
+                    {"description_html": "Invalid or unsafe HTML content"}
+                )
apps/api/plane/app/serializers/workspace.py (1)

322-331: NH3 integration is correct; switch to field-scoped error for parity

Implementation matches the new validator contract. Recommend field-level error mapping.

-            if not is_valid:
-                raise serializers.ValidationError(
-                    {"error": "html content is not valid"}
-                )
+            if not is_valid:
+                raise serializers.ValidationError(
+                    {"description_html": "Invalid or unsafe HTML content"}
+                )
apps/api/plane/app/serializers/issue.py (1)

131-141: Sanitization and persistence look good; drop the unused error_msg binding

You’re correctly sanitizing and persisting sanitized HTML. Minor nit: bind the unused error message to _ to signal intent.

Apply:

-            is_valid, error_msg, sanitized_html = validate_html_content(
+            is_valid, _, sanitized_html = validate_html_content(
                 attrs["description_html"]
             )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 342886f and 5a7225d.

📒 Files selected for processing (9)
  • apps/api/plane/api/serializers/issue.py (1 hunks)
  • apps/api/plane/api/serializers/project.py (1 hunks)
  • apps/api/plane/app/serializers/draft.py (1 hunks)
  • apps/api/plane/app/serializers/issue.py (1 hunks)
  • apps/api/plane/app/serializers/page.py (1 hunks)
  • apps/api/plane/app/serializers/project.py (1 hunks)
  • apps/api/plane/app/serializers/workspace.py (1 hunks)
  • apps/api/plane/space/serializer/issue.py (1 hunks)
  • apps/api/plane/utils/content_validator.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/api/plane/app/serializers/page.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-23T18:18:06.875Z
Learnt from: NarayanBavisetti
PR: makeplane/plane#7460
File: apps/api/plane/app/serializers/draft.py:112-122
Timestamp: 2025-07-23T18:18:06.875Z
Learning: In the Plane codebase serializers, workspace_id is not consistently passed in serializer context, so parent issue validation in DraftIssueCreateSerializer only checks project_id rather than both workspace_id and project_id. The existing project member authentication system already validates that users can only access projects they belong to, providing sufficient security without risking breaking functionality by adding workspace_id validation where the context might not be available.

Applied to files:

  • apps/api/plane/space/serializer/issue.py
  • apps/api/plane/app/serializers/workspace.py
  • apps/api/plane/app/serializers/project.py
🧬 Code graph analysis (8)
apps/api/plane/app/serializers/issue.py (1)
apps/api/plane/utils/content_validator.py (2)
  • validate_html_content (63-80)
  • validate_binary_data (20-60)
apps/api/plane/api/serializers/issue.py (1)
apps/api/plane/utils/content_validator.py (2)
  • validate_html_content (63-80)
  • validate_binary_data (20-60)
apps/api/plane/space/serializer/issue.py (1)
apps/api/plane/utils/content_validator.py (2)
  • validate_html_content (63-80)
  • validate_binary_data (20-60)
apps/api/plane/app/serializers/workspace.py (1)
apps/api/plane/utils/content_validator.py (2)
  • validate_html_content (63-80)
  • validate_binary_data (20-60)
apps/api/plane/app/serializers/draft.py (1)
apps/api/plane/utils/content_validator.py (2)
  • validate_html_content (63-80)
  • validate_binary_data (20-60)
apps/api/plane/api/serializers/project.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (63-80)
apps/api/plane/utils/content_validator.py (1)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
apps/api/plane/app/serializers/project.py (1)
apps/api/plane/utils/content_validator.py (1)
  • validate_html_content (63-80)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (8)
apps/api/plane/space/serializer/issue.py (1)

305-308: Binary validation path is consistent and safe

Fixed, field-scoped error message is appropriate and aligned with the rest of the API. No changes needed.

apps/api/plane/app/serializers/draft.py (1)

93-95: Binary validation path LGTM

Consistent with shared validator behavior; returns a clear, field-scoped error. No changes required.

apps/api/plane/api/serializers/issue.py (1)

106-108: Binary validation path LGTM

Field-scoped error and clear message. No changes needed.

apps/api/plane/app/serializers/workspace.py (1)

336-338: Binary validation is consistent and clear

No issues found.

apps/api/plane/app/serializers/issue.py (2)

135-137: Generic error surface is appropriate here

Returning a generic message avoids leaking sanitizer details to clients and matches the project-wide pattern. LGTM.


142-147: Binary validation flow LGTM

Generic message on failure and no detail leakage. Consistent with the HTML path.

apps/api/plane/api/serializers/project.py (1)

201-214: No JSON/dict-style assignments to description_html found

I ran a repo-wide search for any dict or array literal being assigned to description_html and found zero matches. All frontend payloads refer to description_html as a string variable (never as an object or array literal). You can safely proceed—there are no remaining callers sending JSON/dict for description_html.

apps/api/plane/utils/content_validator.py (1)

68-70: Empty input handling is consistent with serializers

Returning (True, None, None) lets callers skip overwriting fields when input is falsy. This matches current serializer checks. LGTM.

@sriramveeraghanta sriramveeraghanta merged commit 0af7589 into preview Aug 26, 2025
5 of 7 checks passed
@sriramveeraghanta sriramveeraghanta deleted the chore-content-validation branch August 26, 2025 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants