Skip to content

Fix problem with Document class and html() function#19

Merged
retlehs merged 5 commits intoroots:mainfrom
dustingrofthenumber:problem-with-document-class-html-function
Jan 20, 2026
Merged

Fix problem with Document class and html() function#19
retlehs merged 5 commits intoroots:mainfrom
dustingrofthenumber:problem-with-document-class-html-function

Conversation

@dustingrofthenumber
Copy link
Contributor

Resolves #18

@Log1x Log1x self-requested a review December 5, 2025 02:15
Copy link
Member

@Log1x Log1x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the PR! Seeing an initial issue with the Regex for comments. Can we double check these handle every condition?

@davideprevosto
Copy link

I encountered the same problem; the code in the PR seems to solve my problem. Have you got any update on this?
Thank you!

@Log1x Log1x self-requested a review January 4, 2026 03:59
@Log1x Log1x requested review from QWp6t and retlehs January 4, 2026 03:59
@QWp6t
Copy link
Member

QWp6t commented Jan 7, 2026

I believe you should be able to resolve this without regex just by using LIBXML_NOXMLDECL.

You should also be able to suppress errors using a flag as well.

I haven't tested this, but it should theoretically work as a drop-in replacement for this class cc @Log1x @retlehs

class Document
{
    /**
     * The DOMDocument instance.
     */
    protected DOMDocument $document;

    /**
     * Initialize the Document instance.
     */
    public function __construct(string $html)
    {
        $this->document = new DOMDocument(encoding: 'UTF-8');

        $this->document->loadHTML($html,
            LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_NOXMLDECL | LIBXML_NOWARNING | LIBXML_NOERROR
        );
    }

    /**
     * Make a new instance of the Document.
     */
    public static function make(string $html): self
    {
        return new static($html);
    }

    /**
     * Loop through each node in the document and execute the provided callback.
     */
    public function each(callable $callback): self
    {
        foreach ($this->xpath('//*') as $node) {
            $callback($node);
        }

        return $this;
    }

    /**
     * Evaluate the given XPath expression.
     */
    public function xpath(string $expression): DOMNodeList
    {
        return (new DOMXPath($this->document))->query($expression);
    }

    /**
     * Get the saved document HTML.
     */
    public function html(): string
    {
        return trim($this->document->saveHTML());
    }

    /**
     * Call the given method on the root document.
     */
    public function __call(string $name, array $arguments): mixed
    {
        return $this->document->{$name}(...$arguments);
    }

    /**
     * Get the given property from the root document.
     */
    public function __get(string $name): mixed
    {
        return $this->document->{$name};
    }
}

@davideprevosto
Copy link

I believe you should be able to resolve this without regex just by using LIBXML_NOXMLDECL.

@QWp6t I've tried your code and it's fixing the issue for me. I've also removed suppress() and clear() using the suggested flags.

@KristoferN
Copy link

KristoferN commented Jan 9, 2026

Same problem on macOS with Herd PHP 8.3. libxml2 Version => 2.15.1

@dustingrofthenumber
Copy link
Contributor Author

The method @QWp6t mentioned does work for me also. @Log1x should I switch approaches and commit it instead?

@Log1x
Copy link
Member

Log1x commented Jan 12, 2026

@Log1x should I switch approaches and commit it instead?

Sounds good to me! Appreciate it.

@dustingrofthenumber
Copy link
Contributor Author

Done! @Log1x I hope this does it :)

Copy link

@davideprevosto davideprevosto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure @dustingrofthenumber, but I think you should remove:

use Illuminate\Support\Str;

@dustingrofthenumber
Copy link
Contributor Author

dustingrofthenumber commented Jan 13, 2026

Yep, you are right, it is unused, removed!

Copy link
Member

@QWp6t QWp6t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@retlehs retlehs merged commit a6db950 into roots:main Jan 20, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem with Document class and html() function

6 participants