Replies: 17 comments 6 replies
-
Deno has https://deno.land/manual@v1.36.4/advanced/jsx_dom/deno_dom
-
I was using HTMLRewriter (lol-html) to work with an XML document and ran into problems when the document contained <style> tags: the children of <style> were processed as raw text instead of as elements, because HTML parsing rules treat style as a raw-text element. I would like to work with lol-html to add support for XML documents.
-
@vjpr did you find a solution for this? I need to use parseFromString from DOMParser, but I can't because it doesn't exist in Bun.
-
How would I use JSDOM to parse an HTML string into HTML elements?
On Sat, 9 Dec 2023, 18:15, guest271314 wrote:
There is JSDOM: https://github.com/guest271314/jsdom-extension.
-
In the browser, I have a rich text editor that requires HTML code but stores the data as a string of that code, so I need to parse e.g. "<p>some text</p>" into an actual HTML element. That is what DOMParser.parseFromString does, but DOMParser isn't part of Bun yet, so I need some way to do this.
On Sat, 9 Dec 2023, 23:02, guest271314 wrote:
HTML elements where, in what environment?
-
I need a browser-based solution, not a Node one.
On Sun, 10 Dec 2023, 12:39, guest271314 wrote:
@The-Code-Monkey Have you tried using the JSDOM code that I compiled for use in a browser extension ServiceWorker, linked above? Should work.
-
Upgrading Lego to V2, we are trying to minimize external dependencies.
-
+1 would love this
-
Since the CSS feature was implemented in Bun 1.2, I was wondering whether we could now consider a DOM parser.
-
+1! Would be nice!
-
+1
-
For a modern native DOM/XPath/XSLT implementation, instead of libxml2 (which supports only XPath 1.0), see … For an HTML5 parser, see what the latest versions of PHP are using … Adding a built-in JSON query language would also be really cool; the most powerful one (also used in some places in the AWS stack) is: … These additions would be really cool IMHO, not only for web scraping: they are a super powerful toolbox for a wide range of common requirements nowadays, in one way or another.
-
If Bun used the DOM and HTML parser from WebKit, I think it would inherit the DOM event system, MutationObserver, etc. in a more native way than the one browsers expose to JS. This could open the door to implementing some interesting GUIs in Bun (OS-native or not) that use the DOM as the underlying markup and source of truth for state, GUI, or game scene graph, something like XAML or QML (Qt).
-
I wonder if we could offer built-in support for an HTML parser and the DOM.
Use cases
Why should Bun do this?
Bun already has WebKit in its source tree. We could just re-use the existing implementations there. E.g. https://github.com/WebKit/webkit/blob/main/Source/WebCore/dom/DOMImplementation.cpp / https://github.com/WebKit/webkit/blob/main/Source/WebCore/html/parser/HTMLDocumentParser.cpp
Demonstrating perf benefits for web scraping and DOM testing could push adoption.
Seamless access to the web platform from the server. Currently you have to think "am I in a browser env or a Node env?" when doing web stuff.
Existing HTML/DOM Libraries
A lot is built around htmlparser2: https://github.com/fb55/htmlparser2/#ecosystem
https://cheerio.js.org/ - parse html and run jquery-like selectors
https://github.com/jsdom/jsdom - a subset of a web browser for testing and scraping; can also execute JS
https://github.com/fb55/htmlparser2
See: https://github.com/fb55/htmlparser2/#performance
https://github.com/inikulin/parse5
Benchmarks
Here are the benchmarks to beat:
Existing Testing Libraries
Jest
Custom environments
Jest allows you to choose an env (node vs jsdom) or use a custom one
https://jestjs.io/docs/configuration#testenvironment-string
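For reference, selecting the jsdom environment in Jest looks roughly like the following jest.config.ts. This is a minimal sketch that assumes Jest 28 or later, where the jsdom environment ships separately as the jest-environment-jsdom package and must be installed as a dev dependency; it illustrates the Jest option linked above and is not a Bun feature.

```typescript
// jest.config.ts: minimal sketch, assuming Jest >= 28 and that
// `jest-environment-jsdom` is installed as a dev dependency.
import type { Config } from "jest";

const config: Config = {
  // Run every test file against a jsdom-provided window/document
  // instead of the default "node" environment.
  testEnvironment: "jsdom",
};

export default config;
```

A single test file can instead opt in with a /** @jest-environment jsdom */ docblock at the top of the file, leaving the rest of the suite on the node environment.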
Sandboxing
Many testing libraries do sandboxing, which can be quite processor-intensive, hence more opportunities for perf wins.
Renderer
If we bring in the parser and DOM now, in addition to the JS engine, the only thing left is the renderer.
After bringing in parsing/dom, the two parts left would be:
I guess the advantage of building a browser into Bun would be to offer seamless interaction with the entire web platform. At present it's quite patchwork: a lot of existing native WebKit code has been re-implemented (slower) in userland JS, and accessing browser environments requires a lot of additional libraries.