RFC: Switch to disable String-to-HTML-parsing #268

@SmithChart

Cheers,
the real world hit my Lona project again. Before starting with a PR I would like to hear your opinion on how to go forward here.

tl;dr: I would like to add a settings switch that disables the automatic HTML parsing in HTML() objects.
IMHO auto-parsing is a footgun. With it enabled, the developer has to escape each and every user input that could be passed (directly) into an HTML object at some point. If the developer fails to do so, a user can either break the application or inject custom HTML into the frontend.
And: with this feature enabled, HTML objects behave differently from other nodes (where there is no auto-parsing).
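
To make the escaping burden concrete, here is a minimal sketch using only Python's standard library. It shows what a developer would have to do to every user-supplied string before it may reach an HTML object. Whether Lona then displays the escaped string as intended is not verified here; the point is only that the escaping step is easy to forget.

```python
from html import escape

# The same user input as in the example views
user_input = 'Somebody <somebody@example.com>'

# Replace the characters that would otherwise be read as markup
escaped = escape(user_input)

print(escaped)
```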

Consider the following example:

from lona.html import HTML, H1, Div
from lona import LonaApp, LonaView


app = LonaApp(__file__)


@app.route('/1')
class MyView(LonaView):
    def handle_request(self, request):
        html = HTML(
            H1('Hello World'),
            '<p>We can create HTML by simply writing some string here...</p>',
        )
        print(html.nodes)
        return html


@app.route('/2')
class MyView(LonaView):
    def handle_request(self, request):
        user_input = 'Somebody <somebody@example.com>'
        html = HTML(
            H1('Hello World'),
            user_input,
        )
        print(html)
        return html


@app.route('/3')
class MyView(LonaView):
    def handle_request(self, request):
        user_input = 'Somebody <somebody@example.com>'
        html = HTML(
            H1('Hello World'),
            Div(user_input),
        )
        print(html)
        return html


app.run(port=8080)

In /1 Lona will take the string containing the <p> and construct an actual node from it. The output will be:

<h1 data-lona-node-id="1">
  Hello World
</h1>
<!--lona-widget:7-->
<p data-lona-node-id="5">
  We can create HTML by simply writing some string here...
</p>
<!--end-lona-widget:7-->

Now in /2 we don't have a nicely formatted string written by a developer but some user input. (In our case this is most often an email Message-ID that we paste into a comment field for future reference.)
Lona will assume that anything containing < or > must be HTML and will try to parse it. This fails:

LonaRuntimeWorker_0            ERROR    15:26:16.171305 lona.view_runtime Exception raised while running <__main__.MyView object at 0x7f9c782153d0>
  Traceback (most recent call last):
    File "/home/chris/work/Projects/intern/lag-intranet/env/lib/python3.9/site-packages/lona/view_runtime.py", line 318, in start
      raw_response_dict = self.view.handle_request(self.request) or ''
    File "/home/chris/work/Projects/intern/lag-intranet/demo.py", line 23, in handle_request
      html = HTML(
    File "/home/chris/work/Projects/intern/lag-intranet/env/lib/python3.9/site-packages/lona/html/widgets.py", line 22, in __init__
      self.nodes.append(HTML(node))
    File "/home/chris/work/Projects/intern/lag-intranet/env/lib/python3.9/site-packages/lona/html/widgets.py", line 25, in __init__
      self.nodes = html_string_to_node_list(
    File "/home/chris/work/Projects/intern/lag-intranet/env/lib/python3.9/site-packages/lona/html/parsing.py", line 205, in html_string_to_node_list
      raise ValueError(
  ValueError: Invalid html: missing end tag </somebody@somedomain.com>
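
The failure mode can be reproduced in isolation with Python's standard html.parser module. This is a stand-in sketch and not Lona's own parsing code: the TagCollector class is hypothetical and only records what a generic HTML parser sees in the user input.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and text seen by the parser."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


collector = TagCollector()
collector.feed('Somebody <somebody@example.com>')
collector.close()

# The email address in angle brackets is treated as an opening tag,
# and no matching end tag ever follows.
print(collector.start_tags)
print(collector.text)
```

A parser that insists on balanced tags, as the traceback above suggests Lona's does, has no choice but to reject this input.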

In /3 the user input is passed to the frontend without being parsed as HTML by either Lona or the browser.

I am not sure whether anybody uses Lona because it can build HTML trees from strings. At least I / we are not using it that way.
I would suggest adding a switch to the settings that disables the isinstance(node, str) check here:

if isinstance(node, str):

This way any string that is passed into an HTML object would be passed on to the client directly, and user input could not break the application in this way.
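
A rough sketch of how such a switch could behave, written against toy classes rather than Lona itself. ToySettings, ToyHTML and the setting name HTML_AUTO_PARSING are all hypothetical and chosen only for illustration; the real implementation and setting name would be decided in the PR.

```python
from dataclasses import dataclass


@dataclass
class ToySettings:
    # Hypothetical setting name; True mirrors the current default behaviour
    HTML_AUTO_PARSING: bool = True


class ToyHTML:
    """Toy stand-in for an HTML container; not Lona's implementation."""

    def __init__(self, *nodes, settings=None):
        if settings is None:
            settings = ToySettings()

        self.nodes = []

        for node in nodes:
            # Strings are only handed to the parser if the switch is on
            if settings.HTML_AUTO_PARSING and isinstance(node, str):
                self.nodes.append(('parse', node))
            else:
                self.nodes.append(('text', node))


user_input = 'Somebody <somebody@example.com>'

default_html = ToyHTML(user_input)
safe_html = ToyHTML(user_input, settings=ToySettings(HTML_AUTO_PARSING=False))

print(default_html.nodes)
print(safe_html.nodes)
```

With the switch off, the string never reaches the parser, so unbalanced angle brackets in user input can neither raise an error nor turn into markup.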

Leaving parsing enabled by default would not change the behaviour of existing applications. Disabling it would allow a developer to pass user input to the frontend inside an HTML object and not only inside a Node.

On the other hand, one could also argue that having different parsing behaviour between HTML and Node is not desirable. But I have not yet decided whether I would like to go that far.

So: What's your opinion: Add that switch? Should the feature stay enabled by default?
