Skip to content

Updated regular expression to avoid errors when converting to Float()#1378

Closed
WhatWasThatt wants to merge 1 commit intoprawnpdf:masterfrom
WhatWasThatt:patch-1
Closed

Updated regular expression to avoid errors when converting to Float()#1378
WhatWasThatt wants to merge 1 commit intoprawnpdf:masterfrom
WhatWasThatt:patch-1

Conversation

@WhatWasThatt
Copy link
Copy Markdown

No description provided.

matches = /size="([^"]*)"/.match(token) ||
/size='([^']*)'/.match(token)
matches = /size=".*?(\d+(?:\.\d+)?).*?"/.match(token) ||
/size='.*?(\d+(?:\.\d+)?).*?'/.match(token)
Copy link
Copy Markdown
Contributor

@johnnyshields johnnyshields May 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing that broke here was an arbitrary change of .to_f to Float() (the former returns 0 on error; the later raises an error.)

Technically there are other valid cases like Float('32e11') #=> 3200000000000.0. I'm not sure if these are in PDFs.

I think the safest thing to do would be to simply begin / rescue the Float() calls, and suppress any errors.

@pointlessone we are seeing a few random prod errors on this line since upgrading. It was your style upgrade that caused the breakage, granted, there were probably silent failures (returning 0 instead of error) before. How do you recommend to proceed with a fix here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken this is the code for inline text formatting. It's not a part of PDF spec, it's just a convenience provided by Prawn. I have to remind that it is not HTML even if it looks very similar. You're making a mistake if you're feeding some random HTML into Prawn. There never was a goal to make it compatible with HTML.

I think that raising an error is better than silently ignoring provided value. If you're having errors here you should validate the input. I don not recommend exposing this functionality to user input. Treat it as a mini-DSL to use in code. If you need formatting in user-provided content, I suggest you implement it separately.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WhatWasThatt Regarding the change. The match is by design. The goal is to pick markup in as many cases as possible. It should not be ignored if the value is incorrect. Instead it should report the issue.

Consider <font size='really big'>HUGE</font>. With this change the markup wouldn't match and would be silently ignored. I bet that would be frustrating to debug.

I admit that the code is not exactly robust in achieving this goal right now but this change makes it even les so.

Copy link
Copy Markdown
Contributor

@johnnyshields johnnyshields May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening here is that some of our users are doing <font size="4"> where 4 is a non-ascii character. People make this mistake often in Japan 🤦‍♂️

As a compromise, @pointlessone will you accept to do Float(string) rescue nil so that we suppress errors, like the old version did? To us it's far more annoying to have this be a hard failure than it is to simply ignore it. HTML is by nature garbage-in-garbage-out; browsers like Firefox do not crash because you have one attribute value wrong.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

I'm also not sure what that would accomplish for you. It would set font size to 0. How is that useful? Anyway, as I said before, this is not meant to be a user-facing feature. You should validate your input.

Copy link
Copy Markdown
Contributor

@johnnyshields johnnyshields May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would not set font size to zero, it would simply skip setting any font-size at all, which is much closer to how HTML is parsed in the vast majority of applications.

Current:

sizes << Float(matches[1]) unless matches.nil?

Proposed:

size = (Float(matches[1]) rescue nil) unless matches.nil?
sizes << size unless size.nil?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not what's in the PR or in your previous comments. I kninda enjoy this impromptu brainstorming session we have here but I'm a little busy ATM, sorry. How about you address your needs in your fork, figure out all the kinks, run it in production for a bit and then maybe come with a fully formed proposal? To make it easier to integrate please keep in mind that I'd rather keep it more strict with explicit feedback than silently skip markup.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, the new regex looks redundant
@johnnyshields, thanks, updated the solution with your suggestion

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WhatWasThatt I'll be upfront: I'm not merging this.

As a general advice: it'd be cool if you wrote some tests. In this case I suggest you try <font size="huge" character_spacing="4">test</font> or something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants