Skip to content

Conversation

@sbulen
Copy link
Contributor

@sbulen sbulen commented Oct 25, 2025

Fixes #8993

Just placing this out there again for consideration. It's an update of #8991

@Sesquipedalian
Copy link
Member

I want to review this carefully. I will make time to do so in the next few days.

@jdarwood007 jdarwood007 added this to the 2.1.7 milestone Oct 26, 2025
@sbulen
Copy link
Contributor Author

sbulen commented Oct 27, 2025

Another note on these escapes... I initially thought - if php regex allowed it (uh... nope...) - of putting a 2nd lookbehind for mysql, so \\' would be understood as a single quote whereas \' would be understood as an escaped single quote...

The problem is that it can keep going... e.g., this is a single quote that terminates a string:
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'
...but this is an escaped single quote that does not terminate a string:
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'

So it'd need to check for an even vs odd # of chars in a lookbehind in regex... Which is why I ultimately gave up on using a regex for that. A similar problem exists in pg - the number of single quotes in a row determines whether it's escaped or not...

Copy link
Member

@Sesquipedalian Sesquipedalian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed changes for PostgreSQL work well, but the proposed changes for MySQL ran into issues with some complex strings. This is because the SQL standard '' syntax for escaping apostrophe characters is unambiguous and can therefore be handled accurately with a simple str_replace() call, whereas the C-style \' syntax that MySQL supports can create more complexity that requires taking the surrounding context into account when doing this job.

That said, I did find a case while testing for MySQL where the existing regex used in the preg_split() call wasn't quite good enough and needs a tweak.

Thus, I have made code suggestions in the comments for this review that revert the proposed changes made to the MySQL code but also adjust the regex used in the preg_split() call.

Copy link
Member

@Sesquipedalian Sesquipedalian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more important change to make.

@sbulen
Copy link
Contributor Author

sbulen commented Oct 27, 2025

whereas the C-style \' syntax that MySQL supports can create more complexity that requires taking the surrounding context into account when doing this job.

Can you provide an example of this? Where the PR would fail?

@Sesquipedalian
Copy link
Member

Sesquipedalian commented Oct 27, 2025

whereas the C-style \' syntax that MySQL supports can create more complexity that requires taking the surrounding context into account when doing this job.

Can you provide an example of this? Where the PR would fail?

Start with this as the incoming query string passed to the $db_string parameter of smf_db_query():

SELECT 'hel''lo hel\\''lo /*'

... which uses a mix of '' syntax for escaping apostrophes and \\ syntax for escaping backslashes in order to escape the raw string hel'lo hel\'lo /*. Using a mix of syntaxes like that would be a weird thing to do, but it is perfectly legal in MySQL.

When the MySQL code proposed in this PR tries to make a cleaned version of this query string, it will produce this:

select %s %s lo /*'

... which is a problem.

Now, in theory such a situation isn't supposed to arise because people are supposed to be pass the raw string (hel'lo hel\'lo /* in this case) as a value in the $db_values parameter of smf_db_query() rather than writing it directly into $db_string. But someone could do that if they chose to, and our tests would be wrong to reject it since it is in fact a perfectly valid query string.

EDIT: Appended /* to the string literal in the test query in order to make the problem completely obvious.

@sbulen
Copy link
Contributor Author

sbulen commented Oct 27, 2025

The proposed change to the PR fails simple tests - e.g., this PM:
Subject: Hello World I\'m home
Body: I'll

Not unusual; folks share code.

If the only way to get the current PR to fail is by manually hacking the code &, further, not calling the appropriate $smcFunc... I don't think that's a problem. Don't hack the code, use SMF standards, you'll be fine.

@Sesquipedalian
Copy link
Member

Sesquipedalian commented Oct 27, 2025

"Don't do that" isn't an adequate approach here.

I will test your example soon. It is entirely possible that the preg_split() approach remains insufficient. If so, we will need to find something else.

Also, calling $smcFunc['db_query'] vs. smf_db_query makes no difference regarding my comments above. I referred to the latter for clarity, that's all.

@sbulen
Copy link
Contributor Author

sbulen commented Oct 27, 2025

If they're not calling smcFunc, they're not here at all, though... If they are calling smcFunc, it'll get escaped accordingly...

I've tested that, too.

I think we're covered with this PR.

@Sesquipedalian
Copy link
Member

Sesquipedalian commented Oct 27, 2025

$request = $smcFunc['db_query']('', 'SELECT \'hel\'\'lo hel\\\\\'\'lo /*\'', array());

Merely using $smcFunc['db_query']() does not not guarantee that string literals in the query will be escaped consistently. If the string were passed in the $db_values parameter, then it would be escaped consistently, sure. But putting a manually escaped string literal in the $db_string parameter is entirely possible with $smcFunc['db_query']().

@Sesquipedalian
Copy link
Member

Sesquipedalian commented Oct 27, 2025

The proposed change to the PR fails simple tests - e.g., this PM: Subject: Hello World I\'m home Body: I'll

Yup, I was able to reproduce that. Somewhat surprisingly, though, the PM still sent and the error appeared when trying to write the session data to the database after the PM had been sent. At any rate, that error message did allow me to find the issue with the regular expression I was using in the pre_split() call.

Instead of this:

$clean = preg_split('/(?<![\'\\\\])\'(?![\'])|(?<=[\\\\]{2})\'/', $db_string);

... we need this:

$clean = preg_split('/(?:\\\\{2})*\K(?<![\'\\\\])\'(?![\'])/', $db_string);

The (?:\\\\{2})* part matches zero or more escaped backslashes (i.e. zero or more sets of two backslashes). Then the \K drops those escaped backslashes from the match and proceeds to look for an unescaped apostrophe. If it finds an unescaped apostrophe, it splits the string around it.

This approach (using (?:\\\\{2})*\K) is the way to accomplish this:

So it'd need to check for an even vs odd # of chars in a lookbehind in regex...

I have updated my suggestion in my review comments to use this new regex.

If you can find any other cases where my latest suggestion fails, please let me know, @sbulen.

@Sesquipedalian Sesquipedalian merged commit f820d70 into SimpleMachines:release-2.1 Oct 31, 2025
10 checks passed
Sesquipedalian added a commit to Sesquipedalian/SMF that referenced this pull request Oct 31, 2025
@sbulen sbulen deleted the 21_query_check_fix2 branch November 6, 2025 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants