nigels-com
Contributor

As a follow-up for PR #1703 and PR #1757 and as an alternative to PR #2287.

Extend the "Tiny UTF-8 Encoder" library API to optionally handle invalid input characters.
A UTF-8 encoded string can be specified for substituting invalid input characters, rather than erroring.
If the `invalid` parameter is NULL, invalid input characters are not tolerated and encoding fails with an error, as before.
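
To illustrate the intended behaviour, here is a minimal self-contained sketch. It is not the tutf8e API: every `sketch_` name, the result codes, and the lookup table are hypothetical and exist only for this example. It assumes an 8-bit source charset described by a 256-entry code point table, with one sentinel value marking bytes that have no mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not the library's). */
#define SKETCH_OK        0
#define SKETCH_INVALID  (-1)   /* unmapped input byte and no substitute given */
#define SKETCH_TOOLONG  (-2)   /* output buffer too small */

/* Sentinel marking an input byte that has no code point. */
#define SKETCH_UNMAPPED 0xFFFFu

/* Hypothetical charset: 0x00-0x7F and 0xA0-0xFF map to the code point of
 * the same value, 0x80-0x9F is treated as unmapped. */
void sketch_make_table(uint16_t table[256])
{
    int i;
    for (i = 0; i < 256; ++i) {
        table[i] = (i >= 0x80 && i < 0xA0) ? (uint16_t) SKETCH_UNMAPPED
                                           : (uint16_t) i;
    }
}

/* Append n bytes to output at *pos, refusing to overflow capacity. */
int sketch_put(char *output, size_t capacity, size_t *pos,
               const char *src, size_t n)
{
    if (n > capacity - *pos) {
        return SKETCH_TOOLONG;
    }
    memcpy(output + *pos, src, n);
    *pos += n;
    return SKETCH_OK;
}

/* Encode a NUL-terminated 8-bit string as UTF-8.
 * invalid: UTF-8 string substituted for each unmapped byte,
 *          or NULL to fail with SKETCH_INVALID instead.
 * The output is not NUL-terminated; *length receives the byte count. */
int sketch_encode(const uint16_t table[256], const char *input,
                  const char *invalid, char *output, size_t capacity,
                  size_t *length)
{
    const unsigned char *p;
    size_t pos = 0;

    assert(table && input && output && length);

    for (p = (const unsigned char *) input; *p; ++p) {
        uint16_t cp = table[*p];
        char buf[3];
        size_t n;
        int rc;

        if (cp == SKETCH_UNMAPPED) {
            if (!invalid) {
                return SKETCH_INVALID;          /* strict mode */
            }
            rc = sketch_put(output, capacity, &pos, invalid, strlen(invalid));
        } else {
            if (cp < 0x80) {                    /* one byte */
                buf[0] = (char) cp;
                n = 1;
            } else if (cp < 0x800) {            /* two bytes */
                buf[0] = (char) (0xC0 | (cp >> 6));
                buf[1] = (char) (0x80 | (cp & 0x3F));
                n = 2;
            } else {                            /* three bytes */
                buf[0] = (char) (0xE0 | (cp >> 12));
                buf[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
                buf[2] = (char) (0x80 | (cp & 0x3F));
                n = 3;
            }
            rc = sketch_put(output, capacity, &pos, buf, n);
        }
        if (rc != SKETCH_OK) {
            return rc;
        }
    }
    *length = pos;
    return SKETCH_OK;
}
```

In this sketch, passing `"?"` or the UTF-8 bytes of U+FFFD as `invalid` substitutes each unmapped byte, while passing NULL keeps the strict behaviour and returns an error at the first unmapped byte.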


Testing

The test harness runs clean under Valgrind on Ubuntu 18.04.

Documentation

Documentation updates are not required for this change.


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@bluebike
Contributor

bluebike commented Jul 7, 2020

Well.. seems to be working (tested on macOS High Sierra).
Invalid-character handling is different from what I did in my (forgotten) PR.

Got rid of extra function layer. Good.

One thing I don't really like at all is the tutf8e_encoder function.
Calling strcmp against every single encoding name is not very efficient.

We could live with this just to get something done.. probably too late for 1.5 anyway.

@nigels-com
Contributor Author

Thanks for looking.

My time on this was a bit limited, so I tried to stick to the essentials.
The string-based encoder lookup happens only once, so it shouldn't be performance critical?
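
A rough sketch of the point about once-only lookup, under the assumption that the caller resolves the encoding name once (say, at configuration time) and keeps the returned handle. The `sketch_` names, the encoder list, and the comparison counter are all hypothetical and are not taken from tutf8e.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoder handle: a name plus an identifier. */
typedef struct {
    const char *name;
    int         id;
} sketch_encoder;

/* Hypothetical list of supported encodings. */
static const sketch_encoder sketch_encoders[] = {
    { "iso-8859-1",   1 },
    { "iso-8859-15",  2 },
    { "windows-1252", 3 },
};

/* Counts strcmp calls so the cost of the lookup is visible. */
size_t sketch_compare_count = 0;

/* Linear search by name: one strcmp per candidate encoding. */
const sketch_encoder *sketch_find_encoder(const char *name)
{
    size_t n = sizeof(sketch_encoders) / sizeof(sketch_encoders[0]);
    size_t i;

    for (i = 0; i < n; ++i) {
        ++sketch_compare_count;
        if (strcmp(sketch_encoders[i].name, name) == 0) {
            return &sketch_encoders[i];
        }
    }
    return NULL;    /* unknown encoding name */
}

/* Per-record work takes the already-resolved handle, so no string
 * comparison happens here. As a stand-in for real encoding work it
 * just reports the record length. */
size_t sketch_process_record(const sketch_encoder *enc, const char *record)
{
    assert(enc != NULL);
    return strlen(record);
}
```

With this shape the number of strcmp calls is bounded by the number of supported encodings per lookup and does not grow with the number of records processed. If lookups ever became frequent, a sorted table with binary search would be one alternative.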

bluebike added a commit to bluebike/fluent-bit that referenced this pull request Aug 4, 2020
Add flb_encoding functions for charset encodings to utf8.

* Uses lib/tutf8e-library.
* Only 8-bit source charsets are supported.
* Encoding options (//OPTION) :
 //IGNORE, //QUESTION, //REPLACEMENT ///<text>
* This commit doesn't add support to any input plugin.
* Depends on fluent#2326

Signed-off-by: Jukka Pihl <[email protected]>
@nigels-com
Contributor Author

@bluebike would you like this merged to the fluent-bit master branch as is? Did you come across any issues in the course of #2420?

@bluebike
Contributor

bluebike commented Aug 5, 2020

@nigels-com well... this is good enough. Seems to be working.
I haven't seen any major problems with this...

@nigels-com nigels-com force-pushed the tutf8e-invalid-characters branch from a3c018d to 133d80c Compare August 5, 2020 11:39
@nigels-com
Contributor Author

@edsiper Could this be merged for the purposes of #2420?

@edsiper edsiper merged commit 987e1c0 into fluent:master Aug 7, 2020
@edsiper
Member

edsiper commented Aug 7, 2020

thanks!
