-
Notifications
You must be signed in to change notification settings - Fork 420
Pull Request: Added Support for Custom String Encoding #1834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
forget to add the type serialiser .. update incoming |
|
From the specs only UTF-8 is allowed. Spec |
|
What should i say. I currently work with a Siemens OPC UA Server that runs a windows encoding. The encoding issues in this repo that pop up over the time show that german umlauts are a serious problem in encoding - but this should not be the case right? utf-8 has german umlauts / mutated vowel in it. The only way this problem could happen in the first place is that the server using different encoding. That was verified our server runs on cp1552 encoding. The server sends and expects strings encoded using this encoding. In this and the old repo multiple issues got raised due to the german umlaut encoding. To address this, I'm trying to add the ability to change the encoding in the low-level API. Currently, I'm focusing on node ID parsing, as it's impossible to subscribe to a node that has been surrogate-encoded or replaced and saved to a file (as currently implemented). However, to achieve this, I've had to make more changes than I initially expected. I hope to push a fully functional version by the end of the week, which will use UTF-8 as the standard encoding but also allow for other encodings to be used if necessary. One concern I have is whether the incorrect encoding is also affecting the implemented XML protocol, aside from the subscription issues / nodeid issues. From what I can tell, the binary protocol doesn't seem to be affected by this issue. I'd appreciate any feedback or guidance on how to proceed with this change. |
|
add the possiblity to change string encoding with a lowapi config, instead of the first try to alter all methods and tests, which is really cumbersome and error prone. Additionally, added a surrogateescape for encoding/decoding of strings if a wrong encoding is used, the communication to and from the server at runtime is possible without throwing any errors. |
asyncua/ua/ua_binary.py
Outdated
| return data.read(length) | ||
|
|
||
|
|
||
| class LowApiConfig: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that name is so strange... What is supposed to mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this name is open to debate, feel free for another suggestion. This is the least effort, without intervene with your great design of the library, and I added the String_Encoding as first element. In the future, maybe there are other configuration parameters which should be altered specifically e.g. for custom objects that have a different array encoding / byte encoding. The config is there to change the parameters on the low-level communcation. Therefore, my suggestion of a "lowApiConfig" for this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also open, if this is right place for such a config. I will be happy to refactor my PR to your needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just remove that class and use directly the member String_Encoding...
also StringEncoding... mixing up camel case and underscore is not allowed by convention in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lkaupp you still need to fix that. LowApiConfig really means nothing to my brain.
Remove that calss and keep the config as a property in that file
string_encoding: st = SSSSS
and follow PEP8 for the naming
|
Any concerns left, that i can implement for you to merge this into the master? I am willing to assist at any point. I want to give some of our findings back to the community. This lib is great and I want to contribute my part to the success story. |
asyncua/client/client.py
Outdated
| strip_url_credentials: bool = True | ||
|
|
||
| def __init__(self, url: str, timeout: float = 4, watchdog_intervall: float = 1.0): | ||
| def __init__(self, url: str, timeout: float = 4, watchdog_intervall: float = 1.0, encoding="utf-8"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tihs will be a very rarely used option, leave it as property, no need to be part of constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how we should set the variable, using a subsequent function? Defining the encoding in the constructor as with any other client/server library would make more sense to me as a seasoned programmer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if I understand correctly but in python all attributes are public by default. and that class has already many many atrtibutes...
so in python you would do
c = Client()
c.encoding = "whatever" # for the very few special cases
I suppose in java/C++ you would do
c = Client()
c.set_encoding(whater)
and in rust
let c = Client::new().encoding("whatewerr");
asyncua/server/server.py
Outdated
| """ | ||
|
|
||
| def __init__(self, iserver: InternalServer = None, user_manager=None): | ||
| def __init__(self, iserver: InternalServer = None, user_manager=None, encoding="utf-8"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as for the client.
Also do we want that in the server? Why making a server not supporting the standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super simple. If you have to mirror the servers behaviour to track the faults in implementations, you need the same encoding as the original server. Otherwise xml exported values cannot be reimported and debugged using the client. The information extracted cannot be automatically casted into the new encoding - not feasable with thousands of variables. Here the server needs to speak the same language as the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be valid for testing but definitely use property and nor constructor argument
I wrote some comments. Fix them and we can merge that if that is needed in the real world |
Done :). If it should be set in a subsequest function request, np. I could also add it as a kwargs argument. |
@lkaupp if that is done, it is not pushed... |
c056169 to
b6a4a18
Compare
|
@oroulet I made the requested changes. It would be amazing if you could check it out. EDIT: the failed Python 3.11 test doesn't make sense to me for the provided changes. |
|
I added a getter and a setter to update the variable at runtime; otherwise, all components in the library would always use the default value. |
|
@oroulet is it ok with the changes Alex made? |
Pull Request: Added Support for Custom String Encoding
Description
This pull request introduces the ability to set a custom string encoding for communication with non-standard servers. This change addresses numerous issues related to vowel mutations and umlauts in the German-speaking area.
Changes
Benefits
Example Use Case
To use a custom encoding like CP1252, simply instruct the client or server to use this encoding for strings, including node IDs. This can be particularly useful when communicating with servers that do not support UTF-8.