Skip to content

Conversation

@lkaupp
Copy link
Contributor

@lkaupp lkaupp commented Apr 28, 2025

Pull Request: Added Support for Custom String Encoding

Description

This pull request introduces the ability to set a custom string encoding for communication with non-standard servers. This change addresses numerous issues related to vowel mutations and umlauts in the German-speaking area.

Changes

  • Added an option to specify a custom string encoding, with UTF-8 as the default value.
  • Updated the client and server to accept instructions for using alternative encodings for strings, including node IDs.
  • Provided support for encodings like CP1252, which can be used as an alternative to UTF-8.
  • Added fallback for readonly communication with surrogates

Benefits

  • Resolves issues with vowel mutations and umlauts in the German-speaking area.
  • Enhances compatibility with non-standard servers that use custom string encodings.
  • Allows for more flexibility in communication between the client and server.

Example Use Case

To use a custom encoding like CP1252, simply instruct the client or server to use this encoding for strings, including node IDs. This can be particularly useful when communicating with servers that do not support UTF-8.

@lkaupp
Copy link
Contributor Author

lkaupp commented Apr 28, 2025

forget to add the type serialiser .. update incoming

@schroeder-
Copy link
Contributor

From the specs only UTF-8 is allowed. Spec
If we are the problem we should make sure we are always sending UTF-8.
If the problem is another server/server, tell them to comply to the spec.

@lkaupp
Copy link
Contributor Author

lkaupp commented Apr 29, 2025

What should i say. I currently work with a Siemens OPC UA Server that runs a windows encoding. The encoding issues in this repo that pop up over the time show that german umlauts are a serious problem in encoding - but this should not be the case right? utf-8 has german umlauts / mutated vowel in it. The only way this problem could happen in the first place is that the server using different encoding. That was verified our server runs on cp1552 encoding. The server sends and expects strings encoded using this encoding. In this and the old repo multiple issues got raised due to the german umlaut encoding.

To address this, I'm trying to add the ability to change the encoding in the low-level API. Currently, I'm focusing on node ID parsing, as it's impossible to subscribe to a node that has been surrogate-encoded or replaced and saved to a file (as currently implemented). However, to achieve this, I've had to make more changes than I initially expected. I hope to push a fully functional version by the end of the week, which will use UTF-8 as the standard encoding but also allow for other encodings to be used if necessary.

One concern I have is whether the incorrect encoding is also affecting the implemented XML protocol, aside from the subscription issues / nodeid issues. From what I can tell, the binary protocol doesn't seem to be affected by this issue. I'd appreciate any feedback or guidance on how to proceed with this change.

@lkaupp
Copy link
Contributor Author

lkaupp commented Apr 29, 2025

add the possiblity to change string encoding with a lowapi config, instead of the first try to alter all methods and tests, which is really cumbersome and error prone. Additionally, added a surrogateescape for encoding/decoding of strings if a wrong encoding is used, the communication to and from the server at runtime is possible without throwing any errors.

return data.read(length)


class LowApiConfig:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that name is so strange... What is supposed to mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this name is open to debate, feel free for another suggestion. This is the least effort, without intervene with your great design of the library, and I added the String_Encoding as first element. In the future, maybe there are other configuration parameters which should be altered specifically e.g. for custom objects that have a different array encoding / byte encoding. The config is there to change the parameters on the low-level communcation. Therefore, my suggestion of a "lowApiConfig" for this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also open, if this is right place for such a config. I will be happy to refactor my PR to your needs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just remove that class and use directly the member String_Encoding...
also StringEncoding... mixing up camel case and underscore is not allowed by convention in Python

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lkaupp you still need to fix that. LowApiConfig really means nothing to my brain.
Remove that calss and keep the config as a property in that file

string_encoding: st = SSSSS

and follow PEP8 for the naming

@lkaupp
Copy link
Contributor Author

lkaupp commented May 20, 2025

Any concerns left, that i can implement for you to merge this into the master? I am willing to assist at any point. I want to give some of our findings back to the community. This lib is great and I want to contribute my part to the success story.

strip_url_credentials: bool = True

def __init__(self, url: str, timeout: float = 4, watchdog_intervall: float = 1.0):
def __init__(self, url: str, timeout: float = 4, watchdog_intervall: float = 1.0, encoding="utf-8"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tihs will be a very rarely used option, leave it as property, no need to be part of constructor

Copy link
Contributor Author

@lkaupp lkaupp May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how we should set the variable, using a subsequent function? Defining the encoding in the constructor as with any other client/server library would make more sense to me as a seasoned programmer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I understand correctly but in python all attributes are public by default. and that class has already many many atrtibutes...

so in python you would do

c = Client()
c.encoding = "whatever" # for the very few special cases

I suppose in java/C++ you would do

c = Client()
c.set_encoding(whater)

and in rust

let c = Client::new().encoding("whatewerr");

"""

def __init__(self, iserver: InternalServer = None, user_manager=None):
def __init__(self, iserver: InternalServer = None, user_manager=None, encoding="utf-8"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as for the client.
Also do we want that in the server? Why making a server not supporting the standard?

Copy link
Contributor Author

@lkaupp lkaupp May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super simple. If you have to mirror the servers behaviour to track the faults in implementations, you need the same encoding as the original server. Otherwise xml exported values cannot be reimported and debugged using the client. The information extracted cannot be automatically casted into the new encoding - not feasable with thousands of variables. Here the server needs to speak the same language as the client.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be valid for testing but definitely use property and nor constructor argument

@oroulet
Copy link
Member

oroulet commented May 20, 2025

Any concerns left, that i can implement for you to merge this into the master? I am willing to assist at any point. I want to give some of our findings back to the community. This lib is great and I want to contribute my part to the success story.

I wrote some comments. Fix them and we can merge that if that is needed in the real world

@lkaupp
Copy link
Contributor Author

lkaupp commented May 20, 2025

Any concerns left, that i can implement for you to merge this into the master? I am willing to assist at any point. I want to give some of our findings back to the community. This lib is great and I want to contribute my part to the success story.

I wrote some comments. Fix them and we can merge that if that is needed in the real world

Done :). If it should be set in a subsequest function request, np. I could also add it as a kwargs argument.

@oroulet
Copy link
Member

oroulet commented Jul 20, 2025

Any concerns left, that i can implement for you to merge this into the master? I am willing to assist at any point. I want to give some of our findings back to the community. This lib is great and I want to contribute my part to the success story.

I wrote some comments. Fix them and we can merge that if that is needed in the real world

Done :). If it should be set in a subsequest function request, np. I could also add it as a kwargs argument.

@lkaupp if that is done, it is not pushed...

@Alexander-Zender
Copy link
Contributor

Alexander-Zender commented Jan 12, 2026

@oroulet I made the requested changes. It would be amazing if you could check it out.

EDIT: the failed Python 3.11 test doesn't make sense to me for the provided changes.
Nothing was touched in that area of the connection that would affect the test, Bug?.

@Alexander-Zender
Copy link
Contributor

I added a getter and a setter to update the variable at runtime; otherwise, all components in the library would always use the default value.
Additionally, I changed string_encoding to use context variables. So it is possible to subscribe to different OPC UA servers that use different encodings. We have a situation where we need to subscribe to two servers simultaneously, and they use different encodings at the moment.

@lkaupp
Copy link
Contributor Author

lkaupp commented Jan 13, 2026

@oroulet is it ok with the changes Alex made?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants