-
Notifications
You must be signed in to change notification settings - Fork 17
AGW: fixes a few issues with the AX.25 header when using an AGW modem. #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
wb8tyw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for following up your ticket with a PR.
I am not the original author of this module, so do not know its origin.
I also do not currently have a local way to test AGW against installed AGW.
The pylint failures are from changes in what the new version of pylint is checking. The import outside of top level in self test is something that should just be suppressed, which I can do in a later PR.
I would have to figure out how to properly fix the issue about raise needing a from, so that is not something that needs to be fixed in this PR either.
| # left one bit | ||
| dst_str = "".join([chr(ord(x) << 1) for x in call]) | ||
| dst_str += encode_ssid(sid) | ||
| dst = dst_str.encode('utf-8', 'replace') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this encoding change? It may need to be made in other places.
The existing code will not throw an exception / crash the program if there is invalid data / garbage, it will pass it through.
We want to avoid exceptions being thrown that are not caught and properly reported to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On lines 684 and 685 dst_str is used to convert the destination call in the expected format for AX.25 (1 bit shift to the left). For further technical details, please refer to Section 3.12 and 3.12.1 in the AX.25 specification.
Since dst is a bytes array, and not a string, line 686 is used to convert the data format, while copying those bytes as is ("CQ" 0x43 0x51 would be converted to 0x86 0xA2, but L686 mistakenly converted it to UTF-8, and the resulting callsign would be unexpectedly transformed into 0xC2 0x86 0xC2 0xA2).
That UTF-8 conversion is thus pointless in our context, since callsigns are only composed of ASCII letters and digits, in the first place. Latin1 conversion is used, since we only want to cast a string into bytes, while preserving its binary content as is (especially the bytes with their 8th bit set). I'm not fluent in python, but please feel free to perform that operation in another way, if a more elegant solution exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python3 forces the use of UTF-8 here. This is a significant change from Python2. UTF-8 includes an extended ASCII subset as does the latin1 encoding.
I see the problem now:
call is a utf-8 string, and we should not be doing ord() modify in place chr() operations on it as that will corrupt the UTF-8 string beyond usability.
(I wonder how many cases of this are present in the the code now). A search shows duplicate and shared code in the comm.py module for use with hardware TNCs. More stuff to clean up.
This is where the change from Python2 to python3 is a problem as it seems like it can get very complicated to do what really needs to be done.
This is untested of course, but is the logic that is needed, and there may be a more pythonic way of doing this with list comprehension.
This needs to be fixed every where this "pattern" is used in agw, it is also needed to be changed in "comm.py". And probably the rest of the code needs to be scanned to for bugs like this.
# convert UTF-8 to bytes here.
call_bytes = bytearrry(call.encode('utf-8', 'replace'))
# build up the dst value.
dst = b"".join([x << 1) for x in call bytes])
# this also means that encode_sid is wrong in returning a one character utf-8 str.
# it needs to return a single byte instead, I would have to leave that to you.
dst += encode_ssid(sid)
`
If you can figure this out from here, that would be great, as I am not sure I will have time to do more than that for at least a week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, I'm working on a few notes on how the D-Rats Data Transport protocol is put together and on a Wireshark Lua dissector . A lot of work is remaining, but I'm leaving it there in the case it would help you to troubleshoot things once I'm done (especially with the file transfer logic, which tends to segfault).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love a full description of the file and form (e-mail is considered a form) protocol I have not been able to fully figure it out by tracing the logic that is in D-Rats. I have only figured out part of it.
I have not seen segfaults with the most resent releases of drats with normal file transfers.
With (mostly accidental) fault injection testing I have discovered that there are significant problems with the state machine's handling of files and forms. Both share common code. It is easy for something to put D-Rats into a mode where it were it will be frozen in a state where a file or form is stuck.
Older versions of D-RATS have a bug in the serial port in the handling of X-ON/X-OFF. Originally D-Rats assumed that if it had no buffered data from the radio, that the radio had sent it an X-OFF. So D-Rats assumed that the next byte that came in from the serial port would be an X-ON. And if it was not an X-ON, it would log it and discard the character resulting in a frame that it could not parse. Apparently you could get away with that on older computers, but not on newer computers it would reliably trip on file transfers.
One thing to note on file/form transfers is that the data length is sent in little endian, while most everything else is big-endian. This is a bug where D-Rats was using native endian instead of network endian, and I do not know a way of fixing it to be network endian and stay compatible with older clients.
All existing pip installable D-Rats packages are out of date. The Windows users, the majority of the community found it too hard to setup a linux emulation environment and then install the packages. They want to install just one download file. And we currently have no one on the tiny d-rats development team that knows how maintain that type of packaging and keep it up to date for security issues.
Our install process right now is to run a script that does a git clone/update of the repository using MobaXterm on windows and leaving other platforms mostly to figure things out for themselves.
Basically I am trying to start completely over with the 'drats2' project, but it is still at the very early stages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some notes I've been able to take so far: https://github.com/dscp46/dissectors/blob/main/doc/ddt2.md. D-Rats is an inheritance and signal hell, but reading traces and code side by side has shown to be of great help to get a rough gist of things.
Please keep in mind it's still a WIP, I need to test other code paths and to collect further traces and to confirm some assumptions.
For the record, on the crashes: my AGWPE setup is built with direwolf acting as the modem. The test rig is a pair of Debian PCs with a dedicated back-to-back soundcard path. I'm intercepting the network traffic between D-Rats and Direwolf on the loopback network interface.
I have a few hypotheses I need to dig on, relating to the crashes:
- It may be linked to the fact I've experienced crosstalk on the sound path, and D-Rats is not detecting and discarding loopback packets in some situations. I'm working on eliminating loopback packets on this weekend to check if I'm able to reproduce that.
- It also may linked to the AGWPE code path for some obscure reason. I'll try some transfers using a pair of d-star radios to confirm this.
In either cases, I'll open separate issues if I find something fishy.
73 de F4HOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test with #302 as that incorporates fixes for the issues in both of your PRs so far, If you find those fixes acceptable, then I can close this PR.
One of the bugs that I have not been able to isolate is that if you send a form or a file to your self (Seems an obvious unit test) d-rats code detects it as a loop and just drops the packet. This is a bug because it leaves the file locked in an infinite loop of trying to send the message, and the only way that I can break that loop is to manually delete lock file and the file in the sending directory. Nothing cancels the transfer because D-Rats is discarding the packet not rejecting it to the sender. A similar problem can occur if one station or Ratflector goes offline just a transfer starts.
I need to extend the D-Rats protocol in the future, but not sure the best way to do it for backwards compatability.
-
Need a keepalive between the client and repeaters, but do not want it sent over radio links for obvious reasons.
I do not know enough yet about tcpip socket programming to know if there is just a simple way of doing this. -
I want to add some type of message sender verification, such as a signature using public key verification, so that we can verify that the who the message came from with out using passwords.
| 0x00, # Space for flag (?) | ||
| dst, # Dest Call | ||
| src, # Source Path | ||
| 0x3E, # Info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future enhancement, all of these numeric values should be a constant variables that can have comments with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind, I'd leave it in your hands, since I'm not familiar with this project's variable/constants naming convention. That byte is the AX.25 Control field. The null byte at line 696 is the end of the AGW User field (I'm slowly writing Lua dissectors for amateur radio protocols on Wireshark, please feel free to use them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not something that would need to be fixed in this PR.
This project previously had no variable/constants naming convention. This was quickly converted from python2 to python3, and since then there have been a lot of changes in what is best programing practices.
One of them the conventions is no magic constant values as pure numbers or strings, so all values should be assigned to a named constant symbol and then the symbol is used. For python there are many ways of doing this.
| src_str += "".join([chr(ord(x) << 1) for x in call]) | ||
| src_str += encode_ssid(sid, spath[-1] == scall) | ||
| src = src_str.encode('utf-8', 'replace') | ||
| src = bytes(src_str, 'latin1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyLint yelled about re-raising that change, but it is pointless to re-run that line on every iteration (and a waste of CPU cycles)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed the indentation change, and that looks like an improvement, but the scall and spath are stored as UTF-8 per python3 not latin1.
I do not see anything in the Github actions pylint about this.
The pylint issue is about a change in how pylint maintainers believe that exceptions should be raised and that is not something that needs to be updated with this PR.
It does not re-raise the exception, it just can provide more diagnostic information in some cases.
wb8tyw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the problem better now, thanks to your detailed explanation.
I think a different solution is needed and that the comm.py needs a similar fix.
| # left one bit | ||
| dst_str = "".join([chr(ord(x) << 1) for x in call]) | ||
| dst_str += encode_ssid(sid) | ||
| dst = dst_str.encode('utf-8', 'replace') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python3 forces the use of UTF-8 here. This is a significant change from Python2. UTF-8 includes an extended ASCII subset as does the latin1 encoding.
I see the problem now:
call is a utf-8 string, and we should not be doing ord() modify in place chr() operations on it as that will corrupt the UTF-8 string beyond usability.
(I wonder how many cases of this are present in the the code now). A search shows duplicate and shared code in the comm.py module for use with hardware TNCs. More stuff to clean up.
This is where the change from Python2 to python3 is a problem as it seems like it can get very complicated to do what really needs to be done.
This is untested of course, but is the logic that is needed, and there may be a more pythonic way of doing this with list comprehension.
This needs to be fixed every where this "pattern" is used in agw, it is also needed to be changed in "comm.py". And probably the rest of the code needs to be scanned to for bugs like this.
# convert UTF-8 to bytes here.
call_bytes = bytearrry(call.encode('utf-8', 'replace'))
# build up the dst value.
dst = b"".join([x << 1) for x in call bytes])
# this also means that encode_sid is wrong in returning a one character utf-8 str.
# it needs to return a single byte instead, I would have to leave that to you.
dst += encode_ssid(sid)
`
If you can figure this out from here, that would be great, as I am not sure I will have time to do more than that for at least a week.
| src_str += "".join([chr(ord(x) << 1) for x in call]) | ||
| src_str += encode_ssid(sid, spath[-1] == scall) | ||
| src = src_str.encode('utf-8', 'replace') | ||
| src = bytes(src_str, 'latin1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed the indentation change, and that looks like an improvement, but the scall and spath are stored as UTF-8 per python3 not latin1.
I do not see anything in the Github actions pylint about this.
The pylint issue is about a change in how pylint maintainers believe that exceptions should be raised and that is not something that needs to be updated with this PR.
It does not re-raise the exception, it just can provide more diagnostic information in some cases.
| 0x00, # Space for flag (?) | ||
| dst, # Dest Call | ||
| src, # Source Path | ||
| 0x3E, # Info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not something that would need to be fixed in this PR.
This project previously had no variable/constants naming convention. This was quickly converted from python2 to python3, and since then there have been a lot of changes in what is best programing practices.
One of them the conventions is no magic constant values as pure numbers or strings, so all values should be assigned to a named constant symbol and then the symbol is used. For python there are many ways of doing this.
|
#302 should address the issues in this PR. |
Hi,
This PR fixes the AX.25 address / path encoding issues, and sets the frame type as Unnumbered Information (closes #298 ).
Locally tested.
Best regards,