Skip to content

Fix for Dynamic Payloads#1036

Merged
TMRh20 merged 1 commit intomasterfrom
Fix-DynamicPayloadCorruption
May 2, 2025
Merged

Fix for Dynamic Payloads#1036
TMRh20 merged 1 commit intomasterfrom
Fix-DynamicPayloadCorruption

Conversation

@TMRh20
Copy link
Copy Markdown
Member

@TMRh20 TMRh20 commented May 2, 2025

  • The rx buffer also needs to be flushed if R_RX_PL_WID returns 0
  • No need for the delay

nRF24/RF24Network#249

- The rx buffer also needs to be flushed if R_RX_PL_WID returns 0
- No need for the delay

nRF24/RF24Network#249
@TMRh20 TMRh20 requested a review from 2bndy5 May 2, 2025 13:37
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented May 2, 2025

Memory usage change @ 4c3e2fc

Board flash % RAM for global variables %
arduino:avr:nano 💚 -8 - 0 -0.03 - 0.0 0 - 0 0.0 - 0.0
arduino:samd:mkrzero 0 - 0 0.0 - 0.0 0 - 0 0.0 - 0.0
Click for full report table
Board examples/GettingStarted
flash
% examples/GettingStarted
RAM for global variables
% examples/AcknowledgementPayloads
flash
% examples/AcknowledgementPayloads
RAM for global variables
% examples/ManualAcknowledgements
flash
% examples/ManualAcknowledgements
RAM for global variables
% examples/StreamingData
flash
% examples/StreamingData
RAM for global variables
% examples/MulticeiverDemo
flash
% examples/MulticeiverDemo
RAM for global variables
% examples/InterruptConfigure
flash
% examples/InterruptConfigure
RAM for global variables
% examples/scanner
flash
% examples/scanner
RAM for global variables
% examples/encodeRadioDetails
flash
% examples/encodeRadioDetails
RAM for global variables
%
arduino:avr:nano 0 0.0 0 0.0 -8 -0.03 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
arduino:samd:mkrzero 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
Click for full report CSV
Board,examples/GettingStarted<br>flash,%,examples/GettingStarted<br>RAM for global variables,%,examples/AcknowledgementPayloads<br>flash,%,examples/AcknowledgementPayloads<br>RAM for global variables,%,examples/ManualAcknowledgements<br>flash,%,examples/ManualAcknowledgements<br>RAM for global variables,%,examples/StreamingData<br>flash,%,examples/StreamingData<br>RAM for global variables,%,examples/MulticeiverDemo<br>flash,%,examples/MulticeiverDemo<br>RAM for global variables,%,examples/InterruptConfigure<br>flash,%,examples/InterruptConfigure<br>RAM for global variables,%,examples/scanner<br>flash,%,examples/scanner<br>RAM for global variables,%,examples/encodeRadioDetails<br>flash,%,examples/encodeRadioDetails<br>RAM for global variables,%
arduino:avr:nano,0,0.0,0,0.0,-8,-0.03,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
arduino:samd:mkrzero,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 2, 2025

Just to elaborate on this, when a 0 length Dynamic Payload was received, it would leave the RX FIFO full, even after calling read_payload(). In this case, available() will just continuously return true unless flush_rx() is called, that being the only way to clear the corrupted data in the FIFO buffers.

I see a lot of scenarios where this will affect things:

  1. At the core RF24 Layer, even if FAILURE_HANDLING is used, it won't detect these errors, one would have needed to reset the radio
  2. At the Network Layer, FAILURE_HANDLING would catch these errors
  3. Mesh layer has built in code in the examples to reset the radio if unable to renew its address after a time.
  4. Gateway layer utilizes FAILURE_HANDLING from RF24Network

All in all this IS the root cause of the problem I've been debugging all week.

@TMRh20 TMRh20 merged commit 9de7fdd into master May 2, 2025
76 checks passed
@TMRh20 TMRh20 deleted the Fix-DynamicPayloadCorruption branch May 2, 2025 15:49
@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 2, 2025

From the nRF24L01+ v1.0 Datasheet:

Note: Always check if the packet width reported is 32 bytes or shorter when using the
R_RX_PL_WID command. If its width is longer than 32 bytes then the packet contains errors
and must be discarded. Discard the packet by using the Flush_RX command.

Why oh why didn't they mention a zero-length payload requiring flush_rx I'll probably never know. Maybe they didn't know. This one really bugs me.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 6, 2025

Been investigating this issue further, as to the reason why/how it happens exactly.
Just want to keep track of progress.

A couple notes:

  1. I have yet to recreate the issue on a regular Arduino Nano or Due. I am monitoring 3 of them for the error. This may take a while.
  2. I can recreate it easily using RF24Gateway & large continuous data transfers on RPi. All of my RPis have now gotten the error, RPi3, RPi4 & RPi5.
  3. I've tried everything related to SPI, delays, SPI speed etc.
  4. Upon getting this 0 byte error the following registers always display the same:
    a: FIFO_STATUS 0x10 : Data in RX FIFO, Available locations in RX FIFO, TX FIFO Empty
    b: STATUS 0xE : Available locations in TX FIFO, RX_FIFO Empty
  5. Again I've tried reading 32 bytes, 1 byte, 0 bytes when this happens to no avail. All registers return the expected info, except the STATUS byte indicates the RX FIFO is empty.

I find it really interesting that the STATUS byte indicates RX_FIFO_EMPTY at the same time FIFO_STATUS indicates Data in RX FIFO. The RX_DR flag is asserted as well when this happens.
Also finding the difficulty to replicate on Arduino a bit odd.

https://devzone.nordicsemi.com/f/nordic-q-a/121237/nrf24l01-radio-r_rx_pl_wid-returns-0

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

Hmm, If we could narrow it down to the SPIDEV driver, we'd have a starting point.

All Linux drivers use internal buffers (RF24::_spi_*x_buf) before shipping data over SPI. I have an experimental branch in which it uses just 1 internal buffer for both MISO and MOSI. But it isn't well tested, and I doubt the problem is in RF24.cpp.

I can recreate it easily using RF24Gateway & large continuous data transfers

What we need is a reproduction with just RF24 layer. If that isn't possible, then I'd look at higher layers. I suspect a buffer problem, but you know how I overthink things.

Are you also using interrupts with RF24Gateway when this error occurs?

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

According to the datasheet:

7.3.3.1 Payload length
This 6 bit field specifies the length of the payload in bytes. The length of the payload can be from 0 to 32 bytes.
Coding: 000000 = 0 byte (only used in empty ACK packets.) 100000 = 32 byte, 100001 = Don’t care.
This field is only used if the Dynamic Payload Length function is enabled.

I feel like R_RX_PL_WID = 0 is trying to say the RX FIFO is empty or it just processed an empty ACK packet (by mistake somehow?).

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 6, 2025

Hmm, If we could narrow it down to the SPIDEV driver, we'd have a starting point.

I replicated it with the PiGPIO driver

What we need is a reproduction with just RF24 layer. If that isn't possible, then I'd look at higher layers. I suspect a buffer problem, but you know how I overthink things.

Working on it.

Are you also using interrupts with RF24Gateway when this error occurs?

Yes or no, still getting the error. Running the gwNodeInt, ncurses & ncursesInt examples

The reason I suspect an NRF24 issue is because of the register states (one says FIFO has payload, one says no payload) and simply flushing the RX buffer clears the issue. All other registers are OK, and the NRF goes back to receiving again... Continuous reads of the FIFO_STATUS buffer return the same result.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

This is a complete shot in the dark. I wonder if just writing 0xE0 would help the radio refresh the FIFO state. It is an undocumented command byte, but anything starting with 0b1110xxxx seems related to FIFO commands (flush is 0xE1 or 0xE2 and reuse TX is 0xE3).

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

FWIW, my CirPy port does additional checks before returning the R_RX_PL_WID result:

    def any(self) -> int:
        """This function reports the next available payload's length (in bytes)."""
        last_dyn_size = self._reg_read(0x60)  # R_RX_PL_WID
        if self._in[0] >> 1 & 7 < 6:  # is RX pipe number valid?
            if self._features & 4:  # is ACK payloads and dynamic size enabled?
                return last_dyn_size  # return dynamic size
            return self._pl_len[(self._in[0] >> 1) & 7]  # return static size (for that pipe)
        return 0  # nothing to report

I did a quick test in pure python (CirPy) with ACK payloads (& dynamic size) is enabled.
After a successful transmission (using an empty ACK packet from RX side), the TX side showed that R_RX_PL_WID return 32, even though the RX FIFO is empty. This of course is bypassing1 the above checks in my any() function.

32 could just be a remnant from the last received payload. I'm still not sure what to make of the FIFO state being invalid.

Footnotes

  1. The advantage of python's "nothing is actually private" rule allows me to access the private SPI functions in the RF24 class (RF24._reg_read(int) -> int).

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 6, 2025

I can try the 0xE0 thing in a while.
What I'm finding with RF24Gateway, if I just send UDP packets at about 20KB/s (1-way transmission) there is no problem, cannot recreate the issue.
Its only with TCP/IP and the constant back-and-forth that I can recreate the error.
This is why its so difficult to recreate otherwise, you need the back and forth and constant switching between states.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

Maybe amongst the switching it picks up an empty ACK packet in RX mode, thinks it is regular incoming, and saves the ACK packet's payload size to R_RX_PL_WID field while trying to write 0 bytes into the RX FIFO (causing the RX FIFO state to think there's data to read).

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

Again I've tried reading 32 bytes, 1 byte, 0 bytes when this happens to no avail. All registers return the expected info, except the STATUS byte indicates the RX FIFO is empty.

If you read 1 byte from the RX FIFO when the dyn size is 0, does the FIFO_STATUS change?
I'm hoping the cleared RX_DR flag would cause the FIFO_STATUS to update.

Again, looking at the datasheet:

The RX_DR IRQ is asserted by a new packet arrival event. The procedure for handling this interrupt should be:

  1. read payload through SPI,
  2. clear RX_DR IRQ,
  3. read FIFO_STATUS to check if there are more payloads available in RX FIFO,
  4. if there are more data in RX FIFO, repeat from step 1).

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 6, 2025

Nope, that's why it gets stuck in an available() loop to begin with.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

Ok, what if we tweak available()?

It already reads the FIFO_STATUS. In doing so we can double check the RX_DR flag in the accompanied STATUS byte. If the 2 disagree, then return false.

In my experience you can clear the RX_DR flag, but without draining the FIFO at all, the flag gets reasserted almost immediately.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

The available() loop problem makes it hard to break the solution for when the pipe number is invalid (for a brief time).

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

The following patch would make available(void) just as fast as availavle(uint8_t*), but I can't think of a better way to avoid breaking the validity of the pipe number.

--- a/RF24.cpp
+++ b/RF24.cpp
@@ -1531,7 +1531,9 @@ uint8_t RF24::getDynamicPayloadSize(void)
 
 bool RF24::available(void)
 {
-    return (read_register(FIFO_STATUS) & 1) == 0;
+    if ((read_register(FIFO_STATUS) & 1) == 0)
+        return ((update() >> RX_P_NO) & 0x07) < 6;
+    return false;
 }

 /****************************************************************************/
@@ -1539,7 +1541,7 @@ bool RF24::available(void)
 bool RF24::available(uint8_t* pipe_num)
 {
     if (available()) { // if RX FIFO is not empty
-        *pipe_num = (update() >> RX_P_NO) & 0x07;
+        *pipe_num = (status >> RX_P_NO) & 0x07;
         return 1;
     }
     return 0;

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 6, 2025

I think the solution we have in place is the most elegant. I'm just trying to understand why this happens and if it can be prevented. I'm like 99.99% sure its a RF24 issue that may be related to RPi specific behaviour. That's because the radio is not breaking, its working normally, except for one erroneous register, and that behavior is exactly repeatable. It also works after a flush of the one erroneous packet indicated.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 6, 2025

Ok, turning back to the Linux driver then...

I'm not sure exactly how the pigpio source works, but this function is responsible for primary SPI bus transfers. Maybe we can gleam something... I don't know.

I fear if the Linux kernel buffers SPI transactions in a queue (for DMA behavior), then we'd be sacrificing speed when we try to circumvent that.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 6, 2025

Well if anybody will have an answer its Nordic. Not sure if you noticed (it was added in an edit) but I posted a question about this:
https://devzone.nordicsemi.com/f/nordic-q-a/121237/nrf24l01-radio-r_rx_pl_wid-returns-0

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 7, 2025

Getting closer, I just replicated it with RF24Network, simply sending 1500 byte payloads over and over.

TX Code

#include <RF24/RF24.h>
#include <RF24Network/RF24Network.h>
#include <iostream>
#include <ctime>
#include <stdio.h>
#include <time.h>

using namespace std;

RF24 radio(22, 0); // (CE Pin, CSN Pin, [SPI Speed (in Hz)])

RF24Network network(radio);

// Address of our node in Octal format (01,021, etc)
const uint16_t this_node = 01;

// Address of the other node
const uint16_t other_node = 00;

// How often (in milliseconds) to send a message to the `other_node`
const unsigned long interval = 2000;

uint32_t last_sent;    // When did we last send?
uint32_t packets_sent; // How many have we sent already

struct payload_t
{ // Structure of our payload
    uint8_t test[1500];
};

int main(int argc, char** argv)
{
    // Refer to RF24 docs or nRF24L01 Datasheet for settings

    if (!radio.begin()) {
        printf("Radio hardware not responding!\n");
        return 0;
    }

    delay(5);
    radio.setChannel(90);
    network.begin(/*node address*/ this_node);
    radio.printDetails();
    payload_t payload;
    
    while (1) {

        network.update();

        RF24NetworkHeader header(/*to node*/ other_node);
        bool ok = network.write(header, &payload, sizeof(payload));
        printf("%s.\n", ok ? "ok" : "failed");
    }
    

    return 0;
}

RX Code

#include <RF24/RF24.h>
#include <RF24Network/RF24Network.h>
#include <iostream>
#include <ctime>
#include <stdio.h>
#include <time.h>

// CE Pin, CSN Pin, SPI Speed (Hz)
RF24 radio(22, 0);

RF24Network network(radio);

// Address of our node in Octal format
const uint16_t this_node = 00;

// Address of the other node in Octal format (01, 021, etc)
const uint16_t other_node = 01;

struct payload_t
{ // Structure of our payload
    uint8_t test[1500];
};

int main(int argc, char** argv)
{
    // Refer to RF24 docs or nRF24L01 Datasheet for settings

    if (!radio.begin()) {
        printf("Radio hardware not responding!\n");
        return 0;
    }

    delay(5);
    radio.setChannel(90);
    network.begin(/*node address*/ this_node);
    radio.printDetails();

    while (1) {

        network.update();
        while (network.available()) { // Is there anything ready for us?

            RF24NetworkHeader header; // If so, grab it and print it out
            payload_t payload;
            network.read(header, &payload, sizeof(payload));
        }
        delay(1);
    }

    return 0;
}

Looks like you don't need much back-and-forth...

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 7, 2025

Just replicated on Arduino Due using the helloworld_rx example and a 1514 MAX_PAYLOAD_SIZE!

Here's my observations so far:

  1. I have both the RPi4 and Due listening and Auto-acking on the same address. (I know bad practice, but I'm trying to recreate an issue here)
  2. The Due works fine over short periods, (still errors out over longer times) but as soon as I fire up the RPi4 running AA on the same address, I get errors on both devices. This is leading me to believe more-so its a problem with RF24 & ACKS.

To recreate:

  1. Run the helloworld_tx sketch I posted above, then add two receivers on the same address set to receive 1500 byte payloads.

Now to see about recreating at RF24 level.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 7, 2025

That fits what I said before except you're purposely sending rogue ACK packets.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 7, 2025

Yup, and I just replicated at the RF24 layer on the RPis. The Due is still going strong, but we all know now how that will end...

Lol, I think I just proved it out as either being at the RF24 layer or an nrf24l01 radio issue like I've been thinking.

There is a reason my devices wouldn't stay working for the weeks and months on end that I crave without FAILURE_HANDLING or Mesh resets. 😁

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 7, 2025

Have you tried lowering the radio's auto-retry-count? 3 seems sensible. The current 5 (used in network layer) might be a bit high for the nodes that have a higher auto-retry-delay.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 7, 2025

Have you tried lowering the radio's auto-retry-count?

Nope, that will affect things for sure though. It will just take longer for the issue to appear. The issue will still exist.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 7, 2025

Here is the RF24 code I use to replicate the issue:

RPI Transmitter:

#include <ctime>       // time()
#include <iostream>    // cin, cout, endl
#include <string>      // string, getline()
#include <time.h>      // CLOCK_MONOTONIC_RAW, timespec, clock_gettime()
#include <RF24/RF24.h> // RF24, RF24_PA_LOW, delay()

using namespace std;

/****************** Linux ***********************/
// Radio CE Pin, CSN Pin, SPI Speed
// CE Pin uses GPIO number with BCM and SPIDEV drivers, other platforms use their own pin numbering
// CS Pin addresses the SPI bus number at /dev/spidev<a>.<b>
// ie: RF24 radio(<ce_pin>, <a>*10+<b>); spidev1.0 is 10, spidev1.1 is 11 etc..
#define CSN_PIN 0
#ifdef MRAA
    #define CE_PIN 15 // GPIO22
#elif defined(RF24_WIRINGPI)
    #define CE_PIN 3 // GPIO22
#else
    #define CE_PIN 22
#endif
// Generic:
RF24 radio(CE_PIN, CSN_PIN);
/****************** Linux (BBB,x86,etc) ***********************/
// See http://nRF24.github.io/RF24/pages.html for more information on usage
// See https://github.com/eclipse/mraa/ for more information on MRAA
// See https://www.kernel.org/doc/Documentation/spi/spidev for more information on SPIDEV

// For this example, we'll be using a payload containing
// a single float number that will be incremented
// on every successful transmission
uint8_t payload[32] = {1,2};//;

void setRole(); // prototype to set the node's role
void master();  // prototype of the TX node's behavior
void slave();   // prototype of the RX node's behavior

// custom defined timer for evaluating transmission time in microseconds
struct timespec startTimer, endTimer;
uint32_t getMicros(); // prototype to get elapsed time in microseconds

int main(int argc, char** argv)
{

    // perform hardware check
    if (!radio.begin()) {
        cout << "radio hardware is not responding!!" << endl;
        return 0; // quit now
    }

    // to use different addresses on a pair of radios, we need a variable to
    // uniquely identify which address this radio will use to transmit
    bool radioNumber = 1; // 0 uses address[0] to transmit, 1 uses address[1] to transmit

    // print example's name
    cout << argv[0] << endl;

    // Let these addresses be used for the pair
    uint8_t address[2][6] = {"1Node", "2Node"};
    // It is very helpful to think of an address as a path instead of as
    // an identifying device destination

    // Set the radioNumber via the terminal on startup
    cout << "Which radio is this? Enter '0' or '1'. Defaults to '0' ";
    string input;
    getline(cin, input);
    radioNumber = input.length() > 0 && (uint8_t)input[0] == 49;

    // save on transmission time by setting the radio to only transmit the
    // number of bytes we need to transmit a float
    //radio.setPayloadSize(sizeof(payload)); // float datatype occupies 4 bytes
    radio.enableDynamicPayloads();
    radio.setAutoAck(1);
    //radio.setAutoAck(0, 0);
    // Set the PA Level low to try preventing power supply related problems
    // because these examples are likely run with nodes in close proximity to
    // each other.
    radio.setPALevel(RF24_PA_LOW); // RF24_PA_MAX is default.

    // set the TX address of the RX node into the TX pipe
    radio.openWritingPipe(address[radioNumber]); // always uses pipe 0

    // set the RX address of the TX node into a RX pipe
    radio.openReadingPipe(1, address[!radioNumber]); // using pipe 1

    // For debugging info
    // radio.printDetails();       // (smaller) function that prints raw register values
    // radio.printPrettyDetails(); // (larger) function that prints human readable data

    // ready to execute program now
    setRole(); // calls master() or slave() based on user input
    return 0;
}

/**
 * set this node's role from stdin stream.
 * this only considers the first char as input.
 */
void setRole()
{
    string input = "";
    while (!input.length()) {
        cout << "*** PRESS 'T' to begin transmitting to the other node\n";
        cout << "*** PRESS 'R' to begin receiving from the other node\n";
        cout << "*** PRESS 'Q' to exit" << endl;
        getline(cin, input);
        if (input.length() >= 1) {
            if (input[0] == 'T' || input[0] == 't')
                master();
            else if (input[0] == 'R' || input[0] == 'r')
                slave();
            else if (input[0] == 'Q' || input[0] == 'q')
                break;
            else
                cout << input[0] << " is an invalid input. Please try again." << endl;
        }
        input = ""; // stay in the while loop
    }               // while
} // setRole()

/**
 * make this node act as the transmitter
 */

uint32_t timer = 0;
uint32_t counter = 0;

void master()
{

    srand((unsigned) time(NULL));
    unsigned int failure = 0; // keep track of failures
    while (1) {
            radio.stopListening(); // put radio in TX mode
        clock_gettime(CLOCK_MONOTONIC_RAW, &startTimer);    // start the timer
        bool report = 0; 
        payload[0] = 3;
        for(int i=0;i < 65; i++){
            uint8_t payloadSize = rand() % sizeof(payload);
            radio.writeFast(&payload, payloadSize); // transmit & save the report
            counter += payloadSize;
        }
        payload[0] = 0xFF;
        radio.writeFast(&payload, sizeof(payload));
        counter += sizeof(payload);
        report = radio.txStandBy();
        radio.startListening();
        uint32_t timerElapsed = getMicros();                // end the timer
        if (report) {
            failure = 0;
            // payload was delivered
            if(millis() - timer > 1000){
                timer = millis();
            cout << counter / 1000.0 << endl;//ransmission successful! Time to transmit = ";
            counter = 0;
            }
                        payload[0]++;// += 0.01;  
        }
        else {

            failure++;
        }

        uint32_t timer = getMicros() + 400;
        while(!radio.available()){
            if(getMicros() > timer){
                break;
            }
        }
        while(radio.available()){
            uint8_t pSize = radio.getDynamicPayloadSize();
            uint8_t data[pSize];
            //cout << "RX" << std::endl;
            radio.read(data,pSize);
        }
        


    }
    cout << failure << " failures detected. Leaving TX role." << endl;

}

/**
 * make this node act as the receiver
 */
void slave()
{
    //
  delay(1);

}

/**
 * Calculate the elapsed time in microseconds
 */
uint32_t getMicros()
{
    // this function assumes that the timer was started using
    // `clock_gettime(CLOCK_MONOTONIC_RAW, &startTimer);`

    clock_gettime(CLOCK_MONOTONIC_RAW, &endTimer);
    uint32_t seconds = endTimer.tv_sec - startTimer.tv_sec;
    uint32_t useconds = (endTimer.tv_nsec - startTimer.tv_nsec) / 1000;

    return ((seconds)*1000 + useconds) + 0.5;
}

RPi Receiver:

#include <ctime>       // time()
#include <iostream>    // cin, cout, endl
#include <string>      // string, getline()
#include <time.h>      // CLOCK_MONOTONIC_RAW, timespec, clock_gettime()
#include <RF24/RF24.h> // RF24, RF24_PA_LOW, delay()

using namespace std;

/****************** Linux ***********************/
// Radio CE Pin, CSN Pin, SPI Speed
// CE Pin uses GPIO number with BCM and SPIDEV drivers, other platforms use their own pin numbering
// CS Pin addresses the SPI bus number at /dev/spidev<a>.<b>
// ie: RF24 radio(<ce_pin>, <a>*10+<b>); spidev1.0 is 10, spidev1.1 is 11 etc..
#define CSN_PIN 0
#ifdef MRAA
    #define CE_PIN 15 // GPIO22
#elif defined(RF24_WIRINGPI)
    #define CE_PIN 3 // GPIO22
#else
    #define CE_PIN 22
#endif
// Generic:
RF24 radio(CE_PIN, CSN_PIN);
/****************** Linux (BBB,x86,etc) ***********************/
// See http://nRF24.github.io/RF24/pages.html for more information on usage
// See https://github.com/eclipse/mraa/ for more information on MRAA
// See https://www.kernel.org/doc/Documentation/spi/spidev for more information on SPIDEV

// For this example, we'll be using a payload containing
// a single float number that will be incremented
// on every successful transmission
uint8_t payload[32] = {1,2};

void setRole(); // prototype to set the node's role
void master();  // prototype of the TX node's behavior
void slave();   // prototype of the RX node's behavior

// custom defined timer for evaluating transmission time in microseconds
struct timespec startTimer, endTimer;
uint32_t getMicros(); // prototype to get elapsed time in microseconds

int main(int argc, char** argv)
{

    // perform hardware check
    if (!radio.begin()) {
        cout << "radio hardware is not responding!!" << endl;
        return 0; // quit now
    }

    // to use different addresses on a pair of radios, we need a variable to
    // uniquely identify which address this radio will use to transmit
    bool radioNumber = 1; // 0 uses address[0] to transmit, 1 uses address[1] to transmit

    // print example's name
    cout << argv[0] << endl;

    // Let these addresses be used for the pair
    uint8_t address[2][6] = {"1Node", "2Node"};
    // It is very helpful to think of an address as a path instead of as
    // an identifying device destination

    // Set the radioNumber via the terminal on startup
    cout << "Which radio is this? Enter '0' or '1'. Defaults to '0' ";
    string input;
    getline(cin, input);
    radioNumber = input.length() > 0 && (uint8_t)input[0] == 49;

    // save on transmission time by setting the radio to only transmit the
    // number of bytes we need to transmit a float
    //radio.setPayloadSize(sizeof(payload)); // float datatype occupies 4 bytes
    radio.enableDynamicPayloads();
    radio.setAutoAck(1);
    //radio.setAutoAck(0, 0);
    // Set the PA Level low to try preventing power supply related problems
    // because these examples are likely run with nodes in close proximity to
    // each other.
    radio.setPALevel(RF24_PA_LOW); // RF24_PA_MAX is default.

    // set the TX address of the RX node for use on the TX pipe (pipe 0)
    radio.stopListening(address[radioNumber]);

    // set the RX address of the TX node into a RX pipe
    radio.openReadingPipe(1, address[!radioNumber]); // using pipe 1

    // For debugging info
    // radio.printDetails();       // (smaller) function that prints raw register values
    // radio.printPrettyDetails(); // (larger) function that prints human readable data

    // ready to execute program now
    setRole(); // calls master() or slave() based on user input
    return 0;
}

/**
 * set this node's role from stdin stream.
 * this only considers the first char as input.
 */
void setRole()
{
    string input = "";
    while (!input.length()) {
        cout << "*** PRESS 'T' to begin transmitting to the other node\n";
        cout << "*** PRESS 'R' to begin receiving from the other node\n";
        cout << "*** PRESS 'Q' to exit" << endl;
        getline(cin, input);
        if (input.length() >= 1) {
            if (input[0] == 'T' || input[0] == 't')
                master();
            else if (input[0] == 'R' || input[0] == 'r')
                slave();
            else if (input[0] == 'Q' || input[0] == 'q')
                break;
            else
                cout << input[0] << " is an invalid input. Please try again." << endl;
        }
        input = ""; // stay in the while loop
    }               // while
} // setRole()

/**
 * make this node act as the transmitter
 */
void master()
{
 delay(1);
}

/**
 * make this node act as the receiver
 */
 float tester = 0.1;

uint32_t timer = 0;
uint32_t counter = 0;
uint32_t payloadCounter = 0;

void slave()
{
    
    if(millis() - timer > 1000){
        timer = millis();
        counter = 0;
    }

    radio.startListening(); // put radio in RX mode
    time_t startTimer = time(nullptr);       // start a timer
    while (time(nullptr) - startTimer < 6) { // use 6 second timeout
    
        uint8_t pipe;
        while (radio.available(&pipe)) {                        // is there a payload? get the pipe number that received it
            payloadCounter++;
            counter += 32;
            delayMicroseconds(100);
            uint8_t bytes = radio.getDynamicPayloadSize();          // get the size of the payload
            radio.read(&payload, bytes);                     // fetch payload from FIFO
            startTimer = time(nullptr);                      // reset timer
            
            if(payload[0] == 0xFF){
                payloadCounter = 0;
              radio.stopListening();
              delayMicroseconds(rand() % 100);
              radio.writeFast(&tester, sizeof(tester));
              radio.txStandBy();
              //cout << "rxtx" << std::endl;
              radio.startListening();
            }
            tester += 0.1;
        }
        
        
        //radio.setAutoAck(0, 0);
    }
    cout << "Nothing received in 6 seconds. Leaving RX role." << endl;
    radio.stopListening();
}

/**
 * Calculate the elapsed time in microseconds
 */
uint32_t getMicros()
{
    // this function assumes that the timer was started using
    // `clock_gettime(CLOCK_MONOTONIC_RAW, &startTimer);`

    clock_gettime(CLOCK_MONOTONIC_RAW, &endTimer);
    uint32_t seconds = endTimer.tv_sec - startTimer.tv_sec;
    uint32_t useconds = (endTimer.tv_nsec - startTimer.tv_nsec) / 1000;

    return ((seconds)*1000 + useconds) + 0.5;
}

Arduino Due Receiver:

/*
 * See documentation at https://nRF24.github.io/RF24
 * See License information at root directory of this library
 * Author: Brendan Doherty (2bndy5)
 */

/**
 * A simple example of sending data from 1 nRF24L01 transceiver to another.
 *
 * This example was written to be used on 2 devices acting as "nodes".
 * Use the Serial Monitor to change each node's behavior.
 */
#include <SPI.h>
#include "printf.h"
#include "RF24.h"

#define CE_PIN 7
#define CSN_PIN 8
// instantiate an object for the nRF24L01 transceiver
RF24 radio(CE_PIN, CSN_PIN);

// Let these addresses be used for the pair
uint8_t address[][6] = { "1Node", "2Node" };
// It is very helpful to think of an address as a path instead of as
// an identifying device destination

// to use different addresses on a pair of radios, we need a variable to
// uniquely identify which address this radio will use to transmit
bool radioNumber = 1;  // 0 uses address[0] to transmit, 1 uses address[1] to transmit

// Used to control whether this node is sending or receiving
bool role = false;  // true = TX role, false = RX role

// For this example, we'll be using a payload containing
// a single float number that will be incremented
// on every successful transmission
uint8_t payload[32];

void setup() {

  Serial.begin(115200);
  while (!Serial) {
    // some boards need to wait to ensure access to serial over USB
  }

  // initialize the transceiver on the SPI bus
  if (!radio.begin()) {
    Serial.println(F("radio hardware is not responding!!"));
    while (1) {}  // hold in infinite loop
  }

  // print example's introductory prompt
  Serial.println(F("RF24/examples/GettingStarted"));

  // To set the radioNumber via the Serial monitor on startup
  Serial.println(F("Which radio is this? Enter '0' or '1'. Defaults to '0'"));
  while (!Serial.available()) {
    // wait for user input
  }
  char input = Serial.parseInt();
  radioNumber = input == 1;
  Serial.print(F("radioNumber = "));
  Serial.println((int)radioNumber);

  // role variable is hardcoded to RX behavior, inform the user of this
  Serial.println(F("*** PRESS 'T' to begin transmitting to the other node"));

  // Set the PA Level low to try preventing power supply related problems
  // because these examples are likely run with nodes in close proximity to
  // each other.
  radio.setPALevel(RF24_PA_LOW);  // RF24_PA_MAX is default.

  // save on transmission time by setting the radio to only transmit the
  // number of bytes we need to transmit a float
    radio.enableDynamicPayloads();
    radio.setAutoAck(1);

  // set the TX address of the RX node for use on the TX pipe (pipe 0)
  radio.stopListening(address[radioNumber]);  // put radio in TX mode

  // set the RX address of the TX node into a RX pipe
  radio.openReadingPipe(1, address[!radioNumber]);  // using pipe 1

  // additional setup specific to the node's RX role
  if (!role) {
    radio.startListening();  // put radio in RX mode
  }

  // For debugging info
  // printf_begin();             // needed only once for printing details
  // radio.printDetails();       // (smaller) function that prints raw register values
  // radio.printPrettyDetails(); // (larger) function that prints human readable data

}  // setup
float tester = 0.1;

void loop() {

  if (role) {
    // This device is a TX node

    unsigned long start_timer = micros();                // start the timer
    bool report = radio.write(&payload, sizeof(float));  // transmit & save the report
    unsigned long end_timer = micros();                  // end the timer

    if (report) {
    } else {
      Serial.println(F("Transmission failed or timed out"));  // payload was not delivered
    }

    // to make this example readable in the serial monitor
    delay(1000);  // slow transmissions down by 1 second

  } else {
    // This device is a RX node

    uint8_t pipe;
    if (radio.available(&pipe)) {              // is there a payload? get the pipe number that received it
      uint8_t bytes = radio.getDynamicPayloadSize();  // get the size of the payload
      radio.read(&payload, bytes);             // fetch payload from FIFO
      //Serial.println("rx");

      if(payload[0] == 0xFF){
              radio.stopListening();
              delayMicroseconds(random(0,100));
              radio.writeFast(&tester, sizeof(tester));
              radio.txStandBy();
              radio.startListening();
      }
    }
  }  // role

  if (Serial.available()) {
    // change the role via the serial monitor

    char c = toupper(Serial.read());
    if (c == 'T' && !role) {
      // Become the TX node

      role = true;
      Serial.println(F("*** CHANGING TO TRANSMIT ROLE -- PRESS 'R' TO SWITCH BACK"));
      radio.stopListening();

    } else if (c == 'R' && role) {
      // Become the RX node

      role = false;
      Serial.println(F("*** CHANGING TO RECEIVE ROLE -- PRESS 'T' TO SWITCH BACK"));
      radio.startListening();
    }
  }

}  // loop

I can use normal writes or the writeFast & txStandBy() functions, it doesn't matter much.

I also have it logging the errors, and often they happen in bunches:

02:25:31.556 -> WTF OVER
02:25:38.259 -> WTF OVER
02:25:38.396 -> WTF OVER
02:25:38.655 -> WTF OVER
02:25:39.527 -> WTF OVER
02:25:39.572 -> WTF OVER
02:25:39.986 -> WTF OVER
02:25:45.895 -> WTF OVER
02:25:46.712 -> WTF OVER
02:30:17.206 -> WTF OVER
02:31:51.255 -> WTF OVER
02:32:45.424 -> WTF OVER
02:33:30.587 -> WTF OVER
02:43:56.670 -> WTF OVER
02:43:56.714 -> WTF OVER
02:43:56.714 -> WTF OVER
02:43:56.747 -> WTF OVER
02:43:58.840 -> WTF OVER

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 7, 2025

WTF OVER

🤣

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 8, 2025

Just a quick update, one Arduino device I'm monitoring, a RPi Pico (RP 2040) running RF24Ethernet & MQTT in my "Production Environment" is reporting the error. This particular device has been giving me troubles, going inactive after about a few weeks to a month or so of being restarted. I suspect this issue + the 1-second loop in RF24Network update() function prior to restarting the radio.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 10, 2025

Finally saw the issue on an AVR device, Arduino Pro Micro! This assures me slightly that the problem is with nRF24L01P radios, not our driver, but still waiting to hear back from Nordic. I'm also getting the error with what appear to be genuine E-Byte E01-2G4M27Ds on RPi too.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 10, 2025

The way I see it, it's a side effect of OTA collision resulting in tardy ACK packets processed as regular incoming.

Even if Nordic comes up with a different answer, I'd be surprised if they'll have a solution any more comprehensive than adjusting the auto-retry parameters.

I'm still interested to see what they come up with as to why the dynamic size could return a number greater than 32. I suspect distortion on MISO for that.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 10, 2025

I think you might be right. I have a feeling Nordic will take their time with this one. This is kind of a serious bug, affects a lot of devices & libraries.
I did manage to send 0-length payloads to an RF24 from nRF52840, and the '52x gets an ack, but the nRF24 just does nothing. No payload is presented to the AVR device, so no dynamic payload length...

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 10, 2025

It only took 6 years from initial verification to fixing of this bug: 🙄
nRF24/RF24Network@4a40d32

Add detection of certain failures
Certain failure modes of the radios results in available returning true
constantly. Break out of the while loop and set failureDetected to true
to enable users to reconfigure the radio.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 10, 2025

The hardest part about debugging embedded software is dealing with intangible problems. It took a lot of theorizing just to understand the cause.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 10, 2025

Me for the past 6 years on & off just to adjust a half line of code.

why-does-this-happen

me now:

why-does-this-happen

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 11, 2025

So I managed to recreate the >32 byte issue with every payload using nRF52840. The latest nrf_to_nrf PR uses the full 6-bit LFLEN field, so up to 63-byte dynamic payloads. I just sent over a few packets at 33 bytes, and got the error on every packet, along with an ACK. All it takes to recreate this issue is an invalid 6-bit payload length field (>32 bytes).

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 11, 2025

Interesting.

I just sent over a few packets at 33 bytes, and got the error on every packet, along with an ACK.

  1. Does this mean the nRF24L actually ACK'd the invalid packets from the nRF5x?
  2. Would reading 33 bytes from the RX FIFO clear the payload?

I'm surprised the nRF24L has the capability to verify the CRC of a payload longer than 32 bytes. I'm guessing the payload the RX FIFO is truncated though.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 12, 2025

  1. Yup
  2. Haven't got a chance to try that

Yeah I was surprised it worked.

@TMRh20
Copy link
Copy Markdown
Member Author

TMRh20 commented May 13, 2025

Would reading 33 bytes from the RX FIFO clear the payload?

Trying to read the FIFO in either case apparently has no effect, the FIFOs need to be flushed. Looks like its a DOS for any radios that don't validate R_RX_PL_WID size & flush accordingly.

@2bndy5
Copy link
Copy Markdown
Member

2bndy5 commented May 13, 2025

Pity. So, flushing the RX FIFO flush and adjusting the auto retry parameters are our only recourse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants