
Conversation

@teburd
Contributor

@teburd teburd commented Oct 28, 2022

Rather than a fixed size queue, and with it the possibility of running
out of pre-allocated spots, each iodev now holds a wait-free mpsc
queue head.

This changes the parameter of iodev submit to be a struct
containing the 3 pointers for the rtio context, the submission queue entry,
and the queue pointer needed for the mpsc queue.

This solves the problem involving busy iodevs working with real
devices. For example a busy SPI bus driver could enqueue, without locking,
a request to start once the current request is done.

The queue entries are expected to be owned and allocated by the
executor rather than the iodev. This helps consolidate potential
tuning knobs in one place: the RTIO context and its executor, which an
application uses directly.

As the test case shows, iodevs can operate effectively lock-free
with the mpsc queue and a single atomic denoting the current task.

Signed-off-by: Tom Burdick [email protected]
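
A minimal sketch of the idea, assuming illustrative names rather than the exact structs and functions merged by this PR: each iodev holds a wait-free MPSC queue head, and submit receives one pointer that bundles the rtio context, the SQE, and the queue node.

/* Illustrative sketch only -- names are assumptions, not the merged API. */
#include <stddef.h>

struct example_mpsc_node {
	struct example_mpsc_node *next;
};

struct example_rtio; /* RTIO context: completion queue, executor */
struct example_sqe;  /* submission queue entry */

/* The single argument handed to an iodev's submit(): it bundles the
 * pointers described above.
 */
struct example_iodev_sqe {
	const struct example_sqe *sqe;  /* what to do */
	struct example_rtio *r;         /* where the completion goes */
	struct example_mpsc_node q;     /* links into the iodev's MPSC queue */
};

struct example_iodev {
	struct example_mpsc_node *head; /* per-iodev wait-free queue head */
};

/* Hypothetical wait-free push; one way to implement it is sketched after
 * the MPSC commit message further down.
 */
void example_mpsc_push(struct example_iodev *iodev, struct example_mpsc_node *n);

static void example_iodev_submit(struct example_iodev *iodev,
				 struct example_iodev_sqe *iodev_sqe)
{
	/* No fixed-size slots to run out of: a busy SPI/I2C driver can
	 * enqueue follow-up work from any context without locking.
	 */
	example_mpsc_push(iodev, &iodev_sqe->q);
}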

@teburd teburd requested a review from nashif as a code owner October 28, 2022 21:03
@teburd teburd changed the title rtio: wait-free MPSC queue for iodevs rtio: MPSC queues for iodevs Oct 28, 2022
@teburd teburd mentioned this pull request Oct 28, 2022
@teburd teburd force-pushed the rtio_iodev_mpsc branch 4 times, most recently from 4b0e244 to 763115f Compare October 31, 2022 19:00
Contributor

@nordic-krch nordic-krch left a comment


It looks ok.

Contributor


Have you considered using the ztress framework? It may be useful for validating thread safety as it can force multiple preemptions. One thing to know when using ztress is to increase the sys clock frequency for efficient stressing.
More info: https://docs.zephyrproject.org/latest/develop/test/ztest.html#stress-test-framework

Contributor Author


That's a great idea, I'll give it a whirl. Thanks for taking a look!

@nordic-krch
Contributor

A few thoughts regarding future rtio features:

  • For i2c we probably need an option to combine multiple events into one. There is TX-RX or TX-TX with repeated start, etc. Currently there is a single mpsc sqe per rtio, so it's not possible to schedule more than one. Maybe some chaining of events could be added (an mpsc_sqe pointer to an rtio_sqe which points to the next rtio_sqe, etc.).

  • I would like to be able to get an entry in the cq only on the last event. Let's say I do an initialization of a sensor which consists of 10 i2c writes. I don't want to wake up the thread on every completion (via the semaphore) but only on the last one or on an error. It could probably be achieved through some flags.

  • It would be nice to get some instrumentation for measuring spsc utilization.

  • I would like to be able to add a delay as one of the events, to be able to execute something like: trigger the sensor, wait for some time, read the results (waking up only when the full sequence is completed, see point 2). This could be done through some iodev using k_timer or a scheduled k_work that would sort delay requests.

  • How will the i2c dev id be passed to the i2c iodev?

  • How are iodev specific flags passed?

@teburd
Contributor Author

teburd commented Nov 7, 2022

A few thoughts regarding future rtio features:

* For i2c we probably need an option to combine multiple events into one. There is TX-RX or TX-TX with repeated start, etc. Currently there is a single mpsc sqe per rtio, so it's not possible to schedule more than one. Maybe some chaining of events could be added (an mpsc_sqe pointer to an rtio_sqe which points to the next rtio_sqe, etc.).

I agree, there are a few options here.

One option might be to add the needed parameters to rtio_sqe so it matches up better with i2c_msg and a spi_tx+spi_rx buf pair in one. Then consider RTIO_SQE_CHAINED operations on the same iodev in sequence to implicitly be a transaction of sorts. I don't think chaining across different iodevs while blocking the device from being used by others makes sense.

Another option would be to add new op codes (RTIO_OP_I2C_TRANSFER, RTIO_OP_SPI_TRANSCEIVE) that take the same basic parameters the i2c_transfer() and spi_transceive() API calls take today, meaning each op is a transactional element.

The benefit of the new op codes is that they may make it easier to convert existing API calls into RTIO queue entries, so mixing and matching threads making blocking device calls with RTIO async chains is possible. It also means the drivers already work that way, so there's potentially less work involved in adding RTIO support. So there are lots of potential pluses in that direction.

The API is experimental, and usage with SPI specifically is next on my todo list, so feedback here is really appreciated. Let me know what makes sense from your perspective.
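
As a rough, non-authoritative sketch of the second option: a transactional op code whose payload mirrors what i2c_transfer() takes today. The op codes, struct layout, and field names below are hypothetical; only struct i2c_msg and i2c_transfer()'s parameters are existing Zephyr API.

/* Hypothetical sketch: these op codes and fields do not exist yet. */
#include <stdint.h>
#include <zephyr/drivers/i2c.h>

enum example_rtio_op {
	EXAMPLE_RTIO_OP_NOP,
	EXAMPLE_RTIO_OP_RX,
	EXAMPLE_RTIO_OP_TX,
	/* transactional ops mirroring today's blocking calls */
	EXAMPLE_RTIO_OP_I2C_TRANSFER,   /* i2c_transfer()-shaped */
	EXAMPLE_RTIO_OP_SPI_TRANSCEIVE, /* spi_transceive()-shaped */
};

struct example_rtio_sqe {
	uint8_t op;
	uint8_t flags;    /* e.g. a CHAINED flag for implicit transactions */
	void *userdata;
	union {
		struct {
			uint8_t *buf;
			uint32_t len;
		} rxtx;
		struct {
			/* the same parameters i2c_transfer() takes today, so
			 * an existing driver path can service the whole op as
			 * one transaction
			 */
			struct i2c_msg *msgs;
			uint8_t num_msgs;
			uint16_t addr;
		} i2c;
	};
};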

* I would like to be able to get an entry in `cq` only on the last event. Let's say I do an initialization of a sensor which consists of 10 i2c writes. I don't want to wake up the thread on every completion (via the semaphore) but only on the last one or on an error. It could probably be achieved through some flags.

You mentioned before that you'd like to get a callback as well. The answer would be to add RTIO op codes and chain your operations into the one you want to wait on. An RTIO_OP_SEM_GIVE could be available where the struct rtio_iodev is replaced with a struct k_sem *; the executors would need to understand how to handle this more directly, which isn't a bad thing. There are other ops (delays, deadlines, callbacks) that would be useful in the same way. Some common code could be shared.

The way the current rtio_submit() and rtio_cqe_consume_block() calls work today with a semaphore is less than ideal. But the API doesn't stipulate that a semaphore is used internally; it just happened to be a quick and easy way of making it work.

Ideally struct rtio would have its own _wait_q and semantics around it for doing what rtio_submit does today. Alternatively you could do the above: not wait on the rtio context at all and add an RTIO_OP_SEM_GIVE with a semaphore. Some better logic on what to do when a previous SQE errors would be needed to make that work the way I think we'd all expect it to.

* It would be nice to get some instrumentation for measuring spsc utilization.

What sort of instrumentation are you looking for, something similar to k_thread stack usage with a current/max kind of thing?

* I would like to be able to add a delay as one of the events, to be able to execute something like: trigger the sensor, wait for some time, read the results (waking up only when the full sequence is completed, see point 2). This could be done through some iodev using k_timer or a scheduled k_work that would sort delay requests.

Absolutely, that's part of the known missing features. An RTIO_OP_DELAY with a k_timer object rather than a struct rtio_iodev would be needed for that. The same is true of work queues; an RTIO_OP_WORK_SUBMIT could exist and happen at the tail end of a chain, at the start, or both.
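
A hedged sketch of how such a delay op could be serviced. RTIO_OP_DELAY does not exist yet; only the k_timer calls and CONTAINER_OF below are real Zephyr API, and the completion hook is a made-up placeholder for however the executor resumes the chain.

#include <zephyr/kernel.h>

struct example_delay_req {
	struct k_timer timer;
	void (*complete)(void *userdata); /* hypothetical resume hook */
	void *userdata;
};

static void example_delay_expired(struct k_timer *timer)
{
	struct example_delay_req *req =
		CONTAINER_OF(timer, struct example_delay_req, timer);

	/* Runs in timer ISR context: complete the delay SQE so the next
	 * SQE in the chain (e.g. the sensor read) can be submitted.
	 */
	req->complete(req->userdata);
}

static void example_delay_start(struct example_delay_req *req, uint32_t ms)
{
	k_timer_init(&req->timer, example_delay_expired, NULL);
	k_timer_start(&req->timer, K_MSEC(ms), K_NO_WAIT);
}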

* How will the i2c dev id be passed to the i2c iodev?

My basic plan was something like...

struct my_driver_data {
#ifdef CONFIG_RTIO
	/* one iodev embedded per driver instance */
	struct rtio_iodev iodev;
#endif
	/* ... the usual driver data ... */
};

int my_driver_iodev_submit(...)
{
	/* recover the instance from the embedded iodev, e.g. something like
	 * CONTAINER_OF(iodev, struct my_driver_data, iodev); the exact
	 * arguments depend on what submit ends up receiving
	 */
	struct my_driver_data *drv_data = CONTAINER_OF(..);

	/* ... queue or start the transfer using drv_data ... */
}

* How are iodev specific flags passed?

A member could be added to rtio_sqe to contain iodev specific flags if needed. The size of this struct, while using RAM, isn't likely to be a huge deal in my opinion, as each context is likely to have only a small queue of them, with perhaps a few contexts per application on small devices. That's still much smaller than a complete thread stack.

There is a flags field already with only a single bit used. Half of it could easily be set aside for things like i2c flags if that's what you had in mind.

@teburd teburd requested review from gmarull and yperess December 2, 2022 00:43
@teburd
Contributor Author

teburd commented Jan 5, 2023

Some neat ideas came from a small discussion group.

Re-using the userdata pointer to point to the next sqe in the chain was one of them, and it's really interesting. By doing that, the device queue can have whole transactions added atomically in a single mpsc queue insert.

Some further thoughts on this: there is one potential issue in that not all rtio_sqes in a chain are currently guaranteed to operate on the same device. That could be a restriction we enforce, though, if it greatly simplifies the common uses, which I think it would.
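
A hedged sketch of the idea, with illustrative names only: the userdata pointer of each SQE doubles as a "next" link while the entry is queued, so the whole transaction is a single node from the MPSC queue's point of view and lands in the iodev queue with one push.

/* Illustrative only -- names are assumptions, not the merged API. */
#include <stddef.h>
#include <stdint.h>

struct example_sqe {
	uint8_t op;
	/* While the entry sits in an iodev queue, userdata doubles as the
	 * link to the next SQE of the same transaction.
	 */
	void *userdata;
};

/* Hypothetical single-node push into the iodev's MPSC queue. */
void example_iodev_push(void *iodev, struct example_sqe *head);

static void example_submit_transaction(void *iodev,
				       struct example_sqe *tx,
				       struct example_sqe *rx)
{
	tx->userdata = rx;   /* chain: tx -> rx */
	rx->userdata = NULL; /* end of the transaction */

	/* One atomic insert publishes the whole transaction; the consumer
	 * walks the userdata links when it drains this node.
	 */
	example_iodev_push(iodev, tx);
}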

@teburd teburd added the DNM This PR should not be merged (Do Not Merge) label Jan 12, 2023
@teburd teburd removed the DNM This PR should not be merged (Do Not Merge) label Jan 17, 2023
@teburd teburd force-pushed the rtio_iodev_mpsc branch 3 times, most recently from 33881c7 to 32d26e3 Compare February 9, 2023 18:43
@teburd
Contributor Author

teburd commented Feb 9, 2023

I'm pushing off the changes to enable polling and the transactional work mentioned above. This still gets us a big step forward.

I have an in-progress branch with a real board and a real sensor where I'm working towards enabling RTIO with SPI; this is the stable set of changes that I need for that.

See https://github.com/tbursztyka/zephyr/tree/tdk_icm42688p_rtio for how this is used with a real sensor at the moment (though still with a thread to do the SPI, since I haven't finished that bit just yet)

@teburd teburd added this to the v3.4.0 milestone Feb 9, 2023
@teburd teburd requested a review from nandojve February 9, 2023 19:56
Comment on lines -76 to +78
Member


This is nice

yperess
yperess previously approved these changes Feb 21, 2023
SPSC initializer macro had unused parameters defined and documented. Remove
those as they are unused.

Signed-off-by: Tom Burdick <[email protected]>
Adds a lock free/wait free MPSC queue to the rtio subsystem.

While the SPSC ring queue is fast and cache friendly, it doesn't work for
all scenarios, particularly the case where multiple rtio contexts are
attempting to work with a single iodev. An MPSC queue works perfectly in
this scenario.

Signed-off-by: Tom Burdick <[email protected]>
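
For context, a hedged illustration of one classic way to build such a wait-free-producer, single-consumer intrusive queue: a Vyukov-style MPSC with a stub node, written here with C11 atomics. Zephyr's actual rtio_mpsc implementation may differ in naming and detail.

#include <stdatomic.h>
#include <stddef.h>

struct mpsc_node {
	_Atomic(struct mpsc_node *) next;
};

struct mpsc_queue {
	_Atomic(struct mpsc_node *) head; /* producer end */
	struct mpsc_node *tail;           /* consumer end */
	struct mpsc_node stub;            /* always-present dummy node */
};

static void mpsc_init(struct mpsc_queue *q)
{
	atomic_init(&q->stub.next, NULL);
	atomic_init(&q->head, &q->stub);
	q->tail = &q->stub;
}

/* Producers: wait-free, callable concurrently from threads and ISRs. */
static void mpsc_push(struct mpsc_queue *q, struct mpsc_node *n)
{
	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);

	struct mpsc_node *prev =
		atomic_exchange_explicit(&q->head, n, memory_order_acq_rel);

	/* Link the previous head to the new node; until this store lands a
	 * consumer simply sees the queue as ending early.
	 */
	atomic_store_explicit(&prev->next, n, memory_order_release);
}

/* Single consumer: returns NULL when empty or when a push is mid-flight. */
static struct mpsc_node *mpsc_pop(struct mpsc_queue *q)
{
	struct mpsc_node *tail = q->tail;
	struct mpsc_node *next = atomic_load_explicit(&tail->next, memory_order_acquire);

	if (tail == &q->stub) {
		if (next == NULL) {
			return NULL; /* empty */
		}
		q->tail = next; /* skip over the stub */
		tail = next;
		next = atomic_load_explicit(&tail->next, memory_order_acquire);
	}

	if (next != NULL) {
		q->tail = next;
		return tail;
	}

	struct mpsc_node *head = atomic_load_explicit(&q->head, memory_order_acquire);

	if (tail != head) {
		return NULL; /* a producer is between exchange and link; retry later */
	}

	mpsc_push(q, &q->stub); /* re-insert the stub so tail can advance */

	next = atomic_load_explicit(&tail->next, memory_order_acquire);
	if (next != NULL) {
		q->tail = next;
		return tail;
	}

	return NULL;
}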
By using an mpsc queue for each iodev, the iodev itself is shareable across
contexts. Since it's lock free, submits may occur even from an ISR context.

Rather than a fixed size queue, and with it the possibility of running
out of pre-allocated spots, each iodev now holds a wait-free mpsc
queue head.

This changes the parameter of iodev submit to be a struct containing 4
pointers for the rtio context, the submission queue entry, and the mpsc
node for the iodev's submission queue.

This solves the problem involving busy iodevs working with real
devices. For example a busy SPI bus driver could enqueue, without locking,
a request to start once the current request is done.

The queue entries are expected to be owned and allocated by the
executor rather than the iodev. This helps consolidate potential
tuning knobs in one place: the RTIO context and its executor, which an
application uses directly.

As the test case shows, iodevs can operate effectively lock-free
with the mpsc queue and a single atomic denoting the current task.

Signed-off-by: Tom Burdick <[email protected]>
MaureenHelm
MaureenHelm previously approved these changes Feb 22, 2023
The Renode platform fails the test despite it working well on qemu riscv.
Ignore this particular platform for now.

Signed-off-by: Tom Burdick <[email protected]>
Noticed the tests were a bit verbose and saw a few stray printks. Drop those
as they aren't really needed and can potentially cause testing issues, since
printk is a potential synchronization point.

Signed-off-by: Tom Burdick <[email protected]>
@teburd
Contributor Author

teburd commented Feb 22, 2023

I ended up excluding a Renode platform that fails the test for now; something to look at later. Also cleaned up and removed a few printks in the tests.

@teburd teburd requested a review from MaureenHelm February 22, 2023 22:00
@teburd
Contributor Author

teburd commented Feb 28, 2023

I have a set of changes waiting on this that add transactions (for i2c/spi), plus ongoing work adding SPI support and an initial driver using the sam spi driver on the tdk robokit1.

@yperess @nordic-krch please re-review when you have a chance, thanks!

struct rtio_iodev_sqe task;
};

/**
Contributor


A comment on unchanged content, but leaving it anyway.
Imo, this header should only have:

struct rtio_simple_executor {
...
};
extern const struct rtio_executor_api z_rtio_simple_api;

And maybe an initializer macro that would set the proper API.
The API variable should be created in a c file, as currently anyone including this header gets their own copy. This way the functions can be private to the c file.

Contributor Author


This is a good point, thanks! Will create a PR that fixes this.

@@ -0,0 +1,156 @@
/*
Contributor


Shouldn't this file be renamed and moved to a more generic location? This is a generic mpsc queue.

We can do this in the next step.

Contributor Author

@teburd teburd Mar 2, 2023


Yeah definitely, the two atomic queues are useful beyond rtio. Will do in a follow-up PR.

@carlescufi carlescufi merged commit e6596d7 into zephyrproject-rtos:main Mar 3, 2023
@teburd teburd deleted the rtio_iodev_mpsc branch March 3, 2023 13:23