Skip to content

Conversation

@kulikjak
Copy link
Contributor

@kulikjak kulikjak commented Mar 7, 2023

When you set ulimit -n unlimited on Solaris and its derivatives (where /dev/poll is available) and import selectors, Python crashes with a MemoryError because there is no upper limit to the allocation size.

This fix adds an arbitrary limit of 2^18 (which results in roughly ~4MB of memory).

Fixes #102494

@arhadthedev arhadthedev added the stdlib Standard Library Python modules in the Lib/ directory label Apr 2, 2023
@arhadthedev
Copy link
Member

@jcea (as a Solaris expert)

@jcea
Copy link
Member

jcea commented Apr 3, 2023

For reference, documentation at https://docs.oracle.com/cd/E19253-01/816-5177/6mbbc4g9n/index.html for example.

Notice that current array plays two roles:

  1. Provide registration and removal of file descriptor information to the kernel. This operations don't require a huge array, you just flush descriptor information when the buffer is full or just right before the "poll" operation. This is already implemented in the "devpoll_flush()" function.

  2. Get the result information from the "poll" operation. This requires being able to receive, in the worst case, all the registered file descriptors. Solaris will give back information up to "n" file descriptors if not enough space is available, but I don't know what would happen if these fds become active again. Maybe other file descriptors will starve, begging for service, but not receiving it because other active fds are notified and there is not enough space to receive all pending notifications at the same time (*).

Instead of allocating a huge array and clap it to a max size, I would suggest to keep a count of active file descriptors (registrations minus unregistrations) and resize the array as needed (maybe only growing it if needed, never shrinking it).

I would suggest an initial size of 1024, growing 25% when needed.

Shrinking would be nice too, but probably unimportant.

(*) Would be quite interesting to investigate the kernel implementation. I guess that playing with a handful of fds would be enough to determine if the kernel gives back the active fds in a round robin, random or "start from the beginning until you fill the buffer" way. Some comments at https://github.com/illumos/illumos-gate/blob/2c76d75129011c98e79463bb84917b828f922a11/usr/src/uts/common/io/devpoll.c#L237 suggest that Solaris kernel gives back the active descriptor in a round robin way, so actually using a small (1024 entries?) buffer would be fine enough. See also code around line 293 and 629. This assumption needs to be tested, but the intention seems quite clear.

If this assumption holds true and you want a highly concurrent/performant implementation, you could resize the array when the returned list of active file descriptors use the full array, signaling that a bigger array could reduce syscall traffic, although if you are dealing with Python, this kind of optimizations are probably overkill.

PS: I would support a "port" interface, although Solaris is not a tier-1 platform nowadays for Python.

@kulikjak
Copy link
Contributor Author

kulikjak commented May 9, 2023

Thank you for the detailed notes; I will look into it.

I've also recently hit another issue with the Devpoll selector when working on Cheroot (cherrypy/cheroot#561), so it needs some love.

@serhiy-storchaka
Copy link
Member

Any progress? I support @jcea's idea.

@kulikjak
Copy link
Contributor Author

Uh, I am sorry - this one's gone missing from my todo list...

Thanks for the review and extensive notes. Originally, I was under the assumption (I think - it's quite some time since I filled this) that we need as big of an array as is the number of watched descriptors, but we indeed don't.

I played with devpoll a little and the active file descriptors are indeed returned in a round robin way (at least on Oracle Solaris), so we don't need to worry about starvation.

I wonder whether we even need to resize the array. The performance would probably be slightly better in cases where devpoll.poll hits the upper limit often, but then we would likely need shrinking as well, and we probably don't want to expand/shrink based on a single devpoll.poll so we would need to watch n previous polls for the number of returned descriptors and make the decision to expand or shrink based on some average. That said, it can certainly be done.

We can also provide a new method to resize the buffer - most people would likely be happy with 1024, and those with a huge number of expected active descriptors can control the size of the returned tuple themselves.

@kulikjak
Copy link
Contributor Author

And as for the port interface, I'll keep that in mind. It certainly doesn't have the highest priority, and thus I don't know when I'll have some time to look into that, but it would be a nice addition. Thanks!

@kulikjak
Copy link
Contributor Author

kulikjak commented Jul 30, 2025

So, I found the root cause of the problem I wrote about above (cherrypy/cheroot#561), and coincidentally it's relevant here.

In my testing, I saw internal_devpoll_register being called with one fd, but subsequent devpoll_flush registering a different one.

It happens when select_devpoll_poll_impl is called in one thread, then here:

        Py_BEGIN_ALLOW_THREADS
        errno = 0;
        poll_result = ioctl(self->fd_devpoll, DP_POLL, &dvp);
        Py_END_ALLOW_THREADS

right after Py_BEGIN_ALLOW_THREADS, a different thread calls internal_devpoll_register with a different descriptor, which is immediatelly overwritten with the result from the ioctl call. (I believe this can also happen in reverse - ioctl returning a descriptor, and before it reaches Py_END_ALLOW_THREADS, another thread overwrites it with one that is yet to be registered.)

Forcing a devpoll_flush after every register/unregister call seems to fix the issue, though that might still not be correct in the ioctl first, and register right after that case.

Because of that, I split the buffer into two - one for polling and one for registering/unregistering.

The one for polling now has a limit of min(limit.rlim_cur, 1024) - as mentioned in the existing comment "If we try to process more than getrlimit() fds, the kernel will give an error", which is why the limit.rlim_cur needs to be there. As I mentioned above, I am unsure whether we even need to have this buffer resizable - if so, I have a possible implementation with getter/setter for the maximum buffer size here: kulikjak@066314e

The register/unregister buffer has no specific requirements and will work no matter the size (128 seems like a good size ;)).

Let me know what you think. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting review stdlib Standard Library Python modules in the Lib/ directory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MemoryError when using selectors on Solaris

5 participants