
Attempt to set our own soft limits #26

Open

jesterhodl wants to merge 2 commits into OCEAN-xyz:master from jesterhodl:set-our-own-limits

Conversation

@jesterhodl (Contributor)

  1. attempt to set our own soft limits
  2. align wording: "hard open file limit" and "soft open file limit"

We basically check whether max_clients is higher than the soft limit and, if so, set the soft limit to max_clients, or to the hard limit if max_clients is higher than that.

It looks like this in action (when I doubled the defaults):

2024-10-22 21:51:39.315 [              datum_stratum_v1_socket_server]  INFO: Stratum V1 Server Init complete.
2024-10-22 21:51:39.315 [              datum_stratum_v1_socket_server]  WARN: *** NOTE *** Max Stratum clients (2048) exceeds soft open file limit (Soft: 1024 / Hard: 1048576)
2024-10-22 21:51:39.315 [              datum_stratum_v1_socket_server]  WARN: *** NOTE *** Attempting to increase the soft limit...
2024-10-22 21:51:39.315 [              datum_stratum_v1_socket_server]  INFO: Successfully increased the soft open file limit to 2048

Now that I'm suggesting this PR: is setting the limits to max_clients sufficient? Surely not, because there are other files open.
That of course doesn't invalidate this PR, because it does what it is supposed to, but I suppose we should leave more room.

@luke-jr luke-jr force-pushed the set-our-own-limits branch from db8a38f to e924352 Compare March 8, 2025 04:56
} else if (app->max_clients > rlimit.rlim_cur) {
DLOG_WARN("*** NOTE *** Max Stratum clients (%llu) exceeds open file soft limit (Soft: %llu / Hard: %llu)", (unsigned long long)app->max_clients, (unsigned long long)rlimit.rlim_cur, (unsigned long long)rlimit.rlim_max);
DLOG_WARN("*** NOTE *** You should increase the soft open file limit to prevent issues as you approach max clients!");
DLOG_WARN("*** NOTE *** Max Stratum clients (%llu) exceeds soft open file limit (Soft: %llu / Hard: %llu)", (unsigned long long)app->max_clients, (unsigned long long)rlimit.rlim_cur, (unsigned long long)rlimit.rlim_max);
Contributor

"open file soft limit" sounded better IMO

DLOG_WARN("*** NOTE *** Attempting to increase the soft limit...");
Contributor

I don't think we need this extra line, considering we log if we succeed or fail a moment later


// Attempt to increase the soft limit to match the hard limit or max_clients, whichever is smaller
struct rlimit new_rlimit = rlimit;
Contributor

Let's just modify rlimit in place?


new_rlimit.rlim_cur = (app->max_clients < rlimit.rlim_max) ? app->max_clients : rlimit.rlim_max;
Contributor

If rlim_max is too low, we would have errored above and never gotten here...

But we should probably refactor this to increase the soft limit even if the hard limit is too low.


Contributor

Also, I suspect we actually need more than max_clients!

Contributor Author

Right, I have 3 client connections and /proc shows 35 fds. That's /dev/pts, the log, some pipes, bitcoind connections (not sure why I have 2), the pool host, clients, the API listener, and a bunch of [eventpoll] entries. Any advice on how much to account for this stuff?

0u      CHR    136,3      0t0        6 /dev/pts/3
1u      CHR    136,3      0t0        6 /dev/pts/3
2u      CHR    136,3      0t0        6 /dev/pts/3
3w      REG    259,2  4481639  7999203 /var/log/datum_gateway/dg.log
4u  a_inode     0,15        0       49 [eventpoll:5]
5u     IPv4 17508026      0t0      TCP *:23335 (LISTEN)
6u     IPv4 21423289      0t0      TCP redacted.lan:45608->137.61.196.35.bc.googleusercontent.com:28915 (ESTABLISHED)
7u  a_inode     0,15        0       49 [eventpoll:6]
8r     FIFO     0,14      0t0 17507241 pipe
9r     FIFO     0,14      0t0 17511024 pipe
10w     FIFO     0,14      0t0 17507241 pipe
11w     FIFO     0,14      0t0 17511024 pipe
12u     IPv6 22178442      0t0      TCP ip6-localhost:51958->ip6-localhost:8332 (ESTABLISHED)
13u     IPv6 19072439      0t0      TCP ip6-localhost:38598->ip6-localhost:8332 (ESTABLISHED)
14u     IPv4 17508031      0t0      TCP *:23334 (LISTEN)
15u  a_inode     0,15        0       49 [eventpoll:14]
16u     IPv4 21251850      0t0      TCP redacted.lan:23334->redacted.lan:46678 (ESTABLISHED)
17u  a_inode     0,15        0       49 [eventpoll:16]
18u     IPv4 21251851      0t0      TCP redacted.lan:23334->redacted.lan:58921 (ESTABLISHED)
19u  a_inode     0,15        0       49 [eventpoll:18]
20u     IPv4 21251852      0t0      TCP redacted.lan:23334->redacted.lan:55522 (ESTABLISHED)
21u  a_inode     0,15        0       49 [eventpoll:20]
23u  a_inode     0,15        0       49 [eventpoll]
24u  a_inode     0,15        0       49 [eventpoll]
25u  a_inode     0,15        0       49 [eventpoll]
26u  a_inode     0,15        0       49 [eventpoll]
27u  a_inode     0,15        0       49 [eventpoll]
28u  a_inode     0,15        0       49 [eventpoll]
29u  a_inode     0,15        0       49 [eventpoll]
30u  a_inode     0,15        0       49 [eventpoll]
31u  a_inode     0,15        0       49 [eventpoll]
32u  a_inode     0,15        0       49 [eventpoll]
33u  a_inode     0,15        0       49 [eventpoll]
34u  a_inode     0,15        0       49 [eventpoll]
35u  a_inode     0,15        0       49 [eventpoll]

Contributor

That's a good start, but from there, you probably need to identify in the code what each one is used for. E.g., I bet there's an eventpoll per socket thread...

Contributor Author (@jesterhodl, Mar 9, 2025)

My 35 fds, with max_clients set to 16 and 3 clients connected, break down as:

  • 3 used connections: 3 * 2 = 6 (socket + epoll)
  • 13 unused connections: 13 (just the epoll)
  • "overhead": 16

6 + 13 + 16 = 35

This "overhead" is:

  • 3 for stdin,stdout,stderr
  • 1 for log
  • 2 for datum listener + eventpoll
  • 2 for pool connection + eventpoll (if using pool)
  • 4 pipes
  • 2 sockets to bitcoind
  • 2 api listener + eventpoll

3+1+2+2+4+2+2 = 16

Naturally, a submit block JSON will open an fd, so +1, and we arrive at overhead = 17, so:

int overhead = 17;
required_fds = 2 * app->max_clients + overhead;

Now, this is too tight. I might've missed something, and your IPv6 work might change things.

I would go for a #define DATUM_FD_OVERHEAD 32 and use that.
Thoughts?
I can do a commit consisting of all of the suggested changes for preview.

Contributor

Are you sure we have an epoll per connection? I would expect it to be per thread and shared across all connections on that thread.

Contributor Author

Pretty sure, yeah, judging from the calculations. I'll do more research and educate myself on epoll and how it's coded in datum, and compare lsof results with /proc//fd (and fdinfo).

Contributor

I came up with:

  • DATUM thread: 1 epoll, ? DNS + 1 connection + 1*n curl
  • per threadpool thread: 1 epoll
  • per client: 1 for stratum
  • stratum listener thread: 1 extra to reject a client + 1 epoll + up to 2 listening
  • stdio: 3
  • cookie file: 1
  • log file: 1
  • RPC notify fallback: n curl
  • submit block: 1 for file, n curl, n curl for dedicated thread
  • API: 1 listener, 1? epoll, up to 128 client connections, 1 reject, 1 to write config
  • each curl: 1 pipe, 1 socket

I'm not sure how to explain the 13 eventpoll fds at the end of your list, though.

Contributor Author (@jesterhodl, Mar 11, 2025)

I know where the 13 are coming from. When you launch Datum, it opens the expected number of fds.
The results I got were from an instance that had been running for a long time and had clients connect and disconnect.
Observation:

  1. Launch Datum, observe the expected number of fds
  2. Make some telnet/nc connections to port 23334; it will start opening fds up to max_clients and an equivalent number of eventpolls. I have max_clients at 16, so 16 - 3 miners = 13.
  3. When those fds are closed, the eventpolls remain open and are reused for subsequent connections.
  4. Those lingering eventpolls are reused by datum connections (23334) as well as web/api (23335), but if you have, say, 13 eventpolls and make 50 web/api connections, it will use the 13 eventpolls but no more. Those two types of connections work differently.

This is based on observations and opening connections to ports and observing lsof.

Contributor

I think it would be good to keep each category separated out for future reference. So something like:

static const unsigned int fds_for_datum_thread = 1 /* epoll */ + 1 /* socket */ + (1 * fds_per_curl) + fds_for_getaddrinfo;

And since we don't necessarily control libcurl or getaddrinfo, probably should be generous with fd assumptions there.

Contributor Author

Following our chat, I pushed a function that estimates the file descriptors with a bit of contingency.
It looks at: stdio, pipes, the log, the SV1 listener + epoll, clients + the epolls of each thread, GBT and the fallback, the DATUM connection, and the API + epolls.

@luke-jr luke-jr added the enhancement New feature or request label Mar 18, 2025
@jesterhodl jesterhodl force-pushed the set-our-own-limits branch 2 times, most recently from 5ce1d72 to 6681bc2 Compare June 8, 2025 12:26
@luke-jr luke-jr added this to the v0.4.0 milestone Jun 9, 2025
@luke-jr luke-jr removed this from the v0.4.0 milestone Jul 16, 2025
1. attempt to set our own soft limits
2. align wording: "hard open file limit" and "soft open file limit"
3. define and use DATUM_API_CONNECTION_LIMIT in datum_api.c and .h
@jesterhodl (Contributor Author)

I'm wondering: should I rework it, should we scrap it, or is it OK?

@luke-jr (Contributor)

luke-jr commented Dec 16, 2025

Looks okay, I just don't know how to test it.

