-
Notifications
You must be signed in to change notification settings - Fork 89
[agroal#662] Add PostgreSQL message logging #666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@fluca1978 is this okay? |
|
@fluca1978 after the |
|
Uhm, your PR is fine @ankushT369 but there is some issue with |
|
@fluca1978 Did @Userfrom1995 sign-off on it ? Are we closer to finding the core issue ? |
I’ll test it today and run all three pipelines with the new log tool. I’ll share the logs and any findings if I spot something useful. |
An interesting observation from @ankushT369 is that, after enabling this message logging facility, some issues are reduced. I don't think we are any closer to the solution right now, but we have another tool in the bag that can surely help. Besides, there is #667 slowing down the merge of this work. |
|
What I’m seeing is that when I enable @fluca1978 said it looks like a critical section or something alike issue — the logging changes the execution timing and masks the real problem. One idea is to add a very targeted check instead of heavy logging, something like: This way we only log when the message type is not |
|
These are the logs from all three pipelines using io_uring, along with Please look specifically for WARN messages in these logs. Logs with io_uring_log_session_non_d.log I didn’t notice any change after adding these log statements. The issue still behaves the same. The assumption that adding these log statements improves something doesn’t seem correct to me. I’m not able to see any improvement at all. @ankushT369, are you sure that enabling
One more suggestion: add the calls |
|
Hey @Userfrom1995, The thing is, @jesperpedersen asked me to add a logging function to check the message type. This is the [ankush@cognitive pgbench] ./pgbench -i -h localhost -p 2345 -U ankush ankush
dropping old tables...
creating tables...
generating data (client-side)...
pgbench: error: PQputline failed
[ankush@cognitive pgbench] ./pgbench -i -h localhost -p 2345 -U ankush ankush
dropping old tables...
creating tables...
generating data (client-side)...
ERROR: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY pgbench_accounts, line 27931
pgbench: error: PQendcopy failed
What I mean is that you should not just remove Instead, inside the function pgagroal_log_postgres(msg)replace the existing line pgagroal_log_trace("Message type: %c len: %u", msg_type, msg_len);with if (msg_type != 'd')
{
pgagroal_log_trace("Message type: %c len: %u", msg_type, msg_len);
} |
|
This is a function that we can use to help pinpoint where the bug is. Once we are done the function will remain in the code base, but never called due to its overhead. So, it is a matter of finding the call-sites, and creating output that will help us |
|
@jesperpedersen one question is this happening because of race condition or similar?? |
|
@ankushT369 It is likely socket descriptor management which is the hardest area to debug |
That’s exactly what I did to produce the non_d.logs. Also, regarding your earlier comment where you mentioned that enabling debug and trace solves the issue.I’m particularly interested in that. What you’ve shared so far looks like the same standard issue we are already facing. Could you please explain what improvement you noticed after enabling the debug and trace level logging? Specifically, I’d like to understand what changes when the debug/trace is enabled and how it helps with the issue. |
u can see its not showing anymore [ankush@cognitive pgbench] ./pgbench -i -h localhost -p 2345 -U ankush ankush
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 0.30 s (drop tables 0.01 s, create tables 0.00 s, client-side generate 0.24 s, vacuum 0.03 s, primary keys 0.02 s).
[ankush@cognitive pgbench]heres the log file: i placed it into the performance pipeline : static void
performance_client(struct io_watcher* watcher)
{
int status = MESSAGE_STATUS_ERROR;
struct worker_io* wi = NULL;
struct message* msg = NULL;
struct main_configuration* config = (struct main_configuration*)shmem;
wi = (struct worker_io*)watcher;
status = pgagroal_recv_message(watcher, &msg);
pgagroal_log_postgres(msg);
if (likely(status == MESSAGE_STATUS_OK))
{
if (likely(msg->kind != 'X'))
{
status = pgagroal_send_message(watcher, msg);
if (unlikely(status != MESSAGE_STATUS_OK))
{
goto server_error; |
|
I'm wondering if does make sense to move the Besides, @jesperpedersen , why don't make a macro that expands into |
can tell me about this more like what should i name my macro? |
|
@fluca1978 Either that or an explicit |
I would rather add the macro and the test within the function itself, something like (pseudo-code): pros:
cons:
Otherwise, define something like (pseudo-code): so that using |
|
@fluca1978 In file included from /home/ankush/pgagroal/src/libpgagroal/pipeline_perf.c:32:
/home/ankush/pgagroal/src/libpgagroal/pipeline_perf.c: In function ‘performance_client’:
/home/ankush/pgagroal/src/include/logging.h:71:34: error: statement with no effect [-Werror=unused-value]
71 | #define PGAGROAL_LOG_POSTGRES(x) 1
| ^
/home/ankush/pgagroal/src/libpgagroal/pipeline_perf.c:123:4: note: in expansion of macro ‘PGAGROAL_LOG_POSTGRES’
123 | PGAGROAL_LOG_POSTGRES(msg);
| ^~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [src/CMakeFiles/pgagroal.dir/build.make:289: src/CMakeFiles/pgagroal.dir/libpgagroal/pipeline_perf.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1155: src/CMakeFiles/pgagroal.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
[ankush@cognitive build]can i use this ((void)(x)) in place of 1?? #ifdef DEBUG
#define PGAGROAL_LOG_POSTGRES(x) pgagroal_log_postgres(x)
#else
#define PGAGROAL_LOG_POSTGRES(x) ((void)(x))
#endif |
If it works, yes, please use it. |
|
@ankushT369 commit 0f96b0b is fine for me, except the problem with |
|
@fluca1978 Once @Userfrom1995 has signed-off on it, do a deep review, and then merge |
Ok for me, so far it seems ok right now, but I'll wait for any news from @Userfrom1995 . For example something like: and then force push the changes. |
|
@ankushT369 we were both using the wrong For ubuntu, I did the following to install |
|
@fluca1978 is this ok ?? do i have to squash ? |
|
@ankushT369 Yes, always squash |
|
@jesperpedersen this is off topic can I work on other existing issue or the tests you are talking about test cases for io_uring and epoll scenarios using one or more connections.? |
|
@ankushT369 The focus right now is to get epoll, io_uring and kqueue bug-free. So, write test cases that uses the the different pipelines and use your function to list the problems found |
|
@ankushT369 See CI |
|
I'm going to reiview this soon. |
|
@ankushT369 looks good to me, but:
@jesperpedersen once commit b53bc0e has been fixed, can we merge or do you want some extra review? |
|
@fluca1978 Yes, start merging - we need clang-format 21+ |
like i have to reset only ?? or like change any file too?? |
c831444 to
33cbaa9
Compare
|
@ankushT369 commit 33cbaa9 is almost fine, but:
Adjust the above and I'll merge today, thanks. |
|
Exactly! |
|
Merged with a clean commit message. Thanks for your contribution @ankushT369 |
I learned a lot .... Thanks to you people!!! I am looking forward for more contribution. |
|
@ankushT369 sorry, I just found you are not listed in |
Fixes #661
This PR adds the lightweight PostgreSQL wire-protocol message validation and logging to improve debugging of dump-related issues.
Changes
Introduced pgagroal_log_postgresql(void*, size_t) helper.
Added dynamic buffer appending functionality.
Logs when a valid message type is encountered (e.g. 'd').
Note
This functionality is indented for debugging purposes only.