Should the number of contexts correspond one-to-one with the number of threads? #759

@jin-zhengnan

Describe the bug
With Mercury 2.3.1, our application uses the TensorFlow framework, which creates many threads, roughly 100 per process. If each thread creates its own Mercury context, the application crashes at run time with the following output:

[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.838843 WARNING: PRINT_BACKTRACE: get a signal(11), pid:2983066, tid:2984568 in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839184 WARNING: PRINT_BACKTRACE: symbols=0x7fb640080410 pid:2983066, tid:2984568 in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839222 WARNING: Call stack: in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839261 WARNING: /xxx/libxxx-client.so(releasexxxClient+0x82) [0x7fb8bc159882] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839290 WARNING: /usr/lib64/libc.so.6(+0x37400) [0x7fb8ba8ef400] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839319 WARNING: /xxx/mercurylib/libna.so.4(+0x1b6cb) [0x7fb8ba25e6cb] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839346 WARNING: /xxx/mercurylib/libmercury.so.2(+0x1084d) [0x7fb8ba48184d] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839373 WARNING: /xxx/mercurylib/libmercury.so.2(+0x1254a) [0x7fb8ba48354a] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839399 WARNING: /xxx/mercurylib/libmercury.so.2(HG_Core_progress+0x70) [0x7fb8ba48a840] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839433 WARNING: /xxx/mercurylib/libmercury.so.2(HG_Progress+0xe) [0x7fb8ba47a17e] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839460 WARNING: /xxx/libxxx-client.so(+0x5b301) [0x7fb8bc0ae301] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839486 WARNING: /xxx/mercurylib/libmercury_util.so.4(hg_request_wait+0xe6) [0x7fb8ba03e996] in client.c(488)
[0]:pid:2983066, tid:2984568, 08/21/24 15:26:31.839512 WARNING: /xxx/libxxx-client.so(send_cumemcpyhtodasync_v2+0x119) [0x7fb8bc0b3fa9] in client.c(488)

However, if a single context is shared by all of these threads, the application runs normally but performs poorly.
So how many contexts is it reasonable to create? Is there a recommended value?
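
To make the question concrete, here is a minimal sketch of the middle ground between one context for everything and one context per thread: a small fixed pool that threads map onto by index. It deliberately does not call Mercury. `ctx_slot`, `ctx_pool`, `POOL_SIZE`, and the helper functions are hypothetical names, the `context` field only stands in for where an `hg_context_t *` would be stored, and the pool size of 4 is an arbitrary placeholder rather than a recommended value. Whether this layout avoids the crash above, or matches Mercury's threading model, is an assumption that would need to be confirmed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical; not a recommended value */

/* One pool entry: a placeholder for a context plus a lock so that
 * only one thread uses that entry at a time. */
struct ctx_slot {
    pthread_mutex_t lock;
    void *context;      /* assumption: would hold an hg_context_t * */
    unsigned long uses; /* demo counter only */
};

struct ctx_pool {
    struct ctx_slot slots[POOL_SIZE];
};

/* Initialize every slot with an unlocked mutex and no context. */
static void ctx_pool_init(struct ctx_pool *pool)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        int rc = pthread_mutex_init(&pool->slots[i].lock, NULL);
        assert(rc == 0);
        pool->slots[i].context = NULL;
        pool->slots[i].uses = 0;
    }
}

/* Map an application thread index (0..N-1) to a pool slot. */
static size_t ctx_pool_slot_for(size_t thread_index)
{
    return thread_index % POOL_SIZE;
}

/* Lock and return the slot assigned to this thread. */
static struct ctx_slot *ctx_pool_acquire(struct ctx_pool *pool,
                                         size_t thread_index)
{
    struct ctx_slot *slot = &pool->slots[ctx_pool_slot_for(thread_index)];
    pthread_mutex_lock(&slot->lock);
    return slot;
}

/* Unlock a slot obtained from ctx_pool_acquire(). */
static void ctx_pool_release(struct ctx_slot *slot)
{
    pthread_mutex_unlock(&slot->lock);
}
```

With about 100 threads and a pool of 4, roughly 25 threads would share each slot, so the pool size trades contention against the number of contexts created; the right value would have to be found by measurement.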
