news-crawl 2.x Broken when using multiple workers (across multiple hosts) #62
Closed
alextechnology
started this conversation in
General
Replies: 1 comment
-
Have created an issue instead: #63
0 replies
-
I spent today merging our local news-crawl codebase to the 2.x branch. We run only 1 nimbus and 2 supervisors. While the topology seems to work fine with a single worker, it does not work at all with two workers despite all the configuration being the same and working fine on news-crawl 1.2.4.
The first issue seems to be related to the local worker not being able to connect to the remote supervisor/worker. Once the topology is submitted to Storm, the worker log scrolls for over a minute with 140+ attempts like the following:
2023-12-08 21:14:17.456 o.a.s.m.n.Client client-worker-1 [ERROR] connection attempt 147 to Netty-Client-xxxxxxxxxx failed: org.apache.storm.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxxxxxxxxxxxxxxx
It will eventually start trying to do the crawl anyway, but there will be lots of the following kinds of logs indicating failures:
2023-12-08 21:14:20.720 o.a.s.m.n.Client client-worker-1 [ERROR] failed to send 1 messages to Netty-Client-xxxxxxxxx java.nio.channels.ClosedChannelException
AND
The second issue (which I assume is related to the first) is that this inevitably leads to a java.util.ConcurrentModificationException that crashes the worker:
Thankfully, a single topology worker appears to run much faster in 2.x than 2 workers did in 1.2.4, so we're OK with that for the moment, but it would be nice to find out why this cannot run with 2 or more topology workers.
EDIT:
I forgot to mention that the remote worker/supervisor experiences similar errors: it does receive the topology and tries to crawl the pages, but fails with dropped messages and with failures during Worker-Transfer.
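For anyone hitting the same "Connection refused" errors, one thing that may be worth ruling out is plain TCP reachability of the worker ports between the two supervisor hosts. The following is only a rough sketch, not part of news-crawl or Storm: it assumes the workers listen on Storm's default `supervisor.slots.ports` (6700-6703), and `check_port` and `check_workers` are hypothetical helper names. Host names are passed on the command line.

```python
import socket
import sys

# Assumption: the cluster uses Storm's default worker slot ports
# (supervisor.slots.ports). Adjust if storm.yaml overrides them.
DEFAULT_WORKER_PORTS = (6700, 6701, 6702, 6703)


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def check_workers(hosts, ports=DEFAULT_WORKER_PORTS, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {
        (host, port): check_port(host, port, timeout)
        for host in hosts
        for port in ports
    }


if __name__ == "__main__":
    # Hypothetical usage: python check_ports.py supervisor-a supervisor-b
    for (host, port), ok in sorted(check_workers(sys.argv[1:]).items()):
        status = "open" if ok else "refused/unreachable"
        print(f"{host}:{port} {status}")
```

Run it from each supervisor host against the other while the topology is running. A slot port with no worker assigned will also refuse connections, so only the ports actually in use are meaningful. If the ports are reachable by IP but not by the host name shown in the Netty client log, name resolution between the hosts (for example the `storm.local.hostname` setting) may be the next thing to look at.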