Description
- SW cluster 1 (multiple nodes) was started by user 1, with the Flow UI service on a certain port (for example, 000.000.000.001:54321).
- For some reason (timeout, OOM, etc.), SW cluster 1 died and 000.000.000.001:54321 was released.
- In Spectrum Conductor, cluster 1 is still shown as "started", with the Flow UI link 000.000.000.001:54321.
- SW cluster 2 was started by user 2. It took 000.000.000.001:54321 and assigned its Flow UI service to this port.
- Now user 1 and user 2 both see the same cluster from Spectrum Conductor, with the Flow UI service on 000.000.000.001:54321.
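The core problem is that a status record holding only a host:port cannot tell which cluster is currently listening there. A minimal sketch of a staleness check, assuming the H2O REST endpoint `GET /3/Cloud` and its `cloud_name` field, compares the name reported by whatever answers at the recorded URL with the name recorded when the cluster started (presumably the "H2O name" Sparkling Water prints at start-up). The helpers `fetch_json` and `is_record_stale` are hypothetical and are not part of Spectrum Conductor or Sparkling Water; the fetcher is injectable so the logic can be exercised without a live cluster.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def fetch_json(url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET a URL and decode its JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_record_stale(
    flow_url: str,
    expected_cloud_name: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
) -> bool:
    """Return True if the recorded Flow URL no longer serves the expected cluster.

    Assumption: the endpoint answers GET /3/Cloud with a JSON object that
    contains a "cloud_name" field. An unreachable endpoint also counts as stale.
    """
    try:
        info = fetch(flow_url.rstrip("/") + "/3/Cloud")
    except OSError:
        # Nothing answers (URLError and timeouts are OSError subclasses):
        # the "started" status cannot be trusted.
        return True
    # Someone answers, but it may be a different cluster that reused the port.
    return info.get("cloud_name") != expected_cloud_name


if __name__ == "__main__":
    # Stub standing in for cluster 2, which now owns cluster 1's old port.
    def cluster2_stub(url: str) -> Dict[str, Any]:
        return {"cloud_name": "cluster-2"}

    print(is_record_stale("http://000.000.000.001:54321", "cluster-1", fetch=cluster2_stub))
```

With the stub, cluster 1's record is reported as stale because a different cluster name answers at its old address; against a real deployment the default `fetch_json` would be used instead.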
Sparkling Water Context:
- Sparkling Water Version: 3.40.0.1-1-2.4
- H2O name: k023042
- cluster size: 6
- list of used nodes:
(executorId, host, port)
(0,10.119.198.87,54323)
(1,10.119.198.87,54325)
(2,10.119.198.88,54323)
(3,10.119.198.88,54325)
(4,10.119.198.173,54325)
(5,10.119.198.173,54335)
Open H2O Flow in browser: https://ppvra00a0011.osds..net:54325 (CMD + click in Mac OSX)
I suspect the Flow UI crashed for some reason and port 54323 was released on Feb 20 at 05:02:30.
H2OContext has been closed! Please create a new H2OContext to a healthy and reachable (web enabled)
H2O cluster.
at ai.h2o.sparkling.H2OContext$$anon$1.run(H2OContext.scala:359)
Caused by: ai.h2o.sparkling.backend.exceptions.RestApiNotReachableException: H2O node https://10.119.198.87:54323 is not reachable.
The AIMD H2O notebook started on Feb 21 at 08:11:31, and its Flow UI bound to the freed port 54323.
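This timeline relies on ordinary operating-system behavior: once the process holding a listening port goes away, the port is free and any other process may bind it. A minimal, self-contained sketch of that effect on localhost follows; the port is picked by the OS rather than being 54323, and the "cluster 1"/"cluster 2" labels are illustrative only. It does not reproduce how Spectrum Conductor or Sparkling Water choose ports.

```python
import socket
from typing import Tuple


def grab_port(port: int = 0) -> socket.socket:
    """Open a listening TCP socket on 127.0.0.1; port 0 lets the OS pick one."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding a port whose previous owner has just closed it.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen()
    return s


def demonstrate_reuse() -> Tuple[int, int]:
    # "Cluster 1" starts its Flow UI; the port is what a status page would record.
    first = grab_port()
    recorded_port = first.getsockname()[1]
    # "Cluster 1" dies: its socket is closed and the port is released.
    first.close()
    # "Cluster 2" asks for that exact port and gets it.
    second = grab_port(recorded_port)
    reused_port = second.getsockname()[1]
    second.close()
    return recorded_port, reused_port


if __name__ == "__main__":
    print(demonstrate_reuse())
```

Both numbers printed are the same port, which is why a link recorded for cluster 1 can silently lead to cluster 2.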
Providing us with the observed and expected behavior definitely helps. Giving us the following information definitely helps:
- Sparkling Water/PySparkling/RSparkling version
- Hadoop Version & Distribution
- Execution mode: YARN-client, YARN-cluster, standalone, local, ...
- YARN logs, if running on YARN. To collect these logs, you may run `yarn logs -applicationId <application ID>`, where the application ID is displayed when Sparkling Water is started
- H2O & Spark logs, if not running on YARN. You can find these logs in the Spark work directory
- Are you using Windows/Linux/MAC?
- Spark & Sparkling Water configuration including the memory configuration
Please also provide us with the full and minimal reproducible code.