sparkdl.xgboost getting stuck trying to map partitions #248

I am running the following code to fit a model:

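```python
from sparkdl.xgboost import XgboostClassifier

# `data` is an existing Spark DataFrame with a vector column "features"
# and a label column named "objective".
param = {
    'num_workers': 4,  # number of workers on the cluster, adjust as needed
    'missing': 0,
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'featuresCol': 'features',
    'labelCol': 'objective',
    'nthread': 32,  # equal to the number of CPUs on each worker machine
}

# randomSplit normalizes weights that do not sum to 1.0,
# so [0.001, 0.001] splits `data` roughly in half.
train, test = data.randomSplit([0.001, 0.001])
xgb_classifier = XgboostClassifier(**param)
xgb_clf_model = xgb_classifier.fit(train)
```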
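One detail in the split above is easy to miss: PySpark's `DataFrame.randomSplit` documents that weights which do not sum to 1.0 are normalized. So `[0.001, 0.001]` targets roughly 50% of the rows for `train` and 50% for `test`, not 0.1% each. The plain-Python sketch below only mirrors that normalization arithmetic; `normalized_split_weights` is a hypothetical helper, not part of PySpark, and the real split is random, so actual row counts are only approximately these fractions.

```python
def normalized_split_weights(weights):
    """Hypothetical helper: the target fractions randomSplit uses after normalization."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    # Each weight is divided by the total, so the fractions sum to 1.0.
    return [w / total for w in weights]


print(normalized_split_weights([0.001, 0.001]))  # [0.5, 0.5]
print(normalized_split_weights([0.001, 0.001, 0.998]))
```

If the intent was to train on a small sample, adding a third weight for the remainder, e.g. `train, test, _ = data.randomSplit([0.001, 0.001, 0.998], seed=42)`, or using `data.sample(fraction=0.001, seed=42)` keeps the training set to about 0.1% of `data`.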

When I run the training on my Databricks cluster, it seems to get stuck while it is mapping partitions.
CPU usage on each worker is close to zero, but memory usage slowly increases.

![image](https://user-images.githubusercontent.com/55962786/183850809-9350aa7a-1e9d-4437-9731-47632fc48ea9.png)
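One possible cause worth ruling out for a stage that sits with idle CPUs: XGBoost's own PySpark integration trains in a Spark barrier stage, where all `num_workers` tasks must be running at the same time. Assuming `sparkdl.xgboost` behaves the same way, the cluster needs at least `num_workers` free task slots, and each slot uses `spark.task.cpus` cores. The sketch below is only the standard Spark slot arithmetic; both helpers and the cluster sizes are hypothetical, and on Databricks there is usually one executor per worker node.

```python
def concurrent_task_slots(num_executors, cores_per_executor, task_cpus=1):
    """Tasks Spark can run at once: each task reserves `task_cpus` cores."""
    if task_cpus <= 0:
        raise ValueError("task_cpus must be positive")
    return num_executors * (cores_per_executor // task_cpus)


def can_schedule_barrier_stage(num_workers, num_executors, cores_per_executor, task_cpus=1):
    """A barrier stage needs all `num_workers` tasks scheduled simultaneously."""
    return concurrent_task_slots(num_executors, cores_per_executor, task_cpus) >= num_workers


# Hypothetical cluster: 32-core executors, as the comments in the question suggest.
print(concurrent_task_slots(4, 32, task_cpus=1))             # 128
print(can_schedule_barrier_stage(4, 4, 32, task_cpus=32))    # True: exactly 4 slots
print(can_schedule_barrier_stage(4, 3, 32, task_cpus=32))    # False: only 3 slots
```

If the check comes out `False` for the real values of `spark.task.cpus`, executor count, and cores per executor, lowering `num_workers` or adding workers would be the first thing to try.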

Is there anything I can do to get around this issue?
