Saving and loading a trained model #7

@emremrah

Description

I ran this using run.sh and trained a classification model with Spark ML. After training, I wanted to save the model.

I tried model.write().overwrite().save('spark-model'). This creates a spark-model directory, but only the "_SUCCESS" files are saved in it; no actual model files.
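For context, a bare relative path like 'spark-model' resolves against each node's local filesystem, so executors write their part files to their own disks while the driver sees only the markers. A minimal sketch of the distinction, using a hypothetical helper (not part of Spark's API) and placeholder URIs:

```python
from urllib.parse import urlparse

def is_cluster_visible(path: str) -> bool:
    """Heuristic: a path without a distributed-filesystem scheme resolves
    to each node's local disk, so driver and executors see different files."""
    scheme = urlparse(path).scheme
    return scheme in {"hdfs", "s3a", "s3", "gs", "abfs", "wasbs", "dbfs"}

# A bare relative path is NOT cluster-visible: each executor writes its
# part of the model to its own local working directory.
assert not is_cluster_visible("spark-model")

# A shared URI (placeholder host/bucket) is visible to every node:
assert is_cluster_visible("hdfs://namenode:9000/models/spark-model")

# With a shared location, the same save call should leave the full model
# in one place (requires a live SparkSession; shown for illustration):
# model.write().overwrite().save("hdfs://namenode:9000/models/spark-model")
```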

Then I checked the workers' filesystems, and the model files were there, under /home/jovyan/work on each worker:

When I collected the files into one place and tried to load the model using PipelineModel.load, I got this error:

----> 3 pipeline_model = PipelineModel.load('spark-model')

File /usr/local/spark/python/pyspark/ml/util.py:332, in MLReadable.load(cls, path)
    329 @classmethod
    330 def load(cls, path):
    331     """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 332     return cls.read().load(path)

File /usr/local/spark/python/pyspark/ml/pipeline.py:256, in PipelineModelReader.load(self, path)
    255 def load(self, path):
--> 256     metadata = DefaultParamsReader.loadMetadata(path, self.sc)
    257     if 'language' not in metadata['paramMap'] or metadata['paramMap']['language'] != 'Python':
    258         return JavaMLReader(self.cls).load(path)

File /usr/local/spark/python/pyspark/ml/util.py:525, in DefaultParamsReader.loadMetadata(path, sc, expectedClassName)
    514 """
    515 Load metadata saved using :py:meth:`DefaultParamsWriter.saveMetadata`
    516
   (...)
    522     If non empty, this is checked against the loaded metadata.
    523 """
    524 metadataPath = os.path.join(path, "metadata")
--> 525 metadataStr = sc.textFile(metadataPath, 1).first()
    526 loadedVals = DefaultParamsReader._parseMetaData(metadataStr, expectedClassName)
    527 return loadedVals

File /usr/local/spark/python/pyspark/rdd.py:1591, in RDD.first(self)
   1589 if rs:
   1590     return rs[0]
-> 1591 raise ValueError("RDD is empty")

ValueError: RDD is empty
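The traceback shows where it fails: loadMetadata reads spark-model/metadata as a text file and calls first() on it, which raises "RDD is empty" when no metadata part files are found there. A quick local check before calling PipelineModel.load can confirm whether the save actually produced those files, sketched here as a hypothetical helper (not part of Spark's API) that assumes Spark's usual on-disk layout of a metadata directory containing part files:

```python
import os

def looks_like_saved_model(path: str) -> bool:
    """Return True if `path` resembles a complete Spark ML save: a
    'metadata' subdirectory holding at least one part file. A directory
    containing only _SUCCESS markers means the real output landed
    somewhere else (e.g. on the executors' local disks)."""
    meta = os.path.join(path, "metadata")
    if not os.path.isdir(meta):
        return False
    return any(name.startswith("part-") for name in os.listdir(meta))

# Usage (illustrative):
# if looks_like_saved_model("spark-model"):
#     pipeline_model = PipelineModel.load("spark-model")
```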

How can I save and load the models without issues? Thanks.
