Description
I ran this using run.sh and trained a classification model with Spark ML. After training, I wanted to save the model.
I tried model.write().overwrite().save('spark-model'). This creates a spark-model directory, but it only contains "_SUCCESS" marker files; none of the actual model files are saved.
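A quick way to confirm this symptom on the driver is to look for `metadata` directories that hold a `_SUCCESS` marker but no `part-*` file. The sketch below uses only the Python standard library and assumes the usual Spark ML save layout (a top-level `metadata/` directory plus one subdirectory per stage under `stages/`, each with its own `metadata/`); `find_incomplete_metadata` is a hypothetical helper name, not part of PySpark.

```python
import os


def find_incomplete_metadata(model_dir):
    """Return the metadata directories under model_dir that hold no part-* file.

    Assumes (not confirmed by the issue) the usual Spark ML layout, where each
    saved object has a "metadata" directory containing a part-00000 JSON file
    next to a _SUCCESS marker.
    """
    incomplete = []
    for dirpath, _dirnames, filenames in os.walk(model_dir):
        if os.path.basename(dirpath) != "metadata":
            continue
        # The JSON lives in part-* files; _SUCCESS is only an empty marker.
        if not any(name.startswith("part-") for name in filenames):
            incomplete.append(dirpath)
    return sorted(incomplete)
```

Calling `find_incomplete_metadata("spark-model")` on the driver would, in the situation described here, list every `metadata` directory that received only its `_SUCCESS` marker.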
Then I checked whether the files were on the workers, and found them under /home/jovyan/work in each worker's file system.
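One possible explanation, hedged because it depends on the output committer configuration in this image: with a plain local path such as `spark-model`, each executor writes its task output to its own disk, while the job commit runs on the driver and only sees the driver's disk. With Hadoop's `FileOutputCommitter` (algorithm version 1), that job commit is the step that moves committed task output out of `_temporary/` before writing `_SUCCESS`, so the part files on the workers may still be sitting under a `_temporary/...` subdirectory. The standard-library sketch below lists such files in a copied worker directory; `find_stranded_parts` is a hypothetical helper, not a Spark API.

```python
import os


def find_stranded_parts(model_dir):
    """List part-* files under model_dir that sit inside a _temporary directory.

    Files there were written by a task but never moved into place by a job
    commit. Because "_temporary" starts with "_", Hadoop's input listing treats
    it as hidden, so a later load will not pick these files up.
    """
    stranded = []
    for dirpath, _dirnames, filenames in os.walk(model_dir):
        # Path components relative to the model root, e.g. ["metadata", "_temporary", "0", ...]
        rel_parts = os.path.relpath(dirpath, model_dir).split(os.sep)
        if "_temporary" not in rel_parts:
            continue
        for name in filenames:
            if name.startswith("part-"):
                stranded.append(os.path.join(dirpath, name))
    return sorted(stranded)
```

If this turns out to be the cause, gathering the files by hand is fragile. The more usual approach is to save to storage that the driver and every worker see at the same path (for example HDFS, S3, or a shared volume mounted identically in all containers) and to load from that same path.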

When I collected the files into one place and tried to load the model with PipelineModel.load, I got this error:
----> 3 pipeline_model = PipelineModel.load('spark-model')

File /usr/local/spark/python/pyspark/ml/util.py:332, in MLReadable.load(cls, path)
    329 @classmethod
    330 def load(cls, path):
    331     """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 332     return cls.read().load(path)

File /usr/local/spark/python/pyspark/ml/pipeline.py:256, in PipelineModelReader.load(self, path)
    255 def load(self, path):
--> 256     metadata = DefaultParamsReader.loadMetadata(path, self.sc)
    257     if 'language' not in metadata['paramMap'] or metadata['paramMap']['language'] != 'Python':
    258         return JavaMLReader(self.cls).load(path)

File /usr/local/spark/python/pyspark/ml/util.py:525, in DefaultParamsReader.loadMetadata(path, sc, expectedClassName)
    514 """
    515 Load metadata saved using :py:meth:`DefaultParamsWriter.saveMetadata`
    516
    (...)
    522 If non empty, this is checked against the loaded metadata.
    523 """
    524 metadataPath = os.path.join(path, "metadata")
--> 525 metadataStr = sc.textFile(metadataPath, 1).first()
    526 loadedVals = DefaultParamsReader._parseMetaData(metadataStr, expectedClassName)
    527 return loadedVals

File /usr/local/spark/python/pyspark/rdd.py:1591, in RDD.first(self)
   1589 if rs:
   1590     return rs[0]
-> 1591 raise ValueError("RDD is empty")

ValueError: RDD is empty
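The traceback shows that `DefaultParamsReader.loadMetadata` reads `<path>/metadata` with `sc.textFile(metadataPath, 1).first()`, so `ValueError: RDD is empty` means the reader found no visible data in that directory. The sketch below imitates that behaviour with the standard library. It assumes Hadoop's default rule that names beginning with `_` or `.` are hidden, which is why a directory holding only `_SUCCESS` reads as empty. `read_metadata_line` is a hypothetical name, and this is an approximation, not PySpark code.

```python
import os


def read_metadata_line(model_dir):
    """Return the first line of <model_dir>/metadata, roughly like loadMetadata.

    Approximation only: files are read in sorted name order and the first
    non-empty file supplies the line, much as textFile(...).first() would.
    """
    metadata_dir = os.path.join(model_dir, "metadata")
    # Hadoop's input listing skips names starting with "_" or "." (e.g. _SUCCESS).
    visible = sorted(
        name
        for name in os.listdir(metadata_dir)
        if not name.startswith(("_", "."))
        and os.path.isfile(os.path.join(metadata_dir, name))
    )
    for name in visible:
        with open(os.path.join(metadata_dir, name)) as f:
            first = f.readline()
        if first:  # an empty file contributes no lines
            return first.rstrip("\n")
    # Same message that RDD.first() raises on an empty RDD.
    raise ValueError("RDD is empty")
```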
How can I save and load the models without issues? Thanks.