Saving and loading a trained model #7

@emremrah

Description

I ran this using run.sh and trained a classification model with Spark ML. After training, I wanted to save the model.

I tried model.write().overwrite().save('spark-model'). This creates a spark-model directory, but only the "_SUCCESS" files are saved in it; no actual model files.
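For context, a bare relative path like 'spark-model' resolves against each node's local filesystem, so executors write their part files to their own disks while the driver sees only the markers. A minimal sketch of the distinction, using a hypothetical helper (not part of Spark's API) and placeholder URIs:

```python
from urllib.parse import urlparse

def is_cluster_visible(path: str) -> bool:
    """Heuristic: a path without a distributed-filesystem scheme resolves
    to each node's local disk, so driver and executors see different files."""
    scheme = urlparse(path).scheme
    return scheme in {"hdfs", "s3a", "s3", "gs", "abfs", "wasbs", "dbfs"}

# A bare relative path is NOT cluster-visible: each executor writes its
# part of the model to its own local working directory.
assert not is_cluster_visible("spark-model")

# A shared URI (placeholder host/bucket) is visible to every node:
assert is_cluster_visible("hdfs://namenode:9000/models/spark-model")

# With a shared location, the same save call should leave the full model
# in one place (requires a live SparkSession; shown for illustration):
# model.write().overwrite().save("hdfs://namenode:9000/models/spark-model")
```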

Then I checked the workers' filesystems, and the model files were there, under /home/jovyan/work on each worker:

When I collected the files into one place and tried to load the model using PipelineModel.load, I got this error:

----> 3 pipeline_model = PipelineModel.load('spark-model')

File /usr/local/spark/python/pyspark/ml/util.py:332, in MLReadable.load(cls, path)
    329 @classmethod
    330 def load(cls, path):
    331     """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 332     return cls.read().load(path)

File /usr/local/spark/python/pyspark/ml/pipeline.py:256, in PipelineModelReader.load(self, path)
    255 def load(self, path):
--> 256     metadata = DefaultParamsReader.loadMetadata(path, self.sc)
    257     if 'language' not in metadata['paramMap'] or metadata['paramMap']['language'] != 'Python':
    258         return JavaMLReader(self.cls).load(path)

File /usr/local/spark/python/pyspark/ml/util.py:525, in DefaultParamsReader.loadMetadata(path, sc, expectedClassName)
    514 """
    515 Load metadata saved using :py:meth:`DefaultParamsWriter.saveMetadata`
    516
   (...)
    522     If non empty, this is checked against the loaded metadata.
    523 """
    524 metadataPath = os.path.join(path, "metadata")
--> 525 metadataStr = sc.textFile(metadataPath, 1).first()
    526 loadedVals = DefaultParamsReader._parseMetaData(metadataStr, expectedClassName)
    527 return loadedVals

File /usr/local/spark/python/pyspark/rdd.py:1591, in RDD.first(self)
   1589 if rs:
   1590     return rs[0]
-> 1591 raise ValueError("RDD is empty")

ValueError: RDD is empty
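The traceback shows where it fails: loadMetadata reads spark-model/metadata as a text file and calls first() on it, which raises "RDD is empty" when no metadata part files are found there. A quick local check before calling PipelineModel.load can confirm whether the save actually produced those files, sketched here as a hypothetical helper (not part of Spark's API) that assumes Spark's usual on-disk layout of a metadata directory containing part files:

```python
import os

def looks_like_saved_model(path: str) -> bool:
    """Return True if `path` resembles a complete Spark ML save: a
    'metadata' subdirectory holding at least one part file. A directory
    containing only _SUCCESS markers means the real output landed
    somewhere else (e.g. on the executors' local disks)."""
    meta = os.path.join(path, "metadata")
    if not os.path.isdir(meta):
        return False
    return any(name.startswith("part-") for name in os.listdir(meta))

# Usage (illustrative):
# if looks_like_saved_model("spark-model"):
#     pipeline_model = PipelineModel.load("spark-model")
```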

How can I save and load the models without issues? Thanks.
