incompatible sklearn/joblib version? #100

@yeus

Hi everyone,

First of all, great project ;). I am trying to get dragnet to run, but loading the pickled models fails. This is probably due to a version conflict in joblib, numpy, or sklearn. At least that is what I assume based on this Stack Overflow question:

https://stackoverflow.com/questions/48948209/keyerror-when-loading-pickled-scikit-learn-model-using-joblib

My versions of sklearn, joblib, and numpy are:

sklearn.__version__
Out: '0.19.1'

from sklearn.externals import joblib
joblib.__version__
Out: '0.14.1'

import numpy
numpy.__version__
Out: '1.17.4'
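
For anyone trying to reproduce this, the three version checks above can be collected in one go. This is only a minimal sketch: `report_versions` is a hypothetical helper name, not part of dragnet, and it reads the top-level `joblib` package rather than `sklearn.externals.joblib`.

```python
import importlib


def report_versions(names=("sklearn", "joblib", "numpy")):
    """Return {package name: version string, or None if it cannot be imported}.

    Hypothetical helper for bug reports; not part of dragnet.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Some modules do not expose __version__; report that explicitly.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into a report makes it easier to compare environments.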

I think this code section:

https://github.com/dragnet-org/dragnet/blob/master/dragnet/compat.py#L265

handles loading the pickled models for the correct version. It doesn't mention joblib, though. On my system (Ubuntu, Python 3, sklearn installed with pip), sklearn uses the system-wide joblib, so

import joblib == import sklearn.externals.joblib
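
One way to check that claim on a given machine is to print the version and file location of both import paths. This is a rough sketch, not something dragnet provides: `describe_module` is a hypothetical helper, and it assumes that an `ImportError` simply means the module is not available in the installed sklearn.

```python
import importlib


def describe_module(name):
    """Return (version, file path) for a module, or None if it cannot be imported.

    Hypothetical helper for comparing two import paths.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return (getattr(module, "__version__", None),
            getattr(module, "__file__", None))


# If both names report the same version and the same file path,
# they refer to the same installed joblib.
for name in ("joblib", "sklearn.externals.joblib"):
    print(name, "->", describe_module(name))
```

If the two paths differ, sklearn is using its own bundled copy rather than the system-wide package.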

I hope you can help me; maybe I can even contribute a little to the project. Which versions of joblib, sklearn, and numpy should I try? Here is the error:

content = extract_content(doc.summary())
Traceback (most recent call last):

  File "<ipython-input-5-882782be121c>", line 1, in <module>
    content = extract_content(doc.summary())

  File "/home/tom/.local/lib/python3.6/site-packages/dragnet/__init__.py", line 12, in extract_content
    'kohlschuetter_readability_weninger_content_model.pkl.gz')

  File "/home/tom/.local/lib/python3.6/site-packages/dragnet/util.py", line 168, in load_pickled_model
    return joblib.load(filepath)

  File "/home/tom/.local/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 605, in load
    obj = _unpickle(fobj, filename, mmap_mode)

  File "/home/tom/.local/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 529, in _unpickle
    obj = unpickler.load()

  File "/usr/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)

KeyError: 0
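
For context on what `KeyError: 0` means at that line of `pickle.py`: the pure-Python unpickler looks up each opcode byte in a dispatch table, and a byte with value 0 is not a valid opcode. The sketch below reproduces only that symptom on a hand-made byte string. It does not claim to show the cause in this report. It relies on `pickle._Unpickler`, a private name for the pure-Python implementation, and `first_opcode_error` is a hypothetical helper.

```python
import io
import pickle


def first_opcode_error(data):
    """Return the unknown opcode byte that stops the pure-Python unpickler, or None.

    Uses pickle._Unpickler (private, pure-Python implementation), which is the
    code path shown in the traceback above.
    """
    unpickler = pickle._Unpickler(io.BytesIO(data))
    try:
        unpickler.load()
    except KeyError as exc:
        # The dispatch table has no entry for this byte value.
        return exc.args[0]
    return None


print(first_opcode_error(b"\x00"))                   # a lone zero byte is not an opcode
print(first_opcode_error(pickle.dumps([1, 2, 3])))   # a valid pickle loads without KeyError
```

In other words, the unpickler reached a position in the stream where it expected an opcode and found a zero byte instead, which fits a file written in a layout the installed libraries read differently.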
