Hi @JGAUG26, I'm having the same issue. How did you resolve it? Thanks!
I'm encountering a binary incompatibility issue between scikit-learn and numpy while running a script in a Python environment. The error occurs when trying to import CountVectorizer from sklearn.feature_extraction.text. Below is the error traceback:
Traceback (most recent call last):
  File "/tmp/dolphinscheduler/exec/process/root/1/14927164157472_1/564/683/py_564_683.py", line 13, in <module>
    from sklearn.feature_extraction.text import CountVectorizer
  File "/scity/miniconda3/envs/dp-pdfocr/lib/python3.9/site-packages/sklearn/__init__.py", line 82, in <module>
    from .base import clone
  File "/scity/miniconda3/envs/dp-pdfocr/lib/python3.9/site-packages/sklearn/base.py", line 17, in <module>
    from .utils import _IS_32BIT
  File "/scity/miniconda3/envs/dp-pdfocr/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 19, in <module>
    from .murmurhash import murmurhash3_32
  File "sklearn/utils/murmurhash.pyx", line 1, in init sklearn.utils.murmurhash
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Environment Details:
numpy version: 1.26.4
scikit-learn version: 1.0
Python version: 3.9
OS: Linux (running on server via dolphinscheduler)
I have tried upgrading scikit-learn and downgrading numpy to older versions, but the issue persists. I've also attempted recompiling scikit-learn using --no-binary :all: but that did not resolve the problem.
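Since the scheduler may not launch the script with the same interpreter or environment I use interactively, a small check like the one below could confirm which numpy and scikit-learn installations the job actually sees. This is only a diagnostic sketch: the helper name `describe_package` is made up for illustration, and it deliberately avoids importing sklearn (which fails) by reading package metadata and module specs instead. A message like "Expected 96 ... got 88" usually suggests the compiled extension was built against a different numpy than the one loaded at runtime, so a mismatch in the output here would be a useful clue.

```python
import importlib.metadata
import importlib.util
import sys


def describe_package(dist_name, module_name=None):
    """Return the installed version and file location of a package.

    The package is NOT imported: the version comes from the distribution
    metadata and the location from the module spec. `describe_package` is
    a hypothetical helper, not part of numpy or scikit-learn.
    """
    module_name = module_name or dist_name
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None
    return {"name": dist_name, "version": version, "location": location}


if __name__ == "__main__":
    # Shows which interpreter the scheduler actually launched.
    print("interpreter:", sys.executable)
    for dist, module in [("numpy", "numpy"), ("scikit-learn", "sklearn")]:
        print(describe_package(dist, module))
```

Running this as the first step of the scheduled task, and again from an interactive shell in the dp-pdfocr environment, would show whether both resolve to the same site-packages directory and the same versions.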
Code Snippet (for context):
Here's a simplified version of the operator I'm running, which uses whisper for audio transcription (note that the issue does not directly relate to this code but rather to the environment setup):
import os
import whisper

def read_audio_file(file_path):
    """Reads the audio file."""
    with open(file_path, "rb") as file:
        return file.read()

def write_text_file(file_path, text):
    """Writes the transcribed text to the output file."""
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(text)

def clean_files(nas_source_path, nas_converted_path, config: dict):
    """Processes audio files and transcribes them using Whisper."""
    files = os.listdir(nas_source_path)
Can you suggest how I can resolve this binary incompatibility issue between numpy and scikit-learn? Any guidance or suggestions would be greatly appreciated. Thank you!