Condense Exception traceback when import torchcodec fails
#1153
| f"FFmpeg version {ffmpeg_major_version} is likely not installed or its libraries cannot be found on this system.\n", | ||
| ) | ||
| ) | ||
| else: |
Were you able to test any import failures that hit this else case?
I do not recall the particulars of the stack trace, but suppose a stack trace begins with an OSError and later lists a root cause after "The above exception was the direct cause of the following exception". In that case we would append the condensed OSError exception instead of the potentially more informative root cause.
In the scenario you mention above, where we have an OSError and then another exception stacked on top of it, we will only condense the error message if that other exception is of the form
OSError: libavcodec.so.60: cannot open shared object file: No such file or directory
i.e. only if it directly relates to libav* libraries.
In other words, we're only condensing this
FFmpeg version 7:
Traceback (most recent call last):
File "/home/nicolashug/.opt/miniconda3/envs/codec/lib/python3.11/site-packages/torch/_ops.py", line 1487, in load_library
ctypes.CDLL(path)
File "/home/nicolashug/.opt/miniconda3/envs/codec/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libavcodec.so.61: cannot open shared object file: No such file or directory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/nicolashug/.opt/miniconda3/envs/codec/lib/python3.11/site-packages/torchcodec/_core/ops.py", line 57, in load_torchcodec_shared_libraries
torch.ops.load_library(core_library_path)
File "/home/nicolashug/.opt/miniconda3/envs/codec/lib/python3.11/site-packages/torch/_ops.py", line 1489, in load_library
raise OSError(f"Could not load this library: {path}") from e
OSError: Could not load this library: /home/nicolashug/.opt/miniconda3/envs/codec/lib/python3.11/site-packages/torchcodec/libtorchcodec_core7.so
into this:
FFmpeg version 7:
Got the following exception: OSError: libavcodec.so.61: cannot open shared object file: No such file or directory
FFmpeg version 7 is likely not installed or its libraries cannot be found on this system.
Any other import error will not be condensed and will hit this else branch, showing the full logs.
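The condensing rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `condense`, the regex `_MISSING_LIBAV`, and the exact matching strategy are assumptions made for the example. Only the message strings are taken from the logs quoted above.

```python
import re
import traceback

# Hypothetical pattern (an assumption for this sketch): matches lines such as
# "OSError: libavcodec.so.61: cannot open shared object file: ..."
_MISSING_LIBAV = re.compile(
    r"^OSError: libav\w+\.so\.\d+: cannot open shared object file"
)


def condense(exc: BaseException, ffmpeg_major_version: int) -> str:
    """Return a short message when exc stems from a missing libav* library.

    Any other exception falls through to the full traceback, so nothing
    that was previously printed is hidden.
    """
    # Format the whole chain, including the "direct cause" section.
    full = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    # Search the formatted traceback line by line for the libav* failure.
    missing = next(
        (line for line in full.splitlines() if _MISSING_LIBAV.match(line)),
        None,
    )
    wraps_torchcodec = (
        isinstance(exc, OSError)
        and "Could not load this library" in str(exc)
        and "libtorchcodec" in str(exc)
    )
    if missing is not None and wraps_torchcodec:
        return (
            f"Got the following exception: {missing}\n"
            f"FFmpeg version {ffmpeg_major_version} is likely not installed "
            "or its libraries cannot be found on this system.\n"
        )
    # The else branch: keep the full logs.
    return full
```

With this shape, an OSError that does not mention a libav* library, or any non-OSError import failure, is returned unchanged.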
| if ( | ||
| isinstance(e, OSError) | ||
| and ("Could not load this library") in str(e) | ||
| and "libtorchcodec" in (str(e)) |
To define missing_ffmpeg_lib above, we search the error message line by line, but here we are searching the entire string for the two elements of the error message. Is there some reason for that?
It seems "Could not load this library" and "libtorchcodec" appear on the same line, so reusing the line-by-line pattern might help clarify the conditions we check before adding a condensed error message.
...
raise OSError(f"Could not load this library: {path}") from e
OSError: Could not load this library: /home/nicolashug/.opt/miniconda3/envs/codec/lib/python3.11/site-packages/torchcodec/libtorchcodec_core6.so
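One way to express the suggestion above is a small helper that requires both fragments to appear on the same line. This is only a sketch: `find_line` is a hypothetical name, not something defined in this PR, and the sample text uses a placeholder path.

```python
from typing import Optional


def find_line(message: str, *needles: str) -> Optional[str]:
    """Return the first line of message containing every needle, or None."""
    for line in message.splitlines():
        if all(needle in line for needle in needles):
            return line
    return None


# Sample text shaped like the tail of the traceback quoted above
# (the path is a placeholder).
sample = (
    'raise OSError(f"Could not load this library: {path}") from e\n'
    "OSError: Could not load this library: /path/to/libtorchcodec_core6.so\n"
)
torchcodec_lib_line = find_line(
    sample, "Could not load this library", "libtorchcodec"
)
```

Here the first line of the sample is skipped because it contains only one of the two fragments. A same-line check is stricter than searching the entire string.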
Dan-Flores
left a comment
This change looks good! I believe the failed CPP tests are unrelated, but please double check.
In #1138 we decided to provide the full traceback to the user when import torchcodec fails. This makes the printed debug info really long and difficult to parse.
This PR is a first attempt at condensing the error logs to make them more informative and actionable. I think what I wrote is mostly specific to Linux for now. This is just a temporary fix; the long-term solution would be to have a logging system with verbosity levels.
The main principle in this PR is that the new logic should not hide any information that was previously printed - only condense it. I think this principle is respected, but would appreciate more eyes on it.
New logs:
Previous logs: