
Pyspark Driver Integration errors out with py4j.Py4JException: Method attemptId([]) does not exist #1099

@amCap1712

Environment

How do you use Sentry?
Self-hosted - 9.1.2

Which SDK and version?
sentry-sdk[pyspark] == 0.20.3

Steps to Reproduce

I set up the PySpark integration as described in the official docs. So far I have added only the driver integration. Since I have not added the worker integration, I am also not passing the daemon configuration to the spark-submit script.
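
A driver-only setup along these lines would look roughly like the sketch below. It assumes the pattern from the Sentry docs, where the SDK is initialised with SparkIntegration before the SparkSession is created. The function name init_driver_sentry, the dsn argument, and the app name are illustrative placeholders, not taken from the actual job.

```python
def init_driver_sentry(dsn):
    """Initialise Sentry on the Spark driver only (no worker integration).

    Hypothetical helper for illustration; `dsn` is the project's Sentry DSN.
    """
    # Imports are local so this sketch can be loaded even where
    # sentry-sdk[pyspark] is not installed.
    import sentry_sdk
    from sentry_sdk.integrations.spark import SparkIntegration
    from pyspark.sql import SparkSession

    # Initialise the SDK first, so the integration is registered before
    # the Spark session (and its underlying context) is created.
    sentry_sdk.init(dsn=dsn, integrations=[SparkIntegration()])

    # "ExampleApp" is a placeholder application name.
    return SparkSession.builder.appName("ExampleApp").getOrCreate()
```

With only the driver integration in place, no extra daemon option is passed to spark-submit, which matches the setup described above.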

Expected Result

Sentry correctly captures and reports the errors.

Actual Result

The log is filled with errors. The crux seems to be py4j.Py4JException: Method attemptId([]) does not exist. I have attached two logs at https://gist.github.com/amCap1712/6000892a940b7c004dad28060ddfd90d : one from a run on Spark 2.4.5 and the other from a run on Spark 3.1.1. Sentry also captures and reports this error, which seems to occur while it is connecting the integration.
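
To check whether both logs fail on the same missing method, a small standard-library sketch like the following could pull the method name out of each Py4JException line. The regex and the helper name missing_methods are illustrative assumptions. The only sample input is the error message quoted in this report.

```python
import re

# Matches lines such as:
#   py4j.Py4JException: Method attemptId([]) does not exist
MISSING_METHOD = re.compile(
    r"py4j\.Py4JException: Method (\w+)\(\[(.*?)\]\) does not exist"
)


def missing_methods(log_text):
    """Return the set of (method, args) pairs reported missing in a log."""
    return {(m.group(1), m.group(2)) for m in MISSING_METHOD.finditer(log_text)}


# Sample line copied from the error reported in this issue.
sample = "py4j.Py4JException: Method attemptId([]) does not exist"
print(missing_methods(sample))
```

Running it over the text of each log and comparing the resulting sets would show whether Spark 2.4.5 and Spark 3.1.1 report the same missing method.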

I'll be happy to assist as much as I can to debug and solve this issue.
