Standardization can fail with Task not serializable when casting to date is requested #57

@yruslan

Description

Describe the bug

When converting an integer to a date with the pattern yyyyMM (possibly YYYYMM), Standardization threw a long error message starting with:

Job aborted. (Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: za.co.absa.standardization.config.DefaultStandardizationConfig$
Serialization stack:
  - object not serializable (class: za.co.absa.standardization.config.DefaultStandardizationConfig$, value: za.co.absa.standardization.config.DefaultStandardizationConfig$@3ba3e058)
  - element of array (index: 3)
  - array (class [Ljava.lang.Object;, size 5)
  - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
  - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class za.co.absa.standardization.udf.UDFBuilder$, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic za/co/absa/standardization/udf/UDFBuilder$.$anonfun$stringUdfViaNumericParser$1:(Lza/co/absa/standardization/types/parsers/NumericParser;ZLjava/lang/String;Lza/co/absa/standardization/config/StandardizationConfig;Lscala/Option;Ljava/lang/String;)Lza/co/absa/standardization/udf/UDFResult;, instantiatedMethodType=(Ljava/lang/String;)Lza/co/absa/standardization/udf/UDFResult;, numCaptured=5])
  - writeReplace data (class: java.lang.invoke.SerializedLambda)
  - object (class za.co.absa.standardization.udf.UDFBuilder$$$Lambda$5214/360225670,
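
The serialization stack indicates that the lambda created in UDFBuilder captured a StandardizationConfig instance (DefaultStandardizationConfig$) that is not serializable. The failure mode can be reproduced outside Spark with plain Java serialization. The sketch below is only an illustration: Config, SerFunction, and the method names are hypothetical stand-ins, not the library's code, and capturing an extracted value is one common workaround rather than the project's confirmed fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class CaptureDemo {
    // Hypothetical stand-in for a config object that does not implement Serializable.
    static class Config {
        final String defaultTimeZone;
        Config(String defaultTimeZone) { this.defaultTimeZone = defaultTimeZone; }
    }

    // A Function that is also Serializable, so lambdas of this type can be serialized.
    interface SerFunction<A, B> extends Function<A, B>, Serializable {}

    // Captures the whole Config object: serializing this lambda fails.
    static SerFunction<String, String> capturingConfig(Config config) {
        return s -> s + "@" + config.defaultTimeZone;
    }

    // Reads the needed String first, so only a serializable value is captured.
    static SerFunction<String, String> capturingValue(Config config) {
        final String tz = config.defaultTimeZone;
        return s -> s + "@" + tz;
    }

    // Returns true if Java serialization of the object succeeds.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Config config = new Config("UTC");
        System.out.println(isSerializable(capturingConfig(config))); // false
        System.out.println(isSerializable(capturingValue(config)));  // true
    }
}
```

Both lambdas compute the same result; only the set of captured arguments differs, which is what the capturedArgs entry in the trace points at.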

To Reproduce

Steps to reproduce the behavior OR commands run:

  1. Go to '...'
  2. Use value '...'
  3. Run using
  4. See error

Also try an integer field with date metadata, for example:

{"name": "MY_DATE", "type": "integer", "metadata": {"pattern": "yyyyMM", "timezone": "Africa/Johannesburg"}, "nullable": true}

Expected behavior

  • Standardization should report data errors when a value does not match the date format, rather than failing the pipeline.
  • If that is not an option, the error message should at least say which column or condition caused the error.
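
As a rough illustration of the first expectation, the sketch below parses an integer such as 202301 with the pattern yyyyMM and returns an error that names the column instead of throwing. It uses only java.time and is not the library's implementation: ParseResult and parseYearMonth are hypothetical names, mapping the month to its first day is an assumption, and the timezone metadata is ignored.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

public class DateErrorDemo {
    // Hypothetical result type: holds either a parsed date or an error message.
    static final class ParseResult {
        final Optional<LocalDate> value;
        final Optional<String> error;

        ParseResult(Optional<LocalDate> value, Optional<String> error) {
            this.value = value;
            this.error = error;
        }
    }

    // Parses an integer like 202301 using the given pattern. A failure is
    // returned as an error naming the column, so the caller can record it
    // per row instead of aborting the whole job.
    static ParseResult parseYearMonth(String column, int raw, String pattern) {
        try {
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
            YearMonth ym = YearMonth.parse(Integer.toString(raw), fmt);
            // Assumption: a year-month value maps to the first day of that month.
            return new ParseResult(Optional.of(ym.atDay(1)), Optional.empty());
        } catch (DateTimeParseException | IllegalArgumentException e) {
            String msg = "Column '" + column + "': cannot parse '" + raw
                    + "' with pattern '" + pattern + "'";
            return new ParseResult(Optional.empty(), Optional.of(msg));
        }
    }

    public static void main(String[] args) {
        System.out.println(parseYearMonth("MY_DATE", 202301, "yyyyMM").value); // Optional[2023-01-01]
        System.out.println(parseYearMonth("MY_DATE", 202313, "yyyyMM").error);
    }
}
```

The second call uses month 13, so it returns an error message that mentions MY_DATE rather than raising an exception.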

Screenshots

If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Versions of libraries (Spark, Scala, ...)
  • Version [e.g. 22]

Additional context

Add any other context about the problem here.

Labels

bug: Something isn't working
