[Bug]: [Python SDK] Beam.FlatMap doesn't correctly parse TypeVar typehints #34732

@hjtran

What happened?

If the callable passed to FlatMap is annotated with a TypeVar, Beam sometimes fails to infer the correct output type hint. Example:

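import typing

import apache_beam as beam
from apache_beam import typehints

# Downstream transform that declares it accepts only str elements.
@typehints.with_input_types(str)
class FooTransform(beam.PTransform):
  def expand(self, pcoll):
    return pcoll

T = typing.TypeVar('T')

# Identity callable annotated with a TypeVar rather than a concrete type.
def list_str_identity(s: T) -> T:
  return s

with beam.Pipeline() as p:
  # The single input element is a list of str; FlatMap should emit str.
  (p | beam.Create([['hello world']])
     | beam.FlatMap(list_str_identity)
     | FooTransform()
     | beam.LogElements())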

This should log "hello world", but instead results in the following error:

Output: Traceback (most recent call last):
  File "/opt/playground/backend/executable_files/99dcf1c5-5131-46fe-b66a-f572a0d45ded/99dcf1c5-5131-46fe-b66a-f572a0d45ded.py", line 44, in <module>
    (p | beam.Create([['hello world']])
  File "/usr/local/lib/python3.10/site-packages/apache_beam/pvalue.py", line 138, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/usr/local/lib/python3.10/site-packages/apache_beam/pipeline.py", line 781, in apply
    transform.type_check_inputs(pvalueish)
  File "/usr/local/lib/python3.10/site-packages/apache_beam/transforms/ptransform.py", line 465, in type_check_inputs
    self.type_check_inputs_or_outputs(pvalueish, 'input')
  File "/usr/local/lib/python3.10/site-packages/apache_beam/transforms/ptransform.py", line 494, in type_check_inputs_or_outputs
    raise TypeCheckError(
apache_beam.typehints.decorators.TypeCheckError: Input type hint violation at FooTransform: expected <class 'str'>, got List[<class 'str'>]

The default callable for FlatMap is an identity function annotated with T = TypeVar("T"), so this incorrect type hint comes up frequently.
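
To make the expected behavior concrete, here is a minimal sketch using only the standard `typing` module. It is not Beam's implementation; `flat_map_element_type` and `bind_type_var` are hypothetical helpers written for illustration. The idea is that the TypeVar should first be bound to the concrete input type (List[str]), and then the iterable should be unwrapped to its element type (str), which is what FooTransform expects.

```python
import typing

T = typing.TypeVar('T')


def identity(x: T) -> T:
    return x


def bind_type_var(hint, bindings):
    """Replace a bare TypeVar with its bound type; leave other hints alone."""
    if isinstance(hint, typing.TypeVar):
        return bindings.get(hint, hint)
    return hint


def flat_map_element_type(fn, input_type):
    """Hypothetical helper: element type a FlatMap over `fn` should emit."""
    hints = typing.get_type_hints(fn)
    return_hint = hints.pop('return')
    # Assumption: fn has exactly one annotated positional parameter.
    (param_hint,) = hints.values()
    bindings = {}
    if isinstance(param_hint, typing.TypeVar):
        bindings[param_hint] = input_type
    # Step 1: substitute T with the concrete input type.
    bound = bind_type_var(return_hint, bindings)
    # Step 2: FlatMap flattens the returned iterable, so take its element type.
    args = typing.get_args(bound)
    return args[0] if args else typing.Any


print(flat_map_element_type(identity, typing.List[str]))  # <class 'str'>
```

The error above is consistent with the second step being skipped: the inferred output type stays List[str] instead of being reduced to str.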

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
