
Flaky test: [tcp routing] TCP Routing external ports with a second external port [It] maps both ports to the same application #1173

@jochenehret

Description


The TCP Routing test that checks whether one app can be reached from two ports fails frequently at this assertion:

Expect(err).ToNot(HaveOccurred())

Example failures:
https://concourse.wg-ard.ci.cloudfoundry.org/teams/main/pipelines/cf-deployment/jobs/fips-cats/builds/82
https://concourse.wg-ard.ci.cloudfoundry.org/teams/main/pipelines/cf-deployment/jobs/fips-cats/builds/57
https://concourse.wg-ard.ci.cloudfoundry.org/teams/main/pipelines/cf-deployment/jobs/fips-cats/builds/113
https://concourse.wg-ard.ci.cloudfoundry.org/teams/main/pipelines/cf-deployment/jobs/fips-cats/builds/120

I've recreated the test setup manually on fips/snape, and it works: you can send data over two different TCP ports to the test app, and the app responds as expected. When the test runs as part of the CATs suite, however, it fails often.

I've added some debug statements with timestamps. Here's the flow from a failed run:

# sending first test message to first port
# https://github.com/cloudfoundry/cf-acceptance-tests/blob/6f060209f7a55f0c4f8d0fffabb122c785ce914e/cats_suite_helpers/cats_suite_helpers.go#L406
starting SendAndReceive(tcp.cf.snape.env.wg-ard.ci.cloudfoundry.org, 1031) at Jul 11 14:49:45.862

# output from test app: https://github.com/cloudfoundry/cf-acceptance-tests/blob/6f060209f7a55f0c4f8d0fffabb122c785ce914e/assets/tcp-listener/main.go#L53
# "10.0.32.11" is one of the two tcp-routers
2024-07-11T12:49:45.97+0000 [APP/PROC/WEB/0] OUT Message to 10.0.32.11:41084: server1:Time is 938260798
2024-07-11T12:49:45.99+0000 [APP/PROC/WEB/0] OUT Jul 11 14:49:45.991 (read) Closing connection to 10.0.32.11:41084: EOF

# sending second test message to other port
starting SendAndReceive(tcp.cf.snape.env.wg-ard.ci.cloudfoundry.org, 1026) at Jul 11 14:49:45.955

# now we are failing here when reading the response:
# https://github.com/cloudfoundry/cf-acceptance-tests/blob/6f060209f7a55f0c4f8d0fffabb122c785ce914e/cats_suite_helpers/cats_suite_helpers.go#L437
Jul 11 14:54:46.575 error3: EOF
buff is:

When the second message is sent, the conn.Write(message) call returns no error.

However, the test app does not seem to receive the message: there is no "Message to" log line for it. Next, the conn.Read(buff) call fails with EOF, and the buffer is empty.

This looks like a race condition. Is Read perhaps called before the test app starts writing, so that it fails immediately with EOF?
