@TakaHiR07 TakaHiR07 commented Aug 7, 2023

Motivation

#14248 fixed the issue that the rack-aware placement policy does not take effect after the rack configuration is deleted. However, after #16825, that fix is ineffective when the ZooKeeper rack info is deleted after the BookKeeper client is initialized.

The reason is that "register available bookie" happens after "BookieRackAffinityMapping#setConf" in the bookieClient constructor. As a result, updateRacksWithHost(racksWithHost) fails to resolve every bookie with BookieIdNotResolvedException, which leaves "racksWithHost" empty. "racksWithHost" is not updated until the "watchAvailableBookies()" listener is triggered.
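The order dependence can be illustrated with a minimal, self-contained sketch. It is not the Pulsar or BookKeeper code and not the actual patch: the class `RackMappingOrderSketch` and the helpers `resolveRacks`, `mappingBeforeRegistration` and `registrationBeforeMapping` are hypothetical names, and the assumption is only that rack entries for bookies the client cannot yet resolve are skipped.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RackMappingOrderSketch {

    // Hypothetical stand-in for updateRacksWithHost(): keeps only the configured
    // bookies that the client can already resolve and skips the rest.
    static Map<String, String> resolveRacks(Map<String, String> configuredRacks,
                                            Set<String> knownBookies) {
        Map<String, String> resolved = new HashMap<>();
        for (Map.Entry<String, String> entry : configuredRacks.entrySet()) {
            if (knownBookies.contains(entry.getKey())) {
                resolved.put(entry.getKey(), entry.getValue());
            }
        }
        return resolved;
    }

    // Order described above: the rack mapping is built before any bookie is registered.
    static Map<String, String> mappingBeforeRegistration(Map<String, String> configuredRacks,
                                                         Set<String> registeredBookies) {
        Set<String> knownBookies = new HashSet<>();
        Map<String, String> racks = resolveRacks(configuredRacks, knownBookies);
        knownBookies.addAll(registeredBookies); // registration arrives too late
        return racks;
    }

    // Opposite order, shown only for comparison (an assumption, not the merged change).
    static Map<String, String> registrationBeforeMapping(Map<String, String> configuredRacks,
                                                         Set<String> registeredBookies) {
        Set<String> knownBookies = new HashSet<>(registeredBookies);
        return resolveRacks(configuredRacks, knownBookies);
    }

    public static void main(String[] args) {
        // Two bookie addresses and racks borrowed from the test setup for illustration.
        Map<String, String> configuredRacks = Map.of(
                "10.0.0.1:56962", "rack-0",
                "10.0.0.4:56969", "rack-1");
        Set<String> registeredBookies = configuredRacks.keySet();

        System.out.println("mapping first, entries: "
                + mappingBeforeRegistration(configuredRacks, registeredBookies).size());
        System.out.println("registration first, entries: "
                + registrationBeforeMapping(configuredRacks, registeredBookies).size());
    }
}
```

With the mapping built first the sketch reports 0 entries, and with registration first it reports 2, so the same rack configuration yields different results depending only on the order of the two steps.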

I added some logging to the unit test testRackUpdate() to show the order of execution in the bookieClient constructor:

2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - ZkBookieRackAffinityMapping setConf(), racksWithHost:{default={10.0.0.1:56962=BookieInfoImpl(rack=rack-0, hostname=bookie-1), 10.0.0.2:56964=BookieInfoImpl(rack=rack-0, hostname=bookie-2), 10.0.0.3:56966=BookieInfoImpl(rack=rack-0, hostname=bookie-3), 10.0.0.4:56969=BookieInfoImpl(rack=rack-1, hostname=bookie-4), 10.0.0.5:56971=BookieInfoImpl(rack=rack-1, hostname=bookie-5), 10.0.0.6:56973=BookieInfoImpl(rack=rack-1, hostname=bookie-6)}}
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 10.0.0.1:56962, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Network address for 10.0.0.1:56962 is unresolvable yet. error is org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId 10.0.0.1:56962, bookie does not exist or it is not running
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 10.0.0.2:56964, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Network address for 10.0.0.2:56964 is unresolvable yet. error is org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId 10.0.0.2:56964, bookie does not exist or it is not running
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 10.0.0.3:56966, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Network address for 10.0.0.3:56966 is unresolvable yet. error is org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId 10.0.0.3:56966, bookie does not exist or it is not running
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 10.0.0.4:56969, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
2023-08-07T15:47:23,540+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Network address for 10.0.0.4:56969 is unresolvable yet. error is org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId 10.0.0.4:56969, bookie does not exist or it is not running
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 10.0.0.5:56971, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Network address for 10.0.0.5:56971 is unresolvable yet. error is org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId 10.0.0.5:56971, bookie does not exist or it is not running
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 10.0.0.6:56973, bookie is unknown org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - Network address for 10.0.0.6:56973 is unresolvable yet. error is org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId 10.0.0.6:56973, bookie does not exist or it is not running
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - ZkBookieRackAffinityMapping setConf() after updateRacksWithHost(), racksWithHost:{}
2023-08-07T15:47:23,541+0800 [main] WARN  org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Failed to resolve network location for 127.0.0.1, using default rack for it : /default-rack.
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Initialize rackaware ensemble placement policy @ <Bookie:127.0.0.1:0> @ /default-rack : org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping.
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Not weighted
2023-08-07T15:47:23,541+0800 [main] INFO  org.apache.bookkeeper.client.BookKeeper - Weighted ledger placement is not enabled
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.5:56971 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.5:56971, port=56971, host=10.0.0.5, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.5:56971 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.5:56971, port=56971, host=10.0.0.5, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.2:56964 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.2:56964, port=56964, host=10.0.0.2, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.2:56964 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.2:56964, port=56964, host=10.0.0.2, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.6:56973 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.6:56973, port=56973, host=10.0.0.6, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.6:56973 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.6:56973, port=56973, host=10.0.0.6, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.4:56969 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.4:56969, port=56969, host=10.0.0.4, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.4:56969 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.4:56969, port=56969, host=10.0.0.4, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.3:56966 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.3:56966, port=56966, host=10.0.0.3, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.3:56966 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.3:56966, port=56966, host=10.0.0.3, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.1:56962 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.1:56962, port=56962, host=10.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [metadata-store-172-1] INFO  org.apache.pulsar.metadata.bookkeeper.PulsarRegistrationClient - Update BookieInfoCache (writable bookie) 10.0.0.1:56962 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=10.0.0.1:56962, port=56962, host=10.0.0.1, protocol=bookie-rpc, auth=[], extensions=[]}]}
2023-08-07T15:47:23,545+0800 [pulsar-registration-client-209-1] INFO  org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping - watchAvailableBookies() trigger racksWithHost:{default={10.0.0.1:56962=BookieInfoImpl(rack=rack-0, hostname=bookie-1), 10.0.0.2:56964=BookieInfoImpl(rack=rack-0, hostname=bookie-2), 10.0.0.3:56966=BookieInfoImpl(rack=rack-0, hostname=bookie-3), 10.0.0.4:56969=BookieInfoImpl(rack=rack-1, hostname=bookie-4), 10.0.0.5:56971=BookieInfoImpl(rack=rack-1, hostname=bookie-5), 10.0.0.6:56973=BookieInfoImpl(rack=rack-1, hostname=bookie-6)}}
2023-08-07T15:47:23,547+0800 [pulsar-registration-client-209-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack-1/10.0.0.5:56971
2023-08-07T15:47:23,547+0800 [pulsar-registration-client-209-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack-0/10.0.0.2:56964
2023-08-07T15:47:23,547+0800 [pulsar-registration-client-209-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack-1/10.0.0.6:56973
2023-08-07T15:47:23,547+0800 [pulsar-registration-client-209-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack-1/10.0.0.4:56969
2023-08-07T15:47:23,547+0800 [pulsar-registration-client-209-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack-0/10.0.0.3:56966
2023-08-07T15:47:23,547+0800 [pulsar-registration-client-209-1] INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node: /rack-0/10.0.0.1:56962

Modifications

  1. Modify the execution order in BookieRackAffinityMapping#setConf.
  2. Improve testRackUpdate() to cover this case.

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: TakaHiR07#12

@github-actions github-actions bot added the doc-not-needed (Your PR changes do not impact docs) label Aug 7, 2023
@github-actions

The PR had no activity for 30 days; marking it with the Stale label.

@github-actions github-actions bot added the Stale label Sep 14, 2023
@TakaHiR07 TakaHiR07 force-pushed the fix_rackaware_policy_ineffective_after_delete_zk_rackInfo branch from e60964b to 8930c21 Compare September 19, 2023 08:37
@hangc0276 hangc0276 added type/bug (The PR fixed a bug or issue reported a bug), area/broker, release/3.0.2, release/2.11.3, release/2.10.6, release/3.1.1, category/reliability (The function does not work properly in certain specific environments or failures. e.g. data lost), ready-to-test and removed Stale labels Sep 19, 2023
@hangc0276 hangc0276 added this to the 3.2.0 milestone Sep 19, 2023
@horizonzy horizonzy left a comment

LGTM.

@horizonzy

/pulsarbot run-failure-checks

codecov-commenter commented Sep 20, 2023

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 36.80%. Comparing base (57fbee4) to head (8930c21).
⚠️ Report is 1778 commits behind head on master.

Additional details and impacted files

@@              Coverage Diff              @@
##             master   #20944       +/-   ##
=============================================
- Coverage     72.97%   36.80%   -36.18%     
+ Complexity    32157      374    -31783     
=============================================
  Files          1868     1698      -170     
  Lines        139164   130430     -8734     
  Branches      15314    14250     -1064     
=============================================
- Hits         101555    47999    -53556     
- Misses        29562    76104    +46542     
+ Partials       8047     6327     -1720     
Flag Coverage Δ
inttests 24.16% <100.00%> (?)
systests 24.69% <100.00%> (-0.36%) ⬇️
unittests 31.95% <100.00%> (-40.43%) ⬇️

Files with missing lines Coverage Δ
...ookie/rackawareness/BookieRackAffinityMapping.java 70.00% <100.00%> (-10.72%) ⬇️

... and 1453 files with indirect coverage changes

@codelipenghui codelipenghui merged commit d9ebaf5 into apache:master Oct 7, 2023
poorbarcode pushed a commit that referenced this pull request Oct 7, 2023
…o after bkclient initialize (#20944)

(cherry picked from commit d9ebaf5)
liangyuanpeng pushed a commit to liangyuanpeng/pulsar that referenced this pull request Oct 11, 2023
vinayakmalik95 pushed a commit to tmdc-io/pulsar that referenced this pull request Oct 12, 2023
Technoboy- pushed a commit that referenced this pull request Oct 19, 2023
shibd pushed a commit to shibd/pulsar that referenced this pull request Oct 22, 2023
…o after bkclient initialize (apache#20944)

(cherry picked from commit d9ebaf5)
shibd pushed a commit to shibd/pulsar that referenced this pull request Oct 24, 2023
…o after bkclient initialize (apache#20944)

(cherry picked from commit d9ebaf5)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
nodece pushed a commit to nodece/pulsar that referenced this pull request Aug 15, 2024
…o after bkclient initialize (apache#20944)

(cherry picked from commit d9ebaf5)
Labels

area/broker, category/reliability (The function does not work properly in certain specific environments or failures. e.g. data lost), cherry-picked/branch-2.11, cherry-picked/branch-3.0, cherry-picked/branch-3.1, doc-not-needed (Your PR changes do not impact docs), ready-to-test, release/2.10.7, release/2.11.3, release/3.0.2, release/3.1.2, type/bug (The PR fixed a bug or issue reported a bug)

9 participants