
feat(pyth-lazer-agent) Allow deduplicating updates within each batch #2944


Merged
merged 3 commits into main from feat/pf-366/dedup-updates-1 on Aug 13, 2025

Conversation

@bplatak bplatak commented Aug 11, 2025

Summary

Add an option to deduplicate updates within each batch before sending them over to Lazer. The dedup logic keeps all distinct updates, removing only consecutive duplicates.
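
To make the intended semantics concrete, here is a minimal illustrative sketch (not the PR's implementation, which operates on feed updates keyed by feed_id): a value is dropped only when it repeats back-to-back, so a later reappearance of an earlier value is still kept.

// Illustrative only: consecutive duplicates are removed, non-consecutive
// repeats survive, and the first occurrence of each run is the one kept.
fn dedup_consecutive(prices: &[i64]) -> Vec<i64> {
    let mut out: Vec<i64> = Vec::new();
    for &p in prices {
        if out.last() != Some(&p) {
            out.push(p);
        }
    }
    out
}

fn main() {
    // [100, 100, 101, 101, 100] -> [100, 101, 100]: the trailing 100 is kept
    // because it is not consecutive with the first run of 100s.
    assert_eq!(dedup_consecutive(&[100, 100, 101, 101, 100]), vec![100, 101, 100]);
}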

Rationale

Reduce pressure on the relayers for publishers that emit unchanged updates at a frequency higher than the publish interval.

How has this been tested?

  • Current tests cover my changes
  • Added new tests
  • Manually tested the code

@bplatak bplatak requested a review from merolish as a code owner August 11, 2025 22:32

relayer_urls = ["wss://relayer.pyth-lazer-staging.dourolabs.app/v1/transaction", "wss://relayer-1.pyth-lazer-staging.dourolabs.app/v1/transaction"]
publish_keypair_path = "/path/to/solana/id.json"
relayer_urls = ["ws://localhost:10001/v1/transaction"]
publish_keypair_path = "/tmp/keypair.json"

Contributor
in future we should have keys in the repo for local testing.

let mut deduped_feed_updates = Vec::new();
let mut last_feed_update = HashMap::new();

// assume that feed_updates is already sorted by ts (within feed_update_id groups)

Contributor
Suggested change:
- // assume that feed_updates is already sorted by ts (within feed_update_id groups)
+ // assume that feed_updates is already sorted by timestamp for each feed_update.feed_id

Comment on lines 191 to 199
if let Some(update) = feed_update.update.as_ref() {
    if let Some(last_update) = last_feed_update.get(&feed_id) {
        if update == last_update {
            continue;
        }
    }

    deduped_feed_updates.push(feed_update.clone());
    last_feed_update.insert(feed_id, update.clone());

Contributor
I think we are keeping the update with the lowest timestamp; shouldn't we keep the highest timestamp?

Contributor
I would insert all in the map and collect it into a Vec at the end.
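
For context, a hedged sketch of what that alternative could look like, using a hypothetical FeedUpdate struct (field names assumed, not taken from the actual code): collecting into a HashMap keyed by feed_id keeps exactly one update per feed, the last one inserted, which is closer to the "most recently seen" behavior debated below than to the merged consecutive-dedup logic.

use std::collections::HashMap;

// Hedged sketch of the map-then-collect alternative; FeedUpdate and its
// fields are hypothetical stand-ins for the real types.
#[derive(Clone, Debug, PartialEq)]
struct FeedUpdate {
    feed_id: u32,
    update: i64,
}

fn dedup_via_map(feed_updates: Vec<FeedUpdate>) -> Vec<FeedUpdate> {
    let mut latest: HashMap<u32, FeedUpdate> = HashMap::new();
    for feed_update in feed_updates {
        // Later entries overwrite earlier ones, so only the last update per
        // feed_id survives, and the original batch ordering is lost.
        latest.insert(feed_update.feed_id, feed_update);
    }
    latest.into_values().collect()
}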

Contributor
You can also use dedup_by_key on the std Vec.
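
For reference, a hedged sketch of that approach, reusing the hypothetical FeedUpdate from the sketch above; it only matches the intended behavior if the batch is already grouped by feed_id and sorted by timestamp within each group.

fn dedup_consecutive_updates(feed_updates: &mut Vec<FeedUpdate>) {
    // Vec::dedup_by_key drops all but the first element of each run whose key
    // compares equal, so the earliest update in a run of duplicates is kept.
    feed_updates.dedup_by_key(|u| (u.feed_id, u.update));
    // For a non-Copy update payload, Vec::dedup_by with a by-reference
    // comparison avoids cloning the key:
    // feed_updates.dedup_by(|a, b| a.feed_id == b.feed_id && a.update == b.update);
}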

Contributor Author
shouldn't we keep the highest timestamp?

I don't think we should. My reasoning was that we generally care about lowest latency, so it's fairer to keep the first value we've seen and remove consecutive duplicates; that way we capture the earliest timestamp in the batch (it may not affect quality metrics, but it's still a better representation of what the publisher does).

In contrast, keeping the "highest" timestamp is kind of like a "most recently seen" cache. If we do that, is there even a point in recording intra-batch history instead of just returning the last value seen?

What's your rationale for keeping the last rather than first? What's the benefit?

Contributor
Given that this is intended for real-time consumption, I'd think we want to reflect the latest timestamp that this data is accurate as of, but I'm not entirely clear. If so, yeah, for a single publisher stream, why not just send the most recently seen per batch?

@bplatak bplatak (Contributor Author) Aug 12, 2025
why not just send the most recently seen per batch.

I don't want to censor data any more than I want to invent it. I'm OK with removing truly duplicate values, but I'm not comfortable deleting things that actually contain information.

Maybe we should just do a simple deduplication on (feed_id, source_timestamp)?

@bplatak bplatak (Contributor Author) Aug 12, 2025
IMHO the data is accurate "since" the first time it was seen, "until" the end of the aggregate window (at which point the publisher needs to retransmit unchanged entries) or, notionally, until they send a changed value (for which we also need to know the earliest occurrence). Knowing the "most recent" occurrence of a value within an agg window doesn't give us any useful information. Knowing the "first time" the value showed up in a batch lets us better understand the timing characteristics of each publisher.

Contributor
The reason I was suggesting the latest timestamp was because of price expiry: if we keep the lowest timestamp, the price will expire faster than it should if they don't send another update. However, I think you are right that keeping the information about the earliest time a publisher sent us a new price is more important. Let's keep the lowest timestamp.

@keyvankhademi keyvankhademi (Contributor) left a comment
I think we should keep the latest timestamp.

@darunrs darunrs (Contributor) left a comment
Agree with your conclusions on earliest vs latest in the batch. You've got some small things to fix but overall LGTM!

@@ -299,6 +299,7 @@ pub mod tests {
            publish_keypair_path: Default::default(),
            publish_interval_duration: Default::default(),
            history_service_url: None,
            enable_update_deduplication: false,
        };

        println!("{:?}", get_metadata(config).await.unwrap());

Contributor
Is there a reason this is a println and not an info log?

Contributor Author
We don't have tracing set up properly in unit tests, making println the easiest choice. This is just for the one manual test that's only ever run locally during dev to grab example data.

@bplatak bplatak merged commit 2a543d4 into main Aug 13, 2025
9 of 10 checks passed
@bplatak bplatak deleted the feat/pf-366/dedup-updates-1 branch August 13, 2025 14:18