Skip to content

Comments

refactor(cost-model): use linear costing for valueContains#7476

Merged
zliu41 merged 4 commits intomasterfrom
yura/valuecontains-linear-costing
Jan 13, 2026
Merged

refactor(cost-model): use linear costing for valueContains#7476
zliu41 merged 4 commits intomasterfrom
yura/valuecontains-linear-costing

Conversation

@Unisay
Copy link
Contributor

@Unisay Unisay commented Dec 4, 2025

Summary

  • Change valueContains costing from multiplied_sizes to const_above_diagonal with linear_in_x_and_y inner model
  • Add LinearInXAndY constructor to SimpleJSON.hs for JSON parsing
  • Update cost model visualization to support new model types
  • Update V1/V2/V3 ParamName enums to match new cost model structure
  • Improve benchmark data distribution (1000+ points → 570 points)

Motivation

The true complexity of valueContains is O(m*log(n/m+1)) where n is the container size and m is the contained size. Kenneth MacKenzie showed that this is bounded above by a linear function (m+n)/k for some constant k.

The new model better reflects this:

  • Above diagonal (y > x): constant cost (early exit, returns False immediately)
  • Below diagonal (x >= y): intercept + slope1*x + slope2*y

Benchmark Improvements

Addressed data distribution issues identified in code review:

  • Reduced clustering: Below-diagonal sampling (numPolicies >= tokensPerPolicy) reduces oversampling at small sizes
  • Symmetric ranges: Both container and contained now reach size 2047 (removed 1000 cap)
  • Better coverage: 55 below-diagonal + 2 above-diagonal combinations (57 total)
  • Fewer points: ~570 benchmark points (down from ~1000) → faster runtime
  • Maintained structure: All sizes remain powers of 2 for worst-case saturated BST branches

Changes

File Change
models.R Update valueContainsModel to use const_above_diagonal with linear_in_x_and_y
SimpleJSON.hs Add LinearInXAndY constructor for JSON parsing
utils.js Add evaluators for linear_in_x_and_y and const_above_diagonal
Benchmarks/Values.hs Improve valueContainsArgs distribution strategy
benching-conway.csv Regenerated with improved benchmark data
builtinCostModel{A,B,C}.json Regenerated with new model type and improved benchmarks
V{1,2,3}/ParamName.hs Update ParamName enums from 2 to 4 CPU parameters to match new structure
Conformance tests Updated budget expectations to match new cost parameters

New Cost Model Structure

"valueContains": {
  "cpu": {
    "arguments": {
      "constant": 358864,
      "model": {
        "arguments": {
          "intercept": 706629,
          "slope1": 2015,
          "slope2": 28294
        },
        "type": "linear_in_x_and_y"
      }
    },
    "type": "const_above_diagonal"
  },
  "memory": {
    "arguments": 32,
    "type": "constant_cost"
  }
}

Parameter Update

The new cost model structure required updating the ParamName enums in V1, V2, and V3:

Old (2 CPU parameters):

  • ValueContains'cpu'arguments'intercept
  • ValueContains'cpu'arguments'slope

New (4 CPU parameters):

  • ValueContains'cpu'arguments'constant
  • ValueContains'cpu'arguments'model'arguments'intercept
  • ValueContains'cpu'arguments'model'arguments'slope1
  • ValueContains'cpu'arguments'model'arguments'slope2

This fixes extractCostModelParamsLedgerOrder failures that occurred when the parameter count didn't match the expected structure.

Test plan

  • Build passes
  • Ledger API plugin tests pass (V1/V2/V3 validators)
  • Conformance tests pass with updated budget expectations
  • Cost model visualization works with new model type

Closes https://github.com/IntersectMBO/plutus-private/issues/1984

@github-actions
Copy link
Contributor

github-actions bot commented Dec 4, 2025

PR Preview Action v1.6.3

🚀 View preview at
https://IntersectMBO.github.io/plutus/pr-preview/cost-models/pr-7476/

Built to branch gh-pages at 2026-01-13 17:02 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@Unisay Unisay requested a review from kwxm December 4, 2025 13:07
@Unisay Unisay self-assigned this Dec 4, 2025
@Unisay Unisay added the No Changelog Required Add this to skip the Changelog Check label Dec 4, 2025
Copy link
Member

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks quite reasonable, but I'll leave it to @kwxm

@Unisay
Copy link
Contributor Author

Unisay commented Dec 5, 2025

This PR replaces #7476

"model": {
"arguments": {
"intercept": 386227,
"slope1": 1970,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The theoretical linear upper bound of k*(m+n) would have slope1 = slope2 here, but here the two slopes are very different! I think that's OK though: the theoretical bound is very conservative and this seems to fit the data very well.

@Unisay Unisay requested a review from kwxm December 23, 2025 14:55
Copy link
Contributor

@kwxm kwxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but again I urge you to reduce the number of benchmarking inputs: see this earlier comment. We have over 1000 points here and that makes the benchmark take a very long time (> 90 minutes for just valueContains) and doesn't contribute anything to the accuracy of the model: you can see this by taking subsets of the existing data and fitting models to them. Also, the data is very unevenly distributed, which is probably not a good idea. There are 500 inputs with x<200 and y<200, so 1/25 of the space of inputs is contributing half of the datapoints, which will bias the model. Something like the below-diagonal part of a 25x25 or 30x30 grid of evenly spaced inputs (plus some points above the diagonal) would probably suffice and not lose any precision.

Also, the inputs aren't symmetric: the x values go up to 2047, but the y values only go up to 1000, so we're not always checking cases like a value containing itself, which might take some time. Can you remind me if there's some reason for this? We had earlier discussions about what the worst case would be, but I don't recall the details at the moment.

@kwxm kwxm added Builtins Costing Anything relating to costs, fees, gas, etc. labels Jan 8, 2026
Unisay added a commit that referenced this pull request Jan 12, 2026
Add LinearInXAndY case to the renderModel function in print-cost-model
executable to handle the linear_in_x_and_y costing function type that
was introduced for valueContains builtin.

This addresses Kenneth's comment about the missing case in the
print-cost-model executable.

Fixes: #7476 (comment)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Unisay added a commit that referenced this pull request Jan 12, 2026
Add LinearInXAndY case to the renderModel function in print-cost-model
executable to handle the linear_in_x_and_y costing function type that
was introduced for valueContains builtin.

This addresses Kenneth's comment about the missing case in the
print-cost-model executable.

Fixes: #7476 (comment)
@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from ccf8d7f to b57b8af Compare January 12, 2026 10:26
@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from 5a3949b to bf514d6 Compare January 12, 2026 12:33
Unisay added a commit that referenced this pull request Jan 12, 2026
Add LinearInXAndY case to the renderModel function in print-cost-model
executable to handle the linear_in_x_and_y costing function type that
was introduced for valueContains builtin.

This addresses Kenneth's comment about the missing case in the
print-cost-model executable.

Fixes: #7476 (comment)
@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from bf514d6 to 18ff964 Compare January 12, 2026 13:40
@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from 18ff964 to d866053 Compare January 12, 2026 19:44
@Unisay Unisay requested a review from kwxm January 12, 2026 19:47
@kwxm
Copy link
Contributor

kwxm commented Jan 13, 2026

Did you update the contents of benching-conway.csv"? It still contains 1052 results for ValueContains` but the new version of the benchmarks only has 607.

@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from d866053 to 0ac769a Compare January 13, 2026 11:31
@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from 0ac769a to 5c46c7b Compare January 13, 2026 11:39
@Unisay
Copy link
Contributor Author

Unisay commented Jan 13, 2026

Did you update the contents of benching-conway.csv"? It still contains 1052 results for ValueContains` but the new version of the benchmarks only has 607

My bad, fixed now:
image

@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from 5c46c7b to 8c562f4 Compare January 13, 2026 11:51
Copy link
Contributor

@kwxm kwxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks: I think we can merge this now! We'lll probably need @zliu41 to do it though.

@kwxm kwxm self-requested a review January 13, 2026 12:51
Copy link
Contributor

@kwxm kwxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I clicked Comment instead of Approve. Let's try try that again...

Replace the multiplicative cost model (x_mem * y_mem) with a piecewise linear model that distinguishes two cases:

1. Above diagonal (y > x): contained value larger than container, quick False return with constant cost

2. Below diagonal (x >= y): actual containment check with linear cost in both container size (x) and contained size (y)

Motivation: The multiplicative model was overly conservative. Real-world containment checks terminate early when impossible or perform linear BST lookups when possible. This model better reflects actual execution costs.

Implementation details:

- Update benchmark to use ValueTotalSize for both dimensions, enabling meaningful diagonal comparison where y > x implies impossibility

- Generate 55 below-diagonal cases (power-of-2 grid) plus 2 sparse above-diagonal cases for geometry diversity (~570 total data points)

- Fit separate models for each region: constant for above-diagonal, linear in x and y for below-diagonal

- Remove unnecessary defensive conditional now that benchmarks guarantee below-diagonal data exists

- Add print-cost-model support for "const_above_diagonal" model type

- Update SimpleJSON costing infrastructure to handle three-parameter linear models (intercept + two slopes)
Update expected CPU budgets for valueContains test cases to reflect the new linear two-variable cost model. All cases show reduced costs compared to the previous multiplicative model, confirming the model better captures actual execution behavior.
Update parameter names to reflect the piecewise linear cost model:

- ValueContains'cpu'arguments'constant (above-diagonal constant)

- ValueContains'cpu'arguments'model'arguments'intercept

- ValueContains'cpu'arguments'model'arguments'slope1 (x_mem coefficient)

- ValueContains'cpu'arguments'model'arguments'slope2 (y_mem coefficient)

These replace the previous two parameters (intercept, slope) from the multiplicative model.
Add missing LinearInXAndY constructor to the Agda RawModel type
and regenerate MAlonzo Haskell code. This constructor was added
to the Haskell Model type but was missing from the Agda source,
causing incomplete pattern match warnings during compilation.

The LinearInXAndY model is used for valueContains costing as
part of the const_above_diagonal approach.
@Unisay Unisay force-pushed the yura/valuecontains-linear-costing branch from 8c562f4 to af91015 Compare January 13, 2026 14:01
@zliu41 zliu41 merged commit aee28e0 into master Jan 13, 2026
4 of 6 checks passed
@zliu41 zliu41 deleted the yura/valuecontains-linear-costing branch January 13, 2026 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Builtins Costing Anything relating to costs, fees, gas, etc. No Changelog Required Add this to skip the Changelog Check

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants