refactor(cost-model): use linear costing for valueContains #7476
This PR replaces #7476
    "model": {
      "arguments": {
        "intercept": 386227,
        "slope1": 1970,
The theoretical linear upper bound of k*(m+n) would have slope1 = slope2, but here the two fitted slopes are very different! I think that's OK though: the theoretical bound is very conservative and this model seems to fit the data very well.
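One way to see why a linear bound of this kind exists is sketched below. It is a standard argument under stated assumptions, not necessarily the one the PR description refers to: it takes the complexity quoted there, m*log(n/m+1) with n the container size and m the contained size, assumes the logarithm is base 2 (another base only changes the constant), and uses the inequality ln(1+t) <= t for t >= 0.

```latex
m \log_2\!\left(\frac{n}{m} + 1\right)
  = \frac{m}{\ln 2}\,\ln\!\left(1 + \frac{n}{m}\right)
  \le \frac{m}{\ln 2}\cdot\frac{n}{m}
  = \frac{n}{\ln 2}
  \le \frac{m + n}{\ln 2},
  \qquad m \ge 1,\ n \ge 0.
```

The final bound treats m and n symmetrically, which is why it would give equal slopes. A fitted model only has to track measured running times, so it is free to weight the two sizes differently.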
This looks good, but again I urge you to reduce the number of benchmarking inputs: see this earlier comment. We have over 1000 points here and that makes the benchmark take a very long time (> 90 minutes for just valueContains) and doesn't contribute anything to the accuracy of the model: you can see this by taking subsets of the existing data and fitting models to them. Also, the data is very unevenly distributed, which is probably not a good idea. There are 500 inputs with x<200 and y<200, so 1/25 of the space of inputs is contributing half of the datapoints, which will bias the model. Something like the below-diagonal part of a 25x25 or 30x30 grid of evenly spaced inputs (plus some points above the diagonal) would probably suffice and not lose any precision.
Also, the inputs aren't symmetric: the x values go up to 2047, but the y values only go up to 1000, so we're not always checking cases like a value containing itself, which might take some time. Can you remind me if there's some reason for this? We had earlier discussions about what the worst case would be, but I don't recall the details at the moment.
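The evenly spaced grid suggested in the comment above could be generated along the following lines. This is only a sketch in Python, not the benchmark generator in Benchmarks/Values.hs: the function names, the maximum size of 2000 and the choice of five above-diagonal points are assumptions made for illustration. The diagonal itself is included so that the "value containing itself" case is exercised.

```python
def below_diagonal_grid(max_size, steps):
    """Evenly spaced (x, y) pairs with y <= x, where x is the container size."""
    stride = max_size // steps
    sizes = [stride * i for i in range(1, steps + 1)]
    # Keep the diagonal (y == x) so that a value containing itself is covered.
    return [(x, y) for x in sizes for y in sizes if y <= x]


def above_diagonal_sample(max_size, count):
    """A few (x, y) pairs with y > x, for the region where containment is impossible."""
    stride = max_size // (count + 1)
    return [(stride * i, max_size) for i in range(1, count + 1)]


grid = below_diagonal_grid(2000, 25)    # hypothetical maximum size and step count
extra = above_diagonal_sample(2000, 5)  # hypothetical number of extra points
print(len(grid), len(extra))            # prints "325 5"
```

A 25x25 grid yields 325 points on or below the diagonal, each contributing equally to the fit, compared with the 1000+ unevenly distributed points mentioned above.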
plutus-core/plutus-core/src/PlutusCore/Evaluation/Machine/CostingFun/SimpleJSON.hs
Add LinearInXAndY case to the renderModel function in print-cost-model executable to handle the linear_in_x_and_y costing function type that was introduced for valueContains builtin.

This addresses Kenneth's comment about the missing case in the print-cost-model executable.

Fixes: #7476 (comment)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Did you update the contents of
kwxm left a comment
Sorry, I clicked Comment instead of Approve. Let's try that again...
Replace the multiplicative cost model (x_mem * y_mem) with a piecewise linear model that distinguishes two cases:
1. Above diagonal (y > x): contained value larger than container, quick False return with constant cost
2. Below diagonal (x >= y): actual containment check with linear cost in both container size (x) and contained size (y)

Motivation: The multiplicative model was overly conservative. Real-world containment checks terminate early when impossible or perform linear BST lookups when possible. This model better reflects actual execution costs.

Implementation details:
- Update benchmark to use ValueTotalSize for both dimensions, enabling meaningful diagonal comparison where y > x implies impossibility
- Generate 55 below-diagonal cases (power-of-2 grid) plus 2 sparse above-diagonal cases for geometry diversity (~570 total data points)
- Fit separate models for each region: constant for above-diagonal, linear in x and y for below-diagonal
- Remove unnecessary defensive conditional now that benchmarks guarantee below-diagonal data exists
- Add print-cost-model support for "const_above_diagonal" model type
- Update SimpleJSON costing infrastructure to handle three-parameter linear models (intercept + two slopes)
Update expected CPU budgets for valueContains test cases to reflect the new linear two-variable cost model. All cases show reduced costs compared to the previous multiplicative model, confirming the model better captures actual execution behavior.
Update parameter names to reflect the piecewise linear cost model:
- ValueContains'cpu'arguments'constant (above-diagonal constant)
- ValueContains'cpu'arguments'model'arguments'intercept
- ValueContains'cpu'arguments'model'arguments'slope1 (x_mem coefficient)
- ValueContains'cpu'arguments'model'arguments'slope2 (y_mem coefficient)

These replace the previous two parameters (intercept, slope) from the multiplicative model.
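The parameter names above mirror the nesting of the JSON cost model entry: each numeric leaf becomes one parameter whose name is the path to it. The Python sketch below illustrates that correspondence only. It is not the code that derives ParamName in the repository; the function flatten_params is hypothetical, and the constant and slope2 values are made-up placeholders (intercept and slope1 are the values visible in the diff quoted in the review).

```python
def flatten_params(name, model):
    """Collect the numeric leaves of a nested cost-model entry as flat parameter names."""
    params = {}

    def walk(prefix, node):
        for key, value in node.items():
            path = f"{prefix}'{key}"
            if isinstance(value, dict):
                walk(path, value)
            elif isinstance(value, (int, float)):
                params[path] = value
            # String fields such as "type" carry no numeric parameter and are skipped.

    walk(name, model)
    return params


# Hypothetical entry: "constant" and "slope2" are placeholders, not fitted values.
value_contains = {
    "cpu": {
        "type": "const_above_diagonal",
        "arguments": {
            "constant": 100000,
            "model": {
                "type": "linear_in_x_and_y",
                "arguments": {"intercept": 386227, "slope1": 1970, "slope2": 1000},
            },
        },
    }
}

for param in flatten_params("ValueContains", value_contains):
    print(param)
```

Run on this entry, the sketch prints the four CPU parameter names listed in the commit message, in the same order. The extra level of nesting under model is what takes the count from two to four.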
Add missing LinearInXAndY constructor to the Agda RawModel type and regenerate MAlonzo Haskell code. This constructor was added to the Haskell Model type but was missing from the Agda source, causing incomplete pattern match warnings during compilation. The LinearInXAndY model is used for valueContains costing as part of the const_above_diagonal approach.

Summary
- valueContains costing changed from multiplied_sizes to const_above_diagonal with a linear_in_x_and_y inner model
- LinearInXAndY constructor added to SimpleJSON.hs for JSON parsing
Motivation
The true complexity of valueContains is O(m*log(n/m+1)), where n is the container size and m is the contained size. Kenneth MacKenzie showed that this is bounded above by a linear function (m+n)/k for some constant k.
The new model better reflects this:
intercept + slope1*x + slope2*y
Benchmark Improvements
Addressed data distribution issues identified in code review:
Changes
- models.R: valueContainsModel to use const_above_diagonal with linear_in_x_and_y
- SimpleJSON.hs: LinearInXAndY constructor for JSON parsing
- utils.js: linear_in_x_and_y and const_above_diagonal
- Benchmarks/Values.hs
- benching-conway.csv
- builtinCostModel{A,B,C}.json
- V{1,2,3}/ParamName.hs
New Cost Model Structure
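As a rough illustration of how a const_above_diagonal model with a linear_in_x_and_y inner model behaves, the Python sketch below evaluates the two regions described in the commit messages. It is a sketch under stated assumptions, not the evaluator used in the repository: the function name is hypothetical, intercept and slope1 are the values visible in the diff quoted in the review, and constant and slope2 are placeholders.

```python
def value_contains_cpu(x, y, constant, intercept, slope1, slope2):
    """CPU cost for container size x and contained size y.

    Above the diagonal (y > x) containment is impossible, so the cost is constant;
    on or below the diagonal the cost is linear in both sizes.
    """
    if y > x:
        return constant
    return intercept + slope1 * x + slope2 * y


# "constant" and "slope2" are placeholders chosen only for illustration.
PARAMS = dict(constant=100000, intercept=386227, slope1=1970, slope2=1000)

print(value_contains_cpu(10, 50, **PARAMS))  # above the diagonal: the constant
print(value_contains_cpu(50, 10, **PARAMS))  # below the diagonal: linear in x and y
```

With these placeholder parameters the first call returns the constant 100000 and the second returns 386227 + 1970*50 + 1000*10 = 494727.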
Parameter Update
The new cost model structure required updating the ParamName enums in V1, V2, and V3:
Old (2 CPU parameters):
- ValueContains'cpu'arguments'intercept
- ValueContains'cpu'arguments'slope
New (4 CPU parameters):
- ValueContains'cpu'arguments'constant
- ValueContains'cpu'arguments'model'arguments'intercept
- ValueContains'cpu'arguments'model'arguments'slope1
- ValueContains'cpu'arguments'model'arguments'slope2
This fixes extractCostModelParamsLedgerOrder failures that occurred when the parameter count didn't match the expected structure.
Test plan
Closes https://github.com/IntersectMBO/plutus-private/issues/1984