[BUG] Inconsistent Crash on markshare2 #593

@nguidotti

Description

When solving the markshare2 instance on a Grace-Hopper node, cuOpt aborts due to an illegal memory access (see the log below). Since the crash is inconsistent and does not always occur at the same point, I suspect it is caused by a race condition.

❯ cuopt/cpp/build/solve_MIP --time-limit 200 --path datasets/miplib2017/markshare2.mps.bz2
running file markshare2.mps.bz2 on gpu : 0
cuOpt version: 25.12.0, git hash: 326eec6, host arch: aarch64, device archs: 90a-real
CPU: Unknown, threads (physical/logical): 1/72, RAM: 510.61 GiB
CUDA 13.0, device: NVIDIA GH200 480GB (ID 0), VRAM: 95.00 GiB
CUDA device UUID: 6c4b967c-f0a1-e1d9-c69e-eac1b8a2df98

Solving a problem with 7 constraints, 74 variables (60 integers), and 434 nonzeros
Problem scaling:
Objective coefficents range:          [1e+00, 1e+00]
Constraint matrix coefficients range: [1e+00, 1e+02]
Constraint rhs / bounds range:        [1e+03, 2e+03]
Variable bounds range:                [0e+00, 1e+00]

Original problem: 7 constraints, 74 variables, 434 nonzeros
Calling Papilo presolver
Presolve status: reduced the problem
Presolve removed: 0 constraints, 14 variables, 14 nonzeros
Presolved problem: 7 constraints, 60 variables, 420 nonzeros
Objective function is integral
Papilo presolve time: 0.063006
Objective offset 10512.000000 scaling_factor 1.000000
Running presolve!
After trivial presolve: 7 constraints, 60 variables, objective offset 10512.000000.
Using 72 CPU threads for B&B
Solving LP root relaxation
Scaling matrix. Maximum column norm 1.000000e+00
Dual Simplex Phase 1
Dual feasible solution found.
Dual Simplex Phase 2
 Iter     Objective           Num Inf.  Sum Inf.     Perturb  Time
    1 -1.0188571428571424e+04       6 9.11376686e+00 0.00e+00 0.00


Root relaxation solution found in 35 iterations and 0.00s
Root relaxation objective +5.45696821e-12


Strong branching using 72 threads and 7 fractional variables
Exploring the B&B tree using 18 best-first threads and 54 diving threads (72 threads)
 | Explored | Unexplored |    Objective    |     Bound     | Depth | Iter/Node |   Gap    |  Time  |
D       901          867    +1.129000e+03    -7.094059e-10       31   3.7e+01     100.0%      0.01
D       920          886    +7.820000e+02    -7.094059e-10       32   3.7e+01     100.0%      0.01
B       931          897    +7.300000e+02    -7.094059e-10       45   3.7e+01     100.0%      0.01
D       951          915    +6.950000e+02    -7.094059e-10       40   3.6e+01     100.0%      0.01
D       957          919    +5.390000e+02    -7.094059e-10       35   3.6e+01     100.0%      0.01
D       957          929    +3.830000e+02    -7.094059e-10       35   3.6e+01     100.0%      0.01
B       960          928    +2.410000e+02    -7.094059e-10       44   3.6e+01     100.0%      0.01
B       988          940    +2.220000e+02    -7.094059e-10       47   3.5e+01     100.0%      0.01
D      1143         1047    +2.220000e+02    -3.456080e-11       42   3.3e+01     100.0%      0.01
D      1191         1085    +2.180000e+02    -4.183676e-11       47   3.2e+01     100.0%      0.01
D      1267         1143    +2.110000e+02    -1.347871e-09       39   3.1e+01     100.0%      0.01
D      1281         1155    +1.370000e+02    -1.347871e-09       43   3.1e+01     100.0%      0.01
D      2001         1655    +1.290000e+02    -2.455636e-10       47   2.5e+01     100.0%      0.02
D      2842         2264    +9.100000e+01    -2.910383e-11       38   2.3e+01     100.0%      0.02
D      4304         3334    +9.000000e+01    -2.728484e-11       44   2.2e+01     100.0%      0.03
D      4411         3415    +8.700000e+01    -3.456080e-11       43   2.2e+01     100.0%      0.03
D      8129         6015    +6.900000e+01    -1.127773e-10       37   2.1e+01     100.0%      0.04
D      8787         6495    +6.700000e+01    -2.546585e-11       46   2.0e+01     100.0%      0.05
D     11099         8075    +6.300000e+01    -1.637090e-11       41   2.0e+01     100.0%      0.06
D     14102         9994    +5.700000e+01    -9.094947e-12       39   2.0e+01     100.0%      0.07
D     21952        14990    +4.900000e+01    -3.819878e-11       39   2.0e+01     100.0%      0.10
D     69168        42616    +4.700000e+01    -1.637090e-11       41   1.9e+01     100.0%      0.29
D    126516        74808    +3.700000e+01    -3.274181e-11       46   1.9e+01     100.0%      0.52
D    127253        75155    +3.300000e+01    -1.818989e-11       42   1.9e+01     100.0%      0.52
     253965       139751    +3.300000e+01    -1.273293e-11       23   1.9e+01     100.0%      1.01
     521557       266669    +3.300000e+01    -1.273293e-11       50   1.9e+01     100.0%      2.01
     785932       389790    +3.300000e+01    -2.728484e-11       50   1.9e+01     100.0%      3.01
D    973277       465073    +3.200000e+01    -1.637090e-11       41   1.8e+01     100.0%      3.68
    1069677       500497    +3.200000e+01    -1.091394e-11       38   1.8e+01     100.0%      4.01
    1338211       618639    +3.200000e+01    -1.455192e-11       46   1.8e+01     100.0%      5.01
D   1379706       637724    +2.900000e+01    -9.822543e-11       41   1.8e+01     100.0%      5.16
D   1515259       694559    +2.900000e+01    -1.455192e-11       47   1.8e+01     100.0%      5.66
    1614911       729827    +2.900000e+01    -1.818989e-12       44   1.8e+01     100.0%      6.01
    1904262       831832    +2.900000e+01    -1.273293e-11       37   1.8e+01     100.0%      7.01
    2191216       933892    +2.900000e+01    -2.182787e-11       43   1.8e+01     100.0%      8.01
    2471597      1037937    +2.900000e+01    -2.546585e-11       51   1.8e+01     100.0%      9.01
    2738439      1149261    +2.900000e+01    -1.818989e-12       43   1.8e+01     100.0%     10.01
D   2780317      1167583    +2.700000e+01    -2.546585e-11       45   1.8e+01     100.0%     10.17
H                           +1.500000e+01    -3.637979e-12                        100.0%     10.52
    3016412      1249720    +1.500000e+01    -1.455192e-11       40   1.8e+01     100.0%     11.01
    3299500      1344580    +1.500000e+01    -5.456968e-12       43   1.8e+01     100.0%     12.01
    3582351      1436345    +1.500000e+01    -1.818989e-12       44   1.9e+01     100.0%     13.01
    3873433      1520717    +1.500000e+01    -6.912160e-11       45   1.9e+01     100.0%     14.01
    4154338      1608470    +1.500000e+01    -8.731149e-11       45   1.9e+01     100.0%     15.01
    4438517      1696737    +1.500000e+01    -1.818989e-11       48   1.9e+01     100.0%     16.01
    4718195      1792871    +1.500000e+01    -2.364686e-11       40   1.9e+01     100.0%     17.01
    4998614      1882200    +1.500000e+01    -9.094947e-12       38   1.9e+01     100.0%     18.01
    5278403      1971455    +1.500000e+01    -9.094947e-12       41   1.9e+01     100.0%     19.01
    5549982      2061972    +1.500000e+01    -4.365575e-11       46   2.0e+01     100.0%     20.01
    5816598      2161190    +1.500000e+01    -4.183676e-11       46   2.0e+01     100.0%     21.01
    6092308      2255470    +1.500000e+01    -3.092282e-11       47   2.0e+01     100.0%     22.01
    6371439      2340977    +1.500000e+01    -3.637979e-12       36   2.0e+01     100.0%     23.01
    6648911      2426727    +1.500000e+01    -2.728484e-11       38   2.0e+01     100.0%     24.01
    6926601      2515035    +1.500000e+01    -4.183676e-11       37   2.0e+01     100.0%     25.01
    7202923      2600811    +1.500000e+01    -1.818989e-11       36   2.0e+01     100.0%     26.01
    7483937      2686571    +1.500000e+01    -2.910383e-11       48   2.0e+01     100.0%     27.01
    7758209      2776075    +1.500000e+01    -3.637979e-12       49   2.0e+01     100.0%     28.01
zsh: segmentation fault (core dumped)  cuopt/cpp/build/solve_MIP --time-limit 200 --path 
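
Because the crash is intermittent, it may help to quantify how often it occurs before bisecting. The following is a minimal sketch, not part of cuOpt: `count_failures` is a hypothetical helper that reruns an arbitrary command and counts the runs that exit with a non-zero status (a shell reports a segfaulted child as non-zero). It assumes a run that completes normally exits with status 0.

```shell
# Hypothetical helper (not part of cuOpt): run a command N times and
# print how many of those runs exited with a non-zero status.
count_failures() {
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the command's output; only the exit status matters here.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, `count_failures 20 cuopt/cpp/build/solve_MIP --time-limit 200 --path datasets/miplib2017/markshare2.mps.bz2` would print the number of failing runs out of 20. The same wrapper could also be pointed at the binary launched under a memory checker such as `compute-sanitizer --tool memcheck`, although that may change the timing enough to hide a race.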

Labels: awaiting response, bug
