
Update requirements #2348

Merged
mergify[bot] merged 2 commits into tskit-dev:main from benjeffery:update-requirements-2503
Mar 21, 2025


@benjeffery
Member

No description provided.

@benjeffery force-pushed the update-requirements-2503 branch 3 times, most recently from db5a46b to a0b1b1f, March 19, 2025 23:56
@codecov

codecov bot commented Mar 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.93%. Comparing base (35ee672) to head (725c912).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2348   +/-   ##
=======================================
  Coverage   90.93%   90.93%           
=======================================
  Files          20       20           
  Lines       12018    12018           
  Branches     2316     2316           
=======================================
  Hits        10929    10929           
  Misses        602      602           
  Partials      487      487           
Flag Coverage Δ
C 90.93% <ø> (ø)
python 98.70% <ø> (ø)

Flags with carried forward coverage won't be shown.

@benjeffery force-pushed the update-requirements-2503 branch 2 times, most recently from 3b1d2ed to e0b748c, March 20, 2025 00:47
@benjeffery
Member Author

benjeffery commented Mar 20, 2025

Very odd failure here, on Windows only: the matrix S looks completely different from Linux/macOS.

@jeromekelleher
Member

Any idea what might have changed here @benjeffery? I.e., did some of the library updates change the input to the test?

Nonetheless, it seems like a bug has been exposed here, which we should try to characterise.

@benjeffery
Member Author

I'm guessing it is scipy/numpy related - will do some digging. Maybe something related to default types in the CPython API, or something fiddly like that.

@jeromekelleher
Member

So it's not unexpected that S would look different on Windows vs Unix here if there are random numbers involved. What seems most likely to me is that the random numbers have changed and Windows has picked out some very unlikely edge case that happens to cause problems. If that's the case, we can skipIf that test to get the build running, and file an issue for follow-up.

@benjeffery force-pushed the update-requirements-2503 branch 4 times, most recently from c009b6c to c01ecd0, March 20, 2025 12:37
@benjeffery
Member Author

benjeffery commented Mar 20, 2025

OK, digging complete: there is no randomness. On Linux, the largest eigenvalue is exactly 1, which clearly meets the tolerance of 1e-8. On Windows this value is 0.99885! This is likely due to very different underlying BLAS libraries, with numerical error accumulating on Windows.
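To make the failing check concrete, here is a minimal sketch of the eigenvector approach. It assumes the stationary distribution is taken as the left eigenvector of a row-stochastic transition matrix `P` whose leading eigenvalue must lie within `1e-8` of 1; the function name `stationary_via_eig` is hypothetical and this is not the project's actual code.

```python
import numpy as np


def stationary_via_eig(P, tol=1e-8):
    """Stationary distribution as the left eigenvector of P for eigenvalue 1.

    Hypothetical sketch: P is assumed row-stochastic (rows sum to 1),
    so the stationary distribution pi satisfies pi @ P == pi.
    """
    P = np.asarray(P, dtype=float)
    # Left eigenvectors of P are the right eigenvectors of P.T.
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmax(np.abs(eigvals))
    if abs(eigvals[k] - 1) > tol:
        # A leading eigenvalue of 0.99885 instead of 1 fails a check like this.
        raise ValueError(f"leading eigenvalue {eigvals[k]} is not within {tol} of 1")
    pi = np.real(eigvecs[:, k])
    # Normalising by the sum also fixes an arbitrary sign from the solver.
    return pi / pi.sum()


P = np.array([[0.9, 0.1], [0.5, 0.5]])
print(stationary_via_eig(P))
```

For this 2x2 example the exact answer is (5/6, 1/6), since the balance condition is 0.1 * pi_1 = 0.5 * pi_2.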

I've had a look around for a fix here and these are the options:

  1. Keep the current code but relax the tolerance.
  2. Switch to power iteration: less sensitive to numerical issues, but might be slower.
  3. Solve the linear system directly. Probably doable, no iteration, but much more complex
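Options 2 and 3 can be sketched in NumPy as below, assuming a row-stochastic `P` and, for the direct solve, an irreducible chain so the solution is unique. Both function names are hypothetical; this is an illustration of the two techniques, not the code in this PR.

```python
import numpy as np


def stationary_power_iteration(P, tol=1e-12, max_iter=100_000):
    """Option 2 (sketch): repeat pi <- pi @ P until it stops changing.

    Returns (pi, converged) so the caller can decide whether to warn or raise.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        new = pi @ P
        new /= new.sum()  # guard against slow drift away from sum == 1
        if np.max(np.abs(new - pi)) < tol:
            return new, True
        pi = new
    return pi, False


def stationary_linear_solve(P):
    """Option 3 (sketch): solve pi (P - I) = 0 together with sum(pi) = 1.

    Transposed, this is (P.T - I) pi = 0. One equation is redundant, so it
    is replaced by the normalisation constraint to get a square, solvable system.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0  # last equation becomes sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)
```

Power iteration converges geometrically, with the error shrinking roughly by a factor of the second-largest eigenvalue modulus per step, which is why it can be slower than a direct solve on slowly mixing chains.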

I went with option 2 in this PR. Would like an expert review though as I am stretching ancient neurons from long ago here.

@benjeffery force-pushed the update-requirements-2503 branch 2 times, most recently from e9f75f8 to 3810cd0, March 20, 2025 12:43
@benjeffery
Member Author

Also: Maybe error or warn on failure to converge?

@benjeffery
Member Author

@andrewkern I believe you wrote this code initially, so some review would be appreciated.

@andrewkern
Member

wow this is weird!

I think the eigenvalue solver I originally implemented can be unstable for really large matrices, but I don't think we are in that space here. The power iteration you implemented should also work fine and be more stable for large matrices, but IIRC it is slower.

Just catching up: you downgraded the CI test reqs to a different scipy version and this started throwing an error? Looking over the reqs, we aren't pinning a specific numpy version. I'm guessing what's happening is that the downgraded scipy version is using a different BLAS backend on Windows. Any chance you could get the output from np.show_config() on both systems to compare?

@andrewkern
Member

Also, is there a reason we don't have numpy pinned in the CI reqs?

@benjeffery force-pushed the update-requirements-2503 branch from 3810cd0 to 7f3e1e4, March 20, 2025 14:25
@andrewkern
Member

Looking through the "install conda deps" step on the Windows test, I see this:

[Screenshot: output of the Windows "install conda deps" step]

I wonder if the MKL version is what's getting us.

@benjeffery
Member Author

The change in this test is upgrading scipy 1.11.3 -> 1.13.1. (There are multiple CI configurations; one got downgraded by mistake, so thanks for the spot!) We don't pin dependencies that come in as a result of other dependencies; this is a trade-off between stability and detecting breakage.

Here is the np.show_config() output:
Linux:

{
  "Compilers": {
    "c": {
      "name": "gcc",
      "linker": "ld.bfd",
      "version": "13.3.0",
      "commands": "/home/conda/feedstock_root/build_artifacts/numpy_1742254798654/_build_env/bin/x86_64-conda-linux-gnu-cc",
      "args": "-march=nocona, -mtune=haswell, -ftree-vectorize, -fPIC, -fstack-protector-strong, -fno-plt, -O2, -ffunction-sections, -pipe, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include, -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/numpy_1742254798654/work=/usr/local/src/conda/numpy-2.2.4, -fdebug-prefix-map=/usr/share/miniconda/envs/anaconda-client-env=/usr/local/src/conda-prefix, -DNDEBUG, -D_FORTIFY_SOURCE=2, -O2, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include",
      "linker args": "-Wl,-O2, -Wl,--sort-common, -Wl,--as-needed, -Wl,-z,relro, -Wl,-z,now, -Wl,--disable-new-dtags, -Wl,--gc-sections, -Wl,--allow-shlib-undefined, -Wl,-rpath,/usr/share/miniconda/envs/anaconda-client-env/lib, -Wl,-rpath-link,/usr/share/miniconda/envs/anaconda-client-env/lib, -L/usr/share/miniconda/envs/anaconda-client-env/lib, -march=nocona, -mtune=haswell, -ftree-vectorize, -fPIC, -fstack-protector-strong, -fno-plt, -O2, -ffunction-sections, -pipe, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include, -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/numpy_1742254798654/work=/usr/local/src/conda/numpy-2.2.4, -fdebug-prefix-map=/usr/share/miniconda/envs/anaconda-client-env=/usr/local/src/conda-prefix, -DNDEBUG, -D_FORTIFY_SOURCE=2, -O2, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.12",
      "commands": "cython"
    },
    "c++": {
      "name": "gcc",
      "linker": "ld.bfd",
      "version": "13.3.0",
      "commands": "/home/conda/feedstock_root/build_artifacts/numpy_1742254798654/_build_env/bin/x86_64-conda-linux-gnu-c++",
      "args": "-fvisibility-inlines-hidden, -fmessage-length=0, -march=nocona, -mtune=haswell, -ftree-vectorize, -fPIC, -fstack-protector-strong, -fno-plt, -O2, -ffunction-sections, -pipe, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include, -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/numpy_1742254798654/work=/usr/local/src/conda/numpy-2.2.4, -fdebug-prefix-map=/usr/share/miniconda/envs/anaconda-client-env=/usr/local/src/conda-prefix, -DNDEBUG, -D_FORTIFY_SOURCE=2, -O2, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include",
      "linker args": "-Wl,-O2, -Wl,--sort-common, -Wl,--as-needed, -Wl,-z,relro, -Wl,-z,now, -Wl,--disable-new-dtags, -Wl,--gc-sections, -Wl,--allow-shlib-undefined, -Wl,-rpath,/usr/share/miniconda/envs/anaconda-client-env/lib, -Wl,-rpath-link,/usr/share/miniconda/envs/anaconda-client-env/lib, -L/usr/share/miniconda/envs/anaconda-client-env/lib, -fvisibility-inlines-hidden, -fmessage-length=0, -march=nocona, -mtune=haswell, -ftree-vectorize, -fPIC, -fstack-protector-strong, -fno-plt, -O2, -ffunction-sections, -pipe, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include, -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/numpy_1742254798654/work=/usr/local/src/conda/numpy-2.2.4, -fdebug-prefix-map=/usr/share/miniconda/envs/anaconda-client-env=/usr/local/src/conda-prefix, -DNDEBUG, -D_FORTIFY_SOURCE=2, -O2, -isystem, /usr/share/miniconda/envs/anaconda-client-env/include"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "linux"
    },
    "build": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "linux"
    }
  },
  "Build Dependencies": {
    "blas": {
      "name": "blas",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/usr/share/miniconda/envs/anaconda-client-env/include",
      "lib directory": "/usr/share/miniconda/envs/anaconda-client-env/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/usr/share/miniconda/envs/anaconda-client-env/lib/pkgconfig"
    },
    "lapack": {
      "name": "lapack",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "/usr/share/miniconda/envs/anaconda-client-env/include",
      "lib directory": "/usr/share/miniconda/envs/anaconda-client-env/lib",
      "openblas configuration": "unknown",
      "pc file directory": "/usr/share/miniconda/envs/anaconda-client-env/lib/pkgconfig"
    }
  },
  "Python Information": {
    "path": "/usr/share/miniconda/envs/anaconda-client-env/bin/python",
    "version": "3.12"
  },
  "SIMD Extensions": {
    "baseline": [
      "SSE",
      "SSE2",
      "SSE3"
    ],
    "found": [
      "SSSE3",
      "SSE41",
      "POPCNT",
      "SSE42",
      "AVX",
      "F16C",
      "FMA3",
      "AVX2"
    ],
    "not found": [
      "AVX512F",
      "AVX512CD",
      "AVX512_KNL",
      "AVX512_KNM",
      "AVX512_SKX",
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL",
      "AVX512_SPR"
    ]
  }
}

Windows:

{
  "Compilers": {
    "c": {
      "name": "msvc",
      "linker": "link",
      "version": "19.29.30158",
      "commands": "cl.exe"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.12",
      "commands": "cython"
    },
    "c++": {
      "name": "msvc",
      "linker": "link",
      "version": "19.29.30158",
      "commands": "cl.exe"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "windows"
    },
    "build": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "windows"
    }
  },
  "Build Dependencies": {
    "blas": {
      "name": "blas",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "C:/Miniconda/envs/anaconda-client-env/Library/include",
      "lib directory": "C:/Miniconda/envs/anaconda-client-env/Library/lib",
      "openblas configuration": "unknown",
      "pc file directory": "D:\\bld\\numpy_1742254808786\\_h_env\\Library\\lib\\pkgconfig"
    },
    "lapack": {
      "name": "lapack",
      "found": true,
      "version": "3.9.0",
      "detection method": "pkgconfig",
      "include directory": "C:/Miniconda/envs/anaconda-client-env/Library/include",
      "lib directory": "C:/Miniconda/envs/anaconda-client-env/Library/lib",
      "openblas configuration": "unknown",
      "pc file directory": "D:\\bld\\numpy_1742254808786\\_h_env\\Library\\lib\\pkgconfig"
    }
  },
  "Python Information": {
    "path": "D:\\bld\\numpy_1742254808786\\_h_env\\python.exe",
    "version": "3.12"
  },
  "SIMD Extensions": {
    "baseline": [
      "SSE",
      "SSE2",
      "SSE3"
    ],
    "found": [
      "SSSE3",
      "SSE41",
      "POPCNT",
      "SSE42",
      "AVX",
      "F16C",
      "FMA3",
      "AVX2"
    ],
    "not found": [
      "AVX512F",
      "AVX512CD",
      "AVX512_SKX",
      "AVX512_CLX",
      "AVX512_CNL",
      "AVX512_ICL"
    ]
  }
}

@benjeffery
Member Author

BTW, even if we can pin CI to a version so this doesn't happen, we can't pin numpy in the distributed package, so it needs a fix.

@andrewkern
Member

andrewkern commented Mar 20, 2025

I'd be curious if adding nomkl to the conda deps would fix this. We probably want to make sure that backends aren't changing around in the distributed version either.

@petrelharp might have strong feelings about using power iteration to find the stationary distribution versus solving the eigensystem

@petrelharp
Contributor

Power iteration is a great way to go. Nice work.

I don't think we want to be pinning the linear algebra backend, since leaving it unpinned means that if it changes and causes a bug, we find out about it?

hi=None,
root_distribution=None,
):
print(np.show_config())
Contributor


stray print

@petrelharp
Contributor

Hm, actually, I see the solve is in production code, not testing code, so thinking harder about it: I'm not worried about it being slower, because we only have to do this once per ts.mutate() call, and if someone is calling ts.mutate() a lot, they could pre-calculate the root_distribution.

I never saw the error, though - how big was the matrix? And, what was the second eigenvalue?
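The second eigenvalue matters because power iteration's error shrinks roughly like |lambda_2|^k, given that |lambda_1| = 1 for a stochastic matrix. A rough way to report the matrix size, both eigenvalue moduli, and an approximate iteration count is sketched below; `spectral_summary` is a hypothetical helper and assumes a row-stochastic `P` of size at least 2x2.

```python
import math

import numpy as np


def spectral_summary(P, tol=1e-12):
    """Report size, the two largest eigenvalue moduli, and a rough step count.

    Power iteration needs about log(tol) / log(|lambda_2|) steps to reach tol;
    this is an estimate only. Assumes P is at least 2x2.
    """
    P = np.asarray(P, dtype=float)
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    lam1, lam2 = float(moduli[0]), float(moduli[1])
    if lam2 >= 1 - 1e-12:
        # e.g. periodic or reducible chains: no geometric convergence
        steps = math.inf
    elif lam2 == 0:
        steps = 1
    else:
        steps = max(1, math.ceil(math.log(tol) / math.log(lam2)))
    return {"n": P.shape[0], "lambda1": lam1, "lambda2": lam2, "approx_steps": steps}


print(spectral_summary(np.array([[0.9, 0.1], [0.5, 0.5]])))
```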

We should throw an error on lack of convergence; it's possible that the transition matrix is periodic, I suppose.
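A periodic chain is the simplest case where plain power iteration never settles: with P = [[0, 1], [1, 0]], any non-uniform starting vector just swaps its entries forever. A minimal sketch of raising on non-convergence, assuming a row-stochastic `P`; `ConvergenceError` and `stationary_or_raise` are hypothetical names.

```python
import numpy as np


class ConvergenceError(RuntimeError):
    """Hypothetical exception for a stationary-distribution solve that fails."""


def stationary_or_raise(P, pi0=None, tol=1e-12, max_iter=10_000):
    """Power iteration that raises instead of silently returning a bad answer."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    pi = np.full(n, 1.0 / n) if pi0 is None else np.asarray(pi0, dtype=float)
    for _ in range(max_iter):
        new = pi @ P
        new /= new.sum()
        if np.max(np.abs(new - pi)) < tol:
            return new
        pi = new
    raise ConvergenceError(
        f"power iteration did not converge in {max_iter} steps; "
        "the transition matrix may be periodic"
    )


periodic = np.array([[0.0, 1.0], [1.0, 0.0]])
try:
    stationary_or_raise(periodic, pi0=[0.9, 0.1])
except ConvergenceError as err:
    print(err)
```

One common workaround is to iterate on the "lazy" chain (I + P) / 2 instead: it has the same stationary distribution as P but is aperiodic.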

@jeromekelleher
Member

Maybe we could spin this into a specific issue and follow-up PR? We can pytest skip the test on Windows to get the major stuff through and then the actual problem will come up.
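Skipping a single test on Windows with pytest can look like the sketch below. The test name and body are hypothetical placeholders; only the `pytest.mark.skipif` usage is the point here.

```python
import sys

import pytest


# Hypothetical test name; the real test lives in the project's test suite.
@pytest.mark.skipif(
    sys.platform == "win32",
    reason="stationary distribution is numerically off on Windows; see #2349",
)
def test_stationary_distribution_example():
    # Placeholder body standing in for the real assertions.
    assert 1 + 1 == 2
```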

My interpretation is that this change has exposed a bug that we need to fix.

@benjeffery force-pushed the update-requirements-2503 branch from 7f3e1e4 to 725c912, March 21, 2025 10:08
@benjeffery
Member Author

Ok, reverted code and skipped the test. Issue filed at #2349.

@mergify[bot] merged commit 52d3b8d into tskit-dev:main Mar 21, 2025
16 checks passed
@benjeffery deleted the update-requirements-2503 branch March 21, 2025 16:32