Commit 301fbb9

Merge pull request #363 from fabinsch/topic/gil
[reopen2] Release GIL when doing standalone solves
2 parents 898270b + 3169275 commit 301fbb9

6 files changed: +293 -78 lines changed

.github/workflows/ci-linux-osx-win-conda.yml

Lines changed: 8 additions & 22 deletions
@@ -57,40 +57,29 @@ jobs:
       with:
         submodules: recursive

-    - uses: conda-incubator/setup-miniconda@v2
-      if: matrix.os != 'macos-14'
+    - uses: conda-incubator/setup-miniconda@v3
       with:
-        miniforge-variant: Mambaforge
         miniforge-version: latest
-        channels: conda-forge
-        python-version: "3.10"
         activate-environment: proxsuite

-    - uses: conda-incubator/setup-miniconda@v3
-      if: matrix.os == 'macos-14'
-      with:
-        channels: conda-forge
-        python-version: "3.10"
-        activate-environment: proxsuite
-        installer-url: https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-MacOSX-arm64.sh

     - name: Install dependencies [Conda]
       shell: bash -l {0}
       run: |
         # Workaround for https://github.com/conda-incubator/setup-miniconda/issues/186
         conda config --remove channels defaults
         # Compilation related dependencies
-        mamba install cmake compilers make pkg-config doxygen ninja graphviz typing_extensions llvm-openmp clang
+        conda install cmake compilers make pkg-config doxygen ninja graphviz typing_extensions llvm-openmp clang
         # Main dependencies
-        mamba install eigen simde
+        conda install eigen simde
         # Test dependencies
-        mamba install libmatio numpy scipy
+        conda install libmatio numpy scipy

-    - name: Install julia [macOS/Linux]
-      if: contains(matrix.os, 'macos-latest') || contains(matrix.os, 'ubuntu')
+    - name: Install julia [Linux]
+      if: contains(matrix.os, 'ubuntu')
       shell: bash -l {0}
       run: |
-        mamba install julia
+        conda install julia

     - name: Activate ccache [Conda]
       uses: hendrikmuhs/[email protected]
@@ -102,7 +91,7 @@
       shell: bash -l {0}
       run: |
         conda info
-        mamba list
+        conda list
         env

     - name: Configure [Conda/Linux&macOS]
@@ -142,7 +131,6 @@
       shell: bash -l {0}
       run: |
         echo $(where ccache)
-        ls C:\\Miniconda3\\envs\\proxsuite\\Library\\lib
         git submodule update --init
         mkdir build
         cd build
@@ -155,7 +143,6 @@
       shell: bash -l {0}
       run: |
         echo $(where ccache)
-        ls C:\\Miniconda3\\envs\\proxsuite\\Library\\lib
         git submodule update --init
         mkdir build
         cd build
@@ -168,7 +155,6 @@
       shell: bash -l {0}
       run: |
         echo $(where ccache)
-        ls C:\\Miniconda3\\envs\\proxsuite\\Library\\lib
         git submodule update --init
         mkdir build
         cd build

.github/workflows/gh-pages.yml

Lines changed: 4 additions & 6 deletions
@@ -12,11 +12,9 @@ jobs:
       with:
         submodules: recursive

-    - uses: conda-incubator/setup-miniconda@v2
+    - uses: conda-incubator/setup-miniconda@v3
       with:
-        miniforge-variant: Mambaforge
         miniforge-version: latest
-        channels: conda-forge
         python-version: "3.10"
         activate-environment: doc

@@ -27,16 +25,16 @@
         conda config --remove channels defaults

         # Compilation related dependencies
-        mamba install cmake make pkg-config doxygen graphviz
+        conda install cmake make pkg-config doxygen graphviz

         # Main dependencies
-        mamba install eigen
+        conda install eigen

     - name: Print environment
       shell: bash -l {0}
       run: |
         conda info
-        mamba list
+        conda list
         env

     - name: Configure

.github/workflows/release-osx-win.yml

Lines changed: 3 additions & 16 deletions
@@ -35,38 +35,25 @@ jobs:
         git submodule update

     - name: Setup conda
-      if: contains(matrix.os, 'macos-13') || contains(matrix.os, 'windows')
-      uses: conda-incubator/setup-miniconda@v2
-      with:
-        miniforge-variant: Mambaforge
-        miniforge-version: latest
-        channels: conda-forge
-        python-version: ${{ matrix.python-version }}
-        activate-environment: proxsuite
-
-    - name: Setup conda
-      if: matrix.os == 'macos-14'
       uses: conda-incubator/setup-miniconda@v3
       with:
-        channels: conda-forge
+        miniforge-version: latest
         python-version: ${{ matrix.python-version }}
         activate-environment: proxsuite
-        installer-url: https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-MacOSX-arm64.sh

     - name: Install dependencies [Conda]
       if: contains(matrix.os, 'macos') || contains(matrix.os, 'windows')
       shell: bash -l {0}
       run: |
         # Workaround for https://github.com/conda-incubator/setup-miniconda/issues/186
         conda config --remove channels defaults
-        mamba install doxygen graphviz eigen simde cmake compilers typing_extensions
+        conda install doxygen graphviz eigen simde cmake compilers typing_extensions

     - name: Print environment [Conda]
-      if: contains(matrix.os, 'macos') || contains(matrix.os, 'windows')
       shell: bash -l {0}
       run: |
         conda info
-        mamba list
+        conda list
         env

     - name: Build wheel

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -12,9 +12,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

 ### Added
 * Stub files for Python bindings, using [nanobind's native support](https://nanobind.readthedocs.io/en/latest/typing.html#stub-generation) ([#340](https://github.com/Simple-Robotics/proxsuite/pull/340))
+* Add `solve_no_gil` for dense backend (multithreading via python) ([#363](https://github.com/Simple-Robotics/proxsuite/pull/363))
+* Add benchmarks for `solve_no_gil` vs `solve_in_parallel` (openmp) ([#363](https://github.com/Simple-Robotics/proxsuite/pull/363))

 ### Changed
 * Change Python bindings to use nanobind instead of pybind11 ([#340](https://github.com/Simple-Robotics/proxsuite/pull/340))
+* Update setup-minicondav2 to v3 ([#363](https://github.com/Simple-Robotics/proxsuite/pull/363))


 ## [0.6.7] - 2024-08-27
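
The `solve_no_gil` entry above is the user-facing piece of this PR: a dense-backend solve that releases the GIL so several solves can overlap across plain Python threads. Below is a minimal usage sketch, not taken from this commit's files: the problem data is made up for illustration, while the argument order and the `eps_abs` keyword follow the benchmark script further down.

    # Sketch: run several standalone dense solves concurrently from Python threads.
    # Because solve_no_gil releases the GIL during the solve, the calls can overlap.
    import numpy as np
    import proxsuite
    from concurrent.futures import ThreadPoolExecutor

    n, n_eq, n_in = 10, 2, 4
    rng = np.random.default_rng(0)
    H = np.eye(n)                      # placeholder convex Hessian
    g = rng.standard_normal(n)
    A = rng.standard_normal((n_eq, n))
    b = A @ np.ones(n)                 # equalities feasible at x = ones
    C = 0.1 * rng.standard_normal((n_in, n))
    l = -1.0e20 * np.ones(n_in)        # effectively one-sided inequalities
    u = np.ones(n_in)
    problems = [(H, g + 0.1 * k, A, b, C, l, u) for k in range(8)]

    def solve_one(data):
        H, g, A, b, C, l, u = data
        return proxsuite.proxqp.dense.solve_no_gil(H, g, A, b, C, l, u, eps_abs=1e-9)

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(solve_one, problems))

This is the pattern the benchmark below compares against the OpenMP-based `solve_in_parallel`.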

benchmark/timings-parallel.py

Lines changed: 107 additions & 32 deletions
@@ -2,6 +2,17 @@
 import numpy as np
 import scipy.sparse as spa
 from time import perf_counter_ns
+from concurrent.futures import ThreadPoolExecutor
+
+"""
+There are two interfaces to solve a QP problem with the dense backend. a) create a qp object by passing the problem data (matrices, vectors) to the qp.init method (this does memory allocation and the preconditioning) and then calling qp.solve or b) use the solve function directly taking the problem data as input (this does everything in one go).
+
+Currently, only the qp.solve method (a) is parallelized (using openmp). Therefore the memory alloc + preconditioning is done in serial when building a batch of qps that is then passed to the `solve_in_parallel` function. The solve function (b) is not parallelized but can easily be parallelized in Python using ThreadPoolExecutor.
+
+Here we do some timings to compare the two approaches. We generate a batch of QP problems and solve them in parallel using the `solve_in_parallel` function and compare the timings (need to add the timings for building the batch of qps + the parallel solving) with solving each problem in parallel using ThreadPoolExecutor for the solve function.
+"""
+
+num_threads = proxsuite.proxqp.omp_get_max_threads()


 def generate_mixed_qp(n, n_eq, n_in, seed=1):
@@ -23,45 +34,109 @@ def generate_mixed_qp(n, n_eq, n_in, seed=1):
     u = A @ v
     l = -1.0e20 * np.ones(m)

-    return P.toarray(), q, A[:n_eq, :], u[:n_eq], A[n_in:, :], u[n_in:], l[n_in:]
+    return P.toarray(), q, A[:n_eq, :], u[:n_eq], A[n_in:, :], l[n_in:], u[n_in:]


-n = 500
-n_eq = 200
-n_in = 200
+problem_specs = [
+    # (n, n_eq, n_in),
+    (50, 20, 20),
+    (100, 40, 40),
+    (200, 80, 80),
+    (500, 200, 200),
+    (1000, 200, 200),
+]

 num_qps = 128

-# qps = []
-timings = {}
-qps = proxsuite.proxqp.dense.VectorQP()
-
-tic = perf_counter_ns()
-for j in range(num_qps):
-    qp = proxsuite.proxqp.dense.QP(n, n_eq, n_in)
-    H, g, A, b, C, u, l = generate_mixed_qp(n, n_eq, n_in, seed=j)
-    qp.init(H, g, A, b, C, l, u)
-    qp.settings.eps_abs = 1e-9
-    qp.settings.verbose = False
-    qp.settings.initial_guess = proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS
-    qps.append(qp)
-timings["problem_data"] = (perf_counter_ns() - tic) * 1e-6
-
-tic = perf_counter_ns()
-for qp in qps:
-    qp.solve()
-timings["solve_serial"] = (perf_counter_ns() - tic) * 1e-6
+for n, n_eq, n_in in problem_specs:

-num_threads = proxsuite.proxqp.omp_get_max_threads()
-for j in range(1, num_threads):
+    print(f"\nProblem specs: {n=} {n_eq=} {n_in=}. Generating {num_qps} such problems.")
+    problems = [generate_mixed_qp(n, n_eq, n_in, seed=j) for j in range(num_qps)]
+    print(
+        f"Generated problems. Solving {num_qps} problems with proxsuite.proxqp.omp_get_max_threads()={num_threads} threads."
+    )
+
+    timings = {}
+
+    # create a vector of QP objects. This is not efficient because memory is allocated when creating the qp object + when it is appended to the vector which creates a copy of the object.
+    qps_vector = proxsuite.proxqp.dense.VectorQP()
     tic = perf_counter_ns()
-    proxsuite.proxqp.dense.solve_in_parallel(j, qps)
-    timings[f"solve_parallel_{j}_threads"] = (perf_counter_ns() - tic) * 1e-6
+    print("\nSetting up vector of qps")
+    for H, g, A, b, C, l, u in problems:
+        qp = proxsuite.proxqp.dense.QP(n, n_eq, n_in)
+        qp.init(H, g, A, b, C, l, u)
+        qp.settings.eps_abs = 1e-9
+        qp.settings.verbose = False
+        qp.settings.initial_guess = proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS
+        qps_vector.append(qp)
+    timings["setup_vector_of_qps"] = (perf_counter_ns() - tic) * 1e-6

+    # use BatchQP, which can initialize the qp objects in place and is more efficient
+    qps_batch = proxsuite.proxqp.dense.BatchQP()
+    tic = perf_counter_ns()
+    print("Setting up batch of qps")
+    for H, g, A, b, C, l, u in problems:
+        qp = qps_batch.init_qp_in_place(n, n_eq, n_in)
+        qp.init(H, g, A, b, C, l, u)
+        qp.settings.eps_abs = 1e-9
+        qp.settings.verbose = False
+        qp.settings.initial_guess = proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS
+    timings["setup_batch_of_qps"] = (perf_counter_ns() - tic) * 1e-6

-    tic = perf_counter_ns()
-    proxsuite.proxqp.dense.solve_in_parallel(qps=qps)
-    timings[f"solve_parallel_heuristics_threads"] = (perf_counter_ns() - tic) * 1e-6
+    print("Solving batch of qps using solve_in_parallel with default thread config")
+    tic = perf_counter_ns()
+    proxsuite.proxqp.dense.solve_in_parallel(qps=qps_batch)
+    timings[f"solve_in_parallel_heuristics_threads"] = (perf_counter_ns() - tic) * 1e-6
+
+    print("Solving vector of qps serially")
+    tic = perf_counter_ns()
+    for qp in qps_vector:
+        qp.solve()
+    timings["qp_solve_serial"] = (perf_counter_ns() - tic) * 1e-6
+
+    print("Solving batch of qps using solve_in_parallel with various thread configs")
+    for j in range(1, num_threads, 2):
+        tic = perf_counter_ns()
+        proxsuite.proxqp.dense.solve_in_parallel(qps=qps_batch, num_threads=j)
+        timings[f"solve_in_parallel_{j}_threads"] = (perf_counter_ns() - tic) * 1e-6
+
+    def solve_problem_with_dense_backend(
+        problem,
+    ):
+        H, g, A, b, C, l, u = problem
+        return proxsuite.proxqp.dense.solve_no_gil(
+            H,
+            g,
+            A,
+            b,
+            C,
+            l,
+            u,
+            initial_guess=proxsuite.proxqp.InitialGuess.NO_INITIAL_GUESS,
+            eps_abs=1e-9,
+        )
+
+    # add final timings for the solve_in_parallel function considering setup time for batch of qps
+    for k, v in list(timings.items()):
+        if "solve_in_parallel" in k:
+            k_init = k + "_and_setup_batch_of_qps"
+            timings[k_init] = timings["setup_batch_of_qps"] + v
+
+    print("Solving each problem serially with solve function.")
+    # Note: here we just pass the problem data to the solve function. This does not require running the init method separately.
+    tic = perf_counter_ns()
+    for problem in problems:
+        solve_problem_with_dense_backend(problem)
+    timings["solve_fun_serial"] = (perf_counter_ns() - tic) * 1e-6
+
+    print(
+        "Solving each problem in parallel (with a ThreadPoolExecutor) with solve function."
+    )
+    tic = perf_counter_ns()
+    with ThreadPoolExecutor(max_workers=num_threads) as executor:
+        results = list(executor.map(solve_problem_with_dense_backend, problems))
+    timings["solve_fun_parallel"] = (perf_counter_ns() - tic) * 1e-6

-for k, v in timings.items():
-    print(f"{k}: {v}ms")
+    print("\nTimings:")
+    for k, v in timings.items():
+        print(f"{k}: {v:.3f}ms")
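
The module docstring added in this diff distinguishes two dense-backend entry points: (a) a `dense.QP` object whose `init` call allocates and preconditions before `solve`, and (b) a one-shot solve call on the raw problem data. Condensed side by side below, assuming `H, g, A, b, C, l, u` come from `generate_mixed_qp` as in the script; the `results.x` access follows the usual ProxQP result object, and this sketch is illustrative rather than part of the commit.

    # (a) object interface: allocation + preconditioning happen once in init,
    #     after which solve reuses that workspace.
    qp = proxsuite.proxqp.dense.QP(n, n_eq, n_in)
    qp.init(H, g, A, b, C, l, u)
    qp.settings.eps_abs = 1e-9
    qp.solve()
    x_a = qp.results.x

    # (b) one-shot interface: setup and solve in a single call; solve_no_gil is the
    #     GIL-releasing variant added in this PR, so (b) can be threaded from Python.
    result = proxsuite.proxqp.dense.solve_no_gil(H, g, A, b, C, l, u, eps_abs=1e-9)
    x_b = result.x

The timing keys printed at the end of the script map onto these two paths: `setup_*`, `qp_solve_serial`, and `solve_in_parallel_*` exercise (a), while `solve_fun_serial` and `solve_fun_parallel` exercise (b).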

0 commit comments
