
feat: release GIL when possible #388

Merged
lukapeschke merged 9 commits into ToucanToco:main from JonAnCla:release-the-gil
Sep 20, 2025

Conversation

@JonAnCla (Contributor) commented Sep 9, 2025

Hi there, I was trying to use multiple threads to simultaneously extract many Excel files and didn't see all cores being used, so I suspected that the GIL was not being released within fastexcel.

I've added a GIL-releasing closure in two places as noted in the title: the Rust implementation of read_excel and ExcelReader.build_sheet.

With these in place, I see all cores being used in a multi-threaded setup :)

I expect more could be done to release the GIL for longer, but I think I've covered the two major places where most time is spent, without needing a wider refactor.

Also, apologies for any confusion: some commits have been made under the user jc-5s, which is also mine (work account).

Thanks!
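The effect described above can be demonstrated without fastexcel: CPython's own hashlib also releases the GIL while hashing large buffers, so the same thread-pool pattern scales across cores there too. A minimal illustrative sketch (not part of this PR):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# CPython's hashlib releases the GIL while hashing large buffers,
# so hashing in threads can run on multiple cores in parallel,
# analogous to the GIL-releasing closures added in this PR.
payloads = [bytes([i]) * 4_000_000 for i in range(8)]

def digest(buf: bytes) -> str:
    return hashlib.sha256(buf).hexdigest()

sequential = [digest(p) for p in payloads]
with ThreadPoolExecutor() as executor:
    threaded = list(executor.map(digest, payloads))

# Same results either way; with the GIL released, the threaded run
# can use all cores instead of serialising on the interpreter lock.
assert sequential == threaded
```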

… when reading a sheet "not eagerly" in ExcelReader.build_sheet
@arnabanimesh (Contributor) commented Sep 9, 2025

How much reduction in read times are you getting?

@JonAnCla (Contributor, Author) commented Sep 9, 2025

About a 3-4x improvement vs the current 0.15.1, in both cases reading 150 files using concurrent.futures' ThreadPoolExecutor.

I'm not sure how much of a difference it makes, but I did not make a production build with profiling for my "release GIL" build, so it could be a bit faster with that added on top.

@JonAnCla (Contributor, Author) commented Sep 9, 2025

Also, just to add: without this change I was seeing about 20% total usage across all cores for the duration of reading the 150 files. With this change, I see 100% usage across all cores for the duration.

@arnabanimesh (Contributor) commented Sep 10, 2025

I guess you could use ProcessPoolExecutor. But this is fine too.

I am a bit worried about the overhead due to context switching when running multiple threads on a single core when using ThreadPoolExecutor in free-threaded Python builds. But I guess the overhead will be negligible, since the CPython devs are smart enough and should have considered this scenario.

@JonAnCla (Contributor, Author)

> I guess you could use ProcessPoolExecutor. But this is fine too.

Threads have lower startup, memory, and communication overhead, but they need the GIL to be released to get true parallelism in Python (pre-nogil).

> I am a bit worried about the overhead due to context switching when running multiple threads on a single core when using ThreadPoolExecutor in free threaded python builds. But I guess the overhead will be negligible since CPython devs are smart enough and should have considered this scenario.

I agree it should be low overhead; there's some slightly relevant text here: https://pyo3.rs/v0.25.1/free-threading.html#many-symbols-exposed-by-pyo3-have-gil-in-the-name. In "with GIL" Python, the thread holding the GIL can be switched out every ~5ms, so I assume switching has been fairly well tuned and any logic still needed was carried over to nogil Python.
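For reference, that ~5ms figure is CPython's thread switch interval, which can be inspected and tuned from Python itself. A small sketch (the 0.005s default applies to current CPython builds):

```python
import sys

# The GIL-holding thread is asked to yield every "switch interval";
# CPython's default is 5 ms (0.005 s).
default = sys.getswitchinterval()

# It can be tuned, e.g. raised to reduce context switches for
# CPU-bound threads (at the cost of responsiveness).
sys.setswitchinterval(0.01)
assert abs(sys.getswitchinterval() - 0.01) < 1e-9

sys.setswitchinterval(default)  # restore the previous value
```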

@JonAnCla (Contributor, Author)

In this section of the same doc linked above, it's noted that it's still important to detach threads doing long-running work from the Python interpreter (even in nogil Python), so that they do not block global synchronisation, i.e. "stop the world" events like garbage collection: https://pyo3.rs/v0.25.1/free-threading.html#global-synchronization-events-can-cause-hangs-and-deadlocks

So adding the GIL-releasing code would actually help tie in with nogil Python too :)

@lukapeschke changed the title from "release python's GIL in read_excel and ExcelReader.build_sheet" to "feat: release GIL when possible" on Sep 10, 2025
@lukapeschke (Collaborator)

@hottwaj Thank you, great idea! I think it should also be added to the lazy code paths, though. Also, could you please provide your benchmark script?

@jc-5s (Contributor) commented Sep 10, 2025

Great, thanks! I've added a change that also releases the GIL in the "eager" case for ExcelReader.build_sheet (I had done the "lazy" case in the first set of changes). I'm not able to see how to make similar changes to the "load/build_table" methods though - do they ultimately rely on build_sheet?

The test I've been running is quite basic - see below. You can use a list of files or open the same file many times in parallel.

  • reading the files sequentially using openpyxl took about 160s
  • reading them sequentially with fastexcel was about 22s
  • reading them in multiple threads with fastexcel 0.15.1 was about 11s (so there is currently some multi-threading benefit)
  • reading them in multiple threads with the GIL-releasing changes was about 2.7s
import fastexcel
from concurrent.futures import ThreadPoolExecutor

# files_list: the list of .xlsx file paths to read (150 in the benchmark)

def process_file(filepath):
    reader = fastexcel.read_excel(filepath)
    reader.load_sheet(0)

with ThreadPoolExecutor() as executor:
    results = list(executor.map(process_file, files_list))
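To reproduce the sequential-vs-threaded comparison quoted in the bullet points, a generic timing wrapper can help. A sketch with a dummy process_file (swap in the fastexcel version from the script above for a real benchmark; files_list is assumed to be your list of paths):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed(label, fn):
    # Run fn(), report wall-clock time, and return its result.
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Dummy stand-in for reading one Excel file.
def process_file(filepath):
    return len(str(filepath))

files_list = [f"file_{i}.xlsx" for i in range(8)]

sequential = timed("sequential", lambda: [process_file(f) for f in files_list])
with ThreadPoolExecutor() as executor:
    threaded = timed("threaded", lambda: list(executor.map(process_file, files_list)))

# Both strategies must produce identical results.
assert sequential == threaded
```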

@JonAnCla (Contributor, Author)

Sorry for confusing things, that was me - I keep forgetting to switch accounts.

@lukapeschke (Collaborator)

Thank you for the script and the updates :) I've left a comment regarding worksheet_range_ref.

Regarding the table loading functions, I guess you could wrap the call to load_table here, since this method is working on owned Rust types.

- re-instate reading range for both eager & not-eager code paths for ExcelReader.build_sheet
@JonAnCla (Contributor, Author)

Great, I have made further changes that hopefully cover those points. LMK if anything else is needed, thanks!

@JonAnCla (Contributor, Author)

Hi there, just checking if there's anything else needed from me at the moment? The PR seems to say changes have been requested, but they seem to have already been committed, @lukapeschke, and I'm fine with them, so unless I've missed something I think this is waiting on maintainers? LMK if I'm missing something, thanks!

Signed-off-by: Luka Peschke <luka.peschke@toucantoco.com>
@lukapeschke (Collaborator)

@JonAnCla I ran a few benchmarks and released the GIL in a few more places. Most low-hanging fruit regarding GIL-related perf should have been addressed; I'll merge once the CI passes. Thanks again for your contribution! 🙂

@lukapeschke lukapeschke merged commit 854404a into ToucanToco:main Sep 20, 2025
23 checks passed
@JonAnCla (Contributor, Author)

Nice, thanks!

@lukapeschke (Collaborator)

My measurements are available here: #397 (comment)
