For scripting, it would be nice if this tool could fetch text from a screen region given as X, Y, W, H.
That would need a different API (not ScreenCapture), because a script should run without the capture sound or the interactive crosshair.
My goal is to periodically scan a region of my screen for text.
EDIT: Made a PR on a fork of your project, which already has some more options and leverages the screencapture binary instead of the API: adam-zethraeus#2
This binary gives all the options I need, such as defining a rect or disabling the capture sound. I wonder how much overhead taking a screenshot every second adds.
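
One rough way to answer the overhead question is to time the binary directly. The sketch below is only an illustration, not what the PR does: it assumes macOS's `screencapture` with `-x` (no sound) and `-R` (capture a rect), and the function names (`build_capture_command`, `timed_capture`, `poll`) are made up for this example. The text recognition step is left out on purpose.

```python
import subprocess
import time


def build_capture_command(x, y, w, h, out_path):
    """Return the argv for a silent capture of the rect (x, y, w, h)."""
    # -x: do not play the capture sound; -R: capture the given screen rect.
    return ["screencapture", "-x", f"-R{x},{y},{w},{h}", out_path]


def timed_capture(x, y, w, h, out_path):
    """Run one capture and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(build_capture_command(x, y, w, h, out_path), check=True)
    return time.perf_counter() - start


def poll(x, y, w, h, out_path, interval=1.0, iterations=10):
    """Capture the rect every `interval` seconds; return per-capture timings."""
    timings = []
    for _ in range(iterations):
        timings.append(timed_capture(x, y, w, h, out_path))
        # Sleep only for what remains of the interval after the capture itself.
        time.sleep(max(0.0, interval - timings[-1]))
    return timings
```

Calling something like `poll(100, 200, 400, 50, "/tmp/region.png")` on a Mac would return a list of per-capture durations, which should give a first idea of whether one screenshot per second is cheap enough. It measures only the capture, so any OCR cost comes on top.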