Commit 4988a30

Merge branch 'main' into load-from-memview
2 parents 5431882 + c4ba81e commit 4988a30

19 files changed: 709 additions and 90 deletions

.github/workflows/test_sysinstall.yml

Lines changed: 8 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -4,6 +4,9 @@ on:
44
schedule:
55
- cron: '13 4 * * *'
66
workflow_dispatch:
7+
inputs:
8+
args:
9+
description: 'Extra args for scripts/sysinstall.py.'
710

811
jobs:
912

@@ -31,12 +34,17 @@ jobs:
3134
# # sees `--venv` and defers to a venv, so we currently have to force use of python 3.11.
3235
# python-version: '3.11'
3336

37+
3438
- name: sysinstall_venv
39+
env:
40+
PYMUDF_SCRIPTS_SYSINSTALL_ARGS_POST: ${{inputs.args}}
3541
run:
3642
# Use venv.
3743
python3 scripts/sysinstall.py --mupdf-git '--branch master https://github.com/ArtifexSoftware/mupdf.git'
3844

3945
- name: sysinstall_sudo
46+
env:
47+
PYMUDF_SCRIPTS_SYSINSTALL_ARGS_POST: ${{inputs.args}}
4048
run:
4149
# Do not use a venv, instead install required packages with sudo.
4250
python3 scripts/sysinstall.py --mupdf-git '--branch master https://github.com/ArtifexSoftware/mupdf.git' --pip sudo --root /

changes.txt

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Change Log
2424
* Add Widget Support to `Document.insert_pdf()`.
2525
* Add `bidi` to span dicts.
2626
* Add `synthetic` to char dict.
27+
* Fixed Pyodide builds.
2728

2829

2930
**Changes in version 1.25.2 (2025-01-17)**

docs/about-feature-matrix.rst

Lines changed: 1 addition & 1 deletion
@@ -448,7 +448,7 @@
448448
<tr>
449449
<td><cite id="transFM43">PDF Page Labels</cite></td>
450450
<td class="yes"></td>
451-
<td class="no"></td>
451+
<td class="limited">Read-only</td>
452452
<td class="no"></td>
453453
<td class="no"></td>
454454
<td class="no"></td>

docs/document.rst

Lines changed: 12 additions & 1 deletion
@@ -1294,9 +1294,10 @@ For details on **embedded files** refer to Appendix 3.
12941294
pair: links; Document.insert_pdf
12951295
pair: annots; Document.insert_pdf
12961296
pair: widgets; Document.insert_pdf
1297+
pair: join_duplicates; Document.insert_pdf
12971298
pair: show_progress; Document.insert_pdf
12981299

1299-
.. method:: insert_pdf(docsrc, from_page=-1, to_page=-1, start_at=-1, rotate=-1, links=True, annots=True, widgets=True, show_progress=0, final=1)
1300+
.. method:: insert_pdf(docsrc, from_page=-1, to_page=-1, start_at=-1, rotate=-1, links=True, annots=True, widgets=True, join_duplicates=False, show_progress=0, final=1)
13001301

13011302
PDF only: Copy the page range **[from_page, to_page]** (including both) of PDF document *docsrc* into the current one. Inserts will start with page number *start_at*. Value -1 indicates default values. All pages thus copied will be rotated as specified. Links, annotations and widgets can be excluded in the target, see below. All page numbers are 0-based.
13021303

@@ -1312,9 +1313,19 @@ For details on **embedded files** refer to Appendix 3.
13121313
:arg int rotate: All copied pages will be rotated by the provided value (degrees, integer multiple of 90).
13131314

13141315
:arg bool links: Choose whether (internal and external) links should be included in the copy. Default is `True`. *Named* links (:data:`LINK_NAMED`) and internal links to outside the copied page range are **always excluded**.
1316+
13151317
:arg bool annots: choose whether annotations should be included in the copy.
1318+
13161319
:arg bool widgets: choose whether form fields (widgets) should be included in the copy. If `True` and at least one of the source pages contains form fields, the target PDF will be turned into a Form PDF (if it is not one already).
1320+
1321+
:arg bool join_duplicates: *(New in version 1.25.5)* Choose how to handle duplicate root field names in the source pages. This parameter is ignored if `widgets=False`.
1322+
1323+
Default is ``False`` which will add unifying strings to the name of those source root fields which have a duplicate in the target. For instance, if "name" already occurs in the target, the source widget's name will be changed to "name [text]" with a suitably chosen string "text".
1324+
1325+
If ``True``, root fields with duplicate names in source and target will be converted to so-called "Kids" of a "Parent" object (which lists all kid widgets in a PDF array). This will effectively turn those kids into instances of the "same" widget: if e.g. one of the kids is changed, then all its instances will automatically inherit this change -- no matter on which page they happen to be displayed.
1326+
13171327
:arg int show_progress: *(new in v1.17.7)* specify an interval size greater than zero to see progress messages on `sys.stdout`. After each interval, a message like `Inserted 30 of 47 pages.` will be printed.
1328+
13181329
:arg int final: *(new in v1.18.0)* controls whether the list of already copied objects should be **dropped** after this method, default *True*. Set it to 0 except for the last one of multiple insertions from the same source PDF. This saves target file size and speeds up execution considerably.
13191330

13201331
.. note::
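
As a rough illustration of the `join_duplicates` parameter documented in this hunk, the following sketch appends one form PDF to another. It assumes PyMuPDF 1.25.5 or later is installed; the helper name `merge_forms` and its path arguments are hypothetical and not part of the commit.

```python
def merge_forms(target_path, source_path, output_path, join=True):
    """Append all pages of a source form PDF to a target PDF.

    Hypothetical helper: assumes PyMuPDF >= 1.25.5 is installed and that
    the given paths point to existing PDF files.
    """
    # Imported lazily so this sketch can be loaded without PyMuPDF present.
    import pymupdf

    target = pymupdf.open(target_path)
    source = pymupdf.open(source_path)
    try:
        # widgets=True copies form fields; join_duplicates is documented as
        # being ignored when widgets=False.
        target.insert_pdf(source, widgets=True, join_duplicates=join)
        target.save(output_path)
    finally:
        source.close()
        target.close()
```

With `join=False` (the documented default of `insert_pdf`), duplicate root field names coming from the source are renamed; with `join=True` they become kids of a shared parent field.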

scripts/sysinstall.py

Lines changed: 9 additions & 2 deletions
@@ -118,6 +118,8 @@ def main():
118118
log(f'{sys.executable=}')
119119
log(f'{platform.python_version()=}')
120120
log(f'{__file__=}')
121+
log(f'{os.environ.get("PYMUDF_SCRIPTS_SYSINSTALL_ARGS_PRE")=}')
122+
log(f'{os.environ.get("PYMUDF_SCRIPTS_SYSINSTALL_ARGS_POST")=}')
121123
log(f'{sys.argv=}')
122124
log(f'{sysconfig.get_path("platlib")=}')
123125
run_command(f'python -V', check=0)
@@ -152,7 +154,9 @@ def main():
152154

153155
# Parse command-line.
154156
#
155-
args = iter(sys.argv[1:])
157+
env_args_pre = shlex.split(os.environ.get('PYMUDF_SCRIPTS_SYSINSTALL_ARGS_PRE', ''))
158+
env_args_post = shlex.split(os.environ.get('PYMUDF_SCRIPTS_SYSINSTALL_ARGS_POST', ''))
159+
args = iter(env_args_pre + sys.argv[1:] + env_args_post)
156160
while 1:
157161
try:
158162
arg = next(args)
@@ -240,6 +244,9 @@ def run(command, env_extra=None):
240244
command += f' HAVE_LEPTONICA=yes'
241245
command += f' HAVE_TESSERACT=yes'
242246
command += f' USE_SYSTEM_LIBS=yes'
247+
# We need latest zxingcpp so system version not ok.
248+
command += f' USE_SYSTEM_ZXINGCPP=no'
249+
command += f' barcode=yes'
243250
command += f' VENV_FLAG={"--venv" if pip == "venv" else ""}'
244251
if mupdf_so_mode:
245252
command += f' SO_INSTALL_MODE={mupdf_so_mode}'
@@ -291,7 +298,7 @@ def run(command):
291298
run(f'{sudo}rm -r {p}/site-packages/pymupdf.py || true')
292299
run(f'{sudo}rm -r {p}/site-packages/fitz || true')
293300
run(f'{sudo}rm -r {p}/site-packages/fitz.py || true')
294-
run(f'{sudo}rm -r {p}/site-packages/PyMuPDF-*.dist-info || true')
301+
run(f'{sudo}rm -r {p}/site-packages/pymupdf-*.dist-info || true')
295302
run(f'{sudo}rm -r {root_prefix}/bin/pymupdf || true')
296303
if pip == 'venv':
297304
run(f'{sudo}{venv_name}/bin/python -m installer --destdir {root} --prefix {prefix} {wheel}')
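
The argument handling added to `main()` in this file can be sketched in isolation as below. The environment variable names are taken from the diff; the function name `effective_args` and the idea of passing the environment in as a dict are assumptions made only so the behaviour is easy to try out.

```python
import shlex


def effective_args(argv, environ):
    """Return the argument list that would be parsed.

    Mirrors the diff: PRE args come first, then the real command line
    (without the program name), then POST args. Each variable is split
    with shell-like quoting rules.
    """
    pre = shlex.split(environ.get('PYMUDF_SCRIPTS_SYSINSTALL_ARGS_PRE', ''))
    post = shlex.split(environ.get('PYMUDF_SCRIPTS_SYSINSTALL_ARGS_POST', ''))
    return pre + argv[1:] + post


if __name__ == '__main__':
    env = {'PYMUDF_SCRIPTS_SYSINSTALL_ARGS_POST': "--pip sudo --root /"}
    print(effective_args(['sysinstall.py', '--mupdf-git', 'URL'], env))
```

Because POST args are appended after `sys.argv[1:]`, the workflow's `inputs.args` value is parsed after the options already given on the command line.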

scripts/test.py

Lines changed: 26 additions & 8 deletions
@@ -98,6 +98,10 @@
9898
Experimental, for investigating
9999
https://github.com/pymupdf/PyMuPDF/issues/3869. Runs basic code
100100
inside C++ pybind. Requires `sudo apt install pybind11-dev` or similar.
101+
--pymupdf-pypi <name>
102+
Do not build PyMuPDF, instead install with `pip install <name>`. For
103+
example allows testing of a specific version with `--pymupdf-pypi
104+
pymupdf==1.25.0`.
101105
--system-site-packages 0|1
102106
If 1, use `--system-site-packages` when creating venv. Default is 0.
103107
--timeout <seconds>
@@ -123,6 +127,11 @@
123127
`pyodide build`. This runs our setup.py with CC etc set up
124128
to create Pyodide binaries in a wheel called, for example,
125129
`PyMuPDF-1.23.2-cp311-none-emscripten_3_1_32_wasm32.whl`.
130+
131+
It seems that sys.version must match the Python version inside emsdk;
132+
as of 2025-02-14 this is 3.12. Otherwise we get build errors such as:
133+
[wasm-validator error in function 723] unexpected false: all used features should be allowed, on ...
134+
126135
127136
Environment:
128137
PYMUDF_SCRIPTS_TEST_options
@@ -159,6 +168,9 @@ def main(argv):
159168
if len(argv) == 1:
160169
show_help()
161170
return
171+
172+
log(f'{sys.executable=}')
173+
log(f'{sys.version=}')
162174

163175
build_isolation = None
164176
valgrind = False
@@ -180,6 +192,7 @@ def main(argv):
180192
system_site_packages = False
181193
pyodide_build_version = None
182194
packages = False
195+
pymupdf_pypi = None
183196

184197
options = os.environ.get('PYMUDF_SCRIPTS_TEST_options', '')
185198
options = shlex.split(options)
@@ -245,6 +258,8 @@ def main(argv):
245258
valgrind_args = next(args)
246259
elif arg == '--pyodide-build-version':
247260
pyodide_build_version = next(args)
261+
elif arg == '--pymupdf-pypi':
262+
pymupdf_pypi = next(args)
248263
else:
249264
assert 0, f'Unrecognised option: {arg=}.'
250265

@@ -276,14 +291,17 @@ def main(argv):
276291
return
277292

278293
def do_build(wheel=False):
279-
build(
280-
build_type=build_type,
281-
build_isolation=build_isolation,
282-
venv_quick=venv_quick,
283-
build_mupdf=build_mupdf,
284-
build_flavour=build_flavour,
285-
wheel=wheel,
286-
)
294+
if pymupdf_pypi:
295+
run(f'pip install --force-reinstall {pymupdf_pypi}')
296+
else:
297+
build(
298+
build_type=build_type,
299+
build_isolation=build_isolation,
300+
venv_quick=venv_quick,
301+
build_mupdf=build_mupdf,
302+
build_flavour=build_flavour,
303+
wheel=wheel,
304+
)
287305
def do_test():
288306
test(
289307
implementations=implementations,

setup.py

Lines changed: 93 additions & 10 deletions
@@ -152,6 +152,11 @@
152152
153153
PYMUPDF_SETUP_PY_LIMITED_API
154154
If not '0', we build for current Python's stable ABI.
155+
156+
However if unset and we are on Python-3.13 or later, we do
157+
not build for the stable ABI because as of 2025-03-04 SWIG
158+
generates incorrect stable ABI code with Python-3.13 - see:
159+
https://github.com/swig/swig/issues/3059
155160
156161
PYMUPDF_SETUP_URL_WHEEL
157162
If set, we use an existing wheel instead of building a new wheel.
@@ -208,6 +213,11 @@ def log( text):
208213
sys.stdout.flush()
209214

210215

216+
def run(command, check=1):
217+
log(f'Running: {command}')
218+
return subprocess.run( command, shell=1, check=check)
219+
220+
211221
if 1:
212222
# For debugging.
213223
log(f'### Starting.')
@@ -218,6 +228,7 @@ def log( text):
218228
log(f'CPU bits: {32 if sys.maxsize == 2**31 - 1 else 64} {sys.maxsize=}')
219229
log(f'__file__: {__file__!r}')
220230
log(f'os.getcwd(): {os.getcwd()!r}')
231+
log(f'getconf ARG_MAX: {pipcl.run("getconf ARG_MAX", capture=1, check=0, verbose=0)!r}')
221232
log(f'sys.argv ({len(sys.argv)}):')
222233
for i, arg in enumerate(sys.argv):
223234
log(f' {i}: {arg!r}')
@@ -236,10 +247,16 @@ def log( text):
236247
# Name of file that identifies that we are in a PyMuPDF sdist.
237248
g_pymupdfb_sdist_marker = 'pymupdfb_sdist'
238249

250+
python_version_tuple = tuple(int(x) for x in platform.python_version_tuple()[:2])
251+
239252
PYMUPDF_SETUP_PY_LIMITED_API = os.environ.get('PYMUPDF_SETUP_PY_LIMITED_API')
240253
assert PYMUPDF_SETUP_PY_LIMITED_API in (None, '', '0', '1'), \
241254
f'Should be "", "0", "1" or undefined: {PYMUPDF_SETUP_PY_LIMITED_API=}.'
242-
g_py_limited_api = (PYMUPDF_SETUP_PY_LIMITED_API != '0')
255+
if PYMUPDF_SETUP_PY_LIMITED_API is None and python_version_tuple >= (3, 13):
256+
log(f'Not defaulting to Python limited api because {platform.python_version_tuple()=}.')
257+
g_py_limited_api = False
258+
else:
259+
g_py_limited_api = (PYMUPDF_SETUP_PY_LIMITED_API != '0')
243260

244261
PYMUPDF_SETUP_URL_WHEEL = os.environ.get('PYMUPDF_SETUP_URL_WHEEL')
245262
log(f'{PYMUPDF_SETUP_URL_WHEEL=}')
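
The stable-ABI default introduced in this hunk can be summarised as a small pure function. This is only a sketch of the decision table; the name `use_limited_api` and the optional `version` parameter (added so other Python versions can be simulated) are not in the commit.

```python
import platform


def use_limited_api(env_value, version=None):
    """Decide whether to build for the stable ABI.

    env_value is the value of PYMUPDF_SETUP_PY_LIMITED_API, or None if unset.
    version is a (major, minor) tuple; it defaults to the running Python.
    """
    if version is None:
        version = tuple(int(x) for x in platform.python_version_tuple()[:2])
    assert env_value in (None, '', '0', '1')
    if env_value is None and version >= (3, 13):
        # Unset on Python 3.13+: do not default to the stable ABI.
        return False
    # Otherwise anything except '0' enables the stable ABI.
    return env_value != '0'
```

Note that an empty string counts as "set", so `PYMUPDF_SETUP_PY_LIMITED_API=''` still enables the stable ABI on Python 3.13, as does an explicit `'1'`.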
@@ -271,11 +288,6 @@ def error_fn(fn, path, excinfo):
271288
assert not os.path.exists( path)
272289

273290

274-
def run(command, check=1):
275-
log(f'Running: {command}')
276-
return subprocess.run( command, shell=1, check=check)
277-
278-
279291
def _git_get_branch( directory):
280292
command = f'cd {directory} && git branch --show-current'
281293
log( f'Running: {command}')
@@ -364,7 +376,7 @@ def tar_extract(path, mode='r:gz', prefix=None, exists='raise'):
364376
return prefix_actual
365377

366378

367-
def get_git_id( directory):
379+
def git_info( directory):
368380
'''
369381
Returns `(sha, comment, diff, branch)`, all items are str or None if not
370382
available.
@@ -390,10 +402,42 @@ def get_git_id( directory):
390402
)
391403
if cp.returncode == 0:
392404
branch = cp.stdout.strip()
393-
log(f'get_git_id(): directory={directory!r} returning branch={branch!r} sha={sha!r} comment={comment!r}')
405+
log(f'git_info(): directory={directory!r} returning branch={branch!r} sha={sha!r} comment={comment!r}')
394406
return sha, comment, diff, branch
395407

396408

409+
def git_patch(directory, patch, hard=False):
410+
'''
411+
Applies string <patch> with `git apply` in <directory>.
412+
413+
If <hard> is true we clean the tree with `git checkout .` and then apply
414+
the patch.
415+
416+
Otherwise we apply patch only if it is not already applied; this might fail
417+
if there are conflicting changes in the tree.
418+
'''
419+
log(f'Applying patch in {directory}:\n{textwrap.indent(patch, " ")}')
420+
if not patch:
421+
return
422+
# Carriage returns break `git apply` so we use `newline='\n'` in open().
423+
path = os.path.abspath(f'{directory}/pymupdf_patch.txt')
424+
with open(path, 'w', newline='\n') as f:
425+
f.write(patch)
426+
log(f'Using patch file: {path}')
427+
if hard:
428+
run(f'cd {directory} && git checkout .')
429+
run(f'cd {directory} && git apply {path}')
430+
log(f'Have applied patch in {directory}.')
431+
else:
432+
e = run( f'cd {directory} && git apply --check --reverse {path}', check=0)
433+
if e.returncode == 0:
434+
log(f'Not patching {directory} because already patched.')
435+
else:
436+
run(f'cd {directory} && git apply {path}')
437+
log(f'Have applied patch in {directory}.')
438+
run(f'cd {directory} && git diff')
439+
440+
397441
mupdf_tgz = os.path.abspath( f'{__file__}/../mupdf.tgz')
398442

399443
def get_mupdf_internal(out, location=None, sha=None, local_tgz=None):
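
The non-`hard` branch of `git_patch()` relies on a common idiom: `git apply --check --reverse` succeeds only if the patch is already present in the tree. A minimal sketch of that idiom follows. The function name `apply_patch_once` and the injectable `run` parameter are assumptions for illustration; the commit itself uses a shell-string `run()` helper. `subprocess.run` returns a `CompletedProcess`, so the exit status is read from `.returncode`.

```python
import subprocess


def apply_patch_once(directory, patch_path, run=subprocess.run):
    """Apply a patch file with `git apply` unless it is already applied.

    Returns True if the patch was applied, False if it was already present.
    `run` is injectable so the logic can be exercised without git.
    """
    # If the patch can be cleanly reversed, it has already been applied.
    check = run(['git', 'apply', '--check', '--reverse', patch_path], cwd=directory)
    if check.returncode == 0:
        return False
    # Not applied yet; this raises CalledProcessError on conflicts.
    run(['git', 'apply', patch_path], cwd=directory, check=True)
    return True
```

As the docstring in the diff notes, the second step can still fail if the tree contains conflicting changes.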
@@ -444,7 +488,8 @@ def get_mupdf_internal(out, location=None, sha=None, local_tgz=None):
444488
if e:
445489
# No existing git checkout, so do a fresh clone.
446490
_fs_remove(local_dir)
447-
run(f'git clone --recursive --depth 1 --shallow-submodules {location[4:]} {local_dir}')
491+
gitargs = location[4:]
492+
run(f'git clone --recursive --depth 1 --shallow-submodules {gitargs} {local_dir}')
448493

449494
# Show sha of checkout.
450495
run( f'cd {local_dir} && git show --pretty=oneline|head -n 1', check=False)
@@ -856,6 +901,34 @@ def build_mupdf_unix(
856901

857902
if openbsd or freebsd:
858903
env_add(env, 'CXX', 'c++', ' ')
904+
905+
if darwin and os.environ.get('GITHUB_ACTIONS') == 'true':
906+
if os.environ.get('ImageOS') == 'macos13':
907+
# On Github macos13 we need to use Clang/LLVM (Homebrew) 15.0.7,
908+
# otherwise mupdf:thirdparty/tesseract/src/api/baseapi.cpp fails to
909+
# compile with:
910+
#
911+
# thirdparty/tesseract/src/api/baseapi.cpp:150:25: error: 'recursive_directory_iterator' is unavailable: introduced in macOS 10.15
912+
#
913+
# See:
914+
# https://github.com/actions/runner-images/blob/main/images/macos/macos-13-Readme.md
915+
#
916+
log(f'Using llvm@15 clang and clang++')
917+
cl15 = pipcl.run(f'brew --prefix llvm@15', capture=1)
918+
log(f'{cl15=}')
919+
cl15 = cl15.strip()
920+
pipcl.run(f'ls -lL {cl15}')
921+
pipcl.run(f'ls -lL {cl15}/bin')
922+
cc = f'{cl15}/bin/clang'
923+
cxx = f'{cl15}/bin/clang++'
924+
env['CC'] = cc
925+
env['CXX'] = cxx
926+
927+
# Show compiler versions.
928+
cc = env.get('CC', 'cc')
929+
cxx = env.get('CXX', 'c++')
930+
pipcl.run(f'{cc} --version')
931+
pipcl.run(f'{cxx} --version')
859932

860933
# Add extra flags for MacOS cross-compilation, where ARCHFLAGS can be
861934
# '-arch arm64'.
@@ -865,6 +938,8 @@ def build_mupdf_unix(
865938
env_add(env, 'XCFLAGS', archflags)
866939
env_add(env, 'XLIBS', archflags)
867940

941+
mupdf_version_tuple = get_mupdf_version(mupdf_local)
942+
868943
# We specify a build directory path containing 'pymupdf' so that we
869944
# coexist with non-PyMuPDF builds (because PyMuPDF builds have a
870945
# different config.h).
@@ -877,7 +952,16 @@ def build_mupdf_unix(
877952
# $_PYTHON_HOST_PLATFORM allows cross-compiled cibuildwheel builds
878953
# to coexist, e.g. on github.
879954
#
955+
# Have experimented with looking at getconf_ARG_MAX to decide whether to
956+
# omit `PyMuPDF-` from the build directory, to avoid command-too-long
957+
# errors with mupdf-1.26. But it seems that `getconf ARG_MAX` returns
958+
# a system limit, not the actual limit of the current shell, and there
959+
# doesn't seem to be a way to find the current shell's limit.
960+
#
880961
build_prefix = f'PyMuPDF-'
962+
if mupdf_version_tuple >= (1, 26):
963+
# Avoid link command length problems seen on musllinux.
964+
build_prefix = ''
881965
if pyodide:
882966
build_prefix += 'pyodide-'
883967
else:
@@ -894,7 +978,6 @@ def build_mupdf_unix(
894978
log(f'PYMUPDF_SETUP_MUPDF_TESSERACT=0 so building mupdf without tesseract.')
895979
else:
896980
build_prefix += 'tesseract-'
897-
mupdf_version_tuple = get_mupdf_version(mupdf_local)
898981
if (
899982
linux
900983
and os.environ.get('PYMUPDF_SETUP_MUPDF_BSYMBOLIC', '1') == '1'
