Commit 791e4f6

Merge branch 'main' into api/virtualfile-from-stringio
2 parents 0489783 + d7560fa commit 791e4f6

8 files changed: +216 -39 lines changed

doc/overview.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ The project was started in 2017 by [Leonardo Uieda](http://www.leouieda.com) and
 [Paul Wessel](http://www.soest.hawaii.edu/wessel) (the co-creator and main developer of
 GMT) at the University of Hawaiʻi at Mānoa. The development of PyGMT has been supported
 by NSF grants [OCE-1558403](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1558403)
-and [EAR-1948603](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1948602).
+and [EAR-1948602](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1948602).

 We welcome any feedback and ideas! Let us know by submitting
 [issues on GitHub](https://github.com/GenericMappingTools/pygmt/issues) or by posting on

doc/techref/encodings.md

Lines changed: 30 additions & 8 deletions
@@ -1,14 +1,12 @@
11
# Supported Encodings and Non-ASCII Characters
22

3-
GMT supports a number of encodings and each encoding contains a set of ASCII and non-ASCII
4-
characters. Below are some of the most common encodings and characters that are supported.
3+
GMT supports a number of encodings and each encoding contains a set of ASCII and
4+
non-ASCII characters. In PyGMT, you can use any of these ASCII and non-ASCII characters
5+
in arguments and text strings. When using non-ASCII characters in PyGMT, the easiest way
6+
is to copy and paste the character from the encoding tables below.
57

6-
In PyGMT, you can use any of these ASCII and non-ASCII characters in arguments and text
7-
strings. When using non-ASCII characters in PyGMT, the easiest way is to copy and paste
8-
the character from the tables below.
9-
10-
**Note**: The special character � (REPLACEMENT CHARACTER) is used to indicate that
11-
the character is not defined in the encoding.
8+
**Note**: The special character � (REPLACEMENT CHARACTER) is used to indicate
9+
that the character is not defined in the encoding.
1210

1311
## Adobe ISOLatin1+ Encoding
1412

@@ -106,3 +104,27 @@ the Unicode character set.
106104
| **\35x** | ➨ | ➩ | ➪ | ➫ | ➬ | ➭ | ➮ | ➯ |
107105
| **\36x** | � | ➱ | ➲ | ➳ | ➴ | ➵ | ➶ | ➷ |
108106
| **\37x** | ➸ | ➹ | ➺ | ➻ | ➼ | ➽ | ➾ | � |
107+
108+
## ISO/IEC 8859
109+
110+
GMT also supports the ISO/IEC 8859 standard for 8-bit character encodings. Refer to
111+
<https://en.wikipedia.org/wiki/ISO/IEC_8859> for descriptions of the different parts of
112+
the standard.
113+
114+
For a list of the characters in each part of the standard, refer to the following links:
115+
116+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-1>
117+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-2>
118+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-3>
119+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-4>
120+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-5>
121+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-6>
122+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-7>
123+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-8>
124+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-9>
125+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-10>
126+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-11>
127+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-13>
128+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-14>
129+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-15>
130+
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-16>

pygmt/encodings.py

Lines changed: 22 additions & 10 deletions
@@ -1,13 +1,13 @@
11
"""
2-
Adobe character encodings supported by GMT.
2+
Character encodings supported by GMT.
33
4-
Currently, only Adobe Symbol, Adobe ZapfDingbats, and Adobe ISOLatin1+ encodings are
5-
supported.
4+
Currently, Adobe Symbol, Adobe ZapfDingbats, Adobe ISOLatin1+ and ISO-8859-x (x can be
5+
1-11, 13-16) encodings are supported. Adobe Standard encoding is not supported.
66
7-
The corresponding Unicode characters in each Adobe character encoding are generated
8-
from the mapping table and conversion script in the GMT-octal-codes
9-
(https://github.com/seisman/GMT-octal-codes) repository. Refer to that repository for
10-
details.
7+
The corresponding Unicode characters in each Adobe character encoding are generated from
8+
the mapping tables and conversion scripts in the
9+
`GMT-octal-codes repository <https://github.com/seisman/GMT-octal-codes>`__. Refer to
10+
that repository for details.
1111
1212
Some code points are undefined and are assigned with the replacement character
1313
(``\ufffd``).
@@ -16,14 +16,17 @@
1616
----------
1717
1818
- GMT-octal-codes: https://github.com/seisman/GMT-octal-codes
19-
- GMT official documentation: https://docs.generic-mapping-tools.org/dev/reference/octal-codes.html
19+
- GMT documentation: https://docs.generic-mapping-tools.org/dev/reference/octal-codes.html
2020
- Adobe Postscript Language Reference: https://www.adobe.com/jp/print/postscript/pdfs/PLRM.pdf
21-
- ISOLatin1+: https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
21+
- Adobe ISOLatin1+: https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
2222
- Adobe Symbol: https://en.wikipedia.org/wiki/Symbol_(typeface)
23-
- Zapf Dingbats: https://en.wikipedia.org/wiki/Zapf_Dingbats
23+
- Adobe ZapfDingbats: https://en.wikipedia.org/wiki/Zapf_Dingbats
2424
- Adobe Glyph List: https://github.com/adobe-type-tools/agl-aglfn
25+
- ISO-8859: https://en.wikipedia.org/wiki/ISO/IEC_8859
2526
"""
2627

28+
import codecs
29+
2730
# Dictionary of character mappings for different encodings.
2831
charset: dict = {}
2932

@@ -129,3 +132,12 @@
         strict=False,
     )
 )
+
+# ISO-8859-x charsets and x can be 1-11, 13-16.
+for i in range(1, 17):
+    if i == 12:  # ISO-8859-12 was abandoned.
+        continue
+    charset[f"ISO-8859-{i}"] = {
+        code: codecs.decode(bytes([code]), f"iso8859_{i}", errors="replace")
+        for code in [*range(0o040, 0o200), *range(0o240, 0o400)]
+    }
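
Review note: to check what the new loop produces, the sketch below rebuilds the same tables using only the standard library. `build_iso8859_charsets` is a hypothetical helper name used for illustration and is not part of PyGMT; its body mirrors the lines added in this hunk.

```python
import codecs


def build_iso8859_charsets() -> dict[str, dict[int, str]]:
    """Map each ISO-8859-x name to a {code point: character} table.

    Hypothetical helper for illustration; it mirrors the loop in this hunk.
    """
    # Printable ranges: octal 040-177 and 240-377 (96 codes each).
    codes = [*range(0o040, 0o200), *range(0o240, 0o400)]
    charsets: dict[str, dict[int, str]] = {}
    for i in range(1, 17):
        if i == 12:  # ISO-8859-12 was abandoned, so there is no codec for it.
            continue
        charsets[f"ISO-8859-{i}"] = {
            # Bytes with no assigned character decode to U+FFFD ("replace").
            code: codecs.decode(bytes([code]), f"iso8859_{i}", errors="replace")
            for code in codes
        }
    return charsets


charsets = build_iso8859_charsets()
print(len(charsets))  # number of ISO-8859 parts covered
print(charsets["ISO-8859-4"][0o340])  # character at octal 340 in part 4
```

Each table maps an integer code point to a single Unicode character, which is the shape `non_ascii_to_octal` relies on when it inverts the table.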

pygmt/helpers/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
     unique_name,
 )
 from pygmt.helpers.utils import (
+    _check_encoding,
     _validate_data_input,
     args_in_kwargs,
     build_arg_list,

pygmt/helpers/utils.py

Lines changed: 123 additions & 10 deletions
@@ -116,6 +116,78 @@ def _validate_data_input(
116116
raise GMTInvalidInput("data must provide x, y, and z columns.")
117117

118118

119+
def _check_encoding(
120+
argstr: str,
121+
) -> Literal[
122+
"ascii",
123+
"ISOLatin1+",
124+
"ISO-8859-1",
125+
"ISO-8859-2",
126+
"ISO-8859-3",
127+
"ISO-8859-4",
128+
"ISO-8859-5",
129+
"ISO-8859-6",
130+
"ISO-8859-7",
131+
"ISO-8859-8",
132+
"ISO-8859-9",
133+
"ISO-8859-10",
134+
"ISO-8859-11",
135+
"ISO-8859-13",
136+
"ISO-8859-14",
137+
"ISO-8859-15",
138+
"ISO-8859-16",
139+
]:
140+
"""
141+
Check the charset encoding of a string.
142+
143+
All characters in the string must be in the same charset encoding, otherwise the
144+
default ``ISOLatin1+`` encoding is returned. Characters in the Adobe Symbol and
145+
ZapfDingbats encodings are also checked because they're independent of the choice of
146+
encodings.
147+
148+
Parameters
149+
----------
150+
argstr
151+
The string to be checked.
152+
153+
Returns
154+
-------
155+
encoding
156+
The encoding of the string.
157+
158+
Examples
159+
--------
160+
>>> _check_encoding("123ABC+-?!") # ASCII characters only
161+
'ascii'
162+
>>> _check_encoding("12AB±β①②") # Characters in ISOLatin1+
163+
'ISOLatin1+'
164+
>>> _check_encoding("12ABāáâãäåβ①②") # Characters in ISO-8859-4
165+
'ISO-8859-4'
166+
>>> _check_encoding("12ABŒā") # Mix characters in ISOLatin1+ (Œ) and ISO-8859-4 (ā)
167+
'ISOLatin1+'
168+
>>> _check_encoding("123AB中文") # Characters not in any charset encoding
169+
'ISOLatin1+'
170+
"""
171+
# Return "ascii" if the string only contains ASCII characters.
172+
if all(32 <= ord(c) <= 126 for c in argstr):
173+
return "ascii"
174+
# Loop through all supported encodings and check if all characters in the string
175+
# are in the charset of the encoding. If all characters are in the charset, return
176+
# the encoding. The ISOLatin1+ encoding is checked first because it is the default
177+
# and most common encoding.
178+
adobe_chars = set(charset["Symbol"].values()) | set(
179+
charset["ZapfDingbats"].values()
180+
)
181+
for encoding in ["ISOLatin1+"] + [f"ISO-8859-{i}" for i in range(1, 17)]:
182+
if encoding == "ISO-8859-12": # ISO-8859-12 was abandoned. Skip it.
183+
continue
184+
if all(c in (set(charset[encoding].values()) | adobe_chars) for c in argstr):
185+
return encoding # type: ignore[return-value]
186+
# Return the "ISOLatin1+" encoding if the string contains characters from multiple
187+
# charset encodings or contains characters that are not in any charset encoding.
188+
return "ISOLatin1+"
189+
190+
119191
def data_kind(
120192
data: Any = None, required: bool = True
121193
) -> Literal["arg", "file", "geojson", "grid", "image", "matrix", "vectors"]:
@@ -199,17 +271,41 @@ def data_kind(
199271
return kind
200272

201273

202-
def non_ascii_to_octal(argstr: str) -> str:
274+
def non_ascii_to_octal(
275+
argstr: str,
276+
encoding: Literal[
277+
"ascii",
278+
"ISOLatin1+",
279+
"ISO-8859-1",
280+
"ISO-8859-2",
281+
"ISO-8859-3",
282+
"ISO-8859-4",
283+
"ISO-8859-5",
284+
"ISO-8859-6",
285+
"ISO-8859-7",
286+
"ISO-8859-8",
287+
"ISO-8859-9",
288+
"ISO-8859-10",
289+
"ISO-8859-11",
290+
"ISO-8859-13",
291+
"ISO-8859-14",
292+
"ISO-8859-15",
293+
"ISO-8859-16",
294+
] = "ISOLatin1+",
295+
) -> str:
203296
r"""
204297
Translate non-ASCII characters to their corresponding octal codes.
205298
206-
Currently, only characters in the ISOLatin1+ charset and Symbol/ZapfDingbats fonts
207-
are supported.
299+
Currently, only non-ASCII characters in the Adobe ISOLatin1+, Adobe Symbol, Adobe
300+
ZapfDingbats, and ISO-8859-x (x can be in 1-11, 13-16) encodings are supported.
301+
The Adobe Standard encoding is not supported yet.
208302
209303
Parameters
210304
----------
211305
argstr
212306
The string to be translated.
307+
encoding
308+
The encoding of characters in the string.
213309
214310
Returns
215311
-------
@@ -226,9 +322,11 @@ def non_ascii_to_octal(argstr: str) -> str:
226322
'@%34%\\041@%%@%34%\\176@%%@%34%\\241@%%@%34%\\376@%%'
227323
>>> non_ascii_to_octal("ABC ±120° DEF α ♥")
228324
'ABC \\261120\\260 DEF @~\\141@~ @%34%\\252@%%'
325+
>>> non_ascii_to_octal("12ABāáâãäåβ①②", encoding="ISO-8859-4")
326+
'12AB\\340\\341\\342\\343\\344\\345@~\\142@~@%34%\\254@%%@%34%\\255@%%'
229327
""" # noqa: RUF002
230-
# Return the string if it only contains printable ASCII characters from 32 to 126.
231-
if all(32 <= ord(c) <= 126 for c in argstr):
328+
# Return the input string if it only contains ASCII characters.
329+
if encoding == "ascii" or all(32 <= ord(c) <= 126 for c in argstr):
232330
return argstr
233331

234332
# Dictionary mapping non-ASCII characters to octal codes
@@ -239,15 +337,15 @@ def non_ascii_to_octal(argstr: str) -> str:
239337
mapping.update(
240338
{c: f"@%34%\\{i:03o}@%%" for i, c in charset["ZapfDingbats"].items()}
241339
)
242-
# Adobe ISOLatin1+ charset. Put at the end.
243-
mapping.update({c: f"\\{i:03o}" for i, c in charset["ISOLatin1+"].items()})
340+
# ISOLatin1+ or ISO-8859-x charset.
341+
mapping.update({c: f"\\{i:03o}" for i, c in charset[encoding].items()})
244342

245343
# Remove any printable characters
246344
mapping = {k: v for k, v in mapping.items() if k not in string.printable}
247345
return argstr.translate(str.maketrans(mapping))
248346

249347

250-
def build_arg_list(
348+
def build_arg_list( # noqa: PLR0912
251349
kwdict: dict[str, Any],
252350
confdict: dict[str, str] | None = None,
253351
infile: str | pathlib.PurePath | Sequence[str | pathlib.PurePath] | None = None,
@@ -317,6 +415,10 @@ def build_arg_list(
317415
... )
318416
... )
319417
['f1.txt', 'f2.txt', '-A0', '-B', '--FORMAT_DATE_MAP=o dd', '->out.txt']
418+
>>> build_arg_list(dict(B="12ABāβ①②"))
419+
['-B12AB\\340@~\\142@~@%34%\\254@%%@%34%\\255@%%', '--PS_CHAR_ENCODING=ISO-8859-4']
420+
>>> build_arg_list(dict(B="12ABāβ①②"), confdict=dict(PS_CHAR_ENCODING="ISO-8859-5"))
421+
['-B12AB\\340@~\\142@~@%34%\\254@%%@%34%\\255@%%', '--PS_CHAR_ENCODING=ISO-8859-5']
320422
>>> print(build_arg_list(dict(R="1/2/3/4", J="X4i", watre=True)))
321423
Traceback (most recent call last):
322424
...
@@ -331,11 +433,22 @@ def build_arg_list(
         elif value is True:
             gmt_args.append(f"-{key}")
         elif is_nonstr_iter(value):
-            gmt_args.extend(non_ascii_to_octal(f"-{key}{_value}") for _value in value)
+            gmt_args.extend(f"-{key}{_value}" for _value in value)
         else:
-            gmt_args.append(non_ascii_to_octal(f"-{key}{value}"))
+            gmt_args.append(f"-{key}{value}")
+
+    # Convert non-ASCII characters (if any) in the arguments to octal codes
+    encoding = _check_encoding("".join(gmt_args))
+    if encoding != "ascii":
+        gmt_args = [non_ascii_to_octal(arg, encoding=encoding) for arg in gmt_args]
     gmt_args = sorted(gmt_args)

+    # Set --PS_CHAR_ENCODING=encoding if necessary
+    if encoding not in {"ascii", "ISOLatin1+"} and not (
+        confdict and "PS_CHAR_ENCODING" in confdict
+    ):
+        gmt_args.append(f"--PS_CHAR_ENCODING={encoding}")
+
     if confdict:
         gmt_args.extend(f"--{key}={value}" for key, value in confdict.items())
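
Review note: the detect-then-convert flow added to this file can be sketched with the standard library alone. This is a simplified illustration, not the committed implementation. `detect_encoding` and `to_octal` are hypothetical names. The Adobe ISOLatin1+, Symbol and ZapfDingbats tables are omitted because they are not shown in this diff, and ISO-8859-1 stands in as the fallback where the commit returns `ISOLatin1+`.

```python
import codecs

# ISO-8859 parts 1-16, skipping the abandoned part 12.
PARTS = [i for i in range(1, 17) if i != 12]
CODES = [*range(0o040, 0o200), *range(0o240, 0o400)]
TABLES = {
    f"ISO-8859-{i}": {
        c: codecs.decode(bytes([c]), f"iso8859_{i}", errors="replace") for c in CODES
    }
    for i in PARTS
}


def detect_encoding(text: str, default: str = "ISO-8859-1") -> str:
    """Return the first encoding whose table contains every character.

    Assumption: ISO-8859-1 is used as the fallback in this sketch only.
    """
    if all(32 <= ord(ch) <= 126 for ch in text):
        return "ascii"
    for name, table in TABLES.items():  # insertion order: part 1 first
        chars = set(table.values())
        if all(ch in chars for ch in text):
            return name
    return default


def to_octal(text: str, encoding: str) -> str:
    """Replace non-ASCII characters with backslash-octal codes."""
    if encoding == "ascii":
        return text
    mapping = {
        ch: f"\\{code:03o}"
        for code, ch in TABLES[encoding].items()
        if code >= 0o240 and ch != "\ufffd"  # skip ASCII and undefined slots
    }
    return text.translate(str.maketrans(mapping))


label = "12ABāáâãäå"
enc = detect_encoding(label)
print(enc, to_octal(label, enc))
```

Detecting one encoding for the joined argument string, as `build_arg_list` does, means every argument is converted against the same table, so a single `PS_CHAR_ENCODING` setting applies to all of them.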

pygmt/src/text.py

Lines changed: 21 additions & 10 deletions
@@ -6,6 +6,7 @@
 from pygmt.clib import Session
 from pygmt.exceptions import GMTInvalidInput
 from pygmt.helpers import (
+    _check_encoding,
     build_arg_list,
     data_kind,
     fmt_docstring,
@@ -59,13 +60,12 @@ def text_( # noqa: PLR0912
     - ``x``/``y``, and ``text``
     - ``position`` and ``text``

-    The text strings passed via the ``text`` parameter can contain ASCII
-    characters and non-ASCII characters defined in the ISOLatin1+ encoding
-    (i.e., IEC_8859-1), and the Symbol and ZapfDingbats character sets.
-    See :gmt-docs:`reference/octal-codes.html` for the full list of supported
-    non-ASCII characters.
+    The text strings passed via the ``text`` parameter can contain ASCII characters and
+    non-ASCII characters defined in the Adobe ISOLatin1+, Adobe Symbol, Adobe
+    ZapfDingbats and ISO-8859-x (x can be 1-11, 13-16) encodings. Refer to
+    :doc:`techref/encodings` for the full list of supported non-ASCII characters.

-    Full option list at :gmt-docs:`text.html`
+    Full option list at :gmt-docs:`text.html`.

     {aliases}

@@ -226,13 +226,24 @@ def text_( # noqa: PLR0912
         kwargs["t"] = ""

     # Append text at last column. Text must be passed in as str type.
+    confdict = {}
     if kind == "vectors":
-        extra_arrays.append(
-            np.vectorize(non_ascii_to_octal)(np.atleast_1d(text).astype(str))
-        )
+        text = np.atleast_1d(text).astype(str)
+        encoding = _check_encoding("".join(text))
+        if encoding != "ascii":
+            text = np.vectorize(non_ascii_to_octal, excluded="encoding")(
+                text, encoding=encoding
+            )
+        extra_arrays.append(text)
+
+        if encoding not in {"ascii", "ISOLatin1+"}:
+            confdict = {"PS_CHAR_ENCODING": encoding}

     with Session() as lib:
         with lib.virtualfile_in(
             check_kind="vector", data=textfiles, x=x, y=y, extra_arrays=extra_arrays
         ) as vintbl:
-            lib.call_module(module="text", args=build_arg_list(kwargs, infile=vintbl))
+            lib.call_module(
+                module="text",
+                args=build_arg_list(kwargs, infile=vintbl, confdict=confdict),
+            )
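
Review note: `text` now passes its detected encoding to `build_arg_list` through `confdict`, so it helps to see how an explicit `PS_CHAR_ENCODING` entry interacts with the automatically added flag. The sketch below isolates that rule. `ps_char_encoding_args` is a hypothetical name, and the function only reproduces the condition shown in the `build_arg_list` hunk, not the rest of the argument handling.

```python
from __future__ import annotations


def ps_char_encoding_args(
    encoding: str, confdict: dict[str, str] | None = None
) -> list[str]:
    """Return the ``--KEY=value`` arguments implied by an encoding.

    Hypothetical helper: it isolates the PS_CHAR_ENCODING rule only.
    """
    args: list[str] = []
    # Add the flag only for non-default encodings, and only when the caller
    # has not already chosen a PS_CHAR_ENCODING value.
    if encoding not in {"ascii", "ISOLatin1+"} and not (
        confdict and "PS_CHAR_ENCODING" in confdict
    ):
        args.append(f"--PS_CHAR_ENCODING={encoding}")
    # Caller-supplied settings are always forwarded unchanged.
    if confdict:
        args.extend(f"--{key}={value}" for key, value in confdict.items())
    return args


print(ps_char_encoding_args("ISO-8859-4"))
print(ps_char_encoding_args("ISO-8859-4", {"PS_CHAR_ENCODING": "ISO-8859-5"}))
```

The second call shows that a caller-supplied value takes precedence, which matches the doctest in `build_arg_list` where `ISO-8859-5` is kept even though the string was detected as ISO-8859-4.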
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+outs:
+- md5: a0f35a1d58c95e6589c7397e7660e946
+  size: 17089
+  hash: md5
+  path: test_text_nonascii_iso8859.png

pygmt/tests/test_text.py

Lines changed: 13 additions & 0 deletions
@@ -434,3 +434,16 @@ def test_text_quotation_marks():
     fig.basemap(projection="X4c/2c", region=[0, 4, 0, 2], frame=0)
     fig.text(x=2, y=1, text='\\234 ‘ ’ " “ ”', font="20p")  # noqa: RUF001
     return fig
+
+
+@pytest.mark.mpl_image_compare
+def test_text_nonascii_iso8859():
+    """
+    Test passing text strings with non-ascii characters in ISO-8859-4 encoding.
+    """
+    fig = Figure()
+    fig.basemap(region=[0, 10, 0, 10], projection="X10c", frame=["WSEN+tAāáâãäåB"])
+    fig.text(position="TL", text="position-text:1ÉĘËĖ2")
+    fig.text(x=1, y=1, text="xytext:1éęëė2")
+    fig.text(x=[5, 5], y=[3, 5], text=["xytext1:ųúûüũūαζ∆❡", "xytext2:íîī∑π∇✉"])
+    return fig
