
Commit 63603b4

DOC: add reference docs for NpyString C API
[skip azp] [skip cirrus] [skip actions]
1 parent 2a9b913 commit 63603b4

2 files changed, +269 -0 lines changed

doc/source/reference/c-api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ code.
   iterator
   ufunc
   generalized-ufuncs
+  strings
   coremath
   datetimes
   deprecations
Lines changed: 268 additions & 0 deletions
@@ -0,0 +1,268 @@
NpyString API
=============

.. sectionauthor:: Nathan Goldbaum

.. versionadded:: 2.0

This API allows access to the UTF-8 string data stored in NumPy StringDType
arrays. See `NEP-55 <https://numpy.org/neps/nep-0055-string_dtype.html>`_ for
more details on the design of StringDType.

Examples
--------

Loading a String
^^^^^^^^^^^^^^^^

Say we are writing a ufunc implementation for ``StringDType``. If we are given
a ``const char *buf`` pointer to the beginning of a ``StringDType`` array
entry and a ``PyArray_Descr *descr`` pointer to the array descriptor, we can
access the underlying string data like so:

.. code-block:: C

   npy_string_allocator *allocator = NpyString_acquire_allocator(
           (PyArray_StringDTypeObject *)descr);

   npy_static_string sdata = {0, NULL};
   const npy_packed_static_string *packed_string =
           (const npy_packed_static_string *)buf;
   int is_null = 0;

   is_null = NpyString_load(allocator, packed_string, &sdata);

   if (is_null == -1) {
       // failed to load string: release the allocator, then set an error
       NpyString_release_allocator(allocator);
       return -1;
   }
   else if (is_null) {
       // handle missing string
       // sdata.buf is NULL
       // sdata.size is 0
   }
   else {
       // sdata.buf is a pointer to the beginning of a string
       // sdata.size is the size of the string, in bytes
   }
   NpyString_release_allocator(allocator);
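
Because ``sdata.size`` counts bytes rather than characters, a ufunc that
reports string lengths in characters has to decode the UTF-8 data itself.
The following is a minimal sketch, assuming the loaded bytes are valid
UTF-8; ``utf8_codepoint_count`` is a hypothetical helper, not part of the
NumPy API. It counts every byte that is not a UTF-8 continuation byte
(bit pattern ``10xxxxxx``) and never reads past ``size``, so it works on
data that is not null-terminated.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical helper (not part of NumPy): count the code points in a
    * (buf, size) pair such as the one filled in by NpyString_load.
    * Assumes valid UTF-8; reads exactly `size` bytes and needs no
    * null terminator. A NULL buf with size 0 yields 0. */
   static size_t
   utf8_codepoint_count(const char *buf, size_t size)
   {
       size_t count = 0;
       for (size_t i = 0; i < size; i++) {
           /* Continuation bytes look like 10xxxxxx; every other byte
            * starts a new code point. */
           if (((unsigned char)buf[i] & 0xC0) != 0x80) {
               count++;
           }
       }
       return count;
   }

In the non-null branch of the loading example, this would be called as
``utf8_codepoint_count(sdata.buf, sdata.size)``.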

Packing a String
^^^^^^^^^^^^^^^^

This example shows how to pack a new string entry into an array, where
``buf`` now points to a writable ``StringDType`` array entry:

.. code-block:: C

   const char *str = "Hello world";
   size_t size = 11;  // length of str in bytes, excluding the terminating NUL
   npy_packed_static_string *packed_string = (npy_packed_static_string *)buf;

   npy_string_allocator *allocator = NpyString_acquire_allocator(
           (PyArray_StringDTypeObject *)descr);

   // copy contents of str into packed_string
   if (NpyString_pack(allocator, packed_string, str, size) == -1) {
       // string packing failed: release the allocator, then set an error
       NpyString_release_allocator(allocator);
       return -1;
   }

   // packed_string contains a copy of "Hello world"

   NpyString_release_allocator(allocator);

Types
-----

.. c:type:: npy_packed_static_string

   An opaque struct that represents "packed" encoded strings. Individual
   entries in array buffers are instances of this struct. Direct access
   to the data in the struct is undefined, and future versions of the
   library may change the packed representation of strings.

.. c:type:: npy_static_string

   An unpacked string allowing access to the UTF-8 string data.

   .. code-block:: c

      typedef struct npy_unpacked_static_string {
          size_t size;
          const char *buf;
      } npy_static_string;

   .. c:member:: size_t size

      The size of the string, in bytes.

   .. c:member:: const char *buf

      The string buffer, holding UTF-8-encoded bytes. It is not currently
      null-terminated, but null termination may be added in the future, so
      do not rely on either its presence or its absence; use ``size`` to
      find the end of the string.

      Note that this is a ``const`` buffer. If you want to alter an
      entry in an array, you should create a new string and pack it
      into the array entry.
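
Because ``buf`` is not guaranteed to be null-terminated, it should not be
handed to functions that expect a C string, such as ``strlen`` or
``printf("%s", ...)``. The sketch below shows two alternatives that rely
only on the ``size``/``buf`` layout described above: printing with an
explicit precision, and copying into a freshly allocated, null-terminated
buffer. Both helpers are hypothetical and not part of the NumPy API.

.. code-block:: c

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical helper: return a malloc'd, null-terminated copy of a
    * (buf, size) string, or NULL if allocation fails. The caller owns the
    * result and must free() it. buf may be NULL only when size is 0. */
   static char *
   static_string_to_cstr(const char *buf, size_t size)
   {
       char *out = malloc(size + 1);
       if (out == NULL) {
           return NULL;
       }
       if (size > 0) {
           memcpy(out, buf, size);
       }
       out[size] = '\0';
       return out;
   }

   /* Hypothetical helper: "%.*s" reads at most `size` bytes, so no
    * terminator is needed. The precision is an int, so this assumes
    * size <= INT_MAX; an empty literal is passed instead of a NULL buf. */
   static void
   print_static_string(const char *buf, size_t size)
   {
       printf("%.*s\n", (int)size, size > 0 ? buf : "");
   }

For example, ``static_string_to_cstr(sdata.buf, sdata.size)`` would turn a
string loaded with ``NpyString_load`` into an ordinary C string.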

.. c:type:: npy_string_allocator

   An opaque struct, used only through pointers, that handles string
   allocation. Before using the allocator, you must acquire the allocator
   lock (see ``NpyString_acquire_allocator``) and release the lock after you
   are done interacting with strings managed by the allocator.
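
Because the lock must be released exactly once on every path, including
error paths, a single-exit ``goto`` cleanup is a convenient C idiom here.
The sketch below models the idea with a hypothetical ``toy_allocator``
whose "lock" is just a counter, so the pattern can be checked without
NumPy; in real code the acquire and release calls would be
``NpyString_acquire_allocator`` and ``NpyString_release_allocator``, and the
hypothetical ``toy_process`` stands in for whatever work needs the
allocator.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical stand-in for npy_string_allocator; the "lock" is a
    * counter so that balanced acquire/release can be checked. */
   typedef struct {
       int lock_depth;
   } toy_allocator;

   static toy_allocator *
   toy_acquire(toy_allocator *a)
   {
       a->lock_depth++;
       return a;
   }

   static void
   toy_release(toy_allocator *a)
   {
       a->lock_depth--;
   }

   /* Hypothetical per-element work: fails (returns -1) on negative input. */
   static int
   toy_process(toy_allocator *a, int value)
   {
       (void)a;
       return value < 0 ? -1 : 0;
   }

   /* Process n values while holding the lock. Every path, including the
    * early exit on error, leaves through `done`, so the lock is released
    * exactly once. Error reporting that needs the GIL belongs after the
    * release, not inside the loop. */
   static int
   process_all(toy_allocator *shared, const int *values, size_t n)
   {
       int ret = -1;
       toy_allocator *allocator = toy_acquire(shared);

       for (size_t i = 0; i < n; i++) {
           if (toy_process(allocator, values[i]) < 0) {
               goto done;
           }
       }
       ret = 0;

   done:
       toy_release(allocator);
       return ret;
   }

A caller would check for a return value of -1 and only then, with the lock
already released, set a Python exception.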

.. c:type:: PyArray_StringDTypeObject

   The C struct backing instances of StringDType in Python. Attributes store
   the settings the object was created with, a pointer to an
   ``npy_string_allocator`` that manages string allocations for arrays
   associated with the DType instance, and several attributes caching
   information about the missing string object that is commonly needed in
   cast and ufunc loop implementations.

   .. code-block:: c

      typedef struct {
          PyArray_Descr base;
          PyObject *na_object;
          char coerce;
          char has_nan_na;
          char has_string_na;
          char array_owned;
          npy_static_string default_string;
          npy_static_string na_name;
          npy_string_allocator *allocator;
      } PyArray_StringDTypeObject;

   .. c:member:: PyArray_Descr base

      The base object. Use this member to access fields common to all
      descriptor objects.

   .. c:member:: PyObject *na_object

      A reference to the object representing the null value. If there is no
      null value (the default), this will be NULL.

   .. c:member:: char coerce

      1 if string coercion is enabled, 0 otherwise.

   .. c:member:: char has_nan_na

      1 if the missing string object (if any) is NaN-like, 0 otherwise.

   .. c:member:: char has_string_na

      1 if the missing string object (if any) is a string, 0 otherwise.

   .. c:member:: char array_owned

      1 if an array owns the StringDType instance, 0 otherwise.

   .. c:member:: npy_static_string default_string

      The default string to use in operations. If the missing string object
      is a string, this will contain the string data for the missing string.

   .. c:member:: npy_static_string na_name

      The name of the missing string object, if any. An empty string
      otherwise.

   .. c:member:: npy_string_allocator *allocator

      The allocator instance associated with the array that owns this
      descriptor instance. The allocator should only be accessed directly
      after acquiring its lock (see ``NpyString_acquire_allocator``), and the
      lock should be released immediately after the allocator is no longer
      needed.
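
Cast and ufunc loops typically consult these cached fields when
``NpyString_load`` reports a null string. The sketch below encodes one
plausible policy: substitute ``default_string`` when the missing string
object is itself a string or when there is no missing string object at
all, and otherwise leave the caller to handle the missing value (for
example, by propagating a NaN-like value or raising an error). NumPy's own
loops may differ in detail. To compile without NumPy headers it uses
hypothetical ``toy_`` mirrors of ``npy_static_string`` and of the
``PyArray_StringDTypeObject`` fields it needs.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical mirror of npy_static_string. */
   typedef struct {
       size_t size;
       const char *buf;
   } toy_static_string;

   /* Hypothetical mirror of the PyArray_StringDTypeObject fields used here. */
   typedef struct {
       const void *na_object;   /* NULL if there is no missing string object */
       char has_string_na;      /* 1 if the missing string object is a string */
       toy_static_string default_string;
   } toy_string_dtype;

   /* Called for an element that was loaded as the null string. Returns 1
    * and stores the substitute in *out if the loop can proceed as if the
    * element held default_string; returns 0 if the caller must handle the
    * missing value itself. */
   static int
   resolve_missing(const toy_string_dtype *descr, toy_static_string *out)
   {
       if (descr->has_string_na || descr->na_object == NULL) {
           *out = descr->default_string;
           return 1;
       }
       return 0;
   }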

Functions
---------

.. c:function:: npy_string_allocator *NpyString_acquire_allocator( \
                    const PyArray_StringDTypeObject *descr)

   Acquire the mutex locking the allocator attached to ``descr``.
   ``NpyString_release_allocator`` must be called exactly once on the
   allocator returned by this function. Note that functions requiring the
   GIL should not be called while the allocator mutex is held, as doing so
   may cause deadlocks.

.. c:function:: void NpyString_acquire_allocators( \
                    size_t n_descriptors, PyArray_Descr *const descrs[], \
                    npy_string_allocator *allocators[])

   Simultaneously acquire the mutexes locking the allocators attached to
   multiple descriptors. For each StringDType descriptor in ``descrs``,
   writes a pointer to the associated allocator into the corresponding
   entry of ``allocators``. If a descriptor is not a StringDType instance,
   NULL is written to that entry instead.

   ``n_descriptors`` is the number of entries of ``descrs`` that are
   examined; any entries beyond the first ``n_descriptors`` are ignored.
   Both ``descrs`` and ``allocators`` must hold at least ``n_descriptors``
   elements, otherwise a buffer overflow occurs.

   If pointers to the same descriptor are passed multiple times, the
   allocator mutex is acquired only once, but every corresponding entry of
   ``allocators`` is still set to that allocator. The allocator mutexes must
   later be released with ``NpyString_release_allocators``.

   Note that functions requiring the GIL should not be called while the
   allocator mutexes are held, as doing so may cause deadlocks.

.. c:function:: void NpyString_release_allocator( \
                    npy_string_allocator *allocator)

   Release the mutex locking an allocator. This must be called exactly once
   after acquiring the allocator mutex, once all operations requiring the
   allocator are done.

   If you need to release multiple allocators, see
   ``NpyString_release_allocators``, which correctly releases an allocator
   only once when given several references to it.

.. c:function:: void NpyString_release_allocators( \
                    size_t length, npy_string_allocator *allocators[])

   Release the mutexes locking ``length`` allocators. ``length`` is the
   length of the ``allocators`` array. NULL entries are ignored.

   If pointers to the same allocator are passed multiple times, the
   allocator mutex is released only once.
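
The deduplication rules for ``NpyString_acquire_allocators`` and
``NpyString_release_allocators`` can be illustrated with a small
self-contained model. This is a sketch of the documented behaviour only,
not NumPy's implementation: ``toy_descr`` and ``toy_allocator`` are
hypothetical stand-ins, a descriptor with a NULL allocator plays the role
of a non-StringDType descriptor, and each mutex is modelled as a counter.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical stand-ins: the "mutex" is a lock counter. */
   typedef struct { int lock_count; } toy_allocator;
   typedef struct { toy_allocator *allocator; } toy_descr;

   /* 1 if `a` already occurs in allocators[0 .. i-1], else 0. */
   static int
   appears_earlier(toy_allocator *const allocators[], size_t i,
                   const toy_allocator *a)
   {
       for (size_t j = 0; j < i; j++) {
           if (allocators[j] == a) {
               return 1;
           }
       }
       return 0;
   }

   /* One output slot per descriptor; NULL for non-string descriptors;
    * each distinct allocator is locked only once. */
   static void
   toy_acquire_allocators(size_t n, toy_descr *const descrs[],
                          toy_allocator *allocators[])
   {
       for (size_t i = 0; i < n; i++) {
           toy_allocator *a = descrs[i]->allocator;
           allocators[i] = a;
           if (a != NULL && !appears_earlier(allocators, i, a)) {
               a->lock_count++;
           }
       }
   }

   /* NULL entries are skipped; each distinct allocator is unlocked once,
    * however many times it appears. */
   static void
   toy_release_allocators(size_t length, toy_allocator *allocators[])
   {
       for (size_t i = 0; i < length; i++) {
           toy_allocator *a = allocators[i];
           if (a != NULL && !appears_earlier(allocators, i, a)) {
               a->lock_count--;
           }
       }
   }

Passing the same ``allocators`` array to both functions keeps every lock
balanced even when a descriptor appears more than once, which calling
``NpyString_release_allocator`` on each entry would not.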

.. c:function:: int NpyString_load(npy_string_allocator *allocator, \
                    const npy_packed_static_string *packed_string, \
                    npy_static_string *unpacked_string)

   Extract the packed contents of ``packed_string`` into ``unpacked_string``.

   The ``unpacked_string`` is a read-only view onto the ``packed_string`` data
   and should not be used to modify the string data. If ``packed_string`` is
   the null string, sets ``unpacked_string->buf`` to NULL. Returns -1 if
   unpacking the string fails, 1 if ``packed_string`` is the null string,
   and 0 otherwise.

   A useful pattern is to define a stack-allocated ``npy_static_string``
   instance initialized to ``{0, NULL}`` and pass a pointer to it to this
   function. A single call then both unpacks the string and determines
   whether it is the null string.
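
Since a null string and an empty string both have ``size == 0``, whether an
entry is missing should be decided from the return value (or, equivalently,
from ``buf == NULL``), not from ``size``. The sketch below illustrates the
caller-side ``switch`` on the three documented return values. To run
without NumPy it pairs that ``switch`` with a hypothetical ``toy_load`` over
a hypothetical ``toy_packed`` representation; only the return convention
(-1, 1, 0) is taken from the description above.

.. code-block:: c

   #include <stddef.h>

   typedef struct { size_t size; const char *buf; } toy_static_string;

   /* Hypothetical packed form: an explicit null flag plus the bytes. */
   typedef struct { int is_null; size_t size; const char *data; } toy_packed;

   /* Mimics the NpyString_load convention: -1 on failure, 1 for the null
    * string (buf set to NULL, size 0), 0 otherwise. A NULL `packed` stands
    * in for a failure. */
   static int
   toy_load(const toy_packed *packed, toy_static_string *out)
   {
       if (packed == NULL) {
           return -1;
       }
       if (packed->is_null) {
           out->buf = NULL;
           out->size = 0;
           return 1;
       }
       out->buf = packed->data;
       out->size = packed->size;
       return 0;
   }

   enum entry_kind { ENTRY_ERROR, ENTRY_MISSING, ENTRY_EMPTY, ENTRY_TEXT };

   /* Caller-side pattern: branch on the return value, never on size alone. */
   static enum entry_kind
   classify(const toy_packed *packed)
   {
       toy_static_string s = {0, NULL};
       switch (toy_load(packed, &s)) {
           case -1:
               return ENTRY_ERROR;
           case 1:
               return ENTRY_MISSING;
           default:
               return s.size == 0 ? ENTRY_EMPTY : ENTRY_TEXT;
       }
   }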

.. c:function:: int NpyString_pack_null( \
                    npy_string_allocator *allocator, \
                    npy_packed_static_string *packed_string)

   Pack the null string into ``packed_string``. Returns 0 on success and -1
   on failure.

.. c:function:: int NpyString_pack( \
                    npy_string_allocator *allocator, \
                    npy_packed_static_string *packed_string, \
                    const char *buf, \
                    size_t size)

   Copy and pack the first ``size`` bytes of the buffer pointed to by ``buf``
   into ``packed_string``. Returns 0 on success and -1 on failure.
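
Because ``size`` counts bytes, packing only part of a UTF-8 buffer can
split a multi-byte character if the byte count is chosen naively. The
description above does not say whether ``NpyString_pack`` validates its
input, so it is safest for the caller to pick a code point boundary. A
minimal sketch, assuming ``buf`` holds valid UTF-8; ``utf8_prefix_size`` is
a hypothetical helper whose result would be passed as the ``size``
argument.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical helper (not part of NumPy): the largest n <= max_bytes
    * such that the first n bytes of the valid UTF-8 data buf[0 .. size)
    * end on a code point boundary. */
   static size_t
   utf8_prefix_size(const char *buf, size_t size, size_t max_bytes)
   {
       if (max_bytes >= size) {
           return size;
       }
       size_t n = max_bytes;
       /* buf[n] is the first byte left out. If it is a continuation byte
        * (10xxxxxx), the cut falls inside a character, so back up. */
       while (n > 0 && ((unsigned char)buf[n] & 0xC0) == 0x80) {
           n--;
       }
       return n;
   }

For example, truncating ``"h\xc3\xa9llo"`` ("héllo") to at most 2 bytes
yields 1, keeping ``"h"`` rather than half of the two-byte ``é``.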
