diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst
index 83ba21e013e2e..f25973f14181f 100644
--- a/doc/source/user_guide/indexing.rst
+++ b/doc/source/user_guide/indexing.rst
@@ -20,15 +20,17 @@ this area.
 
 .. note::
 
-   The Python and NumPy indexing operators ``[]`` and attribute operator ``.``
-   provide quick and easy access to pandas data structures across a wide range
-   of use cases. This makes interactive work intuitive, as there's little new
-   to learn if you already know how to deal with Python dictionaries and NumPy
+   The Python and NumPy indexing operators ``[]`` and the attribute operator ``.``
+   provide quick and easy access to pandas data structures across a wide range of
+   use cases. This makes interactive work intuitive, as there's little new to
+   learn if you already know how to deal with Python dictionaries and NumPy
    arrays. However, since the type of the data to be accessed isn't known in
-   advance, directly using standard operators has some optimization limits. For
-   production code, we recommended that you take advantage of the optimized
-   pandas data access methods exposed in this chapter.
+   advance, directly using these standard operators has some optimization limits.
+   For performance-critical or production code, we recommend using the optimized
+   pandas data access methods (such as ``.loc`` and ``.iloc``) described in this
+   chapter.
 
+
 See the :ref:`MultiIndex / Advanced Indexing ` for ``MultiIndex`` and more advanced indexing documentation.
 
 See the :ref:`cookbook` for some advanced strategies.
diff --git a/doc/source/user_guide/text.rst b/doc/source/user_guide/text.rst
index 8f404dbf461c8..ebe8201955089 100644
--- a/doc/source/user_guide/text.rst
+++ b/doc/source/user_guide/text.rst
@@ -91,6 +91,41 @@ See :ref:`text.four_string_variants` section below for details.
 
 .. _text.string_methods:
 
+String storage: pyarrow vs python
+---------------------------------
+
+Pandas supports different storage backends for string data.
+Depending on the configuration and installed dependencies,
+string data may be stored using either a Python object-based
+implementation or a pyarrow-backed implementation.
+
+In general, the pyarrow-backed string storage is recommended
+for most users, as it provides better performance and a more
+compact memory representation.
+
+**pyarrow-backed string storage**
+
+- Pros:
+  - More compact memory footprint
+  - Faster vectorized string operations
+- Cons:
+  - Strings are immutable; modifying values results in new arrays
+  - Some edge-case behavior differences compared to Python strings
+
+**Python object string storage**
+
+- Pros:
+  - Uses Python string objects (mutable at the array level)
+  - Behavior consistent with standard Python string semantics
+- Cons:
+  - Higher memory usage
+  - Slower performance due to lack of vectorization
+
+While pandas aims to provide identical results regardless of
+the underlying string storage, some behavior differences may
+exist in edge cases (for example, certain Unicode operations).
+These differences are documented where relevant.
+
 String methods
 ==============