DOC: clarify note about optimized indexing methods #63524

Open

joddeepesh-cloud wants to merge 2 commits into pandas-dev:main from joddeepesh-cloud:docs-indexing-clarifying

doc/source/user_guide/indexing.rst

            
@@ -20,15 +20,17 @@ this area.

     .. note::

-       The Python and NumPy indexing operators ``[]`` and attribute operator ``.``
-       provide quick and easy access to pandas data structures across a wide range
-       of use cases. This makes interactive work intuitive, as there's little new
-       to learn if you already know how to deal with Python dictionaries and NumPy
+       The Python and NumPy indexing operators ``[]`` and the attribute operator ``.``
+       provide quick and easy access to pandas data structures across a wide range of
+       use cases. This makes interactive work intuitive, as there's little new to
+       learn if you already know how to deal with Python dictionaries and NumPy
        arrays. However, since the type of the data to be accessed isn't known in
-       advance, directly using standard operators has some optimization limits. For
-       production code, we recommended that you take advantage of the optimized
-       pandas data access methods exposed in this chapter.
+       advance, directly using these standard operators has some optimization limits.
+       For performance-critical or production code, we recommend using the optimized
+       pandas data access methods (such as ``.loc`` and ``.iloc``) described in this
+       chapter.

     See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and more advanced indexing documentation.

     See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.
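
For readers of this diff, the distinction the revised note draws can be sketched as follows. This is only an illustration, not part of the PR: the frame `df`, its column names, and its index labels are made up for the example. It contrasts the convenience operators (`[]` and attribute access) with the explicit `.loc` (label-based) and `.iloc` (position-based) accessors that the note recommends.

```python
import pandas as pd

# Hypothetical example data; names and values are assumptions for illustration.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# Convenience operators: [] and attribute access both select a column here.
col_brackets = df["a"]
col_attr = df.a

# Explicit accessors: .loc is label-based, .iloc is position-based.
by_label = df.loc["y", "a"]    # row label "y", column label "a"
by_position = df.iloc[1, 0]    # second row, first column

# Both explicit forms address the same cell in this frame.
same_cell = by_label == by_position
```

The explicit accessors state up front whether a key is a label or a position, which is the ambiguity the note refers to when it says the type of the data to be accessed is not known in advance.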

doc/source/user_guide/text.rst

@@ Expand Up @@
     .. _text.string_methods:
+    String storage: pyarrow vs python
+    ---------------------------------
+    pandas supports different storage backends for string data.
+    Depending on the configuration and installed dependencies,
+    string data may be stored using either a Python object-based
+    implementation or a pyarrow-backed implementation.
+    In general, the pyarrow-backed string storage is recommended
+    for most users, as it provides better performance and a more
+    compact memory representation.
+    **pyarrow-backed string storage**
+    - Pros:
+      - More compact memory footprint
+      - Faster vectorized string operations
+    - Cons:
+      - The underlying Arrow arrays are immutable; modifying values creates new arrays
+      - Some edge-case behavior differences compared to Python strings
+    **Python object string storage**
+    - Pros:
+      - Uses Python string objects (mutable at the array level)
+      - Behavior consistent with standard Python string semantics
+    - Cons:
+      - Higher memory usage
+      - Slower performance due to lack of vectorization
+    While pandas aims to provide identical results regardless of
+    the underlying string storage, some behavior differences may
+    exist in edge cases (for example, certain Unicode operations).
+    These differences are documented where relevant.
     String methods
     ==============
@@ Expand Down @@
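
As a rough illustration of the two storage backends the added section describes, the sketch below requests each one explicitly through `pd.StringDtype(storage=...)`. It is not part of the PR. The sample values are invented, and the pyarrow branch assumes the optional `pyarrow` package is installed, falling back to `None` when it is not.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
values = ["apple", "banana", None]

# Python-object-backed storage needs no optional dependency.
s_python = pd.Series(values, dtype=pd.StringDtype(storage="python"))

# pyarrow-backed storage requires the optional pyarrow package;
# pandas raises ImportError if it is missing.
try:
    s_arrow = pd.Series(values, dtype=pd.StringDtype(storage="pyarrow"))
except ImportError:
    s_arrow = None

# The same .str methods are available for either backend.
upper_python = s_python.str.upper()
upper_arrow = s_arrow.str.upper() if s_arrow is not None else None
```

Calling `s_python.memory_usage(deep=True)` and the same method on `s_arrow` is one way to compare the memory footprint of the two backends on your own data. No figures are given here because they depend on the data and on the installed versions.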
