DOC: User Guide Page on user-defined functions #61195

Merged 21 commits on May 18, 2025
updated table of definitions and added .pipe discussion under performance section
arthurlw committed Apr 20, 2025
commit 467bc938a721f808b007cc74a81d7e42252d8ad6
doc/source/user_guide/user_defined_functions.rst: 42 changes (37 additions & 5 deletions)
@@ -90,19 +90,19 @@ User-Defined Functions can be applied across various pandas methods:
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
| Method | Function Input | Function Output | Description |
+============================+========================+==========================+===========================================================================+
| :meth:`map` | Scalar | Scalar | Maps each element to the element returned by the function element-wise |
| :meth:`map` | Scalar | Scalar | Apply a function to each element |
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
| :meth:`apply` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
| :meth:`apply` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
| :meth:`agg` | Series/DataFrame | Scalar or Series | Aggregate and summarize values, e.g., sum or custom reducer |
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
| :meth:`transform` | Series/DataFrame | Same shape as input | Transform values while preserving shape |
| :meth:`transform` | Series/DataFrame | Same shape as input | Apply a function while preserving shape; raises error if shape changes |
Review comment (Member):

Suggested change
| :meth:`transform` | Series/DataFrame | Same shape as input | Apply a function while preserving shape; raises error if shape changes |
| :meth:`transform` (axis=0) | Column (Series) | Column (Series) | Same as :meth:`apply`, but it raises an exception if the function changes the shape of the data |

``apply`` and ``transform`` are almost the same, so I'd also list ``transform`` twice, for axis=0 and axis=1, like ``apply``.

+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
| :meth:`filter` | Series/DataFrame | Series/DataFrame | Filter data using a boolean array |
| :meth:`filter` | - | - | Return rows that satisfy a boolean condition |
Review comment (Member):

Suggested change
| :meth:`filter` | - | - | Return rows that satisfy a boolean condition |
| :meth:`filter` | Series or DataFrame | Boolean | Only accepts UDFs in group-by operations. The function is called for each group, and the group is removed from the result if the function returns ``False`` |

+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
| :meth:`pipe` | Series/DataFrame | Series/DataFrame | Chain UDFs together to apply to Series or Dataframe |
| :meth:`pipe` | Series/DataFrame | Series/DataFrame | Chain functions together to apply to Series or DataFrame |
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
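
The rows of the table can be illustrated with a minimal sketch. The frame ``df``, its column names, and the lambdas below are hypothetical and chosen only for illustration; the ``filter`` call uses the group-by form described in the review comment, and the ``ValueError`` caught for ``transform`` is what current pandas releases are assumed to raise when the function changes the shape.

```python
import pandas as pd

# Hypothetical data used only for this illustration
df = pd.DataFrame({"key": ["a", "a", "b"], "one": [1, 2, 9], "two": [10, 20, 30]})
values = df[["one", "two"]]

# map: the function receives one scalar and returns one scalar
doubled = df["one"].map(lambda x: x * 2)

# apply(axis=0): the function receives each column as a Series
col_range = values.apply(lambda col: col.max() - col.min())

# apply(axis=1): the function receives each row as a Series
row_sum = values.apply(lambda row: row["one"] + row["two"], axis=1)

# agg: the function reduces each column to a single value
totals = values.agg(lambda col: col.sum())

# transform: the output must keep the shape of the input
centered = values.transform(lambda col: col - col.mean())

# transform with a reducing function changes the shape and is rejected
try:
    values.transform(lambda col: col.sum())
    transform_raised = False
except ValueError:
    transform_raised = True

# groupby(...).filter: the function is called once per group and returns a
# boolean; groups for which it returns False are dropped from the result
kept = df.groupby("key").filter(lambda g: g["one"].sum() > 5)
```

Here only group ``"b"`` survives the filter, because the ``one`` values of group ``"a"`` sum to 3.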

.. note::
@@ -249,7 +249,7 @@ Documentation can be found at :meth:`~DataFrame.pipe`.
Performance
-----------

While UDFs provide flexibility, their use is currently discouraged as they can introduce
While UDFs provide flexibility, their use is generally discouraged as they can introduce
performance issues, especially when written in pure Python. To improve efficiency,
consider using built-in ``NumPy`` or ``pandas`` functions instead of UDFs
for common operations.
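
As a rough illustration of this advice, the sketch below computes the same result twice: once with a row-wise UDF and once with built-in vectorized arithmetic. The frame and column names are made up for the example, and no timings are shown because they depend on data size and hardware.

```python
import pandas as pd

# Hypothetical data used only for this illustration
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# Row-wise UDF: the Python lambda is called once for every row
udf_result = df.apply(lambda row: row["one"] * row["two"], axis=1)

# Built-in arithmetic: a single vectorized operation over whole columns
vectorized_result = df["one"] * df["two"]
```

Both variables hold the same values; the vectorized form avoids one Python-level function call per row, which is where the cost of pure-Python UDFs usually comes from.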
@@ -302,3 +302,35 @@ especially for computationally heavy tasks.
.. note::
    You may also refer to the user guide on `Enhancing performance <https://pandas.pydata.org/pandas-docs/dev/user_guide/enhancingperf.html#numba-jit-compilation>`_
    for a more detailed guide to using **Numba**.

Using :meth:`DataFrame.pipe` for Composable Logic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another useful pattern for improving readability and composability, especially when mixing
vectorized logic with UDFs, is to use the :meth:`DataFrame.pipe` method.

The ``.pipe`` method doesn't improve performance directly, but it enables cleaner
method chaining by passing the entire object into a function. This is especially helpful
when chaining custom transformations:

.. code-block:: python

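    def add_ratio_column(df):
        # Return a new DataFrame instead of mutating the input, so the
        # function is safe to call on the filtered result of ``query``
        return df.assign(ratio=100 * (df["one"] / df["two"]))

    df = (
        df
        .query("one > 0")
        .pipe(add_ratio_column)
        .dropna()
    )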

The ``.pipe`` step is functionally equivalent to calling ``add_ratio_column`` directly on the
filtered DataFrame, but it keeps the chain readable from top to bottom. The function you pass
to ``.pipe`` can use vectorized operations, row-wise UDFs, or any other logic; ``.pipe`` is
agnostic to what the function does.
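
The equivalence with a direct function call can be checked with a small sketch. The helper ``scale_column`` and its ``column`` and ``factor`` parameters are hypothetical; the only pandas behaviour relied on is that ``.pipe`` passes the object as the first argument and forwards any extra positional and keyword arguments to the function.

```python
import pandas as pd


def scale_column(df, column, factor=1.0):
    """Hypothetical helper: return a copy of df with one column scaled."""
    return df.assign(**{column: df[column] * factor})


# Hypothetical data used only for this illustration
df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]})

# Extra arguments after the function are forwarded by .pipe
piped = df.pipe(scale_column, "one", factor=10)

# The same call written without .pipe
direct = scale_column(df, "one", factor=10)
```

Both calls return equal frames, and because the helper uses ``assign``, the original ``df`` is left unchanged.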

.. note::
    While :meth:`DataFrame.pipe` does not improve performance on its own,
    it promotes clean, modular design and allows both vectorized and UDF-based logic
    to be composed in method chains.