DOC: User Guide Page on user-defined functions #61195
Merged
Changes from 1 commit
21 commits
3f94137 udf user guide introduction (arthurlw)
bf984ca added apply method (arthurlw)
fe67ec8 added agg, transform and filter (arthurlw)
4ec5697 added map, pipe and vectorized operations (arthurlw)
11392d7 bugfix (arthurlw)
f322d9e updated map method (arthurlw)
b6b7b02 precommit (arthurlw)
d20bcc7 trim trailing whitespace (arthurlw)
72f7b62 toctree (arthurlw)
90a2d24 restructured udf user guide (arthurlw)
0d02d64 updated documentation links (arthurlw)
214f0ac precommit (arthurlw)
fffaad0 fix links (arthurlw)
561a1f5 change links (arthurlw)
c6891a0 updated user guide (arthurlw)
f56ec28 updated udf user guide based on reviews (arthurlw)
c00d1d2 updated definition section and performance section title (arthurlw)
8d41537 updated definition table (arthurlw)
467bc93 updated table of definitions and added .pipe discussion under perform… (arthurlw)
efd5201 precommit (arthurlw)
af7964b updated udf user guide based on reviews (arthurlw)
updated table of definitions and added .pipe discussion under performance section
commit 467bc938a721f808b007cc74a81d7e42252d8ad6
@@ -90,19 +90,19 @@ User-Defined Functions can be applied across various pandas methods:
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
 | Method                     | Function Input         | Function Output          | Description                                                               |
 +============================+========================+==========================+===========================================================================+
-| :meth:`map`                | Scalar                 | Scalar                   | Maps each element to the element returned by the function element-wise    |
+| :meth:`map`                | Scalar                 | Scalar                   | Apply a function to each element                                          |
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
 | :meth:`apply` (axis=0)     | Column (Series)        | Column (Series)          | Apply a function to each column                                           |
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
 | :meth:`apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                              |
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
 | :meth:`agg`                | Series/DataFrame       | Scalar or Series         | Aggregate and summarize values, e.g., sum or custom reducer               |
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
-| :meth:`transform`          | Series/DataFrame       | Same shape as input      | Transform values while preserving shape                                   |
+| :meth:`transform`          | Series/DataFrame       | Same shape as input      | Apply a function while preserving shape; raises error if shape changes    |
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
-| :meth:`filter`             | Series/DataFrame       | Series/DataFrame         | Filter data using a boolean array                                         |
+| :meth:`filter`             | -                      | -                        | Return rows that satisfy a boolean condition                              |
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
-| :meth:`pipe`               | Series/DataFrame       | Series/DataFrame         | Chain UDFs together to apply to Series or Dataframe                       |
+| :meth:`pipe`               | Series/DataFrame       | Series/DataFrame         | Chain functions together to apply to Series or DataFrame                  |
 +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+

 .. note::
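For orientation, the input and output contracts listed in the table can be sketched as follows. This is a minimal illustration, not part of the PR: the frame `df` and the lambdas are made up for the example, and `filter` is left out because the table does not specify its function signature.

```python
import pandas as pd

# Hypothetical example data; not taken from the PR.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# map: the function receives one scalar at a time.
doubled = df["one"].map(lambda x: x * 2)

# apply (axis=0): the function receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min())

# apply (axis=1): the function receives each row as a Series.
row_sum = df.apply(lambda row: row["one"] + row["two"], axis=1)

# agg: the function reduces each column to a single value.
col_mean = df.agg(lambda col: col.sum() / len(col))

# transform: the result must keep the shape of the input.
centered = df.transform(lambda col: col - col.mean())

# pipe: the function receives the whole DataFrame.
scaled = df.pipe(lambda frame: frame * 10)
```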
@@ -249,7 +249,7 @@ Documentation can be found at :meth:`~DataFrame.pipe`.
 Performance
 -----------

-While UDFs provide flexibility, their use is currently discouraged as they can introduce
+While UDFs provide flexibility, their use is generally discouraged as they can introduce
 performance issues, especially when written in pure Python. To improve efficiency,
 consider using built-in ``NumPy`` or ``pandas`` functions instead of UDFs
 for common operations.
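To make the recommendation in this hunk concrete, here is a small sketch that computes the same quantity twice: once with a row-wise Python UDF and once with built-in vectorized arithmetic. The column names and values are assumptions chosen for illustration. Both forms return the same result, but the vectorized form avoids calling a Python function once per row.

```python
import pandas as pd

# Hypothetical example data; not taken from the PR.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 8.0]})

# Row-wise UDF: the lambda is called once per row in pure Python.
udf_ratio = df.apply(lambda row: 100 * row["one"] / row["two"], axis=1)

# Vectorized alternative: one arithmetic expression over whole columns.
vectorized_ratio = 100 * df["one"] / df["two"]
```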
@@ -302,3 +302,35 @@ especially for computationally heavy tasks.
 .. note::
     You may also refer to the user guide on `Enhancing performance <https://pandas.pydata.org/pandas-docs/dev/user_guide/enhancingperf.html#numba-jit-compilation>`_
     for a more detailed guide to using **Numba**.
+
+Using :meth:`DataFrame.pipe` for Composable Logic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Another useful pattern for improving readability and composability, especially when mixing
+vectorized logic with UDFs, is to use the :meth:`DataFrame.pipe` method.
+
+The ``.pipe`` method doesn't improve performance directly, but it enables cleaner
+method chaining by passing the entire object into a function. This is especially helpful
+when chaining custom transformations:
+
+.. code-block:: python
+
+    def add_ratio_column(df):
+        df["ratio"] = 100 * (df["one"] / df["two"])
+        return df
+
+    df = (
+        df
+        .query("one > 0")
+        .pipe(add_ratio_column)
+        .dropna()
+    )
+
+This is functionally equivalent to calling ``add_ratio_column(df)``, but keeps your code
+clean and composable. The function you pass to ``.pipe`` can use vectorized operations,
+row-wise UDFs, or any other logic; ``.pipe`` is agnostic.
+
+.. note::
+    While :meth:`DataFrame.pipe` does not improve performance on its own,
+    it promotes clean, modular design and allows both vectorized and UDF-based logic
+    to be composed in method chains.
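One possible variation on the `add_ratio_column` example in this hunk, offered as a sketch rather than as the PR's code: the function below uses `DataFrame.assign` so that it returns a new frame instead of writing a column into its argument, and it takes an extra keyword argument to show that `.pipe` forwards additional arguments. The `scale` parameter, the name `raw`, and the sample values are assumptions introduced for illustration.

```python
import pandas as pd


def add_ratio_column(df, scale=100):
    # assign returns a new DataFrame, so the caller's frame is left untouched.
    return df.assign(ratio=scale * (df["one"] / df["two"]))


# Hypothetical example data; not taken from the PR.
raw = pd.DataFrame(
    {"one": [1.0, -2.0, 3.0], "two": [4.0, 5.0, float("nan")]}
)

result = (
    raw
    .query("one > 0")                   # keep rows where "one" is positive
    .pipe(add_ratio_column, scale=100)  # extra arguments are forwarded
    .dropna()                           # drop rows whose ratio is undefined
)
```

Because the function does not mutate its input, `raw` still has only its original two columns after the chain runs.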
`apply` and `transform` are almost the same, I'd also have it twice for axis=0 and axis=1 like apply.
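The similarity the reviewer points out can be checked with a short sketch. The helper `center` and the sample frame are hypothetical. For a shape-preserving function, `apply` and `transform` give the same result along either axis; the difference shows up with a reducing function, which `apply` accepts and `transform` rejects.

```python
import pandas as pd

# Hypothetical example data; not taken from the PR.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})


def center(values):
    # Shape-preserving: subtract the mean of the column or row passed in.
    return values - values.mean()


by_column_apply = df.apply(center, axis=0)
by_column_transform = df.transform(center, axis=0)
by_row_apply = df.apply(center, axis=1)
by_row_transform = df.transform(center, axis=1)

# A reducer is fine for apply, which returns one value per column.
reduced = df.apply(lambda col: col.sum())

# transform raises ValueError because the result no longer matches the input shape.
try:
    df.transform(lambda col: col.sum())
    transform_rejected = False
except ValueError:
    transform_rejected = True
```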