feat(python/sedonadb): add Expr operator overloads #823
jiayuasu merged 2 commits into apache/sedona-db:main from jiayuasu/sedona-db:feature/expr-operators
Wire arithmetic, comparison, and boolean Python operators to the
underlying DataFusion Expr. Composition with plain Python scalars
auto-coerces the scalar through `_to_expr()`, so `col("x") > 5` works
without an explicit literal wrap.
- Binary: `+`, `-`, `*`, `/`, `==`, `!=`, `<`, `<=`, `>`, `>=`,
`&` (AND), `|` (OR), with reflected variants so `1 - col("x")`
also produces an Expr.
- Unary: `-` (negation, via `negate()`) and `~` (logical NOT).
- `Expr.__hash__` is set to `None` to make the unhashable contract
explicit — `__eq__` returns an Expr rather than a bool, so using
Expr as a dict key or set member would otherwise produce a
surprising error.
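The hashing point can be checked in plain Python. The sketch below uses a hypothetical stand-in class, `ToyExpr`, and a hypothetical helper, `is_hashable`; neither is sedonadb's code. It only illustrates what `__hash__ = None` does when `__eq__` returns something other than a bool.

```python
class ToyExpr:
    """Hypothetical stand-in for an expression node (not sedonadb's Expr)."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Builds a new expression instead of returning a bool.
        return ToyExpr(f"{self.text} = {other!r}")

    # Python already drops the inherited hash when a class defines __eq__;
    # assigning None spells that contract out in the class body.
    __hash__ = None


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


comparison = ToyExpr("x") == 5  # a ToyExpr, not True/False
```

With this in place, `{comparison: 1}` or `{comparison}` raises `TypeError` immediately instead of storing a key whose equality check can never return a real boolean.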
Operator dispatch is centralised: every dunder routes through a
single Python `_binary` helper, which in turn calls one Rust
factory `expr_binary(op, lhs, rhs)`. Adding an operator is one
match arm + one dunder.
The class docstring is updated to describe the now-shipped
operator coercion (it previously called out coercion only inside
methods like `isin`).
While here, tightens six inherited foundation tests from substring
to exact `repr() == ...` assertions so the entire test module is
consistent in pinning DataFusion's Display output.
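To show why exact assertions pin more than substring ones, here is a small sketch. `ToyExpr` and its `Expr(...)` rendering are invented for illustration and do not reproduce DataFusion's Display output.

```python
class ToyExpr:
    """Hypothetical expression with a deterministic text rendering."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Expr({self.text})"


def passes_substring_check(expr):
    # Loose style: only requires the fragment to appear somewhere.
    return "x > 5" in repr(expr)


def passes_exact_check(expr):
    # Pinned style: the whole rendering must match.
    return repr(expr) == "Expr(x > 5)"


expected = ToyExpr("x > 5")
drifted = ToyExpr("x > 50")  # simulated rendering regression
```

Both objects pass the substring check, but only `expected` passes the exact one, so a pinned assertion catches the drift that the loose assertion lets through.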
Pull request overview
This PR extends the Python expression layer (sedonadb.expr.Expr) with operator overloading so users can compose DataFusion expressions using native Python operators (arithmetic, comparisons, boolean composition, unary ops). It builds on the existing Expr foundation by centralizing binary-operator construction through a single Rust factory and adding Python-side coercion of scalars into literal expressions.
Changes:
- Add `Expr` dunder methods for arithmetic/comparison/boolean ops (including reflected variants), unary `-`, and `~` (NOT), with scalar auto-coercion via `_to_expr()`.
- Add Rust `_lib` factories `expr_binary(op, lhs, rhs)` and `expr_not(expr)` and export them from the PyO3 module.
- Expand and tighten expression repr tests to pin exact rendering for the new operators and some existing behaviors.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/sedonadb/python/sedonadb/expr/expression.py | Implements operator overloads and centralized _binary() helper; wires to Rust factories; sets __hash__ = None. |
| python/sedonadb/src/expr.rs | Adds expr_binary (operator string → DataFusion Operator) and expr_not factories. |
| python/sedonadb/src/lib.rs | Exposes expr_binary and expr_not in the _lib PyO3 module. |
| python/sedonadb/tests/expr/test_expression.py | Adds operator overload test coverage and pins several existing repr assertions to exact equality. |
    # `&` / `|` / `~` rather than `and` / `or` / `not` because Python does
    # not allow overloading the keyword forms — they always coerce to bool.
Done in 60428b2 — Expr.__bool__ and Expr.__len__ now both raise TypeError with guidance toward &/|/~ and DataFrame.filter(). Six new tests cover bool(), if, and/or, not, and len().
paleolimbot left a comment
The Copilot suggestion is good here but other than that this looks great!
Without these guards, `if col("x") > 0: ...`, `col("x") and col("y")`,
and `not col("x").is_null()` would all silently coerce Exprs through
Python's default truthiness: an `if` always takes the true branch,
an `and`/`or` short-circuit drops one side, and `not` collapses to
`False`. This is the same footgun pandas/polars/pyspark/ibis all guard
against: operator overloads return Expr objects that look like
booleans, so users naturally write `if comparison: ...` and get a
condition that is always true instead of an error.
Now both `__bool__` and `__len__` raise `TypeError` with messages
pointing users at `&`/`|`/`~` for boolean composition and at
`DataFrame.filter()` / `count()` for evaluation.
Six new tests exercise direct `bool()`, `if`/`and`/`or`/`not`, and
`len()` paths.
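The difference the guards make can be sketched in plain Python. The classes `Unguarded` and `Guarded`, the helper `raises_type_error`, and the error messages below are all hypothetical; they show the technique, not sedonadb's actual wording.

```python
class Unguarded:
    """Hypothetical expression type with Python's default truthiness."""

    def __init__(self, text):
        self.text = text


class Guarded(Unguarded):
    """Same type, but refusing implicit bool() and len() conversion."""

    def __bool__(self):
        # Message text is illustrative only.
        raise TypeError(
            "expression has no truth value; combine with &, |, ~ "
            "and evaluate it through a DataFrame filter"
        )

    def __len__(self):
        raise TypeError("expression has no length; evaluate it first")


def raises_type_error(fn):
    """Return True if calling fn() raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


a, b = Unguarded("x > 0"), Unguarded("y > 0")
picked = a and b   # evaluates to b; the left operand is silently dropped
negated = not a    # plain False, not an expression

g, h = Guarded("x > 0"), Guarded("y > 0")
```

With the guarded type, `bool(g)`, `g and h`, `not g`, and `len(g)` all fail loudly at the point of misuse, which is the behaviour the commit describes.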
Adds operator overloading on `sedonadb.expr.Expr`, building on the foundation that landed in #807.
This is the second of four small stacked PRs implementing Phase P1 of #791.
What's new
- `+`, `-`, `*`, `/`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `&` (AND), `|` (OR), with reflected variants so `1 - col("x")` works the same as `col("x") - 1`.
- `-` (arithmetic negation, via `negate()`) and `~` (logical NOT).
- `col("x") > 5` works without an explicit literal wrap; the scalar is routed through the existing `_to_expr()` path internally.
- `Expr.__hash__ = None` so use as a dict key or set member fails clearly: `__eq__` returns an `Expr`, not a `bool`.
Implementation notes
Operator dispatch is centralised: every Python dunder routes through a single `_binary(op, lhs, rhs)` helper, which calls into a single Rust factory `expr_binary(op, lhs, rhs)` that maps the string opcode to a `datafusion_expr::Operator`. Adding a new operator is one Rust match arm plus one Python dunder. Mirrors the pattern in the R bindings (`SedonaDBExprFactory::binary`).
`~` (logical NOT) goes through a separate `expr_not` factory because DataFusion models it as `Expr::Not` rather than a binary operator.
Tests
`__hash__ = None` behaviour. All assertions are exact `repr() == ...` per the test module's pinning policy.
Test plan
- `tests/expr/test_expression.py` pass locally.
- `tests/test_dataframe.py`.