feat(python/sedonadb): add Expr operator overloads #823

Merged

jiayuasu merged 2 commits into apache:main from jiayuasu:feature/expr-operators
May 9, 2026


jiayuasu (Member) commented May 8, 2026

Adds operator overloading on sedonadb.expr.Expr, building on the foundation that landed in #807.

This is the second of four small stacked PRs implementing Phase P1 of #791.

What's new

  • Binary operators: +, -, *, /, ==, !=, <, <=, >, >=, & (AND), | (OR), with reflected variants so 1 - col("x") works the same as col("x") - 1.
  • Unary operators: - (arithmetic negation, via negate()) and ~ (logical NOT).
  • Auto-coercion of Python scalars: col("x") > 5 works without an explicit literal wrap; the scalar is routed through the existing _to_expr() path internally.
  • Expr.__hash__ = None, so using an Expr as a dict key or set member raises TypeError (unhashable type) up front. This is deliberate: __eq__ returns an Expr, not a bool.
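The following is a minimal, self-contained sketch of how the forward operators, reflected variants, scalar auto-coercion, and `__hash__ = None` listed above fit together. It does not import sedonadb. `Expr`, `col`, `_to_expr`, `_binary`, and the parenthesised `lit(...)` rendering are simplified stand-ins, not the real classes. The real classes wrap DataFusion expressions and render through DataFusion's Display output.

```python
from typing import Any


class Expr:
    """Simplified stand-in for sedonadb.expr.Expr that renders as a string."""

    # Explicitly unhashable: __eq__ below builds an Expr instead of returning
    # a bool, so hash(), dict keys, and set members must raise TypeError.
    __hash__ = None

    def __init__(self, text: str) -> None:
        self._text = text

    def __repr__(self) -> str:
        return self._text

    # Forward operators: self is the left-hand operand.
    def __add__(self, other: Any) -> "Expr":
        return _binary("+", self, other)

    def __sub__(self, other: Any) -> "Expr":
        return _binary("-", self, other)

    def __gt__(self, other: Any) -> "Expr":
        return _binary(">", self, other)

    def __eq__(self, other: Any) -> "Expr":  # type: ignore[override]
        return _binary("=", self, other)

    # Reflected operators: Python calls these when the left operand is a
    # plain scalar whose own method returned NotImplemented, so
    # `1 - col("x")` keeps 1 on the left-hand side.
    def __radd__(self, other: Any) -> "Expr":
        return _binary("+", other, self)

    def __rsub__(self, other: Any) -> "Expr":
        return _binary("-", other, self)


def _to_expr(value: Any) -> Expr:
    # Pass Exprs through unchanged; wrap plain Python scalars as literals.
    return value if isinstance(value, Expr) else Expr(f"lit({value!r})")


def _binary(op: str, lhs: Any, rhs: Any) -> Expr:
    # Single helper that every dunder above funnels through.
    return Expr(f"({_to_expr(lhs)!r} {op} {_to_expr(rhs)!r})")


def col(name: str) -> Expr:
    return Expr(name)


print(repr(col("x") > 5))         # (x > lit(5))
print(repr(1 - col("x")))         # (lit(1) - x)
print(repr(col("x") + col("y")))  # (x + y)
```

Because `&` and `|` bind more tightly than comparisons in Python, combined predicates need parentheses, e.g. `(col("x") > 5) & (col("y") < 3)`.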

Implementation notes

Operator dispatch is centralised: every Python dunder routes through a single _binary(op, lhs, rhs) helper, which calls into a single Rust factory expr_binary(op, lhs, rhs) that maps the string opcode to a datafusion_expr::Operator. Adding a new operator is one Rust match arm plus one Python dunder. This mirrors the pattern in the R bindings (SedonaDBExprFactory::binary).
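Below is a Python-only sketch of the centralised dispatch described above. In the PR, the opcode-to-operator mapping is a Rust `match` inside `expr_binary`. Here a Python `Enum` plays that role. Its members are named after `datafusion_expr::Operator` variants but not imported from DataFusion. The opcode strings, the `BinaryExpr` tuple, and the `_unwrap` helper are illustrative assumptions.

```python
from enum import Enum
from typing import Any, NamedTuple


class Operator(Enum):
    """Python stand-in for a subset of datafusion_expr::Operator.

    Each member's value is the opcode string the Python side sends.
    """

    Plus = "+"
    Minus = "-"
    Multiply = "*"
    Divide = "/"
    Eq = "=="
    NotEq = "!="
    Lt = "<"
    LtEq = "<="
    Gt = ">"
    GtEq = ">="
    And = "&"
    Or = "|"


class BinaryExpr(NamedTuple):
    left: Any
    op: Operator
    right: Any


def expr_binary(op: str, lhs: Any, rhs: Any) -> BinaryExpr:
    # Stand-in for the single Rust factory: Operator(op) looks the opcode up
    # by value and raises ValueError for anything it does not recognise.
    return BinaryExpr(lhs, Operator(op), rhs)


class Expr:
    def __init__(self, node: Any) -> None:
        self.node = node

    # One short dunder per operator; each only names its opcode.
    def __mul__(self, other: Any) -> "Expr":
        return _binary("*", self, other)

    def __rmul__(self, other: Any) -> "Expr":
        return _binary("*", other, self)

    def __and__(self, other: Any) -> "Expr":
        return _binary("&", self, other)

    def __or__(self, other: Any) -> "Expr":
        return _binary("|", self, other)


def _unwrap(value: Any) -> Any:
    # Scalar-to-literal coercion is omitted here; scalars pass through raw.
    return value.node if isinstance(value, Expr) else value


def _binary(op: str, lhs: Any, rhs: Any) -> Expr:
    # The only place that calls the factory.
    return Expr(expr_binary(op, _unwrap(lhs), _unwrap(rhs)))
```

Under this layout, supporting a new operator means adding one enum member (the analogue of one match arm) and one dunder that names its opcode.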

~ (logical NOT) goes through a separate expr_not factory because DataFusion models it as Expr::Not rather than a binary operator.
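The small sketch below shows why `~` and unary `-` bypass the binary path. DataFusion represents them as separate `Expr::Not` and `Expr::Negative` variants. The Python dataclasses here only mimic that shape; their names and fields are assumptions.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical node types echoing DataFusion, where NOT and arithmetic
# negation are their own Expr variants rather than binary operators.
@dataclass(frozen=True)
class Not:
    child: Any


@dataclass(frozen=True)
class Negative:
    child: Any


def expr_not(node: Any) -> Not:
    # Separate factory: there is no binary Operator for NOT to map onto.
    return Not(node)


class Expr:
    def __init__(self, node: Any) -> None:
        self.node = node

    def negate(self) -> "Expr":
        # Stand-in for the existing negate() method.
        return Expr(Negative(self.node))

    def __neg__(self) -> "Expr":
        # Unary minus reuses negate() instead of adding a new factory.
        return self.negate()

    def __invert__(self) -> "Expr":
        # `~expr` goes through expr_not, not the binary factory.
        return Expr(expr_not(self.node))
```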

Tests

  • 18 new tests covering arithmetic, comparison, boolean, reflected, unary, chained, expr-with-expr, and __hash__ = None behaviour. All assertions are exact repr() == ... per the test module's pinning policy.
  • While here, tightens six inherited foundation tests from substring matching to exact equality so the module is consistent end-to-end.
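The toy check below shows why the suite pins exact `repr()` strings instead of substrings. `render` and the `Int64(5)` literal format are assumptions for illustration, not verified sedonadb output. The point is only that a substring assertion can pass when the wrong operator is emitted, while an exact comparison fails.

```python
def render(column: str, op: str, value: int) -> str:
    # Hypothetical stand-in for repr() of a comparison Expr; the real string
    # comes from DataFusion's Display implementation.
    return f"{column} {op} Int64({value})"


def passes_substring_check(rendered: str) -> bool:
    # Looser style: only checks that a fragment appears somewhere.
    return "x >" in rendered


def passes_exact_check(rendered: str) -> bool:
    # Pinned style: the entire rendering must match.
    return rendered == "x > Int64(5)"


expected = render("x", ">", 5)
# Simulated regression: the opcode mapping emits >= where > was intended.
regressed = render("x", ">=", 5)

print(passes_substring_check(expected), passes_exact_check(expected))    # True True
print(passes_substring_check(regressed), passes_exact_check(regressed))  # True False
```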

Test plan

  • 33/33 tests in tests/expr/test_expression.py pass locally.
  • No regressions in existing tests/test_dataframe.py.
  • CI green.

Wire arithmetic, comparison, and boolean Python operators to the
underlying DataFusion Expr. Composition with plain Python scalars
auto-coerces the scalar through `_to_expr()`, so `col("x") > 5` works
without an explicit literal wrap.

- Binary: `+`, `-`, `*`, `/`, `==`, `!=`, `<`, `<=`, `>`, `>=`,
  `&` (AND), `|` (OR), with reflected variants so `1 - col("x")`
  also produces an Expr.
- Unary: `-` (negation, via `negate()`) and `~` (logical NOT).
- `Expr.__hash__` is set to `None` to make the unhashable contract
  explicit — `__eq__` returns an Expr rather than a bool, so using
  Expr as a dict key or set member would otherwise produce a
  surprising error.

Operator dispatch is centralised: every dunder routes through a
single Python `_binary` helper, which in turn calls one Rust
factory `expr_binary(op, lhs, rhs)`. Adding an operator is one
match arm + one dunder.

The class docstring is updated to describe the now-shipped
operator coercion (it previously called out coercion only inside
methods like `isin`).

While here, tightens six inherited foundation tests from substring
to exact `repr() == ...` assertions so the entire test module is
consistent in pinning DataFusion's Display output.
Copilot AI (Contributor) left a comment

Pull request overview

This PR extends the Python expression layer (sedonadb.expr.Expr) with operator overloading so users can compose DataFusion expressions using native Python operators (arithmetic, comparisons, boolean composition, unary ops). It builds on the existing Expr foundation by centralizing binary-operator construction through a single Rust factory and adding Python-side coercion of scalars into literal expressions.

Changes:

  • Add Expr dunder methods for arithmetic/comparison/boolean ops (including reflected variants), unary -, and ~ (NOT), with scalar auto-coercion via _to_expr().
  • Add Rust _lib factories expr_binary(op, lhs, rhs) and expr_not(expr) and export them from the PyO3 module.
  • Expand and tighten expression repr tests to pin exact rendering for the new operators and some existing behaviors.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/sedonadb/python/sedonadb/expr/expression.py | Implements operator overloads and centralized _binary() helper; wires to Rust factories; sets __hash__ = None. |
| python/sedonadb/src/expr.rs | Adds expr_binary (operator string → DataFusion Operator) and expr_not factories. |
| python/sedonadb/src/lib.rs | Exposes expr_binary and expr_not in the _lib PyO3 module. |
| python/sedonadb/tests/expr/test_expression.py | Adds operator overload test coverage and pins several existing repr assertions to exact equality. |


```python
#
# `&` / `|` / `~` rather than `and` / `or` / `not` because Python does
# not allow overloading the keyword forms — they always coerce to bool.
```
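The code comment above reflects Python semantics rather than anything specific to sedonadb. `and`, `or`, and `not` call `bool()` on their operands and cannot be overloaded. The sketch below uses a hypothetical `UnguardedExpr` that defines `&` and `~` but no `__bool__`. As a result, the keyword forms fall back to default object truthiness, which is always true.

```python
class UnguardedExpr:
    """Toy expression with & and ~ overloads but no __bool__ guard."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __and__(self, other: "UnguardedExpr") -> "UnguardedExpr":
        return UnguardedExpr(f"({self.text} AND {other.text})")

    def __invert__(self) -> "UnguardedExpr":
        return UnguardedExpr(f"(NOT {self.text})")


a = UnguardedExpr("x > 0")
b = UnguardedExpr("y < 3")

# Overloaded operators build a combined expression from both sides.
print((a & b).text)  # (x > 0 AND y < 3)
print((~a).text)     # (NOT x > 0)

# Keyword forms call bool(a), which is True for any plain object:
print((a and b).text)  # y < 3  (a was silently dropped)
print((a or b).text)   # x > 0  (b was silently dropped)
print(not a)           # False  (no expression was built at all)
```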

jiayuasu (Member, Author)

Done in 60428b2: Expr.__bool__ and Expr.__len__ now both raise TypeError with guidance toward &/|/~ and DataFrame.filter(). Six new tests cover bool(), if, and/or, not, and len().

paleolimbot (Member) left a comment

The Copilot suggestion is good here, but other than that this looks great!

Without these guards, `if col("x") > 0: ...`, `col("x") and col("y")`,
and `not col("x").is_null()` would all silently coerce Exprs through
Python's default truthiness and either always take the truth branch
or drop one side of an `and`/`or` short-circuit. This is the same
footgun pandas/polars/pyspark/ibis all guard against — operator
overloads return Expr objects that look like booleans, so users
naturally write `if comparison: ...` and have it silently no-op.

Now both `__bool__` and `__len__` raise `TypeError` with messages
pointing users at `&`/`|`/`~` for boolean composition and at
`DataFrame.filter()` / `count()` for evaluation.

Six new tests exercise direct `bool()`, `if`/`and`/`or`/`not`, and
`len()` paths.
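Here is a minimal sketch of the guards described in this commit, using a hypothetical toy `Expr`. The error wording is illustrative only; the real messages in sedonadb may differ. Python checks `__bool__` before `__len__` for truthiness, so in this sketch `__len__` is reached only through `len()`. Raising there gives a more helpful message than the default.

```python
from typing import Callable


class Expr:
    """Toy expression that refuses truthiness and len()."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __and__(self, other: "Expr") -> "Expr":
        return Expr(f"({self.text} AND {other.text})")

    def __bool__(self) -> bool:
        # Reached by bool(), `if`, `and`, `or`, and `not`.
        raise TypeError(
            "the truth value of an Expr is ambiguous; combine expressions "
            "with &, | and ~, and evaluate them with DataFrame.filter()"
        )

    def __len__(self) -> int:
        # Only reached via len() here, because __bool__ takes precedence.
        raise TypeError(
            "an Expr has no length; evaluate it with a DataFrame, e.g. count()"
        )


def raises_type_error(action: Callable[[], object]) -> bool:
    try:
        action()
    except TypeError:
        return True
    return False


def branch_on(expr: Expr) -> str:
    if expr:  # triggers __bool__
        return "taken"
    return "not taken"


x = Expr("x > 0")
y = Expr("y < 3")

print(raises_type_error(lambda: bool(x)))       # True
print(raises_type_error(lambda: branch_on(x)))  # True
print(raises_type_error(lambda: x and y))       # True
print(raises_type_error(lambda: not x))         # True
print(raises_type_error(lambda: len(x)))        # True
print((x & y).text)                             # (x > 0 AND y < 3)
```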
jiayuasu marked this pull request as ready for review May 8, 2026 21:33
jiayuasu merged commit 27c88a8 into apache:main May 9, 2026
5 checks passed

3 participants
