Performance Improvement: Replace Py4J-based Implementation with Native PyArrow #49

@chenghuichen

Description


Current Implementation and Issues

Currently, paimon-python leverages Py4J to reuse Java's read/write capabilities, with data serialization between Java and Python processes handled through ArrowUtils.serializeToIpc. This implementation has several performance bottlenecks:

  1. Process Communication Overhead: The Py4J bridge requires inter-process communication (IPC) between Java and Python processes, introducing significant latency.
  2. Serialization/Deserialization Cost: Each data transfer requires serialization to Arrow IPC format and subsequent deserialization, which is computationally expensive.
  3. Memory Management Complexity: The current implementation requires careful management of memory allocators and resources across process boundaries.

Proposed Solution

We propose to refactor paimon-python to use native PyArrow implementations for read/write operations. This would:

  1. Eliminate Process Communication: Remove the need for the Py4J bridge and IPC, so data is read and written inside the Python process with direct memory access.
  2. Reduce Serialization Overhead: Enable zero-copy data transfer between Python and native code.
  3. Simplify Memory Management: Leverage PyArrow's built-in memory management capabilities.